Automatically detect and code themes
Quickly identify broad themes in files using the Auto Code Wizard. Auto coding themes helps answer the question: What are the key themes in my files?
Text analytics is a complex process—manual coding is always going to be more accurate.
The process
Select multiple files, nodes or cases and use the Auto Code Wizard to produce results. A node matrix is created, and content is coded to theme nodes.
NVivo analyzes your material using a language pack. Themes are identified by analyzing the content and the sentence structure within it. NVivo assigns significance to some themes over others based on how frequently each theme occurs in the material being analyzed.
The themes are combined into groups and results are presented as a node for each broad idea, with child nodes for each theme within that group.
The relevant content is coded to the theme nodes that are created. The results are summarized in a node matrix which shows the nodes for each broad idea, and the number of coding references from each file.
What makes a theme?
The process detects significant noun phrases (for example, real estate development) to identify the most frequently occurring themes.
The process collects the themes and counts their mentions across all files in the set being processed.
NVivo actively filters the themes—only the most relevant themes are presented in the results. You can choose which themes to create as nodes at the end of the process.
The automated insights process may produce different results in different languages. For example, if you have a translated version of the same file, analyzing the French version in French and the English version in English may produce different results.
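The counting step described above can be pictured with a small sketch. This is not NVivo's implementation, which is not exposed as code. It assumes the noun phrases have already been detected and are supplied as a hypothetical `candidate_phrases` list; the file names and the `count_mentions` function are also invented for illustration.

```python
from collections import Counter

def count_mentions(files, candidate_phrases):
    """Count how often each candidate phrase occurs across all files in the set."""
    totals = Counter()
    for text in files.values():
        lowered = text.lower()
        for phrase in candidate_phrases:
            totals[phrase] += lowered.count(phrase.lower())
    return totals

# Hypothetical file contents, for illustration only.
files = {
    "submission_a.txt": "Real estate development is booming. Water quality is a concern.",
    "submission_b.txt": "Residents oppose real estate development near the river.",
}
# Assumption: noun-phrase detection has already produced these candidates.
candidates = ["real estate development", "water quality"]

mentions = count_mentions(files, candidates)
for phrase, total in mentions.most_common():
    print(phrase, total)
```

Sorting by total mentions mirrors the way the Wizard presents the most frequent themes first.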
Theme grouping
NVivo groups themes by comparing words with the same stem, for example house, houses and housing. It then filters the themes and excludes those groups that represent a much smaller proportion of your content.
For each group, NVivo uses the most frequently shared phrase or word as the name of the node.
Themes can belong to more than one group—for example, storm water runoff may be grouped with the theme water and the theme storm—and have the same coding references.
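As a rough illustration of grouping by shared stem, the sketch below uses a deliberately crude suffix-stripping function. NVivo's language packs are assumed to be far more sophisticated; `naive_stem` and `group_themes` are hypothetical names, and the group filtering and naming steps are left out.

```python
from collections import defaultdict

def naive_stem(word):
    """Strip a few common English suffixes. Illustrative only, not a real stemmer."""
    for suffix in ("ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def group_themes(themes):
    """Place each theme in one group per word stem it contains."""
    groups = defaultdict(set)
    for theme in themes:
        for word in theme.lower().split():
            groups[naive_stem(word)].add(theme)
    return groups

themes = ["water", "storm", "storm water runoff", "water quality"]
groups = group_themes(themes)
print(sorted(groups["water"]))
print(sorted(groups["storm"]))
```

Because a multi-word theme contributes to every stem it contains, storm water runoff ends up in both the water group and the storm group, as described above.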
Collections of material
You can analyze files, nodes or cases individually or as a combination of items.
If you analyze each item individually, you'll get different results compared to analyzing a group of items together.
1 Items A, B and C each mention water quality once. Even though each item mentions it only once, the process identifies water as a concept that is mentioned in the majority of items being analyzed. Water quality is therefore suggested as a theme node in the results. If item A, B or C were analyzed individually, water quality might not be identified as a theme. For example, if you are reviewing submissions on the same issue and want to see which overall themes are detected, wait until you have received all the submissions and then process them together as a group.
2 Item D mentions tourism industry several times, but it is not suggested as a theme node in the results because it appears in only one file in the group being analyzed. If you analyzed Item D on its own, tourism industry might be identified as a theme.
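The difference between analyzing items together and individually can be mimicked with a toy rule. The actual scoring is not documented here, so the thresholds and the `suggest_themes` function below are invented purely to reproduce the behavior described for items A to D: in a group, a phrase must appear in a majority of items; for a single item, it must be mentioned repeatedly.

```python
def suggest_themes(items, candidate_phrases, min_share=0.5, min_mentions=2):
    """Toy rule (an assumption, not NVivo's algorithm) for suggesting themes."""
    suggested = []
    for phrase in candidate_phrases:
        counts = [text.lower().count(phrase) for text in items.values()]
        if len(items) > 1:
            # Group analysis: how many of the items mention the phrase at all?
            share = sum(1 for c in counts if c > 0) / len(items)
            keep = share > min_share
        else:
            # Single item: require repeated mentions within that item.
            keep = counts[0] >= min_mentions
        if keep:
            suggested.append(phrase)
    return suggested

# Hypothetical item contents matching the example above.
items = {
    "A": "Water quality matters here.",
    "B": "We tested water quality in spring.",
    "C": "Water quality declined.",
    "D": "The tourism industry grew. The tourism industry employs many. "
         "Tourism industry revenue rose.",
}
phrases = ["water quality", "tourism industry"]

together = suggest_themes(items, phrases)
item_d_alone = suggest_themes({"D": items["D"]}, phrases)
item_a_alone = suggest_themes({"A": items["A"]}, phrases)
print(together, item_d_alone, item_a_alone)
```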
- In List View, select the nodes, cases or text files you want to auto code. Text files include documents, PDFs, datasets and transcripts. They do not have to be of the same type, but they should be in the same language.
NOTE If you want to select items from different folders, use the All Files or All Nodes search folder.
- On the Home tab, in the Coding group, click Auto Code, then follow the steps in the Wizard.
Wizard step | Description |
---|---|
Choose how you would like to auto code | Click Identify themes. You may be prompted to download and install additional files. Make sure your project text content language is set to the language of the sources you are analyzing. |
Identifying themes | NVivo analyzes your files for commonly occurring themes. The identified themes are grouped into proposed nodes and child nodes, and sorted according to the number of mentions in the files. Click the Expand buttons to view the child nodes for each theme. Choose the nodes you want to create. By default, all nodes are selected, so clear the checkboxes for the nodes you don't want created. You can merge and delete nodes later (see Reorder and organize nodes). |
Select how your text passages will be coded | Choose how finely NVivo should code text passages, for example Code sentences. The results are displayed as a node matrix in Detail View and saved in the Node Matrices folder. The created nodes are stored in a folder called Autocoded Themes under the Nodes folder. |
Specify a location for the results | The wizard displays your existing node structure. Select a location in the structure to place the auto created theme nodes. Click Create folder or Create node to create a new location. |
Work with the results of auto coded themes
When you auto code themes, the results are displayed in Detail View and saved as a node matrix in the Node Matrices folder. You can view the saved node matrix later if you want a record of the coding performed by the Wizard at a particular date and time. This node matrix is a static record that is not updated if you subsequently uncode some of the content.
Each time you run the auto code themes process, the results are merged with any existing auto coded themes. If you want to keep the themes separate, rename the Autocoded Themes folder before you run the Auto Code Wizard.
1 Columns display the names of the parent nodes that have been coded by the Wizard.
2 Rows display the files that have been coded by the Wizard.
3 Cells display the number of coding references that were created for a file (row) at a node (column). You can change the display, for example transpose the columns and rows—click Transpose in the View group, on the Matrix tab.
4 Click the Chart tab to see a visual representation of the auto coding results.
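To make the matrix layout concrete, here is a minimal sketch of the same structure outside NVivo: rows are files, columns are parent nodes, and each cell is a count of coding references. The reference list, file names and function names are hypothetical. NVivo builds this matrix for you, so the code only illustrates what the cells mean and what transposing does.

```python
from collections import Counter

def build_matrix(references, files, nodes):
    """Return rows (one per file) of reference counts, one column per node."""
    counts = Counter(references)
    return [[counts[(f, n)] for n in nodes] for f in files]

def transpose(matrix):
    """Swap rows and columns, as the Transpose command does for the display."""
    return [list(column) for column in zip(*matrix)]

# Hypothetical coding references as (file, node) pairs.
references = [
    ("interview_1", "water"),
    ("interview_1", "water"),
    ("interview_2", "water"),
    ("interview_2", "housing"),
]
files = ["interview_1", "interview_2"]
nodes = ["water", "housing"]

matrix = build_matrix(references, files, nodes)
transposed = transpose(matrix)
print(matrix)
print(transposed)
```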
NOTE The number of node references for a file displayed in the Nodes column in List View includes the node matrix.
How can I identify theme coding references?
Auto coded theme references that were created by the Wizard are associated with the user profile 'NVivo' with the initials 'NV'.
If you have performed multiple theme auto coding operations, you will not be able to distinguish which references were created by a particular coding operation. To see the references from a particular operation, you can view the specific node matrix in the Node Matrices folder.
You can run a matrix coding query to display the coding references currently associated with the user 'NVivo'.
Review and fine-tune theme coding
Review the results to confirm that you are satisfied with the auto coding before performing other actions in your project—so that you can adjust the coding if you need to.
In the node matrix, double click a cell to see the content that was coded at the intersection of the file and node. Is the content relevant to that node? Take a look at other cells in the matrix.
Decide whether you are satisfied with the results—do you want to keep some of the coding or undo the entire auto coding operation?
If you are mostly satisfied with the results, but need to fine-tune some of the auto coding, you may want to uncode some of the references. Note that the coding reference is still displayed in the node matrix cell, even if you have uncoded it.
If you are not satisfied with the overall results, you may want to 'undo' the auto coding completely.
Review the nodes in the Autocoded Themes folder. You may want to refine the nodes by merging, moving or deleting some.
If you intend to run the Auto Code Wizard again and want to keep the new themes separate, rename the Autocoded Themes folder.
Why am I getting unexpected results from auto coding themes?
Auto coding themes uses machine-based algorithms to scan your files to identify themes. This is a complex task—manual coding is always going to be more accurate.
- Choose your context carefully—auto coding for themes takes your selection of files as a whole into account. You may get different results from a larger set of data than with a smaller set or indeed, with individual files.
- Understand the structure of your files—look for clear sentence and paragraph structure. If you choose to Code sentences, full stops are used to designate the end of sentences. Make sure that sentences in your files end with a full stop, including bullet lists and text in table cells within a document. Any full stops used to designate an abbreviation will be interpreted as the end of a sentence.
- Make sure each file is in one language, and that you process files of the same language together. The auto coding themes process can only detect one language at a time—and this is based on the text content language setting of your project.
- Minimize the presence of advertising or repeated content in your files. If you are working with web pages, capture only the main content on the page before importing into your project. For example, if you are capturing content from a news site, repeated headlines displayed in a sidebar may be detected as frequently occurring themes.
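The point above about full stops can be seen with a naive splitter. This is only an approximation of sentence-level coding, assuming that every full stop ends a sentence; `split_on_full_stops` is a hypothetical helper, not an NVivo function.

```python
def split_on_full_stops(text):
    """Split text at every full stop and drop empty fragments."""
    return [part.strip() for part in text.split(".") if part.strip()]

clean = "Water quality has declined. Residents are concerned."
abbreviated = "Dr. Lee measured runoff. Levels were high."

clean_parts = split_on_full_stops(clean)
abbreviated_parts = split_on_full_stops(abbreviated)
print(clean_parts)
print(abbreviated_parts)
```

The second text has two sentences but yields three fragments, because the full stop after the abbreviation is treated as a sentence boundary. Expanding or rewording abbreviations before import avoids this kind of split.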