This feature is available in Plus edition.

Automatic coding using existing coding patterns

Automatic coding based on existing coding patterns is an experimental feature that you can try out. It is designed to speed up the coding process for large volumes of textual content.

Understand auto coding using existing coding patterns

Pattern-based auto coding enables you to do approximate 'broad-brush' coding of large volumes of text quickly, which you can then review and refine.

Before you use pattern-based auto coding, you need to start with manual 'pilot' coding of your files. For example, if your research involves analyzing 100 interviews, you could manually code the first 10 interviews. Then, you could auto code the remaining interviews based on the coding patterns from your initial coding.

When you auto code using existing patterns, NVivo compares each text passage—for example, sentence or paragraph—to the content already coded to existing nodes. If the content of the text passage is similar in wording to content already coded to a node, then the text passage will be coded to that node.

During pattern-based auto coding, words in the content to be coded are compared to the words in previously coded nodes. Stop words are ignored when comparing text content to existing nodes. Words with the same stem (for example, house, houses and housing) are grouped together. Make sure your project text content language is set to the language of the sources you are analyzing: choose File > Project Properties, click General, and then select the language used in your sources.
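The comparison described above can be illustrated with a minimal Python sketch. It is not NVivo's implementation: the stop-word list, the `crude_stem` suffix rule, the overlap-based `similarity` score and the `threshold` value are all assumptions chosen for illustration only.

```python
import re

# Tiny illustrative stop-word list (an assumption, not NVivo's list).
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "too"}

def crude_stem(word):
    """Very rough stemmer: strips a few common English suffixes."""
    for suffix in ("ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word

def terms(text):
    """Lowercase, tokenize, drop stop words, and stem the remaining words."""
    words = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(w) for w in words if w not in STOP_WORDS}

def similarity(passage, coded_texts):
    """Fraction of the passage's terms that also occur in the coded content."""
    passage_terms = terms(passage)
    node_terms = set()
    for text in coded_texts:
        node_terms |= terms(text)
    if not passage_terms:
        return 0.0
    return len(passage_terms & node_terms) / len(passage_terms)

def auto_code(passages, pilot_coding, threshold=0.5):
    """Return {node: [passages]} for passages that meet the threshold."""
    result = {node: [] for node in pilot_coding}
    for passage in passages:
        for node, coded_texts in pilot_coding.items():
            if similarity(passage, coded_texts) >= threshold:
                result[node].append(passage)
    return result

# Hypothetical pilot coding: node name -> manually coded text.
pilot = {
    "housing": ["Rents are rising and houses are too expensive."],
    "public transport": ["The train is often late and tickets cost a lot."],
}
new_passages = [
    "Housing is expensive in the city.",
    "Trains and tickets are unreliable.",
]
coded = auto_code(new_passages, pilot)
for node, passages in coded.items():
    print(node, "->", passages)
```

In this sketch, 'house', 'houses' and 'housing' reduce to the same stem, so the first new passage matches the housing node even though it never uses the word 'houses'.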

When comparing the text passages to content coded to existing nodes, any earlier pattern-based coding is ignored, in order to preserve the quality of the coding patterns.

Coding references that have been auto coded based on existing coding patterns are associated with the user profile 'NVivo' with the initials 'NV'.

The following is an illustration of pattern-based auto coding:

Process of pattern-based auto coding via the Auto Code wizard.

NOTE

NVivo 12.1 (Windows) and later also takes the following into account when determining coding patterns:

  • Uncoded content—make sure that you have finished 'pilot' coding the files that you select to base the coding patterns on before running pattern-based coding.
  • Word pairs—for example 'water quality' or 'climate change'.
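To show what treating word pairs as a unit can look like, here is a small Python sketch that extracts adjacent word pairs from a passage. The function name `word_pairs` and the tokenization rule are assumptions for illustration; how NVivo identifies word pairs internally is not described here.

```python
import re

def word_pairs(text):
    """Return adjacent word pairs from lowercased text."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))

pairs = word_pairs("Water quality affects climate change policy")
print(pairs)
```

A pair such as ('climate', 'change') carries more meaning than 'climate' and 'change' counted separately, which is why pairs can help when comparing passages.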

When can pattern-based coding be useful?

We expect pattern-based auto coding to be most useful when coding to descriptive nodes. For example, it may be able to identify that paragraph 1 is about housing and paragraph 2 is about public transport.

You may also be able to use pattern-based coding to identify passages that mention particular people (for example, politicians or leaders), places or organizations that are important to your research.

We do not recommend using pattern-based auto coding to code to nodes that represent:

  • Sentiment—for example, positive, negative. Use Sentiment Analysis instead.
  • Attitudes, tones, or emotions—for example, enthusiastic, sarcastic, happy
  • Interpretations of the data—for example distance to school increases the likelihood of truancy
  • The speaker in an interview transcript—it cannot accurately identify who was speaking

Strategies for auto coding based on existing coding patterns

Before you use pattern-based auto coding, you need to start with manual 'pilot' coding of your files. Here are some strategies to consider when pilot coding:

  • The quality of the pilot coding will influence the quality of the results of auto coding.
  • If your pilot coding is too small or narrowly-focused, you may get poor results (or no results).
  • Coding smaller passages—for example, a sentence—may achieve better results than coding larger passages.
  • You may achieve better results by coding only the text that is relevant to a particular theme. For example, if a paragraph contains two ideas, only code the part that is relevant to the theme.

If your project involves capturing datasets iteratively, you can pilot code the initial dataset. For example, if you capture Twitter data at multiple intervals or gather responses from SurveyMonkey with multiple collectors, then you can manually pilot code the initial data and then auto code the subsequent data.

If your files contain responses to questions on a range of topics or issues, you may get better results with pattern-based coding if you auto code the responses to one question at a time using specific thematic nodes that relate to that question. For example, if you have a dataset containing 1000 responses to a survey about public policy, you could:

  1. Gather the responses into a node for each question. You can do this by auto coding a dataset at nodes for selected columns.
  2. Open the node for a particular question—for example Views on council funding—and for the first 50 responses, manually 'code on' to a group of thematic nodes (animals, libraries, parks, public health, recycling).
  3. Use pattern-based coding to auto code the question node Views on council funding to the specific thematic nodes for that question (animals, libraries, parks, public health, recycling).

The selections you make in the Auto Code Wizard can have an impact on the quality of the results.

  • Be selective about the nodes you choose to auto code to—for example, only use certain thematic nodes. Pattern-based auto coding works best with thematic nodes rather than relationships.
  • Experiment with the slider. If you choose 'Less', then NVivo applies stricter criteria when deciding whether to code the content.
  • Review the list of terms that are used to assess coding similarity. Having a targeted list of terms can improve the accuracy of pattern-based coding.
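The effect of the slider can be pictured as moving a similarity threshold. The following Python sketch uses made-up similarity scores and made-up threshold values (NVivo does not publish the values it uses) to show why a stricter setting produces less coding.

```python
# Hypothetical similarity scores between three passages and one node.
scores = {"passage 1": 0.82, "passage 2": 0.55, "passage 3": 0.31}

def coded_at(scores, threshold):
    """Return the passages whose score meets the threshold."""
    return [name for name, score in scores.items() if score >= threshold]

less = coded_at(scores, threshold=0.7)  # stricter: fewer passages coded
more = coded_at(scores, threshold=0.3)  # looser: more passages coded
print(len(less), len(more))
```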

Review the results of auto coding. If you are not satisfied with the results, 'undo' the Auto Code action, or revert to a backup copy of your project. Consider doing more pilot coding or adjusting the slider in the Auto Code Wizard next time you auto code.

Auto code using existing coding patterns

IMPORTANT

  • Auto coding using existing coding patterns can perform a large amount of coding very quickly. It is a good idea to make a backup copy of the project before you start.
  • If you are working in a server project, you may want to open the project exclusively before you auto code—this ensures that you can 'undo' the auto code if you are not satisfied with the results.
  1. In List View, select the items you want to auto code. You can select files, nodes or cases. Files do not have to be of the same file type. If you want to select items from different folders, you can use a set or search folder.
  2. On the Home tab, in the Coding group, click Auto Code.
  3. Follow the steps on the Auto Code Wizard.
Each Wizard step is listed below, followed by its description.

Choose how you would like to auto code

Click Use existing coding patterns.

Automatically code text passages

Choose the nodes you want to code at and the coded files you want to base the coding patterns on. For each of the following, click Select and choose the relevant project items:

  • Select the nodes or relationships that you would like to code at
  • Select coded files that you want to base the coding patterns on—these files should contain the manual 'pilot' coding

Adjust the slider to influence how much coding NVivo will create:

  • Less: NVivo uses a higher threshold to determine similarity when comparing each text passage to each node, so less content is coded.
  • More: NVivo uses a lower threshold to determine similarity when comparing each text passage to each node, so more content is coded.

Checking existing coding patterns

NVivo checks the existing coding patterns in the nodes you have selected to code at to determine their suitability for pattern-based coding.

Once the check is complete, you will receive feedback on the suitability of your nodes. If any issues are detected, NVivo displays a warning message—for example, to indicate that there is insufficient coding at a node.

You can click the Expand buttons to view the nodes for each message. Nodes with warnings will not be used to code at due to the issues detected. However, if you still want to code at a node, you can select it again in this step of the Wizard.

Select how your text passages will be coded

Choose how finely NVivo should code text passages:

  • Code sentences if you want individual sentences to be coded.
  • Code paragraphs if you want entire paragraphs to be coded.
  • Code entire cell for datasets, transcripts and logs if you want entire cells to be coded for datasets, transcripts and picture logs. For other file types, entire paragraphs are coded.

NOTE

  • If your text content language is Chinese or Japanese, you will not be able to select Code sentences.
  • By default, the results are saved as a node matrix in the Node Matrices folder. You can clear the Save auto code results in the Node Matrices folder check box. However, if you clear this check box, the results will be displayed as a temporary node matrix that you will not be able to save.
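The difference between coding sentences and coding paragraphs comes down to how the text is divided into passages before comparison. The following Python sketch shows one naive way to do this. The splitting rules (blank lines for paragraphs, end punctuation followed by whitespace for sentences) are assumptions and will not match NVivo's behavior in every case.

```python
import re

def split_paragraphs(text):
    """Split on blank lines."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

def split_sentences(text):
    """Naive split after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

doc = "Rents are rising. Buses are late.\n\nParks need funding."
paragraphs = split_paragraphs(doc)
sentences = split_sentences(doc)
print(len(paragraphs), len(sentences))
```

Smaller passages give the comparison a narrower block of text to match, which is one reason coding at sentence level can produce more targeted results.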

Working with the results of auto coding

When you auto code using existing coding patterns, the results are displayed in Detail View, and by default, the results are saved as a node matrix in the Node Matrices folder. You can view the saved node matrix later if you want a record of the coding performed by the Wizard at a particular date and time. This node matrix is a static record that is not updated if you subsequently uncode some of the content.

Matrix displaying the results of pattern-based auto coding.

1 Columns display the names of the nodes that have been coded to by the Wizard.

2 Rows display the files that have been coded by the Wizard.

3 Cells display the number of coding references that were created for a file (row) at a node (column). You can change the display, for example transpose the columns and rows—click Transpose in the View group, on the Matrix tab.

4 Click the Chart tab to see a visual representation of the auto coding results.
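The layout of the results matrix can be sketched in Python as a table of counts. The file names, node names and references below are hypothetical, and NVivo does not expose its results in this form.

```python
from collections import Counter

# Hypothetical coding references created by one auto code run: (file, node).
references = [
    ("Interview 11", "housing"),
    ("Interview 11", "housing"),
    ("Interview 11", "public transport"),
    ("Interview 12", "public transport"),
]

counts = Counter(references)
files = sorted({f for f, _ in references})   # rows
nodes = sorted({n for _, n in references})   # columns

# Each cell holds the number of references for a file (row) at a node (column).
matrix = [[counts[(f, n)] for n in nodes] for f in files]

# Transposing swaps rows and columns, so nodes become rows.
transposed = [list(col) for col in zip(*matrix)]
print(matrix)
print(transposed)
```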

Reviewing and fine-tuning auto coding

It is a good idea to review the coding to check the relevance of coding references. Pattern-based coding is a complex task—for example, the meaning of a word varies depending on the context in which it appears, so you may see coding references that are unrelated to the node.

You may want to confirm that you are satisfied with the auto coding before performing other actions in your project, so that you can undo the auto coding if you need to. The 'undo' function can reverse up to five recent actions. Alternatively, if you made a backup copy of your project prior to auto coding, then you can restore the backup copy.

Here is an approach you may want to take to reviewing the results:

  1. Review what has been coded. Double click a cell in the matrix to see the content that was coded at the intersection of the file and node. Is the content relevant to that node? Take a look at other cells in the matrix.
  2. Decide whether you are satisfied with the results—do you want to keep some of the coding or undo the entire auto coding operation?
    • If you are mostly satisfied with the results, but need to fine-tune some of the auto coding, you may want to uncode some of the references. The coding reference is still displayed in Detail View for the cell, even if you have uncoded it. If you want to view the coding references resulting from pattern-based coding—excluding uncoded references—refer to How can I identify pattern-based coding references?
    • If you are not satisfied with the overall results, you may want to 'undo' the auto coding completely. You may want to refer to Strategies for auto coding based on existing coding patterns before trying again.

Why am I getting unexpected results from pattern-based coding?

Pattern-based auto coding uses machine-learning algorithms to look for existing coding patterns in nodes you have previously coded to in your project. The coding patterns are then used to determine further coding. You may get unexpected results because the algorithms can be influenced by:

  • The quality of the coding in your project (irrelevant content may influence the results)
  • The amount of coding in your project
  • The presence of advertising in your files. If you are working with web pages, capture only the main content on the page before importing into your project.
  • Words with multiple meanings—the meaning of a word can vary depending on the context
  • The uniqueness of the words in the nodes—if your nodes predominantly contain the same words, then it is more difficult for the algorithms to identify patterns
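To get a feel for the last point, you can think of each node as a set of words and measure how much two nodes overlap. The Python sketch below uses the Jaccard index for this. The measure and the word sets are illustrative assumptions, not a description of how NVivo assesses nodes.

```python
def jaccard(a, b):
    """Shared words divided by all distinct words across both sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Hypothetical vocabulary of two thematic nodes.
node_words = {
    "parks": {"park", "trees", "playground", "funding"},
    "recycling": {"bins", "recycling", "waste", "funding"},
}
overlap = jaccard(node_words["parks"], node_words["recycling"])
print(round(overlap, 2))
```

A value close to 1 would mean the two nodes contain nearly the same words, making them hard to tell apart by wording alone.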

How can I identify pattern-based coding references?

Coding references that were created by the Wizard based on coding patterns are associated with the user profile 'NVivo' with the initials 'NV'.

If you have performed multiple pattern-based coding operations, you will not be able to distinguish which references were created by a particular pattern-coding operation. To see the references from a particular operation, you can view the specific node matrix in the Node Matrices folder.

You can run a matrix coding query to display the coding references currently associated with the user 'NVivo'. For example:

  • If you have used pattern-based coding on your files, then display the files in rows and display the nodes you coded at in columns. On the Columns tab, when choosing your columns, only display coding by the user 'NVivo'.
  • If you have used pattern-based coding operations to 'code on' from individual question nodes, then display the question nodes in rows and display the nodes you coded at in columns. On the Columns tab, when choosing your columns, only display coding by the user 'NVivo'.
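Conceptually, restricting the query to the user 'NVivo' is a filter applied before the references are counted. The Python sketch below illustrates this with hypothetical records. The field names and the second user's initials are assumptions, and NVivo itself performs this through the query dialog rather than through code.

```python
from collections import Counter

# Hypothetical coding references with the initials of the coding user.
references = [
    {"file": "Interview 11", "node": "housing", "user": "NV"},
    {"file": "Interview 11", "node": "housing", "user": "AB"},
    {"file": "Interview 12", "node": "parks", "user": "NV"},
]

# Keep only references created by pattern-based auto coding (initials 'NV').
auto = [r for r in references if r["user"] == "NV"]

# Count the remaining references per (file, node) cell.
cells = Counter((r["file"], r["node"]) for r in auto)
print(dict(cells))
```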

Other ways that you can identify pattern-based coding references:

  • In a file or node, display coding stripes. You can use sub-stripes to see the users who did the coding.
  • In a node, filter the content to only display coding associated with the user 'NVivo'.
  • Run a predefined report—for example, the Coding Summary by Code or Coding Summary by File—and filter it on the user who did the coding.