(This feature is only available in NVivo installations with coding enhancements enabled.)

Automatic coding using existing coding patterns

Automatic coding based on existing coding patterns is an experimental feature that you can try out. It is designed to speed up the coding process for large volumes of textual content.

Understand autocoding using existing coding patterns

Pattern-based autocoding enables you to do approximate 'broad-brush' coding of large volumes of text quickly, which you can then review and refine.

Before you use pattern-based autocoding, you need to start with manual 'pilot' coding of your files. For example, if your research involves analyzing 100 interviews, you could manually code the first 10 interviews. Then, you could autocode the remaining interviews based on the coding patterns from your initial coding.

When you autocode using existing patterns, NVivo compares each text passage (for example, a sentence or paragraph) to the content already coded to existing codes. If the wording of the text passage is similar to content already coded to a code, then the text passage is coded to that code.

During pattern-based autocoding, the words in the content to be coded are compared with the words in content already coded to existing codes. Stop words are ignored in this comparison, and words with the same stem (for example, house, houses and housing) are grouped together. Because stop words and stems depend on the language, make sure your project's text content language is set to the language of the sources you are analyzing.
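The comparison described above can be pictured with a toy sketch. This topic does not document NVivo's actual algorithm, so everything below is an assumption made for illustration: the stop-word list, the `crude_stem` suffix rule, and the overlap score are hypothetical stand-ins, not NVivo's implementation.

```python
import re

# Hypothetical stop-word list; a real list depends on the text content language.
STOP_WORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to", "too"}


def crude_stem(word):
    """Strip a few common English suffixes (a toy rule, not a real stemmer)."""
    for suffix in ("ing", "es", "s", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stems(text):
    """Lower-case the text, drop stop words, and return the set of stems."""
    words = re.findall(r"[a-z]+", text.lower())
    return {crude_stem(w) for w in words if w not in STOP_WORDS}


def similarity(passage, coded_text):
    """Share of the passage's stems that also occur in the coded text (0.0 to 1.0)."""
    passage_stems = stems(passage)
    if not passage_stems:
        return 0.0
    return len(passage_stems & stems(coded_text)) / len(passage_stems)


# Content assumed to be already coded to a 'housing' code during pilot coding.
housing_code = "Housing is too expensive"

print(stems("house houses housing"))
print(similarity("The houses are expensive", housing_code))
print(similarity("The bus is late", housing_code))
```

In this sketch, house, houses and housing all reduce to the same stem, so a passage about 'houses' scores highly against content coded to a housing code, while a passage about buses scores zero.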

When comparing the text passages to content coded to existing codes, any earlier pattern-based coding is ignored, in order to preserve the quality of the coding patterns.

Coding references that have been autocoded based on existing coding patterns are associated with the user profile 'NVivo' with the initials 'NV'.

Figure: Process of pattern-based autocoding via the Autocode Wizard.

NOTE

NVivo (Release 1) also takes the following into account when determining coding patterns:

  • Uncoded content—make sure that you have finished 'pilot' coding the files that you select to base the coding patterns on before running pattern-based coding.
  • Word pairs—for example 'water quality' or 'climate change'.
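As a rough picture of what a word pair is, the sketch below lists the adjacent words in a passage. It is only an illustration: how NVivo selects and weights word pairs is not described in this topic, and the `word_pairs` helper is hypothetical.

```python
import re


def word_pairs(text):
    """Return each pair of adjacent words in the text, in order."""
    words = re.findall(r"[a-z]+", text.lower())
    return list(zip(words, words[1:]))


print(word_pairs("Water quality affects climate change"))
```

Treating 'water quality' as one unit keeps its meaning distinct from passages that merely mention 'water' and 'quality' separately.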

When can pattern-based coding be useful?

We expect pattern-based autocoding to be most useful when coding to descriptive codes. For example, it may be able to identify that paragraph 1 is about housing and paragraph 2 is about public transport.

You may also be able to use pattern-based coding to identify passages that mention particular people (for example, politicians or leaders), places or organizations that are important to your research.

We do not recommend using pattern-based autocoding to code to codes that represent:

  • Sentiment (for example, positive or negative). Use Sentiment Analysis instead.
  • Attitudes, tones, or emotions (for example, enthusiastic, sarcastic, or happy).
  • Interpretations of the data (for example, 'distance to school increases the likelihood of truancy').
  • The speaker in an interview transcript. Pattern-based autocoding cannot accurately identify who was speaking.

Strategies for autocoding based on existing coding patterns

Before you use pattern-based autocoding, you need to start with manual 'pilot' coding of your files. Here are some strategies to consider when pilot coding:

  • The quality of the pilot coding will influence the quality of the results of autocoding.
  • If you do too little pilot coding, or your pilot coding is too narrowly focused, you may get poor results (or no results).
  • Coding smaller passages—for example, a sentence—may achieve better results than coding larger passages.
  • You may achieve better results by coding only the text that is relevant to a particular theme. For example, if a paragraph contains two ideas, only code the part that is relevant to the theme.

If your project involves capturing datasets iteratively, you can pilot code the initial dataset. For example, if you capture Twitter data at multiple intervals or gather responses from SurveyMonkey with multiple collectors, then you can manually pilot code the initial data and then autocode the subsequent data.

If your files contain responses to questions on a range of topics or issues, you may get better results with pattern-based coding if you autocode the responses to one question at a time using specific thematic codes that relate to that question. For example, if you have a dataset containing 1000 responses to a survey about public policy, you could:

  1. Gather the responses into a code for each question. You can do this by autocoding a dataset at codes for selected columns.
  2. Open the code for a particular question—for example Views on council funding—and for the first 50 responses, manually 'code on' to a group of thematic codes (animals, libraries, parks, public health, recycling).
  3. Use pattern-based coding to autocode the question code Views on council funding to the specific thematic codes for that question (animals, libraries, parks, public health, recycling).

The selections you make in the Autocode Wizard can have an impact on the quality of the results.

  • Be selective about the codes you choose to autocode to—for example, only use certain thematic codes. Pattern-based autocoding works best with thematic codes rather than relationships.
  • Experiment with the slider. If you choose 'Less', then NVivo applies stricter criteria when deciding whether to code the content.
  • Review the list of terms that are used to assess coding similarity. Having a targeted list of terms can improve the accuracy of pattern-based coding.

Review the results of autocoding. If you are not satisfied with the results, 'undo' the Autocode action, or revert to a backup copy of your project. Consider doing more pilot coding or adjusting the slider in the Autocode Wizard next time you autocode.

Autocode using existing coding patterns

IMPORTANT

  • Autocoding using existing coding patterns can perform a large amount of coding very quickly. It is a good idea to make a backup copy of the project before you start.
  • If you are working in an NVivo Collaboration Server project, you may want to open the project exclusively before you autocode. This ensures that you can 'undo' the autocode if you are not satisfied with the results.

  1. In List View, select the items you want to autocode. You can select files, codes or cases. Files do not have to be of the same file type. If you want to select items from different folders, you can use a set.
  2. On the Home tab, in the Coding group, click Autocode.
  3. Follow the steps in the Autocode Wizard.

Each Wizard step is listed below, followed by its description.

Choose how you would like to autocode

Click Use existing coding patterns.

Automatically code text passages

Choose the codes to code to, and the already-coded files to base the coding patterns on. Click Select and choose the relevant project items for each of the following:

  • Select the codes or relationships that you would like to code to
  • Select coded files that you want to base the coding patterns on—these files should contain the manual 'pilot' coding

Adjust the slider to influence how much coding NVivo creates:

  • Less: NVivo uses a higher similarity threshold when comparing each text passage to each code, so fewer passages are coded.
  • More: NVivo uses a lower similarity threshold when comparing each text passage to each code, so more passages are coded.
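The effect of the slider can be pictured as a cut-off applied to a similarity score. The scores, threshold values, and function name below are invented for illustration; the real scores and thresholds that NVivo uses are not given in this topic.

```python
def passages_to_code(scores, threshold):
    """Return the passages whose similarity score meets the threshold."""
    return [passage for passage, score in scores.items() if score >= threshold]


# Hypothetical similarity scores for three passages compared against one code.
scores = {"passage 1": 0.9, "passage 2": 0.6, "passage 3": 0.3}

less = passages_to_code(scores, threshold=0.8)  # 'Less': stricter, fewer references
more = passages_to_code(scores, threshold=0.5)  # 'More': looser, more references
print(less)
print(more)
```

With the higher threshold only the closest match is coded; lowering the threshold also codes the borderline passage, which is why a 'More' setting usually needs more review afterwards.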

Checking existing coding patterns

NVivo checks the existing coding patterns in the codes you have selected, to determine whether they are suitable for pattern-based coding.

Once the check is complete, you will receive feedback on the suitability of your codes. If any issues are detected, NVivo displays a warning message—for example, to indicate that there is insufficient coding to a code.

You can click the Expand buttons to view the codes for each message. Codes with warnings are excluded from autocoding because of the issues detected. However, if you still want to code to one of these codes, you can select it again in this step of the Wizard.

Select how your text passages will be coded

Choose how finely NVivo should code text passages:

  • Code sentences if you want individual sentences to be coded.
  • Code paragraphs if you want entire paragraphs to be coded.
  • Code entire cell for datasets, transcripts and logs if you want entire cells to be coded for datasets, transcripts and picture logs. For other file types, entire paragraphs are coded.
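The difference between coding sentences and coding paragraphs comes down to how the text is divided into passages before each passage is compared to the codes. The following is a minimal sketch, assuming that paragraphs are separated by blank lines and that sentences end with '.', '!' or '?'. NVivo's own passage detection is not described in this topic, and `split_passages` is a hypothetical helper.

```python
import re


def split_passages(text, unit):
    """Split text into passages at the chosen granularity."""
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    if unit == "paragraph":
        return paragraphs
    if unit == "sentence":
        sentences = []
        for paragraph in paragraphs:
            # Break after sentence-ending punctuation followed by whitespace.
            sentences.extend(s for s in re.split(r"(?<=[.!?])\s+", paragraph) if s)
        return sentences
    raise ValueError("unit must be 'paragraph' or 'sentence'")


text = "Rents are rising. Buses are late.\n\nThe library closed."
print(split_passages(text, "paragraph"))
print(split_passages(text, "sentence"))
```

Here the first paragraph mixes two topics, so coding by sentence lets each sentence go to a different code, while coding by paragraph treats both sentences as one passage.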

NOTE

  • If your text content language is Chinese or Japanese, you will not be able to select Code sentences.
  • By default, the results are saved as a coding matrix in the Coding Matrices folder. You can clear the Save autocode results in the Coding Matrices folder check box. However, if you clear this check box, the results will be displayed as a temporary coding matrix that you will not be able to save.

Working with the results of autocoding

When you autocode using existing coding patterns, the results are displayed in Detail View, and by default, the results are saved as a coding matrix in the Coding Matrices folder. You can view the saved coding matrix later if you want a record of the coding performed by the Wizard at a particular date and time. This coding matrix is a static record that is not updated if you subsequently uncode some of the content.

Figure: Matrix displaying the results of pattern-based autocoding.

1 Columns display the names of the codes that have been coded to by the Wizard.

2 Rows display the files that have been coded by the Wizard.

3 Cells display the number of coding references that were created for a file (row) at a code (column). You can change the display. For example, to transpose the columns and rows, click Transpose in the View group on the Matrix tab.

4 Click the Chart tab to see a visual representation of the autocoding results.
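The layout of the results matrix can be sketched as a table of counts. The file names, code names, and helper functions below are made up for illustration and do not come from NVivo.

```python
from collections import Counter


def build_matrix(references, files, codes):
    """Count coding references per file (row) and code (column)."""
    counts = Counter(references)
    return [[counts[(f, c)] for c in codes] for f in files]


def transpose(matrix):
    """Swap rows and columns, so codes become rows and files become columns."""
    return [list(row) for row in zip(*matrix)]


# Hypothetical (file, code) pairs, one per coding reference created.
references = [
    ("Interview 11", "housing"),
    ("Interview 11", "housing"),
    ("Interview 11", "public transport"),
    ("Interview 12", "public transport"),
]
files = ["Interview 11", "Interview 12"]
codes = ["housing", "public transport"]

matrix = build_matrix(references, files, codes)
print(matrix)
print(transpose(matrix))
```

Each cell holds the number of references for one file and one code; transposing changes only the orientation, not the counts.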

Reviewing and fine-tuning autocoding

It is a good idea to review the coding to check the relevance of coding references. Pattern-based coding is a complex task—for example, the meaning of a word varies depending on the context in which it appears, so you may see coding references that are unrelated to the code.

You may want to confirm that you are satisfied with the autocoding before performing other actions in your project, so that you can undo the autocoding if you need to. The 'undo' function can reverse up to five recent actions. Alternatively, if you made a backup copy of your project prior to autocoding, then you can restore the backup copy.

Here is an approach you may want to take to reviewing the results:

  1. Review what has been coded. Double-click a cell in the matrix to see the content that was coded at the intersection of the file and code. Is the content relevant to that code? Take a look at other cells in the matrix.
  2. Decide whether you are satisfied with the results—do you want to keep some of the coding or undo the entire autocoding operation?
    • If you are mostly satisfied with the results, but need to fine-tune some of the autocoding, you may want to uncode some of the references. The coding reference is still displayed in Detail View for the cell, even if you have uncoded it. If you want to view the coding references resulting from pattern-based coding—excluding uncoded references—refer to How can I identify pattern-based coding references?
    • If you are not satisfied with the overall results, you may want to 'undo' the autocoding completely. You may want to refer to Strategies for autocoding based on existing coding patterns before trying again.

Why am I getting unexpected results from pattern-based coding?

Pattern-based autocoding uses machine-learning algorithms to look for existing coding patterns in codes you have previously coded to in your project. The coding patterns are then used to determine further coding. You may get unexpected results because the algorithms can be influenced by:

  • The quality of the coding in your project (irrelevant content may influence the results)
  • The amount of coding in your project
  • The presence of advertising in your files. If you are working with web pages, capture only the main content on the page before importing into your project.
  • Words with multiple meanings—the meaning of a word can vary depending on the context
  • The uniqueness of the words in the codes—if your codes predominantly contain the same words, then it is more difficult for the algorithms to identify patterns

How can I identify pattern-based coding references?

Coding references that were created by the Wizard based on coding patterns are associated with the user profile 'NVivo' with the initials 'NV'.

If you have performed multiple pattern-based coding operations, you will not be able to distinguish which references were created by a particular pattern-coding operation. To see the references from a particular operation, you can view the specific coding matrix in the Coding Matrices folder.

You can run a matrix coding query to display the coding references currently associated with the user 'NVivo'. For example:

  • If you have used pattern-based coding on your files, then display the files in rows and display the codes you coded to in columns. On the Columns tab, when choosing your columns, only display coding by the user 'NVivo'.
  • If you have used pattern-based coding operations to 'code on' from individual question codes, then display the question codes in rows and display the codes you coded to in columns. On the Columns tab, when choosing your columns, only display coding by the user 'NVivo'.
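Conceptually, these queries keep only the coding references whose user is 'NVivo'. The sketch below uses an invented record layout (file, code, user initials) to show the idea; it is not an NVivo API or export format.

```python
from collections import Counter

# Hypothetical coding references as (file, code, user initials) records.
references = [
    ("Interview 01", "housing", "AB"),
    ("Interview 11", "housing", "NV"),
    ("Interview 11", "public transport", "NV"),
    ("Interview 12", "housing", "NV"),
]


def pattern_based_only(refs, initials="NV"):
    """Keep references created by the given user (by default the 'NV' profile)."""
    return [ref for ref in refs if ref[2] == initials]


autocoded = pattern_based_only(references)
per_code = Counter(code for _file, code, _user in autocoded)
print(per_code)
```

Because the filter works on the references that currently exist, anything you have uncoded since autocoding drops out of the result, unlike the static coding matrix saved by the Wizard.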

Other ways that you can identify pattern-based coding references:

  • In a file or code, display coding stripes. You can use sub-stripes to see the users who did the coding.
  • In a code, filter the content to only display coding associated with the user 'NVivo'.
  • Run a predefined report—for example, the Coding Summary by Code or Coding Summary by File—and filter it on the user who did the coding.