Coding comparison query
The coding comparison query compares coding by two users to measure 'inter-rater reliability'—the degree of coding agreement between them. Agreement is measured by two statistical methods:
- Percentage agreement: the number of content units on which coders agree (to code or not to code), divided by the total number of units—reported as a percentage.
- Cohen's kappa coefficient: a statistical measure that takes into account the amount of agreement expected by chance—expressed as a decimal in the range –1 to 1 (where values ≤ 0 indicate no agreement, and 1 indicates perfect agreement).
Comparisons are typically made between individual coders, although you can also compare coding between groups of users. If comparing groups, content is treated as coded if at least one member of the group coded it. Coders are identified by their NVivo user profiles (see User profiles).
Both text coding and region coding can be compared. These are treated separately, producing separate results:
- All coding in documents, datasets, externals, memos, codes and cases is text coding.
- All coding in picture, audio and video files is region coding, using pixel ranges (rectangles defined by their top-left and bottom-right corners) or timespans. Note:
- If you code transcript text in an audio or video file, the timespan of the entire transcript row is treated as having been coded. So, for example, two coders who code different sentences in a row will be reported as in agreement for the entire row timespan.
- If you code text in the description/notes of a picture file, all the pixels in the picture are treated as having been coded.
- PDFs can have both region (pixel range) and text coding.
Text coding uses text characters as the unit of comparison.
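As an illustration of character-level comparison, here is a minimal Python sketch (not NVivo's implementation; the function names and the (start, end) range representation are assumptions) that counts the characters coded by both users, by neither, and by only one:

```python
# Hypothetical sketch: character-level agreement between two coders,
# assuming each coder's work is a list of (start, end) character ranges
# (end exclusive) within a document of known length.

def coded_chars(ranges, doc_len):
    """Return the set of character positions covered by the given ranges."""
    coded = set()
    for start, end in ranges:
        coded.update(range(max(0, start), min(end, doc_len)))
    return coded

def agreement_counts(ranges_a, ranges_b, doc_len):
    """Count characters coded by both coders, by neither, or by only one."""
    a = coded_chars(ranges_a, doc_len)
    b = coded_chars(ranges_b, doc_len)
    both = len(a & b)
    neither = doc_len - len(a | b)
    only_one = doc_len - both - neither
    return both, neither, only_one

# Example: a 1000-character document with partially overlapping coding
both, neither, only_one = agreement_counts([(100, 150)], [(120, 170)], 1000)
print(both, neither, only_one)  # 30 930 40
```

These three counts are the inputs to both the percentage agreement and the kappa calculations described below.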
You can save query configuration settings to run the same query at a later time, when more coding has been done. Queries are saved under Search / Queries in the Navigation View.
You cannot save query results in NVivo; however, you can export them to other applications (e.g. Excel) to save (see Export query results).
Run a coding comparison
- On the Explore tab, select Coding Comparison. The Coding Comparison Query dialog box opens.
- For User Group A and User Group B, select the users whose coding you want to compare (comparing individual coders is recommended). If you include more than one user in a group, coding carried out by any user in the group counts towards the comparison.
- Use the At field to select the nodes and/or cases you want to compare coding for:
- All Nodes: coding to all nodes and cases in the selected files
- Selected Nodes: coding to selected nodes and/or cases in the selected files
- Codes and Cases in Selected Static Sets: all the nodes and cases included in selected sets
- Cases Assigned to Selected Classifications: all the cases with selected case classifications
- Codes and Cases in Selected Search Folders: all the nodes and cases included in selected dynamic sets
NOTE: If you include an aggregate node in the scope of a query, content coded to it or its direct children is included in the results (see Aggregate nodes (gather all content in a parent node)).
- In the Scope field select the files in which you want to compare the coding:
- Files & Externals: all data files and externals, but not memos
- Selected Items: selected data files, externals and/or memos
- Items in Selected Folders: all data files, externals and/or memos in selected folders
- Files, Externals & Memos in Selected Sets: all data files, externals and memos included in selected sets
- Files Assigned to Selected Classifications: all data files, externals and memos with selected file classifications
- Files, Externals & Memos in Selected Search Folders: all the data files, externals and memos included in selected search folders
- Select Display kappa coefficient and/or percentage agreement to include in the results (you must select at least one).
- Select Text coding and/or Region coding for the types of coding you want to compare:
- All coding in documents, externals, memos, nodes and cases is text coding
- All coding in picture, audio and video files is region coding, using pixel ranges or timespans
- PDFs can have both region (pixel range) and text coding.
- To save the query settings, check Add to project at the top of the dialog box. Name the query and optionally provide a description.
- Click Run.
Query results are displayed in the Detail View (see below).
Results for text and region coding are shown on different tabs in the Detail View.
Each row shows data for one node or case in one file. Results for single nodes or cases across all files in the query are not shown, nor is an overall value for all nodes or cases and files. You can calculate these yourself by exporting the results data (see Calculating across multiple files and/or nodes).
To view the content that a row in the results table refers to, right-click in the row and select Open Node/Sentiment/Relationship/Case or Open File.
1 The node, sentiment, relationship or case being compared.
2 The name of the file, and its location, where the node or case was coded.
3 The file size, measured as follows:
- Documents, datasets, memos and externals = number of characters
- PDFs = number of pages and number of characters
- Media file = duration in minutes/seconds/tenths of seconds
- Picture = the total number of pixels expressed as height multiplied by width
4 The kappa coefficient—shown only if you selected Display Kappa Coefficient.
5 The green columns show percentage agreement (shown only if you selected Display percentage agreement):
- Agreement = the sum of the 'A and B' and 'Not A and Not B' columns
- A and B = the percentage of content coded to the selected node by both Group A and Group B
- Not A and Not B = the percentage of content coded by neither Group A nor Group B
6 The red columns show percentage disagreement (shown only if you selected Display percentage agreement):
- Disagreement = the sum of the 'A and Not B' and 'B and Not A' columns
- A and Not B = the percentage of content coded by Group A and not by Group B
- B and Not A = the percentage of content coded by Group B and not by Group A
How is percentage agreement calculated?
NVivo calculates percentage agreement for each combination of node or case and file.
Percentage agreement is the percentage of file content (measured in characters for text coding, and pixels or timespans for region coding) on which the two users agree.
For example, in a document with 1000 characters where:
- 800 characters have not been coded by either user
- 50 characters have been coded by both users, and
- 150 characters have been coded by only one user
the percentage agreement is (800 + 50) ÷ 1000 = 85%, because both users 'agree' about 850 of the characters.
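The worked example above can be reproduced with a short Python sketch (illustrative only; the function and parameter names are assumptions, not NVivo internals):

```python
# Sketch of the percentage-agreement calculation from the worked example.
# both    = units coded by both users
# neither = units coded by neither user
# total   = total units (characters) in the file

def percentage_agreement(both, neither, total):
    """Percentage of units on which the two coders agree."""
    return 100 * (both + neither) / total

print(percentage_agreement(both=50, neither=800, total=1000))  # 85.0
```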
How is the Cohen kappa coefficient calculated?
Cohen’s kappa is widely used to quantify the level of inter-rater agreement between two raters (i.e. coders). The formula calculates the agreement between two coders and then adjusts for agreement that would happen by chance.
The formula is: κ = (P0 – Pe) / (1 – Pe)
where P0 is the amount of agreement between two coders (equivalent to the 'percentage agreement' calculated by NVivo) and Pe is the probability of chance agreement.
The formula can be illustrated by the following table, where:
- Pyy is the proportion of content that both coders assigned to a node
- Pyn is the proportion that coder 1 assigned to the node and coder 2 did not
- Pny is the proportion that coder 2 assigned to the node and coder 1 did not
- Pnn is the proportion that neither coder coded to the node.
The sum of these proportions is 1: Pyy + Pyn + Pny + Pnn = 1
| | Coder B: assigned node | Coder B: did not assign node |
| --- | --- | --- |
| Coder A: assigned node | Pyy | Pyn |
| Coder A: did not assign node | Pny | Pnn |
The observed agreement, P0 is: Pyy + Pnn
The probability of chance agreement, Pe is: (Pyy + Pyn) × (Pyy + Pny) + (Pny + Pnn) × (Pyn + Pnn)
Two coders coded 100 excerpts in the proportions shown in the table:
| | Coder B: assigned node | Coder B: did not assign node |
| --- | --- | --- |
| Coder A: assigned node | Pyy = 40/100 = 0.4 | Pyn = 20/100 = 0.2 |
| Coder A: did not assign node | Pny = 10/100 = 0.1 | Pnn = 30/100 = 0.3 |
The observed agreement P0 is: Pyy + Pnn = 0.4 + 0.3 = 0.7
(Both coders coded the same 40 excerpts and didn't code the same 30 excerpts, so they agree on 70 of the 100 excerpts.)
The probability of chance agreement is:
Pe = (Pyy + Pyn) × (Pyy + Pny) + (Pny + Pnn) × (Pyn + Pnn)
= (0.4 + 0.2) × (0.4 + 0.1) + (0.1 + 0.3) × (0.2 + 0.3)
= (0.6 × 0.5) + (0.4 × 0.5)
= 0.3 + 0.2
= 0.5
Inserting the values for P0 and Pe in the formula:
κ = (P0 – Pe) / (1 – Pe)
= (0.7 – 0.5) / (1 – 0.5)
= 0.2 / 0.5
= 0.4
The kappa coefficient for this example is 0.4, indicating a 'fair' or 'moderate' degree of inter-coder agreement (see next section).
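The same calculation can be sketched in Python (illustrative only, not NVivo's code; the four cell proportions follow the 2×2 table above):

```python
# Sketch of Cohen's kappa from the 2x2 proportions table.
# p_yy, p_yn, p_ny, p_nn are the four cell proportions (they sum to 1).

def cohens_kappa(p_yy, p_yn, p_ny, p_nn):
    p0 = p_yy + p_nn                        # observed agreement
    pe = ((p_yy + p_yn) * (p_yy + p_ny)     # chance agreement
          + (p_ny + p_nn) * (p_yn + p_nn))
    return (p0 - pe) / (1 - pe)

# Values from the worked example above
print(round(cohens_kappa(0.4, 0.2, 0.1, 0.3), 4))  # 0.4
```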
You can see further examples of kappa coefficient calculations by downloading this Excel spreadsheet: Coding Comparison Calculation Examples spreadsheet.
Interpreting kappa coefficients
If two users are in complete agreement about which content to code in a file, then the kappa coefficient is 1. If there is no agreement other than what could be expected by chance, the kappa coefficient is ≤ 0. Values between 0 and 1 indicate partial agreement.
Different authors have suggested different guidelines for interpreting kappa values, for example (from Xie, 2013):
| κ (Landis & Koch, 1977) | Strength of agreement | κ (Altman, 1991) | Strength of agreement | κ (Fleiss et al., 2003) | Strength of agreement |
| --- | --- | --- | --- | --- | --- |
| 0.81–1.00 | excellent | 0.81–1.00 | very good | 0.75–1.00 | very good |
| 0.61–0.80 | substantial | 0.61–0.80 | good | 0.41–0.75 | fair to good |
Kappa vs. percent agreement
Kappa values can be low when percentage agreement is high. For example, if two users code different small sections of a file leaving most content uncoded, the percentage agreement is high, because there is high agreement on content that should not be coded. But this situation is likely to occur by chance (i.e. if the coding was random), and so the kappa coefficient is low.
Conversely, if most of a file is not coded but there is agreement on the content that is coded, then percentage agreement is again high, but now the kappa value, too, is high, because this situation is unlikely to occur by chance.
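A quick Python sketch (illustrative only; the proportions are invented for the demonstration) makes the first scenario concrete: two coders who each code a different 2% of a file agree on 96% of its content, yet kappa is slightly negative:

```python
# Illustration: high percentage agreement can coexist with low kappa.

def kappa(p_yy, p_yn, p_ny, p_nn):
    p0 = p_yy + p_nn
    pe = (p_yy + p_yn) * (p_yy + p_ny) + (p_ny + p_nn) * (p_yn + p_nn)
    return (p0 - pe) / (1 - pe)

# Coder A codes 2%, Coder B codes a different 2%, 96% is coded by neither.
p_yy, p_yn, p_ny, p_nn = 0.00, 0.02, 0.02, 0.96
print(f"agreement = {(p_yy + p_nn):.0%}")                  # 96%
print(f"kappa     = {kappa(p_yy, p_yn, p_ny, p_nn):.3f}")  # slightly negative
```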
All kappa coefficients are 0 or 1
If all the kappa values in a query are 0 or 1 it may indicate that one of the two users being compared has not coded any of the selected files to the selected nodes, i.e. you may have selected the wrong files, codes, or coders for the query.
If one user’s work has been imported from another project, it may indicate that their coding was not imported. When merging projects with the intention of running coding comparisons, ensure that all documents and codes in the projects (including coding structures) match properly. When configuring the import:
- select to import 'All' project items, or 'Selected (including content)' with 'Coding' selected.
- select 'Merge into existing item' for duplicate items.
McDonald, N., Schoenebeck, S., & Forte, A. (2019). Reliability and inter-rater reliability in qualitative research: Norms and guidelines for CSCW and HCI practice. Proc. ACM Hum.-Comput. Interact., 3(CSCW), Article 72, 23 pages.
McHugh, M. L. (2014). Interrater reliability: The kappa statistic. In X. Lin, C. Genest, D. L. Banks, G. Molenberghs, D. W. Scott, & J.-L. Wang (Eds.), Past, present, and future of statistical science (pp. 359–372). Chapman and Hall/CRC. https://doi.org/10.1201/b16720-37
The New Stack. (2020). Cohen’s kappa: What it is, when to use it, and how to avoid its pitfalls. https://thenewstack.io/cohens-kappa-what-it-is-when-to-use-it-and-how-to-avoid-its-pitfalls/
Vries, H. de, Elliott, M. N., Kanouse, D. E., & Teleki, S. S. (2008). Using pooled kappa to summarize interrater agreement across many items. Field Methods, 20(3), 272–282. https://doi.org/10.1177/1525822X08317166
Xie, Q. (2013). Agree or disagree? A demonstration of an alternative statistic to Cohen’s kappa for measuring the extent and reliability of agreement between observers. Conference presentation at the Federal Committee on Statistical Methodology Research Conference, Washington, DC, November 4–6. https://nces.ed.gov/FCSM/pdf/J4_Xie_2013FCSM.pdf
Calculating across multiple files and/or nodes
NVivo calculates percentage agreement and kappa coefficients for each combination of node or case and file. It does not calculate values for a single node or case across all the files in a query's scope, nor overall values for all the nodes/cases in all the files. To get these values, export the results into another application, such as Excel, and calculate them yourself (see Export query results).
Before calculating the additional values, decide how you want to weight the files: either treat each file equally, or weight each file according to the amount of codable content it contains.
To help understand the calculations, download the Coding Comparison Calculation Examples spreadsheet, which has four worked examples using spreadsheet formulas:
- Average figures for a single node across 3 files (weighting each file equally)
- Average figures for a single node across 3 files (weighting each file according to its size)
- Average figures for 5 nodes across 3 files (weighting each file equally)
- Average figures for 5 nodes across 3 files (weighting each file according to its size)
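The two weighting approaches can be sketched in Python (the data and names here are illustrative; the spreadsheet performs equivalent calculations with spreadsheet formulas):

```python
# Averaging per-file percentage agreement for one node across three files.
# File sizes are in characters; agreement values are per-file percentages.

files = [
    {"size": 1000, "agreement": 85.0},
    {"size": 500,  "agreement": 90.0},
    {"size": 2500, "agreement": 70.0},
]

# Equal weighting: a simple mean of the per-file percentages.
equal = sum(f["agreement"] for f in files) / len(files)

# Size weighting: each file contributes in proportion to its codable content.
total = sum(f["size"] for f in files)
weighted = sum(f["agreement"] * f["size"] / total for f in files)

print(f"{equal:.2f}% (equal), {weighted:.2f}% (size-weighted)")
# 81.67% (equal), 76.25% (size-weighted)
```

Note how the large, low-agreement file pulls the size-weighted average down relative to the equal-weighted one; the same choice applies when averaging kappa coefficients.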