How are cluster analysis diagrams generated?

To measure the similarity between each pair of items that will appear in a cluster diagram, NVivo first builds a table where:

  • The rows are the files, nodes or words that will appear in the diagram.
  • The columns and cells depend on which characteristic you’ve chosen to cluster by.
| Table rows | Clustered by | Table columns | Table cells |
| --- | --- | --- | --- |
| Sources | Word similarity | Each different word that appears in the text of the files | The number of times the column’s word appears in the row’s file |
| Sources | Coding similarity | Each node that codes the files’ content | 1 if the column’s node codes the row’s file, 0 otherwise |
| Sources | Attribute value similarity | Each different attribute value of the files (e.g. Book:Year = 2010) | 1 if the row’s file has the column’s attribute value, 0 otherwise |
| Nodes | Word similarity | Each different word that appears in the text of the nodes | The number of times the column’s word appears in the row’s node |
| Nodes | Coding similarity | Each file coded by the row’s node | 1 if the column’s file is coded by the row’s node, 0 otherwise |
| Nodes | Attribute value similarity | Each different attribute value of the nodes (e.g. Person:Sex = Female) | 1 if the row’s node has the column’s attribute value, 0 otherwise |
| Words (top 100 words in Word Frequency query results) | N/A | Each file or node that the query searches in | The number of times the row’s word appears in the column’s file or node |
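As an illustration of the first table row (files clustered by word similarity), the following minimal Python sketch builds a word-count table from a few short texts. The function name `build_word_table`, the sample file names and the simple lowercase tokenization are assumptions made for this example. NVivo's own text processing (for instance, its handling of stop words or stemming) is not described here and may differ.

```python
import re
from collections import Counter


def build_word_table(files):
    """Return (vocabulary, rows); rows[name] holds one count per vocabulary word."""
    # Count words per file. Assumption: lowercase, letters and apostrophes only.
    counts = {
        name: Counter(re.findall(r"[a-z']+", text.lower()))
        for name, text in files.items()
    }
    # Table columns: each different word that appears in any file.
    vocabulary = sorted(set().union(*counts.values()))
    # Table cells: number of times the column's word appears in the row's file.
    rows = {name: [c[word] for word in vocabulary] for name, c in counts.items()}
    return vocabulary, rows


# Hypothetical sample data, for illustration only.
files = {
    "interview_1": "water quality and water supply",
    "interview_2": "water supply policy",
}
vocabulary, rows = build_word_table(files)
for name, row in rows.items():
    print(name, dict(zip(vocabulary, row)))
```

For coding or attribute value similarity, the cells would hold 1 or 0 instead of counts, as listed in the table above.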

NVivo then calculates a similarity index between each pair of items (each pair of rows in the table) using the similarity metric you’ve selected, which is one of the following:

  • Pearson correlation coefficient (-1 = least similar, 1 = most similar). For more information, refer to the Wikipedia article Pearson product-moment correlation coefficient.
  • Jaccard’s coefficient (0 = least similar, 1 = most similar). For more information, refer to the Wikipedia article Jaccard index.
  • Sørensen’s coefficient (0 = least similar, 1 = most similar). For more information, refer to the Wikipedia article Sørensen similarity index.
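The three metrics can be sketched in plain Python using their textbook definitions, applied to two table rows of equal length. This is an illustrative sketch and not NVivo's code. Two choices in it are assumptions: any non-zero cell counts as "present" for Jaccard's and Sørensen's coefficients, and a row with no variation gets a Pearson value of 0.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient between two rows (-1 to 1)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # assumption: a constant row is treated as uncorrelated
    return cov / sqrt(var_x * var_y)


def jaccard(x, y):
    """Jaccard's coefficient: shared columns / columns present in either row."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    either = sum(1 for a, b in zip(x, y) if a or b)
    return both / either if either else 0.0


def sorensen(x, y):
    """Sorensen's coefficient: 2 * shared columns / (columns in x + columns in y)."""
    both = sum(1 for a, b in zip(x, y) if a and b)
    total = sum(1 for a in x if a) + sum(1 for b in y if b)
    return 2 * both / total if total else 0.0


# Two hypothetical table rows (for example, word counts for two files).
row_a = [1, 0, 1, 1, 2]
row_b = [0, 1, 0, 1, 1]
print(pearson(row_a, row_b), jaccard(row_a, row_b), sorensen(row_a, row_b))
```

Pearson uses the cell values themselves, so it suits count data such as word frequencies, while the other two only look at which columns two rows share.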

Forming clusters

Using the calculated similarity index between each pair of items, NVivo groups the items into a number of clusters (10 by default), using the complete linkage (farthest neighbor) hierarchical clustering algorithm. For more information, refer to the Wikipedia article Complete-linkage clustering.
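The grouping step can be illustrated with a small, unoptimized Python sketch of complete linkage clustering. It works directly on similarities: the similarity between two clusters is that of their least similar pair of members, which matches farthest-neighbor linkage on distances. The function name `complete_linkage` and the sample matrix are hypothetical, and NVivo's actual implementation may differ in details such as tie-breaking.

```python
def complete_linkage(similarity, n_clusters):
    """Group item indices into n_clusters using complete (farthest neighbor) linkage.

    similarity is a symmetric matrix (list of lists) of pairwise similarity indexes.
    """
    clusters = [[i] for i in range(len(similarity))]  # start with one item per cluster

    def cluster_similarity(a, b):
        # Complete linkage: two clusters are only as similar as their
        # least similar pair of members.
        return min(similarity[i][j] for i in a for j in b)

    while len(clusters) > max(n_clusters, 1):
        best = None
        for p in range(len(clusters)):
            for q in range(p + 1, len(clusters)):
                s = cluster_similarity(clusters[p], clusters[q])
                if best is None or s > best[0]:
                    best = (s, p, q)
        _, p, q = best
        clusters[p] = clusters[p] + clusters[q]  # merge the most similar pair
        del clusters[q]
    return clusters


# Hypothetical similarity indexes for four items.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
print(complete_linkage(sim, 2))
```

With this sample matrix, items 0 and 1 merge first, then items 2 and 3, leaving two clusters.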

Generating a dendrogram

By default, the results of the cluster analysis are displayed as a dendrogram, which is generated using the same complete linkage (farthest neighbor) hierarchical clustering technique that is used to form the clusters.
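A dendrogram records every merge, not only the final grouping. As a rough sketch, complete linkage merging can be continued until a single cluster remains, storing each merge as a nested tuple together with the similarity at which it happened. The function name `build_dendrogram`, the tuple layout and the sample data are assumptions for illustration; how NVivo stores or draws the tree is not described here.

```python
def build_dendrogram(labels, similarity):
    """Return a nested tuple tree of the form (left, right, merge_similarity)."""
    # Each active cluster is (subtree, member indices). Assumes at least one label.
    active = [(label, [i]) for i, label in enumerate(labels)]
    while len(active) > 1:
        best = None
        for p in range(len(active)):
            for q in range(p + 1, len(active)):
                # Complete linkage: similarity of the least similar pair of members.
                s = min(similarity[i][j] for i in active[p][1] for j in active[q][1])
                if best is None or s > best[0]:
                    best = (s, p, q)
        s, p, q = best
        merged = ((active[p][0], active[q][0], s), active[p][1] + active[q][1])
        del active[q]       # q > p, so index p is still valid afterwards
        active[p] = merged
    return active[0][0]


# Hypothetical labels and similarity indexes for four items.
labels = ["A", "B", "C", "D"]
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
tree = build_dendrogram(labels, sim)
print(tree)
```

Branches that join at a high similarity correspond to items that sit close together in the dendrogram, and cutting the tree at a chosen level yields the clusters.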

Generating a cluster map

The cluster analysis results can also be displayed as a 2D or 3D cluster map, where the items in the cluster analysis are represented as points in space.

The cluster map is generated using an iterative multidimensional scaling algorithm. Initially, the items are placed randomly as data points in a square (2D) or cube (3D), and then a series of iterations is performed to optimize the positions of the items. The optimal distance between each pair of items is defined as 1.1 minus the similarity index between the items, so more similar items have a smaller optimal distance. At each iteration, the actual distance between each pair of items is compared to the optimal distance between them, and the data points are moved closer together or farther apart accordingly. The algorithm ends when it reaches a configuration that cannot be improved by further movement of the data points.
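The iterative placement can be sketched in Python as follows. Only the random starting positions and the optimal distance of 1.1 minus the similarity index come from the description above. The update rule, the `step` size, the `tol` stopping threshold and the iteration cap are assumptions chosen to keep the example short, so this is one simple way to implement such a scheme and not necessarily what NVivo does.

```python
import random
from math import sqrt


def distance(a, b):
    """Euclidean distance between two points."""
    return sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def cluster_map(similarity, dims=2, iterations=500, step=0.05, tol=1e-6, seed=0):
    """Place items as points so distances approximate 1.1 - similarity."""
    rng = random.Random(seed)
    n = len(similarity)
    # Random starting positions inside the unit square (dims=2) or cube (dims=3).
    points = [[rng.random() for _ in range(dims)] for _ in range(n)]
    for _ in range(iterations):
        largest_move = 0.0
        for i in range(n):
            for j in range(n):
                if i == j:
                    continue
                delta = [points[i][d] - points[j][d] for d in range(dims)]
                actual = distance(points[i], points[j]) or 1e-9
                optimal = 1.1 - similarity[i][j]
                # Positive error: points too close, push i away from j.
                # Negative error: points too far apart, pull i toward j.
                error = optimal - actual
                for d in range(dims):
                    points[i][d] += step * error * delta[d] / actual
                largest_move = max(largest_move, abs(step * error))
        if largest_move < tol:  # assumption: stop once points barely move
            break
    return points


# Hypothetical similarity indexes for four items.
sim = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]
points = cluster_map(sim)
for index, point in enumerate(points):
    print(index, [round(c, 2) for c in point])
```

Passing `dims=3` gives the 3D variant. Because the start is random, different seeds can produce maps that are rotated or mirrored versions of one another while preserving the relative distances.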