Cluster analysis diagrams
Cluster analysis is an exploratory technique that you can use to visualize patterns in your project by grouping files or nodes that share similar words, have similar attribute values, or are coded similarly by nodes. Cluster analysis diagrams provide a graphical representation of files or nodes so that similarities and differences are easy to see. Files or nodes that appear close together in the diagram are more similar than those that are far apart.
You can use cluster analysis diagrams to visualize:
- The similarities and differences across your files—for example, how similar are the submissions from the various community members?
- The similarities and differences across your nodes—for example, how similar is the coding at rising sea levels, flood control, soil erosion, and land reclamation?
- The demographic spread of your survey respondents based on attribute value.
The Diagram tab displays the visual representation of your data.
The Summary tab displays the similarity index values used to generate the diagram.
- Items compared: each possible pair of selected items is listed as a row in the table.
- Similarity Index: displays a value that indicates the degree of similarity for each pair of items, based on the similarity metric selected. A high similarity index (maximum = 1) indicates strong similarity, and items with a high index are displayed closer together on the cluster analysis diagram.
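The structure of the Summary tab (one row per possible pair, each with a similarity index of at most 1) can be sketched in a few lines of Python. This is only an illustration: the file names and word sets are hypothetical, and the Jaccard coefficient is used here as one common similarity measure, not as a statement of how NVivo computes its index.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: shared items divided by all distinct items."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similarity_table(items: dict) -> list:
    """Return one (item, item, index) row per possible pair, most similar first."""
    rows = [
        (x, y, jaccard(items[x], items[y]))
        for x, y in combinations(sorted(items), 2)
    ]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Hypothetical files, each reduced to the set of words it contains
files = {
    "Interview A": {"flood", "erosion", "levee", "storm"},
    "Interview B": {"flood", "erosion", "levee", "tide"},
    "Interview C": {"tourism", "fishing", "storm"},
}

for first, second, index in similarity_table(files):
    print(f"{first} / {second}: {index:.2f}")
```

Three items produce three rows. The pair with the highest index is the pair that would sit closest together on a diagram built from these values.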
Create a cluster analysis diagram
- On the Explore tab in the Diagrams group, click Cluster Analysis.
- Follow the steps in the Cluster Analysis Wizard, and then click Finish.
You can also view Word Frequency query results as a cluster analysis diagram. This type of cluster analysis diagram displays the most frequently occurring words in the selected files or nodes.
Cluster by word, coding or attribute value similarity
The files or nodes in a cluster analysis diagram can be clustered by word similarity, coding similarity or attribute value similarity.
|Word similarity||The words contained in the selected files or nodes are compared. Files or nodes that have a higher degree of similarity based on the occurrence and frequency of words are shown clustered together. Files or nodes that have a lower degree of similarity based on the occurrence and frequency of words are displayed further apart. Stop words are excluded when using this measure of similarity. See Text content language and stop words.|
|Coding similarity||The coding at the selected files or nodes is compared. Files or nodes that have been coded similarly are clustered together on the cluster analysis diagram. Files or nodes that have been coded differently are displayed further apart on the cluster analysis diagram.|
|Attribute value similarity||The attribute values of the selected files or nodes are compared. Files or nodes that have similar attribute values are clustered together on the cluster analysis diagram. Files or nodes that have different attribute values are displayed further apart on the cluster analysis diagram.|
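To make the idea of "similar items are clustered together" concrete, here is a minimal sketch of clustering by attribute value similarity. Everything in it is an assumption made for illustration: the respondents and attributes are hypothetical, similarity is taken as the fraction of attributes with matching values, and groups are merged with a simple single-linkage rule. It does not describe NVivo's own algorithm.

```python
def matching_similarity(a: dict, b: dict) -> float:
    """Fraction of shared attributes on which two items hold the same value."""
    keys = a.keys() & b.keys()
    if not keys:
        return 0.0
    return sum(a[k] == b[k] for k in keys) / len(keys)


def cluster(items: dict, threshold: float) -> list:
    """Repeatedly merge the two most similar clusters (single linkage)
    until no pair of clusters reaches the threshold."""
    clusters = [{name} for name in items]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: score a pair of clusters by their closest members
                score = max(
                    matching_similarity(items[x], items[y])
                    for x in clusters[i]
                    for y in clusters[j]
                )
                if best is None or score > best[0]:
                    best = (score, i, j)
        score, i, j = best
        if score < threshold:
            break
        clusters[i] |= clusters[j]
        del clusters[j]
    return clusters


# Hypothetical survey respondents with attribute values
respondents = {
    "Anna": {"age_group": "20-29", "region": "Coast", "occupation": "Fisher"},
    "Ben": {"age_group": "20-29", "region": "Coast", "occupation": "Teacher"},
    "Carla": {"age_group": "50-59", "region": "Inland", "occupation": "Farmer"},
    "Dev": {"age_group": "50-59", "region": "Inland", "occupation": "Farmer"},
}

groups = cluster(respondents, threshold=0.6)
print(groups)
```

With these values, Anna and Ben match on two of three attributes and Carla and Dev match on all three, so the sketch ends with two groups. The same approach applies to coding similarity if each item is described by the nodes it is coded at instead of by attribute values.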
Selecting a similarity metric
A similarity metric is a statistical method used to calculate correlation between items. When you create a cluster analysis diagram using the Cluster Analysis Wizard, you can choose from the following similarity metrics:
Working with data in other languages
The language used in your data has no impact on the results for cluster analysis by coding or attribute value similarity.
For cluster analysis by word similarity, NVivo excludes any defined ‘stop words’ from the similarity calculation. When you are working with content in other languages, defining stop words improves the outcome of your cluster analysis because similarity is no longer based on words that carry little meaning. This reduces the chance that documents receive a high similarity coefficient predominantly because of these words. To check which stop words apply to your content, you can view the Stop Words list.
For example, if you are working with content in Turkish, you might like to:
- Set the text content language to ‘Other’.
- Add appropriate Turkish words to the Stop Words list. For examples of what words might be appropriate, take a look at the existing stop words provided in other languages.
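The effect of adding stop words can be sketched as follows. The two Turkish sentences and the short stop word list are hypothetical examples, and the Jaccard coefficient stands in for whichever similarity metric is selected; the point is only that common function words can make unrelated texts look similar until they are excluded.

```python
import re


def content_words(text: str, stop_words: set) -> set:
    """Lowercase the text, split it into words, and drop the stop words."""
    tokens = re.findall(r"\w+", text.lower())
    return {t for t in tokens if t not in stop_words}


def jaccard(a: set, b: set) -> float:
    """Jaccard coefficient: shared words divided by all distinct words."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical documents on unrelated topics
doc_a = "Deniz seviyesi ve kıyı erozyonu bu bölge için bir sorun"
doc_b = "Turizm ve balıkçılık bu kasaba için bir gelir"

# Illustrative stop words: "and", "this", "for", "a"
turkish_stop_words = {"ve", "bu", "için", "bir"}

before = jaccard(content_words(doc_a, set()), content_words(doc_b, set()))
after = jaccard(
    content_words(doc_a, turkish_stop_words),
    content_words(doc_b, turkish_stop_words),
)
print(f"without stop words: {before:.2f}, with stop words: {after:.2f}")
```

Without the stop word list, the only words the two documents share are the four function words, which still gives them a non-zero similarity. Once those words are excluded, the documents have no words in common.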
Visualize patterns in social media datasets
Cluster analysis enables you to compare similarity of words in social media datasets. For example, you can visualize the similarities and differences across users in a:
- Facebook dataset: you may discover new insights, for example, how similar are the posts or comments from various users?
- Twitter dataset: you may find other Twitter users who share similar views to a Twitter account you are researching.