Correspondence Analysis (CATA and categorical data)
Purpose
To visualise and summarise tabular data and to
highlight the patterns of association in two-way tables. It is widely used for
mapping purely qualitative variables, e.g. cluster by demographic use.
Typical data that can be mapped could come from a CATA study, counting up
the number of checks, or from any other categorical table of counts.
The analysis is performed on count data from a
cross-tabulation of two categorical variables. However, the data must be input with one
column per categorical variable of interest, and each column should contain interval or text data.
In EyeOpenR the analysis can be run on standard
categorical data or CATA data, and it will still work if there is a small amount of
missing data.
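As an illustration of how one column per categorical variable becomes a table of counts, here is a minimal Python sketch. It is not the code EyeOpenR runs; the product and attribute values and the `cross_tabulate` helper are hypothetical and exist only for this example.

```python
from collections import Counter

# Hypothetical raw input: one entry per observation, one list per categorical variable.
products = ["P1", "P1", "P2", "P2", "P2", "P3"]
attributes = ["sweet", "sour", "sweet", "sweet", "bitter", "sour"]


def cross_tabulate(rows, cols):
    """Count co-occurrences of two equally long categorical columns."""
    if len(rows) != len(cols):
        raise ValueError("both columns must have the same length")
    counts = Counter(zip(rows, cols))
    row_levels = sorted(set(rows))
    col_levels = sorted(set(cols))
    # Counter returns 0 for combinations that never occur.
    table = [[counts[(r, c)] for c in col_levels] for r in row_levels]
    return row_levels, col_levels, table


row_levels, col_levels, table = cross_tabulate(products, attributes)
print(col_levels)
for level, row in zip(row_levels, table):
    print(level, row)
```

Each row of `table` holds the counts for one product across the attributes, which is the two-way layout the analysis works on.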
Background
Correspondence analysis tells you about the following:
- Similarities between row categories
- Similarities between column categories
- Associations between row and column categories
The analysis gives a visualisation of the row and column
categories, highlighting the chi-squared associations in the two-way table.
The degree of association is quantified by the chi-squared
statistic. The total inertia is the chi-squared statistic divided by N, the grand
total of the table (inertia = chi-squared / N). Each eigenvalue is the inertia
accounted for by one dimension, so its share of the total gives the proportion of
inertia explained by that dimension. The dimensions are then plotted on x and y axes
in order to visualise them, and are interpreted in a similar way to PCA maps.
Correspondence analysis finds coordinates on successive axes,
each of which recovers as much of the remaining inertia as possible.
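The relationship between the chi-squared statistic and the inertia can be sketched in a few lines of plain Python. The counts below are hypothetical, and the `chi_squared_and_inertia` helper is written for this illustration only; it is not part of FactoMineR or EyeOpenR.

```python
def chi_squared_and_inertia(table):
    """Return (chi-squared statistic, total inertia) for a table of counts.

    Assumes every row and column total is greater than zero.
    """
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2, chi2 / n


# Hypothetical 2 x 2 table of counts.
example_table = [[20, 10], [10, 20]]
chi2, inertia = chi_squared_and_inertia(example_table)
print(round(chi2, 3), round(inertia, 3))
```

In a full correspondence analysis the eigenvalues of the dimensions sum to this total inertia.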
Options
- Dimensions to visualise: choose which pair of
dimensions (1v2, 1v3 or 2v3) you wish to see shown as plots.
- Clustering: You can choose to perform cluster
analysis on the products.
- Determine clusters: The number of clusters can
be calculated automatically or specified by the user.
- Number of Decimals for Values: Required number
of decimals for values given in the results.
- Number of Decimals for P-Values: Required number
of decimals for any p-values given in the results.
Results and Interpretation
The output gives maps, which are
interpreted in a similar way to PCA maps. The rows and columns of the
chi-squared components (associations) are plotted on the same plot. We
interpret this as follows:
- Row points close together have similar profiles across the columns.
- Column points close together have similar profiles across the rows.
- Row and column points in the same direction from the centre show a relatively high
positive association.
- Row and column points in opposite directions show a negative association.
- Unlike PCA, there is no direct interpretation of distances between row and
column points.
Be careful, particularly with the last point above. We have seen many
misinterpretations of these plots: they are not point/vector plots as in PCA.
In addition, attributes that are sparse in CATA data may
be overweighted in the analysis, so they should be removed before mapping.
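To make "similar profiles" concrete, the following Python sketch computes row profiles (each row divided by its total) and the chi-squared distance between two row profiles, which weights each column by the inverse of its mass. The table is hypothetical and the helper names are assumptions made for this example.

```python
from math import sqrt


def row_profiles(table):
    """Divide each row of counts by its row total."""
    return [[x / sum(row) for x in row] for row in table]


def chi2_distance(table, i, k):
    """Chi-squared distance between the profiles of rows i and k."""
    n = sum(sum(row) for row in table)
    # Column masses: each column's share of the grand total.
    col_mass = [sum(col) / n for col in zip(*table)]
    profiles = row_profiles(table)
    return sqrt(
        sum((a - b) ** 2 / m for a, b, m in zip(profiles[i], profiles[k], col_mass))
    )


# Hypothetical counts: rows 0 and 1 have the same profile, row 2 differs.
example_table = [[10, 20, 30], [20, 40, 60], [30, 20, 10]]
d01 = chi2_distance(example_table, 0, 1)
d02 = chi2_distance(example_table, 0, 2)
print(d01, d02)
```

Rows 0 and 1 differ in total count but not in profile, so their distance is zero and they would sit at the same position on the map. The same calculation applies to columns after transposing the table.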
The outputs given are as follows (for both CA on regular
categorical data and for CATA data):
1. The Frequency tab shows the counts associated
with each Product and Attribute, as a product by attribute table. The numbers in
the cells are the counts for each combination.
2. The Eigenvalues tab provides the percent variance
associated with each of the calculated dimensions, both individually per
dimension and also as a cumulative total.
3. The Products tab gives the product coordinates (as
they would appear on the CA map) together with the associated Contribution
(Contrib) and Squared Cosine (cos2) values. Within this tab there is also
the factor map of the products, which is also available to download.
4. The Variables tab gives the coordinates of the
attributes (variables) together with the associated Contribution (Contrib) and
Squared Cosine (cos2) values. Within this tab there is also the factor
map of the variables/attributes, which is also available to download.
5. The CA Graph tab is the classic correspondence
analysis map showing the association between the Products and Variables
(interpretation notes above).
6. The Cluster tab gives information on the
clustering if it has been performed (as it is an option). Within this the
Cluster Info tab shows the cluster number that each product has been grouped
in. The Cluster Description tab highlights which Variables/Attributes
characterise each cluster of products, that is, those that are
statistically significant at the 5% level. Note: When
there are fewer than 5 products, a 2-cluster solution is forced.
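The percentages reported in the Eigenvalues tab follow directly from the eigenvalues. A minimal Python sketch, assuming hypothetical eigenvalues and a hypothetical `explained_inertia` helper (the `decimals` argument only mimics the rounding options and is not EyeOpenR's implementation):

```python
def explained_inertia(eigenvalues, decimals=2):
    """Return (dimension, percent, cumulative percent) for each eigenvalue."""
    total = sum(eigenvalues)
    rows = []
    cumulative = 0.0
    for dim, value in enumerate(eigenvalues, start=1):
        percent = 100 * value / total
        cumulative += percent
        rows.append((dim, round(percent, decimals), round(cumulative, decimals)))
    return rows


# Hypothetical eigenvalues for three dimensions.
example_eigenvalues = [0.30, 0.15, 0.05]
summary = explained_inertia(example_eigenvalues)
for dim, percent, cumulative in summary:
    print(dim, percent, cumulative)
```

A high cumulative percentage on the first two dimensions suggests that the 1v2 map represents most of the association in the table.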
- R packages: FactoMineR
Further information on the functions used can be found in the R documentation in the following locations:
- CA {FactoMineR}
- HCPC {FactoMineR}
The analysis is based on a CA of the contingency table, using the CA {FactoMineR} function (chi-squared distance). The clustering that is performed on top of it (if requested) is based on the HCPC {FactoMineR} function and is performed on the rows (agglomerative hierarchical clustering, AHC, followed by K-means). Since CA is sensitive to sparse attributes, a filter can be applied based on the number of times a word has been selected.
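A filter on sparse attributes can be sketched as follows. This is only an illustration of the idea of dropping attributes that were selected too few times; the threshold `min_count`, the data, and the `drop_sparse_attributes` helper are hypothetical, and the exact rule applied in EyeOpenR is not stated here.

```python
def drop_sparse_attributes(attribute_names, table, min_count):
    """Keep only the columns whose total count is at least min_count."""
    col_totals = [sum(col) for col in zip(*table)]
    keep = [j for j, total in enumerate(col_totals) if total >= min_count]
    kept_names = [attribute_names[j] for j in keep]
    kept_table = [[row[j] for j in keep] for row in table]
    return kept_names, kept_table


# Hypothetical product-by-attribute counts; "metallic" was ticked only once.
attribute_names = ["sweet", "sour", "metallic"]
counts = [[12, 8, 0], [9, 11, 1], [15, 5, 0]]
kept_names, kept_counts = drop_sparse_attributes(attribute_names, counts, min_count=5)
print(kept_names)
print(kept_counts)
```

Removing such columns before the analysis prevents a rarely selected word from dominating a dimension of the map.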
References
McEwan, J. A., Schlich, P. (1991). Correspondence Analysis in Sensory Evaluation. Food Quality & Preference, 3, 23-36.