Nursing Research and Factor Analysis
What Is Factor Analysis?
Factor
analysis is a multivariate technique for determining the underlying structure
and dimensionality of a set of variables. By analyzing intercorrelations among
variables, factor analysis shows which variables cluster together to form
unidimensional constructs. It is useful in elucidating the underlying meaning
of concepts.
However, it involves a higher degree of subjective interpretation
than is common with most other statistical methods. In nursing research, factor
analysis is commonly used for instrument development (Ferketich & Muller,
1990), theory development, and data reduction.
Specifically, factor analysis is used to identify the number, nature, and importance of factors; to compare factor solutions for different groups; to estimate scores on factors; and to test theories (Nunnally & Bernstein, 1994).
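The clustering idea can be illustrated with a small simulation. The sketch below (Python with NumPy; the two latent constructs, the item counts, and every name in it are hypothetical) generates six items driven by two latent factors and shows how the intercorrelation matrix reveals two clusters of variables.

```python
# A minimal sketch (hypothetical data): six synthetic items driven by
# two latent factors, whose intercorrelations reveal two clusters.
import numpy as np

rng = np.random.default_rng(42)
n = 300
anxiety = rng.normal(size=n)   # hypothetical latent factor 1
fatigue = rng.normal(size=n)   # hypothetical latent factor 2

# Items 1-3 load on "anxiety", items 4-6 on "fatigue", plus random error.
X = np.column_stack([anxiety + rng.normal(scale=0.6, size=n) for _ in range(3)] +
                    [fatigue + rng.normal(scale=0.6, size=n) for _ in range(3)])

R = np.corrcoef(X, rowvar=False)   # 6 x 6 intercorrelation matrix
print(np.round(R, 2))              # items 1-3 correlate highly with one
                                   # another, as do items 4-6; cross-block
                                   # correlations stay near zero
```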
Types of Factor Analysis
There
are two major types of factor analysis: exploratory and confirmatory. In
exploratory factor analysis, the data are described and summarized by grouping
together related variables. The variables may or may not be selected with a
particular purpose in mind.
Exploratory factor analysis is commonly used in the
early stages of research, when it provides a method for consolidating variables
and generating hypotheses about underlying processes that affect the clustering
of the variables.
Confirmatory factor analysis is used in later stages of
research for theory testing related to latent processes or to examine
hypothesized differences in latent processes among groups of subjects. In
confirmatory factor analysis, the variables are carefully and specifically
selected to reveal underlying processes or associations.
Characteristics of the Variables
The raw data should be at, or treatable as, the interval level of measurement, such as the data obtained with Likert-type measures. In addition, a number of assumptions relating to the sample, variables, and factors should be met.
First, the sample size must
be sufficiently large to avoid erroneous interpretations of random differences
in the magnitude of correlation coefficients.
As a rule of thumb, a minimum of five cases for each observed variable is recommended; however, Knapp and Brown (1995) reported that ratios as low as three subjects per variable may be acceptable. Others recommend a total sample of 100 to 200 subjects (Nunnally & Bernstein, 1994).
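Purely as an illustration, these rules of thumb can be encoded in a small check; the function name and defaults below are ours, not from the literature cited above.

```python
# Illustrative sketch: encode the rule-of-thumb sample-size checks above.
def sample_size_ok(n_cases: int, n_variables: int,
                   ratio: float = 5.0, floor: int = 100) -> bool:
    """True if n_cases meets both the cases-per-variable ratio and the floor."""
    return n_cases >= ratio * n_variables and n_cases >= floor

print(sample_size_ok(150, 20))   # True: 150 >= 5 * 20 and 150 >= 100
print(sample_size_ok(90, 30))    # False: 90 < 5 * 30 = 150
```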
Second,
the variables should be normally distributed, with no substantial evidence of
skewness or kurtosis. Third, scatterplots should indicate that the associations between pairs of variables are linear.
Fourth, outliers among cases
should be identified and their influence reduced either by transformation or by
arbitrarily replacing the outlying value with a less extreme score.
Fifth, variables exhibiting multicollinearity or singularity should be identified and deleted; these conditions are suggested when the determinant of the correlation matrix, or the eigenvalues associated with some factors, approach zero. In addition, a squared multiple correlation equal to 1 indicates singularity, and squared multiple correlations close to 1 indicate multicollinearity.
Sixth,
outliers among variables, indicated by a low squared multiple correlation with
all other variables and low correlations with all important factors, suggest
the need for cautious interpretation and possible elimination of the variables
from the analysis.
Seventh, there should be adequate factorability within the
correlation matrix, as indicated by several sizable correlations between pairs of variables exceeding .30. Finally, screening is important for
identifying outlying cases among the factors.
If such outliers can be identified by large Mahalanobis distances (evaluated as chi-square values) from the location of the case in the space defined by the factors to the centroid of all cases in that space, factor analysis is not considered appropriate.
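Several of these screening checks can be sketched directly in code. The fragment below continues with the data matrix X and correlation matrix R from the earlier sketch; the thresholds follow the conventions cited above, and for simplicity the Mahalanobis distances are computed in the original variable space rather than the factor space.

```python
import numpy as np
from scipy import stats

# Multicollinearity / singularity: a determinant near zero is a warning sign.
print("det(R) =", np.linalg.det(R))

# Squared multiple correlations (SMCs): 1.0 signals singularity;
# values close to 1.0 signal multicollinearity.
smc = 1.0 - 1.0 / np.diag(np.linalg.inv(R))
print("SMCs:", np.round(smc, 3))

# Factorability: several sizable pairwise correlations should exceed .30.
p = R.shape[0]
off_diag = R[np.triu_indices(p, k=1)]
print("correlations > .30:", int(np.sum(np.abs(off_diag) > 0.30)))

# Outlying cases: squared Mahalanobis distance of each case from the
# centroid, compared with a chi-square critical value (df = variables).
d = X - X.mean(axis=0)
md2 = np.einsum("ij,jk,ik->i", d, np.linalg.inv(np.cov(X, rowvar=False)), d)
print("outlying cases:", int(np.sum(md2 > stats.chi2.ppf(0.999, df=p))))
```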
Considering the Variables
When
planning for factor analysis, the first step is to identify a theoretical model
that will guide the statistical model (Ferketich & Muller, 1990). The next
step is to select the psychometric measurement model, either classic or
neoclassical, that will reflect the nature of measurement error.
The classic
model assumes that all measurement error is random and that all variance is
unique to individual variables and not shared with other variables or factors.
The neoclassical model recognizes both random and systematic measurement error,
which may reflect common variance that is attributable to unmeasured or latent
factors.
The selection of the classic or neoclassical model influences whether the researcher chooses principal-components analysis or common factor analysis (Ferketich & Muller, 1990).
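The practical consequence of that choice can be shown in miniature: principal-components analysis factors the correlation matrix with ones on the diagonal (all variance), whereas a common factor method such as principal factors replaces the diagonal with communality estimates, often the squared multiple correlations computed earlier (a sketch reusing R and smc from the fragments above).

```python
import numpy as np

R_pca = R.copy()                 # diagonal of 1s: analyze total variance
R_paf = R.copy()
np.fill_diagonal(R_paf, smc)     # SMC estimates: analyze common variance only
```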
Mathematical Description of Analysis
Mathematically
speaking, factor analysis generates factors that are linear combinations of
variables. The first step in factor analysis is factor extraction, which
involves the removal of as much variance as possible through the successive
creation of linear combinations that are orthogonal (unrelated) to previously
created combinations.
The principal-components method of extraction is widely
used for analyzing all the variance in the variables. However, other methods of
factor extraction, which analyze common factor variance (i.e., variance that
is shared with other variables), include the principal-factors method, the
alpha method, and the maximum likelihood method (Nunnally & Bernstein,
1994).
Various criteria have been used to determine how many factors account
for a substantial amount of variance in the data set. One criterion is to
accept only those factors with an eigenvalue equal to or greater than 1.0
(Guttman, 1954).
An eigenvalue is a standardized index of the amount of the
variance extracted by each factor. Another approach is to use a scree test to
identify sharp discontinuities in the eigenvalues for successive factors
(Cattell, 1966).
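Continuing the sketch, extraction by the principal-components method and the eigenvalue criterion reduce to an eigendecomposition of the correlation matrix R; the code below is illustrative only.

```python
import numpy as np

# Eigendecomposition of the correlation matrix, sorted largest first.
eigenvalues, eigenvectors = np.linalg.eigh(R)
order = np.argsort(eigenvalues)[::-1]
eigenvalues, eigenvectors = eigenvalues[order], eigenvectors[:, order]

# Guttman (1954) criterion: retain factors with eigenvalue >= 1.0.
n_factors = int(np.sum(eigenvalues >= 1.0))
print("eigenvalues:", np.round(eigenvalues, 2))
print("factors retained:", n_factors)

# For a scree test (Cattell, 1966), plot eigenvalue against factor number
# and look for the sharp discontinuity ("elbow") in the curve.
```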
Outcomes of Analysis
Factor
extraction results in a factor matrix that shows the relationship between the
original variables and the factors by means of factor loadings. The factor
loadings, when squared, equal the variance in the variable accounted for by the
factor.
For each variable, the sum of its squared loadings across all of the extracted factors represents that variable's communality (shared variance).
The sum of a factor’s squared loadings for all variables equals that factor’s
eigenvalue (Nunnally & Bernstein, 1994).
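These identities are easy to verify numerically. In the continuing sketch, the unrotated principal-component loadings are the eigenvectors scaled by the square roots of their eigenvalues:

```python
import numpy as np

# Loadings: eigenvectors scaled by the square roots of their eigenvalues.
loadings = eigenvectors[:, :n_factors] * np.sqrt(eigenvalues[:n_factors])

# Communality: sum of squared loadings across factors, per variable.
communalities = np.sum(loadings**2, axis=1)
print("communalities:", np.round(communalities, 2))

# Sum of squared loadings down each factor's column equals its eigenvalue.
print(np.allclose(np.sum(loadings**2, axis=0), eigenvalues[:n_factors]))
```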
Because
the initial factor matrix may be difficult to interpret, factor rotation is
commonly used when more than one factor emerges. Factor rotation involves the
movement of the reference axes within the factor space so that the variables
align with a single factor (Nunnally & Bernstein, 1994).
Orthogonal
rotation keeps the reference axes at right angles and results in factors that
are uncorrelated. Orthogonal rotation is usually performed through a method
known as varimax, but other methods (quartimax and equamax) are also
available. Oblique rotation allows the reference axes to rotate into acute or
oblique angles, thus resulting in correlated factors (Nunnally & Bernstein, 1994).
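A compact varimax rotation can be written from the widely used SVD-based iteration; the sketch below is illustrative rather than a production implementation, and it assumes the loadings matrix from the earlier fragment.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate a loadings matrix L (variables x factors) by the varimax
    criterion, using the classic SVD-based iteration."""
    p, k = L.shape
    rotation = np.eye(k)
    variance = 0.0
    for _ in range(max_iter):
        Lr = L @ rotation
        u, s, vt = np.linalg.svd(
            L.T @ (Lr**3 - (gamma / p) * Lr @ np.diag(np.sum(Lr**2, axis=0))))
        rotation = u @ vt
        if np.sum(s) < variance * (1 + tol):
            break                      # criterion stopped improving
        variance = np.sum(s)
    return L @ rotation

rotated = varimax(loadings)            # variables now align with one factor
```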
When oblique rotation is used, there are two resulting matrices: a
pattern matrix that reveals partial regression coefficients between variables
and factors, and a structure matrix that shows variable-to-factor correlations. Factors are interpreted by examining the pattern and magnitude of the factor
loadings in the rotated factor matrix (orthogonal rotation) or pattern matrix
(oblique rotation).
Ideally, there are one or more marker variables, that is, variables with a very high loading on one and only one factor (Nunnally & Bernstein, 1994), that can help in the interpretation and naming of factors. Generally,
factor loadings of .30 and higher are large enough to be meaningful (Nunnally
& Bernstein, 1994).
Once a factor is interpreted and labeled, researchers usually
determine factor scores, which are scores on the abstract dimension defined by
the factor.
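One common estimator, the regression method, can be sketched as follows, continuing with X, R, and the rotated loadings from above; this is one of several estimation approaches, not necessarily the one a given software package uses.

```python
import numpy as np

# Regression-method factor scores: weights are R^{-1} times the loadings,
# applied to the standardized variables.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
weights = np.linalg.solve(R, rotated)      # solves R @ weights = loadings
factor_scores = Z @ weights                # one score per case, per factor
print(factor_scores.shape)                 # (n_cases, n_factors)
```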
Replication
of factor solutions in subsequent analyses with different populations gives
increased credibility to the findings. Comparisons between factor-analytic
solutions can be made by visual inspection of the factor loadings or by using
formal statistical procedures, such as the computation of Cattell’s salient similarity
index and the use of confirmatory factor analysis (Gorsuch, 1983).