Testing Assumptions for Statistical Tests, Performing Additional Peripheral Analyses, Principal Analysis, The Substantive Data Analysis Plan, Substantive Analysis, Interpretation of Results, Credibility of the Results, Meaning of the Results, Interpreting Hypothesized Results, Interpreting Unhypothesized Significant Results, Interpreting Mixed Results, Generalizability of the Results, Implications of the Results.
Overview of Statistical Tests
Testing Assumptions for Statistical Tests
Most statistical tests are based on several assumptions, conditions that are presumed to be true and that, when violated, can lead to misleading or invalid results. For example, parametric tests assume that variables are normally distributed. Frequency distributions, scatter plots, and other assessment procedures provide researchers with information about whether the underlying assumptions for statistical tests have been violated.
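As a concrete illustration, this kind of screening can be sketched in Python. The small data sets and from-scratch moment calculations below are purely illustrative; statistical packages compute these indexes directly.

```python
# Illustrative check of the normality assumption: compute skewness and
# excess kurtosis for a variable before choosing a parametric test.
# Values near 0 suggest normality is tenable; large values flag skewed,
# peaked (leptokurtic), or flat (platykurtic) distributions.
def shape_indexes(values):
    n = len(values)
    mean = sum(values) / n
    sd = (sum((x - mean) ** 2 for x in values) / n) ** 0.5
    skew = sum((x - mean) ** 3 for x in values) / (n * sd ** 3)
    kurt = sum((x - mean) ** 4 for x in values) / (n * sd ** 4) - 3
    return skew, kurt

symmetric = [1, 2, 3, 4, 5, 6, 7]        # perfectly symmetric values
right_skewed = [1, 1, 1, 2, 2, 3, 10]    # long right tail

skew_sym, _ = shape_indexes(symmetric)
skew_right, _ = shape_indexes(right_skewed)
```

The symmetric list yields a skewness near zero, whereas the long right tail produces a large positive value that would prompt a closer look before running a parametric test.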
Frequency distributions can reveal whether the normality assumption is tenable; graphic displays of data values indicate whether the distribution is severely skewed, multimodal, too peaked (leptokurtic), or too flat (platykurtic). There are also statistical indexes of skewness and peakedness that statistical programs can compute to determine whether the shape of the distribution deviates significantly from normality.
Performing Data Transformations
Raw data entered directly onto a computer file often need to be modified or transformed before hypotheses can be tested.
Various data transformations can easily be handled through commands to the computer. All statistical software packages can create new variables through arithmetic manipulations of variables in the original data set. We present a few examples of such transformations, covering a range of realistic situations.
• Performing item reversals.
Sometimes response codes to certain variables need to be reversed (i.e., high values becoming low, and vice versa) so that items can be combined in a composite scale. For example, the widely used CES-D scale consists of 20 items, 16 of which are statements indicating negative feelings in the prior week (e.g., item 9 states, “I thought my life had been a failure”), and four of which indicate positive feelings (e.g., item 8 states, “I felt hopeful about the future”). The positively worded items must be reversed before items are added together.
CES-D items have four response options, from 1 (rarely felt that way) to 4 (felt that way most days). To reverse an item (i.e., to convert a 4 to a 1, and so on), the raw value of the item is subtracted from the maximum possible value, plus 1. In SPSS, this can be accomplished through the “Compute” command, which could be used to set the value of a new variable to 5 minus the value of the original variable; for example, a new variable CESD8R could be computed as 5 - CESD8, where CESD8 is the original value of item 8 on the CES-D scale.
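The reversal rule can be sketched as a one-line function; the item value below is invented for illustration.

```python
# Item reversal for a scale with response options coded 1-4 (as on the
# CES-D): reversed value = (maximum option + 1) - raw value, i.e., 5 - raw.
MAX_OPTION = 4

def reverse_item(raw):
    return (MAX_OPTION + 1) - raw   # 4 -> 1, 3 -> 2, 2 -> 3, 1 -> 4

cesd8 = 4                     # hypothetical response to positively worded item 8
cesd8r = reverse_item(cesd8)  # reversed value used in the composite scale
```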
• Constructing scales.
Transformations are also used to construct composite scale variables, using responses to individual items. Commands for creating such scales in statistical software packages are straightforward, using algebraic conventions. In SPSS, the “Compute” command could again be used to create a new variable; for example, a new variable STRESS could be set equal to Q1 + Q2 + Q3 + Q4 + Q5.
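The same computation can be sketched in Python; the item names and response values are invented.

```python
# Composite scale construction, mirroring the SPSS "Compute" example:
# a STRESS score formed as the sum of five individual items.
q1, q2, q3, q4, q5 = 3, 2, 4, 1, 5   # hypothetical item responses
stress = q1 + q2 + q3 + q4 + q5      # composite scale score
```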
• Performing counts.
Sometimes composite indexes are created when researchers want a cumulative tally of the occurrence of some attribute. For example, suppose we asked people to indicate which types of illegal drugs they had used in the past month, from a list of 10 options. Use of each drug would be answered independently in a yes (coded 1) or no (coded 2) fashion. We could then create a variable indicating the number of different drugs used. In SPSS, the “Count” command would be used, creating a new variable (e.g., DRUGS) equal to the number of “1” codes across the 10 drug items. Note that counting is the approach used to create missing values flags, described earlier in this chapter.
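A minimal sketch of the counting logic, with one respondent's invented answers:

```python
# Count the "1" (yes) codes across 10 drug items to form the DRUGS index;
# "2" means no, so only 1s are tallied.
drug_items = [1, 2, 2, 1, 1, 2, 2, 2, 1, 2]    # hypothetical 10 answers
drugs = sum(1 for code in drug_items if code == 1)
```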
• Recoding variables.
Other transformations involve recoding values to change the nature of the original data. For example, in some analyses, an infant’s original birth weight (entered on the computer file in grams) might be used as a dependent variable. In other analyses, however, the researcher might be interested in comparing the subsequent morbidity of low-birth-weight versus normal-birth-weight infants. For example, in SPSS, the “Recode into Different Variables” command could be used to recode the original variable (BWEIGHT) into a new dichotomous variable with a code of 1 for a low-birth-weight infant and a code of 2 for a normal-birth-weight infant, based on whether BWEIGHT was less than 2500 grams.
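The recoding rule can be sketched as follows; the birth weights are invented.

```python
# Dichotomous recode of birth weight in grams: 1 = low birth weight
# (< 2500 g), 2 = normal birth weight.
def recode_bweight(grams):
    return 1 if grams < 2500 else 2

bweights = [1850, 2499, 2500, 3400]             # hypothetical raw values
lowbw = [recode_bweight(g) for g in bweights]   # 2500 g falls in "normal"
```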
• Meeting statistical assumptions.
Transformations also can be undertaken to render data appropriate for statistical tests. For example, if a distribution is markedly non-normal, a transformation can sometimes be done to make parametric procedures appropriate. A logarithmic transformation, for example, tends to normalize distributions. In SPSS, the “Compute” command could be used to normalize the distribution of values on family income (INCOME), for instance, by computing a new variable (e.g., INCLOG) set equal to the natural log of the values on INCOME. Discussions of the use of transformations for changing the characteristics of a distribution can be found in Dixon and Massey (1983) and Ferketich and Verran (1994).
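A sketch of the log transformation, with invented income values; right-skewed distributions such as income typically become more symmetric on a log scale.

```python
# Natural-log transformation of family income, as in the INCLOG example.
import math

incomes = [18_000, 25_000, 40_000, 60_000, 250_000]  # hypothetical, right-skewed
inclog = [math.log(x) for x in incomes]              # compressed upper tail
```

Note how the gap between the two largest incomes shrinks from 190,000 raw dollars to about 1.43 log units, pulling in the long right tail.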
• Creating dummy variables.
Data transformations may be needed to convert codes for multivariate statistics. For example, for dichotomous variables, researchers most often use a 0-1 code (rather than, say, a 1-2 code) to facilitate interpretation of regression coefficients. Thus, if the original codes for gender were 1 for women and 2 for men, men could be recoded to 0 for a regression analysis.
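The dummy-coding step can be sketched as follows; the codes match the gender example in the text, and the sample values are invented.

```python
# Recode 1-2 gender codes to a 0-1 dummy variable for regression:
# original 1 = women (kept as 1), 2 = men (recoded to 0).
def to_dummy(code):
    return 0 if code == 2 else code

genders = [1, 2, 2, 1]                         # hypothetical original codes
gender_dummy = [to_dummy(c) for c in genders]  # 0-1 codes for the model
```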
Performing Additional Peripheral Analyses
Depending on the study, additional peripheral analyses may be needed before proceeding to substantive analyses. It is impossible to catalog all such analyses, but a few examples are provided to alert readers to the kinds of issues that need to be given some thought.
• Data pooling.
Researchers sometimes obtain data from more than one source or from more than one type of subject. For example, to enhance the generalizability of their findings, researchers sometimes draw subjects from multiple sites, or may recruit subjects with different medical conditions. The risk in doing this is that subjects may not really be drawn from the same population, and so it is wise in such situations to determine whether pooling of data (combining data for all subjects) is warranted. This involves comparing the different subsets of subjects (i.e., subjects from the different sites, and so on) in terms of key research variables.
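One common version of this check is an independent-groups t-test comparing the subsets on a key variable before pooling. A minimal sketch, with invented data from two hypothetical sites:

```python
# Pooling check: compare two recruitment sites on a key research variable.
# A small |t| (below the critical value, about 2.23 for df = 10 at the
# .05 level) suggests that combining the sites is defensible.
from statistics import mean, stdev

site_a = [5.1, 4.8, 5.5, 5.0, 4.9, 5.2]   # hypothetical scores, site A
site_b = [5.0, 5.3, 4.7, 5.1, 5.4, 4.8]   # hypothetical scores, site B

def pooled_t(x, y):
    nx, ny = len(x), len(y)
    sp2 = ((nx - 1) * stdev(x) ** 2 + (ny - 1) * stdev(y) ** 2) / (nx + ny - 2)
    return (mean(x) - mean(y)) / (sp2 * (1 / nx + 1 / ny)) ** 0.5

t_sites = pooled_t(site_a, site_b)
```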
• Cohort effects.
Nurse researchers sometimes need to gather data over an extended period of time to achieve adequate sample sizes. This can result in cohort effects, that is, differences in outcomes or subject characteristics over time. This might occur because sample characteristics evolve over time or because of changes in the community, in families, in health care, and so on.
If the research involves an experimental treatment, it may also be that the treatment itself is modified; for example, early program experience may be used to improve the treatment, or those administering the treatment may simply get better at doing it. Thus, researchers with a long period of sample intake should consider testing for cohort effects because such effects can confound the results or even mask existing relationships. This activity usually involves examining correlations between entry dates and key research variables.
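The entry-date correlation can be sketched with a Pearson r computed from scratch; the entry months and outcome scores are invented.

```python
# Cohort-effect check: correlate entry timing (months since study start)
# with a key outcome. A near-zero r argues against a cohort effect.
from statistics import mean

entry_month = [1, 3, 6, 9, 12, 15, 18, 21]   # hypothetical entry dates
outcome = [52, 49, 55, 50, 53, 48, 51, 54]   # hypothetical outcome scores

def pearson_r(x, y):
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den

r_cohort = pearson_r(entry_month, outcome)
```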
• Ordering effects.
When a crossover design is used (i.e., subjects are randomly assigned to different orders of treatments), researchers should assess whether outcomes are different for people in the different treatment-order groups.
• Manipulation checks.
In testing an intervention, the primary research question is whether the treatment was effective in achieving the intended outcome. Researchers sometimes also want to know whether the intended treatment was, in fact, received. Subjects may perceive a treatment, or respond to it, in unanticipated ways, and this can influence treatment effectiveness. Therefore, researchers sometimes build in mechanisms to test whether the treatment was actually in place.
For example, suppose we were testing the effect of noise levels on stress, exposing two groups of subjects to two different noise levels in a laboratory setting. As a manipulation check, we could ask subjects to rate how noisy they perceived the settings to be. If subjects did not rate the noise levels in the two settings differently, it would probably affect our interpretation of the results—particularly if stress in the two groups turned out not to be significantly different.
Principal Analysis
At this point in the analysis process, researchers have a cleaned data set, with missing data problems resolved and needed transformations completed; they also have some understanding of data quality and the extent of biases. They can now proceed with more substantive data analyses.
The Substantive Data Analysis Plan
In many studies, researchers collect data on dozens, and often hundreds, of variables. They cannot realistically analyze every variable in relation to all others, and so a plan to guide data analysis must be developed. Research hypotheses and questions provide only broad and general direction. One approach is to prepare a list of the analyses to be undertaken, specifying both the variables and the statistical tests to be used. Another approach is to develop table shells. Table shells are layouts of how researchers envision presenting the research findings in a report, without any numbers in the table.
Once a table shell has been prepared, researchers can undertake analyses to fill in the table entries. A table shell might, for example, guide a series of ANCOVAs that compare experimental and control groups in terms of several indicators of emotional well-being, after controlling for various characteristics measured at random assignment. The completed table that eventually appears in the research report may differ somewhat from the table shell (e.g., another outcome variable might be added). Researchers do not need to adhere rigidly to table shells, but they provide an excellent mechanism for organizing the analysis of large amounts of data.
Substantive Analysis
The next step is to perform the actual substantive analyses, typically beginning with descriptive analyses. For example, researchers usually develop a descriptive profile of the sample, and often look descriptively at correlations among variables. These initial analyses may suggest further analyses or further data transformations that were not originally envisioned. They also give researchers an opportunity to become familiar with their data.
Researchers then perform statistical analyses to test their hypotheses. Researchers whose data analysis plan calls for multivariate analyses (e.g., multivariate analysis of variance [MANOVA]) may proceed directly to their final analyses, but they may begin with various bivariate analyses (e.g., a series of analyses of variance [ANOVAs]). The primary statistical analyses are complete when all the research questions are addressed and, if relevant, when all table shells have the applicable numbers in them.
Interpretation of Results
The analysis of research data provides the results of the study. These results need to be evaluated and interpreted, giving consideration to the aims of the project, its theoretical basis, the existing body of related research knowledge, and limitations of the research methods adopted. The interpretive task involves a consideration of five aspects of the results:
(1) their credibility
(2) their meaning
(3) their importance
(4) the extent to which they can be generalized
(5) their implications.
Credibility of the Results
One of the first interpretive tasks is assessing whether the results are accurate. This assessment, in turn, requires a careful analysis of the study’s methodologic and conceptual limitations. Regardless of whether one’s hypotheses are supported, the validity and meaning of the results depend on a full understanding of the study’s strengths and shortcomings. Such an assessment relies heavily on researchers’ critical thinking skills and on their ability to be reasonably objective.
Researchers should carefully evaluate the major methodological decisions they made in planning and executing the study and consider whether different decisions might have yielded different results. In assessing the credibility of results, researchers seek to assemble different types of evidence. One type of evidence comes from prior research on the topic. Investigators should examine whether their results are consistent with those of other studies; if there are discrepancies, a careful analysis of the reasons for any differences should be undertaken.
Type of Error
Evidence can often be developed through peripheral data analyses, some of which were discussed earlier in this chapter. For example, researchers can have greater confidence in the accuracy of their findings if they have established that their measures are reliable and have ruled out biases. Another recommended strategy is to conduct a power analysis: researchers can compute the actual power of their analyses to determine the probability of having committed a Type II error.
It is especially useful to perform a power analysis when the results of statistical tests are not statistically significant. For example, suppose we were testing the effectiveness of an intervention to reduce patients’ pain. The two groups in a sample of 200 subjects (100 subjects each in an experimental and a control group) are compared in terms of pain scores, using a t-test.
Suppose further that the mean pain score for the experimental group was 7.90 (standard deviation [SD] = 1.3), whereas the mean for the control group was 8.29 (SD = 1.3), indicating lower pain among experimental subjects. Although the results are in the hypothesized direction, the t-test was nonsignificant. We can provide a context for interpreting the accuracy of the nonsignificant results by performing a power analysis.
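A hedged sketch of this post hoc power analysis, using a normal approximation to the two-sample t-test rather than exact noncentral-t tables:

```python
# Post hoc power for the pain example: n = 100 per group, means 7.90 vs.
# 8.29, SD = 1.3, two-tailed alpha = .05. The normal approximation used
# here is close to the exact t-based value for samples this large.
import math

def normal_cdf(z):
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

d = (8.29 - 7.90) / 1.3          # standardized effect size, about 0.30
n_per_group = 100
z_alpha = 1.96                   # two-tailed .05 critical value

# approximate probability of rejecting H0 given a true effect of size d
power = normal_cdf(d * math.sqrt(n_per_group / 2) - z_alpha)
```

Under these assumptions the power is roughly .56, implying a Type II error risk of roughly 44%, which cautions against reading the nonsignificant result as evidence of no effect.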
Meaning of the Results
In qualitative studies, interpretation and analysis occur virtually simultaneously. In quantitative studies, however, results are in the form of test statistics and probability levels, to which researchers need to attach meaning. This sometimes involves supplementary analyses that were not originally planned. For example, if research findings are contrary to the hypotheses, other information in the data set sometimes can be examined to help researchers understand what the findings mean. In this section, we discuss the interpretation of various research outcomes within a hypothesis-testing context.
Interpreting Hypothesized Results
Interpreting results is easiest when hypotheses are supported. Such an interpretation has been partly accomplished beforehand because researchers have already brought together prior findings, a theoretical framework, and logical reasoning in developing the hypotheses. This groundwork forms the context within which more specific interpretations are made. Naturally, researchers are gratified when the results of many hours of effort support their predictions.
There is a decided preference on the part of individual researchers, advisers, and journal reviewers for studies whose hypotheses have been supported. This preference is understandable, but it is important not to let personal preferences interfere with the critical appraisal appropriate to all interpretive situations. A few caveats should be kept in mind. First, it is best to be conservative in drawing conclusions from the data. It may be tempting to go beyond the data in developing explanations for what results mean, but this should be avoided.
Examples
An example might help to explain what we mean by “going beyond” the data. Suppose we hypothesized that pregnant women’s anxiety level about labor and delivery is correlated with the number of children they have already borne. The data reveal a significant negative relationship between anxiety levels and parity (r = -.40). We conclude that increased experience with childbirth results in decreased anxiety. Is this conclusion supported by the data? The conclusion appears to be logical, but in fact, there is nothing in the data that leads directly to this interpretation.
An important, indeed critical, research precept is correlation does not prove causation. The finding that two variables are related offers no evidence suggesting which of the two variables—if either—caused the other. In our example, perhaps causality runs in the opposite direction, that is, that a woman’s anxiety level influences how many children she bears. Or perhaps a third variable not examined in the study, such as the woman’s relationship with her husband, causes or influences both anxiety and number of children. Alternative explanations for the findings should always be considered and, if possible, tested directly.
Interpretations
If competing interpretations can be ruled out, so much the better, but every angle should be examined to see if one’s own explanation has been given adequate competition. Empirical evidence supporting research hypotheses never constitutes proof of their veracity. Hypothesis testing is probabilistic. There is always a possibility that observed relationships resulted from chance.
Researchers must be tentative about their results and about interpretations of them. In summary, even when the results are in line with expectations, researchers should draw conclusions with restraint and should give due consideration to limitations identified in assessing the accuracy of the results.
Interpreting Nonsignificant Results
Failure to reject a null hypothesis is problematic from an interpretive point of view.
Statistical procedures are geared toward disconfirmation of the null hypothesis. Failure to reject a null hypothesis can occur for many reasons, and researchers do not know which one applies. The null hypothesis could actually be true, for example. The nonsignificant result, in this case, accurately reflects the absence of a relationship among research variables. On the other hand, the null hypothesis could be false, in which case a Type II error has been committed.
Errors and Hypothesis
Retaining a false null hypothesis can result from such problems as poor internal validity, an anomalous sample, a weak statistical procedure, unreliable measures, or too small a sample. Unless the researcher has special justification for attributing the nonsignificant findings to one of these factors, interpreting such results is tricky. We suspect that failure to reject null hypotheses is often a consequence of insufficient power, usually reflecting too small a sample size.
For this reason, conducting a power analysis can help researchers in interpreting nonsignificant results, as indicated earlier. In any event, researchers are never justified in interpreting a retained null hypothesis as proof of the absence of relationships among variables. Nonsignificant results provide no evidence of the truth or the falsity of the hypothesis. Thus, if the research hypothesis is that there are no group differences or no relationships, traditional hypothesis testing procedures will not permit the required inferences.
When significant results are not obtained, there may be a tendency to be overcritical of the research methods and undercritical of the theory or reasoning on which the hypotheses were based. This is understandable: It is easier to say, “My ideas were sound, I just didn’t use the right approach,” than to admit to faulty reasoning. It is important to look for and identify flaws in the research methods, but it is equally important to search for theoretical shortcomings. The result of such endeavors should be recommendations for how the methods, the theory, or an experimental intervention could be improved.
Interpreting Unhypothesized Significant Results
Unhypothesized significant results can occur in two situations. The first involves finding relationships that were not considered while designing the study. For example, in examining correlations among variables in the data set, we might notice that two variables that were not central to our research questions were significantly correlated and interesting. To interpret this finding, we would need to evaluate whether the relationship is real or spurious.
There may be information in the data set that sheds light on this issue, but we might also need to consult the literature to determine if other investigators have observed similar relationships. The second situation is more perplexing: obtaining results opposite to those hypothesized. For instance, we might hypothesize that individualized teaching about AIDS risks is more effective than group instruction, but the results might indicate that the group method was better. Or a positive relationship might be predicted between a nurse’s age and level of job satisfaction, but a negative relationship might be found. It is, of course, unethical to alter a hypothesis after the results are “in.”
Moral of Researcher
Some researchers see such situations as awkward or embarrassing, but there is little basis for such feelings. The purpose of research is not to corroborate researchers’ notions, but to arrive at truth and enhance understanding. There is no such thing as a study whose results “came out the wrong way,” if the “wrong way” is the truth. When significant findings are opposite to what was hypothesized, it is less likely that the methods are flawed than that the reasoning or theory is incorrect.
As always, the interpretation of the findings should involve comparisons with other research, a consideration of alternate theories, and a critical scrutiny of data collection and analysis procedures. The result of such an examination should be a tentative explanation for the unexpected findings, together with suggestions for how such explanations could be tested in other research projects.
Interpreting Mixed Results
Interpretation is often complicated by mixed results: Some hypotheses are supported by the data, whereas others are not. Or a hypothesis may be accepted when one measure of the dependent variable is used but rejected with a different measure. When only some results run counter to a theoretical position or conceptual scheme, the research methods are the first aspect of the study deserving critical scrutiny.
Differences in the validity and reliability of the various measures may account for such discrepancies, for example. On the other hand, mixed results may suggest that a theory needs to be qualified, or that certain constructs within the theory need to be re-conceptualized. Mixed results sometimes present opportunities for making conceptual advances because efforts to make sense of disparate pieces of evidence may lead to key breakthroughs.
Importance of the Results
In quantitative studies, results that support the researcher’s hypotheses are described as significant. A careful analysis of study results involves an evaluation of whether, in addition to being statistically significant, they are important. Attaining statistical significance does not necessarily mean that the results are meaningful to nurses and their clients. Statistical significance indicates that the results were unlikely to be a function of chance.
This means that observed group differences or relationships were probably real, but not necessarily important. With large samples, even modest relationships are statistically significant. For instance, with a sample of 500, a correlation coefficient of .10 is significant at the .05 level, but a relationship this weak may have little practical value. Researchers must pay attention to the numerical values obtained in an analysis in addition to significance levels when assessing the importance of the findings.
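The r = .10 claim can be verified directly: the test statistic for a correlation is t = r * sqrt(n - 2) / sqrt(1 - r**2), compared against the large-sample critical value of about 1.96.

```python
# Check of the claim in the text: with n = 500, r = .10 is statistically
# significant at the .05 level, yet it explains only r**2 = 1% of the
# variance, i.e., the relationship has little practical value.
import math

r, n = 0.10, 500
t_stat = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)  # about 2.24
shared_variance = r ** 2                               # 0.01, i.e., 1%
```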
Significance Value
Conversely, the absence of statistically significant results does not mean that the results are unimportant, although, because of the difficulty in interpreting nonsignificant results, the case is more complex. Suppose we compared two alternative procedures for making a clinical assessment (e.g., body temperature). Suppose further that we retained the null hypothesis, that is, found no statistically significant differences between the two methods.
If a power analysis revealed an extremely low probability of a Type II error (e.g., power = .99, a 1% risk of a Type II error), we might be justified in concluding that the two procedures yield equally accurate assessments. If one of these procedures is more efficient or less painful than the other, nonsignificant findings could indeed be clinically important.
Generalizability of the Results
Researchers should also assess the generalizability of their results. Researchers are rarely interested in discovering relationships among variables for a specific group of people at a specific point in time. The aim of research is typically to reveal relationships for broad groups of people. If a new nursing intervention is found to be successful, others will want to adopt it.
Therefore, an important interpretive question is whether the intervention will “work” or whether the relationships will “hold” in other settings, with other people. Part of the interpretive process involves asking the question, “To what groups, environments, and conditions can the results of the study reasonably be applied?”
Implications of the Results
Once researchers have drawn conclusions about the credibility, meaning, importance, and generalizability of the results, they are in a good position to make recommendations for using and building on the study findings. They should consider the implications with respect to future research, theory development, and nursing practice. Study results are often used as a springboard for additional research, and researchers themselves often can readily recommend “next steps.” Armed with an understanding of the study’s limitations and strengths, researchers can pave the way for new studies that would avoid known pitfalls or capitalize on known strengths.
Moreover, researchers are in a good position to assess how a new study might move a topic area forward. Is a replication needed, and, if so, with what groups? If observed relationships are significant, what do we need to know next for the information to be maximally useful? For studies based on a theoretical or conceptual model, researchers should also consider the study’s theoretical implications.
Research results should be used to document support for the theory, suggest ways in which the theory ought to be modified, or discredit the theory as a useful approach for studying the topic under investigation. Finally, researchers should carefully consider the implications of the findings for nursing practice and nursing education. How do the results contribute to a base of evidence to improve nursing? Specific suggestions for implementing the results of the study in a real nursing context are extremely valuable in the utilization process.
Conclusion
- Researchers who collect quantitative data typically progress through a series of steps in the analysis and interpretation of their data. The careful researcher lays out a data analysis plan in advance to guide that progress.
- Quantitative data must be converted to a form amenable to computer analysis through coding, which typically transforms all research data into numbers. Special codes need to be developed to code missing values.
- Researchers typically document decisions about coding, variable naming, and variable location in a codebook.
- Data entry is an error-prone process that requires verification and cleaning. Cleaning involves
(1) a check for outliers (values that lie outside the normal range of values) and wild codes (codes that are not legitimate)
(2) consistency checks.
- An important early task in analyzing data involves taking steps to evaluate and address missing data problems. These steps include deleting cases with missing values (i.e., listwise deletion), deleting variables with missing values, substitution of mean values, estimation of missing values, and selective pairwise deletion of cases. Researchers strive to achieve a rectangular matrix of data (valid information on all variables for all cases), and these strategies help researchers to attain this goal.
- Raw data entered directly onto a computer file often need to be transformed for analysis. Examples of data transformations include reversing of the coding of items, combining individual variables to form composite scales, recoding the values of a variable, altering data for the purpose of meeting statistical assumptions, and creating dichotomous dummy variables for multivariate analyses.
Steps In Evaluation
- Before the main analyses can proceed, researchers usually undertake additional steps to assess data quality and to maximize the value of the data. These steps include evaluating the reliability of measures, examining the distribution of values on key variables for any anomalies, and analyzing the magnitude and direction of any biases.
- Sometimes peripheral analyses involve tests to determine whether pooling of subjects is warranted, tests for cohort effects or ordering effects, and manipulation checks.
- Once the data are fully prepared for substantive analysis, researchers should develop a formal analysis plan to reduce the temptation to go on a “fishing expedition.” One approach is to develop table shells, that is, fully laid-out tables without any numbers in them.
The interpretation of research findings typically involves five subtasks:
(1) analyzing the credibility of the results
(2) searching for underlying meaning
(3) considering the importance of the results
(4) analyzing the generalizability of the findings
(5) assessing the implications of the study regarding future research, theory development, and nursing practice.