Quantitative Data Analysis and Strategies. Designing and Implementing a Quantitative Analysis Strategy, Phases in the Analysis of Quantitative Data, Pre-Analysis Phase, Coding Quantitative Data, Pre-Coded Data, Code Uncategorized Data, Code Missing Values, Entering, Verifying, and Cleaning Data
The successful analysis of quantitative data requires careful planning and attention to detail. This chapter provides an overview of steps that are normally taken in designing and implementing a data analysis plan. The final phase of data analysis, interpreting the results, also is discussed.
Phases in the Analysis of Quantitative Data
The data analysis process varies from one project to another. With small, simple sets of data, researchers may be able to proceed quickly from data collection to data analysis. In most cases, however, intermediate steps are necessary. Progress in analyzing quantitative data is not always as linear as this figure suggests, but the figure provides a framework for discussing various steps in the analytical process.
Pre-Analysis Phase
The first set of steps, the pre-analysis phase, involves various clerical and administrative tasks. These might include logging data in and maintaining appropriate administrative records, reviewing data forms for completeness and legibility, taking steps to retrieve pieces of missing information, and assigning identification (ID) numbers.
Another task involves selecting a statistical software package for doing the data analyses. Once these tasks have been performed, researchers typically must code the data and enter them onto computer files to create a data set (the total collection of data for all sample members) for analysis.
Coding Quantitative Data
Computers usually cannot process data in the form they are collected. Coding is the process of transforming data into symbols compatible with computer analysis.
Coding Inherently Quantitative Variables
Certain variables are inherently quantitative (e.g., age, body temperature) and do not normally require coding. Researchers may, however, ask for information of this type in a way that does call for some coding.
If a researcher asks respondents to indicate whether they are younger than 30 years of age, between the ages of 30 and 49 years, or 50 years or older, then responses would have to be coded. When responses to questions such as age, height, and so forth are obtained in their full form, the information should not be reduced to coded categories for data entry purposes; this can be accomplished later, if desired. Even with “naturally” quantitative data, researchers should inspect and edit their data.
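The point about keeping full-form values and categorizing later can be sketched in Python. This is a minimal illustration, not a prescribed procedure: the function name `code_age_group` and the numeric codes 1, 2, and 3 are assumptions chosen for the example; only the three age brackets come from the text.

```python
def code_age_group(age_years):
    """Return a category code for an age recorded in whole years.

    Hypothetical codes: 1 = younger than 30, 2 = 30 to 49, 3 = 50 or older.
    """
    if age_years < 0:
        raise ValueError("age cannot be negative")
    if age_years < 30:
        return 1
    if age_years < 50:
        return 2
    return 3


# Ages are entered in full form; the categories are derived afterward.
ages = [24, 30, 49, 50, 71]
age_groups = [code_age_group(a) for a in ages]
print(age_groups)  # [1, 2, 2, 3, 3]
```

Because the exact ages remain in the data set, the brackets can be redrawn later without returning to the original forms.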
All responses should be of the same form and precision. For example, in entering a person’s height onto a computer file, researchers would need to decide whether to use feet and inches or to convert the information entirely to inches. Whichever method is adopted, it must be used consistently for all subjects. There must also be consistency in the method of handling information reported with different degrees of precision (e.g., coding a response such as 5 feet 2½ inches).
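One way to enforce a single form and precision is to convert every height to inches with a fixed rounding rule. The sketch below assumes the researcher has chosen inches rounded to one decimal place; that choice, and the name `height_to_inches`, are illustrative assumptions rather than a rule from the text.

```python
def height_to_inches(feet, inches=0.0):
    """Convert a height in feet and inches to inches, rounded to one decimal."""
    if feet < 0 or inches < 0:
        raise ValueError("height components cannot be negative")
    total = float(feet) * 12 + float(inches)
    return round(total, 1)


# Each tuple is (feet, inches) as reported by a respondent.
reported = [(5, 2.5), (6, 0), (5, 11)]
heights_in = [height_to_inches(ft, inch) for ft, inch in reported]
print(heights_in)  # [62.5, 72.0, 71.0]
```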
Pre-Coded Data
Most data from structured instruments can be pre-coded (i.e., codes assigned even before data are collected). For example, closed-ended questions with fixed response alternatives can be preassigned a numeric code, as in the following:
From what type of program did you receive your basic nursing preparation?
1. Diploma school
2. Associate degree program
3. Baccalaureate degree program
Nurses who received their nursing preparation from a diploma school would be coded 1 for this variable, and so on. Codes are often arbitrary, as in the case of a variable such as gender. Whether a female subject is coded 1 or 2 has no bearing on subsequent analyses as long as female subjects are consistently assigned one code and male subjects another. Other variables, such as ordinal-level variables, have a less arbitrary coding scheme, as in the following example:
How often do you take a nap?
1. Almost never
2. Once or twice a year
3. Three to 11 times a year
4. Once a month or more often
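A pre-coded item amounts to a fixed lookup from response text to numeric code. The following sketch stores the nursing-preparation codes from the example in a dictionary; the names `PROGRAM_CODES` and `code_response` are hypothetical, and rejecting unlisted responses is one possible design choice, not the only one.

```python
# Codebook for the nursing-preparation item, using the codes given in the text.
PROGRAM_CODES = {
    "Diploma school": 1,
    "Associate degree program": 2,
    "Baccalaureate degree program": 3,
}


def code_response(response, codebook):
    """Look up the numeric code for a response; reject anything not pre-coded."""
    key = response.strip()
    if key not in codebook:
        raise ValueError(f"response not in codebook: {response!r}")
    return codebook[key]


answers = ["Diploma school", "Baccalaureate degree program", "Diploma school"]
program_codes = [code_response(a, PROGRAM_CODES) for a in answers]
print(program_codes)  # [1, 3, 1]
```

Keeping the codebook in one place makes it easier to apply the same code to the same response for every sample member.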
Respondents sometimes can check off more than one response to a question, as in the following:
To which of the following journals do you subscribe?
Applied Nursing Research
Canadian Journal of Nursing Research
Clinical Nursing Research
Nursing Research
Qualitative Health Research
Research in Nursing & Health
Western Journal of Nursing Research
With questions of this type, it is not appropriate to adopt a 1-2-3-4-5-6-7 code because respondents may check several, or none, of the responses.
The correct procedure for such questions is to treat each journal separately. In other words, researchers would code responses as though the item were seven separate questions, that is, “Do you subscribe to Applied Nursing Research? Do you subscribe to the Canadian Journal of Nursing Research?” and so on. A check mark beside a journal would be treated as a “yes.” The question would yield seven dichotomous variables, with one code (e.g., 1) signifying “yes” and another code (e.g., 2) signifying “no.”
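The seven-variable treatment can be sketched as follows. The codes 1 for “yes” and 2 for “no” follow the example in the text; the function name `code_multiple_response` and the use of a dictionary keyed by journal title are assumptions made for illustration.

```python
JOURNALS = [
    "Applied Nursing Research",
    "Canadian Journal of Nursing Research",
    "Clinical Nursing Research",
    "Nursing Research",
    "Qualitative Health Research",
    "Research in Nursing & Health",
    "Western Journal of Nursing Research",
]
YES, NO = 1, 2


def code_multiple_response(checked, options=JOURNALS):
    """Turn one check-all-that-apply answer into one yes/no code per option."""
    unknown = set(checked) - set(options)
    if unknown:
        raise ValueError(f"unlisted options checked: {sorted(unknown)}")
    return {option: (YES if option in checked else NO) for option in options}


# A respondent who checked two journals; a respondent who checked none.
row = code_multiple_response({"Nursing Research", "Clinical Nursing Research"})
none_checked = code_multiple_response(set())
print(row["Nursing Research"], row["Applied Nursing Research"])  # 1 2
```

Every respondent receives a value on all seven variables, including those who checked nothing.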
Code Uncategorized Data
Qualitative data from open-ended questions, unstructured observations, and other narrative forms must be coded if they are going to be used in quantitative analysis. Sometimes researchers can develop codes for such variables ahead of time. For instance, a question might ask, “What is your occupation?” In this case, it might be possible to predict major job categories (e.g., professional, managerial, clerical).
Usually, however, unstructured data are collected specifically because it is difficult to anticipate the kind of information that will be obtained. In such situations, codes are developed after the data are collected. Researchers typically begin by reviewing a sizable portion of the data to get a feel for the content, and then develop a category scheme. The scheme should reflect both theoretical and analytical goals as well as the substance of the information.
The amount of detail in the category scheme may vary, but too much detail is usually better than too little detail. In developing such a coding scheme, the only rule is that the categories should be both mutually exclusive and collectively exhaustive. Precise coding instructions should be developed and documented in a coding manual. Coders, like observers and interviewers, must be properly trained. Intercoder reliability checks are strongly recommended.
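An intercoder reliability check can be as simple as comparing the codes two coders assign to the same responses. The text does not name a statistic; the sketch below uses percent agreement and Cohen's kappa, a common chance-corrected agreement measure, as one possible choice. The function names and the sample codes are invented for illustration.

```python
from collections import Counter


def percent_agreement(codes_a, codes_b):
    """Proportion of responses to which both coders assigned the same code."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of responses")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


def cohens_kappa(codes_a, codes_b):
    """Agreement corrected for the agreement expected by chance alone."""
    n = len(codes_a)
    observed = percent_agreement(codes_a, codes_b)
    freq_a = Counter(codes_a)
    freq_b = Counter(codes_b)
    # Chance agreement: product of the two coders' marginal proportions per category.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used a single identical category throughout
    return (observed - expected) / (1.0 - expected)


# Hypothetical category codes assigned by two coders to the same ten responses.
coder_a = [1, 1, 2, 2, 3, 3, 1, 2, 3, 1]
coder_b = [1, 1, 2, 3, 3, 3, 1, 2, 2, 1]
agreement = percent_agreement(coder_a, coder_b)
kappa = cohens_kappa(coder_a, coder_b)
print(agreement, round(kappa, 2))
```

Low agreement usually points to ambiguous category definitions or coding instructions, which can then be revised in the coding manual.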
Code Missing Values
A code should be assigned to each variable for every sample member, even if no response is available. Missing values can be of various types. A person responding to an interview question may be undecided, refuse to answer, or say, “Don’t know.” When skip patterns are used, there is missing information for those questions that are irrelevant to some sample members.
In observational studies, an observer may get distracted during a 10-second sampling frame, may be undecided about an appropriate categorization, or may observe behavior not listed on the observation schedule. It is sometimes important to distinguish between various types of missing data with different codes (e.g., distinguishing refusals and “don’t knows”). In other cases, a single missing values code may suffice. This decision must be made with the conceptual aims of the research in mind.
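When different types of missing data matter, each type can receive its own code and be tallied separately. In this sketch the codes 7, 8, and 9 and their labels are hypothetical; a real study would define them in its coding manual.

```python
from collections import Counter

# Hypothetical missing-data codes, distinct from the substantive codes 1 to 4.
MISSING_LABELS = {
    7: "refused",
    8: "don't know",
    9: "not applicable (skipped)",
}


def tabulate_missing(values, missing_labels=MISSING_LABELS):
    """Count how often each type of missing-data code appears for a variable."""
    counts = Counter(v for v in values if v in missing_labels)
    return {label: counts.get(code, 0) for code, label in missing_labels.items()}


responses = [1, 2, 8, 7, 3, 9, 8, 1]
summary = tabulate_missing(responses)
print(summary)
```

If the distinction turns out not to matter for the analysis, the separate codes can still be collapsed into one afterward; the reverse is not possible.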
Researchers often strive to code missing data in a similar fashion for all or most variables. If a nonresponse is coded as a 4 on variable 1, a 6 on variable 2, a 5 on variable 3, and so forth, there is a greater risk of error than if a uniform code is adopted. The choice of what code to use for missing data is fairly arbitrary, but numeric codes must be ones that have not been used for actual pieces of information.
Many researchers follow the convention of coding missing data as 9 because this value is out of the range of real codes for most variables. Others use blanks, periods, or negative values to indicate missing information.
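The requirement that a missing-data code must not collide with a real code can be checked mechanically. The sketch below assumes the convention of 9 for missing data and a variable whose valid codes are 1 through 4; the names `check_missing_code` and `split_missing` are illustrative.

```python
MISSING = 9  # uniform missing-data code, following the convention described above


def check_missing_code(valid_codes, missing_code=MISSING):
    """Raise an error if the missing-data code is also a real response code."""
    if missing_code in valid_codes:
        raise ValueError(f"code {missing_code} is already used for real data")


def split_missing(values, missing_code=MISSING):
    """Separate observed values from missing ones before analysis."""
    observed = [v for v in values if v != missing_code]
    n_missing = len(values) - len(observed)
    return observed, n_missing


nap_valid_codes = {1, 2, 3, 4}
check_missing_code(nap_valid_codes)  # passes: 9 is outside the valid range

nap_responses = [1, 4, 9, 2, 9, 3]
observed, n_missing = split_missing(nap_responses)
print(observed, n_missing)  # [1, 4, 2, 3] 2
```

For a variable such as age in years, where 9 is a legitimate value, the check would fail and a different code (for example, a negative value) would be needed.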
Entering, Verifying, and Cleaning Data
Coded data must be entered onto a computer file for analysis, and then verified and cleaned. This section provides an overview of these procedures, but technological advances make it inevitable that the information we provide will need to be updated.