Issues Related To Testing, Grading and Other Evaluation Concepts
Social, Ethical and Legal Issues
Educational testing and assessment have grown in use and importance for students in general and nursing students in particular over the last decade. One only has to read the newspapers and watch television to appreciate the prevalence of testing and assessment in contemporary American society.
With policies such as No Child Left Behind, mandatory high school graduation tests in some states, and the emphasis on standardized achievement tests in many schools, testing and assessment have taken a prominent role in the educational system. From the moment of birth, when we are weighed, measured, and rated according to the Apgar scale, throughout all of our educational and work experiences, and even in our personal and social lives, we are used to being tested and evaluated.
In addition, nursing and other professional disciplines have come under increasing public pressure to be accountable for the quality of educational programs and the competency of their practitioners; thus testing and assessment often are used to provide evidence of quality and competence. With the increasing use of assessment and testing come intensified interest and concern about fairness, appropriateness, and impact. This topic discusses selected social, ethical, and legal issues related to testing and assessment practices in nursing education.
Social Issues
Testing has tremendous social impact because test scores can have positive and negative consequences for individuals. Tests can provide information to assist in decision making; some of these decisions have more importance to society and to individuals than other decisions. The licensing of drivers is a good example. Written and performance tests provide information for deciding who may drive a vehicle.
Society has a vested interest in the outcome because a bad decision can affect the safety of a great many people. Licensure to drive a vehicle also may be an important issue to an individual; some jobs require the employee to drive a car or truck, so a person who lacks a valid operator’s license will not have access to these employment opportunities. Tests also are used to help place individuals into occupational roles.
These placement decisions have important implications because a person’s occupation to some extent determines status and economic and political power. Because modern society depends heavily on scientific knowledge and technical competence, occupational role selection is based to a significant degree on what individuals know and can do. Therefore, by controlling who enters certain educational programs, institutions have a role in determining the possible career path of an individual.
The way in which schools should select candidates for occupational roles is a matter of controversy, however. Some individuals and groups hold the view that schools should provide equal opportunity and access to educational programs. Others believe that equal opportunity is not sufficient to allow some groups of people to overcome discrimination and oppression that have handicapped their ability and opportunity.
Decisions about which individuals should be admitted to a nursing education program are important because of the nursing profession’s commitment to the good of society and to the health and welfare of current and future patients (American Nurses Association, 2003). Nursing faculties must select individuals for admission to nursing programs who are likely to practice nursing competently and safely; tests frequently are used to assist educators in selecting candidates for admission.
Improper use of testing or the misinterpretation of test scores can result in two types of poor admission decisions. If an individual is selected who is later found to be incompetent to practice nursing safely, the public might be at risk; if an individual who would be competent to practice nursing is not admitted, that individual is denied access to an occupational role. The use of testing in employment situations and for the purpose of professional certification can produce similar results.
Employers have a stake in making these decisions because they are responsible for ensuring the competence of their employees. Tests for employment, to ensure competencies at the end of orientation, and to certify continuing knowledge and skills are important not only to the employee but also to the employer. Through assessments such as these, the employer certifies that the individual is competent for the role.
Selection decisions therefore have social implications for individuals, institutions, and society as a whole. Although educational and occupational uses of testing are growing in frequency and importance, the public often expresses concerns about testing. Some of these concerns are rational and relevant; others are unjustified.
Test Bias
One common concern is that tests are biased or unfair to certain groups of test-takers. A major purpose of testing is to discriminate among people, that is, to identify important differences among them with regard to their knowledge, skills, or attitudes. To the extent that differences in scores represent real differences in achievement of objectives, this discrimination is not necessarily unfair.
Bias can occur, however, when scores from an assessment are misinterpreted, or conclusions are drawn about performance that go well beyond the assessment. For example, if a test is found to discriminate between men and women on variables that are not relevant to educational or occupational success, it would be unfair to use that test to select applicants for admission to a program or for a job.
Thus, the question of test bias really is one of measurement validity, the degree to which inferences about test results are justifiable in relation to the purpose and intended use of the test (Miller, Linn, & Gronlund, 2009; Nitko & Brookhart, 2007). Test bias also has been defined as the differential validity of a test score for a group of test-takers. With test bias, a given score does not have the same meaning for all students who took that test.
The teacher may interpret a low test score to mean inadequate knowledge of the content, but there may be a relevant subgroup of individuals, for example, students with learning disabilities, for whom that score interpretation is not accurate. The test score may be low for a student with a learning disability because he or she did not have enough time to complete the exam, not because of a lack of knowledge about the content.
Individual test items also can discriminate against subgroups of test-takers, such as students from ethnic minority groups; this is termed differential item functioning (Wessling, 2003). Test items are considered to function differentially when students of different subgroups but of equal ability, as evidenced by equal total test scores, perform differently on the item.
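The definition above can be turned into a simple descriptive check. The following Python sketch is only an illustration, not a method taken from the cited sources: it groups test-takers by total test score and, within each score level, compares the proportion of each subgroup that answered a given item correctly. The record layout (the keys group, total, and item_correct) and the function name are assumptions made for this example. Formal analyses of differential item functioning rely on statistical procedures such as the Mantel-Haenszel method, which this sketch does not implement.

```python
from collections import defaultdict


def proportion_correct_by_stratum(records, item):
    """Compare subgroups on one item among test-takers with equal total scores.

    `records` is a list of dicts with hypothetical keys:
      "group"        - subgroup label for the test-taker
      "total"        - total test score (used as the measure of ability)
      "item_correct" - dict mapping item id to 1 (correct) or 0 (incorrect)
    Returns {total_score: {group: proportion answering `item` correctly}},
    keeping only score levels where at least two subgroups are present.
    """
    # counts[total][group] holds [number correct, number of test-takers]
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for rec in records:
        cell = counts[rec["total"]][rec["group"]]
        cell[0] += rec["item_correct"][item]
        cell[1] += 1

    report = {}
    for total in sorted(counts):
        groups = counts[total]
        if len(groups) < 2:
            continue  # nothing to compare at this score level
        report[total] = {g: correct / n for g, (correct, n) in groups.items()}
    return report
```

A large, consistent gap between subgroups at the same total score would suggest that the item deserves review for bias; with small classes, such gaps can easily arise by chance, so the output is a prompt for peer review of the item rather than evidence of bias.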
Item bias exists in two forms, cultural bias and linguistic/structural bias (Boscher, 2003). A culturally biased item contains references to a particular culture and is more likely to be answered incorrectly by students from a minority group. An example of a culturally biased test item follows:
1. While discussing her health patterns with the nurse, a patient says that she enjoys all of the following leisure activities. Which one is an aerobic activity?
a. Attending ballet performances
b. Cultivating house plants
c. Line dancing
d. Singing in the church choir
The correct answer is “line dancing,” but students for whom English is a second language (ESL), students from cultural minority groups, and even domestic students from certain regions of the country may be unfamiliar with this term and therefore may not select this response. In this case, an incorrect response may mean that the student is unfamiliar with this type of dancing, not that the student is unable to differentiate between aerobic and nonaerobic activities.
As discussed, cultural bias of this type contributes to construct-irrelevant variance that can reduce measurement validity (Boscher & Bowles, 2008; Miller et al., 2009). The Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & National Council on Measurement in Education, 1999) specify that test developers should “reduce threats to the reliability and validity of test score inferences that may arise from language differences” (p. 97).
Careful peer review of test items for discernible bias allows the teacher to reword items to remove references to American or English literature, music, art, history, customs, or regional terminology that are not essential to the nursing content being tested. The inclusion of jokes, puns, and other forms of humor also may contribute to cultural bias because these forms of expression may not be interpreted correctly by ESL students.
It is appropriate, however, to include content related to cultural differences that are essential to safe nursing practice. Students and graduate nurses must be culturally competent if they are to meet the needs of patients from a variety of cultures. A test item with linguistic/structural bias is poorly written. It may be lengthy, unclear, or awkwardly worded, interfering with the student’s understanding of the teacher’s intent (Boscher, 2003).
Structurally biased items create problems for all students, but they are more likely to discriminate against ESL students or those with learning disabilities. Additionally, students from minority cultures may be less likely than dominant-culture students to ask the test proctor to clarify a poorly written item, usually because it is inappropriate to question a teacher in certain cultures.
Following the general rules for writing test items in this topic will help the teacher to avoid structural bias. An assessment practice that helps to protect students from potential bias is anonymous or blinded scoring and grading. The importance of scoring essay items and written assignments anonymously was discussed earlier in the topic. Anonymous grading also can be used for an entire course.
The process is similar to that of peer review of manuscripts and grant proposals: the teacher is unaware of the student’s identity until the end of the course. Students choose a number or are randomly assigned an anonymous grading system number at the beginning of a course. That number is recorded on every test, quiz, written assignment, and other assessments during the semester, and scores are recorded according to these code numbers.
The teacher does not know the identity of the students until the end of the course. This method of grading prevents the influence of a teacher’s previous impressions of a student on the scoring of a test or written assignment.
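The code-number procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any particular gradebook system: the function assign_codes, the class AnonymousGradebook, and the four-digit code range are all hypothetical names and choices made for this example. In practice, the key linking names to codes would be held by someone other than the teacher until the end of the course.

```python
import random


def assign_codes(student_names, seed=None):
    """Randomly assign each student a unique four-digit grading code.

    Returns a dict mapping student name to code. The four-digit range
    is an arbitrary choice for this sketch.
    """
    rng = random.Random(seed)
    codes = rng.sample(range(1000, 10000), len(student_names))
    return dict(zip(student_names, codes))


class AnonymousGradebook:
    """Records scores by code number only; names are attached at course end."""

    def __init__(self, code_key):
        # code_key maps name -> code and is not consulted while grading.
        self._code_key = code_key
        self.scores = {code: {} for code in code_key.values()}

    def record(self, code, assessment, score):
        """Store a score under the code written on the test or assignment."""
        if code not in self.scores:
            raise KeyError(f"unknown grading code: {code}")
        self.scores[code][assessment] = score

    def reveal(self):
        """At the end of the course, link each student's name to the scores."""
        return {name: self.scores[code] for name, code in self._code_key.items()}
```

For example, a teacher would call record with the code number found on each paper throughout the semester and call reveal only once, when final grades are assigned.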
Grade and Test Score Inflation
Another common criticism of testing concerns the general trend toward inflation of test scores and grades at all educational levels. Scanlan and Care (2004, 2008) found that grade inflation occurred throughout their university but more so in their nursing program, and that inflated clinical practice grades give students an unrealistic perspective of their ability to practice nursing safely.
Grade inflation distorts the meaning of test scores, making it difficult for teachers to use them wisely in decision making. If an A is intended to represent exceptional or superior performance, then all students cannot earn A’s because if everyone is exceptional, then no one is.
With grade inflation, all grades are compressed near the top, which makes it difficult to discriminate among students (Scanlan & Care; Walsh & Seldomridge, 2005). When there is no distribution of scores or grades, there is little value in testing. Most faculty members believe that grade inflation exists, but that their own assessment methods do not contribute to it (Scanlan & Care, 2008). Issues common to the problem of grade inflation include:
■ students’ expectations related to the belief that they are consumers of the educational program;
■ institutional policies related to late course withdrawal dates and mandatory faculty evaluation;
■ increase in number of older students who bring more life experiences to the nursing education program and approach learning activities with more focus;
■ faculty beliefs about the effect of grading on student self-esteem, what constitutes satisfactory performance, and the subjective nature of grading;
■ clinical grading issues; and
■ the increasing use of part-time faculty members in nursing education programs (Scanlan & Care, 2008).
The relationship between the last two factors is especially relevant in nursing education. Most part-time faculty members teach in the clinical area, and many are skilled clinicians with little or no formal academic preparation for the role of educator. Nursing faculty members are reluctant to assign failing grades in clinical courses, giving students the benefit of the doubt, especially in beginning courses.
This belief is easily communicated to part-time faculty members, who may have additional concerns about their job security because most of them are hired on limited-term contracts. Where student evaluation of faculty members is mandatory, part-time teachers may be unwilling to assign lower clinical grades because of possible repercussions related to continued employment in that role (Scanlan & Care, 2008).
Additionally, grading discrepancies between theory and related clinical courses occur frequently. Scanlan and Care (2004) found a wide discrepancy between grades awarded in theory courses and grades in clinical courses. Especially in nursing education programs where clinical practice is assigned a letter grade (instead of a pass–fail or similar grading system), higher clinical grades tend to inflate the overall grade point average.
This discrepancy is difficult to explain or defend on the basis of the assumption that theory informs clinical practice; why would a student with a grade of C in a theory course be likely to earn an A grade in the corresponding clinical course? Clinical grade inflation of this sort may result in more students with marginal ability “slipping through the cracks” and failing the final clinical course of the nursing education program, or graduating only to fail the NCLEX (Scanlan & Care, 2008).
Clinical grading also may be governed by the “rule of C,” where the D grade is virtually eliminated as a grading option because of program policies that require a minimum grade of C to pass a clinical course. As previously mentioned, faculty members who are reluctant to assign failing grades to students then may award C grades to students with marginal performance, and the B grade becomes the symbol for average or acceptable performance.
This grade compression (only three grade levels instead of five) contributes to grade inflation (Walsh & Seldomridge, 2005). Another factor contributing to grade inflation is the increasing pressure of accountability for educational outcomes. When the effectiveness of a teacher’s instruction is judged on the basis of students’ test performance, the teacher may “teach to the test.”
Teaching to the test may involve using actual test items as practice exercises, distributing copies of a previously used test for review and then using the same test, or focusing exclusively on test content in teaching. Because regulatory and accreditation standards for nursing education programs commonly include expectations of an acceptable first-time NCLEX pass rate for graduates each year, and the quality of graduate nursing programs is judged by graduates’ pass rates on certification exams, these test results have significant implications for the educational institutions as well as the individual test-takers.
When faculty members and educational programs are judged by how well their graduates perform on these high-stakes assessments, “direct preparation for the tests and assessments is likely to enter into classroom activities and thereby distort the curriculum” (Miller et al., 2009, p. 14).
It is important, however, to distinguish between teaching to the test and purposeful teaching of content to be sampled by the test and the practice of relevant test-taking skills. Nursing faculty members who understand the NCLEX test plan and ensure that their nursing curricula include content and learning activities that will enable students to be successful on the NCLEX are not teaching to the test.
Effect of Tests and Grades on Self-Esteem
Some critics of tests claim that testing results in emotional or psychological harm to students. The concern is that tests threaten students and make them anxious, fearful, and discouraged, resulting in harm to their self-esteem. There is no empirical evidence to support these claims. Feelings of anxiety about an upcoming test are both normal and helpful to the extent that they motivate students to prepare thoroughly so as to demonstrate their best performance.
Because testing is a common life event, learning how to cope with these challenges is a necessary part of student development. Nitko and Brookhart (2007) identified three types of test-anxious students:
(a) students who have poor study skills and become anxious prior to a test because they do not understand the content that will be tested
(b) students who have good study skills and understand the content but fear they will do poorly no matter how much they prepare for the exam
(c) students who believe that they have good study skills but in essence do not
If teachers can identify why students are anxious about testing, they can direct them to specific resources such as those on study skills, test-taking strategies, and techniques to reduce their stress. Most nursing students will benefit from developing good test-taking skills, particularly learners who are anxious. For example, students should be told to follow the directions, manage their time during the test, answer easy items first, and check their answers (Kubiszyn & Borich, 2003).
Arranging the test with the easy items first often helps relieve anxiety as students begin the test. Because highly anxious students are easily distracted (Nitko & Brookhart, 2007), the teacher should ensure quiet during the testing session. Goonan (2003) provided general guidelines for the teacher to intervene with students who have test anxiety:
- Identify the problem to be certain it is test anxiety and not a learning disability or a problem such as depression.
- Encourage more than the usual test preparation.
- Encourage the student to develop study skills (e.g., outlining material) and good study habits (e.g., how to organize the material to learn it and how to manage time).
- Guide the student to outside resources as needed.
- Suggest desensitization strategies such as taking timed practice tests and relaxation techniques.
Although it is probably true that a certain level of self-esteem is necessary before a student will attempt the challenges associated with nursing education, high self-esteem is not essential to perform well on a test. In fact, when students are able to perform at their best, their self-esteem is enhanced.
An important part of a teacher’s role is to prepare students to do well on tests by helping them improve their study and test-taking skills and to learn to manage their anxiety. Students should be encouraged to read item stems and questions carefully, without rushing, to avoid misreading critical information, and to read each option for multiple-choice items before choosing one.
Testing as a Means of Social Control
All societies sanction some form of social control of behavior; some teachers use the threat of tests and the implied threat of low test grades to control student behavior. In an attempt to motivate students to prepare for and attend class, a teacher may decide to give unannounced tests; the student who is absent that day will earn a score of zero, and the student who does not do the assigned readings will likely earn a low score.
This practice is unfair to students because they need sufficient time to prepare for a test to demonstrate their maximum performance. Using tests in a punitive, threatening, or vindictive way is unethical (Nitko & Brookhart, 2007).
Ethical Issues
Ethical standards make it possible for nurses and patients to achieve understanding of and respect for each other (Husted & Husted, 2007). These standards should also govern the relationships of teachers and students. Contemporary bioethical standards include those of autonomy, freedom, veracity, privacy, beneficence, nonmaleficence, and fidelity. Several of these standards are discussed here as they apply to common issues in testing and evaluation.
The standards of privacy, autonomy, and veracity relate to the ownership and security of tests and test results. Some of the questions that have been raised are: Who owns the test? Who owns the test results? Who has or should have access to the test results?
Should test-takers have access to standardized test items and their own responses? Ever since educational institutions and employers started using standardized tests to make decisions about admission and employment, the public has been concerned about the potential discriminatory use of test results.
The result of this public concern was the passage of federal and state “Truth in Testing” laws, requiring greater access to tests and test results. Some of these laws require publishers of standardized tests to supply copies of the test, the answer key, and the test-taker’s own responses on request, allowing the student to verify the accuracy of the test score.
Test-takers have the right to expect that certain information about them will be held in confidence. Teachers, therefore, have an obligation to maintain a privacy standard regarding students’ test scores. Such practices as public posting of test scores and grades should be examined in light of this privacy standard. Teachers should not post assessment results if individual students’ identities can be linked with their results; for this reason, many educational programs do not allow scores to be posted with student names or identification numbers.
During posttest discussions, teachers should not ask students to raise their hands to indicate if they answered an item correctly or incorrectly; this practice can be considered an invasion of students’ privacy (Nitko & Brookhart, 2007). An additional privacy concern relates to the practice of keeping student records that include test scores and other assessment results.
Questions often arise about who should have access to these files and the information they contain. Access to a student’s test scores and other assessment results is limited by laws such as the Family Educational Rights and Privacy Act of 1974 (FERPA). This federal law gives students certain rights with respect to their educational records. For example, they can review their education records maintained by the school and request that the school correct records they believe to be inaccurate or misleading.
Schools must have written permission from the student to release information from the student’s record except in selected situations such as accreditation or for program assessment purposes (U.S. Department of Education, n.d.). FERPA limits access to a student’s records to those who have legitimate rights to the information to meet the educational needs of the student.
This law also specifies that a student’s assessment results may not be transferred to another institution without written authorization from the student. In addition to these limits on access to student records, teachers should ensure that the information in the records is accurate and should correct errors when they are discovered. Files should be purged of anecdotal material when this information is no longer needed (Nitko & Brookhart, 2007).
Another way to violate students’ privacy is to share confidential information about their assessment results with other teachers. To a certain extent, a teacher should communicate information about a student’s strengths and weaknesses to other teachers to help them meet that student’s learning needs. In most cases, however, this information can be communicated through student records to which other teachers have legitimate access.
Informal conversations about students, especially if those conversations center on the teacher’s impressions and judgments rather than on verifiable data such as test scores, can be construed as gossip. Test results sometimes are used for research and program evaluation purposes. As long as students’ identities are not revealed, their scores usually can be used for these purposes (Nitko & Brookhart, 2007).
One way to ensure that this use of test results is ethical is to announce to the students when they enter an educational program that test results occasionally will be used to assess program effectiveness. Students may be asked for their informed consent for their scores to be used, or their consent may be implied by their voluntary participation in optional program evaluation activities.
For example, if a questionnaire about student satisfaction with the program is distributed or emailed to students, those who wish to participate simply complete the questionnaire and return it; no written consent form is required. In many institutions of higher education, however, this use of test results may require review by the Institutional Review Board. The ethical principle of fidelity requires faithfulness in relationships and matters of trust (Bosek & Savage, 2007; Husted & Husted, 2007).
In nursing education programs, adherence to this principle requires that faculty members act in the best interest of students. By virtue of their education, experience, and academic position, faculty members hold power over their students. They have the ability to influence students’ progress through the nursing education program and their ability to gain employment after graduation.
Violations of professional boundaries may occur and affect students’ ability to trust faculty members. Teachers who have personal relationships with students may be accused of awarding grades based on favoritism, or conversely, may be accused of using failing grades to retaliate against students who rebuff a sexual or emotional advance (Bosek & Savage).
Standards for Ethical Testing Practice
Several codes of ethical conduct in using tests and other assessments have been published by professional associations. These include the Code of Fair Testing Practices in Education (Joint Committee on Testing Practices, 2004) and the Code of Professional Responsibilities in Educational Measurement (National Council on Measurement in Education [NCME], 1995). These are reproduced in Appendices A and B.
The Standards for Educational and Psychological Testing (American Educational Research Association, American Psychological Association, & NCME, 1999) describe standards for test construction, administration, scoring, and reporting; supporting documentation for tests; fairness in testing; and a range of testing applications. The Standards also address testing individuals with disabilities and different linguistic backgrounds. Common elements of these codes and standards are:
■ Teachers are responsible for the quality of the tests they develop and for selecting tests that are appropriate for the intended use.
■ Test administration procedures must be fair to all students and protect their safety, health, and welfare.
■ Teachers are responsible for the accurate scoring of tests and reporting test results to students in a timely manner.
■ Students should receive prompt and meaningful feedback.
■ Test results should be interpreted and used in valid ways.
■ Teachers also must communicate test results accurately and anticipate the consequences of using results to minimize negative results to students (Nitko & Brookhart, 2007).
Legal Aspects Of Evaluation
It is beyond the scope of this topic to interpret laws that affect the use of tests and other assessments, and the authors are not qualified to give legal advice to teachers concerning their evaluation practices. However, it is appropriate to discuss a few legal issues to provide guidance to teachers in using tests. A number of issues have been raised in the courts by students claiming violations of their rights by testing programs.
These issues include race or gender discrimination, violation of due process, unfairness of particular tests, various psychometric aspects such as measurement validity and reliability, and accommodations for students with disabilities (Nitko & Brookhart, 2007).
Evaluation of Students With Disabilities
The Americans with Disabilities Act (ADA) of 1990 has influenced testing and evaluation practices in nursing education and employment settings. This law prohibits discrimination against qualified individuals with disabilities. A qualified individual with a disability is defined as a person with a physical or mental impairment that substantially limits major life activities.
Qualified individuals with disabilities meet the requirements for admission to and participation in a nursing program. Nursing education programs have a legal and an ethical obligation to accept and educate qualified individuals with disabilities (Carroll, 2004). It is up to the nursing education program to provide reasonable accommodations, additional services and aids as needed, and removal of barriers (Carroll).
This does not mean that institutions lower their standards to comply with the ADA. The ADA requires teachers to make reasonable accommodations for disabled students to assess them properly. Such accommodations may include oral testing, computer testing, modified answer format, extended time for exams, test readers or sign language interpreters, a private testing area, or the use of large type for printed tests (Nitko & Brookhart, 2007).
NCLEX policies permit test-takers with documented learning disabilities to have extended testing time as well as other reasonable accommodations, if approved by the board of nursing in the states in which they apply for initial licensure (National Council of State Boards of Nursing, 2008). This approval is usually granted only when the educational institution has verified the documentation of a disability and students’ use of accommodations during the nursing education program.
Because English language proficiency is required for competent nursing practice in the United States of America, persons who speak English as a second language are not considered to be qualified persons with disabilities. A number of concerns have been raised regarding the provision of reasonable testing accommodations for students with disabilities.
One issue is the validity of the test result interpretations if the test was administered under standard conditions for one group of students and under accommodating conditions for other students. The privacy rights of students with disabilities are another issue: Should the use of accommodating conditions be noted along with the student’s test score?
Such a notation would identify the student as disabled to anyone who had access to the record. There are no easy answers to such questions. In general, faculty members should be guided by accommodation policies developed by their institution and have any additional policies reviewed by legal counsel to ensure compliance with the ADA.
Conclusion
Educational testing and assessment are growing in use and importance for society in general and for nursing in particular. Nursing has come under increasing public pressure to be accountable for the quality of educational programs and the competency of its practitioners, and testing and assessment often are used to provide evidence of quality and competence. With the increasing use of assessment and testing come intensified interest in and concern about fairness, appropriateness, and impact.
The social impact of testing can have positive and negative consequences for individuals. Tests can provide information to assist in decision making, such as selecting individuals for admission to education programs or for employment. The way in which selection decisions are made can be a matter of controversy, however, regarding equality of opportunity and access to educational programs and jobs.
The public often expresses concerns about testing. Common criticisms of tests include: tests are biased or unfair to some groups of test-takers; test scores have little meaning because of grade inflation; testing causes emotional or psychological harm to students; and tests are sometimes used in a punitive, threatening, or vindictive way. By understanding and applying codes for the responsible and ethical use of tests, teachers can ensure the proper use of assessment procedures and the valid interpretation of test results.
Teachers must be responsible for the quality of the tests they develop and for selecting tests that are appropriate for the intended use. The Americans with Disabilities Act of 1990 has implications for the proper assessment of students with physical and mental disabilities. This law requires educational programs to make reasonable testing accommodations for qualified individuals with learning as well as physical disabilities.