Test Planning and General Rules for Writing Test Items
Writing the Test Items
After developing the test blueprint, the teacher should begin to write the test items that correspond to each cell. Regardless of the selected item formats, the teacher should consider some general factors that contribute to the quality of the test items.
General Rules for Writing Test Items
1. Every item should measure something important
If a test blueprint is designed and used as described in the previous section, each test item will measure an important objective or content area. Without a blueprint, teachers often write test items that test trivial or obscure knowledge. Sometimes the teacher’s intent is to determine whether the students have read assigned materials; however, if the content is not important information, it wastes the teacher’s time to write the item and wastes the students’ time to read and respond to it.
Similarly, it is not necessary to write “filler” items to meet a targeted number; a test with 98 well-written items that measure important objectives will work as well as, or better than, one that contains those 98 good items plus 2 meaningless ones. Although the reliability of test results is related to the length of the assessment, this rule presumes that items added to a test to increase its length would be of the same quality as those that are already part of the test.
Adding items that are so easy that every student will answer them correctly, or so difficult that every student will answer them incorrectly, will not improve the reliability estimate (Miller et al., 2009). In fact, students who know the content might well regard a test item that measures trivial knowledge with annoyance or even suspicion, believing that it is meant to trick them into answering incorrectly. There is no reason to set an absolute target of 100 points on a test other than the ease of mentally calculating a percentage score.
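The relationship between test length and reliability can be made concrete with the Spearman–Brown prophecy formula, a standard psychometric result (not given in the sources cited here): if a test with reliability estimate \(r\) is lengthened by a factor \(k\) using items of quality comparable to the originals, the predicted reliability is

\[
r_k = \frac{k\,r}{1 + (k - 1)\,r}
\]

For example, lengthening a test with reliability .80 by one quarter (\(k = 1.25\)) yields a predicted reliability of \((1.25)(.80)\,/\,(1 + .25 \times .80) \approx .83\), but only under the assumption of comparable item quality noted above. An item that every student answers correctly (difficulty \(p = 1\)) or that every student answers incorrectly (\(p = 0\)) has item variance \(p(1 - p) = 0\) and therefore adds nothing to internal-consistency estimates such as KR-20, which is why such items cannot improve the reliability estimate.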
2. Every item should have a correct answer
The correct answer should be one that would be agreed on by experts (Miller et al., 2009). This may seem obvious, but the rule is frequently violated because of the teacher’s failure to make a distinction between fact and belief. In some cases, the correct or best answer to a test item might be a matter of opinion, and unless a particular authority is cited in the item, students might justifiably argue a different response than the one the teacher expected.
For example, one answer to the question, “When does life begin?” might be “When the kids leave home and the dog dies.” If the intent of the question was to measure understanding of when a fetus becomes viable, this is not the correct answer, although if that was the teacher’s intent, the question was poorly worded.
There are a variety of opinions and beliefs about the concept of viability; a better way to word this question is, “According to the standards of the American College of Obstetricians and Gynecologists, at what gestational age does a fetus become viable?” If a test item asks the student to state an opinion about an issue and to support that position with evidence, that is a different matter.
That type of item should not be scored as correct or incorrect, but with variable credit based on the completeness of the response, the rationale given for the position taken, or the soundness of the student’s reasoning (Nitko & Brookhart, 2007).
3. Use simple, clear, concise, precise, grammatically correct language
Students who read the test item need to know exactly what task is required of them. Wording a test item clearly is often difficult because of the inherent abstractness and imprecision of language, and it is a challenge to use simple words and sentence structure when writing about highly technical and complex material.
The teacher should include enough detail in the test item to communicate its intent, but no extraneous words or complex syntax that serve only to increase reading time. Additionally, grammatical errors may provide unintentional clues to the correct response for the test-wise but unprepared student and, at best, annoy the well-prepared student.
This rule is particularly important when testing students who are non-native speakers (NNSs) of English. Bosher and Bowles (2008) found that in a majority of cases, linguistic modification of test items improved NNSs’ comprehension of nursing exam items. The process of linguistic modification or simplification maintains key content-area vocabulary but reduces the semantic and syntactic complexity of written English.
Linguistic structures such as passive voice constructions, long question phrases, conditional and subordinate clauses, negation, and grammatical errors are particularly difficult for NNSs to understand, and they require more time to read and process (Bosher & Bowles, 2008).
Although arguments might be made that no accommodation is made for NNSs on the NCLEX, consideration of measurement validity must take into account that any test that employs language is at least partially a measure of language skills (American Educational Research Association, 1999; Miller et al., 2009).
The following item stem, adapted from an example given by Bosher and Bowles (2008), illustrates the effect of linguistic simplification:

Original stem: A patient with chronic pain treated over a period of months with an oral form of morphine tells you that she is concerned because she has had to gradually increase the amount of medication she takes to achieve pain control. Your response should include:

Linguistically simplified stem: A patient has chronic pain. She is treated over a period of months with an oral form of morphine. She tells the nurse that she is concerned because she has gradually needed more medication to achieve the same level of pain control. How should the nurse respond? (Bosher & Bowles, 2008, p. 168)
Note that the same content is emphasized, but that the revised example contains four short simple sentences and ends with a question to be answered rather than a completion format. Given growing concerns that even native English speakers are entering postsecondary programs with poorer reading skills, such linguistic modification should benefit all students.
4. Avoid using jargon, slang, or unnecessary abbreviations
Health care professionals frequently use jargon, abbreviations, and acronyms in their practice environment; in some ways, it allows them to communicate more quickly, if not more effectively, with others who understand the same language. Informal language in a test item, however, may fail to communicate the intent of the item accurately. Because most students are somewhat anxious when taking tests, they may fail to interpret an abbreviation correctly for the context in which it is used.
For example, does MI mean myocardial infarction, mitral insufficiency, or Michigan? Of course, if the intent of the test item is to measure students’ ability to define commonly used abbreviations, it would be appropriate to use the abbreviation in the item and ask for the definition, or give the definition and ask the student to supply the abbreviation.
Slang almost always conveys the impression that the item-writer does not take the job seriously. As noted previously, slang, jargon, abbreviations, and acronyms contribute to linguistic complexity, especially for NNSs. Additionally, growing alarm about health care errors attributed to poor communication, including the overuse of abbreviations, suggests that nurse educators should set a positive example for their students by using only abbreviations generally approved for use in clinical settings.
5. Try to use positive wording
It is difficult to explain this rule without using negative wording, but in general, avoid including words like no, not, and except in the test item. As noted previously, negation contributes to linguistic complexity that interferes with the test performance of NNSs.
The use of negative wording is especially confusing in true–false items. If using a negative form is unavoidable, underline the negative word or phrase, or use bold text and all uppercase letters to draw the student’s attention to it. It is best to avoid asking students to identify the incorrect response, as in the following example:
1. Which of the following is NOT an indication that a skin lesion is a Stage IV pressure ulcer?
- Blistering*
- Sinus tracts
- Tissue necrosis
- Undermining
The structure of this item reinforces the wrong answer and may lead to confusion when a student attempts to recall the correct information at a later time. A better way to word the item is:
2. Which of the following is an indication that a skin lesion is a Stage II pressure ulcer?
- Blistering*
- Sinus tracts
- Tissue necrosis
- Undermining
6. No item should contain irrelevant clues to the correct answer
This is a common error among inexperienced test-item writers. Students who are good test-takers can usually identify such an item and use its flaws to improve their chances of guessing the correct answer when they do not know it.
Irrelevant clues include a multiple-choice stem that is grammatically inconsistent with one or more of the options, a word in the stem that is repeated in the correct option, qualifiers such as “always” or “never” that appear only in incorrect responses, a correct response that consistently occupies the same position among the options, or true statements that are consistently longer than false statements (Miller et al., 2009; Nitko & Brookhart, 2007). Such items contribute little to the validity of test results because they may measure not what students actually know but how well they can guess the correct answers.
7. No item should depend on another item for meaning or for the correct answer
If items are interdependent, a student who answers the first item incorrectly will likely answer the related item incorrectly as well. An example of such a relationship between two completion items follows:
1. Which insulin should be used for emergency treatment of ketoacidosis?
2. What is the onset of action for the insulin in Item 1?
In this example, Item 2 is dependent on Item 1 for its meaning. Students who supply the wrong answer to Item 1 are unlikely to supply a correct answer to Item 2. Items should be worded in such a way as to make them independent of each other. However, a series of test items can be developed to relate to a context such as a case study, database, diagram, graph, or other interpretive material.
Items that are linked to this material are called interpretive or context-dependent items, and they do not violate this general rule for writing test items because they are linked to a common stimulus, not to each other.
8. Eliminate irrelevant information
Eliminate extraneous information unless the purpose of the item is to determine whether students can distinguish between relevant and irrelevant data.
Avoid the use of patient names in clinical scenarios; this information adds unnecessarily to reading time, it may distract from the purpose of the item, and it may introduce cultural bias. However, some items are designed to measure whether a student can evaluate the relevance of clinical data and use only pertinent information in arriving at the answer. In this case, extraneous data (but not patient names) may be included.
9. Arrange for a critique of the items
The best source of this critique is a colleague who teaches the same content area or, at minimum, someone skilled in the technical aspects of item writing. If no one is available to critique the test items, the teacher who developed them should set them aside for a few days. This interval allows the teacher to review the items with a fresh perspective and to identify any lack of clarity or faulty technical construction.
10. Prepare more items than the test blueprint specifies
This will allow for replacement items for those discarded in the review process. The fortunate teacher who does not need to use many replacement items can use the remainder to begin an item bank for future tests.