Test Item Formats, Response Types, Scoring Procedures, and Test Blueprints
Test Item Formats
Some students may be particularly adept at answering essay items; others may prefer multiple-choice items. However, tests should be designed to provide information about students’ knowledge or abilities, not about their skill in taking certain types of tests. A test with a variety of item formats provides students with multiple ways to demonstrate their competence (Nitko & Brookhart, 2007).
Selection Criteria for Item Formats
Teachers should select item formats for their tests based on a variety of factors, such as the learning outcomes to be evaluated, the specific skill to be measured, and the ability level of the students. Some objectives are better measured with certain item formats.
For example, if the instructional objective specifies that the student will be able to “discuss the comparative advantages of breast-feeding and bottle-feeding,” a multiple-choice item would be inappropriate because it would not allow the teacher to evaluate the student’s ability to organize and express ideas on this topic. An essay item would be a better choice for this purpose.
Essay items provide opportunities for students to formulate their own responses, drawing on prior learning, and to express their ideas in writing; these often are desired outcomes of nursing education programs. The teacher’s time constraints for constructing the test may affect the choice of item format. In general, essay items take less time to write than multiple-choice items, but they are more difficult and time-consuming to score.
A teacher who has little time to prepare a test and therefore chooses an essay format, assuming that this choice is also appropriate for the objectives to be tested, must plan for considerable time after the test is given to score it. In nursing programs, faculty members often develop multiple-choice items as the predominant, if not exclusive, item format because for a number of years, licensure and certification examinations contained only multiple-choice items.
Although this type of test item provides essential practice for students in preparation for taking such high-stakes examinations, it negates the principle of selecting the most appropriate type of test item for the outcome and content to be evaluated. In addition, it limits variety in testing and creativity in evaluating student learning.
Although practice with multiple-choice questions is critical, other types of test items and evaluation strategies are also appropriate for measuring student learning in nursing. In fact, although the majority of NCLEX examination items currently are four-option multiple choice, the item pools now contain other formats such as completion and multiple response (National Council of State Boards of Nursing, 2007).
It is clear from this example that nurse educators should not limit their selection of item formats based on the myth that learners must be tested exclusively with the item format most frequently used on a licensure or certification test. On the other hand, each change of item format on a test requires a change of task for students.
Therefore, the number of different item formats to include on a test also depends on the length of the test and the level of the learner. It is generally recommended that teachers use no more than three item formats on a test. Shorter assessments, such as a 10-item quiz, may be limited to a single item format.
Objectively and Subjectively Scored Items
Another powerful and persistent myth is that some item formats evaluate students more objectively than other formats do. Although it is common to describe true–false, matching, and multiple-choice items as “objective,” objectivity refers to the way items are scored, not to the item format or its content (Miller et al., 2009).
Objectivity means that once the scoring key is prepared, it is possible for multiple teachers on the same occasion or the same teacher on multiple occasions to arrive at the same score. Subjectively scored items, like essay items (and short answer items, to a lesser extent), require the judgment of the scorer to determine the degree of correctness and therefore are subject to more variability in scoring.
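The point about objectivity can be made concrete with a minimal sketch: once a scoring key exists, applying it is a mechanical operation, so any scorer (or the same scorer on any occasion) arrives at the same total. The item identifiers, key, and responses below are hypothetical examples, not drawn from any actual examination.

```python
# A minimal sketch of objective scoring. Once the teacher-prepared key
# exists, the score depends only on the key and the responses, not on
# who applies it. All item IDs and answers here are hypothetical.

def score_objective(key, responses):
    """Return the number of items answered correctly according to the key."""
    return sum(1 for item, correct in key.items()
               if responses.get(item) == correct)

key = {"Q1": "B", "Q2": "True", "Q3": "D"}       # teacher-prepared key
student = {"Q1": "B", "Q2": "False", "Q3": "D"}  # one student's answers

# Two independent applications of the same key agree exactly.
first_scorer = score_objective(key, student)
second_scorer = score_objective(dict(key), dict(student))
assert first_scorer == second_scorer
print(first_scorer)  # 2
```

A subjectively scored essay item has no such mechanical key; the scorer must judge the degree of correctness, which is what introduces variability.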
Selected-Response and Constructed-Response Items
Another way of classifying test items is to identify them by the type of response required of the test-taker (Miller et al., 2009). Selected-response (or “choice”) items require the test-taker to select the correct or best answer from among options provided by the teacher. In this category are item formats such as true–false, matching exercises, and multiple choice.
Constructed-response (or “supply”) formats require the learner to supply an answer, and may be classified further as limited response (or short response) and extended response. These are the short answer and essay formats. Exhibit 3.2 depicts this schema for classifying test item formats and the variations of each type.
Scoring Procedures
Decisions about what scoring procedure or procedures to use are somewhat dependent on the choice of item formats. Student responses to short-answer, numerical-calculation, and essay items, for instance, usually must be hand-scored, whether they are recorded directly on the test itself, on a separate answer sheet, or in a booklet. Answers to objective test items such as multiple-choice, true–false, and matching may also be recorded on the test itself or on a separate answer sheet.
Scannable answer sheets greatly increase the speed of objective scoring procedures and have the additional advantage of allowing computer-generated item analysis reports to be produced. The teacher should decide if the time and resources available for scoring a test suggest that hand scoring or electronic scoring would be preferable. In any case, this decision alone should not influence the choice of test-item format.
Test Blueprint and Its Elements
Most people would not think of building a house without blueprints. In fact, the word “house” denotes diverse attributes to different individuals. For this reason, a potential homeowner would not purchase a lot, call a builder, and say only, “Build a house for me on my lot.”
The builder might think that a proper house consists of a two-story brick colonial with four bedrooms, three baths, and a formal dining room, whereas the homeowner had a three-bedroom ranch with two baths, an eat-in kitchen, and a great room with a fireplace in mind.
Similarly, the word “test” might mean different things to different teachers; students and their teachers might have widely varied expectations about what the test will contain. The best way to avoid misunderstanding about the nature of a test, and to ensure that the teacher will be able to make valid judgments about the test scores, is to develop a test blueprint, also known as a test plan or a table of specifications, before “building” the test itself.
The elements of a test blueprint include:
(a) a list of the major topics or instructional objectives that the test will cover.
(b) the level of complexity of the task to be assessed.
(c) the emphasis each topic will have, indicated by number or percentage of items or points.
Exhibit 3.3 is an example of a test blueprint for a unit test on nursing care during normal pregnancy that illustrates each of these elements. The row headings along the left margin of the example are the content areas that will be tested. In this case, the content is indicated by a general outline of topics. Teachers may find that a more detailed outline of content or a list of the relevant objectives is more useful for a given purpose and population.
Some teachers combine a content outline and a list of objectives; in this case, an additional column of objectives would be inserted before or after the content list. The column headings across the top of the example are taken from the taxonomy of cognitive objectives (Bloom, Engelhart, Furst, Hill, & Krathwohl, 1956). Because the test blueprint is a tool to be used by the teacher, it can be modified in any way that makes sense to the user.
Accordingly, the teacher who prepared this blueprint chose to use only selected levels of the taxonomy. Other teachers might include all levels or different levels of Bloom’s taxonomy, or use a different taxonomy. The body of the test blueprint is a grid formed by the intersections of content topics and cognitive levels. Each of the cells of the grid has the potential to represent one or more test items that could be developed.
The numbers in the cells of the sample test blueprint represent the number of points on the test that will relate to each cell; some teachers prefer to indicate the number of items, or the percentage of points or items, represented by each cell.
The percentage is a better indicator of the amount of emphasis to be given to each content area (Miller et al., 2009), but the number of items or points may be more helpful to the teacher in writing actual test items. It is not necessary to write test items for each cell; the teacher’s judgment concerning the appropriate emphasis and balance of content governs the decision about which cells should be filled and how many items should be written for each.
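The relationship between points per cell and percentage of emphasis can be sketched as a small grid. The content topics, cognitive levels, and point values below are hypothetical placeholders, not the actual entries of Exhibit 3.3.

```python
# A minimal sketch of a test blueprint as a grid of content areas by
# cognitive levels. Topics, levels, and point values are hypothetical.

blueprint = {
    "Nutrition in pregnancy":   {"Comprehension": 4, "Application": 6},
    "Discomforts of pregnancy": {"Comprehension": 3, "Application": 5},
    "Prenatal assessment":      {"Comprehension": 2, "Application": 5},
}

# Total points across all cells of the grid.
total = sum(points for levels in blueprint.values()
            for points in levels.values())

# Row totals converted to percentages show the relative emphasis
# each content area receives on the test.
for topic, levels in blueprint.items():
    row = sum(levels.values())
    print(f"{topic}: {row} points ({100 * row / total:.0f}%)")
```

Reporting both forms serves both purposes the text describes: the percentages communicate emphasis, while the raw point counts guide the item writer.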
Rigorous classification of items into these cells is also unnecessary and, in fact, impossible; the way in which the content is actually taught may affect whether the related test items will be written at the application or comprehension level, for example.
For this reason, the actual test may deviate slightly from the specifications for certain cells, but the overall balance of emphasis between the test and the actual instruction should be very similar (Miller et al., 2009; Nitko & Brookhart, 2007). Once developed, the test blueprint serves several important functions.
First, it is a useful tool for guiding the work of the item writer so that sufficient items are developed at the appropriate level to test important content areas and objectives. Without a test blueprint, teachers often let ease of construction drive the writing of test items, resulting in tests with a limited and biased sample of learning tasks that may omit more important outcomes simply because they are more difficult to measure (Miller et al., 2009).
Using test blueprints also helps teachers to be accountable for the educational outcomes they produce. The test blueprint can be used as evidence for judging the validity of the resulting test scores. Another important use of the test blueprint is to inform students about the nature of the test and how they should prepare for it.
Although the content covered in class and assigned readings should give students a general idea of the content areas to be tested, students often lack a clear sense of the cognitive levels at which they will be tested on that material. It might be argued that the instructional objectives give students a clue as to the level at which they will be tested, but teachers often forget that students are not as sophisticated in interpreting objectives as teachers are.
Also, some teachers are good at writing objectives that specify a reasonable expectation of performance, but their test items may in fact test higher or lower performance levels. Students need to know the level at which they will be tested because that knowledge will affect how they prepare for the test, not necessarily how much they prepare.
They should prepare differently for items that test their ability to apply information than for items that test their ability to synthesize information. Some teachers worry that if the test blueprint is shared with students, they will not study the content areas that would contribute less to their overall test scores, preferring to concentrate their time and energy on the most important areas of emphasis.
If this indeed is the outcome, is it necessarily harmful? Lacking any guidance from the teacher, students may unwisely spend equal amounts of time reviewing all content areas. In fact, professional experience reveals that some knowledge is more important for use in practice than other knowledge. Even if they are good critical thinkers, students may be unable to discriminate more important content from that which is less important because they lack the practical experience to make this distinction.
Withholding information about the content emphasis of the test from students might be perceived as an attempt to threaten or punish them for perceived shortcomings such as failure to attend class, failure to read what was assigned, or failure to discern the teacher’s priorities. Such a use of testing would be considered unethical.
The best time to share the test blueprint with students is at the beginning of the course or unit of study. If students are unfamiliar with the use of a test blueprint, the teacher may need to explain the concept as well as discuss how it might be useful to the students in planning their preparation for the test. Of course, if the teacher subsequently makes modifications in the blueprint after writing the test items, those changes should also be shared with the students (Nitko & Brookhart, 2007).