Improving Your Test Questions
There are two general categories of test items: (1) objective items which require students to select the correct response from several alternatives or to supply a word or short phrase to answer a question or complete a statement; and (2) subjective or essay items which permit the student to organize and present an original answer. Objective items include multiple-choice, true-false, matching and completion, while subjective items include short-answer essay, extended-response essay, problem solving and performance test items. For some instructional purposes one or the other item type may prove more efficient and appropriate. To begin our discussion of the relative merits of each type of test item, test your knowledge of these two item types by answering the following questions.
1. | TRUE | Essay items are generally easier and less time consuming to construct than are most objective test items. Technically correct and content appropriate multiple-choice and true-false test items require an extensive amount of time to write and revise. For example, a professional item writer produces only 9-10 good multiple-choice items in a day's time. |
2. | ? | According to research findings it is still undetermined whether or not essay tests require or facilitate more thorough (or even different) student study preparation. |
3. | TRUE | Writing skills do affect a student's ability to communicate the correct "factual" information through an essay response. Consequently, students with good writing skills have an advantage over students who have difficulty expressing themselves through writing. |
4. | FALSE | Essays do not teach a student how to write but they can emphasize the importance of being able to communicate through writing. Constant use of essay tests may encourage the knowledgeable but poor writing student to improve his/her writing ability in order to improve performance. |
5. | TRUE | Essays are more subjective in nature due to their susceptibility to scoring influences. Different readers can rate identical responses differently; the same reader can rate the same paper differently over time; handwriting, neatness or punctuation can unintentionally affect a paper's grade; and the lack of anonymity can affect the grading process. While impossible to eliminate, scoring influences or biases can be minimized through procedures discussed later in this guide. |
6. | ? | Both item types encourage some form of guessing. Multiple-choice, true-false and matching items can be correctly answered through blind guessing, yet essay items can be responded to satisfactorily through well written bluffing. |
7. | TRUE | Due to the extent of time required by the student to respond to an essay question, only a few essay questions can be included on a classroom exam. Consequently, a larger number of objective items can be tested in the same amount of time, thus enabling the test to cover more content. |
8. | TRUE | Both item types can measure similar content or learning objectives. Research has shown that students respond almost identically to essay and objective test items covering the same content. Studies by Sax & Collet (1968) and Paterson (1926), conducted forty-two years apart, reached the same conclusion: "...there seems to be no escape from the conclusions that the two types of exams are measuring identical things" (Paterson, 1926, p. 246). This conclusion should not be surprising; after all, a well-written essay item requires that the student (1) have a store of knowledge, (2) be able to relate facts and principles, and (3) be able to organize such information into a coherent and logical written expression, whereas an objective test item requires that the student (1) have a store of knowledge, (2) be able to relate facts and principles, and (3) be able to organize such information into a coherent and logical choice among several alternatives. |
9. | TRUE | Both objective and essay test items are good devices for measuring student achievement. However, as seen in the previous quiz answers, there are particular measurement situations where one item type is more appropriate than the other. Following is a set of recommendations for using either objective or essay test items: (Adapted from Robert L. Ebel, Essentials of Educational Measurement, 1972, p. 144). |
Sax, G., & Collet, L. S. (1968). An empirical comparison of the effects of recall and multiple-choice tests on student achievement. Journal of Educational Measurement, 5(2), 169–173. doi:10.1111/j.1745-3984.1968.tb00622.x
Paterson, D. G. (1926). Do new and old type examinations measure different mental functions? School and Society, 24, 246–248.
Essay tests are especially appropriate when:
Objective tests are especially appropriate when:
Either essay or objective tests can be used to:
In addition to the preceding suggestions, it is important to realize that certain item types are better suited than others for measuring particular learning objectives. For example, learning objectives requiring the student to demonstrate or to show may be better measured by performance test items, whereas objectives requiring the student to explain or to describe may be better measured by essay test items. Matching learning objective expectations with certain item types can help you select an appropriate kind of test item for your classroom exam and can also provide a higher degree of test validity (i.e., testing what is supposed to be tested). To further illustrate, several sample learning objectives and appropriate test items are provided below.
Learning Objectives | Most Suitable Test Item |
---|---|
The student will be able to categorize and name the parts of the human skeletal system. | Objective Test Item (M-C, T-F, Matching) |
The student will be able to critique and appraise another student's English composition on the basis of its organization. | Essay Test Item (Extended-Response) |
The student will demonstrate safe laboratory skills. | Performance Test Item |
The student will be able to cite four examples of satire that Twain uses in Huckleberry Finn. | Essay Test Item (Short-Answer) |
After you have decided to use an objective exam, an essay exam, or a combination of both, the next step is to select the kind(s) of objective or essay item that you wish to include on the exam. To help you make such a choice, the different kinds of objective and essay items are presented in the following section. The various kinds of items are briefly described and compared to one another in terms of their advantages and limitations for use. Also presented is a set of general suggestions for the construction of each item variation.
The multiple-choice item consists of two parts: (a) the stem, which identifies the question or problem and (b) the response alternatives. Students are asked to select the one alternative that best completes the statement or answers the question. For example:
Multiple-choice items can provide.
1. When possible, state the stem as a direct question rather than as an incomplete statement. | |
Undesirable: | Alloys are ordinarily produced by. |
Desirable: | How are alloys ordinarily produced? |
2. Present a definite, explicit and singular question or problem in the stem. | |
Undesirable: | Psychology. |
Desirable: | The science of mind and behavior is called. |
3. Eliminate excessive verbiage or irrelevant information from the stem. | |
Undesirable: | While ironing her formal, Jane burned her hand accidentally on the hot iron. This was due to a transfer of heat by. |
Desirable: | Which of the following ways of heat transfer explains why Jane's hand was burned after she touched a hot iron? |
4. Include in the stem any word(s) that might otherwise be repeated in each alternative. | |
Undesirable: | In national elections in the United States the President is officially |
a. chosen by the people. |
b. chosen by members of Congress. |
c. chosen by the House of Representatives. |
*d. chosen by the Electoral College |
Desirable: | In national elections in the United States the President is officially chosen by |
a. the people. |
b. members of Congress. |
c. the House of Representatives. |
*d. the Electoral College |
5. Use negatively stated stems sparingly. When used, underline and/or capitalize the negative word. | |
Undesirable: | Which of the following is not cited as an accomplishment of the Kennedy administration? |
Desirable: | Which of the following is NOT cited as an accomplishment of the Kennedy administration? |
6. Make all alternatives plausible and attractive to the less knowledgeable or skillful student. | |
What process is most nearly the opposite of photosynthesis? | |
Undesirable | Desirable |
a. Digestion | a. Digestion |
b. Relaxation | b. Assimilation |
*c. Respiration | *c. Respiration |
d. Exertion | d. Catabolism |
7. Make the alternatives grammatically parallel with each other, and consistent with the stem. | |
Undesirable: | What would do most to advance the application of atomic discoveries to medicine? |
*a. Standardized techniques for treatment of patients. |
b. Train the average doctor to apply radioactive treatments. |
c. Remove the restriction on the use of radioactive substances. |
d. Establishing hospitals staffed by highly trained radioactive therapy specialists. |
*a. Development of standardized techniques for treatment of patients. |
b. Training of the average doctor in application of radioactive treatments. |
c. Removal of restriction on the use of radioactive substances. |
d. Addition of trained radioactive therapy specialists to hospital staffs. |
8. Make the alternatives mutually exclusive. | |
Undesirable: | The daily minimum required amount of milk that a 10 year old child should drink is |
a. 1-2 glasses. |
*b. 2-3 glasses. |
*c. 3-4 glasses. |
d. at least 4 glasses. |
a. 1 glass. |
b. 2 glasses. |
*c. 3 glasses. |
d. 4 glasses. |
9. When possible, present alternatives in some logical order (e.g., chronological, most to least, alphabetical). | |
At 7 a.m. two trucks leave a diner and travel north. One truck averages 42 miles per hour and the other truck averages 38 miles per hour. At what time will they be 24 miles apart? | |
Undesirable | Desirable |
a. 6 p.m. | a. 1 a.m. |
b. 9 p.m. | b. 6 a.m. |
c. 1 a.m. | c. 9 a.m. |
*d. 1 p.m. | *d. 1 p.m. |
e. 6 a.m. | e. 6 p.m. |
10. Be sure there is only one correct or best response to the item. | |
Undesirable: | The two most desired characteristics in a classroom test are validity and |
a. precision. |
*b. reliability. |
c. objectivity. |
*d. consistency. |
a. precision. |
*b. reliability. |
c. objectivity. |
d. standardization. |
11. Make alternatives approximately equal in length. | |
Undesirable: | The most general cause of low individual incomes in the United States is |
*a. lack of valuable productive services to sell. |
b. unwillingness to work. |
c. automation. |
d. inflation. |
*a. A lack of valuable productive services to sell. |
b. The population's overall unwillingness to work. |
c. The nation's increased reliance on automation. |
d. An increasing national level of inflation. |
12. Avoid irrelevant clues such as grammatical structure, well known verbal associations or connections between stem and answer. | |
Undesirable: (grammatical clue) | A chain of islands is called an: |
*a. archipelago. |
b. peninsula. |
c. continent. |
d. isthmus. |
a. measurement. |
*b. correlation. |
c. testing. |
d. error. |
a. the length of the reservoir behind the dam. |
b. the volume of water behind the dam. |
*c. the height of water behind the dam. |
d. the strength of the reinforcing wall. |
13. Use at least four alternatives for each item to lower the probability of getting the item correct by guessing.
14. Randomly distribute the correct response among the alternative positions throughout the test having approximately the same proportion of alternatives a, b, c, d and e as the correct response.
15. Use the alternatives "none of the above" and "all of the above" sparingly. When used, such alternatives should occasionally be used as the correct response.
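Suggestion 14 can be checked mechanically when an answer key is assembled. The following is a minimal sketch, not a prescribed procedure: the function name `balanced_answer_key` and its parameters are hypothetical, and it assumes every item offers the same set of alternative positions.

```python
import random
from collections import Counter

def balanced_answer_key(num_items, positions="abcde", seed=None):
    """Return a shuffled answer key in which each alternative position
    is the correct response approximately equally often."""
    rng = random.Random(seed)
    # Repeat the positions enough times to cover every item, then trim.
    # When num_items is not a multiple of len(positions), the earlier
    # positions receive one extra item each.
    repeats = -(-num_items // len(positions))  # ceiling division
    key = (list(positions) * repeats)[:num_items]
    # Shuffle so the order of correct positions is unpredictable.
    rng.shuffle(key)
    return key

key = balanced_answer_key(40, seed=1)
print(Counter(key))  # each of a-e is the correct response 8 times
```

Building the key first and then arranging each item's alternatives to match it keeps the proportions even without relying on the item writer's intuition about what looks random.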
A true-false item can be written in one of three forms: simple, complex, or compound. Answers can consist of only two choices (simple), more than two choices (complex), or two choices plus a conditional completion response (compound). An example of each type of true-false item follows:
The acquisition of morality is a developmental process. | True | False |
The acquisition of morality is a developmental process. | True | False |
The acquisition of morality is a developmental process. | True | False |
True-False items can provide.
1. Base true-false items upon statements that are absolutely true or false, without qualifications or exceptions. | |
Undesirable: | Nearsightedness is hereditary in origin. |
Desirable: | Geneticists and eye specialists believe that the predisposition to nearsightedness is hereditary. |
2. Express the item statement as simply and as clearly as possible. | |
Undesirable: | When you see a highway with a marker that reads, "Interstate 80" you know that the construction and upkeep of that road is built and maintained by the state and federal government. |
Desirable: | The construction and maintenance of interstate highways is provided by both state and federal governments. |
3. Express a single idea in each test item. | |
Undesirable: | Water will boil at a higher temperature if the atmospheric pressure on its surface is increased and more heat is applied to the container. |
Desirable: | Water will boil at a higher temperature if the atmospheric pressure on its surface is increased. |
and/or | |
Water will boil at a higher temperature if more heat is applied to the container. |
4. Include enough background information and qualifications so that the ability to respond correctly to the item does not depend on some special, uncommon knowledge. | |
Undesirable: | The second principle of education is that the individual gathers knowledge. |
Desirable: | According to John Dewey, the second principle of education is that the individual gathers knowledge. |
5. Avoid lifting statements from the text, lecture or other materials so that memory alone will not permit a correct answer. | |
Undesirable: | For every action there is an opposite and equal reaction. |
Desirable: | If you were to stand in a canoe and throw a life jacket forward to another canoe, chances are your canoe would jerk backward. |
6. Avoid using negatively stated item statements. | |
Undesirable: | The Supreme Court is not composed of nine justices. |
Desirable: | The Supreme Court is composed of nine justices. |
7. Avoid the use of unfamiliar vocabulary. | |
Undesirable: | According to some politicians, the raison d'etre for capital punishment is retribution. |
Desirable: | According to some politicians, justification for the existence of capital punishment is retribution. |
8. Avoid the use of specific determiners which would permit a test-wise but unprepared examinee to respond correctly. Specific determiners refer to sweeping terms like "all," "always," "none," "never," "impossible," "inevitable," etc. Statements including such terms are likely to be false. On the other hand, statements using qualifying determiners such as "usually," "sometimes," "often," etc., are likely to be true. When statements do require the use of specific determiners, make sure they appear in both true and false items. | |
Undesirable: | All sessions of Congress are called by the President. (F) |
The Supreme Court is frequently required to rule on the constitutionality of a law. (T) | |
An objective test is generally easier to score than an essay test. (T) | |
Desirable: | (When specific determiners are used reverse the expected outcomes.) |
The sum of the angles of a triangle is always 180°. (T) | |
Each molecule of a given compound is chemically the same as every other molecule of that compound. (T) | |
The galvanometer is the instrument usually used for the metering of electrical energy used in a home. (F) |
9. False items tend to discriminate more highly than true items. Therefore, use more false items than true items (but no more than 15% additional false items). |
In general, matching items consist of a column of stimuli presented on the left side of the exam page and a column of responses placed on the right side of the page. Students are required to match the response associated with a given stimulus. For example:
Directions: | On the line to the left of each factual statement, write the letter of the principle which best explains the statement's occurrence. Each principle may be used more than once. |
1. ___ Fossils of primates first appear in the Cenozoic rock strata, while trilobite remains are found in the Proterozoic rocks.
2. ___ The Arctic and Antarctic regions are sparsely populated.
3. ___ Plants have no nervous system.
4. ___ Large coal beds exist in Alaska.
1. Include directions which clearly state the basis for matching the stimuli with the responses. Explain whether or not a response can be used more than once and indicate where to write the answer. | ||
Undesirable: | Directions: | Match the following. |
Desirable: | Directions: | On the line to the left of each identifying location and characteristics in Column I, write the letter of the country in Column II that is best defined. Each country in Column II may be used more than once. |
2. ___ Discovered Radium
4. ___ Year of the 1st Nuclear Fission by Man
4. ___ Sulfuric Acid
1. ___ Hunting for reasons to support one's beliefs.
2. ___ Accepting the values and norms of others as one's own even when they are contrary to previously held values.
3. ___ Attributing to others one's own unacceptable impulses, thoughts and desires.
4. ___ Ignoring disagreeable situations, topics, sights.
e. Denial of Reality
a. Denial of reality
1. ___ Igneous rocks are formed
2. ___ The formation of coal requires
3. ___ A geode is filled
4. ___ Feldspar is classified as
5. Keep matching items brief, limiting the list of stimuli to under 10.
6. Include more responses than stimuli to help prevent answering through the process of elimination.
7. When possible, reduce the amount of reading time by including only short phrases or single words in the response list.
The completion item requires the student to answer a question or to finish an incomplete statement by filling in a blank with the correct word or phrase. For example,
According to Freud, personality is made up of three major systems, the _________, the ________ and the ________.
1. Omit only significant words from the statement. | |
Undesirable: | Every atom has a central (core) called a nucleus. |
Desirable: | Every atom has a central core called a(n) (nucleus) . |
2. Do not omit so many words from the statement that the intended meaning is lost. | |
Undesirable: | The ________ were to Egypt as the ________ were to Persia and as ________ were to the early tribes of Israel. |
Desirable: | The Pharaohs were to Egypt as the ________ were to Persia and as ________ were to the early tribes of Israel. |
3. Avoid grammatical or other clues to the correct response. | |
Undesirable: | Most of the United States' libraries are organized according to the (Dewey) decimal system. |
Desirable: | Which organizational system is used by most of the United States' libraries? (Dewey decimal) |
4. Be sure there is only one correct response. | |
Undesirable: | Trees which shed their leaves annually are (seed-bearing, common) . |
Desirable: | Trees which shed their leaves annually are called (deciduous) . |
5. Make the blanks of equal length. | |
Undesirable: | In Roman mythology, Vulcan was the son of (Jupiter) and (Juno) . |
Desirable: | In Roman mythology, Vulcan was the son of (Jupiter) and (Juno) . |
6. When possible, delete words at the end of the statement after the student has been presented a clearly defined problem. | |
Undesirable: | (122.5) is the molecular weight of KClO3. |
Desirable: | The molecular weight of KClO3 is (122.5) . |
7. Avoid lifting statements directly from the text, lecture or other sources.
8. Limit the required response to a single word or phrase.
The essay test is probably the most popular of all types of teacher-made tests. In general, a classroom essay test consists of a small number of questions to which the student is expected to demonstrate his/her ability to (a) recall factual knowledge, (b) organize this knowledge and (c) present the knowledge in a logical, integrated answer to the question. An essay test item can be classified as either an extended-response essay item or a short-answer essay item. The latter calls for a more restricted or limited answer in terms of form or scope. An example of each type of essay item follows.
Explain the difference between the S-R (Stimulus-Response) and the S-O-R (Stimulus-Organism-Response) theories of personality. Include in your answer (a) brief descriptions of both theories, (b) supporters of both theories and (c) research methods used to study each of the two theories. (10 pts. 20 minutes)
Identify research methods used to study the S-R (Stimulus-Response) and S-O-R (Stimulus-Organism-Response) theories of personality. (5 pts. 10 minutes)
1. Prepare essay items that elicit the type of behavior you want to measure. | |
Learning Objective: | The student will be able to explain how the normal curve serves as a statistical model. |
Undesirable: | Describe a normal curve in terms of: symmetry, modality, kurtosis and skewness. |
Desirable: | Briefly explain how the normal curve serves as a statistical model for estimation and hypothesis testing. |
2. Phrase each item so that the student's task is clearly indicated. | |
Undesirable: | Discuss the economic factors which led to the stock market crash of 1929. |
Desirable: | Identify the three major economic conditions which led to the stock market crash of 1929. Discuss briefly each condition in correct chronological sequence and in one paragraph indicate how the three factors were inter-related. |
3. Indicate for each item a point value or weight and an estimated time limit for answering. | |
Undesirable: | Compare the writings of Bret Harte and Mark Twain in terms of settings, depth of characterization, and dialogue styles of their main characters. |
Desirable: | Compare the writings of Bret Harte and Mark Twain in terms of settings, depth of characterization, and dialogue styles of their main characters. (10 points 20 minutes) |
4. Ask questions that will elicit responses on which experts could agree that one answer is better than another.
5. Avoid giving the student a choice among optional items as this greatly reduces the reliability of the test.
6. It is generally recommended for classroom examinations to administer several short-answer items rather than only one or two extended-response items.
ANALYTICAL SCORING : | Each answer is compared to an ideal answer and points are assigned for the inclusion of necessary elements. Grades are based on the number of accumulated points either absolutely (i.e., A=10 or more points, B=6-9 pts., etc.) or relatively (A=top 15% scores, B=next 30% of scores, etc.) |
GLOBAL QUALITY : | Each answer is read and assigned a score (e.g., grade, total points) based either on the total quality of the response or on the total quality of the response relative to other student answers. |
"Americans are a mixed-up people with no sense of ethical values. Everyone knows that baseball is far less necessary than food and steel, yet they pay ball players a lot more than farmers and steelworkers." WHY? Use 3-4 sentences to indicate how an economist would explain the above situation.
Necessary Elements to be Included in Response | Points |
---|---|
Salaries are based on demand relative to supply of such services. | 3 |
Excellent ball players are rare. | 2 |
Ball clubs have a high demand for excellent players. | 2 |
Clarity of Response | 2 |
9 pts. |
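The point table above can be applied as a simple tally. The sketch below is only an illustration of analytical scoring under stated assumptions: the short keys (`supply_demand`, `players_rare`, `club_demand`, `clarity`) and the function `analytic_score` are hypothetical names standing in for the four rubric rows, and the sample credit values are invented for the demonstration.

```python
# Rubric rows from the sample table: hypothetical key -> maximum points.
RUBRIC = {
    "supply_demand": 3,  # Salaries are based on demand relative to supply.
    "players_rare": 2,   # Excellent ball players are rare.
    "club_demand": 2,    # Ball clubs have a high demand for excellent players.
    "clarity": 2,        # Clarity of response.
}

MAX_SCORE = sum(RUBRIC.values())  # 9 points, matching the table total

def analytic_score(credit):
    """Sum the points a reader awarded for each rubric element.

    `credit` maps a rubric key to the points awarded; elements that are
    missing from the response are simply left out and earn 0.
    """
    total = 0
    for element, max_points in RUBRIC.items():
        awarded = credit.get(element, 0)
        if not 0 <= awarded <= max_points:
            raise ValueError(f"{element}: {awarded} is outside 0..{max_points}")
        total += awarded
    return total

# Invented example: the response covers two elements fully, omits one,
# and earns partial credit for clarity.
sample = {"supply_demand": 3, "players_rare": 2, "clarity": 1}
print(analytic_score(sample), "of", MAX_SCORE)  # 6 of 9
```

Recording the awarded points per element, rather than only the total, also gives the student a record of which necessary elements were missing.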
Assign scores or grades on the overall quality of the written response as compared to an ideal answer. Or, compare the overall quality of a response to other student responses by sorting the papers into three stacks:
Below Average | Average | Above Average |
Below Average | Average | Above Average | ||||||
Below Avg. | Avg. | Above Avg. | Below Avg. | Avg. | Above Avg. | Below Avg. | Avg. | Above Avg. |
Another form of a subjective test item is the problem solving or computational exam question. Such items present the student with a problem situation or task and require a demonstration of work procedures and a correct solution, or just a correct solution. This kind of test item is classified as a subjective type of item due to the procedures used to score item responses. Instructors can assign full or partial credit to either correct or incorrect solutions depending on the quality and kind of work procedures presented. An example of a problem solving test item follows.
It was calculated that 75 men could complete a strip on a new highway in 70 days. When work was scheduled to commence, it was found necessary to send 25 men on another road project. How many days longer will it take to complete the strip? Show your work for full or partial credit.
Problem solving items.
1. Clearly identify and explain the problem. | |
Undesirable: | During a car crash, the car slows down at the rate of 490 m/sec². What is the magnitude and direction of the force acting on a 100-kg driver? |
Desirable: | During a car crash, the car slows down at the rate of 490 m/sec². Using the car as a frame of reference, what is the magnitude and direction of the gram force acting on a 100-kg driver? |
2. Provide directions which clearly inform the student of the type of response called for. | |
Undesirable: | An American tourist in Paris finds that he weighs 70 kilograms. When he left the United States he weighed 144 pounds. What was his net change in weight? |
Desirable: | An American tourist in Paris finds that he weighs 70 kilograms. When he left the United States he weighed 144 pounds. What was his net weight change in pounds? |
3. State in the directions whether or not the student must show his/her work procedures for full or partial credit. | |
Undesirable: | A double concave lens is made of glass with n = 1.50. If the radii of curvature of the two lens surfaces are both 30.0 cm, what is the focal length of the lens? |
Desirable: | A double concave lens is made of glass with n = 1.50. If the radii of curvature of the two lens surfaces are both 30.0 cm, what is the focal length of the lens? Show your work to receive full or partial credit. |
4. Clearly separate item parts and indicate their point values. | |
A man leaves his home and drives to a convention at an average rate of 50 miles per hour. Upon arrival, he finds a telegram advising him to return at once. He catches a plane that takes him back at an average rate of 300 miles per hour. | |
Undesirable: | If the total traveling time was 1 3/4 hours, how long did it take him to fly back? How far from his home was the convention? |
Desirable: | If the total traveling time was 1 3/4 hours: (1) How long did it take him to fly back? (1 pt.) (2) How far from his home was the convention? (1 pt.) Show your work for full or partial credit. |
5. Use figures, conditions and situations which create a realistic problem. | |
Undesirable: | An automobile weighing 2,840 N (about 640 pounds) is traveling at a speed of 300 miles per hour. What is the car's kinetic energy? Show your work. (2 pts.) |
Desirable: | An automobile weighing 14,200 N (about 3,200 pounds) is traveling at a speed of 12 m/sec. What is the car's kinetic energy? Show your work. (2 pts.) |
6. Ask questions that elicit responses on which experts could agree that one solution and one or more work procedures are better than others.
7. Work through each problem before classroom administration to double-check accuracy.
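As an illustration of suggestion 7, three of the sample problems in this section can be worked through in a few lines. This is a sketch under stated assumptions rather than an official answer key: the variable names are hypothetical, the highway problem assumes every man works at the same constant rate, and the kinetic energy problem assumes g = 9.8 m/sec².

```python
from fractions import Fraction

# Highway strip: 75 men need 70 days, then 25 men are reassigned.
# Assumes the total work in man-days stays fixed.
man_days = 75 * 70
extra_days = Fraction(man_days, 75 - 25) - 70
print("Extra days:", extra_days)  # 35

# Convention trip: d/50 + d/300 = 1 3/4 hours, solved for the distance d.
total_hours = Fraction(7, 4)
distance = total_hours / (Fraction(1, 50) + Fraction(1, 300))
flight_minutes = distance / 300 * 60
print("Distance (miles):", distance)       # 75
print("Flight (minutes):", flight_minutes)  # 15

# Kinetic energy: a car weighing 14,200 N travels at 12 m/sec.
G = 9.8  # m/sec^2, assumed value of gravitational acceleration
mass_kg = 14200 / G
kinetic_energy = 0.5 * mass_kg * 12 ** 2
print("Kinetic energy (J):", round(kinetic_energy))  # about 104,327
```

Working each item this way before the exam also exposes hidden assumptions (such as the value of g) that the directions may need to state.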
A performance test item is designed to assess the ability of a student to perform correctly in a simulated situation (i.e., a situation in which the student will be ultimately expected to apply his/her learning). The concept of simulation is central in performance testing; a performance test will simulate to some degree a real life situation to accomplish the assessment. In theory, a performance test could be constructed for any skill and real life situation. In practice, most performance tests have been developed for the assessment of vocational, managerial, administrative, leadership, communication, interpersonal and physical education skills in various simulated situations. An illustrative example of a performance test item is provided below.
Assume that some of the instructional objectives of an urban planning course include the development of the student's ability to effectively use the principles covered in the course in various "real life" situations common for an urban planning professional. A performance test item could measure this development by presenting the student with a specific situation which represents a "real life" situation. For example,
An urban planning board makes a last minute request for the professional to act as consultant and critique a written proposal which is to be considered in a board meeting that very evening. The professional arrives before the meeting and has one hour to analyze the written proposal and prepare his critique. The critique presentation is then made verbally during the board meeting; reactions of members of the board or the audience include requests for explanation of specific points or informed attacks on the positions taken by the professional.
The performance test designed to simulate this situation would require the student being tested to role-play the professional's part, while students or faculty act the other roles in the situation. Various aspects of the "professional's" performance would then be observed and rated by several judges with the necessary background. The ratings could then be used both to provide the student with a diagnosis of his/her strengths and weaknesses and to contribute to an overall summary evaluation of the student's abilities.
Performance test items.
This section presents two methods for collecting feedback on the quality of your test items: self-review checklists and student evaluation of test item quality. You can use the information gathered from either method to identify strengths and weaknesses in your item writing.
EVALUATE YOUR TEST ITEMS BY CHECKING THE SUGGESTIONS WHICH YOU FEEL YOU HAVE FOLLOWED.
____ | When possible, stated the stem as a direct question rather than as an incomplete statement. |
____ | Presented a definite, explicit and singular question or problem in the stem. |
____ | Eliminated excessive verbiage or irrelevant information from the stem. |
____ | Included in the stem any word(s) that might have otherwise been repeated in each alternative. |
____ | Used negatively stated stems sparingly. When used, underlined and/or capitalized the negative word(s). |
____ | Made all alternatives plausible and attractive to the less knowledgeable or skillful student. |
____ | Made the alternatives grammatically parallel with each other, and consistent with the stem. |
____ | Made the alternatives mutually exclusive. |
____ | When possible, presented alternatives in some logical order (e.g., chronologically, most to least). |
____ | Made sure there was only one correct or best response per item. |
____ | Made alternatives approximately equal in length. |
____ | Avoided irrelevant clues such as grammatical structure, well known verbal associations or connections between stem and answer. |
____ | Used at least four alternatives for each item. |
____ | Randomly distributed the correct response among the alternative positions throughout the test having approximately the same proportion of alternatives a, b, c, d, and e as the correct response. |
____ | Used the alternatives "none of the above" and "all of the above" sparingly. When used, such alternatives were occasionally the correct response. |
____ | Based true-false items upon statements that are absolutely true or false, without qualifications or exceptions. |
____ | Expressed the item statement as simply and as clearly as possible. |
____ | Expressed a single idea in each test item. |
____ | Included enough background information and qualifications so that the ability to respond correctly did not depend on some special, uncommon knowledge. |
____ | Avoided lifting statements from the text, lecture, or other materials. |
____ | Avoided using negatively stated item statements. |
____ | Avoided the use of unfamiliar language. |
____ | Avoided the use of specific determiners such as "all," "always," "none," "never," etc., and qualifying determiners such as "usually," "sometimes," "often," etc. |
____ | Used more false items than true items (but not more than 15% additional false items). |
Completion Test Items
____ | Omitted only significant words from the statement. |
____ | Did not omit so many words from the statement that the intended meaning was lost. |
____ | Avoided grammatical or other clues to the correct response. |
____ | Included only one correct response per item. |
____ | Made the blanks of equal length. |
____ | When possible, deleted the words at the end of the statement after the student was presented with a clearly defined problem. |
____ | Avoided lifting statements directly from the text, lecture, or other sources. |
____ | Limited the required response to a single word or phrase. |
Essay Test Items
____ | Prepared items that elicited the type of behavior you wanted to measure. |
____ | Phrased each item so that the student's task was clearly indicated. |
____ | Indicated for each item a point value or weight and an estimated time limit for answering. |
____ | Asked questions that elicited responses on which experts could agree that one answer is better than others. |
____ | Avoided giving the student a choice among optional items. |
____ | Administered several short-answer items rather than 1 or 2 extended-response items. |
Grading Essay Test Items
____ | Selected an appropriate grading model. |
____ | Tried not to allow factors which were irrelevant to the learning outcomes being measured to affect your grading (e.g., handwriting, spelling, neatness). |
____ | Read and graded all class answers to one item before going on to the next item. |
____ | Read and graded the answers without looking at the student's name to avoid possible preferential treatment. |
____ | Occasionally shuffled papers during the reading of answers. |
____ | When possible, asked another instructor to read and grade your students' responses. |
Performance Test Items
____ | Prepared items that elicited the type of behavior you wanted to measure. |
____ | Clearly identified and explained the simulated situation to the student. |
____ | Made the simulated situation as "life-like" as possible. |
____ | Provided directions which clearly informed the students of the type of response called for. |
____ | When appropriate, clearly stated time and activity limitations in the directions. |
____ | Adequately trained the observer(s)/scorer(s) to ensure that they were fair in scoring the appropriate behaviors. |
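Some checklist suggestions can be checked mechanically. As one example, the suggestion that the correct response be spread about evenly across alternative positions can be tested against an answer key. The sketch below assumes a five-alternative test keyed with the letters a to e; the function names and the 10-percentage-point tolerance are hypothetical choices, not a standard.

```python
from collections import Counter

def key_position_shares(answer_key, positions="abcde"):
    """Return the share of items keyed to each alternative position."""
    if not answer_key:
        raise ValueError("answer_key must contain at least one item")
    counts = Counter(answer_key)
    return {p: counts.get(p, 0) / len(answer_key) for p in positions}

def unbalanced_positions(answer_key, positions="abcde", tolerance=0.10):
    """List positions whose share differs from an even split by more than tolerance."""
    expected = 1 / len(positions)
    shares = key_position_shares(answer_key, positions)
    return [p for p in positions if abs(shares[p] - expected) > tolerance]

# Hypothetical 15-item key in which "c" is correct far too often.
skewed_key = list("abcdeabcdeccccc")
balanced_key = list("abcde" * 3)

print(unbalanced_positions(skewed_key))    # ['c']
print(unbalanced_positions(balanced_key))  # []
```

On a short test, some imbalance is expected by chance, so a flagged position is a prompt to look at the key again rather than proof of a problem.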
The following set of ICES (Instructor and Course Evaluation System) questionnaire items can be used to assess the quality of your test items. The items are presented with their original ICES catalogue number. You are encouraged to include one or more of the items on the ICES evaluation form in order to collect student opinion of your item writing quality.
Item | Question | Left anchor | Right anchor |
102 | How would you rate the instructor's examination questions? | Excellent | Poor |
103 | How well did examination questions reflect content and emphasis of the course? | Well related | Poorly related |
114 | The exams reflected important points in the reading assignments. | Strongly agree | Strongly disagree |
119 | Were exam questions worded clearly? | Yes, very clear | No, very unclear |
115 | Were the instructor's test questions thought provoking? | Definitely yes | Definitely no |
125 | Were exams adequately discussed upon return? | Yes, adequately | No, not enough |
116 | Did the exams challenge you to do original thinking? | Yes, very challenging | No, not challenging |
118 | Were there "trick" or trite questions on tests? | Lots of them | Few if any |
122 | How difficult were the examinations? | Too difficult | Too easy |
123 | I found I could score reasonably well on exams by just cramming. | Strongly agree | Strongly disagree |
121 | How was the length of exams for the time allotted? | Too long | Too short |
109 | Were exams, papers, reports returned with errors explained or personal comments? | Almost always | Almost never |
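If you have the raw student responses, a per-item summary is easy to compute. The sketch below assumes each response is coded as an integer from 1 to 5, with 5 at the left-hand anchor; that coding, the function name, and the sample data are assumptions for illustration and may not match how your ICES results are reported.

```python
from statistics import mean

def summarize_responses(responses_by_item, scale_max=5):
    """Return the count and mean of valid responses for each item number."""
    summary = {}
    for item, responses in responses_by_item.items():
        # Drop omitted or out-of-range codes (assumed to be anything outside 1..scale_max).
        valid = [r for r in responses if isinstance(r, int) and 1 <= r <= scale_max]
        if valid:
            summary[item] = {"n": len(valid), "mean": round(mean(valid), 2)}
    return summary

# Hypothetical responses from five students to three of the items.
responses = {
    102: [5, 4, 4, 3, 5],
    119: [2, 3, 2, 4, 1],
    122: [3, 3, 4, 3, 2],
}

for item, stats in summarize_responses(responses).items():
    print(item, stats)
```

Under this assumed coding, a high mean is favorable for items such as 102 and 119, but not for every item. For items 121 and 122 the anchors are "too long/too short" and "too difficult/too easy", so a mean near the middle of the scale is the desirable result, and for items 118 and 123 the left-hand anchor is the unfavorable one.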
The information on this page is intended for self-instruction. However, CITL staff members will consult with faculty who wish to analyze and improve their test item writing. The staff can also consult with faculty about other instructional problems. Instructors wishing to acquire CITL assistance can contact citl-info@illinois.edu.