In their book Language Testing in Practice: Designing and Developing Useful Language Tests, Bachman and Palmer devote an entire chapter to test usefulness. They do so because usefulness is the foremost quality of a test: if a test is not useful, there is no point in having it at all.
Test Usefulness has six qualities or elements. They are discussed in brief below.
1. Reliability
Reliability is about consistency. Two versions of a test must yield comparable scores. Two groups of test takers with comparable language abilities must obtain comparable scores. The same test administered again after a period of time must also deliver comparable results. If a test yields very different scores under these circumstances, we cannot trust or rely upon it. Although perfect reliability is never attainable, reliability is a necessary quality of any good test.
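One common way to put a number on "comparable scores" is to correlate the scores from two sittings of the same test. The sketch below is only an illustration: the function name pearson_r and the scores are made up for this example, and the use of a correlation coefficient is an assumption on my part, not a procedure described in the summary above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of products and squares of deviations from each mean.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("scores must vary for the correlation to be defined")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores: the same five test takers, two sittings of one test.
first_sitting = [62, 75, 81, 58, 90]
second_sitting = [65, 73, 84, 55, 88]

r = pearson_r(first_sitting, second_sitting)
print(f"test-retest correlation: {r:.2f}")
```

A value close to 1 suggests that the two sittings rank the test takers in much the same way; a low value would be a reason to doubt the scores.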
2. Construct Validity
We interpret test scores and make decisions based on them: we promote students, rank them, and so on. How do we justify such actions? It is not enough to claim that our judgments or decisions are justified; we ought to demonstrate it. Construct validity means that the test score reflects the areas of language ability the test claims to measure. For this, the constructs measured by a particular test must be defined. Construct validity can be defined as the extent to which the test score can be interpreted as an indicator of the test taker's abilities, or constructs, that we want to measure. Another way of defining it is as the correspondence between the characteristics of the test task and those of the Target Language Use (TLU) task to which we want to generalise our test score. Therefore, to establish construct validity, we need to define both the TLU task characteristics and the constructs to be measured.
What ensures construct validity? a) correspondence between the characteristics of the test task and those of the TLU tasks, b) engagement of the test takers' language abilities by the test task characteristics.
Note that construct validation is an ongoing process, and that no interpretation is absolutely valid.
3. Authenticity
Authenticity is the correspondence between TLU task performance and test task performance. By ensuring that the TLU tasks and the test tasks share the same characteristics, we can ensure authenticity. Generalisation beyond the test tasks depends on authenticity, which makes it a critical quality of test tasks.
4. Interactiveness
Interactiveness is the extent and type of involvement of the test takers' individual characteristics in accomplishing a test task. The three major individual characteristics are language ability, topical knowledge and affective schemata. These characteristics interact with the test task characteristics, and this interaction can be controlled by monitoring the test task characteristics.
5. Impact
A test has an impact on test takers, on teachers, and on society in general. For test takers, the impact comes from the experience of preparing for and taking the test, receiving feedback, and facing the decisions based on the test score. For teachers, a test might mean changing or adjusting instruction style, teaching materials, assessment and feedback. For society at large, test methods imply allocation of funds, changes in decisions, and arrangement of other facilities, infrastructure, etc. The characteristics of the testing situation are also important factors.
6. Practicality
Practicality is about the feasibility of a test. It depends on three kinds of resources: human resources, material resources and time. If any one of these is not available, the test may not be practical.
For each testing situation or context, we need a balance of these six elements so that the test is useful. Take one of these elements away, and the test becomes useless. For example, if a test has the first five qualities but is not practical, it is not useful. Therefore, while designing or adapting tests, we need to consider all of these test qualities carefully.