The ELT Blog: Language testing

Saturday, 5 August 2017

Reliability and Validity in Language Assessment

Both reliability and validity are important for a language test to be useful.

Reliability

Reliability, in other words, is consistency. Think of a weighing scale: it must show the same weight for the same object on every occasion. Likewise, a test that gives me an A grade today must give me something similar a month from now. Or a test that gives A grades to a group of students of the same ability must give them about the same scores in a few weeks' time. If a test gives an A grade today and an F (fail) grade tomorrow, the test is not reliable. Tests that are not reliable are not useful, because they give us no dependable information about the test-taker. Therefore, we must try to minimize the effects of potential sources of inconsistency in the test.
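One common way to put a number on this kind of consistency is a test-retest estimate: give the same test twice to the same people and correlate the two sets of scores. The sketch below is only an illustration. The post does not prescribe a statistic, and the function name and the scores are made up for the example.

```python
from math import sqrt


def pearson_r(first, second):
    """Pearson correlation between two equal-length lists of scores."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two score lists of equal length (at least 2)")
    n = len(first)
    mean_a = sum(first) / n
    mean_b = sum(second) / n
    # Sum of cross-products and sums of squared deviations from the means
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(first, second))
    var_a = sum((a - mean_a) ** 2 for a in first)
    var_b = sum((b - mean_b) ** 2 for b in second)
    if var_a == 0 or var_b == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cov / sqrt(var_a * var_b)


# Hypothetical scores for the same five test-takers on two occasions
first_sitting = [78, 65, 90, 52, 84]
second_sitting = [80, 63, 88, 55, 85]

reliability = pearson_r(first_sitting, second_sitting)
print(round(reliability, 2))
```

A value close to 1 suggests the test ranks the same people in much the same way on both occasions. A value near 0 would be the "A today, F tomorrow" situation described above.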

Validity

Validity concerns the meaningfulness and appropriateness of the interpretations we make based on a test score. A test is valid when it indeed tests what we intend to test, and when we can confidently interpret the test score as a representation of the test-taker's underlying language ability. If there is no validity, we cannot generalize our interpretations to the Target Language Use (TLU) domain. If we can't generalize a test score beyond the test itself, it is not very useful. In other words, without validity, tests are useless.

To ensure validity, we must look at the characteristics of the test task and the construct definition. Test task characteristics are important because they must match with the TLU domain tasks' characteristics. They must test the test-takers' language ability. This is possible only when you have defined the construct to be measured in clear terms.

Thursday, 25 May 2017

Characteristics of Tasks for Language Tests

Like all language learning and teaching tasks, language test tasks also have characteristics. But why talk about task characteristics in the context of tests? We do so for the following reasons.

Uses of Task Characteristics

Knowing the characteristics of tasks will help us link test and non-test tasks better. That is, if we know the characteristics of a task, we will be able to see if test tasks reflect non-test tasks. In other words, we could ensure that the task we use in a test is much like a task in real life situations.
Knowing the characteristics of tasks will give us information about what language ability of the test-taker is engaged while performing test/non-test tasks.
Knowing the characteristics of tasks will help us establish authenticity of test tasks. If the test task characteristics correspond to Target Language Use (TLU) task characteristics, we have an authentic test task.
If we know task characteristics, we will be able to control them while designing test tasks.

Testing must be a clear, transparent process. It is important that the test-taker understands how to perform, what performance is expected, how the performance will be rated and how the result will be used.

In order to talk about task characteristics in the context of language assessment, we must first define language use tasks. Language use tasks are the tasks used in language tests to gather information about the test-taker's language abilities. They are situated in particular contexts, are goal oriented and involve the active participation of the test-taker(s).

The TLU domain is the set of language use tasks that the test-taker might encounter outside the testing situation, to which we want to generalize our inferences about language abilities/skills.

For our purposes, we can look at language use as a set of language use tasks, and at a language test as a procedure to elicit instances of language use from which inferences can be made about the test-taker's language abilities.

Characteristics of Test Tasks

Task characteristics have a very clear influence on task performance. When our intention is to elicit the best performance from test-takers, we ought to consider task characteristics so that the test tasks are best suited to elicit that performance. Since each test task is a bundle of characteristics, we need a framework to understand them clearly. Bachman and Palmer (1996) propose the following framework to understand and use task characteristics for test development and design.

The framework intends to help us base test tasks on TLU tasks, ensure comparability of test and non-test tasks, and ensure authenticity. The elements of the framework are:

1. Setting

Setting implies physical circumstances. It has three elements.

Physical settings (place, light, furniture, etc.),
Participants (administrators, other participants in group tasks, etc.) and
Time of task (conducted at what time, when test-takers are fresh/tired, etc.)

2. Test Rubric

Test rubric talks about structure and procedures of the test. Elements are:

Instructions: explicit so that test-taker is informed how to take the test, how it is scored, and how scores are used; Language of instruction, its presentation and specification of procedures must be conducive.
Structure: how parts are put together to form the entire test.
Time allotment for each item, and the entire test.
Scoring method: Criteria of correctness and scoring procedure; both must be communicated explicitly and clearly to the test-taker.

3. Characteristics of Input

Elements are:

Format
Channel- aural, visual or both
Form- language, non-language or both
Language- native, target or both languages
Length of input texts
Type of input- item or prompt
Degree of speededness- how fast the testee must process the input
Vehicle- how the input is delivered: live, reproduced or both
Language of input- organisational (grammar, vocabulary, syntax, morphology, etc.) and pragmatic (functional and sociolinguistic) characteristics, and topical (personal, cultural, social information) characteristics.

4. Characteristics of expected response

Elements are:

Format
Type of response expected: selected, limited production or extended production
Degree of speededness- time available/needed to process
Language- native, target or both languages

5. Relationship between Input and Expected Response

Elements are:

Reactivity: how input or response directly influences subsequent input/responses

reciprocal tasks: with an interlocutor- has feedback and interaction
non-reciprocal tasks: no feedback or interaction
adaptive tests: new development. Subsequent tasks are varied in difficulty depending on previous response

Scope of relationship: The amount of language to be processed in order to respond as expected

broad scope- like in a prompt question
narrow scope- needs to process only limited amount of available input.

Directness of relationship: whether expected response is based directly on input or also on other background information/knowledge

Direct
Indirect

Application of this Framework: To compare TLU and test task characteristics, and to create new tasks by assembling different task characteristics.
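To make the comparison concrete, here is a minimal sketch of how a TLU task and a test task could be described with a handful of the characteristics listed above and then checked against each other. The class, its field names and the two example tasks are assumptions made for illustration. They are not part of Bachman and Palmer's framework, which covers many more characteristics than the six used here.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class TaskCharacteristics:
    """A small, hypothetical subset of the framework's characteristics."""
    channel: str         # input channel: "aural", "visual" or "both"
    input_language: str  # "native", "target" or "both"
    response_type: str   # "selected", "limited production" or "extended production"
    reactivity: str      # "reciprocal", "non-reciprocal" or "adaptive"
    scope: str           # "broad" or "narrow"
    directness: str      # "direct" or "indirect"


def mismatched(tlu, test):
    """Return the names of the characteristics on which the two tasks differ."""
    return [f.name for f in fields(tlu)
            if getattr(tlu, f.name) != getattr(test, f.name)]


# Hypothetical TLU task: a face-to-face conversation in the target language
tlu_task = TaskCharacteristics(
    "aural", "target", "extended production", "reciprocal", "broad", "indirect")

# Hypothetical test task: a multiple-choice reading item
test_task = TaskCharacteristics(
    "visual", "target", "selected", "non-reciprocal", "narrow", "direct")

print(mismatched(tlu_task, test_task))
# → ['channel', 'response_type', 'reactivity', 'scope', 'directness']
```

The longer the list of mismatches, the weaker the correspondence between the test task and the TLU task. A test designer could use such a list to decide which characteristics of the test task to change.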

Summary from Bachman and Palmer (1996)

Wednesday, 24 May 2017

Six qualities of Test Usefulness by Bachman and Palmer

In their book Language Testing in Practice: Designing and Developing Useful Language Tests, Bachman and Palmer (1996) devote an entire chapter to Test Usefulness. They do so because usefulness is the foremost quality of a test. If the test is not useful, there is no point in having it at all.

Test Usefulness has six qualities or elements. They are discussed in brief below.

1. Reliability
Reliability is about consistency. Two versions of a test must provide comparable scores. Two sets of test takers of comparable language abilities must receive comparable scores. The same test administered after a period of time must also deliver comparable results. If the test provides very different scores under these circumstances, we cannot trust or rely upon it. Though it is not possible to attain 100% reliability at all times, reliability is a necessary quality of any good test.

2. Construct Validity
We interpret the score of a test. We make decisions based on the score of a test. We promote students, rank them, etc. based on test scores. How do we justify such actions? It is not enough just to claim that our judgments or decisions are justified. We ought to demonstrate it. Construct validity implies that the test score reflects the areas of language ability the test claims to measure. To do this, the test must define what constructs are measured in it. Construct validity can be defined as the extent to which the test score can be interpreted as an indicator of the abilities or constructs of the test taker that we measure. Another way of defining it is 'the correspondence between the characteristics of the test task and the Target Language Use (TLU) task to which we want to generalise our test score'. Therefore, to define construct validity, we need to define TLU task characteristics and the constructs to be measured.
What ensures construct validity? a) correspondence of the characteristics of the test task to those of TLU tasks, b) engagement of testees' language abilities by the test task characteristics.
Note that construct validation is an ongoing process, and that no interpretation is absolutely valid.

3. Authenticity
Authenticity is the correspondence between TLU task performance and test task performance. By ensuring that TLU and test tasks have the same characteristics, we can ensure authenticity. Generalisation beyond test tasks depends on authenticity. Therefore, this is a critical quality of test tasks.

4. Interactiveness
It is the extent and type of involvement of the test takers' individual characteristics in accomplishing a test task. Three major individual characteristics are language ability, topical knowledge and affective schemata. These characteristics interact with the test task characteristics. This interaction can be controlled by monitoring test task characteristics.

5. Impact
A test has an impact on test takers, teachers, and society in general. For test takers, the impact lies in the experience of taking the test, preparing for it, receiving feedback and facing the decisions based on the test score. For teachers, a test might mean a change or adjustment of instruction style, teaching materials, assessment and feedback. For society at large, test methods imply allocation of funds, changes in decisions, arrangement of other facilities, infrastructure, etc. Characteristics of the testing situation are also important factors.

6. Practicality
Practicality is about the feasibility of a test. The elements influencing this will be human resources, material resources and time. If one of these is not available, the test may not be practical.

For each testing situation/context we need to have a balance of these six elements so that the test is useful. Take one of these elements away, and the test becomes useless. For example, if all the first five qualities are there in a test, but the test is not practical, the test is not useful. Therefore, while designing or adapting tests, we need to carefully consider all these test qualities.
