Scientific article: 'Limitations and Possibilities of Task-Based Language Assessment' (Linguistics and Literary Studies)

CC BY
Keywords: TASK-BASED LANGUAGE ASSESSMENT / LIMITATIONS / POSSIBILITIES

Abstract of a scientific article on linguistics and literary studies. Author: Zhao Xiaojing

The article examines the features of task-based language assessment (TBLA). The limitations of the method and its shortcomings are discussed.




УДК 004.43

LIMITATIONS AND POSSIBILITIES OF TASK-BASED LANGUAGE ASSESSMENT 1

1. Introduction

Over the last two decades, communicative language teaching has aroused wide interest among educators. Correspondingly, great emphasis has been laid on improving students' ability to use a second language as a tool to solve real-life problems in different situations. Task-based instruction, an approach oriented towards the achievement of communicative goals in the real world, is widely applied as a representative of communicative teaching. A reflection of the development of this pedagogical approach is the popularity of performance language assessment, among which task-based language assessment (TBLA), characterized by its use of real-world tasks "as the fundamental units of analysis" (Long and Norris, 2001:600), is attracting more and more attention in the area of language testing. The limitations as well as the great potential of TBLA have been of great concern for language test designers and researchers.

With an attempt to explore the characteristics of TBLA and to detect its limitations as well as its possibilities, this essay will firstly identify what has been

1 This is a working paper for a Jiangxi provincial teaching and research project.

reported in the literature concerning the construct of TBLA. Then some issues core to the selection of tasks and task difficulty will be introduced. The fourth and fifth sections concern two basic issues: the validity and reliability of TBLA. Inherently related as they are, they will be explored separately. The sixth section will shed light on practicability concerns of TBLA. The final section will dwell on the potential wash-back effect of TBLA on language teaching. The purpose of this essay, then, is to explore the core features of TBLA, its limitations as well as its possibilities, to provide food for thought for both language teaching and language testing research.

2. Construct of TBLA

The most important question to be asked of any test, according to Alderson (1981), is what it is measuring. Therefore, it is not surprising that the construct of TBLA is of particular concern. Is there any difference between the construct of TBLA and that of traditional language tests?

Conventional test methods, be they psychometric language tests represented by the multiple-choice format, or interrogative language tests represented by the cloze and dictation formats, intend to measure candidates' established knowledge about the target language rather than the actual use of that knowledge. Psychometric tests typically assess candidates' knowledge of discrete skills such as vocabulary, grammar, etc., but one may argue that interrogative language tests, while presenting test-takers with a wide range of structural and lexical items in a meaningful context, tend to elicit holistic L2 competence in using the language.

However, although interrogative language tests are helpful in determining the basic level of language proficiency of a given candidate, they fail to provide any proof of the test takers' ability to actually use the language (Morrow, 1981).

Unlike the conventional tests mentioned above, TBLA focuses on assessing candidates' direct use of language. When it comes to the construct of TBLA, things tend to be a bit complicated. There are two distinctive approaches to TBLA development: one is the construct-centred approach, the other is the task-centred approach (Bachman, 2002).

On the one hand, advocates of the construct-centred approach believe that the inferences to be made from TBLA are about underlying language ability. Deville (2001) believes that in designing language assessment tasks, language testers and researchers need to include the knowledge and skills that underlie the language construct. Such specifications should be informed by theory and research on the language construct and the language-learning process, as well as by systematic

observations of the particulars in a given context (ibid: 225).

On the other hand, proponents of the task-centred approach hold the view that TBLA is neither to measure candidates' display of discrete linguistic knowledge, nor to elicit learners' defined levels of language ability, but to measure their overall ability to use the language. To be more exact, the objective of TBLA is to measure whether students can use the target language to accomplish target tasks, or, "the construct of interest is performance of the task itself" (Long and Norris, 2001:600).

The distinction between these two approaches, as is evident, points to two ways of interpreting test scores. What type of inference will a test user or stakeholder make: language proficiency elicited from the task, or the degree of success in accomplishing certain tasks? Considering the complex nature of real-world tasks, which approach to choose and what inferences to make becomes a big issue. This validity issue will be re-examined in Section Four of this essay. I would like to point out that throughout the TBLA literature, the task-centred approach seems to have attracted increasing attention among test researchers, and great effort has been made to investigate the construct validity of this approach.

However revolutionary this approach looks, to build one's judgement simply on the degree of task accomplishment seems to be an over-simplistic view of the construct of TBLA. One of the potential problems of this approach, according to Bachman (2002), is that it may make it impossible to draw inferences beyond the test result itself. Moreover, many educational purposes, such as diagnosis and assessing the achievement of learning objectives, remain unattainable through this approach. Another, perhaps more fundamental, problem raised by Bachman concerns the kind of inferences the proponents of the task-centred approach claim to make: inferences that predict future performance on real-world tasks. He argues that because of the complexity and diversity of real-life domains, it is impossible to generalize across tasks. Bachman puts forward an integrated approach 2, that is, to base our assessment on both tasks and constructs in the way they are designed, developed and used. Bachman argues that such an integrated approach makes it possible for test users to make a variety of inferences about the capacity for language use that test-takers have, or about what they can or cannot do (ibid: 471).

2 Bachman, at the same time, admits that task specification will present a challenge to such an integrated approach, which indicates that this is no easy job, either.


Bachman's integrated approach seems reasonable since, under most circumstances, the inferences made by test users combine judgements about the candidates' language competence and their ability to fulfil certain tasks. Where an assessment is developed to meet the needs of a certain specific domain, and where needs analysis 3 is thoroughly carried out, the task-centred approach is still recommendable. One example: to measure international air traffic controllers' ability to identify numbers and directions in spoken discourse 4, the purpose of assessment is very specific, namely to measure whether candidates have the ability to accomplish a certain task. In this case, the task-centred approach alone is absolutely sufficient.

3. Selection of tasks and task difficulty

3.1. Selection of tasks

Regardless of which approach is taken, in designing a TBLA the selection of tasks becomes a key issue. The particular tasks we provide in our assessment will form the basis for one part of a validity argument, that is, content relevance and representativeness (Bachman, 2002). What tasks are better at eliciting test takers' best performance to predict their future real-world task accomplishment? Performance on which tasks is predictive of performance in real-world situations? How confident are we when we make such decisions?

Answering these questions proves no easy job, as it involves a careful analysis of task characteristics. Problems raised by Ellis (2003), including (1) representativeness; (2) authenticity; (3) generalizability; (4) inseparability; and (5) reliability, pose a lot of limitations on the selection of tasks. The needs analysis suggested by Branden et al (2002) seems to be of great help in solving these selection problems.

Branden describes the development of TIBO, a computerised task-based test of L2 5 Dutch for vocational training purposes 6. In the procedure of selecting the task situations, needs analysis was highlighted. Comments by experts involved in this vocational training, such as counsellors, language teachers and vocational trainers, served as the basis for the selection of situations and tasks in TIBO. Certain situations were then eliminated for practical, economic and methodological reasons.
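To make such a selection procedure concrete, here is a minimal sketch, with entirely invented task names, ratings and thresholds, of how expert relevance judgements and practicality flags might be combined to filter candidate tasks. It mimics the spirit of the TIBO procedure described above, not its actual implementation.

```python
# Hypothetical needs-analysis-driven task selection step. All data
# structures, field names and thresholds here are invented for
# illustration only.

# Each candidate task situation carries an average relevance rating
# from experts (counsellors, language teachers, vocational trainers)
# plus practicality flags used for elimination.
candidate_tasks = [
    {"name": "follow safety instructions", "expert_relevance": 4.6,
     "practical": True, "economical": True},
    {"name": "read machine manual", "expert_relevance": 4.1,
     "practical": True, "economical": False},
    {"name": "phone a supplier", "expert_relevance": 2.8,
     "practical": False, "economical": True},
]

RELEVANCE_CUTOFF = 4.0  # assumed threshold for "relevant enough"

# Keep tasks the experts judge relevant AND that survive the
# practicality/economy elimination round.
selected = [
    t for t in candidate_tasks
    if t["expert_relevance"] >= RELEVANCE_CUTOFF
    and t["practical"] and t["economical"]
]

for t in selected:
    print(t["name"])  # -> follow safety instructions
```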

Branden et al's approach seems informative in that, by means of needs analysis, chances are

3 The details of needs analysis will be described in Section 3.1.

4 This example is originally from Alderson et al (1995:150).

5 Second language.

6 This program will be discussed again in Section 4.3 concerning predictive validity.

that we may successfully select the most representative tasks. Indeed, the tasks in TIBO may well represent the tasks those candidates will be expected to accomplish in their future vocational training course. Needs analysis is functional in selecting tasks within a specific domain 7. However, needs analysis may not ultimately settle the whole issue of task selection. Difficulty may arise where test takers come from a variety of backgrounds and have broadly ranging needs in real-world situations.

3.2. Task difficulty

In designing TBLA, another big issue is task difficulty. Fulcher (1996) collected students' perceptions of several tasks. One interesting finding is that students of lower ability hold a more positive opinion of the validity of a certain task, but as language ability increases, students' perceptions of validity decrease. From this finding, one may reasonably assume that there must be a relationship between task type, task difficulty and students' ability. This is a point that arouses great interest among researchers.

Two general approaches to predicting task difficulty have been identified. One is the methodological approach; the other is the conceptual approach. The former identifies a number of task features that are independent of ability and then investigates the relationships between these characteristics and empirical indicators of difficulty (Bachman, 2002).

Brindley and Slatyer (2002) investigate the comparability problem 8 in the context of the Certificates in Spoken and Written English (CSWE), a framework used within the Adult Migrant English Program (AMEP) in Australia. They aim to design tasks that elicit the same behaviours and plan to administer the tasks in the same way in order to assess the same competence. In this study, the variables that affect test scores are carefully examined and selected as follows: speech rate, text type, number of hearings, input source (live vs audio-recorded) and item format. Three assessment tasks are used, various combinations of which were administered to 284 ESL learners. The study observes that two variables, speech rate and item format, influence item difficulty. However, as Brindley and Slatyer point out, due to the complexity of the interactions between task characteristics, item characteristics and candidate responses, simply adjusting one task-level variable will not automatically make the task easier or more difficult.

7 In Branden et al (2002), the specific task domain is the vocational training course in the industrial sector.

8 Comparability problems refer to the lack of generalizability between tasks.
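To see what lies behind findings of this kind, the following minimal sketch (with invented response data) computes classical item difficulty, i.e. the proportion of correct responses, separately for two speech-rate versions of the same listening item. It is a generic illustration of the analysis idea, not Brindley and Slatyer's actual method.

```python
# Classical item difficulty (facility value) by task condition.
# The 0/1 scores below are invented for illustration.

responses = {
    # condition -> scores from different test takers (1 = correct)
    "slow_rate": [1, 1, 1, 0, 1, 1, 0, 1],
    "fast_rate": [1, 0, 0, 1, 0, 1, 0, 0],
}

for condition, scores in responses.items():
    p = sum(scores) / len(scores)  # higher p = easier item
    print(f"{condition}: p = {p:.2f}")

# A gap between the two p-values suggests the task-level variable
# (speech rate) influences item difficulty - though, as noted above,
# interactions with other characteristics mean one variable alone
# rarely tells the whole story.
```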

A different study, conducted by Elder et al (2002), investigates whether performance conditions have a great impact on candidates' perceptions of task difficulty, and whether different manipulations of tasks influence their language performance. However, the results indicate no systematic variability between task conditions and perceptions of task difficulty.

The alternative, conceptual approach is to explicitly identify "difficulty features", essentially combinations of ability requirements and task characteristics that are hypothesized to affect the difficulty of a given task (Bachman, 2002). A task difficulty matrix was designed by Norris et al (1998) to explore ways of differentiating and sequencing assessment tasks according to their difficulty levels. As illustrated in Table 1, this difficulty matrix involves combinations of three variables representing code complexity, cognitive complexity and communicative demand 10.

Table 1. Assessment of language performance: task difficulty matrix (adapted from Norris et al 1998: 77)

Component             | Characteristic (easy → difficult) | Characteristic (easy → difficult)
code complexity       | range: - +                        | number of input sources: - +
cognitive complexity  | input/output organization: - +    | input availability: - +
communicative demand  | mode: - +                         | response level: - +

Note: In the task difficulty matrix, a minus sign always indicates less difficulty with respect to a component and characteristic relative to the given task, whereas a plus sign always indicates greater difficulty.

A project concerning language processing complexity was conducted by Norris et al (2002). One of the objectives of this project was to adapt notions of cognitive complexity into a framework, the so-called task difficulty matrix of Table 1, for making quick estimates of "task difficulty" in order to generalize across tasks. Unfortunately, the study suggests that estimates of cognitive task difficulty were not systematically related to task success (Norris, 2002).
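Despite that negative finding, it may still help to see what such a quick estimate looks like in practice. The sketch below encodes the six characteristics of Table 1 for a hypothetical task and counts the plus signs; the encoding and the simple counting rule are illustrative simplifications, not the authors' actual instrument.

```python
# Coarse task difficulty estimate from the Table 1 matrix: each of the
# six characteristics is marked "-" (easier) or "+" (harder) for a given
# task, and the pluses are counted. Profile values are invented.

task_profile = {
    # code complexity
    "range": "+", "number_of_input_sources": "-",
    # cognitive complexity
    "input_output_organization": "+", "input_availability": "+",
    # communicative demand
    "mode": "-", "response_level": "+",
}

difficulty_estimate = sum(1 for v in task_profile.values() if v == "+")
print(f"estimated difficulty: {difficulty_estimate} / {len(task_profile)}")
# -> estimated difficulty: 4 / 6
```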

All this points towards the fact that difficulty is not a characteristic that resides in the task alone; it is the result of the interaction of different variances (Bachman, 2002). In designing a TBLA, task features and the characteristics of testees must be carefully examined. Every effort must be made to eliminate possible validity-irrelevant variances.

4. Validity concerns

Test validation, which used to place emphasis on the test items themselves as the basis for validation, has evolved towards construct-based investigations that focus on test score interpretation and use (Deville, 2001:211).

How can trustworthy inferences be made from the scores? Will the scores effectively predict candidates' future performance? Or shall we generalize from the test scores to the language competence of the candidates? Since construct concerns were reported in Section Two, the following parts will examine the face validity, content validity and predictive validity of TBLA, although these aspects overlap to some extent.

4.1. Face validity

Compared with other tests, TBLA, which puts its candidates directly into simulated real-world activities, is better at measuring candidates' competence in accomplishing real-world tasks. That is, TBLA is perceived as an approach facilitating a close correlation between what test takers have to do during the test and what they usually do in the real world, thus retaining high face validity.

Moreover, stakeholders and test users may well feel that TBLA provides them with a more trustworthy assessment, since it directly measures candidates' overall task fulfilment rather than discrete-point linguistic knowledge.

4.2. Content validity

In terms of content validity, things become complicated. On the one hand, TBLA reflects the syllabus in most cases in that, theoretically, most linguistic syllabi place great importance on students' communicative competence. Although some syllabi do give priority to discrete-point instruction focusing on grammar and structure, the final goal of such instruction is, without exception, the use of the language. The tasks in TBLA still elicit linguistic dimensions such as accuracy, complexity and fluency (see Skehan, 1998) that contribute to the accomplishment of the given tasks. Three attributes of tasks, namely learner-centred properties, contextualisation and authenticity, are put forward by Deville (2001) as task features identified in the L2 instructional field. She goes on to argue that these three attributes, which pertain to test design and construction, address content validity issues 11.

10 Norris et al base these three dimensions on Skehan's framework of task complexity (Skehan, 1998: 99).

11 However, Deville insists that content validation needs to be complemented with construct validity research.

On the other hand, since the content of TBLA is expected to reflect real-world communication, whether the selected test tasks are representative of a wide range of content domains remains an issue. Several problems in investigating content validity have been identified by researchers. They result, firstly, from the difficulty of defining the target language use (TLU) domain and, secondly, from the difficulty of selecting representative samples from that domain (Bachman, 2002). One may notice that this issue has much to do with what we described regarding the selection of tasks, for which the needs-analysis approach was recommended.

The degree of content validity determines how accurate a prediction can be made. In this sense, content domain variances also affect predictive validity, the aspect we illustrate next.

4.3. Predictive validity

TBLA meets its challenge when it comes to predictive validity. As mentioned in Section Two, TBLA is supposed to measure test takers' future performance; that is, it can serve as a predictor of how well a test taker will use the target language in a specific L2 domain. Norris et al (2002) collected performance data from 90 examinees on 13 tasks reflecting a variety of general domains of language use, trying to evaluate the predictive value of examinees' competence in accomplishing certain tasks. The findings of this study indicate that TBLA could inform intended inferences about examinees' likely abilities within a domain of tasks. However, the findings cannot serve as a trustworthy basis for predicting examinees' likely ability on other tasks. The problem lies in the fact that the study fails to find any relationship between task difficulty levels and success in accomplishing the tasks, an issue we discussed in Section Three.

It may be interesting to re-examine Branden et al's (2002) program, TIBO 12. In this program, the researchers designed a series of tasks based on needs analysis and subjected them to pilot testing. Subjects comprised three groups: one target group and two comparison groups, one of beginners and one of advanced learners. The pilot test scores do suggest a tendency to discriminate the beginners' group from the advanced group. However, they do not perfectly discriminate the three groups, thus allowing no firm conclusion about the predictive value of the pass/fail score. A follow-up study, comparing the test score with actual performance in the vocational training course, is suggested by Branden et al to explore the predictive validity of TIBO.

12 This program was mentioned in Section 3.1.

5. Reliability concerns

Although any assessment inevitably includes some degree of error, the ultimate goal of language test researchers is to provide test users and test takers with reliable results. That is, test designers must make every effort to minimize the sources of error to the greatest possible degree, in order to ensure that the test result is a reflection of test takers' competence rather than of anything else.

With regard to TBLA, a number of sources affect the reliability of the assessment. Some task variables are related to the administration of the test procedure. Others have to do with performance conditions or with task characteristics, as we examined in Section Three when discussing task difficulty. Wigglesworth (2001) concludes that even small changes in the characteristics and/or conditions of a task can be shown to influence the score obtained, which means that test designers must take very precise task parameters into consideration in order to provide fair treatment for every test taker. Besides factors related to the task itself, two factors one has to take into account are the marking criteria and inter-rater and intra-rater reliability, since TBLA is mostly built on subjective judgement.

5.1. Inter-rater and intra-rater reliability

Given that TBLA is basically a subjective procedure, a major problem in TBLA will probably be how to guarantee inter-rater and intra-rater marking reliability. To improve inter-rater reliability, it is recommendable for two or more raters to assess the same task, in order to balance potential bias towards either end of the scale. Some constructive suggestions are raised by Brindley (1994). The first is to provide sufficient training to the raters. Such rater training involves familiarizing raters with the rating criteria and giving them practice in applying the criteria to samples of performances across a range of ability levels. According to Brindley, such training will to a large extent ensure reliability; at the same time, however, he points out that a rater's tendency towards severity or leniency in judgements seems to remain unchanged even after rater training. This is a factor that affects reliability and cannot easily be accounted for or eliminated from the judgements, and it certainly poses a threat to test reliability. To solve this problem, Brindley puts forward a further suggestion: to make use of measurement techniques. One such tool is item response theory (IRT); the other is the multi-faceted Rasch model.
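Before turning to those techniques, it may help to see how raw agreement between two raters can be quantified at all. The sketch below (with invented band ratings) computes Cohen's kappa, a standard chance-corrected agreement index; it is offered as a generic illustration of inter-rater reliability statistics, not as a procedure from Brindley (1994).

```python
# Cohen's kappa for two raters scoring the same candidates on a band
# scale. The ratings are invented; real studies use far more candidates.

from collections import Counter

rater_a = [3, 4, 2, 5, 3, 4, 2, 3, 4, 5]
rater_b = [3, 4, 3, 5, 2, 4, 2, 3, 5, 5]

n = len(rater_a)
# Observed agreement: proportion of candidates given identical bands.
observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Expected chance agreement, from each rater's marginal distribution.
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)

kappa = (observed - expected) / (1 - expected)
print(f"observed = {observed:.2f}, kappa = {kappa:.2f}")
# -> observed = 0.70, kappa = 0.60
```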

Measurement using IRT is designed to cope with the perception of difficulty 13. Based on probability theory, IRT models the probability of a given person getting a particular item right (Alderson et al, 1995:90). The multi-faceted Rasch model, a member of the IRT family of techniques, allows candidate ability and item difficulty to be estimated independently and reported, so that candidate ability estimates are adjusted to take account of a rater's tendency to rate either harshly or leniently (Brindley, 1994).
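For readers unfamiliar with these models, the standard textbook formulation runs roughly as follows; this is the general form of the models, not a reconstruction of Brindley's own analysis.

```latex
% Basic Rasch model: probability that person n answers item i correctly,
% given ability \theta_n and item difficulty b_i.
P(X_{ni} = 1 \mid \theta_n, b_i) = \frac{e^{\theta_n - b_i}}{1 + e^{\theta_n - b_i}}

% Many-facet extension: a rater-severity facet c_j enters the logit, so
% ability estimates can be adjusted for harsh or lenient raters.
\log \frac{P(X_{nij} = 1)}{P(X_{nij} = 0)} = \theta_n - b_i - c_j
```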

5.2. Assessment criteria

Writing assessment criteria into a TBLA specification is quite demanding. External rating scales (also referred to as "bands") seem to be the most commonly used method of assessing task performance (Ellis, 2003). Defining what someone "can do" (Mendelsohn, 1989, cited by Brindley, 1994:73) means not only describing the tasks but also providing explicit and easy-to-recognize descriptors. Skehan (1998) argues that in writing such descriptors, the dimensions chosen must be connected to some underlying theory of language or of language cognitive processes. His preferred dimensions are accuracy, complexity and fluency. A question raised by Ellis (2003) is how linguistic competency should be specified: should a holistic or an analytic method be adopted? 14

To define linguistic competency, a range of different approaches has been recommended by Brindley (1994). The first approach is "expert judgement", that is, asking expert judges to identify, and sometimes to weight, the key features of learner performance to be assessed. This approach appears to conform to common sense, since experts are generally regarded as those who can provide reliable advice. However, studies indicate that in making such judgements, even experts can seldom arrive at a consensus. Alderson and Lukmani (1989, cited by Brindley, 1994) examine item content in EFL reading tests and find that judges' opinions on a range of factors 15 are far from uniform.

13 As described in Section Three, the same task may be perceived as difficult by students of lower proficiency but as easy by students of higher proficiency.

14 By the analytic method, Ellis means identifying the four language skills for rating separately. He also reports that many performance tests include both methods. Alderson et al (1995) point out that it is up to the institution to decide whether and how to combine these different ratings into an overall mark.

15 Those factors include what particular items are testing, how difficult each item is, and which particular level each item should be assigned to.

By way of conclusion, it may be suggested here that, due to within-group and between-group differences in experts' opinions, expert judgement should not be regarded as the only basis for determining assessment criteria.

Another approach discussed by Brindley is to adopt criteria that already exist, that is, to use existing marking scale descriptors as guidance. However, one issue with this approach is whether such general rating criteria are suitable for defining the particular tasks at hand.

The third approach discussed by Brindley is the genre-based approach, that is, to describe and assess language task performance in a way that is underpinned by a powerful linguistic theory. However, to reach a systematic and trustworthy conclusion, the amount of empirical research work involved is so great that this approach is doomed to meet a practicability challenge.

The research seems to have reached no solution, and it is obviously impractical to wait for a definite answer. The easiest approach for a test designer now is to refer to an existing framework to define language proficiency. The major concern is that established rating scales may be too holistic or too abstract 16 to measure precise differences between test takers, especially those scoring at the borderline. The rating scales of the Common European Framework of Reference for Languages: Learning, teaching, assessment (CEF) 17 seem recommendable. Instead of describing language competence in an abstract way, the descriptors used in the CEF are based on the "can do" principle; thus they may provide both raters and test users with practicable, easy-to-apply criteria.

In all, there are reliability concerns that result in uncertainty in TBLA, but such uncertainty arises with any form of testing that involves subjective judgement. Careful and considerate administration will surely minimize this kind of uncertainty.

6. Practicability of TBLA 18

One reason some test designers hesitate to adopt TBLA may be the difficulty of administering such tests. One issue is that in administering TBLA, even subtle variance may change the

16 For example, some exams adopt criteria such as "native-like proficiency" or "expert users". Such criteria are abstract and disputable in themselves, with no universal standard.

17 The details of the CEF are accessible through the website: http://www.culture2.coe.int/portfolio/documents/0521803136txt.pdf

18 See also Norris et al (1998), Brindley (1994).


inferences drawn from the test results. As mentioned above, administrative variances pose a great threat to the validation of TBLA. The other aspect is that administering TBLA is rather time-consuming and expensive. With a traditional paper-and-pencil test, one teacher may design a test, administer the same test to a group of students at the same time, and rate the papers afterwards. The procedure is simple yet trustworthy. However, when it comes to TBLA, normally one rater has to rate candidates one by one; that is, while one candidate is being tested, the other candidates have to wait outside. Considering that task topics spread easily, chances are that some candidates may obtain the information in advance and prepare accordingly. To prevent such cheating, examiners have to take action to minimize the possibility that one candidate gets test information from others before being tested. One precaution is to design a wide range of topics to reduce how frequently the same task topic appears; another is to employ more hands, either to serve as extra raters 19 or to keep the other candidates in a waiting room. Both ways, evidently, are rather energy-consuming and expensive. Investment in such administration, together with the investment in teacher training for rating reliability mentioned above, constitutes a big practicality problem. In the case of large-scale tests, the problem becomes even more conspicuous.
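A back-of-the-envelope calculation makes the cost contrast vivid. All the numbers below are invented purely for illustration:

```python
# Rough cost comparison: one-by-one task-based interviews versus a
# single-sitting paper-and-pencil test for the same cohort.

candidates = 120
minutes_per_interview = 15
raters_per_interview = 2   # double rating for inter-rater reliability
parallel_stations = 4      # simultaneous interview rooms

interview_rater_minutes = (candidates * minutes_per_interview
                           * raters_per_interview)
wall_clock_hours = candidates * minutes_per_interview / parallel_stations / 60

paper_test_rater_minutes = candidates * 5  # ~5 min to mark each script

print(f"TBLA: {interview_rater_minutes} rater-minutes, "
      f"about {wall_clock_hours:.1f} hours of testing time")
print(f"paper test: {paper_test_rater_minutes} rater-minutes, one sitting")
# -> TBLA: 3600 rater-minutes, about 7.5 hours of testing time
# -> paper test: 600 rater-minutes, one sitting
```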

It might seem that the issue of practicality leads to the conclusion that TBLA is quite undesirable. However, this is not necessarily the case. The potential validity mentioned above and the favourable wash-back gains presented in the following section are more than sufficient to outweigh this practicability disadvantage.

7. Positive wash-back effect of TBLA

In spite of the limitations mentioned above, a number of positive features of TBLA have been perceived by researchers and language teachers as means of promoting "positive wash-back effects of assessment practices on instruction" (Mislevy et al, 2002:477).

Compared with traditional tests, TBLA tends to raise both instructors' and learners' awareness of using language as a tool rather than treating language itself as an end. As a result of such assessment, more importance will undoubtedly be attached to communicative language teaching and learning. Moreover, judging learners' language competence through meaningful use enables both teachers and students to be

19 As mentioned before, this will raise inter-rater reliability concerns.

conscious of the strengths and weaknesses of both teaching and learning. In this way, it provides them with diagnostic feedback and promotes effective teaching and learning.

Ellis (2003) puts forward the idea that TBLA plays an important part in formative assessment 20. He defines TBLA broadly, integrating into it the ongoing contextualized assessment undertaken in the language classroom. He further divides the tasks involved into planned and incidental, thus forming planned formative assessment and incidental formative assessment. The former involves the classroom use of direct tests, while the latter refers to the ad hoc assessment that teachers (and students) carry out as part of performing a task selected for instructional purposes. Both kinds of task-based formative assessment, according to Ellis, will provide explicit and systematic information and guide teaching and learning.

Ellis's proposal that TBLA is a good vehicle for formative assessment is rather convincing, although one may slightly doubt whether Ellis is mixing testing with instruction, because task-based incidental assessment, due to its tentativeness and informality, would better be classified as a reflection of instruction rather than as assessment. Whatever name it is given, one thing is certain: TBLA, if managed well, will to a great extent inspire both teaching and learning.

TBLA, a typical communicative assessment, may sometimes change people's conception of teaching to such an extent that it facilitates communicative instruction or even reshapes the entire curriculum towards a meaning-focused orientation. A case in point is Byrnes (2002), who discusses how a task-based writing assessment system developed by teachers in a German program contributed greatly to an enhanced knowledge base and a new educational culture. In this study, the role of tasks and TBLA is explored to such an extent that, as Byrnes advocates, they "shifted its entire undergraduate curriculum from a form-based normative approach to a language use and language-meaning orientation for instruction" (ibid: 419). Byrnes's study also reminds us that involving practitioners in the development of task-based classroom assessment will most probably make a great difference to their teaching orientation, thus creating considerable positive wash-back.

8. Conclusion

From the above analysis, we may conclude that the value of TBLA is by no means negligible: it provides a means of measuring actual language use directly; it

20 He first confirms the summative role of TBLA.


inspires teachers and learners to focus on language as a tool rather than as fixed knowledge; and it allows both teaching and learning to monitor progress and adjust orientation. At the same time, designing TBLA is no easy job. There are many limitations, ranging from construction to scoring. The design of TBLA therefore requires considerable expertise. Test developers have to make every effort to identify potential threats to test validity and endeavour to eliminate most, if not all, of them. After all, it is up to test developers to provide valid and reliable tests for stakeholders and test users. Only through the hard work of test developers can sound inferences be made from test results.

REFERENCES

1. Alderson J. C. Report of the discussion on communicative language testing // Issues in language testing / ed. Alderson J. C., Hughes A. London : The British Council, 1981.

2. Alderson J. C., Clapham C., Wall D. Language Test Construction and Evaluation. Cambridge : Cambridge University Press, 1995.

3. Bachman L. F. Some reflections on task-based language performance assessment // Language testing. 2002. Vol. 19, № 4. P. 453-476.

4. Branden K., Depauw V., Gysen S. A computerized task-based test of second language Dutch for vocational training purposes // Language testing. 2002. Vol. 19, № 4. P. 438-452.

5. Brindley G. Task-centred assessment in language learning: the promise and the challenge // Language and learning / ed. Bird N., Falvey P., Tsui A. B. M., Allison D. M., McNeill A. 1994. P. 73-94.

6. Brindley G., Slatyer H. Exploring task difficulty in ESL listening assessment // Language testing. 2002. Vol. 19, № 4. P. 369-394.

7. Byrnes H. The role of task and task-based assessment in a content-oriented collegiate foreign language curriculum // Language testing. 2002. Vol. 19, № 4. P. 419-437.

8. Deville M. C. Task-based assessments: characteristics and validity evidence // Researching pedagogic tasks: second language learning, teaching and testing / ed. Bygate M., Skehan P., Swain M. Harlow : Pearson Education Limited, 2001.

9. Elder C., Iwashita N., McNamara T. Estimating the difficulty of oral proficiency tasks: what does the test-taker have to offer? // Language testing. 2002. Vol. 19, № 4. P. 347-368.

10. Ellis R. Task-based language learning and teaching. Oxford : Oxford University Press, 2003.

11. Fulcher G. Testing tasks: issues in task design and the group oral // Language testing. 1996. Vol. 13, № 1. P. 23-51.

12. Long M., Norris J. Task-based language teaching and assessment // Language teaching & learning. London : Routledge, 2001.

13. Mislevy R. J., Steinberg L. S., Almond R. G. Design and analysis in task-based language assessment // Language testing. 2002. Vol. 19, № 4. P. 477-496.

14. Morrow K. Communicative language testing: revolution or evolution? // Issues in language testing / ed. Alderson J. C., Hughes A. London : The British Council, 1981.

15. Norris J. Editorial // Language testing. 2002. Vol. 19, № 4.

16. Designing second language performance tasks / Norris J., Brown J. D., Hudson T., Yoshioka J. Honolulu : University of Hawaii Press, 1998.

17. Examinee abilities and task difficulty in task-based second language performance assessment / Norris J., Brown J. D., Hudson T., Bonk W. // Language testing. 2002. Vol. 19, № 4. P. 395-418.

18. Skehan P. A cognitive approach to language learning. Oxford : Oxford University Press, 1998.

19. Wigglesworth G. Influences on performance in task-based oral assessment // Researching pedagogic tasks: second language learning, teaching and testing / ed. Bygate M., Skehan P., Swain M. Harlow : Pearson Education Limited, 2001.
