Test Adaptation: Methods and Pitfalls

00:32:55
https://www.youtube.com/watch?v=uU0gDMJwocw

Summary

TLDR: Kurt's talk on test adaptation argues that fair assessment across languages requires a comprehensive understanding of linguistic and cultural nuances. Drawing on his experience with several international assessments, he stresses that test adaptation goes beyond mere translation: it requires a deep understanding of the constructs being measured and of the cultural contexts of test-takers. Kurt also addresses ethical obligations, such as obtaining permission before adapting a copyrighted test. He outlines methods for adapting measures and highlights the continuing need for rigorous research to improve the quality and fairness of adapted tests worldwide.

Takeaways

  • 🌍 Test adaptation involves more than translation; cultural context is vital.
  • 📝 Ethical considerations, including copyright, are essential in test usage.
  • 🔍 Back translation may not be the best method for ensuring quality.
  • 📊 Rigorous research is needed to ensure the reliability of adapted tests.
  • 🌐 Language differences can introduce bias in test outcomes.
  • 🤝 Collaboration with local experts enhances adaptation processes.
  • 📚 Clear guidelines help in maintaining fairness across international assessments.
  • 💡 Understanding local education systems is crucial for effective test adaptation.
  • 🎯 Adapted tests should maintain validity and construct equivalence.
  • 🎷 Test adaptations should account for cultural variations in response styles.

Timeline

  • 00:00:00 - 00:05:00

    Amy introduces Kurt Geisinger, director of the Buros Center for Testing at the University of Nebraska, who speaks on test adaptation. Although most of his recent work concerns fairness for language minorities and people with disabilities, he chose this topic partly to promote the International Test Commission meeting in Montreal, and he plans to touch on the past, present, and future of the field.

  • 00:05:00 - 00:10:00

    Kurt shares his early experiences, including an internship at ETS in 1975. He explains why the field speaks of adaptation rather than mere translation and describes cross-cultural testing as a growing field, driven partly by multinational corporations that want to administer the same tests worldwide.

  • 00:10:00 - 00:15:00

    He discusses the need to obtain permission before using a measure, citing a case in which a retired professor billed graduate students upwards of $20,000 each for using his copyrighted scale in their theses and dissertations after he had not answered their requests. Kurt adds that even when a measure is not copyrighted, informing its authors is a matter of professional courtesy.

  • 00:15:00 - 00:20:00

    The talk weighs the pros and cons of adapting tests. Established measures are cost-effective, and globalization creates demand for cross-culturally appropriate instruments; on the other side are copyright issues, the need to norm scores on the target population, and psychometric problems that persist even after careful adaptation. He stresses that adapted tests can raise validity and fairness issues, especially when cultural context shapes how test content is understood.

  • 00:20:00 - 00:25:00

    Kurt discusses cultural variability in performance assessments and presents examples of test adaptations across countries, illustrating that seemingly trivial differences can yield significant variations in outcomes. He addresses the importance of understanding cultural context when comparing test performances across different demographics.

  • 00:25:00 - 00:32:55

    He concludes by emphasizing the need for rigorous research on adapted measures and the potential pitfalls involved in equating assessments across languages. Despite the financial allure for testing companies, he urges caution and thorough investigation into the validity and applicability of adapted tests.
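To make the equating caveat concrete, here is the simplest score-linking method: linear (mean-sigma) linking under a random-groups design, l(x) = mu_S + (sd_S / sd_T) * (x - mu_T). The talk does not specify a linking method; this one is chosen only for illustration, and the function name and score data are hypothetical. The sketch maps a target-language raw score onto the source-language scale by matching the two distributions' means and standard deviations.

```python
from statistics import mean, pstdev


def linear_link(source_totals, target_totals):
    """Return a function that maps target-version raw scores onto the
    source-version scale by matching means and standard deviations.

    Assumes a random-groups design: both samples come from one population,
    so every difference between the distributions is attributed to the forms.
    """
    mu_s, sd_s = mean(source_totals), pstdev(source_totals)
    mu_t, sd_t = mean(target_totals), pstdev(target_totals)
    slope = sd_s / sd_t

    def to_source_scale(x):
        return mu_s + slope * (x - mu_t)

    return to_source_scale


# Hypothetical total scores, for illustration only.
english = [12, 15, 18, 20, 25]
french = [10, 14, 15, 19, 22]
link = linear_link(english, french)
print([round(link(x), 1) for x in french])
```

The assumption in the docstring is the weak point. Examinees cannot be randomly assigned to a language, so the two samples differ in both group and form, and the function silently treats a real group difference as a form difference. That is the same confound the talk raises for DIF, and one reason to be cautious about linking adapted versions.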

Video Q&A

  • What is test adaptation?

    Test adaptation refers to modifying assessments to fit different languages and cultures, ensuring they are relevant and fair across various demographic groups.

  • Why is cultural understanding important in test adaptation?

    Cultural understanding is crucial because it affects how test items are perceived and answered, influencing the fairness and validity of the assessment.

  • What are some challenges in test adaptation?

    Challenges include differences in education systems, cultural biases, translation errors, and copyright issues.

  • What is the difference between translation and adaptation?

    Translation focuses solely on converting language, whereas adaptation considers cultural context and regional relevance.

  • What skills are required for effective test adaptation?

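    Skills include fluency in both languages, a thorough understanding of both cultures, a comprehensive understanding of the constructs being assessed, and test development skills such as item writing. Because these skills are unlikely to be found in one person, Kurt recommends committee approaches.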

  • Why should permission be obtained for test adaptation?

    Obtaining permission respects copyright laws and acknowledges the intellectual property of original test developers.

  • What is 'back translation' in test adaptation?

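    Back translation is a process where a translated test is re-translated back to the original language to check for consistency and accuracy. Kurt cautions that translators who know they will be judged this way may write versions that back-translate well rather than versions that work best in the target language.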

  • What role do ethics play in test adaptation?

    Ethics ensure that adaptations respect the rights of original developers and promote fairness in testing practices across cultures.

  • What are some methodologies for adapting tests?

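    Approaches range from simple translation, through adaptation with checks such as back translation, to committee approaches and concurrent development of versions in several languages. Kurt's 1994 list of steps adds review, revision, pilot testing, field testing, standardization, and validation research.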

  • How can adaptations affect test outcomes?

    Adaptations can introduce biases or discrepancies that affect the reliability and validity of scores across different cultures and languages.
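DIF analysis is the usual way to look for item-level bias of the kind described in the answer above, and the talk discusses Steve Sireci's argument that DIF across language versions is problematic. As a minimal sketch, assuming dichotomous items and the total score as the matching variable, the Mantel-Haenszel statistic on the ETS delta scale (MH D-DIF = -2.35 ln alpha_MH) can be computed as follows. The talk does not name a DIF method; Mantel-Haenszel is a standard one chosen here for illustration, and the record layout is hypothetical.

```python
import math
from collections import defaultdict


def mh_d_dif(records):
    """Mantel-Haenszel D-DIF for one item on the ETS delta scale.

    records: iterable of (group, total_score, correct) tuples, where group
    is "ref" (e.g. source-language examinees) or "focal" (target-language
    examinees), total_score is the matching variable, and correct is 0 or 1.
    Negative values mean the item favors the reference group.
    """
    # One 2x2 table per score level: [ref right, ref wrong, focal right, focal wrong]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for group, total_score, correct in records:
        offset = 0 if group == "ref" else 2  # any other label counts as focal
        tables[total_score][offset + (0 if correct else 1)] += 1

    numerator = denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n    # ref right x focal wrong
        denominator += b * c / n  # ref wrong x focal right
    if numerator == 0 or denominator == 0:
        raise ValueError("common odds ratio is 0 or undefined for these data")
    alpha_mh = numerator / denominator  # MH common odds ratio
    return -2.35 * math.log(alpha_mh)


# Hypothetical responses at one score level: the item is much easier
# for the reference group than for the focal group.
item = [("ref", 5, 1)] * 3 + [("ref", 5, 0)] + [("focal", 5, 1)] + [("focal", 5, 0)] * 3
print(round(mh_d_dif(item), 2))  # prints -5.16
```

Applied across language versions, this number cannot say why an item misbehaves: the reference and focal groups differ both in who they are and in which translation they took, and those two sources are completely confounded. At most, a large value is a reason for bilingual reviewers to look at the item's wording, as with the "swimming feet" example.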

Subtitles
  • 00:00:10
    and I have the pleasure of introducing
  • 00:00:12
    Kurt Geisinger who's the director of
  • 00:00:14
    the Buros Center for Testing and the
  • 00:00:17
    W. C. Meierhenry Distinguished Professor
  • 00:00:19
    at the University of Nebraska to talk to
  • 00:00:22
    us about test adaptation thanks Kurt
  • 00:00:25
    let me thank Amy both for the invitation
  • 00:00:28
    and the introduction ETS and MHS and
  • 00:00:33
    when I was first asked to do the talk I
  • 00:00:35
    was asked whether I was going to talk
  • 00:00:37
    about past present or future and I told
  • 00:00:40
    her past and that was just based on my
  • 00:00:42
    age at the time but but I think I'm
  • 00:00:44
    going to talk about all three and I
  • 00:00:47
    decided to do this most of the work I've
  • 00:00:50
    done in recent years is on fairness
  • 00:00:51
    either for language minorities or for
  • 00:00:54
    people with disabilities and that was my
  • 00:00:57
    initial thought of what I would talk
  • 00:00:58
    about but instead I've decided to talk
  • 00:01:00
    about test adaptation in part because the
  • 00:01:05
    International Test Commission is meeting
  • 00:01:07
    this summer in Montreal and I'm trying to
  • 00:01:08
    drum up support for that and in terms
  • 00:01:12
    of history as Neil started talking
  • 00:01:15
    about when he was in this room I think I
  • 00:01:18
    beat him because I was an intern at ETS
  • 00:01:20
    in 1975 and we had a meeting in here and
  • 00:01:24
    interestingly there's another person who
  • 00:01:27
    was an intern with me that same year I
  • 00:01:30
    was here and it was amazing that they
  • 00:01:31
    allowed Linda Cook to come when she was
  • 00:01:34
    only in junior high school at the time
  • 00:01:38
    but I also was in this room I was in
  • 00:01:42
    with Warren Willingham and then he hired
  • 00:01:44
    me back the next year as a research
  • 00:01:46
    associate and I joined the GRE technical
  • 00:01:51
    advisory committee in 1993 and in this
  • 00:01:55
    room they brought me in a day early
  • 00:01:58
    so they had a session on Warren Willingham
  • 00:02:00
    again when he was stepping down as
  • 00:02:02
    an officer of the organization and that
  • 00:02:05
    was in this room as well so it does have
  • 00:02:07
    some history now I also have a history
  • 00:02:11
    with the division five APA I first
  • 00:02:14
    joined the executive committee of the
  • 00:02:16
    division in 1993 25 years ago as well
  • 00:02:19
    and based on discussions I had with Jim
  • 00:02:25
    Butcher who was then the assessment
  • 00:02:27
    coordinator or whatever it's called
  • 00:02:29
    he was a head of the assessment group he
  • 00:02:32
    and I got into a discussion and he asked
  • 00:02:33
    me to do a paper for Psych Assessment
  • 00:02:36
    which he was editing at the time and I
  • 00:02:38
    did it on the topic of test translation
  • 00:02:41
    and it's the paper that I've done that I
  • 00:02:45
    probably shouldn't have done because I
  • 00:02:47
    had very little history in that topic at the
  • 00:02:49
    time but I was a foreign language major
  • 00:02:51
    as an undergraduate I knew something
  • 00:02:53
    about translation and so that's that's
  • 00:02:57
    what I'm gonna talk about and I'm gonna
  • 00:02:59
    say that some of the logic I'm gonna use
  • 00:03:01
    toward the end of the presentation is
  • 00:03:03
    what I would call journalistic logic I'm
  • 00:03:05
    using examples to make points that run
  • 00:03:08
    counter to the typical method of
  • 00:03:11
    division five but I do tell you that
  • 00:03:16
    second to fifth is when ITC is meeting
  • 00:03:18
    in Montreal and it's at the same time as
  • 00:03:20
    a Jazz Festival there in case you love
  • 00:03:23
    jazz so again I'm going to give a very
  • 00:03:27
    quick overview of test adaptation we use
  • 00:03:29
    the word adaptation rather than
  • 00:03:31
    translation because it's more than
  • 00:03:33
    language you have to involve other
  • 00:03:35
    things other than language and I think I
  • 00:03:37
    will make that point very clearly to you
  • 00:03:39
    with some examples and and what I'm a
  • 00:03:42
    little different about is I've actually
  • 00:03:45
    done some work now on adaptation from
  • 00:03:48
    one language and culture to another of
  • 00:03:50
    some performance assessments through
  • 00:03:52
    OECD and I'm going to show you just how
  • 00:03:56
    difficult it is and and it fits exactly
  • 00:03:59
    in with Neil's comments over the
  • 00:04:02
    difficulty of equating or even linking
  • 00:04:04
    at some point because you start
  • 00:04:06
    questioning whether they're the same
  • 00:04:07
    measures by the time you've done some of
  • 00:04:09
    that why are we doing this kind of work
  • 00:04:14
    well first off there are lots of testing
  • 00:04:17
    companies right now that realize that
  • 00:04:18
    the world has shrunk and there are
  • 00:04:20
    multinational corporations who want
  • 00:04:23
    to administer the same tests all over
  • 00:04:25
    the world we want to make international
  • 00:04:28
    comparisons
  • 00:04:30
    I think our psychological science is
  • 00:04:32
    getting a little stronger that allows us
  • 00:04:34
    to do that in some cases we've
  • 00:04:36
    recognized the differences between etic
  • 00:04:39
    and emic kinds of measures and there are
  • 00:04:42
    a lot of fiscal and pragmatic reasons
  • 00:04:44
    why it may be cheaper and easier to
  • 00:04:46
    adapt the test than it is to build a
  • 00:04:49
    new one now let me give you two
  • 00:04:52
    precursors and you can't read the citation
  • 00:04:56
    there but I got quoted in a Science
  • 00:04:59
    article this past year I was called in
  • 00:05:04
    the interview
  • 00:05:04
    and so forth but it turns out there was
  • 00:05:07
    a retired professor in California who
  • 00:05:10
    had built a test well it's more of a survey
  • 00:05:14
    in the sense of whether you take your
  • 00:05:16
    medications the way you're supposed to
  • 00:05:18
    and it's mostly used by insurance
  • 00:05:20
    companies as part of their tests of
  • 00:05:23
    drugs which they have to do to get
  • 00:05:25
    approval and it turns out after he
  • 00:05:28
    retired he got very sporadic about his
  • 00:05:32
    answering of emails and letters and
  • 00:05:34
    things like that and he had copyrighted
  • 00:05:37
    these scales and he makes it very clear
  • 00:05:39
    that they're very expensive to use
  • 00:05:41
    because after all he's selling them to
  • 00:05:43
    insurance companies but a bunch of grad
  • 00:05:45
    students wrote and said can I use this
  • 00:05:47
    for my master's thesis or doctoral
  • 00:05:49
    dissertation he didn't answer them and
  • 00:05:52
    they went ahead and used it anyhow and
  • 00:05:54
    he then sent them bills for upwards of
  • 00:05:57
    $20,000 each and the question was is
  • 00:06:02
    that appropriate and it's a very complex
  • 00:06:05
    question it's not a simple question
  • 00:06:06
    because it is a copyrighted thing what
  • 00:06:09
    just used it without permission and it
  • 00:06:12
    was his right to do so now he has since
  • 00:06:14
    adjusted and decided that when companies
  • 00:06:16
    use it he's gonna have one rate for
  • 00:06:18
    when students and so forth he's going to
  • 00:06:20
    use another rate but nevertheless this
  • 00:06:22
    is an important issue because there are
  • 00:06:24
    a lot of people that translate tests or
  • 00:06:27
    adapt tests that they don't have the
  • 00:06:29
    right to do so and they don't ask for it
  • 00:06:31
    so I start off by saying if a
  • 00:06:36
    measure is copyrighted and published you
  • 00:06:38
    need to get that permission first and
  • 00:06:40
    even if it's not copyrighted you
  • 00:06:43
    probably should write the authors and
  • 00:06:46
    and at least inform them that you're
  • 00:06:48
    planning to do that I mean that's
  • 00:06:50
    just common courtesy I think
  • 00:06:52
    professional courtesy now why would you
  • 00:06:55
    do it and I've listed pros and cons here
  • 00:06:57
    and in the interest of time I'm gonna
  • 00:06:58
    go through this really fast these
  • 00:07:01
    are established measures
  • 00:07:03
    that makes sense they're cost-effective
  • 00:07:05
    and cheaper as I said globalization
  • 00:07:08
    necessitates cross-culturally
  • 00:07:10
    appropriate measures to fulfill the
  • 00:07:12
    needs to compare evaluate select et
  • 00:07:14
    cetera guidelines and best practice
  • 00:07:16
    research offer more options to test
  • 00:07:18
    users to make informed decisions and to
  • 00:07:20
    reduce negative outcomes the cons are
  • 00:07:22
    that there can be copyright issues and
  • 00:07:24
    country membership requirements
  • 00:07:26
    and I think ETS has dealt with some of
  • 00:07:28
    those copyright issues I know of over
  • 00:07:30
    the years you have to ask if
  • 00:07:34
    the benefits justify the efforts and is
  • 00:07:37
    there a real need to have the same
  • 00:07:39
    measure in a different language that
  • 00:07:41
    fairness and validity of scores for
  • 00:07:42
    target populations and use must be
  • 00:07:44
    normed on the target demographic
  • 00:07:46
    issues and translated assessments even
  • 00:07:50
    with careful adaptation still introduce
  • 00:07:52
    additional negative psychometric and
  • 00:07:54
    cross-cultural issues and one of the
  • 00:07:56
    ways that I learned this is I was
  • 00:07:59
    an expert witness in two court cases in
  • 00:08:01
    Canada 25 years ago and the witness
  • 00:08:04
    on the other side was John Conger who
  • 00:08:06
    some of you know and what happened is
  • 00:08:10
    all their tests were built in English
  • 00:08:11
    but then they had to translate them into
  • 00:08:13
    French because they are a bilingual
  • 00:08:15
    country and the French students or the
  • 00:08:18
    French candidates did about 3 percent
  • 00:08:21
    worse than the english-speaking
  • 00:08:23
    candidates and when I asked why I was
  • 00:08:25
    told it was because the French
  • 00:08:27
    schools are not as good as the English
  • 00:08:28
    schools and later on was told that I was
  • 00:08:31
    indeed right that it was a translation
  • 00:08:33
    issue that the questions made
  • 00:08:36
    more sense in English the way they were
  • 00:08:37
    written first and then they were
  • 00:08:38
    translated they didn't do as well so
  • 00:08:40
    essentially it was a built-in bias
  • 00:08:42
    against the french-speaking candidates
  • 00:08:47
    but we work with OECD which is the publisher
  • 00:08:50
    of PISA and a bunch of other surveys
  • 00:08:52
    and they have done some in recent years
  • 00:08:55
    on critical thinking
  • 00:08:56
    and economics at the higher ed level
  • 00:08:58
    which people are not as familiar with
  • 00:08:59
    and and I've worked on the critical
  • 00:09:01
    thinking one which is what I'm going to
  • 00:09:02
    give you examples of people want to make
  • 00:09:05
    comparisons and I'm gonna make the
  • 00:09:07
    argument that some of those comparisons
  • 00:09:09
    are less sophisticated than we'd like to
  • 00:09:12
    think so what are the skills you need to
  • 00:09:16
    adapt a measure well certainly you need
  • 00:09:17
    to be fluent in both languages you need
  • 00:09:19
    a comprehensive understanding of the
  • 00:09:21
    constructs being assessed you need a
  • 00:09:24
    thorough understanding of both cultures
  • 00:09:26
    you have to have some ability to work on
  • 00:09:28
    testing measures there are skills
  • 00:09:30
    involved in writing items and so forth
  • 00:09:32
    and I'm gonna tell you I gave a keynote
  • 00:09:34
    at the Mexican National Academy of
  • 00:09:37
    assessment a couple of years ago and I
  • 00:09:40
    learned they've taken a very different
  • 00:09:42
    model than the United States has they
  • 00:09:45
    have some 89 languages that are from
  • 00:09:48
    indigenous people that make up only 5.4
  • 00:09:52
    percent of the population but they have
  • 00:09:54
    schools representing about 20 different
  • 00:09:56
    indigenous languages and the decision
  • 00:09:59
    they've made unlike the United
  • 00:10:01
    States is that all indigenous people
  • 00:10:04
    will be taught in their own language and
  • 00:10:06
    tested in their own language so their
  • 00:10:08
    national assessment group has to
  • 00:10:11
    translate all their tests into about 20
  • 00:10:13
    languages besides Spanish and
  • 00:10:18
    instruction and in fact of those 20
  • 00:10:20
    languages 10 of them didn't even have a
  • 00:10:22
    written language so the first thing the
  • 00:10:24
    Mexican government had to do was to
  • 00:10:26
    develop those languages
  • 00:10:28
    before they could even decide
  • 00:10:31
    that they were going to be tested or
  • 00:10:35
    instructed so it's a very different
  • 00:10:38
    model and it's a model that I'm actually
  • 00:10:41
    very comfortable with and I think if we
  • 00:10:43
    were going to build a wall maybe we
  • 00:10:44
    ought to do it the other way to keep us
  • 00:10:47
    from going to Mexico but I also know
  • 00:10:50
    that in South Africa they have 11
  • 00:10:54
    official languages so when they build
  • 00:10:57
    the test they have to build it in those
  • 00:10:58
    11 languages right off the bat now what
  • 00:11:05
    do we want in a translation and
  • 00:11:07
    adaptation well the idea initially
  • 00:11:09
    anyhow was that item difficulty should be
  • 00:11:11
    the same within reason across languages
  • 00:11:13
    that sociolinguistic nuances should be
  • 00:11:15
    removed or avoided content relevance
  • 00:11:18
    that access should be comparable across
  • 00:11:20
    cultures the construct relevance and
  • 00:11:22
    validity should be constant we should
  • 00:11:25
    focus on the defined objectives and the
  • 00:11:27
    purpose that formatting appearance and
  • 00:11:30
    comparable tasks should be the same and
  • 00:11:32
    to avoid really bad practices now to
  • 00:11:36
    give you a sense of this the first study
  • 00:11:38
    I did in this regard was with a graduate
  • 00:11:40
    student many years ago who studied the
  • 00:11:42
    WAIS which is the Wechsler Adult
  • 00:11:44
    Intelligence Scale the initial form was
  • 00:11:47
    translated into Spanish in Puerto
  • 00:11:50
    Rico and for example you may know that in
  • 00:11:53
    giving the WAIS the first test you
  • 00:11:56
    usually gave was the vocabulary and they
  • 00:11:59
    go from easy to hard and that decides
  • 00:12:01
    what you're gonna do well with the
  • 00:12:03
    initial version of the WAIS they
  • 00:12:06
    simply translated these English
  • 00:12:08
    words into Spanish and there was no
  • 00:12:11
    longer any reasonable rank ordering of
  • 00:12:13
    difficulty because once you've done the
  • 00:12:14
    translation but that's how it was it was
  • 00:12:16
    just the same words in the different
  • 00:12:18
    language that's in my mind
  • 00:12:20
    unbelievably bad practice and I'm
  • 00:12:24
    gonna give you two examples Ron Hambleton
  • 00:12:26
    uses frequently one of
  • 00:12:29
    these comes from pieces fourth grade
  • 00:12:32
    science test and the question asked is
  • 00:12:35
    why do ducks swim so well the students
  • 00:12:39
    that do the best on that are the Swedes
  • 00:12:42
    in the world and it turns out when you
  • 00:12:44
    translate webbed feet which is the right
  • 00:12:47
    answer in English into Swedish that's
  • 00:12:50
    swimming feet now there's also another
  • 00:12:55
    question that he's often used that was
  • 00:12:58
    there's a technique which I'm going to
  • 00:12:59
    talk about in a minute but back
  • 00:13:03
    translation where you translate it into a
  • 00:13:03
    new language and you back translate to
  • 00:13:05
    see how it looks and how comparable it
  • 00:13:07
    is and it was essentially an analogy
  • 00:13:10
    question that was out of sight :
  • 00:13:13
    out of mind translated back that comes
  • 00:13:17
    to blind and insane so you can see this
  • 00:13:24
    is not everybody in test construction
  • 00:13:28
    knows that test construction is both art
  • 00:13:28
    and science and like Neil was just
  • 00:13:31
    talking about the science part of it I'm
  • 00:13:32
    going to talk more about the art part of
  • 00:13:34
    it because that's that's what we're
  • 00:13:35
    talking about I mean among the
  • 00:13:37
    translation processes you can have a
  • 00:13:39
    simple translation which is what a lot
  • 00:13:40
    of tests like the WAIS used initially
  • 00:13:46
    you can have adaptation with checks and
  • 00:13:46
    that's where usually you do this kind of
  • 00:13:48
    a back translation just decided what to
  • 00:13:50
    do and I just looked up back translation
  • 00:13:54
    and it appears it was first developed as a
  • 00:13:59
    technique by Brislin in 1970 so it's
  • 00:13:59
    been around for a while there are people
  • 00:14:01
    when I edited the handbook of assessment
  • 00:14:06
    psychology there are people that had
  • 00:14:09
    chapters in there like butcher
  • 00:14:10
    still say that back translation is the
  • 00:14:12
    state-of-the-art most people would
  • 00:14:14
    disagree with that now simply because if
  • 00:14:18
    you're a translator and you know you're
  • 00:14:20
    going to be evaluated by the quality of
  • 00:14:22
    your translation what happens is you
  • 00:14:25
    translate the question not to be optimal
  • 00:14:28
    in the target language but to be
  • 00:14:30
    optimally translated back to the
  • 00:14:32
    original language and those are two very
  • 00:14:34
    different things okay so back
  • 00:14:38
    translation has some problems in the
  • 00:14:41
    article that I wrote in psych assessment
  • 00:14:43
    I argued that those skills that I listed
  • 00:14:45
    are unlikely to be found well in one
  • 00:14:47
    person and said you need committee
  • 00:14:48
    approaches to doing this this has to be
  • 00:14:50
    done by more than one person and Kadriye
  • 00:14:56
    Ercikan who is also a vice president here
  • 00:14:58
    at ETS wrote a chapter for my
  • 00:15:01
    handbook on concurrent ways of doing
  • 00:15:03
    this which is what OECD is trying to
  • 00:15:03
    do this where you build the tests in
  • 00:15:08
    different languages at the same
  • 00:15:08
    time basically it doesn't work for
  • 00:15:13
    pre-existing measures which a lot of the
  • 00:15:14
    test translation work is done on
  • 00:15:20
    measures that achieve a certain amount
  • 00:15:22
    of notoriety typically in the
  • 00:15:22
    initial language usually English but in
  • 00:15:25
    this concurrent model what happens is
  • 00:15:28
    you develop two forms at the same time
  • 00:15:30
    you have groups working together that
  • 00:15:33
    they work with a shell it's malleable so
  • 00:15:36
    that they can change it as they go now
  • 00:15:38
    if you can imagine two committees doing
  • 00:15:41
    that that's not so hard but if you start
  • 00:15:43
    thinking about Mexico when you think
  • 00:15:45
    about 89 committees doing that it's it's
  • 00:15:47
    unimaginable in my mind you know or even
  • 00:15:50
    11 perhaps in South Africa so it's very
  • 00:15:53
    difficult once you get more than two now
  • 00:15:57
    one of the things we forget about is
  • 00:15:58
    culture and culture has a big impact
  • 00:16:01
    especially when you get into personality
  • 00:16:03
    variables and things like that but the
  • 00:16:04
    examples I'm going to give
  • 00:16:05
    journalistically in performance
  • 00:16:08
    assessments I would argue that there's a
  • 00:16:09
    lot of cultural issues that affect those
  • 00:16:12
    responses too we heard an earlier talk
  • 00:16:16
    that length is one of the big
  • 00:16:18
    characteristics that assess the quality
  • 00:16:20
    of essays well length might be a very
  • 00:16:23
    culturally dependent kind of variable as
  • 00:16:24
    an example so if you're going across
  • 00:16:27
    languages or cultures you might find big
  • 00:16:29
    differences and as someone who's
  • 00:16:31
    traveled to a variety of
  • 00:16:32
    english-speaking countries including
  • 00:16:34
    South Africa I will tell you there are
  • 00:16:36
    big cultural differences even as you
  • 00:16:38
    start going across some of those
  • 00:16:41
    countries so in this 1994 article that I
  • 00:16:48
    wrote I listed steps for for adapting a
  • 00:16:50
    measure and I'm gonna go through them
  • 00:16:53
    really fast
  • 00:16:54
    first you translate the measure
  • 00:16:57
    that sounds like it should be the whole
  • 00:17:00
    thing then you review the translated
  • 00:17:03
    measure third you revise that measure based
  • 00:17:03
    on comments from the review then you
  • 00:17:05
    pilot that's a small scale testing then
  • 00:17:08
    field test standardize scores perform
  • 00:17:12
    validation research as appropriate
  • 00:17:14
    develop a manual and other documents for
  • 00:17:16
    users of the assessment train users and
  • 00:17:18
    collect reactions from users now I
  • 00:17:25
    know that Ercikan and Lyons-Thomas
  • 00:17:25
    which were the people who wrote the
  • 00:17:26
    chapter for my my handbook had some
  • 00:17:29
    other steps and I'm not going to go
  • 00:17:30
    through them but but shortly after I
  • 00:17:33
    wrote that article which was really one
  • 00:17:35
    of the first things on how to how to
  • 00:17:38
    adapt measures Hambleton and Patsula
  • 00:17:40
    suggested that I left a few things out
  • 00:17:43
    which included hiring the appropriate
  • 00:17:45
    translators ensuring construct
  • 00:17:47
    equivalence and that's something Barbara
  • 00:17:49
    Byrne has written
  • 00:17:50
    and I would encourage you to take a look
  • 00:17:53
    at her work and then even to decide
  • 00:17:56
    whether or not to adapt or to build
  • 00:18:00
    a new one and whether to link scores
  • 00:18:00
    across and I'm going to come back to
  • 00:18:02
    that and in another article I've written
  • 00:18:04
    I've pointed out that I think there are
  • 00:18:06
    real scoring issues that have to be
  • 00:18:08
    addressed across versions and so I think
  • 00:18:12
    there are lots of different things we
  • 00:18:13
    could add to that
  • 00:18:14
    it certainly wasn't something to keep
  • 00:18:16
    down on a tablet in terms of and that
  • 00:18:20
    this is where quantitative and
  • 00:18:22
    qualitative clearly get involved you
  • 00:18:26
    have to have reviews of the assessment
  • 00:18:27
    for usability reviews of the instrument
  • 00:18:30
    for comparability pretests with
  • 00:18:32
    relevant individuals timing and and we
  • 00:18:35
    know that culturally there are huge
  • 00:18:36
    differences in terms of people's
  • 00:18:38
    consideration of time and how important
  • 00:18:41
    time is suitability of instructions and
  • 00:18:44
    questions about the appropriateness of
  • 00:18:46
    certain items and so forth Guillermo
  • 00:18:52
    Solano-Flores with whom I've worked on
  • 00:18:52
    some of the projects we're talking about
  • 00:18:54
    here has defined something called test
  • 00:18:57
    translation error the lack of
  • 00:18:59
    equivalence between the source language
  • 00:19:01
    version and the target language version
  • 00:19:02
    of test items due to the nature of
  • 00:19:06
    languages it's possible that an adapted
  • 00:19:08
    form of the assessment does not capture
  • 00:19:12
    or transfer nuances and the psychometric
  • 00:19:16
    consequence is that the adapted
  • 00:19:17
    version potentially tests different
  • 00:19:20
    constructs than the original form or tests
  • 00:19:20
    them slightly differently so what kinds
  • 00:19:24
    of research is needed after adaptation
  • 00:19:27
    certainly you need to check reliability
  • 00:19:29
    in a variety of different ways
  • 00:19:31
    because it's so easy we frequently only
  • 00:19:33
    do internal consistency anymore but I
  • 00:19:36
    think test-retest and other things
  • 00:19:40
    to note whether it's a state or a trait
  • 00:19:40
    for example are also important item
  • 00:19:43
    analysis important factor analysis of
  • 00:19:46
    items SEM analyses and that's what
  • 00:19:48
    Barbara pushes and then secondarily
  • 00:19:51
    there I've got the SEM Fairness analyses
  • 00:19:54
    although one of my former students Steve
  • 00:19:58
    Sireci who many of you know he's talked
  • 00:20:03
    a lot he actually used to do workshops
  • 00:20:03
    on using DIF in adapted and translated
  • 00:20:06
    measures but more recently has come up
  • 00:20:09
    with the idea it's probably not
  • 00:20:10
    appropriate to do different ala C's
  • 00:20:12
    across versions because what you're
  • 00:20:15
    doing is you're confounding two
  • 00:20:17
    variables with no ability to separate
  • 00:20:19
    them you have group differences and
  • 00:20:21
    translation differences and those are
  • 00:20:23
    completely and totally confounded so you
  • 00:20:25
    can't separate them he and Swami Nathan
  • 00:20:29
    have written that up looking at norms
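The confound described here can be illustrated with the Mantel-Haenszel procedure, one widely used DIF method (the talk does not say which DIF method is meant). In this minimal Python sketch with hypothetical names and data, the reference group took the source-language version and the focal group took the target-language version, and test takers are matched on total score. Because the group label encodes both population and language version, a nonzero MH D-DIF cannot be attributed to either one.

```python
import math
from collections import defaultdict
from typing import Iterable, Tuple

# One response record: (group, matching_score, answered_correctly).
# Hypothetical labels: "ref" took the source-language version,
# "focal" took the target-language version.
Record = Tuple[str, int, bool]


def mantel_haenszel_dif(records: Iterable[Record]) -> Tuple[float, float]:
    """Return (alpha_MH, MH D-DIF) for one studied item.

    alpha_MH > 1 (D-DIF < 0) means reference-group test takers have higher
    odds of success than focal-group test takers with the same total score.
    """
    # Per score level: [ref right, ref wrong, focal right, focal wrong].
    strata: defaultdict = defaultdict(lambda: [0, 0, 0, 0])
    for group, score, correct in records:
        cell = strata[score]
        # NOTE: "ref" vs "focal" differs in BOTH population and language
        # version, so this split cannot separate the two sources of DIF.
        if group == "ref":
            cell[0 if correct else 1] += 1
        elif group == "focal":
            cell[2 if correct else 3] += 1
        else:
            raise ValueError(f"unknown group label: {group!r}")

    num = den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n  # ref right x focal wrong
        den += b * c / n  # ref wrong x focal right
    if num == 0 or den == 0:
        raise ValueError("odds ratio undefined for these data")
    alpha = num / den
    # ETS delta metric: MH D-DIF = -2.35 * ln(alpha_MH).
    return alpha, -2.35 * math.log(alpha)


if __name__ == "__main__":
    # Hypothetical item: at each score level, source-language takers
    # answer correctly more often than target-language takers.
    data = (
        [("ref", 3, True)] * 6 + [("ref", 3, False)] * 4
        + [("focal", 3, True)] * 4 + [("focal", 3, False)] * 6
        + [("ref", 6, True)] * 8 + [("ref", 6, False)] * 2
        + [("focal", 6, True)] * 6 + [("focal", 6, False)] * 4
    )
    alpha, d_dif = mantel_haenszel_dif(data)
    # A negative D-DIF flags the item as harder for the focal group, but the
    # statistic alone cannot say whether language or population is the cause.
    print(f"alpha_MH = {alpha:.2f}, MH D-DIF = {d_dif:.2f}")
```

Strata containing only one group contribute nothing to either sum, so sparse score levels effectively drop out of the estimate.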
  • 00:20:32
    And then there's the possibility of
  • 00:20:33
    linking. And I should note that Linda and
  • 00:20:36
    Bill Angoff, I think Linda Cook and Bill
  • 00:20:38
    Angoff, did probably the best-known
  • 00:20:41
    linking study, I think, of the Spanish
  • 00:20:44
    version of the SAT to the English
  • 00:20:45
    version, back maybe 20 years ago, or
  • 00:20:48
    25 now. Beyond validity, and this came up
  • 00:20:52
    earlier, there's the term of utility and
  • 00:20:54
    usefulness, and my favorite example
  • 00:20:57
    of this is the Canadian SAT. Now, there are
  • 00:21:00
    probably only one or two people who
  • 00:21:01
    even know it existed
  • 00:21:02
    in this room, but in the
  • 00:21:06
    '60s, the Ontario Institute for Studies in
  • 00:21:10
    Education decided they were going to
  • 00:21:12
    build an SAT, and ETS sent up a lot
  • 00:21:15
    of consultants to work with them, and
  • 00:21:17
    they built a very nice Canadian SAT.
  • 00:21:20
    And when they were all done, the Canadian
  • 00:21:24
    government decided students shouldn't
  • 00:21:26
    pay for it; the universities should pay for
  • 00:21:28
    it if they want it. So all the costs were
  • 00:21:30
    going to be distributed to the
  • 00:21:32
    universities, and at that point they
  • 00:21:35
    decided no one wanted to use it, and so
  • 00:21:37
    it went away. So all the development
  • 00:21:38
    costs were for naught, and that's, in
  • 00:21:41
    my mind, the classic case of poor utility
  • 00:21:43
    and poor planning. But there are
  • 00:21:47
    other cases; you have to decide, is it
  • 00:21:48
    really worth doing this from a whole
  • 00:21:50
    variety of purposes? And then there's the
  • 00:21:52
    question: does it make sense to equate or
  • 00:21:54
    link tests across languages? Perhaps,
  • 00:21:57
    if the questions are really similar and
  • 00:21:59
    you have a lot of other information that
  • 00:22:02
    you know about them, it might make sense.
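The simplest form of the linking discussed here is linear (mean-sigma) linking, which puts scores from the target-language version on the source-language scale: y = mean_source + (sd_source / sd_target) * (x - mean_target). This minimal Python sketch is illustrative only, not the method of the SAT study mentioned earlier; the function name and data are hypothetical. It assumes the two samples are randomly equivalent groups, which is precisely what cross-language administrations usually cannot guarantee, since the groups differ in population as well as language.

```python
import statistics
from typing import Callable, Sequence


def linear_link(source: Sequence[float],
                target: Sequence[float]) -> Callable[[float], float]:
    """Return a function that maps target-language raw scores onto the
    source-language score scale by matching means and standard deviations.

    Assumes the two samples are randomly equivalent groups; with intact
    national or language groups that assumption usually does not hold.
    """
    mu_s, sd_s = statistics.mean(source), statistics.stdev(source)
    mu_t, sd_t = statistics.mean(target), statistics.stdev(target)
    if sd_t == 0:
        raise ValueError("target scores have zero variance")
    slope = sd_s / sd_t

    def to_source_scale(x: float) -> float:
        # Same z-score on both versions is treated as the same ability.
        return mu_s + slope * (x - mu_t)

    return to_source_scale


if __name__ == "__main__":
    # Hypothetical total scores from two language versions of one test.
    english = [12, 15, 18, 20, 22, 25, 28]
    translated = [10, 12, 15, 17, 19, 21, 24]
    convert = linear_link(english, translated)
    for raw in (10, 17, 24):
        print(raw, "->", round(convert(raw), 1))
```

With intact national groups, the usual alternative is a common-item (anchor) design, but that in turn assumes the anchor items function equivalently across languages, which is the DIF problem raised earlier.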
  • 00:22:06
    Slides get better, okay.
  • 00:22:08
    I thought it was making it easier for
  • 00:22:10
    people to sleep. But let's see: the
  • 00:22:14
    decision demands a really very high level
  • 00:22:18
    of test and psychometric equivalence;
  • 00:22:20
    you must be convinced that those tests
  • 00:22:23
    are really highly comparable. And most
  • 00:22:25
    equating designs have much more
  • 00:22:27
    rigorous requirements, as Neil just
  • 00:22:29
    explained, than we have in adaptation
  • 00:22:31
    studies. There's a good article in
  • 00:22:35
    Educational Measurement: Issues and Practice, entitled
  • 00:22:38
    "Problems and Issues in Linking
  • 00:22:40
    Assessments Across Languages," by Sireci
  • 00:22:41
    and others. What we need to know when
  • 00:22:45
    we're adapting a measure is, one, are the
  • 00:22:49
    constructs equivalent? You need to know
  • 00:22:51
    that even before you get into the
  • 00:22:53
    measure
  • 00:22:53
    itself. Then, if we have these same
  • 00:22:55
    constructs in different cultures, are
  • 00:22:56
    they equally meaningful in different
  • 00:22:58
    cultures? Then, are the tests equivalent
  • 00:23:01
    in those different cultures, and are the
  • 00:23:03
    testing conditions and so forth the same?
  • 00:23:06
    All those really need to be
  • 00:23:07
    established. Creuset noted that
  • 00:23:10
    adaptation errors are the most prevalent
  • 00:23:12
    source of DIF in international
  • 00:23:14
    assessments, and he said that we know
  • 00:23:17
    that even from state to state there are some
  • 00:23:18
    curricular differences, but when you go
  • 00:23:20
    across countries there are huge
  • 00:23:21
    curricular differences. Then there are
  • 00:23:24
    cultural biases and translation errors,
  • 00:23:26
    all of which cause potential issues. Now,
  • 00:23:32
    the International Test Commission
  • 00:23:33
    is famous, really, for its test
  • 00:23:36
    adaptation guidelines that came out a
  • 00:23:38
    few years ago in their second edition,
  • 00:23:40
    and these are to promote good practice
  • 00:23:43
    in test adaptation. You may know that
  • 00:23:45
    the ITC, the International Test Commission,
  • 00:23:47
    was developed initially because of
  • 00:23:50
    European countries: as Europe became
  • 00:23:53
    the European Union, you can now
  • 00:23:59
    move easily across countries and so
  • 00:24:00
    forth, so people need to take tests
  • 00:24:02
    in different languages. And so they've
  • 00:24:05
    put out really simple,
  • 00:24:08
    easy-to-understand guidelines, and
  • 00:24:10
    they're all freely available and
  • 00:24:12
    downloadable; they now have something like six sets
  • 00:24:13
    of guidelines. This is to ensure a level
  • 00:24:17
    playing field for testing across
  • 00:24:18
    national boundaries and to provide a
  • 00:24:20
    mechanism whereby test users can observe
  • 00:24:23
    their duty of care to the public without
  • 00:24:25
    regard to national boundaries. I do
  • 00:24:28
    believe documentation is important and
  • 00:24:31
    that's one of those things that has
  • 00:24:33
    increasingly become difficult to find in
  • 00:24:35
    testing today;
  • 00:24:35
    lots of tests don't have manuals anymore.
  • 00:24:38
    I know when Camara told me a few years
  • 00:24:41
    ago that the College Board decided, well,
  • 00:24:43
    we're doing the research but we don't
  • 00:24:44
    have to pull it all together into a
  • 00:24:46
    single book. You know, I think users need
  • 00:24:50
    information that's easily available and
  • 00:24:53
    so forth.
  • 00:24:55
    Now I'm gonna get into the adaptation
  • 00:24:58
    issue. Some of you may know the critical
  • 00:25:01
    thinking component of the AHELO, which
  • 00:25:05
    stands for, let's see, Assessment
  • 00:25:09
    of Higher Education Learning Outcomes.
  • 00:25:12
    It's not as well known as PISA
  • 00:25:14
    and so forth. It
  • 00:25:16
    used the CLA, the Collegiate
  • 00:25:20
    Learning Assessment, which you also may
  • 00:25:21
    know is an outcomes assessment measure
  • 00:25:24
    used by some 1,300 colleges in the
  • 00:25:26
    United States; it's now the CLA+. It
  • 00:25:30
    was based actually on a GRE model. In a
  • 00:25:32
    sense, it's a performance assessment
  • 00:25:34
    where you read three or four pages of
  • 00:25:36
    material and then you write an essay, and
  • 00:25:39
    it used to be scored (it isn't
  • 00:25:42
    anymore, but it used to be scored) in
  • 00:25:43
    English by the GRE's automated
  • 00:25:49
    assessment. Now, Buros was hired to
  • 00:25:54
    translate this into a variety of
  • 00:25:56
    languages, or to work with national
  • 00:25:58
    committees: South Korea, Slovakia, Egypt,
  • 00:26:00
    Colombia, and a few other
  • 00:26:02
    countries. Now, it's an essay; you read
  • 00:26:05
    this problem. The problem that they used
  • 00:26:07
    internationally was that there are two
  • 00:26:11
    lakes, there's a river between them, you
  • 00:26:13
    want to harness water power as it goes
  • 00:26:17
    from one lake to the other across the
  • 00:26:18
    river, but there's an endangered fish
  • 00:26:20
    that lives in the
  • 00:26:21
    river. And so there's no right answer to
  • 00:26:24
    this, but the thought is you have to
  • 00:26:25
    write an essay that shows you're
  • 00:26:27
    sensitive to the fish and you understand
  • 00:26:30
    the need for power, and things like that.
  • 00:26:32
    And I should note it's a company, and
  • 00:26:34
    it's a for-profit company, that wants to
  • 00:26:37
    harness the power. So we worked with
  • 00:26:42
    these different countries with teams of
  • 00:26:44
    people in the country to work on those
  • 00:26:46
    translations. Now, Slovakia, as an example,
  • 00:26:48
    is a Western country; it used to be part of
  • 00:26:51
    Czechoslovakia, and they're a NATO country.
  • 00:26:53
    There were almost no problems there at
  • 00:26:56
    all; it translated very easily into their
  • 00:26:58
    language, it made sense to them, and so
  • 00:27:00
    forth.
  • 00:27:00
    Now, in Kuwait, Richard Eggleston had done
  • 00:27:03
    the translation there the year before,
  • 00:27:05
    and they have no rivers, no lakes; the
  • 00:27:10
    students don't know anything about water
  • 00:27:12
    power. And so the way the problem was
  • 00:27:15
    changed was this became a seagoing fish
  • 00:27:19
    and they were trying to harness ocean
  • 00:27:20
    power. Now, that starts changing the
  • 00:27:23
    question: it's now introducing a new
  • 00:27:27
    concept as opposed to a concept that
  • 00:27:29
    people may know about. Now, Colombia was
  • 00:27:33
    another country, and Guillermo
  • 00:27:35
    Solano-Flores worked with us on this one. And
  • 00:27:39
    years ago, I knew that, when we talked
  • 00:27:41
    about turning the GRE into Spanish, we
  • 00:27:44
    were told we'd need at least three
  • 00:27:45
    different versions of Spanish, and indeed
  • 00:27:47
    Colombia needed a different version from
  • 00:27:49
    Mexico, which had already translated it,
  • 00:27:52
    and it was mostly the same,
  • 00:27:56
    but with just different words being inserted.
  • 00:28:00
    Now, then we get to South Korea. South
  • 00:28:03
    Korea said they had a great adaptation, a
  • 00:28:07
    great translation, but our analysis of
  • 00:28:10
    the data showed that it made no sense;
  • 00:28:11
    it was almost random data, it looked like,
  • 00:28:14
    and so I was charged to find out why
  • 00:28:18
    this was not working.
  • 00:28:19
    And it just so happened I had a doctoral
  • 00:28:21
    student who was likewise an ETS intern, by the
  • 00:28:22
    name of my son Li, and she is now done,
  • 00:28:26
    and she teaches in the California State
  • 00:28:27
    University system. And she read it and
  • 00:28:31
    said, this doesn't make any sense, because
  • 00:28:33
    there are no power companies in South
  • 00:28:36
    Korea; the government supplies the power,
  • 00:28:38
    and it isn't something that
  • 00:28:41
    people pay for in the same kind of way
  • 00:28:43
    that they do in the United States. So
  • 00:28:46
    they had to change the question, and once
  • 00:28:50
    we changed it to the government, suddenly
  • 00:28:52
    the data came out much better. So they
  • 00:28:55
    had done more of a literal
  • 00:28:57
    translation of that component, and it
  • 00:28:59
    just didn't work. Now, then we get to
  • 00:29:03
    Egypt, and I did this one with Guillermo
  • 00:29:07
    Solano-Flores, by the way. And we had an Arabic
  • 00:29:10
    version already based on Kuwait, but we
  • 00:29:13
    were told Kuwait spoke high Arabic and
  • 00:29:15
    Egypt spoke low Arabic. I'm not sure
  • 00:29:17
    about the differences, but that happens
  • 00:29:20
    in a lot of countries, and so we knew
  • 00:29:22
    we had to at least do that. Now, obviously,
  • 00:29:24
    unlike Kuwait, they have the Nile running
  • 00:29:27
    right through Cairo, so they know
  • 00:29:29
    rivers and lakes. And we actually did
  • 00:29:32
    think-alouds, much as you would do with
  • 00:29:34
    students with disabilities, and we
  • 00:29:36
    watched two people. It's interesting to
  • 00:29:38
    watch people doing this in Arabic
  • 00:29:39
    when you don't speak it, but we got
  • 00:29:41
    translations back, and the biggest
  • 00:29:44
    problem, they said, is their power is also
  • 00:29:47
    provided by the government, but they said
  • 00:29:50
    no one in the government would ever ask
  • 00:29:52
    for our input as a consultant; it just
  • 00:29:55
    would never happen. The government
  • 00:29:57
    believes it knows all the answers.
  • 00:29:58
    And basically, the bottom line is, and I want
  • 00:30:03
    to just mention that we did this at the
  • 00:30:05
    tail end of the revolution, and there was
  • 00:30:07
    gunfire in the background while we were
  • 00:30:09
    doing this, and, you know, to
  • 00:30:12
    get into our hotel, there
  • 00:30:14
    were dog sniffers on the car
  • 00:30:17
    and stuff like that. It was really quite
  • 00:30:19
    fascinating. So their solution was to
  • 00:30:23
    set this in the United States: that
  • 00:30:25
    you're a consultant to a company in the United
  • 00:30:27
    States doing this, because they thought
  • 00:30:28
    in the United States people might
  • 00:30:30
    actually ask questions.
  • 00:30:32
    In the interest of time, I'm not going
  • 00:30:35
    to go through these methodological
  • 00:30:36
    issues; I'm told that I'm getting really
  • 00:30:39
    short on time. But our experience looking at some
  • 00:30:42
    adapted measures is that some users try
  • 00:30:46
    to transfer the validation from English
  • 00:30:48
    and just say it's the same. He's done
  • 00:30:51
    that, for example: we just believe the
  • 00:30:53
    validation research is the same as it was
  • 00:30:55
    in English. Sometimes they do it in a
  • 00:30:58
    couple of countries and then they assume,
  • 00:30:59
    well, if it's true in Mexico and
  • 00:31:02
    it's true in Spain, then it would be
  • 00:31:03
    true all over the world, and that's
  • 00:31:05
    problematic. Some scales even use the
  • 00:31:08
    same norms from the original language.
  • 00:31:11
    Let's see, where they do have norms, it's
  • 00:31:13
    usually a much smaller and less
  • 00:31:14
    representative sample. That was true of
  • 00:31:16
    the EIWA, which was done on a very
  • 00:31:18
    disproportionate sample in Puerto Rico,
  • 00:31:20
    and then there are lots of other fit
  • 00:31:23
    issues. The OECD, we believe, uses national
  • 00:31:27
    expert committees. They have double
  • 00:31:29
    translation, which means two people are
  • 00:31:30
    actually translating the measure, or two
  • 00:31:32
    groups, and then they compare the two
  • 00:31:34
    translated measures for comparability,
  • 00:31:37
    and they have either cross-checks or
  • 00:31:40
    reconciliation. Again, I'm going
  • 00:31:46
    to skip this one, I think, but there are
  • 00:31:47
    reasons why this is continuing on.
  • 00:31:51
    Context matters, and we know in
  • 00:31:54
    some countries, for example, a huge
  • 00:31:56
    proportion, like the United States,
  • 00:31:58
    actually a huge proportion of people go
  • 00:31:59
    to colleges and universities, and there are
  • 00:32:01
    some other countries where it might only
  • 00:32:02
    be five or ten percent of the population,
  • 00:32:04
    so then you have very
  • 00:32:07
    unusual comparisons. There are lots of
  • 00:32:11
    economic factors, cultures, perceptions,
  • 00:32:13
    linguistic structures, styles. There are
  • 00:32:17
    countries
  • 00:32:18
    go to different universities, for example.
  • 00:32:21
    So my theme as a whole is that adapted
  • 00:32:25
    measures have huge appeal; they have
  • 00:32:27
    great potential; they have huge financial
  • 00:32:29
    worth for testing companies. But we need
  • 00:32:32
    to conduct more and better research on
  • 00:32:34
    adapted measures, and I question
  • 00:32:36
    whether a lot of cross-national,
  • 00:32:39
    cross-language equatings will be possible
  • 00:32:40
    or even meaningful. Thank you very much.
  • 00:32:44
    [Applause]
Tags
  • test adaptation
  • cultural understanding
  • language assessment
  • psychometrics
  • ethical considerations
  • test translation
  • cross-cultural research
  • validity
  • reliability
  • international assessments