00:00:10
and I have the pleasure of introducing
00:00:12
Kurt Geisinger who's the director of
00:00:14
the Buros Center for Testing and the
00:00:17
W. C. Meierhenry Distinguished Professor
00:00:19
at the University of Nebraska to talk to
00:00:22
us about test adaptation thanks Kurt
00:00:25
let me thank Amy both for the invitation
00:00:28
and the introduction ETS and MHS and
00:00:33
when I was first asked to do the talk I
00:00:35
was asked whether I was going to talk
00:00:37
about past present or future and I told
00:00:40
her past and that was just based on my
00:00:42
age at the time but but I think I'm
00:00:44
going to talk about all three and I
00:00:47
decided to do this most of the work I've
00:00:50
done in recent years is on fairness
00:00:51
either for language minorities or for
00:00:54
people with disabilities and that was my
00:00:57
initial thought of what I would talk
00:00:58
about but instead I've decided to talk
00:01:00
about test adaptation in part because the
00:01:05
International test Commission is meeting
00:01:07
this summer in Montreal and I'm trying to
00:01:08
drum up support for that and in terms
00:01:12
of history as Neil started talking
00:01:15
about when he was in this room I think I
00:01:18
beat him because I was an intern at ETS
00:01:20
in 1975 and we had a meeting in here and
00:01:24
interestingly there's another person who
00:01:27
was an intern with me that same year I
00:01:30
was here and it was amazing that they
00:01:31
allowed Linda Cook to come when she was
00:01:34
only in junior high school at the time
00:01:38
but I also was in this room I was in
00:01:42
with Warren Willingham and then he hired
00:01:44
me back the next year as a research
00:01:46
associate and I joined the GRE technical
00:01:51
advisory committee in 1993 and in this
00:01:55
room they they brought me in a day early
00:01:58
so they had a session honoring Warren
00:02:00
Willingham when he was stepping down as
00:02:02
an officer of the organization and that
00:02:05
was in this room as well so it does have
00:02:07
some history now I also have a history
00:02:11
with Division 5 of APA I first
00:02:14
joined the executive committee of the
00:02:16
division in 1993 25 years ago as well
00:02:19
and based on discussions I had with Jim
00:02:25
Butcher who was then the assessment
00:02:27
coordinator or whatever it's called
00:02:29
he was a head of the assessment group he
00:02:32
and I got into a discussion and he asked
00:02:33
me to do a paper for Psych Assessment
00:02:36
which he was editing at the time and I
00:02:38
did it on the topic of test translation
00:02:41
and it's the paper that I've done that I
00:02:45
probably shouldn't have done because I had
00:02:47
very little history in that topic at the
00:02:49
time but I was a foreign language major
00:02:51
as an undergraduate I knew something
00:02:53
about translation and so that's that's
00:02:57
what I'm gonna talk about and I'm gonna
00:02:59
say that some of the logic I'm gonna use
00:03:01
toward the end of the presentation is
00:03:03
what I would call journalistic logic I'm
00:03:05
using examples to make points that runs
00:03:08
counter probably to the typical method of
00:03:11
Division 5 but I do tell you that July
00:03:16
second to fifth is when ITC is meeting
00:03:18
in Montreal and it's at the same time as
00:03:20
a Jazz Festival there in case you love
00:03:23
jazz so again I'm going to give a very
00:03:27
quick overview of test adaptation we use
00:03:29
the word adaptation rather than
00:03:31
translation because it's more than
00:03:33
language you have to involve other
00:03:35
things other than language and I think I
00:03:37
will make that point very clearly to you
00:03:39
with some examples and and what I'm a
00:03:42
little different about is I've actually
00:03:45
done some work now on adaptation from
00:03:48
one language and culture to another of
00:03:50
some performance assessments through
00:03:52
OECD and I'm going to show you just how
00:03:56
difficult it is and and it fits exactly
00:03:59
in with Neil's comments over the
00:04:02
difficulty of equating or even linking
00:04:04
at some point because you start
00:04:06
questioning whether they're the same
00:04:07
measures by the time you've done some of
00:04:09
that why are we doing this kind of work
00:04:14
well first off there are lots of testing
00:04:17
companies right now that realize that
00:04:18
the world has shrunk and there are
00:04:20
multinational corporations who want
00:04:23
to administer the same tests all over
00:04:25
the world we want to make international
00:04:28
comparisons
00:04:30
I think our psychological science is
00:04:32
getting a little stronger that allows us
00:04:34
to do that in some cases we've
00:04:36
recognized the differences between etic
00:04:39
and emic kinds of measures and there are
00:04:42
a lot of fiscal and pragmatic reasons
00:04:44
why it may be cheaper and easier to
00:04:46
adapt a test than it is to build a
00:04:49
new one now let me give you two
00:04:52
precursors and you can't read the citation
00:04:56
there but I got quoted in a Science
00:04:59
article this past year I was called and
00:05:04
interviewed
00:05:04
and so forth but it turns out there was
00:05:07
a retired professor in California who
00:05:10
had built a test well really a survey
00:05:14
in the sense of whether you take your
00:05:16
medications the way you're supposed to
00:05:18
and it's mostly used by insurance
00:05:20
companies as part of their tests of
00:05:23
drugs which they have to do to get
00:05:25
approval and it turns out after he
00:05:28
retired he got very sporadic about his
00:05:32
answering of emails and letters and
00:05:34
things like that and he had copyrighted
00:05:37
these scales and he makes it very clear
00:05:39
that they're very expensive to use
00:05:41
because after all he's selling them to
00:05:43
insurance companies but a bunch of grad
00:05:45
students wrote and said can I use this
00:05:47
for my master's thesis or doctoral
00:05:49
dissertation he didn't answer them and
00:05:52
they went ahead and used it anyhow and
00:05:54
he then sent them bills for upwards of
00:05:57
$20,000 each and the question was is
00:06:02
that appropriate and it's a very complex
00:06:05
question it's not a simple question
00:06:06
because it is a copyrighted thing they
00:06:09
just used it without permission and it
00:06:12
was his right to do so now he has since
00:06:14
adjusted and decided that when companies
00:06:16
use it he's gonna have one rate and for
00:06:18
students and so forth he's going to
00:06:20
use another rate but nevertheless this
00:06:22
is an important issue because there are
00:06:24
a lot of people that translate tests or
00:06:27
adapt tests that they don't have the
00:06:29
right to do so and they don't ask for it
00:06:31
so I start off by saying if a
00:06:36
measure is copyrighted and published you
00:06:38
need to get that permission first and
00:06:40
even if it's not copyrighted you
00:06:43
probably should write the authors and
00:06:46
and at least inform them that you're
00:06:48
planning to do that I mean that's
00:06:50
just common courtesy I think
00:06:52
professional courtesy now why would you
00:06:55
do it and I've listed pros and cons here
00:06:57
and in the interest of time I'm gonna
00:06:58
go through this really fast that these
00:07:01
are established measures
00:07:03
that makes sense they're cost-effective
00:07:05
and cheaper as I said globalization
00:07:08
necessitates cross-culturally
00:07:10
appropriate measures to fulfill the
00:07:12
needs to compare evaluate select et
00:07:14
cetera guidelines and best practice
00:07:16
research offer more options to test
00:07:18
users to make informed decisions and to
00:07:20
reduce negative outcomes the cons are
00:07:22
that there can be copyright issues and
00:07:24
country membership requirements
00:07:26
and I think ETS has dealt with some of
00:07:28
those copyright issues I know of over
00:07:30
the years you have to ask do
00:07:34
the benefits justify the efforts and is
00:07:37
there a real need to have the same
00:07:39
measure in a different language that
00:07:41
fairness and validity of scores for
00:07:42
target populations and use must be
00:07:44
normed on the target demographic
00:07:46
issues and translated assessments even
00:07:50
with careful adaptation still introduce
00:07:52
additional negative psychometric and
00:07:54
cross-cultural issues and one of the
00:07:56
ways that I learned this is I was
00:07:59
an expert witness in two court cases in
00:08:01
Canada about 25 years ago and the witness
00:08:04
on the other side was John Conger who
00:08:06
some of you know and what happened is
00:08:10
all their tests were built in English
00:08:11
but then they had to translate them into
00:08:13
French because they are a bilingual
00:08:15
country and the French students or the
00:08:18
French candidates did about 3 percent
00:08:21
worse than the english-speaking
00:08:23
candidates and when I asked why I was
00:08:25
told it was because the French
00:08:27
schools are not as good as the English
00:08:28
schools and later on was told that I was
00:08:31
indeed right that it was a translation
00:08:33
issue that questions made
00:08:36
more sense in English the way they were
00:08:37
written first and then they were
00:08:38
translated they didn't do as well so
00:08:40
essentially it was a built-in bias
00:08:42
against the french-speaking candidates
00:08:47
but we do have OECD which is the publisher
00:08:50
of PISA and a bunch of the other surveys
00:08:52
and they have done some in recent years
00:08:55
on critical thinking
00:08:56
and economics at the higher ed level
00:08:58
which people are not as familiar with
00:08:59
and and I've worked on the critical
00:09:01
thinking one which is what I'm going to
00:09:02
give you examples of people want to make
00:09:05
comparisons and I'm gonna make the
00:09:07
argument that some of those comparisons
00:09:09
are less sophisticated than we'd like to
00:09:12
think so what are the skills you need to
00:09:16
adapt a measure well certainly you need
00:09:17
to be fluent in both languages you need
00:09:19
a comprehensive understanding of the
00:09:21
constructs being assessed you need a
00:09:24
thorough understanding of both cultures
00:09:26
you have to have some ability to work on
00:09:28
testing measures there are skills
00:09:30
involved in writing items and so forth
00:09:32
and I'm gonna tell you I gave a keynote
00:09:34
at the Mexican National Academy of
00:09:37
assessment a couple of years ago and I
00:09:40
learned they've taken a very different
00:09:42
model than the United States has they
00:09:45
have some 89 languages that are from
00:09:48
indigenous people that make up only 5.4
00:09:52
percent of the population but they have
00:09:54
schools representing about 20 different
00:09:56
indigenous languages and the decision
00:09:59
they've made rather than the United
00:10:01
States is that all indigenous people
00:10:04
will be taught in their own language and
00:10:06
tested in their own language so their
00:10:08
national assessment group has to
00:10:11
translate all their tests to about 20
00:10:13
languages besides Spanish for testing and
00:10:18
instruction and in fact of those 20
00:10:20
languages 10 of them didn't even have a
00:10:22
written language so the first thing the
00:10:24
Mexican government had to do was to
00:10:26
develop those written
00:10:28
languages before they could even decide
00:10:31
that they were going to test in them or
00:10:35
instruct in them so it's a very different
00:10:38
model and it's a model that I'm actually
00:10:41
very comfortable with and I think if we
00:10:43
were going to build a wall maybe we
00:10:44
ought to do it the other way keep us
00:10:47
from going to Mexico but I also know
00:10:50
that in South Africa they have 11
00:10:54
official languages so when they build
00:10:57
the test they have to build it in those
00:10:58
11 languages right off the bat now what
00:11:05
do we want in a translation and
00:11:07
adaptation well the idea initially
00:11:09
anyhow was that item difficulty should be
00:11:11
the same within reason across languages
00:11:13
that sociolinguistic nuances should be
00:11:15
removed or avoided content relevance
00:11:18
and access should be comparable across
00:11:20
cultures the construct relevance and
00:11:22
validity should be constant we should
00:11:25
focus on the defined objectives and the
00:11:27
purpose that formatting appearance and
00:11:30
comparable tasks should be the same and
00:11:32
to avoid really bad practices now to
00:11:36
give you a sense of this the first study
00:11:38
I did in this regard was with a graduate
00:11:40
student many years ago who studied the
00:11:42
EIWA which is the Wechsler Adult
00:11:44
Intelligence Scale the initial form was
00:11:47
translated into Spanish in Puerto
00:11:50
Rico and for example you may know in
00:11:53
giving the WAIS the first test you
00:11:56
usually gave was the vocabulary and they
00:11:59
go from easy to hard and that decides
00:12:01
what you're gonna do well with the
00:12:03
initial version of the EIWA they
00:12:06
simply translated the English
00:12:08
words into Spanish and there was no
00:12:11
longer any reasonable rank ordering of
00:12:13
difficulty because once you've done the
00:12:14
translation but that's how it was it was
00:12:16
just the same words in the different
00:12:18
language that's in my mind
00:12:20
unbelievably bad practice and I'm
00:12:24
gonna give you Ron Hambleton has two
00:12:26
examples that he uses frequently one of
00:12:29
these comes from PISA's fourth grade
00:12:32
science test and the question asked is
00:12:35
why do ducks swim so well the students
00:12:39
that do the best on that are the Swedes
00:12:42
in the world and it turns out when you
00:12:44
translate webbed feet which is the right
00:12:47
answer in English in Swedish that's
00:12:50
swimming feet now there's also another
00:12:55
question that he's often used that was
00:12:58
there's a technique which I'm going to
00:12:59
talk about in a minute called back
00:13:01
translation where you translate it to the
00:13:03
new language and you back translate to
00:13:05
see how it looks and how comparable it
00:13:07
is and it was essentially an analogy
00:13:10
question that was out of sight :
00:13:13
out of mind translated back that comes
00:13:17
to blind and insane so you can see this
00:13:24
is not everybody in test construction
00:13:27
knows that test construction's both art
00:13:28
and science and like Neil was just
00:13:31
talking about the science part of it I'm
00:13:32
going to talk more about the art part of
00:13:34
it because that's that's what we're
00:13:35
talking about I mean among the
00:13:37
translation processes you can have a
00:13:39
simple translation which is what a lot
00:13:40
of tests like the EIWA used initially
00:13:43
you can have adaptation with checks and
00:13:46
that's where usually you do this kind of
00:13:48
a back translation to decide what to
00:13:50
do and I just looked up back translation
00:13:54
it appears it was first developed as a
00:13:56
technique by Brislin in 1970 so it's
00:13:59
been around for a while there are people
00:14:01
when I edited the handbook of assessment
00:14:06
psychology there are people that had
00:14:09
chapters in there like Butcher who
00:14:10
still say that back translation is the
00:14:12
state-of-the-art most people would
00:14:14
disagree with that now simply because if
00:14:18
you're a translator and you know you're
00:14:20
going to be evaluated by the quality of
00:14:22
your translation what happens is you
00:14:25
translate the question not to be optimal
00:14:28
in the target language but to be
00:14:30
optimally translated back to the
00:14:32
original language and those are two very
00:14:34
different things okay so back
00:14:38
translation has some problems in the
00:14:41
article that I wrote in Psych Assessment
00:14:43
I argued that those skills that I listed
00:14:45
are unlikely to be found in one
00:14:47
person and said you need committee
00:14:48
approaches to doing this this has to be
00:14:50
done by more than one person and Kadriye
00:14:54
Ercikan who is also a vice president here
00:14:56
at ETS wrote a chapter for my
00:14:58
handbook on concurrent ways of doing
00:15:01
this which is what OECD is trying to
00:15:03
do where you build the tests in
00:15:06
different languages at the same
00:15:08
time basically it doesn't work for
00:15:13
pre-existing measures which a lot of the
00:15:14
test translation work is done on
00:15:17
measures that achieve a certain amount
00:15:20
of notoriety typically in the
00:15:22
initial language usually English but in
00:15:25
this concurrent model what happens is
00:15:28
you develop two forms at the same time
00:15:30
you have groups working together that
00:15:33
they work with a shell it's malleable so
00:15:36
that they can change it as they go now
00:15:38
if you can imagine two committees doing
00:15:41
that that's not so hard but if you start
00:15:43
thinking about Mexico when you think
00:15:45
about 89 committees doing that it's it's
00:15:47
unimaginable in my mind you know or even
00:15:50
11 perhaps in South Africa so it's very
00:15:53
difficult once you get more than two now
00:15:57
one of the things we forget about is
00:15:58
culture and culture has a big impact
00:16:01
especially when you get into personality
00:16:03
variables and things like that but the
00:16:04
examples I'm going to give
00:16:05
journalistically in performance
00:16:08
assessments I would argue that there's a
00:16:09
lot of cultural issues that affect those
00:16:12
responses too we heard in an earlier talk
00:16:16
that length is one of the big
00:16:18
characteristics that assess the quality
00:16:20
of essays well length might be a very
00:16:23
culturally dependent kind of variable as
00:16:24
an example so if you're going across
00:16:27
languages or cultures you might find big
00:16:29
differences and as someone who's
00:16:31
traveled to a variety of
00:16:32
english-speaking countries including
00:16:34
South Africa I will tell you there are
00:16:36
big cultural differences even as you
00:16:38
start going across some of those
00:16:41
countries so in this 1994 article that I
00:16:48
wrote I listed steps for for adapting a
00:16:50
measure and I'm gonna go through them
00:16:53
really fast
00:16:54
first you translate or adapt the measure
00:16:56
that sounds like it should be the whole
00:16:57
thing then you review the translated
00:17:00
measure three you revise that measure based
00:17:03
on comments from the review then you
00:17:05
pilot that's a small scale testing then
00:17:08
field test standardize scores perform
00:17:12
validation research as appropriate
00:17:14
develop a manual and other documents for
00:17:16
users of the assessment train users and
00:17:18
collect reactions from users and well I
00:17:22
know that Ercikan and Lyons-Thomas
00:17:25
who were the people who wrote the
00:17:26
chapter for my handbook had some
00:17:29
other steps and I'm not going to go
00:17:30
through them but but shortly after I
00:17:33
wrote that article which was really one
00:17:35
of the first things on how to how to
00:17:38
adapt measures Hambleton and Patsula
00:17:40
suggested that I left a few things out
00:17:43
which included hiring the appropriate
00:17:45
translators ensuring construct
00:17:47
equivalence and that's something Barbara
00:17:49
Byrne has written about
00:17:50
and I would encourage you to take a look
00:17:53
at her work and then even to decide
00:17:56
whether or not to adapt or to build
00:17:58
anew and whether to link scores
00:18:00
across and I'm going to come back to
00:18:02
that and in another article I've written
00:18:04
I've pointed out that I think there are
00:18:06
real scoring issues that have to be
00:18:08
addressed across versions and so I think
00:18:12
there are lots of different things we
00:18:13
could add to that
00:18:14
it certainly wasn't something that came
00:18:16
down on a tablet and that
00:18:20
this is where quantitative and
00:18:22
qualitative clearly get involved you
00:18:26
have to have reviews of the assessment
00:18:27
for usability reviews of the instrument
00:18:30
for comparability pretests with
00:18:32
relevant individuals timing and we
00:18:35
know that culturally there are huge
00:18:36
differences in terms of people's
00:18:38
consideration of time and how important
00:18:41
time is suitability of instructions and
00:18:44
questions about the appropriateness of
00:18:46
certain items and so forth Willy
00:18:50
Solano-Flores with whom I've worked on
00:18:52
some of the projects we're talking about
00:18:54
here has defined something called test
00:18:57
translation error the lack of
00:18:59
equivalence between the source language
00:19:01
version and the target language version
00:19:02
of test items due to the nature of
00:19:06
languages it's possible that an adapted
00:19:08
form of an assessment does not capture
00:19:09
or transfer all nuances and the psychometric
00:19:12
consequence is that the adapted
00:19:16
version potentially tests different
00:19:17
constructs than the original form or tests
00:19:20
them slightly differently so what kinds
00:19:24
of research are needed after adaptation
00:19:27
certainly you need to check reliability
00:19:29
in a variety of different ways
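Two reliability checks named in this part of the talk, internal consistency and test-retest, can be sketched in a few lines of Python using only the standard library. This is a minimal illustration under assumptions: the function names and the tiny score matrices are hypothetical and are not data or code from the talk.

```python
from statistics import mean, variance


def cronbach_alpha(scores):
    """Internal consistency (Cronbach's alpha).

    scores: list of respondents, each a list of item scores
    (every respondent answers the same k items, k >= 2).
    """
    k = len(scores[0])
    # Variance of each item across respondents
    item_vars = [variance([row[i] for row in scores]) for i in range(k)]
    # Variance of the total score across respondents
    total_var = variance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


def pearson_r(x, y):
    """Pearson correlation, used here as a test-retest coefficient."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical adapted-version data: 4 respondents x 3 items
    adapted = [[2, 3, 3], [4, 4, 5], [1, 2, 1], [5, 4, 5]]
    print("alpha:", round(cronbach_alpha(adapted), 3))

    # Hypothetical total scores for the same people at time 1 and time 2
    time1 = [8, 13, 4, 14]
    time2 = [9, 12, 5, 13]
    print("test-retest r:", round(pearson_r(time1, time2), 3))
```

Running both checks on the source-language and the adapted version separately gives a first, rough indication of whether the adaptation behaves as consistently as the original. A low test-retest value alongside a high alpha may hint that the measure is tapping a state rather than a trait.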
00:19:31
because it's so easy we frequently only
00:19:33
do internal consistency anymore but I
00:19:36
think test-retest and other things
00:19:38
to note whether it's a state or a trait
00:19:40
for example are also important item
00:19:43
analysis is important factor analysis of
00:19:46
items SEM analyses and that's what
00:19:48
Barbara pushes and then secondarily
00:19:51
there I've got the SEM Fairness analyses
00:19:54
although one of my former students Steve
00:19:58
Sireci who many of you know he's talked
00:20:01
a lot he actually used to do workshops
00:20:03
on using DIF in adapted and translated
00:20:06
measures but more recently has come up
00:20:09
with the idea it's probably not
00:20:10
appropriate to do DIF analyses
00:20:12
across versions because what you're
00:20:15
doing is you're confounding two
00:20:17
variables with no ability to separate
00:20:19
them you have group differences and
00:20:21
translation differences and those are
00:20:23
completely and totally confounded so you
00:20:25
can't separate them he and Swaminathan
00:20:29
have written that up looking at norms
00:20:32
and and then there's the possibility of
00:20:33
linking and I should note that Linda and
00:20:36
Bill Angoff I think Linda Cook and Bill
00:20:38
Angoff did probably the best-known
00:20:41
linking study I think on the Spanish
00:20:44
version of the SAT to the English
00:20:45
version back maybe 20 years ago or so
00:20:48
25 now beyond validity this came up
00:20:52
earlier there's the term of utility and
00:20:54
usefulness and and my favorite example
00:20:57
of this is the Canadian SAT now there's
00:21:00
probably only one or two people that
00:21:01
even know about it and it
00:21:02
wasn't in this room but in the
00:21:06
60s the Ontario Institute for Studies in
00:21:10
Education decided they were going to
00:21:12
build an SAT and ETS sent up a lot
00:21:15
of consultants to work with them and
00:21:17
they built a very nice Canadian SAT and
00:21:20
when they were all done the Canadian
00:21:24
government decided students shouldn't
00:21:26
pay for it the universities should pay for
00:21:28
it if they want so all the costs were
00:21:30
going to be distributed to the
00:21:32
universities and at that point they
00:21:35
decided no one wanted to use it and so
00:21:37
it went away so all the development
00:21:38
costs were for naught and and that's in
00:21:41
my mind the classic case of poor utility
00:21:43
and poor planning but there are there
00:21:47
are other cases you have to decide is it
00:21:48
really worth doing this from a whole
00:21:50
variety of purposes and then the
00:21:52
question does it make sense to equate or
00:21:54
link tests across languages and perhaps
00:21:57
if the questions are really similar and
00:21:59
you have a lot of other information that
00:22:02
you know about it might make sense
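If linking is judged defensible, one of the simplest approaches is linear linking, which places scores from one version on the scale of the other by matching means and standard deviations. This method is not named in the talk. The sketch below assumes randomly equivalent groups took the two language versions, which is exactly the assumption that is hard to justify across languages and cultures. The function name and the scores are hypothetical.

```python
from statistics import mean, stdev


def linear_link(x_scores, y_scores):
    """Return a function that maps version-X scores onto the version-Y scale.

    Linear linking: y = mean_y + (sd_y / sd_x) * (x - mean_x).
    Assumes (strongly) that the two groups are equivalent in ability
    and that both versions measure the same construct.
    """
    mx, sx = mean(x_scores), stdev(x_scores)
    my, sy = mean(y_scores), stdev(y_scores)
    slope = sy / sx
    intercept = my - slope * mx
    return lambda x: slope * x + intercept


if __name__ == "__main__":
    # Hypothetical total scores on a source-language and an adapted version
    source_version = [12, 15, 18, 21, 24]
    adapted_version = [10, 14, 18, 22, 26]
    to_adapted_scale = linear_link(source_version, adapted_version)
    for score in (12, 18, 24):
        print(score, "->", round(to_adapted_scale(score), 2))
```

The arithmetic is trivial; the hard part is the justification. If the groups differ in ability or the versions differ in what they measure, the linked scores look precise but are not comparable.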
00:22:06
slides get better okay
00:22:08
I thought it was making it easier for
00:22:10
people to sleep but let's see the
00:22:14
decision demands really a very high level
00:22:18
of test and psychometric equivalence
00:22:20
you must be convinced that those tests
00:22:23
are really highly comparable and most
00:22:25
equating designs have much more
00:22:27
rigorous requirements as Neil just
00:22:29
explained than we have in adaptation
00:22:31
studies and there's a good article in
00:22:35
Educational Measurement: Issues and Practice entitled
00:22:38
Problems and Issues in Linking
00:22:40
Assessments Across Languages by Sireci
00:22:41
and others what we need to know when
00:22:45
we're adapting a measure is one are the
00:22:49
constructs equivalent you need to know
00:22:51
that even before you get into the
00:22:53
measure
00:22:53
itself then do we have these same
00:22:55
constructs in different cultures are
00:22:56
they equally meaningful in different
00:22:58
cultures then are the tests equivalent
00:23:01
in those different cultures and are the
00:23:03
testing conditions and so forth the same
00:23:06
and all those really need to be
00:23:07
established Sireci noted that
00:23:10
adaptation errors are the most prevalent
00:23:12
source of DIF in international
00:23:14
assessments and he said that we know
00:23:17
that even state to state there are some
00:23:18
curricular differences but when you go
00:23:20
across countries there are huge
00:23:21
curricular differences then there are
00:23:24
cultural biases and translation errors
00:23:26
all of which cause postural issues now
00:23:32
the International Test Commission
00:23:33
is famous really for its test
00:23:36
adaptation guidelines that came out a
00:23:38
few years ago in their second edition
00:23:40
and these are to promote good practice
00:23:43
in test adaptation you may know that
00:23:45
ITC the International test Commission
00:23:47
was developed initially because of
00:23:50
European countries as as Europe became
00:23:53
the European Union you can now
00:23:59
move easily across countries and so
00:24:00
forth so that people need to take tests
00:24:02
in different languages and so they've
00:24:05
they've put out really simple
00:24:08
easy-to-understand guidelines and
00:24:10
they're all freely available and
00:24:12
downloadable they now have like six sets
00:24:13
of guidelines this is to ensure a level
00:24:17
playing field for testing across
00:24:18
national boundaries and to provide a
00:24:20
mechanism whereby test users can observe
00:24:23
their duty of care to the public without
00:24:25
regard to national boundaries I do
00:24:28
believe documentation is important and
00:24:31
that's one of those things that has
00:24:33
increasingly become difficult to find in
00:24:35
the test
00:24:35
lots of tests don't have manuals anymore
00:24:38
I know Wayne Camara told me a few years
00:24:41
ago that the College Board decided well
00:24:43
we're doing the research but we don't
00:24:44
have to pull it all together into a
00:24:46
single book you know I think users need
00:24:50
information that's easily available and
00:24:53
so forth
00:24:55
now I'm gonna get into the adaptation
00:24:58
issue some of you may know the critical
00:25:01
thinking component of AHELO which
00:25:05
stands for let's see Assessment
00:25:09
of Higher Education Learning Outcomes in
00:25:12
English it's not as well known as PISA
00:25:14
and so forth
00:25:16
it used the CLA the Collegiate
00:25:20
Learning Assessment which you also may
00:25:21
know is an outcomes assessment measure
00:25:24
used by some 1,300 colleges in the
00:25:26
United States it's now the CLA plus it
00:25:30
was based actually on a GRE model in a
00:25:32
sense it's a performance assessment
00:25:34
where you read three or four pages of
00:25:36
material and then you write an essay and
00:25:39
it is it used to be scored it isn't
00:25:42
anymore but it used to be scored in
00:25:43
English by the GRE's automated
00:25:49
assessment now Buros was hired to
00:25:54
translate this into a variety of
00:25:56
languages or to work with National
00:25:58
committees in South Korea Slovakia Egypt
00:26:00
Colombia and so forth and a few other
00:26:02
countries now it's an essay you read
00:26:05
this problem the problem that they used
00:26:07
internationally was that there are two
00:26:11
lakes there's a river between them you
00:26:13
want to harness water power as it goes
00:26:17
from one lake to the other across the
00:26:18
river but there's an endangered fish
00:26:20
that lives in the
00:26:21
river and so there's no right answer to
00:26:24
this but the thought is you have to
00:26:25
write an essay that describes you're
00:26:27
sensitive to the fish and you understand
00:26:30
the need for power and things like that
00:26:32
and I should note it's a company and
00:26:34
it's a for-profit company that wants to
00:26:37
harness the power so we work with
00:26:42
these different countries with teams of
00:26:44
people in the country to work on those
00:26:46
translations now Slovakia as an example
00:26:48
was a Western country used to be part of
00:26:51
Czechoslovakia they're a NATO country
00:26:53
there were almost no problems there at
00:26:56
all it translated very easily into their
00:26:58
language it makes sense to them and so
00:27:00
forth
00:27:00
now Kuwait Richard eggleston had done
00:27:03
the translation there the year before
00:27:05
and they have no rivers no lakes the
00:27:10
students don't know anything about water
00:27:12
power and so the way the problem was
00:27:15
changed was this became a seagoing fish
00:27:19
and they were trying to harness ocean
00:27:20
power now that starts changing the
00:27:23
question it's now it's introducing a new
00:27:27
concept as opposed to a concept that
00:27:29
people may know about now Colombia was
00:27:33
another country and Willy
00:27:35
Solano-Flores worked with us on this one and
00:27:39
years ago I knew that when we talked
00:27:41
about turning the GRE into Spanish we
00:27:44
were told we'd need at least three
00:27:45
different versions of Spanish and indeed
00:27:47
Colombia needed a different version from
00:27:49
Mexico that had already translated it
00:27:52
and it was mostly the same
00:27:56
but just different words being inserted
00:28:00
now then we get to South Korea South
00:28:03
Korea said they had a great adaptation a
00:28:07
great translation but our analysis of
00:28:10
the data showed that it made no sense
00:28:11
it was almost random data it looked like
00:28:14
and so I was charged to find out why
00:28:18
this was not working
00:28:19
and it just so happened I had a doctoral
00:28:21
student who was likewise an ETS intern by the
00:28:22
name of my son Li and she is now done
00:28:26
and she teaches in the California State
00:28:27
University system and she read it and
00:28:31
said this doesn't make any sense because
00:28:33
there's no power companies in South
00:28:36
Korea the government supplies the power
00:28:38
and it's and it isn't something that
00:28:41
people pay for in the same kind of way
00:28:43
that they do in the United States so
00:28:46
they had to change the question and once
00:28:50
we changed it to the government suddenly
00:28:52
the data came out much better so they
00:28:55
had done more of a literal
00:28:57
translation of that component and it
00:28:59
just didn't work now then we get to
00:29:03
Egypt and I did this one with Willie
00:29:07
Solano by the way and we had an Arabic
00:29:10
version already based on Kuwait but we
00:29:13
were told Kuwait spoke high Arabic and
00:29:15
Egypt spoke low Arabic I'm not sure
00:29:17
about the differences but that happens
00:29:20
in a lot of countries and so we we knew
00:29:22
we had to at least do that now obviously
00:29:24
unlike Kuwait they have the Nile running
00:29:27
right through Cairo so they they know
00:29:29
rivers and lakes and we actually did
00:29:32
think-alouds much as you would do with
00:29:34
students with disabilities and we
00:29:36
watched two people it's interesting to
00:29:38
watch people doing this in Arabic
00:29:39
when you don't speak it but we got
00:29:41
translations back and the biggest
00:29:44
problem they said is their power is also
00:29:47
provided by the government but they said
00:29:50
no one in the government would ever ask
00:29:52
for our input as a consultant it just
00:29:55
would never happen the government
00:29:57
believes it knows all the answers and
00:29:58
basically the bottom line is and I want
00:30:03
to just mention that we did this at the
00:30:05
tail end of the Revolution and there was
00:30:07
gunfire in the background while we were
00:30:09
doing this and it was and you know to
00:30:12
get into our hotel
00:30:14
wieners and that dog sniffers on the car
00:30:17
and stuff like that it was really quite
00:30:19
fascinating so their solution was to
00:30:23
make this into the United States that
00:30:25
you're a consultant to a company in the United
00:30:27
States doing this because they thought
00:30:28
in the United States people might
00:30:30
actually ask questions I'm
00:30:32
not in the interest of time I'm not going
00:30:35
to go through these methodological
00:30:36
issues I'm told that I'm getting really
00:30:39
short but our experience looking at some
00:30:42
adapted measures is that some users try
00:30:46
to translate the validation from English
00:30:48
and just say it's the same he's done
00:30:51
that for example we just believe the
00:30:53
validation research is the same as it was
00:30:55
in English sometimes they do it in a
00:30:58
couple of countries and then they assume
00:30:59
well it's if it's true in Mexico and
00:31:02
it's true in Spain well then it would be
00:31:03
true all over the world and that that's
00:31:05
problematic some scales even use the
00:31:08
same norms from the original language
00:31:11
let's see where they do have norms it's
00:31:13
usually a much smaller and less
00:31:14
representative sample that was true on
00:31:16
the EIWA which was done in a very
00:31:18
disproportionate sample in Puerto Rico
00:31:20
and that there become lots of other fit
00:31:23
issues we believe OECD uses national
00:31:27
expert committees they have double
00:31:29
translation which means two people are
00:31:30
actually translating the measure or two
00:31:32
groups and then they compare the two
00:31:34
translated measures for comparability
00:31:37
and they have either cross checks or
00:31:40
reconciliation we think again I'm going
00:31:46
to skip this one I think but there are
00:31:47
reasons why this is continuing on
00:31:51
context matters and we know in
00:31:54
some countries for example a huge
00:31:56
proportion like the United States
00:31:58
actually a huge proportion of people go
00:31:59
to college and universities and there are
00:32:01
some other countries where it might only
00:32:02
be five or ten percent of the population
00:32:04
so then you have very
00:32:07
unusual comparisons there's lots of
00:32:11
economic factors cultures perceptions
00:32:13
linguistic structures styles there are
00:32:17
countries
00:32:18
go to different universities for example
00:32:21
so my themes as a whole are that adapted
00:32:25
measures have a huge appeal they have
00:32:27
great potential they have huge financial
00:32:29
worth for testing companies but we need
00:32:32
to conduct more and better research on
00:32:34
adapted measures and I question
00:32:36
whether a lot of cross national cross
00:32:39
language equatings will be possible
00:32:40
or even meaningful thank you very much
00:32:44
[Applause]