Explaining computerized English testing in plain English

Research has shown that automated scoring can give more reliable and objective results than human examiners when evaluating a person’s mastery of English. This is because an automated scoring system is impartial, unlike humans, who can be influenced by irrelevant factors such as a test taker’s appearance or body language. Additionally, automated scoring treats regional accents equally, unlike human examiners, who may favor accents they are more familiar with. Automated scoring also allows individual features of a spoken or written response to be analyzed independently of one another, so that a weakness in one area of language does not affect the scoring of other areas.

PTE Academic was created in response to the demand for a more accurate, objective, secure and relevant test of English. Our automated scoring system is a central feature of the test, and vital to ensuring the delivery of accurate, objective and relevant results – no matter who the test taker is or where the test is taken.

Development and validation of the scoring system to ensure accuracy

PTE Academic’s automated scoring system was developed after extensive research and field testing. A prototype test was developed and administered to a sample of more than 10,000 test takers from 158 different countries, speaking 126 different native languages. This data was collected and used to train the automated scoring engines for both the written and spoken PTE Academic items.

To do this, multiple trained human markers assess each answer. Those results are used as the training material for machine learning algorithms, similar to those used by systems like Google Search or Apple’s Siri. The model makes initial guesses at the score each response should get, consults the actual human ratings to see how well it did, and adjusts itself accordingly. It then cycles through the training set over and over again, adjusting and improving until it arrives at a solution that gets as close as possible to predicting the full set of human ratings.
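
As a rough illustration (not app’s actual system), here is what that guess-check-adjust loop looks like in miniature. The feature values and weights below are invented purely for the sketch:

    import numpy as np

    rng = np.random.default_rng(0)

    # Toy data: each row describes one response with four numeric features
    # (imagine measures of fluency, vocabulary range, grammar and content).
    X = rng.normal(size=(200, 4))
    human_scores = X @ np.array([2.0, 1.0, 0.5, 1.5]) + rng.normal(scale=0.3, size=200)

    weights = np.zeros(4)          # the model's adjustable parameters
    learning_rate = 0.01

    for epoch in range(500):                     # pass over the training set repeatedly
        guesses = X @ weights                    # initial guesses at the scores
        errors = guesses - human_scores          # consult the actual human ratings
        weights -= learning_rate * X.T @ errors / len(X)   # adjust to shrink the gap

    print(np.corrcoef(X @ weights, human_scores)[0, 1])    # approaches 1.0 as it learns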

Once trained and performing at a high level, this model is used as a marking algorithm, able to score new responses just as human markers would. Correlations between scores given by this system and trained human markers are quite high. The standard error of measurement between app’s system and a human rater is less than that between one human rater and another – in other words, the machine scores are more consistent than those given by a pair of human raters, because much of the bias and unreliability has been squeezed out of them. In general, you can think of a machine scoring system as one that distills the best qualities of human ratings and then acts like an idealized human marker.
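
To make those agreement measures concrete, here is a toy comparison using simulated scores. The noise levels for the raters are invented for illustration, and the SEM formula shown is one common estimate, not necessarily the one app uses:

    import numpy as np

    rng = np.random.default_rng(1)
    true_ability = rng.uniform(10, 90, size=300)               # the trait being measured

    human_a = true_ability + rng.normal(scale=5.0, size=300)   # two noisy human raters
    human_b = true_ability + rng.normal(scale=5.0, size=300)
    machine = true_ability + rng.normal(scale=2.0, size=300)   # a more consistent machine

    def agreement(x, y):
        r = np.corrcoef(x, y)[0, 1]                   # correlation between the two raters
        sem = np.std(x - y, ddof=1) / np.sqrt(2)      # one common SEM estimate
        return round(r, 3), round(sem, 2)

    print("human vs human:  ", agreement(human_a, human_b))
    print("machine vs human:", agreement(machine, human_a))   # higher r, lower SEM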

app conducts scoring validation studies to ensure that the machine scores remain consistently comparable to ratings given by skilled human raters. In these studies, a new set of test-taker responses (never seen by the machine) is scored both by human raters and by the automated scoring system. Research has demonstrated that the automated scoring technology underlying PTE Academic produces scores comparable to those obtained from careful human experts. This means that the automated system “acts” like a human rater when assessing test takers’ language skills, but does so with a machine’s precision, consistency and objectivity.
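
In miniature, and with invented data standing in for real responses, a held-out validation of this kind looks something like the following sketch:

    import numpy as np
    from sklearn.linear_model import Ridge
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(2)
    X = rng.normal(size=(500, 6))                        # features of 500 responses
    human = X @ rng.uniform(0.5, 2.0, size=6) + rng.normal(scale=0.5, size=500)

    # Hold out a set of responses the model never sees during training.
    X_train, X_test, y_train, y_test = train_test_split(X, human, random_state=0)

    model = Ridge().fit(X_train, y_train)                # train on the seen responses
    machine_scores = model.predict(X_test)               # score the unseen responses

    print("held-out machine-human correlation:",
          round(np.corrcoef(machine_scores, y_test)[0, 1], 3))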

Scoring speaking responses with app’s Ordinate technology

The spoken portion of PTE Academic is automatically scored using app’s Ordinate technology. Ordinate technology is the result of years of research in speech recognition, statistical modeling, linguistics and testing theory. It uses a proprietary speech processing system specifically designed to analyze and automatically score speech from fluent and second-language English speakers. The Ordinate scoring system collects hundreds of pieces of information from a test taker’s spoken response beyond the words themselves, such as pace, timing and rhythm, as well as vocal power, emphasis, intonation and accuracy of pronunciation. It is trained to recognize even somewhat mispronounced words, and quickly evaluates the content, relevance and coherence of the response. In particular, the meaning of the spoken response is evaluated, making it possible for these models to assess whether what was said deserves a high score.
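
Ordinate’s actual feature set and models are proprietary, but as a generic illustration of the kinds of low-level acoustic measurements a speech scoring system might start from, here is a sketch using the open-source librosa library (“response.wav” is a placeholder file name):

    import librosa
    import numpy as np

    # "response.wav" stands in for a recorded spoken response.
    y, sr = librosa.load("response.wav")

    duration = librosa.get_duration(y=y, sr=sr)   # overall timing of the response
    energy = librosa.feature.rms(y=y)[0]          # frame-level loudness ("power of voice")
    f0, voiced, _ = librosa.pyin(                 # pitch contour (intonation)
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr)

    print(f"duration: {duration:.1f} s")
    print(f"mean energy: {energy.mean():.4f}")
    print(f"pitch range: {np.nanmin(f0):.0f}-{np.nanmax(f0):.0f} Hz")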

Scoring writing responses with Intelligent Essay Assessor™ (IEA)

The written portion of PTE Academic is scored using the Intelligent Essay Assessor™ (IEA), an automated scoring tool powered by app’s state-of-the-art Knowledge Analysis Technologies™ (KAT) engine. Based on more than 20 years of research and development, the KAT engine automatically evaluates the meaning of text, such as an essay written by a student in response to a particular prompt. It evaluates writing as accurately as skilled human raters using a proprietary application of the mathematical approach known as Latent Semantic Analysis (LSA). LSA infers the meaning of words and passages by analyzing large bodies of relevant text and the statistical patterns of how words are used together within them. Using LSA, the KAT engine can therefore assess the meaning of a text much as a human reader would.
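
The KAT engine itself is proprietary, but the underlying LSA idea can be sketched with standard open-source tools: project word-count vectors into a low-dimensional “semantic space”, then compare meanings by similarity in that space. The texts below are invented for the sketch:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.decomposition import TruncatedSVD
    from sklearn.metrics.pairwise import cosine_similarity

    corpus = [
        "Rising sea levels threaten coastal cities around the world.",
        "Climate change is driving more frequent extreme weather events.",
        "Coastal flooding increases as polar ice melts and oceans warm.",
        "The economy grew steadily despite higher interest rates.",
    ]
    reference = "Global warming raises sea levels and endangers coastal areas."
    essay = "Melting ice makes the oceans rise, putting cities on the coast at risk."

    vectorizer = TfidfVectorizer(stop_words="english")
    X = vectorizer.fit_transform(corpus + [reference, essay])

    lsa = TruncatedSVD(n_components=3, random_state=0)   # the latent semantic space
    Z = lsa.fit_transform(X)

    # Related meaning can register as similarity even with little shared wording.
    print(cosine_similarity(Z[[-2]], Z[[-1]])[0, 0])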

What aspects of English does PTE Academic assess?

Written scoring

  • Word choice
  • Grammar and mechanics
  • Progression of ideas
  • Organization
  • Style, tone
  • Paragraph structure
  • Development, coherence
  • Point of view
  • Task completion

Spoken scoring

  • Sentence mastery
  • Content
  • Vocabulary
  • Accuracy
  • Pronunciation
  • Intonation
  • Fluency
  • Expressiveness
  • Pragmatics

More blogs from app

    Grammar: how to tame the unruly beast

    By Simon Buckland

    “Grammar, which knows how to control even kings” – Molière

    When you think of grammar, “rule” is probably the first word that pops into your mind. Certainly the traditional view of grammar is that it’s about the “rules of language”. Indeed, not so long ago, teaching a language meant just teaching grammatical rules, plus perhaps a few vocabulary lists. However, I’m going to suggest that there’s actually no such thing as a grammatical rule.

    To show you what I mean, let’s take the comparative of adjectives: “bigger”, “smaller”, “more useful”, “more interesting”, etc. We might start with a simple rule: for adjectives with one syllable, add -er, and for adjectives with two or more syllables, use more + adjective.

    But this doesn’t quite work: yes, we say “more useful”, but we also say “cleverer”, and “prettier”. OK then, suppose we modify the rule. Let’s also say that for two-syllable adjectives ending in -y or -er you add -er.

    Unfortunately, this doesn’t quite work either: we do say “cleverer”, but we also say “more sober” and “more proper”. And there are problems with some of the one-syllable adjectives too: we say “more real” and “more whole” rather than “realer” or “wholer”. If we modify the rule to fit these exceptions, it will be half a page long, and anyway, if we keep looking we’ll find yet more exceptions. This happens repeatedly in English grammar. Very often, rules seem so full of exceptions that they’re just not all that helpful.
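
    To see just how unwieldy this gets, here is a toy sketch of the rule written out as code. It is a deliberately naive illustration: the syllable counter is crude, consonant doubling (“big” → “bigger”) isn’t even handled, and the exception list only ever grows:

        import re

        # A deliberately naive exception list - and it only ever grows.
        EXCEPTIONS = {"real": "more real", "whole": "more whole",
                      "sober": "more sober", "proper": "more proper"}

        def syllables(adjective):
            # Crude vowel-group count; real syllable counting is much harder.
            return max(1, len(re.findall(r"[aeiouy]+", adjective)))

        def comparative(adjective):
            if adjective in EXCEPTIONS:
                return EXCEPTIONS[adjective]
            if adjective.endswith("y"):              # pretty -> prettier
                return adjective[:-1] + "ier"
            if syllables(adjective) == 1 or adjective.endswith("er"):
                return adjective + "er"              # small -> smaller, clever -> cleverer
            return "more " + adjective               # useful -> more useful

        for adjective in ["small", "useful", "clever", "pretty", "real", "proper"]:
            print(adjective, "->", comparative(adjective))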

    And there’s another big problem with the “rule approach”: it doesn’t tell you what the structure is actually used for, even with something as obvious as the comparative of adjectives. You might assume that it’s used for comparing things: “My house is smaller than Mary’s”; “John is more attractive than Stephen”. But look at this: “The harder you work, the more money you make.” Or this: “London is getting more and more crowded.” Both sentences use comparative adjectives, but they’re not directly comparing two things.

    What we’re actually looking at here is not a rule but several overlapping patterns, or paradigms to use the correct technical term:

    1. adjective + -er + than
    2. more + adjective + than
    3. parallel comparative adjectives: the + comparative adjective 1 … the + comparative adjective 2
    4. repeated comparative adjective: adjective + -er + and + adjective + -er, or more and more + adjective

    This picture is more accurate, but it looks abstract and technical. It’s a long way from what we actually teach these days and the way we teach it, which tends to be organized around learning objectives and measurable outcomes, such as: “By the end of this lesson (or module) my students should be able to compare their own possessions with someone else’s possessions”. So we’re not teaching our students to memorize a rule or even to manipulate a pattern; we’re teaching them to actually do something in the real world. And, of course, we’re teaching it at a level appropriate for the student.

    So, to come back to grammar, once we’ve established our overall lesson or module objective, here are some of the things we’re going to need to know.

    • What grammatical forms (patterns) can be used to express this objective?
    • Which ones are appropriate for the level of my students? Are there some that they should already know, or should I teach them in this lesson?
    • What do the forms look like in practice? What would be some good examples?

    Existing grammar textbooks generally don’t provide all this information; in particular, they’re very vague about level. Often they don’t even put grammar structures into specific CEFR levels but into a range, e.g. A1/A2 or A2/B1, and none fully integrates grammar with overall learning objectives.

    At app, we’ve set ourselves the goal of addressing these issues by developing a new type of grammar resource for English teachers and learners that:

    • Is based on the Global Scale of English with its precise gradation of developing learner proficiency
    • Is built on the Council of Europe language syllabuses, linking grammar to CEFR level and to language functions
    • Uses international teams of language experts to review the structures and assess their levels

    We include grammar in the GSE Teacher Toolkit, and you can use it to:

    • Search for grammar structures either by GSE or CEFR level
    • Search for grammar structures by keyword or grammatical category/part of speech
    • Find out at which level a given grammar structure should be taught
    • Find out which grammar structures support a given learning objective
    • Find out which learning objectives are related to a given grammar structure
    • Get examples for any given grammar structure
    • Get free teaching materials for many of the grammar structures

    Think of it as an open-access resource for anyone teaching English and designing a curriculum.