Explaining computerized English testing in plain English

Research has shown that automated scoring can give more reliable and objective results than human examiners when evaluating a person’s mastery of English. This is because an automated scoring system is impartial, unlike humans, who can be influenced by irrelevant factors such as a test taker’s appearance or body language. Additionally, automated scoring treats regional accents equally, unlike human examiners who may favor accents they are more familiar with. Automated scoring also allows individual features of a spoken or written test question response to be analyzed independent of one another, so that a weakness in one area of language does not affect the scoring of other areas.

PTE Academic was created in response to the demand for a more accurate, objective, secure and relevant test of English. Our automated scoring system is a central feature of the test, and vital to ensuring the delivery of accurate, objective and relevant results – no matter who the test taker is or where the test is taken.

Development and validation of the scoring system to ensure accuracy

PTE Academic’s automated scoring system was developed after extensive research and field testing. A prototype test was developed and administered to a sample of more than 10,000 test takers from 158 different countries, speaking 126 different native languages. This data was collected and used to train the automated scoring engines for both the written and spoken PTE Academic items.

To do this, multiple trained human markers assess each answer. Those results are used as the training material for machine learning algorithms, similar to those used by systems like Google Search or Apple’s Siri. The model makes initial guesses as to the scores each response should get, then consults the actual scores to see how well it did and adjusts itself in a few directions. It then goes through the training set over and over again, adjusting and improving until it arrives at a maximally correct solution – a solution that ideally gets very close to predicting the set of human ratings.
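The guess, compare, adjust and repeat cycle described above can be illustrated with a very small sketch. This is not the actual PTE Academic scoring engine: the two numeric features per response, the human scores, the linear model and the learning rate are all hypothetical, chosen only to show how repeated passes over human-rated data shrink the gap between machine and human scores.

```python
# Hypothetical training data: each response is reduced to two made-up
# numeric features, paired with the average score from human markers.
features = [(0.2, 0.1), (0.5, 0.4), (0.9, 0.7), (0.4, 0.8), (0.7, 0.3)]
human_scores = [1.0, 2.5, 4.3, 3.2, 2.8]


def predict(weights, bias, x):
    """Score one response with a simple linear model (an assumption)."""
    return sum(w * xi for w, xi in zip(weights, x)) + bias


def mean_squared_error(weights, bias):
    """Average squared gap between model scores and human scores."""
    errors = [predict(weights, bias, x) - y
              for x, y in zip(features, human_scores)]
    return sum(e * e for e in errors) / len(errors)


def train(epochs=2000, learning_rate=0.1):
    """Repeatedly pass over the training set, nudging the model each time."""
    weights, bias = [0.0, 0.0], 0.0  # the model's initial guess
    n = len(features)
    for _ in range(epochs):
        grad_w = [0.0, 0.0]
        grad_b = 0.0
        for x, y in zip(features, human_scores):
            # Consult the human score to see how far off the guess was.
            error = predict(weights, bias, x) - y
            for j, xj in enumerate(x):
                grad_w[j] += 2 * error * xj / n
            grad_b += 2 * error / n
        # Adjust the model a little in the direction that reduces the error.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


initial_error = mean_squared_error([0.0, 0.0], 0.0)
weights, bias = train()
final_error = mean_squared_error(weights, bias)
print(f"error before training: {initial_error:.3f}, after: {final_error:.3f}")
```

A production system would use far richer features and models, but the loop has the same shape: predict, compare with the human ratings, adjust, repeat.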

Once trained up and performing at a high level, this model is used as a marking algorithm, able to score new responses just like human markers would. Correlations between scores given by this system and trained human markers are quite high. The standard error of measurement between app’s system and a human rater is less than that between one human rater and another – in other words, the machine scores are more accurate than those given by a pair of human raters, because much of the bias and unreliability has been squeezed out of them. In general, you can think of a machine scoring system as one that takes the best stuff out of human ratings, then acts like an idealized human marker.

app conducts scoring validation studies to ensure that the machine scores are consistently comparable to ratings given by skilled human raters. Here, a new set of test taker responses (never seen by the machine) is scored by both human raters and the automated scoring system. Research has demonstrated that the automated scoring technology underlying PTE Academic produces scores comparable to those obtained from careful human experts. This means that the automated system “acts” like a human rater when assessing test takers’ language skills, but does so with a machine's precision, consistency and objectivity.
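As a rough illustration of what such a validation comparison might look like, the sketch below computes the Pearson correlation between machine scores and human ratings on a held-out set, and between two human raters for reference. All scores here are invented for the example, and the choice of Pearson correlation as the agreement measure is an assumption; the actual validation studies may use other statistics.

```python
import math


def pearson_correlation(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sum((x - mean_x) ** 2 for x in xs)
    spread_y = sum((y - mean_y) ** 2 for y in ys)
    return covariance / math.sqrt(spread_x * spread_y)


# Hypothetical scores for six held-out responses the model never saw
# during training. None of these numbers come from a real study.
human_rater_a = [55, 62, 70, 48, 81, 66]
human_rater_b = [58, 60, 74, 45, 78, 69]
machine_scores = [56, 61, 72, 47, 80, 67]

machine_vs_human = pearson_correlation(machine_scores, human_rater_a)
human_vs_human = pearson_correlation(human_rater_a, human_rater_b)

print(f"machine vs. human rater A: {machine_vs_human:.3f}")
print(f"human rater A vs. human rater B: {human_vs_human:.3f}")
```

If the machine-to-human figure is at least as high as the human-to-human figure, the machine agrees with a trained rater about as well as a second trained rater would.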

Scoring speaking responses with app’s Ordinate technology

The spoken portion of PTE Academic is automatically scored using app’s Ordinate technology. Ordinate technology results from years of research in speech recognition, statistical modeling, linguistics and testing theory. The technology uses a proprietary speech processing system that is specifically designed to analyze and automatically score speech from fluent and second-language English speakers. The Ordinate scoring system collects hundreds of pieces of information from the test takers’ spoken responses in addition to just the words, such as pace, timing and rhythm, as well as the power of their voice, emphasis, intonation and accuracy of pronunciation. It is trained to recognize even somewhat mispronounced words, and quickly evaluates the content, relevance and coherence of the response. In particular, the meaning of the spoken response is evaluated, making it possible for these models to assess whether or not what was said deserves a high score.

Scoring writing responses with Intelligent Essay Assessor™ (IEA)

The written portion of PTE Academic is scored using the Intelligent Essay Assessor™ (IEA), an automated scoring tool powered by app’s state-of-the-art Knowledge Analysis Technologies™ (KAT) engine. Based on more than 20 years of research and development, the KAT engine automatically evaluates the meaning of text, such as an essay written by a student in response to a particular prompt. The KAT engine evaluates writing as accurately as skilled human raters using a proprietary application of the mathematical approach known as Latent Semantic Analysis (LSA). LSA evaluates the meaning of language by analyzing large bodies of relevant text and their meanings. Therefore, using LSA, the KAT engine can understand the meaning of text much like a human.
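To give a feel for the general idea behind Latent Semantic Analysis, here is a minimal textbook-style sketch. It is not the proprietary KAT engine: the four tiny documents, the raw word-count matrix and the choice of two latent dimensions are all assumptions for illustration, and it assumes NumPy is installed. It builds a term-document matrix, reduces it with a singular value decomposition, and compares documents in the reduced space.

```python
import numpy as np

# Hypothetical mini-corpus: two texts about pets, two about finance.
documents = [
    "the cat sat on the mat",
    "a cat lay on a rug",
    "stocks fell as markets closed",
    "markets rallied and stocks rose",
]

# Term-document matrix: one row per word, one column per document.
vocabulary = sorted({word for doc in documents for word in doc.split()})
counts = np.array(
    [[doc.split().count(term) for doc in documents] for term in vocabulary],
    dtype=float,
)

# Singular value decomposition, keeping only the k strongest dimensions.
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
k = 2
doc_vectors = (np.diag(S[:k]) @ Vt[:k]).T  # one k-dimensional row per document


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


same_topic = cosine(doc_vectors[0], doc_vectors[1])
different_topic = cosine(doc_vectors[0], doc_vectors[2])
print(f"pets vs. pets: {same_topic:.2f}")
print(f"pets vs. finance: {different_topic:.2f}")
```

In this toy corpus the two pet sentences share only a couple of words, yet they land close together in the reduced space, while the pet and finance sentences stay far apart. Real systems work from much larger bodies of text and typically weight the counts before decomposition.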

What aspects of English does PTE Academic assess?

Written scoring

Spoken scoring

  • Word choice
  • Grammar and mechanics
  • Progression of ideas
  • Organization
  • Style, tone
  • Paragraph structure
  • Development, coherence
  • Point of view
  • Task completion
  • Sentence mastery
  • Content
  • Vocabulary
  • Accuracy
  • Pronunciation
  • Intonation
  • Fluency
  • Expressiveness
  • Pragmatics

More blogs from app

    How English conversation works

    By Richard Cleeve

    English language teachers everywhere spend time and energy helping students practice their conversation skills. Some may ask whether conversation in English can actually be taught. And – if it can – what the rules might be.

    To explore these questions, we spoke to David, a world-renowned linguist. He is an Honorary Professor of Linguistics at the University of Bangor and has written more than 120 books on the subject.

    What makes a good conversation?

    “It’s very important that we put this everyday use of language under the microscope,” he says. He highlights three critical facets of conversation that we should bring into focus:

    • Fluency
    • Intelligibility
    • Appropriateness

    But all in all, he says that people should walk away from a conversation feeling like they’ve had a good chat.

    “For the most part, people want that kind of mutual respect, mutual opportunity, and have some sort of shared topic about which they feel comfortable – and these are the basics I think.”

    The rules of conversation

    There are plenty of ways you can teach learners to engage in a successful conversation – including how to speak informally, use intonation, and provide feedback. So let’s take a look at some of the key areas to focus on:

    1) Appropriateness

    Fluency and intelligibility are commonly covered in English language classes. But appropriateness can be more complicated to teach. When preparing to teach conversational appropriateness, we can look at it through two different lenses: subject matter and style.

    2) Subject matter

    “What subject matter is appropriate to use to get a conversation off the ground? There are cultural differences here,” he says. The weather is often a good icebreaker, since everyone is affected by it. The key is to find a common topic that all participants can understand and engage with.

    3) Style

    Teachers can also teach students about conversational style, focusing on how to make conversations more relaxed in English.

    There are “several areas of vocabulary and grammar – and pronunciation too, intonation for example – as well as body language, in which the informality of a conversation is expressed through quite traditional means,” says David. One example he offers is teaching students how to use contracted verb forms.

    4) Simultaneous feedback

    This is what makes a conversation tick. When we talk with someone, we let them know we’re listening by giving them feedback. We say things like “really” or “huh” and use body language like facial expressions and gestures.

    Of course, these feedback noises and expressions can be taught. But they won’t necessarily be new to students. English learners do the same when speaking their own language, anyway.

    Keep in mind, though, that when it comes to speaking online on video conferencing platforms, it’s not easy to give this type of simultaneous feedback. People’s microphones might be on mute or there might be a delay, which makes reacting in conversations awkward. So, says David, this means online conversations become much more like monologues.

    5) Uptalk and accents

    Uptalk is when a person declares something in a sentence, but raises their intonation at the end. For English learners, it might sound like someone is asking a question.

    Here’s an example:

    • “I live in Holyhead” said in a flat tone – this is a statement.
    • “I live in Holyhead” said using uptalk – you are stating you live here, but recognize that someone else might not know where it is.

    Now, should teachers teach uptalk? David says yes. For one, it’s fashionable to speak this way – and it can be confusing for English learners if they don’t understand why it’s being used in a conversation.

    “The other thing is that we are dealing here with a genuine change in the language. One of the biggest problems for all language teachers is to keep up to date with language changes. And language change can be very fast and is at the moment,” he says.

    When it comes to accents, David is a fan. “It’s like being in a garden of flowers. Enjoy all the linguistic flowers,” he says. “That’s the beauty of language, its diversity.”