Explaining computerized English testing in plain English

app Languages

Research has shown that automated scoring can give more reliable and objective results than human examiners when evaluating a person’s mastery of English. This is because an automated scoring system is impartial, unlike humans, who can be influenced by irrelevant factors such as a test taker’s appearance or body language. Additionally, automated scoring treats regional accents equally, unlike human examiners, who may favor accents they are more familiar with. Automated scoring also allows individual features of a spoken or written test question response to be analyzed independently of one another, so that a weakness in one area of language does not affect the scoring of other areas.

PTE Academic was created in response to the demand for a more accurate, objective, secure and relevant test of English. Our automated scoring system is a central feature of the test, and vital to ensuring the delivery of accurate, objective and relevant results – no matter who the test-taker is or where the test is taken.

Development and validation of the scoring system to ensure accuracy

PTE Academic’s automated scoring system was developed after extensive research and field testing. A prototype test was developed and administered to a sample of more than 10,000 test takers from 158 different countries, speaking 126 different native languages. This data was collected and used to train the automated scoring engines for both the written and spoken PTE Academic items.

To do this, multiple trained human markers assess each answer. Those ratings become the training material for machine learning algorithms, similar to those used by systems like Google Search or Apple’s Siri. The model makes initial guesses at the score each response should get, consults the actual human scores to see how well it did, adjusts itself, then goes through the training set over and over again, adjusting and improving until it arrives at a solution that gets as close as possible to predicting the full set of human ratings.
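
As a rough illustration of that training loop, here is a minimal sketch in Python. It assumes each response has already been reduced to numeric features and paired with an averaged human score; the data, feature count and simple linear model are invented for illustration and are not app’s actual system.

```python
import numpy as np

# Illustrative stand-ins: X holds numeric features extracted from each
# response (one row per response); y holds the averaged human scores.
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))          # 1,000 responses, 20 features
y = X @ rng.normal(size=20) + rng.normal(scale=0.5, size=1000)

w = np.zeros(20)                         # model weights, initially zero
lr = 0.01                                # learning rate

for epoch in range(200):                 # repeated passes over the training set
    pred = X @ w                         # initial guesses at the scores
    grad = X.T @ (pred - y) / len(y)     # direction and size of the error
    w -= lr * grad                       # adjust and try again

print("RMSE vs. human scores:", np.sqrt(np.mean((X @ w - y) ** 2)))
```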

Once trained and performing at a high level, this model is used as a marking algorithm, able to score new responses just as human markers would. Correlations between scores given by this system and trained human markers are consistently high. The standard error of measurement between app’s system and a human rater is less than that between one human rater and another – in other words, the machine scores are more accurate than those given by a pair of human raters, because much of the bias and unreliability has been squeezed out of them. In general, you can think of a machine scoring system as one that distills the most reliable signal from human ratings, then acts like an idealized human marker.
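
That comparison can be made concrete with simulated numbers. The sketch below invents two noisy human raters and a less noisy machine scorer for the same set of responses, then computes the correlation and the standard error of measurement (SEM) each way; the noise levels are assumptions chosen only to show the computation.

```python
import numpy as np

def sem(a, b):
    """SEM estimated from two parallel ratings of the same responses:
    standard deviation of the differences divided by sqrt(2)."""
    return np.std(a - b, ddof=1) / np.sqrt(2)

rng = np.random.default_rng(1)
true_ability = rng.normal(70, 10, size=500)            # simulated test takers
human1 = true_ability + rng.normal(0, 4.0, size=500)   # rater noise (invented)
human2 = true_ability + rng.normal(0, 4.0, size=500)
machine = true_ability + rng.normal(0, 2.5, size=500)  # less noise (invented)

print("human-human correlation:  ", round(np.corrcoef(human1, human2)[0, 1], 3))
print("machine-human correlation:", round(np.corrcoef(machine, human1)[0, 1], 3))
print("human-human SEM:  ", round(sem(human1, human2), 2))
print("machine-human SEM:", round(sem(machine, human1), 2))
```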

app conducts scoring validation studies to ensure that the machine scores remain consistently comparable to ratings given by skilled human raters. Here, a new set of test-taker responses (never seen by the machine) is scored both by human raters and by the automated scoring system. Research has demonstrated that the automated scoring technology underlying PTE Academic produces scores comparable to those obtained from careful human experts. This means the automated system “acts” like a human rater when assessing test takers’ language skills, but does so with a machine’s precision, consistency and objectivity.

Scoring speaking responses with app’s Ordinate technology

The spoken portion of PTE Academic is automatically scored using app’s Ordinate technology, the result of years of research in speech recognition, statistical modeling, linguistics and testing theory. The technology uses a proprietary speech processing system specifically designed to analyze and automatically score speech from fluent and second-language English speakers. The Ordinate scoring system collects hundreds of pieces of information from a test taker’s spoken response beyond the words themselves, such as pace, timing and rhythm, as well as the power of the voice, emphasis, intonation and accuracy of pronunciation. It is trained to recognize even somewhat mispronounced words, and quickly evaluates the content, relevance and coherence of the response. In particular, the meaning of the spoken response is evaluated, making it possible for these models to assess whether what was said deserves a high score.
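
Ordinate itself is proprietary, but the timing side of such features is easy to picture. The sketch below assumes word-level timestamps from a speech recognizer and derives a few simple pace and pause measures; the Word type, thresholds and feature names are illustrative, not app’s feature set.

```python
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float  # seconds
    end: float

def timing_features(words: list[Word]) -> dict:
    """Toy pace and pause features from recognizer output with timestamps."""
    total = words[-1].end - words[0].start            # total response time
    speech = sum(w.end - w.start for w in words)      # time spent speaking
    pauses = [b.start - a.end for a, b in zip(words, words[1:])]
    return {
        "words_per_minute": 60 * len(words) / total,
        "articulation_ratio": speech / total,         # speech vs. silence
        "mean_pause_s": sum(pauses) / len(pauses) if pauses else 0.0,
        "long_pauses": sum(p > 0.5 for p in pauses),  # hesitation count
    }

words = [Word("the", 0.0, 0.2), Word("quick", 0.3, 0.6),
         Word("brown", 1.4, 1.7), Word("fox", 1.8, 2.1)]
print(timing_features(words))
```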

Scoring writing responses with Intelligent Essay Assessor™ (IEA)

The written portion of PTE Academic is scored using the Intelligent Essay Assessor™ (IEA), an automated scoring tool powered by app’s state-of-the-art Knowledge Analysis Technologies™ (KAT) engine. Based on more than 20 years of research and development, the KAT engine automatically evaluates the meaning of text, such as an essay written by a student in response to a particular prompt. The KAT engine evaluates writing as accurately as skilled human raters using a proprietary application of the mathematical approach known as Latent Semantic Analysis (LSA). LSA evaluates the meaning of language by analyzing large bodies of relevant text. Therefore, using LSA, the KAT engine can understand the meaning of text much as a human would.
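
The KAT engine is proprietary, but its published building block, LSA, can be sketched with standard tools: project texts into a low-dimensional semantic space and compare a new essay with reference texts there. The tiny corpus below stands in for the large bodies of relevant text a real system learns from.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

# Invented mini-corpus; a real LSA model is trained on far larger text.
reference_essays = [
    "Automated scoring compares essay meaning against graded examples.",
    "Machine scoring of essays can match trained human raters.",
    "Essay quality depends on content, organization and word choice.",
]
new_essay = ["Computers can score essay content as reliably as human markers."]

tfidf = TfidfVectorizer().fit(reference_essays + new_essay)
svd = TruncatedSVD(n_components=2, random_state=0)  # the latent semantic space
ref_vecs = svd.fit_transform(tfidf.transform(reference_essays))
new_vec = svd.transform(tfidf.transform(new_essay))

# Semantic similarity of the new essay to each scored reference essay:
print(cosine_similarity(new_vec, ref_vecs).round(2))
```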

What aspects of English does PTE Academic assess?

Written scoring

  • Word choice
  • Grammar and mechanics
  • Progression of ideas
  • Organization
  • Style, tone
  • Paragraph structure
  • Development, coherence
  • Point of view
  • Task completion

Spoken scoring

  • Sentence mastery
  • Content
  • Vocabulary
  • Accuracy
  • Pronunciation
  • Intonation
  • Fluency
  • Expressiveness
  • Pragmatics

More blogs from app

  • The role of AI in English assessment

    By Jennifer Manning

    Digital assessment has become more and more widespread in recent years. But what’s the role of digital assessment in teaching today? We’d like to give you some insight into digital assessment and automated scoring.

    Just a few years ago, there may have been doubts about the role of AI in English assessment and the ability of a computer to score language tests accurately. But today, thousands of teachers worldwide use automated language tests to assess their students’ language proficiency.

    For example, app’s suite of Versant tests has been delivering automated language assessments for nearly 25 years. And since its launch in 1996, over 350 million tests have been scored. The same technology is used in app’s Benchmark and Level tests.

    So what makes automated scoring systems so reliable?

    Huge data sets of exam answers and results are used to train machine learning systems to score English tests the same way that human markers do. This way, we’re not replacing human judgment; we’re teaching computers to replicate it.

    Of course, computers are much more efficient than humans. They don’t mind monotonous work and are far less error-prone (the standard marking error of an AI-scored test is lower than that of a human-scored test). So we can get unbiased, accurate, and consistent scores.

    The top benefits of automated scoring are speed, reliability, flexibility, and freedom from bias.

    Speed

    The main advantage computers have over humans is that they can quickly process complex information. Digital assessments can often provide an instant score turnaround. We can get accurate, reliable results within minutes. And that’s not just for multiple-choice answers but complex responses, too.

    The benefit for teachers and institutions is that they can have hundreds, thousands, or tens of thousands of learners taking a test simultaneously and instantly receive a score.

    The sooner you have scores, the sooner you can make decisions about placement and students’ language levels, benchmark a learner’s strengths and weaknesses, and make adjustments to learning that drive improvement and progress.

    Flexibility

    The next biggest benefit of digital assessment is flexible delivery models. This has become increasingly important as online learning has grown more prominent.

    Accessibility became key: how can your institution provide access to assessment for your learners if you can’t deliver tests on school premises?

    The answer is digital assessment.

    For example, Versant, our web-based test, can be delivered online or offline, on-site or off-site. All test-takers need is a computer and a headset with a microphone. They can take the test anywhere, at any time of day, any day of the week, making it flexible enough to fit into anyone’s schedule or situation.

    Free from bias

    Impartiality is another important benefit of AI-based scoring. The AI engine used to score digital proficiency tests is completely free from bias. It doesn’t get tired, and it doesn’t have good and bad days like human markers do. And it doesn’t have a personality.

    While some human markers are more generous and others are more strict, AI is always equally fair. Thanks to this, automated scoring provides consistent, standardized scores, no matter who’s taking the test.

    If you’re testing students from around the world, with different backgrounds, they will be scored solely on their level of English, in a perfectly objective way.

    Additional benefits of automated scoring are security and cost.

    Security

    Digital assessments are more difficult to monitor than in-person tests, so security is a valid concern. One way to deal with this is remote monitoring.

    Remote proctoring adds an extra layer of security, so test administrators can be confident that learners taking the test from home don’t cheat.

    For example, our software captures a video of test takers, and the AI detection system automatically flags suspicious test-taker behavior. Test administrators can access the video anytime for audits and reviews, and easily find suspicious segments highlighted by our AI.

    Here are a few examples of suspicious behavior that our system might flag (a simple rule-based sketch follows the list):

    Image monitoring:

    • A different face or multiple faces appearing in the frame
    • Camera blocked

    Browser monitoring:

    • Navigating away from the test window or changing tabs multiple times

    Video monitoring:

    • Test taker moving out of camera view
    • More than one person in the camera view
    • Looking away from the camera multiple times
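
    As a toy illustration of the rule-based layer behind such flags, the sketch below checks a stream of monitoring events against simple thresholds; the event names and limits are invented, not app’s detection logic.

    ```python
    from dataclasses import dataclass

    @dataclass
    class Event:
        kind: str       # e.g. "faces_in_frame", "tab_switch", "gaze_away"
        value: int
        timestamp: float

    def flag_events(events: list[Event]) -> list[str]:
        """Toy rules mirroring the categories above; thresholds are invented."""
        flags = []
        tab_switches = sum(1 for e in events if e.kind == "tab_switch")
        if tab_switches >= 3:
            flags.append(f"browser: changed tabs {tab_switches} times")
        for e in events:
            if e.kind == "faces_in_frame" and e.value != 1:
                flags.append(f"image: {e.value} faces at t={e.timestamp:.0f}s")
            if e.kind == "gaze_away" and e.value > 5:
                flags.append(f"video: looked away {e.value} times")
        return flags

    print(flag_events([Event("tab_switch", 1, 12), Event("tab_switch", 1, 40),
                       Event("tab_switch", 1, 95), Event("faces_in_frame", 2, 130)]))
    ```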

    Cost

    Last but not least, cost is a benefit of automated English certification. Automated scoring can be a more cost-effective way of marking tests, primarily because it saves time and resources.

    app English proficiency assessments are highly scalable and don’t require extra time from human scorers, no matter how many test-takers you have.

    Plus, there’s no need to spend time and money on training markers or purchasing equipment.

    AI is helping to lead the way in efficient, accessible, fair and cost-effective English test marking and management. Given time, it should develop even further, becoming more advanced and even more helpful in the world of English language learning and assessment.

  • 4 ways to improve your students' intelligibility


    Intelligibility is the art of being understood by others. Many students think they need to speak a language flawlessly and with a native-like accent to make themselves clear, but this is not quite true.

    While there is a correlation between proficiency and intelligibility, even students of lower general proficiency can express what they mean in a way the listener understands, provided they use good intelligibility practices.

    Being understandable in a second language is often extremely important in work environments, especially as the world becomes more connected and job markets more competitive.

    Intelligibility is a vital aspect of communication but it is not exactly a skill in itself. Instead, it is a combination of fluency, pronunciation, and managing your speed of speech. To reflect how important this is for language learners when studying, traveling or at work, we use an Intelligibility Index as part of our Versant English Test scoring.

    This index is based on factors affecting how understandable speech is to fluent English speakers. These include things like speed, clarity, pronunciation and fluency. Ranging from 1 (low) to 5 (high), the Intelligibility Index shows how intelligible someone’s speech in English is likely to be in a real-world situation.
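
    app does not publish how the Intelligibility Index is computed, but as a toy illustration, weighted sub-scores for the factors above can be mapped onto the 1 (low) to 5 (high) band; the weights and the mapping below are invented.

    ```python
    def intelligibility_index(speed, clarity, pronunciation, fluency):
        """Toy 1-5 band from four sub-scores in [0, 1]; weights are invented."""
        score = (0.2 * speed + 0.3 * clarity
                 + 0.3 * pronunciation + 0.2 * fluency)
        return min(5, max(1, round(1 + 4 * score)))  # map [0, 1] onto 1..5

    print(intelligibility_index(speed=0.7, clarity=0.8,
                                pronunciation=0.6, fluency=0.75))  # -> 4
    ```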

    Let’s go into some activities and exercises you can try in class to help your students improve the intelligibility of their spoken English.

  • How to incorporate music into the classroom

    By app Languages

    Learning English with music can enhance learning and create a more engaging and dynamic classroom environment. In a previous post, we discussed whether music can help you learn a language; this post looks at how music can be incorporated into the classroom.

    Using music in your classroom can help improve student motivation, focus, and retention of information. Here are some ways you can use music to enhance your classroom teaching: