Explaining computerized English testing in plain English

Research has shown that automated scoring can give more reliable and objective results than human examiners when evaluating a person’s mastery of English. This is because an automated scoring system is impartial, unlike humans, who can be influenced by irrelevant factors such as a test taker’s appearance or body language. Additionally, automated scoring treats regional accents equally, unlike human examiners who may favor accents they are more familiar with. Automated scoring also allows individual features of a spoken or written test question response to be analyzed independently of one another, so that a weakness in one area of language does not affect the scoring of other areas.

PTE Academic was created in response to the demand for a more accurate, objective, secure and relevant test of English. Our automated scoring system is a central feature of the test, and vital to ensuring the delivery of accurate, objective and relevant results – no matter who the test-taker is or where the test is taken.

Development and validation of the scoring system to ensure accuracy

PTE Academic’s automated scoring system was developed after extensive research and field testing. A prototype test was developed and administered to a sample of more than 10,000 test takers from 158 different countries, speaking 126 different native languages. This data was collected and used to train the automated scoring engines for both the written and spoken PTE Academic items.

To do this, multiple trained human markers assess each answer. Those results are used as the training material for machine learning algorithms, similar to those used by systems like Google Search or Apple’s Siri. The model makes initial guesses as to the scores each response should get, then consults the actual human ratings to see how well it did, adjusts itself, and goes through the training set over and over again, adjusting and improving until it arrives at a solution that gets as close as possible to predicting the full set of human ratings.
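The sketch below illustrates this kind of training loop using scikit-learn. The features, data and model choice are assumptions made purely for illustration – they are not PTE Academic’s actual engine.

```python
# Minimal sketch of training a scoring model on human ratings.
# All data and feature choices here are hypothetical.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Stand-in data: each row describes one response with numeric features
# (e.g. word count, error rate, topical similarity); each target is the
# average score given by trained human markers.
X = rng.normal(size=(1000, 5))
human_scores = X @ np.array([1.5, -0.5, 0.8, 0.0, 2.0]) + rng.normal(scale=0.3, size=1000)

scaler = StandardScaler().fit(X)
X_scaled = scaler.transform(X)
model = SGDRegressor(learning_rate="constant", eta0=0.01)

# The model guesses scores, compares them with the human ratings, nudges its
# weights, and repeats over the training set until the fit stops improving.
for epoch in range(50):
    model.partial_fit(X_scaled, human_scores)

predicted = model.predict(X_scaled)
print("correlation with human scores:", np.corrcoef(predicted, human_scores)[0, 1])
```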

Once trained up and performing at a high level, this model is used as a marking algorithm, able to score new responses just like human markers would. Correlations between scores given by this system and trained human markers are quite high. The standard error of measurement between app’s system and a human rater is less than that between one human rater and another – in other words, the machine scores are more accurate than those given by a pair of human raters, because much of the bias and unreliability has been squeezed out of them. In general, you can think of a machine scoring system as one that takes the best stuff out of human ratings, then acts like an idealized human marker.
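The two statistics mentioned above can be computed directly. The scores in this sketch are invented, and the standard error of measurement is estimated from paired ratings in one common way – it is only meant to show the calculation, not reproduce any published PTE Academic figures.

```python
# Illustrative comparison of machine-vs-human and human-vs-human agreement.
# All scores below are made up for the sake of the example.
import numpy as np

human_a = np.array([65, 72, 58, 80, 90, 77, 60, 85], dtype=float)  # human rater 1
human_b = np.array([63, 75, 55, 82, 88, 74, 64, 83], dtype=float)  # human rater 2
machine = np.array([64, 73, 57, 81, 89, 76, 61, 84], dtype=float)  # automated score

def sem(x, y):
    """One common estimate of the standard error of measurement
    from two sets of ratings: spread of the differences / sqrt(2)."""
    return np.std(x - y, ddof=1) / np.sqrt(2)

print("machine-human correlation:", np.corrcoef(machine, human_a)[0, 1])
print("SEM, machine vs human:    ", sem(machine, human_a))
print("SEM, human vs human:      ", sem(human_a, human_b))
```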

app conducts scoring validation studies to ensure that the machine scores are consistently comparable to ratings given by skilled human raters. Here, a new set of test-taker responses (never seen by the machine) is scored by both human raters and by the automated scoring system. Research has demonstrated that the automated scoring technology underlying PTE Academic produces scores comparable to those obtained from careful human experts. This means that the automated system “acts” like a human rater when assessing test takers’ language skills, but does so with a machine's precision, consistency and objectivity.

Scoring speaking responses with app’s Ordinate technology

The spoken portion of PTE Academic is automatically scored using app’s Ordinate technology. Ordinate technology results from years of research in speech recognition, statistical modeling, linguistics and testing theory. The technology uses a proprietary speech processing system that is specifically designed to analyze and automatically score speech from fluent and second-language English speakers. The Ordinate scoring system collects hundreds of pieces of information from the test takers’ spoken responses in addition to just the words, such as pace, timing and rhythm, as well as the power of their voice, emphasis, intonation and accuracy of pronunciation. It is trained to recognize even somewhat mispronounced words, and quickly evaluates the content, relevance and coherence of the response. In particular, the meaning of the spoken response is evaluated, making it possible for these models to assess whether or not what was said deserves a high score.
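Ordinate’s actual features and weights are proprietary, but the general idea of deriving delivery features from a recognized response can be sketched as follows. The word timings, pronunciation confidences and feature names are assumptions made for illustration only.

```python
# Rough illustration of the kind of pace, timing and pronunciation features
# an automated speech scorer might derive from a recognized spoken response.
# Everything here is hypothetical, not Ordinate's actual feature set.
from dataclasses import dataclass

@dataclass
class Word:
    text: str
    start: float       # start time in seconds
    end: float         # end time in seconds
    pron_score: float   # hypothetical per-word pronunciation confidence, 0-1

def delivery_features(words: list[Word]) -> dict[str, float]:
    total_time = words[-1].end - words[0].start
    speech_time = sum(w.end - w.start for w in words)
    pauses = total_time - speech_time
    return {
        "words_per_second": len(words) / total_time,   # pace
        "pause_ratio": pauses / total_time,            # timing and rhythm
        "mean_pronunciation": sum(w.pron_score for w in words) / len(words),
    }

response = [
    Word("climate", 0.10, 0.62, 0.94),
    Word("change", 0.70, 1.05, 0.97),
    Word("affects", 1.40, 1.92, 0.88),
    Word("everyone", 2.00, 2.60, 0.91),
]
print(delivery_features(response))
```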

Scoring writing responses with Intelligent Essay Assessor™ (IEA)

The written portion of PTE Academic is scored using the Intelligent Essay Assessor™ (IEA), an automated scoring tool powered by app’s state-of-the-art Knowledge Analysis Technologies™ (KAT) engine. Based on more than 20 years of research and development, the KAT engine automatically evaluates the meaning of text, such as an essay written by a student in response to a particular prompt. The KAT engine evaluates writing as accurately as skilled human raters using a proprietary application of the mathematical approach known as Latent Semantic Analysis (LSA). LSA evaluates the meaning of language by analyzing large bodies of relevant text and their meanings. Therefore, using LSA, the KAT engine can understand the meaning of text much like a human.
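The KAT engine itself is proprietary, but the general LSA idea can be sketched in a few lines: project texts into a low-dimensional semantic space and compare an essay with reference material there. The corpus, dimensions and similarity measure below are toy values chosen only to show the mechanism.

```python
# Minimal sketch of the general LSA idea: TF-IDF vectors reduced to a small
# number of latent "meaning" dimensions, then compared by cosine similarity.
# This is not the KAT engine's implementation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.metrics.pairwise import cosine_similarity

reference_texts = [
    "Rising sea levels threaten coastal cities and infrastructure.",
    "Greenhouse gas emissions drive long-term changes in climate.",
    "Public transport reduces traffic congestion in large cities.",
]
essay = "Emissions of greenhouse gases are changing the climate over time."

vectorizer = TfidfVectorizer()
tfidf = vectorizer.fit_transform(reference_texts + [essay])

# Reduce the term space to a handful of latent semantic dimensions.
svd = TruncatedSVD(n_components=2, random_state=0)
semantic = svd.fit_transform(tfidf)

# Similarity between the essay (last row) and each reference text.
print(cosine_similarity(semantic[-1:], semantic[:-1]))
```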

What aspects of English does PTE Academic assess?

Written scoring

  • Word choice
  • Grammar and mechanics
  • Progression of ideas
  • Organization
  • Style, tone
  • Paragraph structure
  • Development, coherence
  • Point of view
  • Task completion

Spoken scoring

  • Sentence mastery
  • Content
  • Vocabulary
  • Accuracy
  • Pronunciation
  • Intonation
  • Fluency
  • Expressiveness
  • Pragmatics

More blogs from app

  • Mind the gap in your English lesson planning

    By Ehsan Gorji

    Professional English teachers love lesson planning. They can always teach a class using their full wardrobe of methods, techniques and games, but a detailed plan means they can deliver a richer and more modern lesson – after all, it is in planning that a teacher draws on their full potential.

    Whenever I observe a teacher in their classroom, I try to outline a sketch of their English lesson plan according to what is going on. I am careful to observe any 'magic moments' and deviations from the written plan and note them down separately. Some teachers seize these magic moments; others do not. Some teachers prepare a thorough lesson plan; others are happy with a basic to-do list. There are also teachers who have yet to believe the miracles a lesson plan could produce for them and therefore their sketch does not live up to expectations.

    The 'language chunks' mission

    After each classroom observation, I’ll have a briefing meeting with the English teacher. If the observation takes place in another city and we cannot arrange another face-to-face meeting, we’ll instead go online and discuss. At this point, I’ll elicit more about the teacher’s lesson plan and see to what extent I have been an accurate observer.

    I have found that Language Inspection is the most frequent gap in lesson planning by Iranian teachers. Most of them fully know what type of class they will teach; set SMART (Specific, Measurable, Attainable, Relevant and Timely) objectives; consider the probable challenges; prepare high-quality material; break the language systems into chunks and artistically engineer the lesson. Yet, they often do not consider how those language chunks will perform within a set class time – and their mission fails.

    The Language Inspection stage asks a teacher to go a bit further with their lesson planning and look at the level of difficulty of various pieces of content in the lesson. Is there enough balance so that students can successfully meet the lesson objectives? If the grammar, vocabulary and skills are all above a student’s ability, then the lesson will be too complex. Language Inspection allows a thoughtful teacher to closely align the objective with the difficulty of the grammar, vocabulary and skill. A bit like a train running along a fixed track, Language Inspection can help make sure that our lessons run smoothly.

    Lesson planning made easy with the GSE Teacher Toolkit

    If a lesson consists of some or many language chunks, those chunks are the vocabulary, grammar and learning objectives we expect to be turned into learning outcomes by the end of the class or course. While Language Analysis in a lesson plan reveals the vocabulary, grammar and learning objectives, Language Inspection examines each chunk to determine what it really does, how it can be presented and, more importantly, what learning outcomes it requires.

    The GSE Teacher Toolkit can be a teacher’s faithful lesson-planning pal – especially when it comes to Language Inspection. It’s simple to use, yet modern and exciting. It is detailed and it delivers everything you need.

    To use it, all you need is an internet connection on your mobile, tablet, laptop or PC. Launch the toolkit and you’ll be able to delve into the heart of your lesson. You’ll be able to identify any gaps in a lesson – much like the gap you can see between a train and the platform’s edge. Mind the gap! You can look into the darkness of this gap and ask yourself: “Does this grammar form belong in this lesson? Do I need to fit in some vocabulary to fill up this blank space? Is it time to move forward in my schedule because my students are mastering this skill early?”

    The GSE Teacher Toolkit gives you the ability to assess your lesson and look for these gaps – whether small or big – in your teaching. By doing this you can plan thoughtfully and clearly to support your students. It really is an opportunity to 'mind the gap' in your English lesson planning.