Can computers really mark exams? Benefits of ELT automated assessments

app Languages

Automated assessment, including the use of Artificial Intelligence (AI), is one of the latest education tech solutions. It speeds up exam marking, removes human bias, and is as accurate as, and at least as reliable as, human examiners. As innovations go, this one is a real game-changer for teachers and students.

However, it has understandably been met with many questions and sometimes skepticism in the ELT community – can computers really mark speaking and writing exams accurately? 

The answer is a resounding yes. Students from all parts of the world already take AI-graded tests. Versant tests, for example, provide unbiased, fair and fast automated scoring for speaking and writing exams, irrespective of where the test takers live or what their accent or gender is.

This article will explain the main processes involved in AI automated scoring and make the point that AI technologies are built on the foundations of consistent expert human judgments. So, let’s clear up the confusion around automated scoring and AI and look into how it can help teachers and students alike. 

AI versus traditional automated scoring

First of all, let’s distinguish between traditional automated scoring and AI. When we talk about automated scoring, we generally mean scoring items that are either multiple-choice or cloze items. You may have to reorder sentences, choose from a drop-down list or insert a missing word: that sort of thing. These question types are designed to test particular skills, and automated scoring ensures that they can be marked quickly and accurately every time.
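Traditional automated scoring of closed items like these comes down to comparing each response with an answer key. The following is a minimal sketch, assuming a hypothetical key format: the item IDs, the case-insensitive matching rule and the one-point-per-item scheme are illustrative choices, not a description of how any particular test is scored.

```python
from typing import Dict, Sequence, Union

# A response is either a single text answer or an ordered sequence (for reordering items).
Answer = Union[str, Sequence[str]]

def normalize(answer: Answer):
    """Lower-case and trim text answers; turn sequences into tuples so order matters."""
    if isinstance(answer, str):
        return answer.strip().lower()
    return tuple(part.strip().lower() for part in answer)

def score_closed_items(key: Dict[str, Answer], responses: Dict[str, Answer]) -> int:
    """Award one point for each item whose response exactly matches the key."""
    score = 0
    for item_id, correct in key.items():
        given = responses.get(item_id)
        if given is not None and normalize(given) == normalize(correct):
            score += 1
    return score

# Hypothetical answer key covering the three item types mentioned above.
KEY = {
    "q1": "B",                  # multiple choice
    "q2": "went",               # cloze: insert the missing word
    "q3": ["S2", "S1", "S3"],   # reorder the sentences
}
responses = {"q1": "b", "q2": " Went ", "q3": ["S1", "S2", "S3"]}
print(score_closed_items(KEY, responses))  # 2 (the sentence order in q3 is wrong)
```

Because every correct answer is known in advance, this kind of marking is exact and repeatable, which is also why it cannot handle open-ended speaking and writing responses.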

While automatically scored items like these can be used to assess receptive skills such as listening and reading comprehension, they cannot mark the productive skills of writing and speaking. Every student's response in writing and speaking items will be different, so how can computers mark them?

This is where AI comes in. 

We hear a lot about how AI is increasingly being used in areas where large amounts of unstructured data need to be handled quickly and accurately, such as medical diagnostics. In language testing, AI uses specialized computer software to grade written and oral tests.

How AI is used to score speaking exams

The first step is to build an acoustic model for each language that can recognize speech, converting the recorded audio waveform into text. While this technology used to be very unusual, most of our smartphones can do this now.

These acoustic models are then trained to score every single prompt or item on a test. We do this by using human expert raters to score the items first, using double marking. They score hundreds of oral responses for each item, and these ‘Standards’ are then used to train the engine. 

Next, we validate the trained engine by feeding in many more human-marked items and checking that the machine scores are very highly correlated with the human scores. If this doesn’t happen for any item, we remove it, as it must match the standard set by human markers. We expect a correlation of between .95 and .99. A correlation this high means that the machine scores rise and fall almost exactly in step with the human scores across candidates. Strictly speaking, it is a measure of agreement, not the percentage of marks that are identical to the human-marked samples.

This is incredibly high compared to the reliability of human-marked speaking tests. In essence, we use a group of highly expert human raters to train the AI engine, and then their standard is replicated time after time.  
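The item-level validation step described above can be sketched as follows: compute the Pearson correlation between human and machine scores for each item and keep only items that reach a threshold. This is a minimal illustration, assuming a threshold of 0.95 (the lower end of the range quoted above); the item IDs and scores are invented.

```python
import math
from typing import Dict, List, Tuple

def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between two equally long lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)  # undefined if either list is constant

def validate_items(scores: Dict[str, Tuple[List[float], List[float]]],
                   threshold: float = 0.95) -> Dict[str, float]:
    """Keep only items whose machine scores correlate with human scores at or above threshold."""
    kept = {}
    for item_id, (human, machine) in scores.items():
        r = pearson(human, machine)
        if r >= threshold:
            kept[item_id] = r
    return kept

# Invented validation data: (human scores, machine scores) for each item.
validation = {
    "speak_01": ([2, 3, 4, 5, 3], [2.1, 3.0, 3.9, 5.0, 3.2]),
    "speak_02": ([2, 3, 4, 5, 3], [4, 2, 5, 3, 3]),
}
kept = validate_items(validation)
print(sorted(kept))  # ['speak_01']: speak_02 would be removed
```

A correlation is not a percentage of identical marks: `pearson([1, 2, 3], [2, 3, 4])` returns 1.0 even though the second rater gave every response one point more than the first and matched none of the first rater's marks.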

How AI is used to score writing exams

Our AI writing scoring uses a technology called Latent Semantic Analysis (LSA). LSA is a natural language processing technique that can analyze and score writing based on the meaning behind words, and not just their superficial characteristics.

Similarly to our speech recognition acoustic models, we first establish a language-specific model of text. We feed a large amount of text into the system, and LSA learns, from the way words occur together across that text, how words relate to each other and are used in, for example, the English language.
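The core idea of LSA can be sketched in a few lines: count which words appear in which texts, compress that term-document matrix with a truncated singular value decomposition (SVD), and compare texts by the angle between their compressed vectors. This is a minimal pure-Python toy, assuming raw word counts and a power-iteration SVD; real LSA systems typically apply term weighting and use far larger corpora and optimized linear algebra. The four example texts are invented, and this is not app's scoring engine, whose details are proprietary.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]

def term_document_matrix(docs: List[str]) -> Tuple[List[str], Matrix]:
    """Rows are vocabulary terms, columns are documents, cells are raw word counts."""
    vocab = sorted({w for doc in docs for w in doc.lower().split()})
    row_of = {w: i for i, w in enumerate(vocab)}
    counts = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in doc.lower().split():
            counts[row_of[w]][j] += 1.0
    return vocab, counts

def mat_vec(a: Matrix, v: List[float]) -> List[float]:
    return [sum(x * y for x, y in zip(row, v)) for row in a]

def transpose(a: Matrix) -> Matrix:
    return [list(col) for col in zip(*a)]

def norm(v: List[float]) -> float:
    return math.sqrt(sum(x * x for x in v))

def truncated_svd(a: Matrix, k: int, iterations: int = 500, seed: int = 0):
    """Top-k singular values and right singular vectors via power iteration with deflation."""
    a = [row[:] for row in a]
    rng = random.Random(seed)
    sigmas, right_vectors = [], []
    for _ in range(k):
        at = transpose(a)
        v = [rng.uniform(-1.0, 1.0) for _ in range(len(at))]
        for _ in range(iterations):
            w = mat_vec(at, mat_vec(a, v))  # one power-iteration step on A^T A
            length = norm(w)
            if length == 0.0:
                break
            v = [x / length for x in w]
        av = mat_vec(a, v)
        sigma = norm(av)
        if sigma == 0.0:
            break  # the matrix has rank lower than k
        u = [x / sigma for x in av]
        sigmas.append(sigma)
        right_vectors.append(v)
        # Deflate: remove the component just found so the next pass finds the next one.
        a = [[a[i][j] - sigma * u[i] * v[j] for j in range(len(v))] for i in range(len(a))]
    return sigmas, right_vectors

def document_vectors(sigmas: List[float], right_vectors: Matrix) -> Matrix:
    """Document j in the latent space: (sigma_1 * v_1[j], ..., sigma_k * v_k[j])."""
    n_docs = len(right_vectors[0])
    return [[s * v[j] for s, v in zip(sigmas, right_vectors)] for j in range(n_docs)]

def cosine(x: List[float], y: List[float]) -> float:
    return sum(p * q for p, q in zip(x, y)) / (norm(x) * norm(y))

docs = ["cats chase mice", "cats catch mice",
        "stocks rise markets rally", "markets fall stocks drop"]
_, counts = term_document_matrix(docs)
sigmas, vs = truncated_svd(counts, k=2)
vecs = document_vectors(sigmas, vs)
print(round(cosine(vecs[0], vecs[1]), 3))  # close to 1: same topic
print(round(cosine(vecs[0], vecs[2]), 3))  # close to 0: different topic
```

With k=2, the two animal texts collapse onto one latent dimension and the two finance texts onto the other, so similarity reflects shared patterns of word use rather than a word-by-word match.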

Once the language model has been established, we train the engine to score every written item on a test. As in speaking items, we do this by using human expert raters to score the items first, using double marking. They score many hundreds of written responses for each item, and these ‘Standards’ are then used to train the engine. We then validate the trained engine by feeding in many more human-marked items, and check that the machine scores are very highly correlated to the human scores. 

The benchmark is always the expert human scores. If our AI system doesn’t closely match the scores given by human markers, we remove the item, as it is essential to match the standard set by human markers.

AI’s ability to mark multiple traits 

One of the challenges human markers face in scoring speaking and written items is assessing many traits on a single item. For example, when assessing and scoring speaking, they may need to give separate scores for content, fluency and pronunciation. 

In written responses, markers may need to score a piece of writing for vocabulary, style and grammar. Effectively, they may need to mark every single item at least three times, maybe more. However, once we have trained the AI systems on every trait score in speaking and writing, they can mark items on any number of traits instantaneously and consistently.
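Once a separate model has been trained for each trait, scoring all traits for a response is a single pass over those models. The following is a minimal sketch, assuming hypothetical linear models: the trait names (content, fluency, pronunciation) come from the text above, but the feature names, weights and the 0 to 5 band scale are invented for illustration.

```python
from typing import Dict

# Hypothetical trained models, one per trait: a bias plus weights over shared features.
# In practice each model would be fitted to expert human scores for that trait.
TRAIT_MODELS: Dict[str, Dict[str, float]] = {
    "content": {"bias": 0.5, "topic_similarity": 4.0},
    "fluency": {"bias": 1.0, "words_per_second": 1.5, "pause_ratio": -3.0},
    "pronunciation": {"bias": 0.0, "phone_accuracy": 5.0},
}

def score_all_traits(features: Dict[str, float],
                     models: Dict[str, Dict[str, float]] = TRAIT_MODELS,
                     low: float = 0.0, high: float = 5.0) -> Dict[str, float]:
    """Score one response on every trait, clamping each score to the band scale."""
    scores = {}
    for trait, model in models.items():
        raw = model.get("bias", 0.0) + sum(
            weight * features.get(name, 0.0)  # a missing feature counts as 0
            for name, weight in model.items() if name != "bias")
        scores[trait] = round(min(high, max(low, raw)), 2)
    return scores

response_features = {"topic_similarity": 0.8, "words_per_second": 2.0,
                     "pause_ratio": 0.2, "phone_accuracy": 0.9}
print(score_all_traits(response_features))
# {'content': 3.7, 'fluency': 3.4, 'pronunciation': 4.5}
```

Adding another trait means adding another trained model to the dictionary; the response itself still only needs to be processed once.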

AI’s lack of bias

A fundamental premise for any test is that no advantage or disadvantage should be given to any candidate. In other words, there should be no positive or negative bias. This can be very difficult to achieve in human-marked speaking and written assessments. In fact, candidates often feel they may have received a different score if someone else had heard them or read their work.

Our AI systems are designed to remove this source of bias. We do this by ensuring that our speaking and writing AI systems are trained on an extensive range of human accents and writing types.

We don’t want perfect native-speaking accents or writing styles to train our engines. We use representative non-native samples from across the world. When we initially set up our AI systems for speaking and writing scoring, we trialed our items and trained our engines using millions of student responses. We continue to do this now as new items are developed.
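One simple way to check for the kind of bias described above is to compare machine and human scores separately for each group of test takers: if the engine consistently scores one group higher or lower than human markers do, that group is flagged for review. This is a minimal sketch, assuming invented group labels, scores and a tolerance of 0.25 band points; real fairness studies use more thorough statistical methods than a mean difference.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple

# Each record: (group label, human score, machine score). Labels and scores are invented.
Record = Tuple[str, float, float]

def mean_difference_by_group(records: List[Record]) -> Dict[str, float]:
    """Average (machine - human) score per group; values far from 0 suggest systematic bias."""
    diffs = defaultdict(list)
    for group, human, machine in records:
        diffs[group].append(machine - human)
    return {group: mean(values) for group, values in diffs.items()}

def flag_groups(records: List[Record], tolerance: float = 0.25) -> List[str]:
    """Groups whose average machine-human difference exceeds the tolerance."""
    return sorted(group for group, diff in mean_difference_by_group(records).items()
                  if abs(diff) > tolerance)

records = [
    ("group_a", 3.0, 3.1), ("group_a", 4.0, 3.9),
    ("group_b", 3.0, 3.0), ("group_b", 2.5, 2.6),
    ("group_c", 3.0, 2.5), ("group_c", 4.0, 3.4),
]
print(flag_groups(records))  # ['group_c']: the engine under-scores this group
```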

The benefits of AI automated assessment

There is nothing wrong with hand-marking homework, tests and exams. In fact, it is essential for teachers to get to know their students and provide personal feedback and advice. However, manually correcting hundreds of tests, daily or weekly, can be repetitive, time-consuming and not always reliable, and it takes time away from working alongside students in the classroom. The use of AI in formative and summative assessments can increase assessed practice time for students and reduce the marking load for teachers.

Language learning takes time: it takes lots of time to progress to high levels of proficiency. The blended use of AI can:

  • address the increasing importance of formative assessment to drive personalized learning and diagnostic assessment feedback

  • allow students to practice and get instant feedback inside and outside of allocated teaching time

  • address the issue of teacher workload

  • create a virtuous combination between humans and machines, taking advantage of what humans do best and what machines do best. 

  • provide fair, fast and unbiased summative assessment scores in high-stakes testing.

We hope this article has answered a few burning questions about how AI is used to assess speaking and writing in our language tests. An interesting quote from Fei-Fei Li, former Chief Scientist of AI/ML at Google Cloud and a professor at Stanford University, describes AI like this:

“I often tell my students not to be misled by the name ‘artificial intelligence’ — there is nothing artificial about it; A.I. is made by humans, intended to behave [like] humans and, ultimately, to impact human lives and human society.”

AI in formative and summative assessments will never replace the role of teachers. AI will support teachers, provide endless opportunities for students to improve, and provide a solution to slow, unreliable and often unfair high-stakes assessments.

Examples of AI assessments in ELT

At app, we have developed a range of assessments using AI technology.

Versant

The Versant tests are a great tool to help establish language proficiency benchmarks in any school, organization or business. They are designed specifically for placement testing, to determine the appropriate level for each learner.

PTE Academic

PTE Academic is aimed at those who need to prove their level of English for a university place, a job or a visa. It uses AI to score tests, and results are available within five days.

app English International Certificate (PEIC)

app English International Certificate (PEIC) also uses automated assessment technology. It is a two-hour test, available on demand, that can be taken at home or at school (or at a secure test center). Using a combination of advanced speech recognition and exam grading technology, together with the expertise of professional ELT exam markers worldwide, our patented software can measure English language ability.

Read more about the use of AI in our learning and testing here, or if you're wondering which English test is right for your students make sure to check out our post 'Which exam is right for my students?'.

More blogs from app

  • Tips for effective online classroom management

    By app Languages

    Online language learning and teaching brings with it a lot of things to think about. The following tips are designed to help you plan your primary-level online classes effectively and manage students in a digital environment.

    1. Keep energy levels high

    The school environment is an active and incredibly social space. It’s hard to replicate this online, potentially leading to boredom and frustration among your students. For this reason, you should take regular 'movement breaks' during the day to energize them. You can do the following quick sequence sitting or standing:

    • Stretch your arms above your head and reach for the sky.
    • Count to ten.
    • Drop your left arm to your side and bend to your left while stretching your right arm over your head.
    • Count to fifteen.
    • Come back to an upright position and stretch both arms above your head.
    • Count to ten.
    • Drop your right arm to your side and bend to your right while stretching your left arm over your head.
    • Count to fifteen.
    • Come back to an upright position and stretch both arms above your head.
    • Count to ten.
    • Lean forward until your fingertips touch the floor (only go as far as is comfortable for your body), then cross your arms and release your head so it hangs gently between your legs.
    • Count to fifteen.
    • Come back upright, shake your arms and legs, and get back to work!

    This excellent energy booster allows your students to revise parts of the body, commands and even make the link with other subjects.

    2. Encourage casual socialization

    Small talk and gossip are fundamental parts of the regular school day. It’s essential to give students a few minutes to chat freely. It will help them feel relaxed and make your classes more comfortable.

    Let your students do this in whatever language they want and don’t get involved, just like at school. Alternatively, ask someone to share a YouTube video, song, Instagram, or TikTok post in a digital show and tell.

    3. Encourage the use of functional language

    After students have been chatting freely in their own language, take the opportunity to bring in functional language in English related to whatever they were talking about. This will help get them ready for the lesson. Here are some ways to do this:

    • Singing - Play a song and get them to sing along.
    • Role-play - When students talk about food, you could role-play in a restaurant or talk about likes and dislikes.
    • Guessing games - Students read descriptions of animals and guess which animal each one describes. You can make up your own descriptions.

    4. Consider task and student density

    To optimize learning time, consider dividing your class into smaller groups and teaching each one individually for part of the timetabled class time. You may find that you get more done in 15 minutes with eight students than you would be able to get done in 60 minutes with 32 students.

    At the same time, you will be able to focus more easily on individual needs (you’ll be able to see all their video thumbnails on the same preview page). If it is not acceptable in your school to do this, divide the class so you’re not trying to teach everyone the same thing simultaneously.

    Having the whole class do a reading or writing activity is a lost opportunity to use this quiet time to give more focused support to smaller groups of learners, so think about setting a reading task for half the class, while you supervise a speaking activity with the other half, and then swap them over.

    Alternatively, set a writing activity for 1/3 of the students, a reading for 1/3 and a speaking activity for the remaining 1/3, and rotate the groups during the class.

    5. Manage your expectations

    Don’t expect to get the same amount of work done in an online class as in the classroom. Once you have waited for everyone to connect, turn on their cameras and so on, you have less time to teach than you would usually have. On top of this, it’s much more complex and time-consuming to give focused support to individual learners in a way that doesn’t interrupt everyone else.

    So, don’t plan the same task density in online classes as you would for face-to-face teaching. Explore flipping some of your activities, so your students arrive better prepared to get to work.

    It’s also much harder to engage students, measure their engagement, read their reactions and verify that they are staying on task online than in the physical classroom. Always clearly explain the objectives and why you have decided on them. Regularly check to see if everyone understands and is able to work productively.

    When you’re all online, you can’t use visual cues to quickly judge whether anyone is having difficulties, as you can in the classroom. Ask direct questions to specific students rather than asking if everyone understands or is OK. During and at the end of class, check and reinforce the achieved objectives.

  • Tips for setting up an optimized online classroom

    By app Languages

    Technology and the learning space

    How a physical classroom is organized, decorated and laid out impacts how your students feel, interact and learn. It’s just as important to think about how your virtual teaching space functions and what it looks like, as it will greatly affect your students’ learning experience.

    Classrooms are usually full of posters, examples of students’ work and other decorations. Just because you’re teaching online doesn’t mean your environment needs to look dull.

    Take some time to think about your virtual teaching space. Picture it in your head. What’s behind you? What’s on either side? Is there an echo? Is it light or dark? How far away are you from the camera?

    Online classroom setup dos and don’ts

    While teaching online isn’t always that different from teaching face-to-face, there are quite a few things you might not have considered before. Here are some of my top dos and don’ts to help:

    Lighting

    • Don’t sit in front of a window or other source of light; otherwise, your face will be in shadow and hard to see. If you have no option, close the curtains and use an artificial light source to illuminate your face.
    • Do reflect lighting off a wall or ceiling, so it hits your face indirectly. This creates a much more pleasing image. If possible, sit facing any windows, or to the side of them, so that the light hits your face directly or from the side. If the room is naturally dark, reflect a couple of lamps off the wall in front of you or the ceiling.

    Audio

    • Do invest in a set of headphones with an inline microphone. Even cheap ones will make you easier to understand, and reduce environmental noise interference (traffic, your neighbor’s stereo, etc.).
    • Don’t teach in an empty classroom (if you can avoid it). Empty classrooms are a terrible place to teach online classes from because they suffer from echo, environmental noise, lighting and bandwidth problems.
    • If your teaching space has an echo, try placing pillows or cushions on either side of your screen. They help absorb echoes and make it easier for your students to hear you.

    Video

    • Sit far enough away from the camera so your students can see most of your upper body and arms. If you use a laptop, raise it up on an old shoebox or a couple of books, so that the camera isn’t pointing up your nose!
    • Do invest in the fastest internet connection you can afford (school administrators may want to consider offering subsidies so teachers can upgrade their connection speed). It is vital that you have enough internet bandwidth to stream good-quality audio and video and share materials with your students. Learn how to use your mobile phone data plan to create a Wi-Fi hotspot for your computer as a backup.

    Using technology with your students

    Here are some ways to get the most out of technology, build your students’ digital literacy skills and increase motivation:

    Space

    Students should connect from a private space where they are not interrupted by siblings, pets, housekeepers, or parents. The space should be well-lit and have a good Wi-Fi signal.

    Communication

    Just like you, they should use earphones with an inline microphone. Their webcams should be on, not just so you can see them, but so they can see each other. Encourage learners to have fun and personalize their space by changing their backgrounds or using filters.

    Distractions

    Parents and caregivers should be aware of the negative effect of noise and distractions on their children’s learning. It’s important that where possible, they avoid having business meetings in the same room their children are learning in. They should also ask other people in the house to respect the children’s right to enjoy a quiet, private, productive learning environment.

    Resources

    If you and your students are online using some form of computer, tablet, or mobile device to connect to class, make sure to use the resources available to you. For example, reinforce how to use spell check correctly when writing a document, or have your students use their cameras to take photos of their work, or even their favorite toys, to share.

    Flexibility

    Instead of trying (and often failing!) to get all your students to speak during the class, have them make videos or audio recordings for homework that they send to you or each other for feedback. Alternatively, experiment with breakout rooms, if using a platform that allows this.

    Preparation

    If you want to show a YouTube video during class, send the link to your students to watch for homework before class, or have them watch it during class on their own devices.

    Besides saving your internet bandwidth, they may even be inspired to click on one of the other recommended (usually related) videos alongside the one you want them to watch. It’ll be on their recently watched list if they want to go back and watch it again.

    Collaboration

    If you set group work that involves writing a text or designing a presentation, ask your students to collaborate in a shared Google Doc. You’ll be able to see what they’re doing in real time and give them feedback. It’s like walking around the classroom and looking over their shoulders.

    Feedback

    Explore the focused feedback tools your web conferencing platform offers, such as breakout rooms or an individual chat. But also, don’t forget to share relevant information and learning with the whole class. This helps them all benefit from your expertise, just as they would by listening to you answer a classmate’s question in the classroom.

    If your students are at home, they can access materials and props they would never have at school. Think about how you could incorporate this into your teaching.

    Materials

    Finally, ensure that the materials you use are suitable for online learning. If you use a book, it should have a fully digital option and a platform available to your students with practice activities, videos, and audio recordings. You should avoid using static pages in favor of dynamic activities, or online documents that allow real-time collaboration.

    Involving parents and caregivers in your online teaching environment

    Create an online learning document for parents explaining how they can create a positive and productive learning environment for their children. Some families may experience significant difficulties and may be unable to implement everything. But it’s still important to explain to them how to optimize the experience if they can.

  • Explaining computerized English testing in plain English

    By app Languages

    Research has shown that automated scoring can give more reliable and objective results than human examiners when evaluating a person’s mastery of English. This is because an automated scoring system is impartial, unlike humans, who can be influenced by irrelevant factors such as a test taker’s appearance or body language. Additionally, automated scoring treats regional accents equally, unlike human examiners who may favor accents they are more familiar with. Automated scoring also allows individual features of a spoken or written test question response to be analyzed independently of one another, so that a weakness in one area of language does not affect the scoring of other areas.

    PTE Academic was created in response to the demand for a more accurate, objective, secure and relevant test of English. Our automated scoring system is a central feature of the test, and vital to ensuring the delivery of accurate, objective and relevant results, no matter who the test-taker is or where the test is taken.

    Development and validation of the scoring system to ensure accuracy

    PTE Academic’s automated scoring system was developed after extensive research and field testing. A prototype test was developed and administered to a sample of more than 10,000 test takers from 158 different countries, speaking 126 different native languages. This data was collected and used to train the automated scoring engines for both the written and spoken PTE Academic items.

    To do this, multiple trained human markers assess each answer. Those results are used as the training material for machine learning algorithms, similar to those used by systems like Google Search or Apple’s Siri. The model makes initial guesses as to the scores each response should get, then consults the actual scores to see how well it did, adjusts itself in a few directions, then goes through the training set over and over again, adjusting and improving until it arrives at a maximally correct solution: a solution that ideally gets very close to predicting the set of human ratings.
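In its simplest form, the guess, compare, adjust, repeat loop described above is gradient descent. The following is a minimal sketch, assuming a linear model over two invented response features and synthetic "human" scores generated from a known rule, so you can see the loop recover it. PTE Academic's actual models and features are proprietary and are not shown here.

```python
from typing import List, Tuple

def predict(weights: List[float], bias: float, x: List[float]) -> float:
    return bias + sum(w * xi for w, xi in zip(weights, x))

def train_linear_scorer(features: List[List[float]], human_scores: List[float],
                        learning_rate: float = 0.1,
                        epochs: int = 5000) -> Tuple[List[float], float]:
    """Fit weights and bias to human scores by gradient descent on mean squared error."""
    n, n_features = len(features), len(features[0])
    weights, bias = [0.0] * n_features, 0.0             # initial guess
    for _ in range(epochs):                             # go through the training set again and again
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, target in zip(features, human_scores):
            error = predict(weights, bias, x) - target  # compare the guess with the human score
            for i in range(n_features):
                grad_w[i] += 2.0 * error * x[i] / n
            grad_b += 2.0 * error / n
        # Adjust each parameter a little in the direction that reduces the error.
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias

# Synthetic data: scores generated from score = 1.0 + 2.0 * f1 + 0.5 * f2.
X = [[0.2, 1.0], [0.5, 2.0], [0.8, 1.5], [0.3, 3.0], [0.9, 0.5]]
y = [1.0 + 2.0 * f1 + 0.5 * f2 for f1, f2 in X]
weights, bias = train_linear_scorer(X, y)
print([round(w, 3) for w in weights], round(bias, 3))  # [2.0, 0.5] 1.0
```

Real human ratings are noisier than this synthetic rule, so in practice the loop settles on the parameters that predict the human scores as closely as possible rather than reproducing them exactly.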

    Once trained up and performing at a high level, this model is used as a marking algorithm, able to score new responses just like human markers would. Correlations between scores given by this system and trained human markers are quite high. The standard error of measurement between app’s system and a human rater is less than that between one human rater and another – in other words, the machine scores are more accurate than those given by a pair of human raters, because much of the bias and unreliability has been squeezed out of them. In general, you can think of a machine scoring system as one that takes the best stuff out of human ratings, then acts like an idealized human marker.
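The comparison described above (machine versus human disagreement, set against human versus human disagreement) can be illustrated with a root-mean-square (RMS) difference between raters. This is a simple stand-in, assumed for illustration: the post does not say exactly how app computes its standard error of measurement, and all scores below are invented.

```python
import math
from typing import List

def rms_difference(a: List[float], b: List[float]) -> float:
    """Root-mean-square difference between two raters' scores on the same responses."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

# Invented scores for the same five responses.
rater_1 = [3.0, 4.0, 2.5, 5.0, 3.5]
rater_2 = [3.5, 3.5, 3.0, 4.5, 4.0]
machine = [3.2, 3.8, 2.8, 4.8, 3.7]

human_vs_human = rms_difference(rater_1, rater_2)
# Average the machine's disagreement with each human rater separately, so the machine
# is compared with individual humans rather than with a less noisy averaged score.
machine_vs_human = (rms_difference(machine, rater_1) + rms_difference(machine, rater_2)) / 2
print(round(human_vs_human, 3), round(machine_vs_human, 3))  # 0.5 0.253
```

In this invented example, the machine disagrees with each human rater less than the two humans disagree with each other, which is the pattern the paragraph above describes.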

    app conducts scoring validation studies to ensure that the machine scores are consistently comparable to ratings given by skilled human raters. Here, a new set of test-taker responses (never seen by the machine) is scored by both human raters and the automated scoring system. Research has demonstrated that the automated scoring technology underlying PTE Academic produces scores comparable to those obtained from careful human experts. This means that the automated system “acts” like a human rater when assessing test takers’ language skills, but does so with a machine's precision, consistency and objectivity.

    Scoring speaking responses with app’s Ordinate technology

    The spoken portion of PTE Academic is automatically scored using app’s Ordinate technology. Ordinate technology results from years of research in speech recognition, statistical modeling, linguistics and testing theory. The technology uses a proprietary speech processing system that is specifically designed to analyze and automatically score speech from fluent and second-language English speakers. The Ordinate scoring system collects hundreds of pieces of information from the test takers’ spoken responses in addition to just the words, such as pace, timing and rhythm, as well as the power of their voice, emphasis, intonation and accuracy of pronunciation. It is trained to recognize even somewhat mispronounced words, and quickly evaluates the content, relevance and coherence of the response. In particular, the meaning of the spoken response is evaluated, making it possible for these models to assess whether or not what was said deserves a high score.
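Some of the timing information mentioned above (pace and pauses) can be derived directly from the word timings a speech recognizer produces. The following is a minimal sketch, assuming hypothetical (word, start, end) timestamps in seconds and a 0.5-second threshold for a "long" pause. It covers only a few simple timing measures, not pronunciation, intonation or content, and it is not the Ordinate system.

```python
from typing import Dict, List, Tuple

# (word, start time in seconds, end time in seconds) from a hypothetical recognizer.
Word = Tuple[str, float, float]

def timing_features(words: List[Word], long_pause: float = 0.5) -> Dict[str, float]:
    """Speaking rate and pause statistics for one spoken response."""
    if not words:
        return {"words_per_second": 0.0, "mean_pause": 0.0, "long_pauses": 0}
    duration = words[-1][2] - words[0][1]
    # Silence between the end of one word and the start of the next (never negative).
    pauses = [max(0.0, nxt[1] - cur[2]) for cur, nxt in zip(words, words[1:])]
    return {
        "words_per_second": len(words) / duration if duration > 0 else 0.0,
        "mean_pause": sum(pauses) / len(pauses) if pauses else 0.0,
        "long_pauses": sum(1 for p in pauses if p >= long_pause),
    }

response = [("I", 0.0, 0.2), ("think", 0.3, 0.7), ("that", 0.7, 0.9),
            ("cities", 1.8, 2.4), ("are", 2.5, 2.7), ("crowded", 2.8, 3.5)]
features = timing_features(response)
print({name: round(value, 2) for name, value in features.items()})
# {'words_per_second': 1.71, 'mean_pause': 0.24, 'long_pauses': 1}
```

Measures like these would be inputs to a trained scoring model, such as a fluency model, rather than scores in their own right.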

    Scoring writing responses with Intelligent Essay Assessor™ (IEA)

    The written portion of PTE Academic is scored using the Intelligent Essay Assessor™ (IEA), an automated scoring tool powered by app’s state-of-the-art Knowledge Analysis Technologies™ (KAT) engine. Based on more than 20 years of research and development, the KAT engine automatically evaluates the meaning of text, such as an essay written by a student in response to a particular prompt. The KAT engine evaluates writing as accurately as skilled human raters using a proprietary application of the mathematical approach known as Latent Semantic Analysis (LSA). LSA infers the meaning of words and passages by analyzing how words are used together across large bodies of relevant text. Therefore, using LSA, the KAT engine can understand the meaning of text much like a human.

    What aspects of English does PTE Academic assess?