Can computers really mark exams? Benefits of ELT automated assessments


Automated assessment, including the use of Artificial Intelligence (AI), is one of the latest education tech solutions. It speeds up exam marking times, removes human biases, and is as accurate and at least as reliable as human examiners. As innovations go, this one is a real game-changer for teachers and students. 

However, it has understandably been met with many questions and sometimes skepticism in the ELT community – can computers really mark speaking and writing exams accurately? 

The answer is a resounding yes. Students from all parts of the world already take AI-graded tests. Versant tests, for example, provide unbiased, fair and fast automated scoring for speaking and writing exams, irrespective of where the test takers live or what their accent or gender is.

This article will explain the main processes involved in AI automated scoring and make the point that AI technologies are built on the foundations of consistent expert human judgments. So, let’s clear up the confusion around automated scoring and AI and look into how it can help teachers and students alike. 

AI versus traditional automated scoring

First of all, let’s distinguish between traditional automated scoring and AI. When we talk about automated scoring, generally, we mean scoring items that are either multiple-choice or cloze items. You may have to reorder sentences, choose from a drop-down list or insert a missing word. These question types are designed to test particular skills, and automated scoring ensures that they can be marked quickly and accurately every time.
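To make the idea concrete, here is a minimal Python sketch of key-based marking. The item IDs, item types and answer key are hypothetical, made up for illustration; it is not how any particular test platform stores its items.

```python
# Minimal sketch of traditional automated scoring: every item has a fixed
# answer key, so marking is a deterministic comparison. The item IDs, types
# and keys below are hypothetical examples.
ANSWER_KEY = {
    "q1": {"type": "choice", "answer": "B"},                    # multiple choice
    "q2": {"type": "cloze", "accepted": {"went", "had gone"}},  # fill the gap
    "q3": {"type": "order", "answer": ["s3", "s1", "s2"]},      # reorder sentences
}

def score_item(item, response):
    """Return 1 for a correct response, 0 otherwise."""
    if item["type"] == "cloze":
        # Ignore case and surrounding spaces for typed gap-fill answers
        return int(response.strip().lower() in item["accepted"])
    # Multiple choice and sentence ordering need an exact match
    return int(response == item["answer"])

def score_test(responses, key=ANSWER_KEY):
    """Sum item scores; unanswered items score 0."""
    return sum(score_item(item, responses[qid])
               for qid, item in key.items() if qid in responses)
```

For example, `score_test({"q1": "B", "q2": " Went ", "q3": ["s1", "s3", "s2"]})` gives 2: the first two answers match the key, but the sentences are in the wrong order. Because every correct answer is known in advance, this kind of marking cannot handle open-ended speaking or writing.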

While automatically scored items like these can be used to assess receptive skills such as listening and reading comprehension, they cannot mark the productive skills of writing and speaking. Every student's response in writing and speaking items will be different, so how can computers mark them?

This is where AI comes in. 

We hear a lot about how AI is increasingly being used in areas where there is a need to deal with large amounts of unstructured data effectively and accurately, like medical diagnostics. In language testing, AI uses specialized computer software to grade written and oral tests.

How AI is used to score speaking exams

The first step is to build an acoustic model for each language that can recognize speech, converting the recorded sound waveform into text. While this technology used to be very unusual, most smartphones can do this now.

These acoustic models are then trained to score every single prompt or item on a test. We do this by using human expert raters to score the items first, using double marking. They score hundreds of oral responses for each item, and these ‘Standards’ are then used to train the engine. 
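As a rough illustration of the training step, the Python sketch below averages two raters' scores into a "Standard" (dropping pairs that disagree too much) and then fits a straight line from one machine-extracted feature to those Standards. The tolerance, the single feature and the linear model are all simplifying assumptions; a production scoring engine uses many features and more sophisticated models.

```python
# Minimal sketch: turning double-marked responses into training "Standards"
# and fitting a simple linear scoring model. The feature, score scale and
# adjudication tolerance are hypothetical illustrations.
from statistics import mean

def build_standards(double_marks, tolerance=1.0):
    """Average two raters' scores; drop pairs that disagree by more than
    `tolerance` (in practice these would go to a third rater)."""
    standards = {}
    for response_id, (rater_a, rater_b) in double_marks.items():
        if abs(rater_a - rater_b) <= tolerance:
            standards[response_id] = (rater_a + rater_b) / 2
    return standards

def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar
```

Once fitted, a new response with feature value `x` would be scored as `slope * x + intercept`, so the model reproduces the expert raters' scale rather than inventing its own.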

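Next, we validate the trained engine by feeding in many more human-marked items and checking that the machine scores are very highly correlated with the human scores. If this doesn’t happen for any item, we remove it, as it must match the standard set by human markers. We expect a correlation of between 0.95 and 0.99. This does not mean that 95 to 99% of responses receive exactly the same mark as the human-marked samples; it means that machine scores rise and fall almost perfectly in step with human scores. At a correlation of 0.95, for instance, about 90% of the variation in scores (0.95² ≈ 0.90) is shared between machine and human marks.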
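The validation rule can be sketched in a few lines of Python. This is an illustration under stated assumptions (one list of machine scores and one list of human scores per item, with 0.95 as the cut-off), not the production pipeline.

```python
# Minimal sketch of item-level validation: compare machine scores with
# held-out human scores for each item and keep only items whose Pearson
# correlation meets a threshold (0.95 here, an assumed cut-off).
from math import sqrt

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

def items_to_keep(validation, threshold=0.95):
    """validation maps item id -> (machine_scores, human_scores)."""
    return [item for item, (machine, human) in validation.items()
            if pearson_r(machine, human) >= threshold]
```

Correlation measures agreement in ranking and spacing, not identical marks: machine scores of 2, 3 and 4 against human scores of 1, 2 and 3 correlate perfectly (r = 1.0) even though no single score matches.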

This is incredibly high compared to the reliability of human-marked speaking tests. In essence, we use a group of highly expert human raters to train the AI engine, and then their standard is replicated time after time.  

How AI is used to score writing exams

Our AI writing scoring uses a technology called Latent Semantic Analysis (LSA). LSA is a natural language processing technique that can analyze and score writing based on the meaning behind words, and not just their superficial characteristics.

As with our speech recognition acoustic models, we first establish a language-specific model of written text. We feed a large amount of text into the system, and LSA learns, from which words tend to appear together, the patterns of how words relate to each other and are used in, for example, the English language.
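For readers curious about the mechanics, here is a deliberately tiny, pure-Python sketch of LSA. It builds a term-document count matrix A, computes a truncated SVD (via the eigenvectors of AᵀA, which are A's right singular vectors), places each document in a low-dimensional "semantic space", and compares documents by cosine similarity. Raw counts (real systems usually weight them), the four example sentences and k = 2 are assumptions for illustration only.

```python
# Minimal Latent Semantic Analysis (LSA) sketch in pure Python.
from math import sqrt

def term_document_matrix(docs):
    """Rows are vocabulary terms, columns are documents, cells are raw counts."""
    vocab = sorted({word for doc in docs for word in doc.lower().split()})
    row_of = {word: i for i, word in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for col, doc in enumerate(docs):
        for word in doc.lower().split():
            matrix[row_of[word]][col] += 1.0
    return vocab, matrix

def top_eigenpairs(sym, k, iterations=500):
    """Largest k eigenpairs of a symmetric PSD matrix via power iteration with deflation."""
    n = len(sym)
    work = [row[:] for row in sym]
    pairs = []
    for p in range(k):
        v = [1.0 + 0.1 * ((i + p) % 3) for i in range(n)]  # deterministic, non-uniform start
        for _ in range(iterations):
            w = [sum(work[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * work[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Remove the found component so the next pass finds the next-largest one
        work = [[work[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return pairs

def lsa_vectors(docs, k=2):
    """Document coordinates in a k-dimensional LSA space (rows of V times Sigma)."""
    _, a = term_document_matrix(docs)
    n = len(docs)
    # A^T A = V Sigma^2 V^T, so its eigenvectors are the right singular vectors of A
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(n)]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    norms = sqrt(sum(x * x for x in u)) * sqrt(sum(x * x for x in v))
    return dot / norms if norms else 0.0

docs = ["students write essays about climate change",
        "climate change essays by students",
        "the cat sat on the mat",
        "a cat on a mat"]
vecs = lsa_vectors(docs, k=2)
```

Here the two climate-change sentences end up pointing in almost the same direction (cosine close to 1), while a climate sentence and a cat sentence are close to orthogonal. A scoring engine could compare a candidate's response with high-scoring reference answers in the same way, although how the production engine maps similarity to a score is not described here.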

Once the language model has been established, we train the engine to score every written item on a test. As in speaking items, we do this by using human expert raters to score the items first, using double marking. They score many hundreds of written responses for each item, and these ‘Standards’ are then used to train the engine. We then validate the trained engine by feeding in many more human-marked items, and check that the machine scores are very highly correlated to the human scores. 

The benchmark is always the expert human scores. If our AI system doesn’t closely match the scores given by human markers, we remove the item, as it is essential to match the standard set by human markers.

AI’s ability to mark multiple traits 

One of the challenges human markers face in scoring speaking and written items is assessing many traits on a single item. For example, when assessing and scoring speaking, they may need to give separate scores for content, fluency and pronunciation. 

In written responses, markers may need to score a piece of writing for vocabulary, style and grammar. Effectively, they may need to mark every single item at least three times, maybe more. However, once we have trained the AI systems on every trait score in speaking and writing, they can mark items on any number of traits instantaneously and consistently.
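The reason extra traits cost a trained engine almost nothing is that each trait simply gets its own model, and all of them run on the same response at once. The sketch below assumes simple linear per-trait models; the feature names, weights and intercepts are hypothetical placeholders for what training would produce.

```python
# Minimal sketch: one scoring model per trait, all applied to the same
# response in a single pass. Feature names, weights and intercepts are
# hypothetical placeholders for what trained per-trait models would learn.
TRAIT_MODELS = {
    "content": ({"keyword_overlap": 4.0}, 1.0),
    "fluency": ({"words_per_second": 1.5, "pause_ratio": -2.0}, 2.0),
    "pronunciation": ({"phone_accuracy": 5.0}, 0.0),
}

def score_all_traits(features, models=TRAIT_MODELS):
    """Return a score for every trait from one set of extracted features."""
    scores = {}
    for trait, (weights, intercept) in models.items():
        raw = intercept + sum(w * features.get(name, 0.0) for name, w in weights.items())
        scores[trait] = round(raw, 2)
    return scores
```

Adding a fourth trait means adding one entry to `TRAIT_MODELS`; the response itself is not marked again from scratch.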

AI’s lack of bias

A fundamental premise for any test is that no advantage or disadvantage should be given to any candidate. In other words, there should be no positive or negative bias. This can be very difficult to achieve in human-marked speaking and written assessments. In fact, candidates often feel they may have received a different score if someone else had heard them or read their work.

Our AI systems eradicate the issue of bias. This is done by ensuring our speaking and writing AI systems are trained on an extensive range of human accents and writing types. 

We don’t want perfect native-speaking accents or writing styles to train our engines. We use representative non-native samples from across the world. When we initially set up our AI systems for speaking and writing scoring, we trialed our items and trained our engines using millions of student responses. We continue to do this now as new items are developed.
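One common way to check that an engine treats groups of test takers equally is to compare machine scores with expert human scores group by group: if the engine systematically over- or under-scores one group, the average machine-minus-human difference for that group drifts away from zero. The Python sketch below illustrates that idea; the group labels, the 0.25-point tolerance and the method itself are illustrative assumptions, not a description of our internal checks.

```python
# Minimal sketch of a group-level bias check, assuming every response has a
# machine score, an expert human score and a (hypothetical) group label such
# as the test taker's first-language background.
from collections import defaultdict

def mean_residual_by_group(records):
    """records: iterable of (group, machine_score, human_score).
    Returns the average machine-minus-human difference for each group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for group, machine, human in records:
        totals[group] += machine - human
        counts[group] += 1
    return {group: totals[group] / counts[group] for group in totals}

def flag_groups(residuals, tolerance=0.25):
    """Groups the engine over- or under-scores by more than `tolerance`
    points on average (the tolerance is an illustrative choice)."""
    return sorted(g for g, r in residuals.items() if abs(r) > tolerance)
```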

The benefits of AI automated assessment

There is nothing wrong with hand-marking homework, tests and exams. In fact, it is essential for teachers to get to know their students and provide personal feedback and advice. However, manually correcting hundreds of tests, daily or weekly, is repetitive, time-consuming and not always reliable, and it takes time away from working alongside students in the classroom. The use of AI in formative and summative assessments can increase assessed practice time for students and reduce the marking load for teachers.

Language learning takes time: progressing to high levels of proficiency takes a great deal of it. The blended use of AI can:

  • address the increasing importance of formative assessment to drive personalized learning and diagnostic assessment feedback

  • allow students to practice and get instant feedback inside and outside of allocated teaching time

  • address the issue of teacher workload

  • create a virtuous combination between humans and machines, taking advantage of what humans do best and what machines do best. 

  • provide fair, fast and unbiased summative assessment scores in high-stakes testing.

We hope this article has answered a few burning questions about how AI is used to assess speaking and writing in our language tests. Fei-Fei Li, a Stanford professor and former Chief Scientist of AI/ML at Google Cloud, describes AI like this:

“I often tell my students not to be misled by the name ‘artificial intelligence’ — there is nothing artificial about it; A.I. is made by humans, intended to behave [like] humans and, ultimately, to impact human lives and human society.”

AI in formative and summative assessments will never replace the role of teachers. AI will support teachers, provide endless opportunities for students to improve, and provide a solution to slow, unreliable and often unfair high-stakes assessments.

Examples of AI assessments in ELT

At app, we have developed a range of assessments using AI technology.

Versant

The Versant tests are a great tool to help establish language proficiency benchmarks in any school, organization or business. They are specifically designed for placement tests to determine the appropriate level for the learner.

PTE Academic

The PTE Academic is aimed at those who need to prove their level of English for a university place, a job or a visa. It uses AI to score tests, and results are available within five days.

More blogs from app

  • The role of AI in English assessment

    By Jennifer Manning

    Digital assessment has become more and more widespread in recent years. But what’s the role of digital assessment in teaching today? We’d like to give you some insight into digital assessment and automated scoring.

    Just a few years ago, there may have been doubts about the role of AI in English assessment and the ability of a computer to score language tests accurately. But today, thousands of teachers worldwide use automated language tests to assess their students’ language proficiency.

    For example, app’s suite of Versant tests has been delivering automated language assessments for nearly 25 years. Since its launch in 1996, over 350 million tests have been scored. The same technology is used in app’s Benchmark and Level tests.

    So what makes automated scoring systems so reliable?

    Huge data sets of exam answers and results are used to train artificial intelligence machine learning technology to score English tests the same way that human markers do. This way, we’re not replacing human judgment; we’re just teaching computers to replicate it.

    Of course, computers are much more efficient than humans. They don’t mind monotonous work, and they make fewer marking errors (the standard marking error of an AI-scored test is lower than that of a human-scored test). So we can get unbiased, accurate, and consistent scores.

    The top benefits of automated scoring are speed, reliability, flexibility, and freedom from bias.

    Speed

    The main advantage computers have over humans is that they can quickly process complex information. Digital assessments can often provide an instant score turnaround. We can get accurate, reliable results within minutes. And that’s not just for multiple-choice answers but complex responses, too.

    The benefit for teachers and institutions is that they can have hundreds, thousands, or tens of thousands of learners taking a test simultaneously and instantly receive a score.

    The sooner you have scores, the sooner you can make decisions about placement and students’ language level or benchmark a learner’s strengths and weaknesses and make adjustments to learning that drive improvement and progress.

    Flexibility

    The next biggest benefit of digital assessment is flexible delivery models. This has become increasingly important as online learning has become more prominent.

    Accessibility became key: how can your institution provide access to assessment for your learners, if you can’t deliver tests on school premises?

    The answer is digital assessment.

    For example, Versant, our web-based test, can be delivered online or offline, on-site or off-site. All test takers need is a computer and a headset with a microphone. They can take the test anywhere, at any time of day, any day of the week, making it easy to fit into someone’s schedule or situation.

    Free from bias

    Impartiality is another important benefit of AI-based scoring. The AI engine used to score digital proficiency tests is completely free from bias. It doesn’t get tired, and it doesn’t have good and bad days like human markers do. And it doesn’t have a personality.

    While some human markers are more generous and others are more strict, AI is always equally fair. Thanks to this, automated scoring provides consistent, standardized scores, no matter who’s taking the test.

    If you’re testing students from around the world, with different backgrounds, they will be scored solely on their level of English, in a perfectly objective way.

    Additional benefits of automated scoring are security and cost.

    Security

    Digital assessments are more difficult to monitor than in-person tests, so security is a valid concern. One way to deal with this is remote proctoring.

    Remote proctoring adds an extra layer of security, so test administrators can be confident that learners taking the test from home don’t cheat.

    For example, our software captures a video of test takers, and the AI detection system automatically flags suspicious test-taker behavior. Test administrators can access the video anytime for audits and reviews, and easily find suspicious segments highlighted by our AI.

    Here are a few examples of suspicious behavior that our system might flag:

    Image monitoring:

    • A different face or multiple faces appearing in the frame
    • Camera blocked

    Browser monitoring:

    • Navigating away from the test window or changing tabs multiple times

    Video monitoring:

    • Test taker moving out of camera view
    • More than one person in the camera view
    • Looking away from the camera multiple times
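    To illustrate how flags like these can be raised automatically, here is a minimal rule-based sketch in Python. It assumes that upstream computer-vision models have already produced, for each video snapshot, a face count, a blocked-camera signal and a looking-away signal; the field names, thresholds and messages are hypothetical. Spotting a *different* face would also need identity matching, which this sketch does not attempt.

```python
# Minimal rule-based sketch of proctoring flags. Snapshot fields, thresholds
# and messages are hypothetical; face counts and gaze direction are assumed
# to come from upstream computer-vision models.
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass
class Snapshot:
    second: int            # time offset in the recording
    faces_in_frame: int
    camera_blocked: bool
    looking_away: bool

def flag_session(snapshots: List[Snapshot], tab_switches: int,
                 max_tab_switches: int = 2,
                 max_look_aways: int = 3) -> List[Tuple[Optional[int], str]]:
    """Return (second, reason) pairs; second is None for session-level flags."""
    flags = []
    look_aways = 0
    for snap in snapshots:
        if snap.camera_blocked:
            flags.append((snap.second, "camera blocked"))
        elif snap.faces_in_frame == 0:
            flags.append((snap.second, "test taker out of camera view"))
        elif snap.faces_in_frame > 1:
            flags.append((snap.second, "more than one person in view"))
        if snap.looking_away:
            look_aways += 1
    if look_aways > max_look_aways:
        flags.append((None, "looked away from the camera repeatedly"))
    if tab_switches > max_tab_switches:
        flags.append((None, "left the test window repeatedly"))
    return flags
```

    The time offsets returned with each flag are what would let an administrator jump straight to the suspicious segments of the recording.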

    Cost

    Last but not least, cost is another benefit of automated English certification. Automated scoring can be a more cost-effective way of marking tests, primarily because it saves time and resources.

    app English proficiency assessments are highly scalable and don’t require extra time from human scorers, no matter how many test-takers you have.

    Plus, there’s no need to spend time and money on training markers or purchasing equipment.

    AI is helping to lead the way with efficient, accessible, fair and cost-effective English test marking and management. Given time, it should develop further and become even more helpful in English language learning and assessment.

  • 6 tools for busy HR professionals

    By Jennifer Manning

    More and , giving candidates the opportunity to apply for jobs from anywhere in the country and across the world. In turn, this wider net has enabled HR professionals to bring in giant pools of qualified candidates – and of course, more great hires.

    But with more job applications coming in, HR professionals know they need to work faster and more efficiently. And the right HR tools can help teams save time and standardize hiring across the board – especially when assessing candidates’ English skills or personality traits from afar.

    Need help choosing the best HR software? We’ve got you covered. Here are 6 tools for busy HR professionals – including a number of HR tests for measuring sought-after soft skills:

    1. Versant by app

    How it helps you: Test candidates’ English language abilities with AI

    Need a fair way to test candidates’ English skills? Versant by app is an HR test that uses artificial intelligence (AI) to score language assessments instantly. Made by app, the world’s leading education company, the tool tests candidates’ speaking, listening, reading and writing skills to help HR professionals evaluate how easily someone can handle different workplace tasks – like speaking with customers over the phone or writing clear emails to co-workers.

    Versant by app also provides an Intelligibility Index score, which objectively measures how well someone pronounces words or expresses their thoughts – both things that are important for effective workplace communication, but easily overlooked.

    The test is available 24/7, with no appointment required, in more than 100 countries around the world.

    Learn more about how Versant by app works

    2. Watson-Glaser Critical Thinking Appraisal

    How it helps you: Measure important critical thinking skills

    The Watson-Glaser test is a popular critical thinking assessment. In fact, it’s been around for decades, helping organizations and institutions measure the decision-making and rational thinking skills of employees, job applicants, and students alike.

    The Watson-Glaser Critical Thinking Appraisal tool makes it easy to administer the test on a larger scale. The assessment is timed (it takes 30 minutes) and draws on a large bank of questions to help make sure no two people end up taking the same test. Scores are given as a percentile and are based on three criteria: whether someone can recognize assumptions, evaluate arguments and draw conclusions.
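    A percentile tells you where a score sits relative to a comparison (norm) group rather than giving an absolute mark. The Python sketch below shows one common way to compute a percentile rank, counting tied scores as half; the norm-group scores are made up, and this is not necessarily the exact method the Watson-Glaser uses.

```python
# Minimal sketch of a percentile rank against a (hypothetical) norm group:
# the percentage of people scoring below a given score, counting ties as half.
from bisect import bisect_left, bisect_right

def percentile_rank(score, norm_scores):
    ordered = sorted(norm_scores)
    below = bisect_left(ordered, score)            # strictly lower scores
    equal = bisect_right(ordered, score) - below   # tied scores
    return 100.0 * (below + 0.5 * equal) / len(ordered)

norm_group = [10, 12, 15, 15, 18, 20, 22, 25, 28, 30]
```

    With this norm group, a raw score of 15 sits at the 30th percentile: two people scored lower and two tied.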

    Overall, it’s a great tool to use with current employees wanting to move up in the organization. But best of all? It can help HR professionals screen out candidates whose critical thinking skills aren’t up to par – and save time interviewing people who might be qualified on paper, but not necessarily in practice.

    3. Golden Personality Profiler

    How it helps you: Assess a candidate’s personality type and how it will affect their behavior at work.

    The Golden Personality Profiler is one of the most in-depth personality assessments on the market. It allows HR professionals to understand what makes an individual unique. In turn, this leads to greater self-acceptance among employees and the ability to value differences in others, both key factors in team performance.

    So, how does it work? Powered by Jung’s Theory of Type as well as the Five-Factor Model of personality, Golden identifies the most detailed aspects of an individual’s personality. The program presents findings in a clear and concise report to make it easy to understand.

    Of course, this is all good information to have in mind. But how can personality tests be helpful for HR? Not only does this test help predict how well candidates will perform at work, but it also helps to quickly identify a team’s strengths and resources and its potential weaknesses and blind spots. Furthermore, this tool can help HR professionals hire people who will match, or help shape, the company culture.

    4. Acsendo

    How it helps you: Run assessments and improve employee performance

    For many workplaces, it can be difficult to keep morale up. Many people have reported feeling overwhelmed, isolated and unproductive working from home. Acsendo, on the other hand, can help HR professionals boost employee engagement and measure how everyone’s performing.

    Within the tool, HR teams can run company assessments to measure employee satisfaction and how they view their work environment, among other things.

    It also enables HR to see whether workers’ objectives align with company-wide goals, for example, and helps teams create development plans for employees. What’s more, Acsendo advertises that its platform takes only a few days for teams to implement.

    5. Odoo

    How it helps you: Manage employees and recruit from one place

    Odoo is a popular HR platform; the company says it has more than 5 million users worldwide. The tool lets users keep track of things like employee leave, hours worked, expenses and evaluations all in one place, as well as recruit and manage new job applications.

    We also like that they’re open source and that more than 20,000 developers contribute to it globally.

    6. Raven’s

    How it helps you: Assess the skills needed for leadership positions and reduce bias

    Raven’s is another HR test to assess an employee’s soft skills, but it pays special attention to the skills needed for leadership or management positions. These include abstract reasoning, complex problem-solving, and observation skills, among others.

    HR professionals get a report with the results. It shows how the candidate compares to others in the same role. The test isn’t influenced by language differences, and overall, it gives HR professionals a better understanding of who’s actually best for the job.

  • 6 ways to get the best results on your Versant English test

    By Jennifer Manning

    Versant tests are popular automatically scored English assessments. They allow test takers to prove their English proficiency and demonstrate that they’re capable of using English at work.

    If you’re applying for a job or trying to get into a school language program, you may be preparing to take a Versant test right now! But how do you make sure you succeed at it?

    Here’s everything you need to know about preparing for your Versant test.

    What types of Versant tests are there?

    There are five different types of English tests in the Versant suite. Each is designed with the purpose of testing English language proficiency. However, they’re slightly different in structure and the skills they test. As a result, they are used by companies or educational institutions with different goals.

    Here are the five types of Versant tests:

    • Versant English Test: a short, 17-minute test that focuses on speaking skills. Companies that primarily use spoken English use this test to assess candidates’ ability to communicate in English. For example, it’s popular with call centers.
    • Versant Writing Test: a 35-minute writing test. It’s the ideal test for companies that use English primarily in writing. It evaluates writing skills through practical exercises like taking notes and writing emails.
    • Versant English Placement Test: a thorough, 50-minute test that evaluates all four skills (speaking, listening, reading, and writing). Academic institutions use this test to place students in the right language programs.
    • Versant 4 Skills Essential: a shorter, 30-minute test that evaluates all four language skills. Companies often use it to find candidates with well-rounded English skills because it helps them fill entry-level positions quickly.
    • Versant Professional English Test: a comprehensive 60-minute test that evaluates all four skills. Companies use this test to baseline skills, measure progress and prove employees’ proficiency, oftentimes at the end of a business English training course.

    Which Versant test should you take?

    Which Versant test you take will depend on what your goals are. Have a look at these examples:

    • Arnaldo wants to study abroad for a year in Australia. He will most likely take the Versant English Placement Test to get into the university program of his choice.
    • Arjun is applying for a job at a call center. His future employers will request that he take the Versant English Test to demonstrate how he communicates in English.
    • Sofia’s aiming to become an email customer support specialist at an international retail firm. She’ll be asked to take the Versant Writing Test to prove her writing skills.
    • Farrah is applying for an internship at a fast-scaling startup. So, she’ll need to take the Versant 4 Skills Essential Test.
    • Last but not least, Samira is currently a mid-level manager at an insurance company and is enrolled in a course to improve her communication skills. She’ll be asked to prove her English proficiency by taking the Versant Professional English Test.

    Tips for preparing for your Versant test

    No matter which Versant test you’re taking, there are things you can do to prepare. Here are 6 ways to make sure you get the best results:

    1. Work on your intelligibility

    Intelligibility refers to your ability to speak in a way that’s easy for others to understand. Even if you don’t speak flawlessly or have a native-like accent, your speech can still be highly intelligible, as long as you can express what you mean clearly.

    The Versant English Test has an intelligibility score. The system calculates it based on various speech factors like speed, clarity, pronunciation, and fluency. So, it’s important that you work on your intelligibility before tackling a Versant test.

    Here are two exercises you can do to improve your intelligibility:

    • Record your speech. Recording yourself talking for a minute or so lets you play it back, analyze your speech and identify parts of it that are hard to understand. Maybe you’re mispronouncing some words, talking too fast, or pausing too often. Try to practice talking about the same topic until your speech becomes easier to understand.
    • Practice shadowing. Shadowing is a technique that brings together listening and speaking. Find a video of a proficient public speaker giving a speech on YouTube. Try to say the same words as the speaker at about the same time. Do this for about 30 seconds at a time. This will help you mimic the speaker’s speech, improving your intonation, pronunciation, and fluency.

    If you can, enlist the help of an English teacher to help you work on your weaknesses, or find a friend who is a fluent English speaker and set up regular video chats.

    2. Practice typing on your computer

    Unless you’re taking the Versant English Test, which is a speaking-only test, you’ll be asked to prove your English writing skills. Since Versant tests are most often taken off-site, it’s likely that you’ll be taking the test on your own computer at home. That’s why it’s a good idea to practice typing on your computer before your Versant test.

    While Versant will not factor your typing into your English proficiency scores, the Versant Writing Test and Versant English Placement Test do include a separate typing speed and accuracy score. They’re provided as supplemental information for 3 reasons:

    1. Since typing is a familiar task to most candidates, it is a comfortable introduction to the test.
    2. It allows candidates to familiarize themselves with the keyboard.
    3. If typing speed is below 12 words per minute and/or accuracy is below 90%, the candidate’s written English proficiency was likely not properly measured because of poor typing skills. The test administrator should take this into account when interpreting test scores.
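    To see how the two thresholds in point 3 combine, here is a minimal Python sketch. Counting words with `split()` and measuring accuracy as word-by-word matches against the target text are simplifying assumptions, not Versant’s published scoring method; only the 12 words-per-minute and 90% cut-offs come from the guidance above.

```python
# Minimal sketch of the typing check. Word counting and position-by-position
# word matching are simplifying assumptions; the 12 wpm and 90% thresholds
# come from the test guidance.
def typing_check(typed, target, seconds):
    """Return (words_per_minute, accuracy, flag)."""
    typed_words = typed.split()
    target_words = target.split()
    wpm = len(typed_words) / (seconds / 60)
    matches = sum(1 for t, r in zip(typed_words, target_words) if t == r)
    accuracy = matches / len(target_words) if target_words else 0.0
    # Flag when either threshold is missed
    flag = wpm < 12 or accuracy < 0.90
    return wpm, accuracy, flag

target = "the quick brown fox jumps over the lazy dog"
```

    Typing the nine-word sentence perfectly in 30 seconds gives 18 words per minute and 100% accuracy, so nothing is flagged; taking a full minute (9 words per minute) or making one typo (about 89% accuracy) would each be enough to trigger the flag.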

    Bear in mind that all the exercises you need to complete are timed. So, if you want to make sure that you have enough time to type your answers correctly, it’s good to get a little practice. This way, you’ll be able to focus wholeheartedly on the content and structure of your sentences, not your typing.

    To give you an example, the Versant English Placement Test has a dictation task, where you have to type sentences exactly as you hear them. It also has a passage reconstruction task, where you read a text, put it aside, and type what you remember from it.

    Then, there’s a summary and opinion task where you have to read a passage, summarize the author’s opinion, and give your own. These are all practical exercises that evaluate how well you’d perform in real-life situations at work. For example, taking notes at a meeting, writing emails, or putting together a presentation.

    3. Listen to everyday spoken English

    Another defining characteristic of Versant is that it tests how well you can understand and use English in an everyday context. It does not test the technical or literary use of the language. So, to prepare for Versant, it’s a good idea to immerse yourself in everyday spoken English.

    For example, you can watch videos of someone on YouTube talking directly to their audience in a casual way. Or, you can listen to a podcast that features a laid-back conversation between two people. And, if you can, don’t just listen but also practice talking about everyday topics. Ask a friend or a family member to chat with you in English about simple things like how your day was or what you had for dinner.

    Tips for taking your Versant test

    Preparation is key. But it’s also important to make sure that you take the test the correct way. Since Versant is a flexible test that can be completed online or offline and administered remotely, there are a few tricks to making sure you get the best out of it:

    1. Choose your testing environment well

    Take the test in a quiet room, with no background noise or people talking around you. Make sure that the space doesn’t have an echo. And, turn off your notifications so you won’t be disturbed by incoming phone calls or messages.

    2. Make good-quality recordings

    The best way to take the speaking tasks is with a headset that has a built-in microphone. Keep the microphone 3-5 cm from your mouth. Try not to touch or move it while answering questions.

    3. Speak in a natural way

    Try to speak at a normal conversational speed and volume, just as you would if you were talking to another person. Don’t raise your voice or speak too softly. Try not to speak too slowly or rush your answers. And avoid repeating your answers.

    Want to learn more about Versant? Check out our post “Everything you need to know about the Versant tests”.