Can computers really mark exams? Benefits of ELT automated assessments

Automated assessment, including the use of Artificial Intelligence (AI), is one of the latest education tech solutions. It speeds up exam marking times, removes human biases, and is as accurate and at least as reliable as human examiners. As innovations go, this one is a real game-changer for teachers and students. 

However, it has understandably been met with many questions and sometimes skepticism in the ELT community – can computers really mark speaking and writing exams accurately? 

The answer is a resounding yes. Students from all parts of the world already take AI-graded tests. Versant tests, for example, provide unbiased, fair and fast automated scoring for speaking and writing exams, irrespective of where the test takers live, or what their accent or gender is.

This article will explain the main processes involved in AI automated scoring and make the point that AI technologies are built on the foundations of consistent expert human judgments. So, let’s clear up the confusion around automated scoring and AI and look into how it can help teachers and students alike. 

AI versus traditional automated scoring

First of all, let’s distinguish between traditional automated scoring and AI. When we talk about automated scoring, generally, we mean scoring items that are either multiple-choice or cloze items. You may have to reorder sentences, choose from a drop-down list or insert a missing word, that sort of thing. These question types are designed to test particular skills, and automated scoring ensures that they can be marked quickly and accurately every time.
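As a rough illustration, this kind of scoring amounts to comparing each response with a fixed answer key. The sketch below is a minimal Python example, not the scoring code used in any real test; the item IDs, the answer key and the `score_objective_items` function are all hypothetical.

```python
# Hypothetical answer key for three objective items (multiple-choice,
# cloze and sentence reordering). All IDs and answers are made up.
ANSWER_KEY = {
    "mc_1": "b",
    "cloze_1": "went",
    "reorder_1": ("c", "a", "b"),
}


def normalize(answer):
    """Lower-case and trim text answers; leave other types unchanged."""
    if isinstance(answer, str):
        return answer.strip().lower()
    return answer


def score_objective_items(responses, key=ANSWER_KEY):
    """Return (number correct, per-item results) for one test taker."""
    results = {}
    for item_id, expected in key.items():
        given = responses.get(item_id)  # a missing answer counts as wrong
        results[item_id] = normalize(given) == normalize(expected)
    return sum(results.values()), results


responses = {"mc_1": "B", "cloze_1": " went ", "reorder_1": ("a", "c", "b")}
total, per_item = score_objective_items(responses)
print(total, per_item)
```

Because every item has one known correct answer, the same response always receives the same mark, which is what makes this kind of scoring fast and repeatable.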

While automatically scored items like these can be used to assess receptive skills such as listening and reading comprehension, they cannot mark the productive skills of writing and speaking. Every student's response in writing and speaking items will be different, so how can computers mark them?

This is where AI comes in. 

We hear a lot about how AI is increasingly being used in areas where there is a need to deal with large amounts of unstructured data effectively and with a high degree of accuracy, as in medical diagnostics, for example. In language testing, AI uses specialized computer software to grade written and oral tests.

How AI is used to score speaking exams

The first step is to build an acoustic model for each language that can recognize speech and convert it into waveforms and text. While this technology used to be very unusual, most of our smartphones can do this now. 

These acoustic models are then trained to score every single prompt or item on a test. We do this by using human expert raters to score the items first, using double marking. They score hundreds of oral responses for each item, and these ‘Standards’ are then used to train the engine. 

Next, we validate the trained engine by feeding in many more human-marked items and checking that the machine scores are very highly correlated with the human scores. If this doesn’t happen for any item, we remove it, as it must match the standard set by human markers. We expect a correlation of between .95 and .99. That means the machine scores rise and fall almost in lockstep with the scores given to the human-marked samples.
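To make the validation step concrete, here is a minimal Python sketch of checking one item. It assumes the correlation in question is a Pearson correlation, which the article does not specify. The scores, the `pearson` and `validate_item` helpers and the 0.95 cut-off (taken from the lower end of the range above) are illustrative only.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def validate_item(human_scores, machine_scores, threshold=0.95):
    """Keep an item only if machine scores track human scores closely."""
    r = pearson(human_scores, machine_scores)
    return r >= threshold, r


# Made-up validation data for one item: averaged double-marked human
# scores and the engine's scores for the same eight responses.
human = [2.0, 3.5, 4.0, 1.5, 5.0, 3.0, 4.5, 2.5]
machine = [2.2, 3.4, 4.1, 1.4, 4.8, 3.1, 4.6, 2.4]

keep, r = validate_item(human, machine)
exact_matches = sum(h == m for h, m in zip(human, machine))
print(f"r = {r:.3f}, keep item: {keep}, identical scores: {exact_matches}")
```

In this made-up data the item clears the .95 cut-off even though no machine score is identical to its human score. A correlation is therefore best read as agreement in how responses are ranked and spaced, rather than as a percentage of identical marks.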

This is incredibly high compared to the reliability of human-marked speaking tests. In essence, we use a group of highly expert human raters to train the AI engine, and then their standard is replicated time after time.  

How AI is used to score writing exams

Our AI writing scoring uses a technology called Latent Semantic Analysis (LSA). LSA is a natural language processing technique that can analyze and score writing based on the meaning behind words, and not just their superficial characteristics.

Similarly to our speech recognition acoustic models, we first establish a language-specific text recognition model. We feed a large amount of text into the system, and LSA uses artificial intelligence to learn the patterns of how words relate to each other and are used in, for example, the English language. 

Once the language model has been established, we train the engine to score every written item on a test. As with speaking items, we do this by using human expert raters to score the items first, using double marking. They score many hundreds of written responses for each item, and these ‘Standards’ are then used to train the engine. We then validate the trained engine by feeding in many more human-marked items and checking that the machine scores are very highly correlated with the human scores.

The benchmark is always the expert human scores. If our AI system doesn’t closely match the scores given by human markers, we remove the item, as it is essential to match the standard set by human markers.

AI’s ability to mark multiple traits 

One of the challenges human markers face in scoring speaking and written items is assessing many traits on a single item. For example, when assessing and scoring speaking, they may need to give separate scores for content, fluency and pronunciation. 

In written responses, markers may need to score a piece of writing for vocabulary, style and grammar. Effectively, they may need to mark every single item at least three times, maybe more. However, once we have trained the AI systems on every trait score in speaking and writing, they can then mark items on any number of traits almost instantaneously, applying the same standard every time.

AI’s lack of bias

A fundamental premise for any test is that no advantage or disadvantage should be given to any candidate. In other words, there should be no positive or negative bias. This can be very difficult to achieve in human-marked speaking and written assessments. In fact, candidates often feel they may have received a different score if someone else had heard them or read their work.

Our AI systems eradicate the issue of bias. This is done by ensuring our speaking and writing AI systems are trained on an extensive range of human accents and writing types. 

We don’t want perfect native-speaking accents or writing styles to train our engines. We use representative non-native samples from across the world. When we initially set up our AI systems for speaking and writing scoring, we trialed our items and trained our engines using millions of student responses. We continue to do this now as new items are developed.

The benefits of AI automated assessment

There is nothing wrong with hand-marking homework, tests and exams. In fact, it is essential for teachers to get to know their students and provide personal feedback and advice. However, manually correcting hundreds of tests, daily or weekly, can be repetitive, time-consuming and not always reliable, and it takes time away from working alongside students in the classroom. The use of AI in formative and summative assessments can increase assessed practice time for students and reduce the marking load for teachers.

Language learning takes time, and lots of it, to progress to high levels of proficiency. The blended use of AI can:

  • address the increasing importance of formative assessment to drive personalized learning and diagnostic assessment feedback

  • allow students to practice and get instant feedback inside and outside of allocated teaching time

  • address the issue of teacher workload

  • create a virtuous combination between humans and machines, taking advantage of what humans do best and what machines do best

  • provide fair, fast and unbiased summative assessment scores in high-stakes testing.

We hope this article has answered a few burning questions about how AI is used to assess speaking and writing in our language tests. An interesting quote from Fei-Fei Li, Chief Scientist at Google and Stanford professor, describes AI like this:

“I often tell my students not to be misled by the name ‘artificial intelligence’ — there is nothing artificial about it; A.I. is made by humans, intended to behave [like] humans and, ultimately, to impact human lives and human society.”

AI in formative and summative assessments will never replace the role of teachers. AI will support teachers, provide endless opportunities for students to improve, and provide a solution to slow, unreliable and often unfair high-stakes assessments.

Examples of AI assessments in ELT

At app, we have developed a range of assessments using AI technology.

Versant

The Versant tests are a great tool to help establish language proficiency benchmarks in any school, organization or business. They are specifically designed for use as placement tests to determine the appropriate level for the learner.

PTE Academic

PTE Academic is aimed at those who need to prove their level of English for a university place, a job or a visa. It uses AI to score tests, and results are available within five days.

app English International Certificate (PEIC)

app English International Certificate (PEIC) also uses automated assessment technology, with a two-hour test available on demand to take at home, at school or at a secure test center. Using a combination of advanced speech recognition and exam grading technology and the expertise of professional ELT exam markers worldwide, our patented software can measure English language ability.

Read more about the use of AI in our learning and testing here, or if you're wondering which English test is right for your students, make sure to check out our post 'Which exam is right for my students?'.
