Can computers really mark exams? Benefits of ELT automated assessments

app Languages

Automated assessment, including the use of Artificial Intelligence (AI), is one of the latest education tech solutions. It speeds up exam marking times, removes human biases, and is as accurate and at least as reliable as human examiners. As innovations go, this one is a real game-changer for teachers and students. 

However, it has understandably been met with many questions and sometimes skepticism in the ELT community – can computers really mark speaking and writing exams accurately? 

The answer is a resounding yes. Students from all parts of the world already take AI-graded tests. Versant tests, for example, provide unbiased, fair and fast automated scoring for speaking and writing exams, irrespective of where the test takers live, or what their accent or gender is.

This article will explain the main processes involved in AI automated scoring and make the point that AI technologies are built on the foundations of consistent expert human judgments. So, let’s clear up the confusion around automated scoring and AI and look into how it can help teachers and students alike. 

AI versus traditional automated scoring

First of all, let’s distinguish between traditional automated scoring and AI. When we talk about automated scoring, we generally mean scoring items that are either multiple-choice or cloze items. You may have to reorder sentences, choose from a drop-down list or insert a missing word, that sort of thing. These question types are designed to test particular skills, and automated scoring ensures that they can be marked quickly and accurately every time.
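Scoring of this kind amounts to comparing each response with a fixed answer key. A minimal sketch in Python, where the item ids, the answers and the `score_objective_items` function are all made up for illustration:

```python
def normalise(text):
    """Ignore case and surrounding whitespace when comparing answers."""
    return text.strip().lower()


def score_objective_items(answer_key, responses):
    """Mark each response as right or wrong by comparing it with the key.

    Returns the total number of correct answers and a per-item breakdown.
    Unanswered items count as incorrect.
    """
    marks = {
        item: normalise(responses.get(item, "")) == normalise(correct)
        for item, correct in answer_key.items()
    }
    return sum(marks.values()), marks


# Hypothetical key: a multiple-choice item, a cloze item and a reordering item.
answer_key = {"q1": "B", "q2": "went", "q3": "3-1-2"}
responses = {"q1": "b", "q2": "goed", "q3": "3-1-2"}

total, marks = score_objective_items(answer_key, responses)
print(total, marks)
```

Because every item has one known correct answer, the same response always receives the same mark.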

While automatically scored items like these can be used to assess receptive skills such as listening and reading comprehension, they cannot mark the productive skills of writing and speaking. Every student's response in writing and speaking items will be different, so how can computers mark them?

This is where AI comes in. 

We hear a lot about how AI is increasingly being used in areas where there is a need to deal with large amounts of unstructured data efficiently and with a high degree of accuracy, as in medical diagnostics, for example. In language testing, AI uses specialized computer software to grade written and oral tests.

How AI is used to score speaking exams

The first step is to build an acoustic model for each language that can recognize speech and convert it into waveforms and text. While this technology used to be very unusual, most of our smartphones can do this now. 

These acoustic models are then trained to score every single prompt or item on a test. We do this by using human expert raters to score the items first, using double marking. They score hundreds of oral responses for each item, and these ‘Standards’ are then used to train the engine. 

Next, we validate the trained engine by feeding in many more human-marked items and checking that the machine scores are very highly correlated with the human scores. If this doesn’t happen for any item, we remove it, as it must match the standard set by human markers. We expect a correlation of between 0.95 and 0.99. That means the machine scores rise and fall almost exactly in line with the scores given to the human-marked samples.
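The validation step described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the production scoring engine: the item names, the scores and the `validate_items` helper are hypothetical, and the 0.95 threshold is simply the lower end of the range quoted above.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def validate_items(scores_by_item, threshold=0.95):
    """Split items into those kept and those removed.

    scores_by_item maps an item id to (human_scores, machine_scores).
    An item is kept only if its machine and human scores correlate at or
    above the threshold.
    """
    kept, removed = [], []
    for item_id, (human, machine) in scores_by_item.items():
        if pearson(human, machine) >= threshold:
            kept.append(item_id)
        else:
            removed.append(item_id)
    return kept, removed


# Hypothetical validation data: expert human scores vs. engine scores.
validation_set = {
    "describe_image": ([1, 2, 3, 4, 5, 3, 2, 4],
                       [1.1, 2.0, 2.9, 4.2, 4.8, 3.1, 2.2, 3.9]),
    "read_aloud": ([1, 2, 3, 4, 5, 3, 2, 4],
                   [2, 2, 4, 3, 3, 4, 1, 5]),
}

kept, removed = validate_items(validation_set)
print("kept:", kept)
print("removed:", removed)
```

In this made-up data the first item tracks the human scores closely and is kept, while the second falls well below the threshold and is removed.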

This is incredibly high compared to the reliability of human-marked speaking tests. In essence, we use a group of highly expert human raters to train the AI engine, and then their standard is replicated time after time.  

How AI is used to score writing exams

Our AI writing scoring uses a technology called Latent Semantic Analysis (LSA). LSA is a natural language processing technique that can analyze and score writing based on the meaning behind words, and not just their superficial characteristics.

Similarly to our speech recognition acoustic models, we first establish a language-specific text recognition model. We feed a large amount of text into the system, and LSA uses artificial intelligence to learn the patterns of how words relate to each other and are used in, for example, the English language. 
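To give a feel for how LSA picks up relationships between words, here is a toy sketch in Python. It assumes the NumPy library is available, and the five tiny "documents" are invented for illustration; a real language model would be built from a very large body of text, and this is not the code behind our scoring engine.

```python
import numpy as np

# Toy "documents" (hypothetical). The first two share no words, but the
# third uses all of their words together, linking them by co-occurrence.
docs = [
    "holiday trip",
    "vacation journey",
    "holiday vacation trip journey",
    "hockey ice skate",
    "hockey ice puck",
]

# Build a term-document count matrix: one row per word, one column per document.
vocab = sorted({word for doc in docs for word in doc.split()})
row = {word: i for i, word in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        counts[row[word], j] += 1

# Truncated singular value decomposition: keep only the k strongest dimensions.
k = 2
U, S, Vt = np.linalg.svd(counts, full_matrices=False)
doc_vectors = (S[:k, None] * Vt[:k, :]).T  # one k-dimensional row per document


def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


raw_similarity = cosine(counts[:, 0], counts[:, 1])      # word overlap only
lsa_similarity = cosine(doc_vectors[0], doc_vectors[1])  # both about travel
cross_topic = cosine(doc_vectors[0], doc_vectors[3])     # travel vs. hockey

print(f"raw word overlap, doc 0 vs doc 1: {raw_similarity:.2f}")
print(f"LSA similarity,   doc 0 vs doc 1: {lsa_similarity:.2f}")
print(f"LSA similarity,   doc 0 vs doc 3: {cross_topic:.2f}")
```

The first two documents have no words in common, so a plain word-count comparison sees them as unrelated. In the reduced space they land close together, because the third document uses their words side by side, while the hockey documents stay far away.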

Once the language model has been established, we train the engine to score every written item on a test. As with speaking items, we do this by using human expert raters to score the items first, using double marking. They score many hundreds of written responses for each item, and these ‘Standards’ are then used to train the engine. We then validate the trained engine by feeding in many more human-marked items and checking that the machine scores are very highly correlated with the human scores.

The benchmark is always the expert human scores. If our AI system doesn’t closely match the scores given by human markers, we remove the item, as it is essential to match the standard set by human markers.

AI’s ability to mark multiple traits 

One of the challenges human markers face in scoring speaking and written items is assessing many traits on a single item. For example, when assessing and scoring speaking, they may need to give separate scores for content, fluency and pronunciation. 

In written responses, markers may need to score a piece of writing for vocabulary, style and grammar. Effectively, they may need to mark every single item at least three times, maybe more. However, once we have trained the AI systems on every trait score in speaking and writing, they can then mark items on any number of traits instantaneously – and without error. 

AI’s lack of bias

A fundamental premise for any test is that no advantage or disadvantage should be given to any candidate. In other words, there should be no positive or negative bias. This can be very difficult to achieve in human-marked speaking and written assessments. In fact, candidates often feel they may have received a different score if someone else had heard them or read their work.

Our AI systems eradicate the issue of bias. This is done by ensuring our speaking and writing AI systems are trained on an extensive range of human accents and writing types. 

We don’t want perfect native-speaking accents or writing styles to train our engines. We use representative non-native samples from across the world. When we initially set up our AI systems for speaking and writing scoring, we trialed our items and trained our engines using millions of student responses. We continue to do this now as new items are developed.

The benefits of AI automated assessment

There is nothing wrong with hand-marking homework tests and exams. In fact, it is essential for teachers to get to know their students and provide personal feedback and advice. However, manually correcting hundreds of tests, daily or weekly, can be repetitive, time-consuming, not always reliable and takes time away from working alongside students in the classroom. The use of AI in formative and summative assessments can increase assessed practice time for students and reduce the marking load for teachers.

Language learning takes time, and lots of it, to progress to high levels of proficiency. The blended use of AI can:

  • address the increasing importance of formative assessment to drive personalized learning and diagnostic assessment feedback

  • allow students to practice and get instant feedback inside and outside of allocated teaching time

  • address the issue of teacher workload

  • create a virtuous combination between humans and machines, taking advantage of what humans do best and what machines do best. 

  • provide fair, fast and unbiased summative assessment scores in high-stakes testing.

We hope this article has answered a few burning questions about how AI is used to assess speaking and writing in our language tests. An interesting quote from Fei-Fei Li, Chief Scientist at Google and Stanford professor, describes AI like this:

“I often tell my students not to be misled by the name ‘artificial intelligence’ — there is nothing artificial about it; A.I. is made by humans, intended to behave [like] humans and, ultimately, to impact human lives and human society.”

AI in formative and summative assessments will never replace the role of teachers. AI will support teachers, provide endless opportunities for students to improve, and provide a solution to slow, unreliable and often unfair high-stakes assessments.

Examples of AI assessments in ELT

At app, we have developed a range of assessments using AI technology.

Versant

The Versant tests are a great tool to help establish language proficiency benchmarks in any school, organization or business. They are specifically designed as placement tests to determine the appropriate level for the learner.

PTE Academic

PTE Academic is aimed at those who need to prove their level of English for a university place, a job or a visa. It uses AI to score tests, and results are available within five days.

app English International Certificate (PEIC)

app English International Certificate (PEIC) also uses automated assessment technology. The two-hour test is available on demand and can be taken at home, at school or at a secure test center. Using a combination of advanced speech recognition and exam grading technology and the expertise of professional ELT exam markers worldwide, our patented software can measure English language ability.

Read more about the use of AI in our learning and testing here, or if you're wondering which English test is right for your students make sure to check out our post 'Which exam is right for my students?'.

More blogs from app

  • Exploring common English homophones

    Reading time: 4 minutes

    Navigating the tricky world of homophones can be challenging, especially for English learners. This guide aims to clarify some of the most common homophones and their meanings, helping you use them correctly in your writing.

    What is a homophone?

    A homophone is a word that is pronounced the same as another word but differs in meaning and often in spelling. Homophones can create confusion in writing since they sound identical, yet their meanings and spellings can vary widely. For instance, "pair" refers to a set of two, while "pear" is a type of fruit. Understanding homophones is essential for mastering both written and spoken English, as misuse can lead to misunderstandings.

  • How to improve literacy in the classroom

    By Katharine Scott
    Reading time: 5 minutes

    Katharine Scott is a teacher trainer and educational materials developer with over 20 years’ experience writing English language textbooks. She’s co-author of the app Primary course - English Code and is based in Spain. Katharine outlines a number of practical ways you can help English language learners develop key literacy skills.

    What is literacy?

    Teachers at all stages of education often complain about their students’ reading skills. The students are literate. In other words, they can interpret the graphemes, or letters on the page, into words. But they struggle to identify the purpose of a text or to analyze it in a meaningful way. We could say that the students have poor literacy skills.

    Literacy is a term used to describe an active, critical form of reading. Some of the skills of a critical reader include:

    Checking new information

    A crucial literacy skill involves discerning whether a text is factually true or not. A critical reader always checks new information against existing knowledge. As we read, we have an internal dialogue: Where does that information come from? That’s impossible because ….

    Separating fact from opinion

    This skill is essential for understanding many different types of texts from newspaper articles to scientific research.

    Understanding the purpose of a text

    All pieces of text have a main purpose. This may be entertainment, in the case of a story, or persuasion, in the case of advertising. A critical reader will know how to identify the purpose of the text.

    In the classroom, different types of text require different responses from the students. It’s important, as students grow older, that they know how to read and respond appropriately to a piece of written information.

    Identifying key information in a text

    This is an essential skill for summarizing information or following instructions. It is also important when we transform written information into something else, like a chart.

    In many ways, literacy is the key skill that underpins learning at all stages. This may seem like an exaggeration, but consider the importance of the four skills outlined above.

    Strategies to promote literacy

    Many teachers and parents of early learners instinctively develop literacy skills before the children can even read.

    When we read a story out loud to a child, we often ask questions about the narrative as we turn the pages: What is going to happen next? How do you think …. feels? Why is …?

    These questions set the foundations for literacy.

    Working with a reading text

    Too often, the comprehension questions that teachers ask about a text are mechanical. They ask the student to “lift” the information out of the text.

    A tale of two dragons

    "Once upon a time, there was an island in the sea. One day, people were working in the fields. The sun was shining and there was one cloud in the sky. The cloud was a strange shape and moving towards the island. Soon the cloud was very big. Then a small boy looked up."

    Taken from English Code, Unit 4, p. 62

    Typical comprehension questions based on the text would be:

    • Where were the people working?
    • How many clouds were in the sky?

    These questions do not really reflect on the meaning of the text and do not lead to a critical analysis. While these simple questions are a good checking mechanism, they don’t help develop literacy skills.

    If we want to develop critical readers, we need to incorporate a critical analysis of reading texts into class work through deep reading comprehension. We can organize the comprehension into three types.

    1. Text level

    Comprehension at “text level” is about exploring the meaning of individual words and phrases in a text. Examples for the text above could be:

    • Find words that show the story is a fairy tale.
    • Underline a sentence about the weather.

    Other text-level activities include:

    • Finding words in the text from a definition
    • Identifying opinions in the text
    • Finding verbs of speech
    • Finding and classifying words or phrases

    2. Between the lines

    Comprehension “between the lines” means speculating and making guesses with the information we already have from the text. This type of literacy activity often involves lots of questions and discussions with the students. You should encourage students to give good reasons for their opinions. An example for the text above could be:

    • What do you think the cloud really is?

    Other “Between the lines” activities include:

    • Discussing how characters in a story feel and why
    • Discussing characters’ motivation
    • Identifying the most important moments in a story
    • Speculating about what is going to happen next
    • Distinguishing possible events from fantasy events

    Literacy activities are not only based on fiction. We need to help students be critical readers of all sorts of texts. The text below is factual and informative:

    What skills do you need for ice hockey?

    "Ice hockey players should be very good skaters. They always have good balance. They change direction very quickly and they shouldn't fall over. Players should also have fast reactions because the puck moves very quickly."

    Taken from English Code, Level 4, p. 96

    “Between the lines” activities for this text could be:

    • What equipment do you need to play ice hockey?
    • What is the purpose of this piece of text?

    3. Behind the lines

    Comprehension “behind the lines” is about the information we, the readers, already have. Our previous knowledge, our age, our social background and many other aspects change the way we understand and interpret a text.

    An example for the text above could be:

    • What countries do you think are famous for ice hockey?

    Sometimes a lack of socio-cultural knowledge can lead to misunderstanding. Look at the text below.

    Is the relationship between Ms Turner and Jack Roberts formal or informal?

    73 Highlands Road
    Oxbo, Wisconsin 54552
    April 11th

    Dear Ms. Turner,
    Some people want to destroy the forest and build an airport. This forest is a habitat for many wolves. If they destroy the forest, the wolves will leave the forest. If the wolves leave the forest, there will be more rabbits. This won't be good for our forest.
    Please build the airport in a different place. Please don't destroy the forest.

    Kind regards, Jack Roberts

    Taken from English Code, Level 4, Unit 5, Writing Lab

    If your students are unaware of the convention of using Dear to start a letter in English, they may not answer this question correctly.

    Other “Behind the lines” literacy activities include:

    • Identifying the type of text
    • Imagining extra information based on the readers’ experiences
    • Using existing knowledge to check a factual account
    • Identifying false information

    Examples:

    • What job do you think Ms Turner has?
    • Do you think Jack lives in a village or a city?
    • Do wolves live in forests?

    Literacy is more than reading

    From the activities above, it’s clear that a literacy scheme develops more than reading skills. As students speculate and give their opinions, they talk and listen to each other.

    A literacy scheme can also develop writing skills. The text analysis gives students a model to follow in their writing. In addition, a literacy scheme works on higher-order thinking skills such as analysis, deduction and summary.

    Developing literacy skills so that students become active, critical readers should be a key part of educational programs at all ages. Literacy activities based on a reading text can be especially useful for the foreign language class.

    With literacy activities, we can encourage students:

    • To use the text as a springboard for communicating ideas and opinions
    • To analyze the text as a model for writing activities
    • To see how language is used in context
    • To explore the meanings of words

    More crucially, we are developing critical readers for the future.

  • Four ways to keep kindergarten ESL students focused all day

    By Heath Pulliam
    Reading time: 5 minutes

    Heath Pulliam is an independent education writer with a focus on the language learning space. He’s taught English in South Korea and various subjects in the United States to a variety of ages. He’s also a language learning enthusiast and studies Spanish in his free time.

    Those who have taught children anywhere between the ages of 4 and 8 know that one of the biggest challenges of getting through to them is keeping your presentation style interesting. As someone who taught ESL to kindergarteners in South Korea, I found that a few factors make keeping students engaged a challenge. In countries where students learn English, students often have a heavy course load and high expectations. As a first-year teacher, I learned a lot about what worked and what didn’t through trial and error. These are four methods that I consistently used to keep my students interested and engaged all day.

    Students are quick to lose focus at such a young age. You’re not speaking their mother tongue and some parts of an ESL curriculum are less than exciting. With young students, you can’t lecture your way through the material all day. Kindergarteners have a small window of focus and it must be capitalized on. The following methods are ones that worked for me and can be modified to cover any topic you’ll run into in an ESL curriculum.