Can computers really mark exams? Benefits of ELT automated assessments


Automated assessment, including the use of Artificial Intelligence (AI), is one of the latest education tech solutions. It speeds up exam marking times, removes human biases, and is as accurate and at least as reliable as human examiners. As innovations go, this one is a real game-changer for teachers and students. 

However, it has understandably been met with many questions and sometimes skepticism in the ELT community – can computers really mark speaking and writing exams accurately? 

The answer is a resounding yes. Students from all parts of the world already take AI-graded tests. Versant tests, for example, provide unbiased, fair and fast automated scoring for speaking and writing exams, irrespective of where the test takers live, or what their accent or gender is.

This article will explain the main processes involved in AI automated scoring and make the point that AI technologies are built on the foundations of consistent expert human judgments. So, let’s clear up the confusion around automated scoring and AI and look into how it can help teachers and students alike. 

AI versus traditional automated scoring

First of all, let’s distinguish between traditional automated scoring and AI. When we talk about automated scoring, generally, we mean scoring items that are either multiple-choice or cloze items. You may have to reorder sentences, choose from a drop-down list, insert a missing word, that sort of thing. These question types are designed to test particular skills, and automated scoring ensures that they can be marked quickly and accurately every time.

While automatically scored items like these can be used to assess receptive skills such as listening and reading comprehension, they cannot mark the productive skills of writing and speaking. Every student's response in writing and speaking items will be different, so how can computers mark them?

This is where AI comes in. 

We hear a lot about how AI is increasingly being used in areas where there is a need to deal with large amounts of unstructured data effectively and with high accuracy, such as medical diagnostics. In language testing, AI uses specialized computer software to grade written and oral tests.

How AI is used to score speaking exams

The first step is to build an acoustic model for each language that can recognize speech and convert it into waveforms and text. While this technology used to be very unusual, most of our smartphones can do this now. 

These acoustic models are then trained to score every single prompt or item on a test. We do this by using human expert raters to score the items first, using double marking. They score hundreds of oral responses for each item, and these ‘Standards’ are then used to train the engine. 

Next, we validate the trained engine by feeding in many more human-marked items and checking that the machine scores are very highly correlated with the human scores. If this doesn’t happen for any item, we remove it, as it must match the standard set by human markers. We expect a correlation of between .95 and .99. That means the machine scores track the human-marked samples almost perfectly: responses that human raters score higher are, almost without exception, scored higher by the machine too.
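
The validation step described above can be sketched in a few lines of Python. This is a minimal illustration, not the scoring engine itself: the item names, the scores and the function names are all invented for the example, and the 0.95 cut-off is simply the lower end of the range quoted above.

```python
import math

def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / math.sqrt(var_x * var_y)

def items_to_keep(validation_set, threshold=0.95):
    """Return the IDs of items whose machine scores correlate with the
    human scores at or above the threshold (assumed cut-off)."""
    return [
        item_id
        for item_id, (human, machine) in validation_set.items()
        if pearson(human, machine) >= threshold
    ]

# Invented validation data: (human scores, machine scores) for each item.
VALIDATION_SET = {
    "speaking_item_1": ([1, 2, 3, 4, 5], [1.1, 2.0, 3.2, 3.9, 5.0]),
    "speaking_item_2": ([1, 2, 3, 4, 5], [3, 1, 4, 2, 5]),
}

kept = items_to_keep(VALIDATION_SET)
print(kept)
```

In this toy data the first item is kept and the second is removed, because its machine scores only loosely follow the human ones.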

This is incredibly high compared to the reliability of human-marked speaking tests. In essence, we use a group of highly expert human raters to train the AI engine, and then their standard is replicated time after time.  

How AI is used to score writing exams

Our AI writing scoring uses a technology called Latent Semantic Analysis (LSA). LSA is a natural language processing technique that can analyze and score writing, based on the meaning behind words – and not just their superficial characteristics.

Similarly to our speech recognition acoustic models, we first establish a language-specific text recognition model. We feed a large amount of text into the system, and LSA uses artificial intelligence to learn the patterns of how words relate to each other and are used in, for example, the English language. 
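
To give a feel for how LSA relates words by meaning rather than by surface form, here is a small, self-contained Python sketch. It is an assumption-laden toy, not the engine described in this article: the five-document corpus, the function names and the choice of two latent dimensions are invented for illustration. A real system would train on a very large body of text and use an optimized linear algebra library. The sketch builds a term-document matrix and reduces it with a truncated singular value decomposition, computed here by simple power iteration.

```python
import math

# Toy corpus (hypothetical). Documents A and B share no words, but C uses
# the vocabulary of both, which links them in the latent space.
DOCS = {
    "A": "ship sea",
    "B": "boat ocean",
    "C": "ship sea boat ocean",
    "D": "tree wood",
    "E": "tree wood leaf",
}

def term_document_matrix(docs):
    """Rows are terms, columns are documents, cells are raw word counts."""
    terms = sorted({word for text in docs.values() for word in text.split()})
    return [[text.split().count(term) for text in docs.values()] for term in terms]

def cosine(u, v):
    """Cosine similarity between two vectors."""
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return sum(x * y for x, y in zip(u, v)) / (norm_u * norm_v)

def top_eigenpairs(matrix, k, iterations=300):
    """Largest k eigenpairs of a symmetric matrix (power iteration with deflation)."""
    n = len(matrix)
    work = [row[:] for row in matrix]
    pairs = []
    for _ in range(k):
        vec = [1.0 + 0.1 * i for i in range(n)]  # fixed start keeps the run deterministic
        for _ in range(iterations):
            nxt = [sum(work[r][c] * vec[c] for c in range(n)) for r in range(n)]
            norm = math.sqrt(sum(x * x for x in nxt))
            vec = [x / norm for x in nxt]
        value = sum(vec[r] * work[r][c] * vec[c] for r in range(n) for c in range(n))
        pairs.append((value, vec))
        # Deflate: remove the component just found so the next pass finds the next one.
        for r in range(n):
            for c in range(n):
                work[r][c] -= value * vec[r] * vec[c]
    return pairs

def lsa_document_vectors(docs, k=2):
    """Project each document into a k-dimensional latent space (truncated SVD)."""
    x = term_document_matrix(docs)
    n_docs = len(docs)
    # Gram matrix X^T X: its eigenvectors are the right singular vectors of X.
    gram = [[sum(row[i] * row[j] for row in x) for j in range(n_docs)]
            for i in range(n_docs)]
    pairs = top_eigenpairs(gram, k)
    # Document j's coordinate on axis i is sigma_i * v_i[j], sigma_i = sqrt(eigenvalue).
    return {
        name: [math.sqrt(value) * vec[j] for value, vec in pairs]
        for j, name in enumerate(docs)
    }

vectors = lsa_document_vectors(DOCS, k=2)
print(round(cosine(vectors["A"], vectors["B"]), 3))  # A and B share no words
print(round(cosine(vectors["A"], vectors["D"]), 3))  # A and D are about different things
```

Compared word for word, documents A and B have nothing in common. In the latent space they end up pointing in almost the same direction, because document C shows that their vocabularies belong together, while A and D remain unrelated.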

Once the language model has been established, we train the engine to score every written item on a test. As in speaking items, we do this by using human expert raters to score the items first, using double marking. They score many hundreds of written responses for each item, and these ‘Standards’ are then used to train the engine. We then validate the trained engine by feeding in many more human-marked items, and check that the machine scores are very highly correlated to the human scores. 

The benchmark is always the expert human scores. If our AI system doesn’t closely match the scores given by human markers, we remove the item, as it is essential to match the standard set by human markers.

AI’s ability to mark multiple traits 

One of the challenges human markers face in scoring speaking and written items is assessing many traits on a single item. For example, when assessing and scoring speaking, they may need to give separate scores for content, fluency and pronunciation. 

In written responses, markers may need to score a piece of writing for vocabulary, style and grammar. Effectively, they may need to mark every single item at least three times, maybe more. However, once we have trained the AI systems on every trait score in speaking and writing, they can then mark items on any number of traits instantaneously – and without error. 

AI’s lack of bias

A fundamental premise for any test is that no advantage or disadvantage should be given to any candidate. In other words, there should be no positive or negative bias. This can be very difficult to achieve in human-marked speaking and written assessments. In fact, candidates often feel they may have received a different score if someone else had heard them or read their work.

Our AI systems eradicate the issue of bias. This is done by ensuring our speaking and writing AI systems are trained on an extensive range of human accents and writing types. 

We don’t want perfect native-speaking accents or writing styles to train our engines. We use representative non-native samples from across the world. When we initially set up our AI systems for speaking and writing scoring, we trialed our items and trained our engines using millions of student responses. We continue to do this now as new items are developed.

The benefits of AI automated assessment

There is nothing wrong with hand-marking homework tests and exams. In fact, it is essential for teachers to get to know their students and provide personal feedback and advice. However, manually correcting hundreds of tests, daily or weekly, can be repetitive, time-consuming, not always reliable and takes time away from working alongside students in the classroom. The use of AI in formative and summative assessments can increase assessed practice time for students and reduce the marking load for teachers.

Language learning takes time, lots of time, to progress to high levels of proficiency. The blended use of AI can:

  • address the increasing importance of formative assessment to drive personalized learning and diagnostic assessment feedback

  • allow students to practice and get instant feedback inside and outside of allocated teaching time

  • address the issue of teacher workload

  • create a virtuous combination between humans and machines, taking advantage of what humans do best and what machines do best. 

  • provide fair, fast and unbiased summative assessment scores in high-stakes testing.

We hope this article has answered a few burning questions about how AI is used to assess speaking and writing in our language tests. An interesting quote from Fei-Fei Li, Chief Scientist at Google and Stanford Professor, describes AI like this:

“I often tell my students not to be misled by the name ‘artificial intelligence’ — there is nothing artificial about it; A.I. is made by humans, intended to behave [like] humans and, ultimately, to impact human lives and human society.”

AI in formative and summative assessments will never replace the role of teachers. AI will support teachers, provide endless opportunities for students to improve, and provide a solution to slow, unreliable and often unfair high-stakes assessments.

Examples of AI assessments in ELT

At app, we have developed a range of assessments using AI technology.

Versant

The Versant tests are a great tool to help establish language proficiency benchmarks in any school, organization or business. They are specifically designed for placement tests to determine the appropriate level for the learner.

PTE Academic

PTE Academic is aimed at those who need to prove their level of English for a university place, a job or a visa. It uses AI to score tests and results are available within five days.

More blogs from app

  • How to improve literacy in the classroom

    By Katharine Scott
    Reading time: 5 minutes

    Katharine Scott is a teacher trainer and educational materials developer with over 20 years’ experience writing English language textbooks. She’s co-author of the app Primary course - English Code and is based in Spain. Katharine outlines a number of practical ways you can help English language learners develop key literacy skills.

    What is literacy?

    Teachers at all stages of education often complain about their students’ reading skills. The students are literate. In other words, they can interpret the graphemes, or letters on the page, into words. But they struggle to identify the purpose of a text or to analyze it in a meaningful way. We could say that the students have poor literacy skills.

    Literacy is a term used to describe an active, critical form of reading. Some of the skills of a critical reader include:

    Checking new information

    A crucial literacy skill involves discerning whether a text is factually true or not. A critical reader always checks new information against existing knowledge. As we read, we have an internal dialogue: Where does that information come from? That’s impossible because ….

    Separating fact from opinion

    This skill is essential for understanding many different types of texts from newspaper articles to scientific research.

    Understanding the purpose of a text

    All pieces of text have a main purpose. This may be entertainment, in the case of a story, or persuasion, in the case of advertising. A critical reader will know how to identify the purpose of the text.

    In the classroom, different types of text require different responses from the students. It’s important, as students grow older, that they know how to read and respond appropriately to a piece of written information.

    Identifying key information in a text

    This is an essential skill for summarizing information or following instructions. It is also important when we transform written information into something else, like a chart.

    In many ways, literacy is the key skill that underpins learning at all stages. This may seem like an exaggeration, but consider the importance of the four skills outlined above.

    Strategies to promote literacy

    Many teachers and parents of early learners instinctively develop literacy skills before the children can even read.

    When we read a story out loud to a child, we often ask questions about the narrative as we turn the pages: What is going to happen next? How do you think …. feels? Why is …?

    These questions set the foundations for literacy.

    Working with a reading text

    Too often, the comprehension questions that teachers ask about a text are mechanical. They ask the student to “lift” the information out of the text.

    A tale of two dragons

    "Once upon a time, there was an island in the sea. One day, people were working in the fields. The sun was shining and there was one cloud in the sky. The cloud was a strange shape and moving towards the island. Soon the cloud was very big. Then a small boy looked up."

    Taken from English Code, Unit 4, p. 62

    Typical comprehension questions based on the text would be:

    • Where were the people working?
    • How many clouds were in the sky?

    These questions do not really reflect on the meaning of the text and do not lead to a critical analysis. While these simple questions are a good checking mechanism, they don’t help develop literacy skills.

    If we want to develop critical readers, we need to incorporate a critical analysis of reading texts into class work through a deep reading comprehension. We can organize the comprehension into three types.

    1. Text level

    Comprehension at “text level” is about exploring the meaning of individual words and phrases in a text. Examples for the text above could be:

    • Find words that show the story is a fairy tale.
    • Underline a sentence about the weather.

    Other text-level activities include:

    • Finding words in the text from a definition
    • Identifying opinions in the text
    • Finding verbs of speech
    • Finding and classifying words or phrases

    2. Between the lines

    Comprehension “between the lines” means speculating and making guesses with the information we already have from the text. This type of literacy activity often involves lots of questions and discussions with the students. You should encourage students to give good reasons for their opinions. An example for the text above could be:

    • What do you think the cloud really is?

    Other “Between the lines” activities include:

    • Discussing how characters in a story feel and why
    • Discussing characters’ motivation
    • Identifying the most important moments in a story
    • Speculating about what is going to happen next
    • Distinguishing possible events from fantasy events

    Literacy activities are not only based on fiction. We need to help students be critical readers of all sorts of texts. The text below is factual and informative:

    What skills do you need for ice hockey?

    "Ice hockey players should be very good skaters. They always have good balance. They change direction very quickly and they shouldn't fall over. Players should also have fast reactions because the puck moves very quickly."

    Taken from English Code, Level 4, p. 96

    “Between the lines” activities for this text could be:

    • What equipment do you need to play ice hockey?
    • What is the purpose of this piece of text?

    3. Behind the lines

    Comprehension “behind the lines” is about the information we, the readers, already have. Our previous knowledge, our age, our social background and many other aspects change the way we understand and interpret a text.

    An example for the text above could be:

    • What countries do you think are famous for ice hockey?

    Sometimes a lack of socio-cultural knowledge can lead to misunderstanding. Look at the text below.

    Is the relationship between Ms Turner and Jack Roberts formal or informal?

    73 Highlands Road Oxbo, Wisconsin 54552
    April 11th

    Dear Ms. Turner,
    Some people want to destroy the forest and build an airport. This forest is a habitat for many wolves. If they destroy the forest, the wolves will leave the forest. If the wolves leave the forest, there will be more rabbits. This won't be good for our forest.
    Please build the airport in a different place. Please don't destroy the forest.

    Kind regards, Jack Roberts

    Taken from English Code, Level 4, Unit 5, Writing Lab

    If your students are unaware of the convention of using Dear to start a letter in English, they may not answer this question correctly.

    Other “Behind the lines” literacy activities include:

    • Identifying the type of text
    • Imagining extra information based on the readers’ experiences
    • Using existing knowledge to check a factual account
    • Identifying false information

    Examples:

    • What job do you think Ms Turner has?
    • Do you think Jack lives in a village or a city?
    • Do wolves live in forests?

    Literacy is more than reading

    From the activities above, it’s clear that a literacy scheme develops more than reading skills. As students speculate and give their opinions, they talk and listen to each other.

    A literacy scheme can also develop writing skills. The text analysis gives students a model to follow in their writing. In addition, a literacy scheme works on higher-order thinking skills such as analysis, deduction and summary.

    Developing literacy skills so that students become active, critical readers should be a key part of educational programs at all ages. Literacy activities based on a reading text can be especially useful for the foreign language class.

    With literacy activities, we can encourage students:

    • To use the text as a springboard for communicating ideas and opinions
    • To analyze the text as a model for writing activities
    • To see how language is used in context
    • To explore the meanings of words

    More crucially, we are developing critical readers for the future.

  • The importance of teaching values to young learners

    By Katharine Scott

    Values in education

    The long years children spend at school are not only about acquiring key knowledge and skills. At school, children also learn to work together, share, exchange opinions, disagree, choose fairly, and so on. We could call these abilities social skills as they help children live and flourish in a wider community than their family circle.

    Social skills are not necessarily the same as social values. Children acquire social skills from all kinds of settings. The tools they use to resolve problems will often come from examples. In the playground, children observe each other and notice behavior. They realize what is acceptable to the other children and which strategies are successful. Some of the things they observe will not reflect healthy social values.

    Part of a school’s mission is to help children learn social skills firmly based on a shared set of values. Many schools recognize this and have a program for education in values.

    What values are we talking about?

    Labeling is always tricky when dealing with an abstract concept such as social values. General ideas include:

    • living in a community, collaborating together
    • respecting others in all of human diversity
    • caring for the environment and the surroundings
    • having a sense of self-worth.

    At the root of these values are ethical considerations. While it may seem that primary education is too early for ethics, children from a very young age do have a sense of fairness and a sense of honesty. This doesn’t mean that children never lie or behave unfairly. Of course they do! But from about three years old, children know that this behavior is not correct, and they complain when they come across it in others.

    In the school context, social values are too often reduced to a set of school rules and regulations. Typical examples are:

    • 'Don't be late!'
    • 'Wait your turn!'
    • 'Pick up your rubbish!'
    • 'Don't invent unkind nicknames'.

    While all these statements reflect important social values, if we don’t discuss them with the children, the reasoning behind each statement gets lost. They become boring school rules. And we all know that it can be fun to break school rules if you can get away with it. These regulations are not enough to represent an education in values.

    School strategies

    At a school level, successful programs often focus on a specific area of a values syllabus. These programs involve all members of a school community: students, teachers, parents, and administrative staff.

    Here are some examples of school programs:

    Caring for the environment

    Interest in ecology and climate change has led many schools to implement programs focused on respect for the environment and other ecological issues. Suitable activities could include:

    • a system of recycling
    • a vegetable garden
    • initiatives for switching to renewable energy
    • a second-hand bookstore.

    Anti-bullying programs

    Many schools have anti-bullying policies to deal with bullying incidents. However, the most effective programs also have training sessions for teachers and a continuous program for the children to help them identify bullying behavior. Activities include:

    • empathy activities to understand different points of view
    • activities to develop peer responsibility about bullying
    • activities aimed at increasing children’s sense of self-worth.

    Anti-racism programs

    Combating negative racial stereotypes has, until recently, relied mainly on individual teacher initiatives. However, as racial stereotypes are constructed in society, it would be useful to have a school-wide program. This could include:

    • materials focusing on the achievements of ethnic minorities
    • school talks from members of ethnic minority communities
    • empathy activities to understand the difficulties of marginalized groups
    • study of the culture and history of ethnic minorities.

    As children learn from observed behavior, it’s important that everyone in the school community acts consistently with the values in the program.