Can computers really mark exams? Benefits of ELT automated assessments

app Languages

Automated assessment, including the use of Artificial Intelligence (AI), is one of the latest education tech solutions. It speeds up exam marking times, removes human biases, and is as accurate and at least as reliable as human examiners. As innovations go, this one is a real game-changer for teachers and students. 

However, it has understandably been met with many questions and sometimes skepticism in the ELT community – can computers really mark speaking and writing exams accurately? 

The answer is a resounding yes. Students from all parts of the world already take AI-graded tests. Our Versant tests, for example, provide unbiased, fair and fast automated scoring for speaking and writing exams, irrespective of where the test takers live or what their accent or gender is.

This article will explain the main processes involved in AI automated scoring and make the point that AI technologies are built on the foundations of consistent expert human judgments. So, let’s clear up the confusion around automated scoring and AI and look into how it can help teachers and students alike. 

AI versus traditional automated scoring

First of all, let’s distinguish between traditional automated scoring and AI. When we talk about automated scoring, we generally mean scoring items that are either multiple-choice or cloze items. You may have to reorder sentences, choose from a drop-down list or insert a missing word, for example. These question types are designed to test particular skills, and automated scoring ensures that they can be marked quickly and accurately every time.
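Because every objective item has a fixed answer key, automated scoring of these items comes down to comparing each response with that key. The sketch below assumes a hypothetical `Item` record and simple matching rules (case-insensitive for typed cloze answers, exact for choices and sentence orderings); real test platforms define their own item formats and partial-credit rules.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """An objective item with a fixed answer key (hypothetical structure)."""
    item_id: str
    kind: str    # "multiple_choice", "cloze" or "reorder"
    key: object  # a str, or a list of str giving the correct sentence order


def normalise(text: str) -> str:
    """Collapse whitespace and ignore case so ' Went ' matches 'went'."""
    return " ".join(text.split()).lower()


def score_item(item: Item, response: object) -> int:
    """Return 1 if the response matches the key, otherwise 0."""
    if item.kind == "cloze":
        return int(isinstance(response, str) and normalise(response) == normalise(item.key))
    # Multiple-choice and reordering responses must match the key exactly.
    return int(response == item.key)


def score_test(items: list[Item], responses: dict[str, object]) -> int:
    """Total number of correct answers; a missing response scores 0."""
    return sum(score_item(item, responses.get(item.item_id)) for item in items)


items = [
    Item("q1", "multiple_choice", "B"),
    Item("q2", "cloze", "went"),
    Item("q3", "reorder", ["I", "like", "green", "tea"]),
]
responses = {"q1": "B", "q2": " Went ", "q3": ["I", "green", "like", "tea"]}
print(score_test(items, responses))  # 2: the word order in q3 is wrong
```

Every response is checked against the same key in the same way, which is why this kind of marking is fast and repeatable, and also why it cannot handle open-ended speaking or writing.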

While automatically scored items like these can be used to assess receptive skills such as listening and reading comprehension, they cannot mark the productive skills of writing and speaking. Every student's response in writing and speaking items will be different, so how can computers mark them?

This is where AI comes in. 

We hear a lot about how AI is increasingly being used in areas where large amounts of unstructured data need to be handled effectively and accurately, such as medical diagnostics. In language testing, AI uses specialized computer software to grade written and oral tests.

How AI is used to score speaking exams

The first step is to build an acoustic model for each language that can recognize speech, analyzing the audio waveform and converting it into text. While this technology used to be very unusual, most of our smartphones can do this now.

These acoustic models are then trained to score every single prompt or item on a test. We do this by using human expert raters to score the items first, using double marking. They score hundreds of oral responses for each item, and these ‘Standards’ are then used to train the engine. 

Next, we validate the trained engine by feeding in many more human-marked items and checking that the machine scores correlate very highly with the human scores. If this doesn’t happen for any item, we remove it, as it must match the standard set by human markers. We expect a correlation of between 0.95 and 0.99. A correlation this high means the machine scores rise and fall almost exactly in step with the human scores: squaring the correlation shows that roughly 90% to 98% of the variation in the human-marked samples is accounted for by the machine scores.
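The validation step, and the arithmetic behind a correlation of 0.95 to 0.99, can be sketched as follows. This is a minimal illustration assuming Pearson's correlation coefficient and a keep/remove threshold of 0.95; the item IDs and scores are invented, and the function names are hypothetical rather than part of any real scoring system.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists of scores."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length score lists with at least two scores")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when all scores are identical")
    return cov / math.sqrt(var_x * var_y)


def validate_items(human: dict[str, list[float]],
                   machine: dict[str, list[float]],
                   threshold: float = 0.95) -> dict[str, bool]:
    """Map each item ID to True (keep) if its machine-human correlation meets the threshold."""
    return {item: pearson_r(human[item], machine[item]) >= threshold for item in human}


# Invented human and machine scores for two items on the same five responses.
human = {"A": [2, 3, 4, 5, 6], "B": [2, 3, 4, 5, 6]}
machine = {"A": [2.1, 2.9, 4.2, 4.8, 6.1], "B": [4, 2, 5, 3, 4]}
print(validate_items(human, machine))  # {'A': True, 'B': False}

r = pearson_r(human["A"], machine["A"])
print(round(r, 3), round(r ** 2, 3))  # r, and the share of variance it accounts for
```

Item B would be removed because its machine scores do not follow the human scores. Note that for item A no machine score equals its human score exactly, yet the correlation is still above 0.99, which is why a correlation figure should not be read as a percentage of identical marks.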

This is incredibly high compared to the reliability of human-marked speaking tests. In essence, we use a group of highly expert human raters to train the AI engine, and then their standard is replicated time after time.  

How AI is used to score writing exams

Our AI writing scoring uses a technology called Latent Semantic Analysis (LSA). LSA is a natural language processing technique that can analyze and score writing based on the meaning behind words, and not just their superficial characteristics.

Similarly to our speech recognition acoustic models, we first establish a language-specific text recognition model. We feed a large amount of text into the system, and LSA uses artificial intelligence to learn the patterns of how words relate to each other and are used in, for example, the English language. 
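To make the LSA idea concrete, here is a minimal, standard-library-only sketch under several simplifying assumptions: raw word counts instead of the weighting schemes (such as tf-idf or log-entropy) usually applied, a five-sentence toy corpus instead of a large body of text, and power iteration on the document matrix A^T A in place of a library SVD. The function names are hypothetical, and this is not the engine's implementation. It shows the key effect: after reducing to k = 2 latent dimensions, "cat sat on mat" and "kitten drank milk" point in the same direction even though they share no words, because "cat chased kitten" links the two vocabularies.

```python
import math
import random
import re


def tokenize(text: str) -> list[str]:
    """Lower-case a text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def term_document_matrix(docs: list[str]) -> list[list[float]]:
    """Rows are vocabulary terms, columns are documents, cells are raw counts."""
    vocab = sorted({w for d in docs for w in tokenize(d)})
    index = {w: i for i, w in enumerate(vocab)}
    matrix = [[0.0] * len(docs) for _ in vocab]
    for j, doc in enumerate(docs):
        for w in tokenize(doc):
            matrix[index[w]][j] += 1.0
    return matrix


def top_eigenpairs(g: list[list[float]], k: int, iterations: int = 500, seed: int = 0):
    """Top-k eigenvalues/vectors of a symmetric PSD matrix via power iteration with deflation."""
    n = len(g)
    g = [row[:] for row in g]  # deflation modifies the matrix, so work on a copy
    rng = random.Random(seed)
    pairs = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(n)]  # positive, seeded start vector
        for _ in range(iterations):
            w = [sum(g[i][j] * v[j] for j in range(n)) for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * g[i][j] * v[j] for i in range(n) for j in range(n))
        pairs.append((lam, v))
        # Subtract the component just found so the next pass converges to the next one.
        for i in range(n):
            for j in range(n):
                g[i][j] -= lam * v[i] * v[j]
    return pairs


def lsa_document_vectors(docs: list[str], k: int) -> list[list[float]]:
    """Map each document to k latent dimensions (rows of V times Sigma in A = U Sigma V^T)."""
    a = term_document_matrix(docs)
    n = len(docs)
    # A^T A is document-by-document; its eigenvalues are the squared singular values of A.
    gram = [[sum(row[i] * row[j] for row in a) for j in range(n)] for i in range(n)]
    pairs = top_eigenpairs(gram, k)
    return [[math.sqrt(max(lam, 0.0)) * v[j] for lam, v in pairs] for j in range(n)]


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


docs = [
    "cat sat on mat",
    "cat chased kitten",
    "kitten drank milk",
    "stock markets fell",
    "investors sold stock as markets fell",
]
vectors = lsa_document_vectors(docs, k=2)
# Documents 0 and 2 share no words, yet they point the same way in the latent space.
print(round(cosine(vectors[0], vectors[2]), 3))
print(round(cosine(vectors[0], vectors[3]), 3))
```

The dimension reduction is what does the work: with all dimensions kept, documents 0 and 2 would stay unrelated, but squeezing them into two dimensions forces words that co-occur across documents onto a shared "topic" axis.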

Once the language model has been established, we train the engine to score every written item on a test. As in speaking items, we do this by using human expert raters to score the items first, using double marking. They score many hundreds of written responses for each item, and these ‘Standards’ are then used to train the engine. We then validate the trained engine by feeding in many more human-marked items, and check that the machine scores are very highly correlated to the human scores. 
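One well-known way to turn human-scored "Standards" into a scorer is to compare a new response with the pre-scored ones and borrow the scores of the closest matches. The sketch below assumes that approach (a similarity-weighted k-nearest-neighbour average); plain word counts stand in for the LSA vectors a real engine would use, and the function names, example responses and scores are invented for illustration. It is not a description of how the engine is actually trained.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Word counts for a response (a stand-in for an LSA vector in this sketch)."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    na = math.sqrt(sum(c * c for c in a.values()))
    nb = math.sqrt(sum(c * c for c in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def predict_score(response: str, standards: list[tuple[str, float]], k: int = 3) -> float:
    """Similarity-weighted mean of the human scores of the k most similar standards."""
    target = bag_of_words(response)
    nearest = sorted(
        ((cosine(target, bag_of_words(text)), score) for text, score in standards),
        reverse=True,
    )[:k]
    total = sum(sim for sim, _ in nearest)
    if total == 0:
        # No overlap with any standard: fall back to the mean human score.
        return sum(score for _, score in standards) / len(standards)
    return sum(sim * score for sim, score in nearest) / total


# Invented human-scored responses to one writing prompt (0-5 scale).
standards = [
    ("the graph shows sales rising steadily each year", 5.0),
    ("sales go up", 2.0),
    ("the chart shows that sales rose every year", 4.0),
    ("i like my holiday", 1.0),
]
print(round(predict_score("the graph shows that sales rose each year", standards), 2))
```

With `k = 3`, the two close paraphrases dominate the prediction and the short, weaker answer pulls it down slightly, so the new response lands between the scores the human raters gave its nearest neighbours.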

The benchmark is always the expert human scores. If our AI system doesn’t closely match the scores given by human markers, we remove the item, as it is essential to match the standard set by human markers.

AI’s ability to mark multiple traits 

One of the challenges human markers face in scoring speaking and written items is assessing many traits on a single item. For example, when assessing and scoring speaking, they may need to give separate scores for content, fluency and pronunciation. 

In written responses, markers may need to score a piece of writing for vocabulary, style and grammar. Effectively, they may need to mark every single item at least three times, maybe more. However, once we have trained the AI systems on every trait score in speaking and writing, they can mark items on any number of traits instantly, applying the same standard every time.
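The point about scoring many traits at once can be illustrated with one regression model per trait, all reading the same feature vector. This sketch assumes hypothetical speaking features (words per second, proportion of time spent pausing) and fits each trait with ordinary least squares; the features, trait formulas and synthetic scores are invented, and real engines use far richer features and models. Once `train` has run, `score_all` returns every trait score from a single pass over the features.

```python
def solve(a: list[list[float]], b: list[float]) -> list[float]:
    """Solve the small linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_trait(features: list[list[float]], scores: list[float]) -> list[float]:
    """Least-squares weights (intercept first) mapping feature vectors to one trait's scores."""
    rows = [[1.0] + f for f in features]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * s for r, s in zip(rows, scores)) for i in range(p)]
    return solve(xtx, xty)  # normal equations: (X^T X) w = X^T y


def train(features: list[list[float]], trait_scores: dict[str, list[float]]) -> dict[str, list[float]]:
    """Fit one independent model per trait on the same features."""
    return {trait: fit_trait(features, scores) for trait, scores in trait_scores.items()}


def score_all(models: dict[str, list[float]], feature_vector: list[float]) -> dict[str, float]:
    """Score every trait for one response from a single feature vector."""
    x = [1.0] + feature_vector
    return {trait: sum(w * v for w, v in zip(weights, x)) for trait, weights in models.items()}


# Hypothetical features per response: [words per second, proportion of time spent pausing].
features = [[1.0, 0.2], [2.0, 0.1], [1.5, 0.4], [3.0, 0.3], [2.5, 0.0]]
# Synthetic "human" trait scores generated from known formulas so the fit can be checked.
trait_scores = {
    "fluency": [1 + 1.5 * f[0] - 2 * f[1] for f in features],
    "pronunciation": [2 + 0.5 * f[0] - 1 * f[1] for f in features],
}
models = train(features, trait_scores)
print(score_all(models, [2.0, 0.2]))  # approximately fluency 3.6, pronunciation 2.8
```

Adding a trait means adding one more entry to `trait_scores`; scoring cost grows by one dot product per trait, which is why extra traits add essentially no marking time once the models exist.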

AI’s lack of bias

A fundamental premise for any test is that no advantage or disadvantage should be given to any candidate. In other words, there should be no positive or negative bias. This can be very difficult to achieve in human-marked speaking and written assessments. In fact, candidates often feel they may have received a different score if someone else had heard them or read their work.

Our AI systems eradicate the issue of bias. This is done by ensuring our speaking and writing AI systems are trained on an extensive range of human accents and writing types. 

We don’t want perfect native-speaking accents or writing styles to train our engines. We use representative non-native samples from across the world. When we initially set up our AI systems for speaking and writing scoring, we trialed our items and trained our engines using millions of student responses. We continue to do this now as new items are developed.

The benefits of AI automated assessment

There is nothing wrong with hand-marking homework, tests and exams. In fact, it is essential for teachers to get to know their students and provide personal feedback and advice. However, manually correcting hundreds of tests, daily or weekly, can be repetitive, time-consuming and not always reliable, and it takes time away from working alongside students in the classroom. The use of AI in formative and summative assessments can increase assessed practice time for students and reduce the marking load for teachers.

Language learning takes time, and progressing to high levels of proficiency takes a great deal of it. The blended use of AI can:

  • address the increasing importance of formative assessment to drive personalized learning and diagnostic assessment feedback

  • allow students to practice and get instant feedback inside and outside of allocated teaching time

  • address the issue of teacher workload

  • create a virtuous combination between humans and machines, taking advantage of what humans do best and what machines do best. 

  • provide fair, fast and unbiased summative assessment scores in high-stakes testing.

We hope this article has answered a few burning questions about how AI is used to assess speaking and writing in our language tests. Fei-Fei Li, Chief Scientist at Google Cloud and a professor at Stanford University, describes AI in an interesting way:

“I often tell my students not to be misled by the name ‘artificial intelligence’ — there is nothing artificial about it; A.I. is made by humans, intended to behave [like] humans and, ultimately, to impact human lives and human society.”

AI in formative and summative assessments will never replace the role of teachers. AI will support teachers, provide endless opportunities for students to improve, and provide a solution to slow, unreliable and often unfair high-stakes assessments.

Examples of AI assessments in ELT

At app, we have developed a range of assessments using AI technology.

Versant

The Versant tests are a great tool to help establish language proficiency benchmarks in any school, organization or business. They are specifically designed for placement tests to determine the appropriate level for the learner.

PTE Academic

PTE Academic is aimed at those who need to prove their level of English for a university place, a job or a visa. It uses AI to score tests, and results are available within five days.
