Can computers really mark exams? Benefits of ELT automated assessments

Automated assessment, including the use of Artificial Intelligence (AI), is one of the latest education tech solutions. It speeds up exam marking times, removes human biases, and is as accurate as, and at least as reliable as, human examiners. As innovations go, this one is a real game-changer for teachers and students.

However, it has understandably been met with many questions and sometimes skepticism in the ELT community – can computers really mark speaking and writing exams accurately? 

The answer is a resounding yes. Students from all parts of the world already take AI-graded tests. The Versant tests, for example, provide unbiased, fair and fast automated scoring for speaking and writing exams, irrespective of where the test takers live, or what their accent or gender is.

This article will explain the main processes involved in AI automated scoring and make the point that AI technologies are built on the foundations of consistent expert human judgments. So, let’s clear up the confusion around automated scoring and AI and look into how it can help teachers and students alike. 

AI versus traditional automated scoring

First of all, let’s distinguish between traditional automated scoring and AI. When we talk about automated scoring, generally, we mean scoring items that are either multiple-choice or cloze items. You may have to reorder sentences, choose from a drop-down list or insert a missing word, that sort of thing. These question types are designed to test particular skills, and automated scoring ensures that they can be marked quickly and accurately every time.

While automatically scored items like these can be used to assess receptive skills such as listening and reading comprehension, they cannot mark the productive skills of writing and speaking. Every student's response in writing and speaking items will be different, so how can computers mark them?

This is where AI comes in. 

We hear a lot about how AI is increasingly being used in areas where there is a need to deal with large amounts of unstructured data efficiently and with high accuracy, like in medical diagnostics, for example. In language testing, AI uses specialized computer software to grade written and oral tests.

How AI is used to score speaking exams

The first step is to build an acoustic model for each language that can recognize speech and convert it into waveforms and text. While this technology used to be very unusual, most of our smartphones can do this now. 

These acoustic models are then trained to score every single prompt or item on a test. We do this by using human expert raters to score the items first, using double marking. They score hundreds of oral responses for each item, and these ‘Standards’ are then used to train the engine. 
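
As a rough illustration of how double-marked responses could be turned into training ‘Standards’, here is a minimal Python sketch. The function name `build_standards`, the averaging rule and the `max_gap` tolerance are all assumptions made for this example; the article does not say how the two expert scores are combined or how disagreements are handled.

```python
from statistics import mean


def build_standards(double_marked, max_gap=1.0):
    """Combine two expert ratings per response into one training target.

    double_marked: list of (response_id, score_rater_a, score_rater_b).
    max_gap: hypothetical tolerance; responses where the two raters
             disagree by more than this are set aside for review.
    """
    standards = {}
    needs_review = []
    for response_id, score_a, score_b in double_marked:
        if abs(score_a - score_b) > max_gap:
            # Raters disagree too much to trust either score as a target.
            needs_review.append(response_id)
        else:
            # Assumption: the agreed target is the average of both ratings.
            standards[response_id] = mean([score_a, score_b])
    return standards, needs_review


# Made-up ratings for three oral responses to one item.
ratings = [("r1", 4, 4), ("r2", 3, 4), ("r3", 2, 5)]
standards, needs_review = build_standards(ratings)
print(standards)     # scores that would be used to train the engine
print(needs_review)  # responses needing a further expert opinion
```

In this sketch, only responses on which both raters broadly agree become training targets, which reflects the idea that the engine learns from consistent expert judgments.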

Next, we validate the trained engine by feeding in many more human-marked items and checking that the machine scores are very highly correlated with the human scores. If this doesn’t happen for any item, we remove it, as it must match the standard set by human markers. We expect a correlation of between .95 and .99. That means the machine scores track the scores on human-marked samples very closely, although a correlation measures how consistently two sets of scores move together, not the percentage of scores that are exactly the same.
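
The validation step can be sketched in a few lines of Python. This is an illustrative example only: the item names and scores are invented, the helpers `pearson`, `exact_agreement` and `validate_items` are hypothetical, and the 0.95 threshold is simply the lower end of the range quoted above. The sketch also reports exact agreement separately, because correlation and exact agreement answer different questions.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def exact_agreement(xs, ys):
    """Share of responses where both scores are identical."""
    return sum(x == y for x, y in zip(xs, ys)) / len(xs)


def validate_items(items, threshold=0.95):
    """Keep items whose machine scores correlate with human scores."""
    kept, removed = [], []
    for item_id, (human, machine) in items.items():
        if pearson(human, machine) >= threshold:
            kept.append(item_id)
        else:
            removed.append(item_id)
    return kept, removed


# Made-up validation data: (human scores, machine scores) per item.
items = {
    "item_1": ([1, 2, 3, 4, 5], [1, 2, 3, 4, 5]),
    "item_2": ([1, 2, 3, 4, 5], [2, 1, 4, 3, 5]),
    "item_3": ([1, 2, 3, 4, 5], [2, 3, 4, 5, 6]),
}
kept, removed = validate_items(items)
print(kept, removed)
```

Here `item_2` falls below the threshold and is removed. `item_3` correlates perfectly with the human scores even though no individual score is identical, which is why a real validation would probably look at agreement as well as correlation.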

This is incredibly high compared to the reliability of human-marked speaking tests. In essence, we use a group of highly expert human raters to train the AI engine, and then their standard is replicated time after time.  

How AI is used to score writing exams

Our AI writing scoring uses a technology called Latent Semantic Analysis (LSA). LSA is a natural language processing technique that can analyze and score writing based on the meaning behind words, and not just their superficial characteristics.

Similarly to our speech recognition acoustic models, we first establish a language-specific text recognition model. We feed a large amount of text into the system, and LSA uses artificial intelligence to learn the patterns of how words relate to each other and are used in, for example, the English language. 

Once the language model has been established, we train the engine to score every written item on a test. As in speaking items, we do this by using human expert raters to score the items first, using double marking. They score many hundreds of written responses for each item, and these ‘Standards’ are then used to train the engine. We then validate the trained engine by feeding in many more human-marked items, and check that the machine scores are very highly correlated to the human scores. 

The benchmark is always the expert human scores. If our AI system doesn’t closely match the scores given by human markers, we remove the item, as it is essential to match the standard set by human markers.

AI’s ability to mark multiple traits 

One of the challenges human markers face in scoring speaking and written items is assessing many traits on a single item. For example, when assessing and scoring speaking, they may need to give separate scores for content, fluency and pronunciation. 

In written responses, markers may need to score a piece of writing for vocabulary, style and grammar. Effectively, they may need to mark every single item at least three times, maybe more. However, once we have trained the AI systems on every trait score in speaking and writing, they can then mark items on any number of traits almost instantaneously, applying the same standard every time.

AI’s lack of bias

A fundamental premise for any test is that no advantage or disadvantage should be given to any candidate. In other words, there should be no positive or negative bias. This can be very difficult to achieve in human-marked speaking and written assessments. In fact, candidates often feel they may have received a different score if someone else had heard them or read their work.

Our AI systems eradicate the issue of bias. This is done by ensuring our speaking and writing AI systems are trained on an extensive range of human accents and writing types. 

We don’t want perfect native-speaking accents or writing styles to train our engines. We use representative non-native samples from across the world. When we initially set up our AI systems for speaking and writing scoring, we trialed our items and trained our engines using millions of student responses. We continue to do this now as new items are developed.

The benefits of AI automated assessment

There is nothing wrong with hand-marking homework, tests and exams. In fact, it is essential for teachers to get to know their students and provide personal feedback and advice. However, manually correcting hundreds of tests, daily or weekly, can be repetitive, time-consuming and not always reliable, and it takes time away from working alongside students in the classroom. The use of AI in formative and summative assessments can increase assessed practice time for students and reduce the marking load for teachers.

Language learning takes time, and lots of it, to progress to high levels of proficiency. The blended use of AI can:

  • address the increasing importance of formative assessment to drive personalized learning and diagnostic assessment feedback

  • allow students to practice and get instant feedback inside and outside of allocated teaching time

  • address the issue of teacher workload

  • create a virtuous combination between humans and machines, taking advantage of what humans do best and what machines do best. 

  • provide fair, fast and unbiased summative assessment scores in high-stakes testing.

We hope this article has answered a few burning questions about how AI is used to assess speaking and writing in our language tests. An interesting quote from Fei-Fei Li, Chief Scientist at Google and Stanford professor, describes AI like this:

“I often tell my students not to be misled by the name ‘artificial intelligence’ — there is nothing artificial about it; A.I. is made by humans, intended to behave [like] humans and, ultimately, to impact human lives and human society.”

AI in formative and summative assessments will never replace the role of teachers. AI will support teachers, provide endless opportunities for students to improve, and provide a solution to slow, unreliable and often unfair high-stakes assessments.

Examples of AI assessments in ELT

At app, we have developed a range of assessments using AI technology.

Versant

The Versant tests are a great tool to help establish language proficiency benchmarks in any school, organization or business. They are specifically designed as placement tests to determine the appropriate level for the learner.

PTE Academic

PTE Academic is aimed at those who need to prove their level of English for a university place, a job or a visa. It uses AI to score tests, and results are available within five days.

More blogs from app

  • Seven ways to develop independent learners

    By Richard Cleeve

    What is independent learning?

    Students who are actively involved in deciding what and how they learn are typically more engaged and motivated.

    That’s not surprising, because independent learners are extremely focused on their personal learning objectives.

    Independent learning has been described as “a process, a method and a philosophy of education whereby a learner acquires knowledge by his or her own efforts and develops the ability for inquiry and critical evaluation.”

  • Preparing your learners for university study abroad

    By Richard Cleeve

    Whether your learners are going for a single semester, academic year or an entire university course, studying abroad is an excellent opportunity for them. They’ll have the chance to discover a new culture, develop new skills and make new friends.

    University study in another country also poses several challenges. But as a teacher, you can equip them for this experience and prepare them for future academic success.

    Why study abroad?

    Most people think that studying at university is hard enough, without the added difficulty of doing it overseas. But that doesn’t stop hundreds of thousands of university students from leaving the support of family and friends and relocating to a foreign country.

    People apply to study in another country for a range of reasons. A university program abroad might offer the student better tuition and a greater promise of future employment or simply represent better value for money. And in the case of very specialist university courses, studying abroad may be the only option.

    Whatever the reason, the decision to study in a foreign country is likely to involve a high level of proficiency in another language – and more often than not, that language is English.

    A move towards English language in higher education

    There has been a significant shift in higher education in the last ten years, as many European institutions look to internationalize their programs. As a result, across Europe, we have seen a sharp growth in the number of university courses taught in English. English-taught bachelor’s programs offered by universities in the region have multiplied dramatically over the last decade.

    What challenges do learners face?

    Academic skills

    There is a whole range of academic skills that students are expected to have when they start university. From research and evaluation to note-making and referencing, learners often enter higher education lacking many of the essential skills they require.

    Studying in a foreign language

    Not only will they have to master new skills, but they may need to use them in a second language. What’s more, even everyday things that fluent speakers may take for granted, such as understanding lectures, reading academic papers, writing essays and even socializing with new friends, will take a lot more effort if English isn’t their first language.

    Administrative issues

    There are many potential pitfalls for a student in a new academic setting. From the administrative process and campus regulations to the types of lessons and assessments, there may be a lot of differences to deal with. Even understanding the etiquette of addressing and interacting with professors can be daunting.

    Problems integrating

    Another challenge is integrating into another culture. Even if the host country is culturally similar, adapting to new surroundings is not always straightforward. There can also be a certain amount of ghettoization, where international students might stick together and remain isolated from the local student population.

    Homesickness

    Homesickness is another difficulty for international students to deal with. Depending on how far they travel to study, your learners may be unable to return home easily, visit their families and alleviate their homesickness.

    Mental health

    Moving abroad and living in a completely new place can be very stressful and overwhelming, and many factors can cause or exacerbate mental health problems, making it harder to do day-to-day tasks, socialize and study.

    Money worries

    Without a grant or a scholarship, studying abroad can be very expensive. If your learners currently live at home with their parents, the cost of accommodation may be formidable. The higher cost of living could mean they have to look for a part-time job to supplement their income. Understanding a country's can also be confusing and hard to calculate into their budget.

    What can you do to get your students ready?

    All of the challenges mentioned above have one thing in common. If a student cannot communicate effectively, these situations can be exacerbated. Language is key, whether it’s accessing support, communicating with professors or getting to grips with a new culture.

    Here are some things you can do to help your learners prepare for university life:

    1) Put them in touch with past students

    It’s important that your learners have a clear idea of what university study abroad entails. Creating a chance for them to speak to other students who have already gone through that experience can be extremely valuable.

    Students who have returned from studying abroad can help with your learners' doubts and put their minds at rest. They might be able to provide essential advice about a specific country or university or simply tell their story. Either way, it’s a great way to reassure and encourage your learners.

    2) Use appropriate authentic content

    In preparation for your learners’ time abroad, the language course that you teach should align with their future linguistic needs. One of the main aims should be to develop the language skills required to perform successfully and confidently in their new context.

    3) Teach them academic study skills

    Think back to when you were at university and what you struggled with. Group work, presentations, critical thinking and exam skills are all things which your learners will need to be proficient in, so the more you practice them in class the better.

    4) Promote autonomous learning

    Success at university is deeply rooted in a student’s ability to work independently and develop practical self-study skills. Giving your learners more choice in the language learning process is one way to encourage autonomy.

  • The importance of gender equality within learner content

    By Richard Cleeve

    Gender equality in the publishing industry

    The impact of any learning material goes far beyond its subject matter and pedagogical objectives. Everything included, from the choice of language, to the imagery, to the text and front covers, has the potential to reinforce stereotypes unintentionally. This can shape a learner’s sense of self and others around them and affect how they feel and behave in a social setting.

    A wealth of evidence suggests that early gender bias influences future inequality. It can affect career aspirations, influence the choice of school subjects and ultimately contribute to gender disparity as children grow into adults. This is a challenge for all sectors and industries across society. Guidelines have been developed for app to ensure that our materials are gender equal and showcase positive female role models.

    The guidelines are broken down into three different areas surrounding gender equality:

    1. The representation of people and characters in content

    The guidelines help to ensure that women are represented equally to men in our learning and teaching materials. This includes ensuring that women's representation does not reinforce negative stereotypes. For example, content that shows women as single parents can also present them as single parents and workers. The idea is to .

    Another common example is with regards to science materials.

    Often, when students are asked to describe a famous scientist, they describe a character similar to Albert Einstein with white hair and a white coat. Female scientists are often overlooked in this respect, and historically, they have not been given as much attention as their male counterparts.

    This type of unconscious bias is something the guidelines aim to help change. Our goal is to represent both women and men from various backgrounds across all subjects. For example, some content shows women in traditionally male roles, such as pilots, engineers and soldiers. The objective is to highlight that .

    Another issue is the objectification of women. Often, women are presented as not having agency or purpose, and too much focus is placed on their appearance, rather than their intentions, behavior and aspirations. The new guidelines set out to change this.

    2. The use of language

    Our language is gendered and therefore steeped in stereotyping. We aim to promote the use of terminology that is non-gendered. For example, using ‘police officer’, ‘firefighter’ and ‘maintenance worker’ instead of ‘policeman’, ‘fireman’ and ‘handyman’. Although this is a small change, it contributes to removing the unconscious bias surrounding jobs and professions.

    Adjectives can also play a role in perpetuating gender inequality. We often associate particular adjectives with genders. For example, words like ‘hysterical’, ‘shrill’, or ‘frumpy’ are typically used for women. Whereas men can be described as ‘assertive’, women are more likely to be seen as ‘bossy’.

    Furthermore, parallel language is something that needs to be looked at. Words like ‘girls and boys’ can be replaced with ‘students’. In this way, the guidelines are here to ensure that there is no gendering within materials. This will influence gender equality among our users.

    3. Referencing third-party content

    Another key issue involved in the material is the referencing of third-party content. For example, stories based on classic fairy tales are often used to represent certain points, and these typically show the strong male hero saving the weak female damsel in distress.

    Although these are stories that our society has grown up with, they could be more helpful in offering a gender-balanced view of society. app’s guidelines are in place to ensure that students see women and men as equals throughout the materials.

    What can teachers do to help in the classroom?

    To help fight against gender inequality, teachers can think about incorporating more female stories and role models into their lessons, for example, rather than simply focusing on Isaac Newton or Albert Einstein in science class.

    At a management level, schools can be more aware of what materials they are choosing to bring in, assessing whether the content is balanced, before accepting it. These simple actions can help our learners grow up with a more balanced view of gender.