How good are today’s language engines for Arabic?

AI-supported transcription and translation of Arabic is a challenge, mainly for two reasons. Firstly, today’s engines have less training data for Arabic than for English. Secondly, there are quite a number of Arabic dialects that differ from Modern Standard Arabic (MSA), which is used as the standard for writing.
We wanted to better understand the current status from the perspective of native speakers and spoke with Walid Al-Saqaf, associate professor at the Doha Institute in Qatar. For some weeks now, his journalism students have had access to plain X and have been using the software for transcription, translation and subtitling.
Could you briefly introduce yourself?

Walid Al-Saqaf: I am an associate professor in journalism at the Doha Institute in Qatar. My niche is using technology and how it can be leveraged for journalistic practices, including tools for transcription or translation. My regional focus is increasingly the Middle East and North Africa region, where I am based.
What is your current view on how AI can help journalism?
Walid Al-Saqaf: AI is a disruptive technology, and journalists must understand it. It’s an opportunity and a challenge. AI can help make information gathering and publishing more efficient, saving time. However, there are many ethical, legal, and technical constraints, especially with generative AI.
Our current impression is that AI is becoming an assistant for many people for a variety of tasks such as research or writing. Would you agree?
Walid Al-Saqaf: I agree. In some ways, it’s analogous to using a keyboard instead of a pen – it makes work faster, more efficient, with fewer errors and more standardisation. As AI improves, it will embed itself into our mindset, suggesting topics and other aspects. It’s like tapping into a massive encyclopaedia, but there may be biases and algorithmic aspects we aren’t aware of. The sources of data AI feeds on are necessarily going to be pro-Western and come from a Western perspective. A lot of the massive body of existing information in the Arab world and the Global South is not tapped into that much. So, you need to use AI, be aware of its potential, but also be mindful of what it lacks.
What are your views regarding the plain X platform after some of your students started using the software?
Walid Al-Saqaf: Allowing us to use the tool has been very helpful. The most remarkable and effective use case for our students has been transcribing interviews. Transcribing hour-long interviews used to take many hours, but plain X cuts this significantly, allowing students to go through the main ideas and jump to points easily. I’d say it cuts the time by well over 50%, making them more productive. Another very useful feature is subtitling, which is important for accessibility. We are very glad for the dialects aspect, which makes us more inclusive and allows us to interview many more people who don’t have the same grasp of proper modern Arabic. The technical features like subtitling and dubbing are very useful.
How do you judge the quality of the engines specifically for Arabic languages and variations? Would most native speakers have an expectation of lower quality compared to languages like English?
Walid Al-Saqaf: Yes, that’s actually true. Arabic is still lagging in terms of both detectability and generating text. For example, tools to detect AI-generated text in Arabic are not as effective as for English. One area I found noteworthy about plain X is that the most effective way to get it to work properly is to be as close to modern Arabic as possible. The further away you are from classical Arabic, and the less you specify the accent, the more trouble you get. I had a student interview someone speaking “Francophone Arabic” (French plus Arabic) in the North Africa/Maghreb area. The student could use the tool but needed to make a long list of corrections. Having the first draft still helped her tremendously. The further away you are from the dialect and classical Arabic, the more time you need for corrections. Mixed language is really a challenge for most engines because it is not expected.
What would you hope for in terms of language technology improvement for Arabic in the coming years?
Walid Al-Saqaf: One particular area extremely relevant for journalism departments, especially where Modern Standard Arabic (MSA) is valued, is having the ability to produce MSA as the output regardless of the input accent. For subtitles, you could decide whether to have the subtitling in the original accent or the MSA equivalent. If you have a documentary with different regional accents, you could have two tracks – one with the accent and the other in MSA, which would be understood by anyone. If you have the MSA version, you can then use translation tools to convert it to any language globally, as it’s the most classical and well-understood approach to Arabic. This is something we currently do manually.
What would be the expectation regarding AI language tools beyond MSA? For example, which dialects do you think need more technical resources?
Walid Al-Saqaf: Further improvements are needed for other dialects. The Yemeni dialect, for example, isn’t well-supported due to a lack of training data. Constantly embedding context into AI is also important so it doesn’t misquote or misunderstand what is being said. This could be AI-driven based on new data or based on a prompt you provide at the beginning, giving the AI awareness of the environment and a contextual understanding to be more accurate. Glossaries are also useful, allowing users to build their own local glossary to handle words that are frequently spoken differently, which helps avoid repeated corrections. While human interpreters are currently still needed for their contextualisation and ability to convey meaning eloquently, combining context and language capacity in an AI agent could potentially bridge this gap. Supporting low-resource languages, including specific dialects and regional variations, is crucial.

The Doha Institute (DI) is a graduate institute located in Qatar. The institute offers master’s and PhD programs, primarily in Arabic. As of 2024-2025, it enrolled around 600 graduate students from over 100 countries. Its journalism program was established only in recent years; the focus is to equip students with practical tools for the evolving media landscape, with increasing attention to the Middle East and North Africa region. The institute also hosts a Language Center that supports students’ Arabic and English proficiency, and its journalism program places strong emphasis on the use of Modern Standard Arabic (MSA). The DI is structured around programs rather than departments.
Link:
https://www.dohainstitute.edu.qa