Meta (Facebook) presents AI capable of translating 200 different languages

Meta has announced the development of NLLB-200, a model based on artificial intelligence (AI) capable of translating between 200 different languages, including low-resource languages such as Kamba, Lao or Igbo, several of which are spoken in African countries.

Meta AI researchers have developed this system as part of the ‘No Language Left Behind’ (NLLB) initiative, which seeks to create advanced machine translation capabilities for most of the world’s languages.

Specifically, NLLB-200 can translate between 200 languages that until now were either absent from the most widely used translation tools or poorly supported by them, the company said in a statement sent to Europa Press.

Meta has highlighted these shortcomings, noting that fewer than 25 African languages are covered by current translation tools, a problem it aims to address with this model, which includes 55 African languages.

The company has open sourced the NLLB-200 model and other tools so that other researchers can extend this work to more languages and design more inclusive technologies.
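
As an illustration of what that release makes possible, the sketch below loads one of the publicly available NLLB-200 checkpoints with the Hugging Face transformers library and translates a sentence from English into Igbo. The checkpoint name (facebook/nllb-200-distilled-600M) and the FLORES-200-style language codes are assumptions based on the models Meta published alongside the announcement, not details given in this article.

```python
# Minimal sketch: translating with a released NLLB-200 checkpoint via Hugging Face.
# The model name and language codes ("eng_Latn", "ibo_Latn") are assumptions.
from transformers import pipeline

translator = pipeline(
    "translation",
    model="facebook/nllb-200-distilled-600M",  # smaller distilled variant of NLLB-200
    src_lang="eng_Latn",                       # source: English, Latin script
    tgt_lang="ibo_Latn",                       # target: Igbo, Latin script
)

result = translator("No language should be left behind.", max_length=64)
print(result[0]["translation_text"])
```

The distilled 600M-parameter variant is used here simply because it is the lightest of the published checkpoints to run locally; larger variants follow the same interface.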

In addition, Meta has announced that it wants to award grants of up to 200,000 dollars to non-profit organizations (NGOs) that want to apply this new technology in real-world environments.

The company believes these advances will make it possible to provide more than 25 million translations a day across the News Feed of Facebook, Instagram and the rest of the platforms it develops.

With this commitment to the NLLB-200 model, Meta also hopes to offer accurate translations that can help detect harmful content and misinformation, protect the integrity of political processes such as elections, and stop cases of sexual exploitation and human trafficking on the Internet.

PROBLEMS IN TRANSLATION SYSTEMS

After presenting this AI model, Meta has outlined the challenges it had to face in order to develop NLLB-200.

First, the company has recalled that these services are trained on data, a training process that relies on millions of sentence pairs across different language combinations.

The problem is that for many language combinations there are no parallel sentences that can serve as translation examples, which causes some of these translations to contain grammatical errors or inconsistencies.

Meta has pointed out that another major difficulty is optimizing a single model so that it works across many different languages without harming or compromising translation quality.

In addition, the company has noted that these translation models produce errors that are difficult to identify and, since there are fewer data sets for low-resource languages, it is hard to test and improve them.

To overcome these difficulties, Meta initially worked on M2M-100, its 100-language translation model, which prompted the creation of new methods for collecting data and improving results.

In order to reach the 200 languages included in NLLB-200, Meta had to focus mainly on three aspects: expanding the available training resources, scaling the size of the model without sacrificing performance, and building mitigation and evaluation tools for all 200 languages.

First of all, the company has explained that, in order to collect parallel texts for more accurate translations in other languages, it has improved LASER, its Language-Agnostic Sentence Representations toolkit for zero-shot transfer.

Specifically, the new version of LASER uses a Transformer model trained with self-supervision. The company has also said that it improved performance by using a teacher-student training approach and by creating specific encoders for each group of languages.
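
To make the idea more tangible, here is a rough sketch of how multilingual sentence embeddings of the kind LASER produces can be used to mine parallel sentences: text from two monolingual corpora is encoded into a shared space, and the most similar cross-lingual pairs are kept as candidate translations. The encode function is a hypothetical stand-in for a real multilingual encoder such as LASER3, and the similarity threshold is purely illustrative.

```python
# Rough sketch of embedding-based parallel-sentence mining.
import numpy as np

def encode(sentences: list[str]) -> np.ndarray:
    """Hypothetical multilingual encoder returning L2-normalized vectors."""
    raise NotImplementedError("replace with a real encoder, e.g. LASER3")

def mine_pairs(src_sents, tgt_sents, threshold=0.8):
    src_emb = encode(src_sents)      # shape (n_src, dim)
    tgt_emb = encode(tgt_sents)      # shape (n_tgt, dim)
    sims = src_emb @ tgt_emb.T       # cosine similarity, since vectors are normalized
    pairs = []
    for i, row in enumerate(sims):
        j = int(row.argmax())        # best target candidate for each source sentence
        if row[j] >= threshold:
            pairs.append((src_sents[i], tgt_sents[j], float(row[j])))
    return pairs
```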

Likewise, to produce accurate and well-formed output, it has developed toxicity lists for all 200 languages and used them to evaluate and filter errors in order to reduce the risk of so-called ‘hallucinated toxicity’. This occurs when the system mistakenly inserts problematic content into a translation.
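
A very simplified sketch of how such wordlist-based filtering could work is shown below: a candidate translation is flagged when it contains a term from the target-language toxicity list even though nothing toxic appears in the source sentence. The data structures and the decision to discard flagged pairs are illustrative assumptions, not a description of Meta's actual pipeline.

```python
# Simplified wordlist check against hallucinated toxicity in translations.
def contains_toxic(text: str, wordlist: set[str]) -> bool:
    tokens = set(text.lower().split())
    return bool(tokens & wordlist)

def filter_hallucinated_toxicity(pairs, src_list: set[str], tgt_list: set[str]):
    """Keep (source, translation) pairs unless toxicity appears only on the target side."""
    kept = []
    for src, tgt in pairs:
        if contains_toxic(tgt, tgt_list) and not contains_toxic(src, src_list):
            continue  # likely hallucinated toxicity: discard (or flag for review)
        kept.append((src, tgt))
    return kept
```

In practice such lists are a blunt instrument, which is why the article describes them as a way to evaluate and filter errors rather than as a complete safety mechanism.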

The company has also acknowledged that there were still “great challenges ahead” in expanding the model from 100 to 200 languages, and has focused especially on three aspects: regularization and curriculum learning, self-supervised learning, and diversified back-translation (that is, translating text back into the source language to generate additional synthetic training pairs).
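
For readers unfamiliar with back-translation, the minimal sketch below shows the basic loop: monolingual sentences in the target language are machine-translated back into the source language, and the resulting synthetic (source, target) pairs are added to the training data. The translate function is a hypothetical placeholder for an existing translation model, not part of Meta's released code.

```python
# Minimal sketch of back-translation as data augmentation.
def translate(sentences, src_lang, tgt_lang):
    """Hypothetical stand-in for an existing translation model."""
    raise NotImplementedError

def back_translate(monolingual_target, src_lang, tgt_lang):
    # Translate target-language text "backwards" into the source language.
    synthetic_sources = translate(monolingual_target, src_lang=tgt_lang, tgt_lang=src_lang)
    # Pair each synthetic source sentence with the original target sentence
    # to create extra (source, target) training examples.
    return list(zip(synthetic_sources, monolingual_target))
```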

Finally, the company has presented FLORES-200, an evaluation data set that allows researchers to assess the performance of its latest AI model in more than 40,000 translation directions between different languages.
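
As a back-of-the-envelope check on that figure, the number of ordered source-to-target directions grows quadratically with the number of languages, as the small calculation below shows. That the published count exceeds 40,000 presumably reflects a handful of additional script variants in the released benchmark; this explanation is an assumption, not a detail from the article.

```python
# Ordered translation directions for N languages: N * (N - 1).
n_languages = 200
directions = n_languages * (n_languages - 1)
print(directions)  # 39,800 for exactly 200 languages; extra script variants
                   # in the released set push the evaluated count past 40,000.
```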

Specifically, FLORES-200 can be used to evaluate translation in different domains, such as health information leaflets or cultural content (films or books) in countries or regions where low-resource languages are spoken.

“We believe that NLLB can contribute to the preservation of different languages when sharing content, instead of using one as an intermediary, which can lead to misunderstandings or convey a feeling that was not intended,” Meta said in the statement.

So that other researchers can build on LASER3’s multilingual embedding method, the company has published it as open source, along with FLORES-200.

WORK WITH WIKIPEDIA

With the aim of creating a tool accessible to all users, the technology company has announced that it is collaborating with the Wikimedia Foundation, the non-profit organization that hosts Wikipedia and other free-access projects.

Meta considers that there is a great imbalance in the availability of the different languages spoken around the world on this service. As an example, it has pointed to the gap between the 3,260 Wikipedia articles written in Lingala (a language spoken by 45 million people in African countries) and the 2.5 million articles written in Swedish (a language spoken by only 10 million people in Sweden and Finland).

It has also stressed that Wikipedia editors are using NLLB-200 technology, through the Wikimedia Foundation’s content translation tool, to translate articles into more than 20 low-resource languages.

These are languages that do not have rich enough data sets to train AI systems; they include 10 languages that were not previously available in the translation tool.

Source: Elcomercio
