Posted by Isaac Caswell & Bowen Liang, Software Engineers, Google Research

Advances in machine learning (ML) have driven improvements to automated translation, including the GNMT neural translation model introduced in Translate in 2016, that have enabled great improvements to the quality of translation for over 100 languages. Nevertheless, state-of-the-art systems lag significantly behind human performance in all but the most specific translation tasks. And while the research community has developed techniques that are successful for high-resource languages like Spanish and German, for which there exist copious amounts of training data, performance on low-resource languages, like Yoruba or Malayalam, still leaves much to be desired. Many techniques have demonstrated significant gains for low-resource languages in controlled research settings (e.g., the WMT Evaluation), however these results on smaller, publicly available datasets may not easily transition to large, web-crawled datasets.

In this post, we share some recent progress we have made in translation quality for supported languages, especially for those that are low-resource, by synthesizing and expanding a variety of recent advances, and demonstrate how they can be applied at scale to noisy, web-mined data. These techniques span improvements to model architecture and training, improved treatment of noise in datasets, increased multilingual transfer learning through M4 modeling, and use of monolingual data. The quality improvements, which averaged +5 BLEU score over all 100+ languages, are visualized below.

BLEU score of Google Translate models since shortly after its inception in 2006. The improvements since the implementation of the new techniques over the last year are highlighted at the end of the animation.

Advances for Both High- and Low-Resource Languages

Hybrid Model Architecture

Four years ago we introduced the RNN-based GNMT model, which yielded large quality improvements and enabled Translate to cover many more languages. Following our work decoupling different aspects of model performance, we have replaced the original GNMT system, instead training models with a transformer encoder and an RNN decoder, implemented in Lingvo (a TensorFlow framework). Transformer models have been demonstrated to be generally more effective at machine translation than RNN models, but our work suggested that most of these quality gains were from the transformer encoder, and that the transformer decoder was not significantly better than the RNN decoder. Since the RNN decoder is much faster at inference time, we applied a variety of optimizations before coupling it with the transformer encoder. The resulting hybrid models are higher-quality, more stable in training, and exhibit lower latency.

Web Crawl

Neural Machine Translation (NMT) models are trained using examples of translated sentences and documents, which are typically collected from the public web. Compared to phrase-based machine translation, NMT has been found to be more sensitive to data quality. As such, we replaced the previous data collection system with a new data miner that focuses more on precision than recall, which allows the collection of higher quality training data from the public web. Additionally, we switched the web crawler from a dictionary-based model to an embedding-based model for 14 large language pairs, which increased the number of sentences collected by an average of 29 percent, without loss of precision.
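
The precision-versus-recall trade-off in embedding-based mining can be illustrated with a small sketch. It assumes that sentences in both languages have already been mapped into a shared vector space by some multilingual encoder. The toy vectors, the `mine_pairs` helper, and the threshold values are all hypothetical and are not taken from the Translate pipeline.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def mine_pairs(src_vecs, tgt_vecs, threshold):
    """For each source sentence, keep its best target match if it is similar enough."""
    pairs = []
    for src_id, u in src_vecs.items():
        best_id, best_sim = None, -1.0
        for tgt_id, v in tgt_vecs.items():
            sim = cosine(u, v)
            if sim > best_sim:
                best_id, best_sim = tgt_id, sim
        if best_sim >= threshold:
            pairs.append((src_id, best_id))
    return pairs


# Toy embeddings (hypothetical): en_1/es_1 and en_2/es_2 are true translations,
# while en_3 has no real counterpart in the target set.
src_vecs = {
    "en_1": [1.0, 0.0, 0.1],
    "en_2": [0.0, 1.0, 0.1],
    "en_3": [0.5, 0.5, 0.7],
}
tgt_vecs = {
    "es_1": [0.9, 0.1, 0.1],
    "es_2": [0.1, 0.9, 0.2],
}

loose = mine_pairs(src_vecs, tgt_vecs, threshold=0.5)    # favors recall
strict = mine_pairs(src_vecs, tgt_vecs, threshold=0.95)  # favors precision
```

With the loose threshold, the unmatched sentence `en_3` is paired with its nearest neighbor and pollutes the mined data. The strict threshold drops it while keeping both true pairs, which is the behavior a precision-focused miner aims for.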

Modeling Data Noise

Data with significant noise is not only redundant but also lowers the quality of models trained on it. In order to address data noise, we used our results on denoising NMT training to assign a score to every training example using preliminary models trained on noisy data and fine-tuned on clean data. We then treat training as a curriculum learning problem: the models start out training on all data, and then gradually train on smaller and cleaner subsets.
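
A minimal sketch of this kind of curriculum follows. It assumes, as one plausible form of the score, that each example is rated by how much more likely it is under the clean-fine-tuned model than under the noisy one. The scoring functions, the log-probability values, and the keep fractions are hypothetical stand-ins, not the production setup.

```python
def noise_score(example, clean_logprob, noisy_logprob):
    """Higher means the clean-tuned model prefers the example more than the noisy model does."""
    return clean_logprob(example) - noisy_logprob(example)


def curriculum_subsets(examples, scores, keep_fractions):
    """Yield one training subset per phase, each smaller and cleaner than the last."""
    ranked = [
        ex for _, ex in sorted(zip(scores, examples), key=lambda p: p[0], reverse=True)
    ]
    for frac in keep_fractions:
        keep = max(1, int(len(ranked) * frac))
        yield ranked[:keep]


# Hypothetical per-example log-probabilities from the two preliminary models.
clean_lp = {
    "good pair A": -1.0,
    "good pair B": -1.5,
    "misaligned pair": -6.0,
    "boilerplate pair": -4.0,
}
noisy_lp = {
    "good pair A": -2.0,
    "good pair B": -2.0,
    "misaligned pair": -3.0,
    "boilerplate pair": -3.5,
}

examples = list(clean_lp)
scores = [noise_score(ex, clean_lp.get, noisy_lp.get) for ex in examples]

# Phase 1 uses all data; later phases keep only the highest-scoring examples.
phases = list(curriculum_subsets(examples, scores, keep_fractions=[1.0, 0.75, 0.5]))
```

Each entry in `phases` would feed one stage of training, so the model sees the full noisy set first and finishes on the subset the clean-tuned model trusts most.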


Back-Translation

Widely adopted in state-of-the-art machine translation systems, back-translation is especially helpful for low-resource languages, where parallel data is scarce. This technique augments parallel training data (where each sentence in one language is paired with its translation) with synthetic parallel data, where the sentences in one language are written by a human, but their translations have been generated by a neural translation model. By incorporating back-translation into Google Translate, we can make use of the more abundant monolingual text data for low-resource languages on the web for training our models. This is especially helpful in increasing fluency of model output, which is an area in which low-resource translation models underperform.
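
The data flow of back-translation can be sketched in a few lines. The sketch assumes a reverse (target-to-source) translation function is available. Here it is replaced by a hypothetical stub, and the helper names and the `"real"`/`"synthetic"` tags are illustrative choices rather than part of any actual pipeline.

```python
def back_translate(monolingual_target, translate_to_source):
    """Pair each human-written target sentence with a machine-generated source sentence."""
    return [(translate_to_source(tgt), tgt) for tgt in monolingual_target]


def build_training_set(real_pairs, monolingual_target, translate_to_source):
    """Combine real parallel data with synthetic pairs produced by back-translation."""
    synthetic = back_translate(monolingual_target, translate_to_source)
    # Tag the origin so real and synthetic data can be weighted or filtered later.
    tagged_real = [(src, tgt, "real") for src, tgt in real_pairs]
    tagged_synthetic = [(src, tgt, "synthetic") for src, tgt in synthetic]
    return tagged_real + tagged_synthetic


def stub_reverse_model(target_sentence):
    """Hypothetical stand-in for a trained target-to-source translation model."""
    return f"MT({target_sentence})"


real_pairs = [("source sentence 1", "target sentence 1")]
monolingual_target = ["target sentence 2", "target sentence 3"]

training_set = build_training_set(real_pairs, monolingual_target, stub_reverse_model)
```

Because the target side of every synthetic pair is human-written, the forward model always learns to produce natural target-language text, which is consistent with the fluency benefit described above.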

M4 Modeling

A technique that has been especially helpful for low-resource languages has been M4, which uses a single, giant model to translate between all languages and English. This allows for transfer learning at a massive scale. As an example, a lower-resource language like Yiddish has the benefit of co-training with a wide array of other related Germanic languages (e.g., German, Dutch, Danish, etc.), as well as almost a hundred other languages that may not share a known linguistic connection, but may provide useful signal to the model.

Judging Translation Quality

A popular metric for automatic quality evaluation of machine translation systems is the BLEU score, which is based on the similarity between a system’s translation and reference translations that were generated by people. With these latest updates, we see an average BLEU gain of +5 points over the previous GNMT models, with the 50 lowest-resource languages seeing an average gain of +7 BLEU. This improvement is comparable to the gain observed four years ago when transitioning from phrase-based translation to NMT.
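
To make the metric concrete, here is a simplified sentence-level sketch of BLEU on a 0 to 100 scale. It assumes whitespace tokenization, a single reference, and no smoothing. Reported BLEU scores are normally computed over a whole test corpus with a standard tool, so this is an illustration of the idea rather than the evaluation used for the numbers above.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(hypothesis, reference, max_n=4):
    """Geometric mean of clipped n-gram precisions, scaled by a brevity penalty."""
    hyp, ref = hypothesis.split(), reference.split()
    if not hyp:
        return 0.0
    log_precision_sum = 0.0
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        # Clip each n-gram count by how often it appears in the reference.
        overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        total = sum(hyp_counts.values())
        if overlap == 0 or total == 0:
            return 0.0  # no smoothing in this sketch
        log_precision_sum += math.log(overlap / total)
    # Penalize hypotheses that are shorter than the reference.
    if len(hyp) >= len(ref):
        brevity_penalty = 1.0
    else:
        brevity_penalty = math.exp(1 - len(ref) / len(hyp))
    return 100 * brevity_penalty * math.exp(log_precision_sum / max_n)


reference = "the cat sat on a mat"
exact = sentence_bleu("the cat sat on a mat", reference)
partial = sentence_bleu("the cat sat on the mat", reference)
unrelated = sentence_bleu("completely different words here", reference)
```

An exact match scores highest, a near match scores lower because some of its longer n-grams are missing from the reference, and an unrelated sentence scores zero.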

Although BLEU score is a well-known approximate measure, it is known to have various pitfalls for systems that are already high-quality. For instance, several works have demonstrated how the BLEU score can be biased by translationese effects on the source side or target side, a phenomenon where translated text can sound awkward, containing attributes (like word order) from the source language. For this reason, we performed human side-by-side evaluations on all new models, which confirmed the gains in BLEU.

In addition to general quality improvements, the new models show increased robustness to machine translation hallucination, a phenomenon in which models produce strange “translations” when given nonsense input. This is a common problem for models that have been trained on small amounts of data, and affects many low-resource languages. For example, when given the string of Telugu characters “ష ష ష ష ష ష ష ష ష ష ష ష ష ష ష”, the old model produced the nonsensical output “Shenzhen Shenzhen Shaw International (SSH)”, seemingly trying to make sense of the sounds, whereas the new model correctly learns to transliterate this as “Sh sh sh sh sh sh sh sh sh sh sh sh sh sh sh sh sh”.


Although these are impressive strides forward for a machine, one must remember that, especially for low-resource languages, automatic translation quality is far from perfect. These models still fall prey to typical machine translation errors, including poor performance on particular genres of subject matter (“domains”), conflating different dialects of a language, producing overly literal translations, and poor performance on informal and spoken language.

Nonetheless, with this update, we are proud to provide automatic translations that are relatively coherent, even for the lowest-resource of the 108 supported languages. We are grateful for the research that has enabled this from the active community of machine translation researchers in academia and industry.


This effort is built on contributions from Tao Yu, Ali Dabirmoghaddam, Klaus Macherey, Pidong Wang, Ye Tian, Jeff Klingner, Jumpei Takeuchi, Yuichiro, Hideto Kazawa, Apu Shah, Manisha, Keith Stevens, Fangxiaoyu Feng, Chao Tian, John Richardson, Rajat Tibrewal, Orhan Firat, Mia Chen, Ankur Bapna, Naveen Arivazhagan, Dmitry Lepikhin, Wei Wang, Wolfgang Macherey, Katrin Tomanek, Qin Gao, Mengmeng Niu, and Macduff Hughes.