Language Translation using Hugging Face and Python in 3 lines of code

Learn to perform language translation using the transformers library from Hugging Face in just 3 lines of code with Python.

The transformers library provides thousands of pre-trained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone.

First, install the transformers library and sentencepiece, which the M2M100 tokenizer used below requires (a backend such as PyTorch must also be available):

pip install transformers sentencepiece -q

Next, import the pipeline function from the transformers library:

# Importing the pipeline function from the transformers library
from transformers import pipeline

Pipeline method for Language Translation

The pipeline method is responsible for:

  • Pre-processing: Converting raw text input to numerical input for a given pre-trained model
  • Model Inference: Making a prediction using a pre-trained model
  • Post-processing: Converting prediction to a proper output

# Creating a Text2TextGenerationPipeline for language translation
pipe = pipeline(task='text2text-generation', model='facebook/m2m100_418M')

M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation. The model can directly translate between 100 different languages without relying on English data. You can learn more about it from Facebook’s blog post.
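
To make the three pipeline stages concrete, here is a minimal sketch that runs them by hand with the M2M100 tokenizer and model classes from transformers. The helper names translate and load_m2m100 are made up for this illustration, and the sketch assumes PyTorch is installed. The forced_bos_token_id argument tells the model which language to generate.

```python
def translate(text, tokenizer, model, target_lang, source_lang="en"):
    """Translate `text` by running the three pipeline stages by hand."""
    # 1. Pre-processing: raw text -> token ids (PyTorch tensors)
    tokenizer.src_lang = source_lang
    encoded = tokenizer(text, return_tensors="pt")

    # 2. Model inference: force the target-language token to come first
    generated = model.generate(
        **encoded,
        forced_bos_token_id=tokenizer.get_lang_id(target_lang),
    )

    # 3. Post-processing: token ids -> readable text
    return tokenizer.batch_decode(generated, skip_special_tokens=True)[0]


def load_m2m100(name="facebook/m2m100_418M"):
    """Download the tokenizer and model (hypothetical helper for this sketch)."""
    # Imported lazily so that defining these helpers does not load the library
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(name)
    model = M2M100ForConditionalGeneration.from_pretrained(name)
    return tokenizer, model
```

To try it, call tokenizer, model = load_m2m100() once, then translate("That is a flower", tokenizer, model, "hi"). Passing the tokenizer and model in as arguments means they are downloaded only once and can be reused for several sentences.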

# Translating an English sentence to Hindi ('hi')
pipe("That is a flower", forced_bos_token_id=pipe.tokenizer.get_lang_id(lang='hi'))

Here, we pass the forced_bos_token_id parameter to force the target language id as the first generated token. That token tells the model which language to translate into, in this case Hindi ('hi').
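
If you want to reuse the same pipeline for other target languages, the call can be wrapped in a small helper. The sketch below is only one way to do it: translate_to and translate_to_many are hypothetical names, and it assumes pipe is the text2text-generation pipeline created earlier, which returns a list of dictionaries with a generated_text key.

```python
def translate_to(pipe, text, target_lang):
    """Translate `text` into `target_lang` using an existing pipeline."""
    # Look up the id of the target-language token, e.g. 'hi' for Hindi
    lang_id = pipe.tokenizer.get_lang_id(lang=target_lang)
    # The pipeline forwards forced_bos_token_id to the model's generate call
    outputs = pipe(text, forced_bos_token_id=lang_id)
    # One input string gives a list with one result dictionary
    return outputs[0]["generated_text"]


def translate_to_many(pipe, text, target_langs):
    """Translate `text` into several languages, keyed by language code."""
    return {lang: translate_to(pipe, text, lang) for lang in target_langs}
```

For example, translate_to(pipe, "That is a flower", "hi") returns the Hindi translation as a plain string, and translate_to_many(pipe, "That is a flower", ["hi", "fr"]) returns a dictionary with one entry per language code.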

