Language Translation using Hugging Face and Python in 3 lines of code

Learn to perform language translation using the transformers library from Hugging Face in just 3 lines of code with Python.

The transformers library provides thousands of pre-trained models to perform tasks on texts such as classification, information extraction, question answering, summarization, translation, text generation, and more in over 100 languages. Its aim is to make cutting-edge NLP easier to use for everyone.

First, let us install the transformers library and its dependencies for language translation:

pip install transformers sentencepiece -q

Next, import the pipeline function from the transformers library:

# Importing the pipeline function from the transformers library
from transformers import pipeline

Pipeline method for Language Translation

The pipeline method is responsible for:

  • Pre-processing: Converting raw text input to numerical input for a given pre-trained model
  • Model Inference: Making a prediction using a pre-trained model
  • Post-processing: Converting prediction to a proper output
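
To make these three stages concrete, here is a minimal sketch of what they might look like when written by hand instead of through pipeline. It assumes the facebook/m2m100_418M checkpoint that this article uses, with PyTorch installed. The function names preprocess, infer, postprocess and translate_by_hand are illustrative only and are not part of the transformers library; the actual pipeline internals differ in detail.

```python
def preprocess(tokenizer, text, src_lang="en"):
    """Pre-processing: convert raw text into numerical model input."""
    # M2M100 tokenizers need to know the language of the input text.
    tokenizer.src_lang = src_lang
    return tokenizer(text, return_tensors="pt")


def infer(model, tokenizer, encoded, tgt_lang):
    """Model inference: generate token ids for the translation."""
    # The target language token is forced as the first generated token.
    lang_id = tokenizer.get_lang_id(tgt_lang)
    return model.generate(**encoded, forced_bos_token_id=lang_id)


def postprocess(tokenizer, generated_ids):
    """Post-processing: convert generated token ids back into text."""
    return tokenizer.batch_decode(generated_ids, skip_special_tokens=True)


def translate_by_hand(text, tgt_lang, src_lang="en",
                      checkpoint="facebook/m2m100_418M"):
    """Run the three stages in order (hypothetical helper for illustration)."""
    # Imported here so the stage functions above stay usable without the model.
    from transformers import M2M100ForConditionalGeneration, M2M100Tokenizer

    tokenizer = M2M100Tokenizer.from_pretrained(checkpoint)
    model = M2M100ForConditionalGeneration.from_pretrained(checkpoint)

    encoded = preprocess(tokenizer, text, src_lang)
    generated_ids = infer(model, tokenizer, encoded, tgt_lang)
    return postprocess(tokenizer, generated_ids)
```

Calling translate_by_hand("That is a flower", "hi") loads the checkpoint (downloading it on first use) and returns a list containing one translated string. The pipeline function wraps these same steps in a single object.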

Now, let us create a pipeline for language translation backed by the facebook/m2m100_418M model:

# Creating a Text2TextGenerationPipeline for language translation
pipe = pipeline(task='text2text-generation', model='facebook/m2m100_418M')

M2M100 is a multilingual encoder-decoder (seq-to-seq) model trained for Many-to-Many multilingual translation. The model can directly translate between 100 different languages without relying on English data. You can learn more about it from Facebook’s blog post.

# Translating an English sentence to Hindi ('hi')
pipe("That is a flower", forced_bos_token_id=pipe.tokenizer.get_lang_id(lang='hi'))

Here, we pass the forced_bos_token_id parameter to force the target language id ('hi' for Hindi) as the first generated token. This is how M2M100 is told which language to translate into.
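
Since the target language is selected only through forced_bos_token_id, the same pipeline can translate into several languages by changing that one argument. The sketch below assumes pipe is the pipeline created earlier, on a transformers version that offers the text2text-generation task. translate and translate_many are hypothetical helper names, not library functions.

```python
def translate(pipe, text, target_lang, source_lang="en"):
    """Translate `text` into `target_lang` using an M2M100 pipeline."""
    # Tell the tokenizer which language the input text is written in.
    pipe.tokenizer.src_lang = source_lang
    # Look up the id of the target-language token, e.g. 'hi' for Hindi.
    lang_id = pipe.tokenizer.get_lang_id(target_lang)
    # The pipeline returns a list with one dict per input text.
    outputs = pipe(text, forced_bos_token_id=lang_id)
    return outputs[0]["generated_text"]


def translate_many(pipe, text, target_langs, source_lang="en"):
    """Translate one text into several languages, keyed by language code."""
    return {lang: translate(pipe, text, lang, source_lang)
            for lang in target_langs}
```

For example, translate_many(pipe, "That is a flower", ["hi", "fr", "de"]) would return a dictionary with one translation per language code.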
