This week, Google Brain researchers published a paper called ‘Switch Transformers: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity’, in which they report training a language model with more than a trillion parameters. For comparison, OpenAI’s GPT-3 has about 175 billion parameters.

The model uses a sparsely activated architecture called the Switch Transformer, in which a learned router sends each token to just one of many expert sub-networks, so the parameter count can grow far faster than the compute required per token (a rough sketch of this routing idea follows below). It was trained on 32 TPU cores using the Colossal Clean Crawled Corpus (C4), an 800+ GB dataset of cleaned text scraped from the web. The authors also report a 4x pre-training speedup over T5-XXL, Google’s previously largest T5 model.
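The snippet below is a minimal NumPy sketch of that top-1 (“switch”) routing idea, not the paper’s actual Mesh-TensorFlow implementation: all names, shapes, and initialisations are illustrative assumptions, and details such as the load-balancing loss and expert capacity limits are omitted.

```python
# Minimal sketch of switch (top-1) routing: each token is processed by
# exactly one expert feed-forward network chosen by a learned router, so
# total parameters grow with the number of experts while per-token compute
# stays roughly constant. Purely illustrative; not the paper's code.
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, num_experts, num_tokens = 64, 256, 4, 10

# Router and per-expert feed-forward weights (random for the demo).
router_w = rng.normal(size=(d_model, num_experts))
expert_w_in = rng.normal(size=(num_experts, d_model, d_ff)) * 0.02
expert_w_out = rng.normal(size=(num_experts, d_ff, d_model)) * 0.02

def switch_ffn(tokens: np.ndarray) -> np.ndarray:
    """Route each token to its top-1 expert and apply that expert's FFN."""
    logits = tokens @ router_w                        # [tokens, experts]
    probs = np.exp(logits - logits.max(axis=-1, keepdims=True))
    probs /= probs.sum(axis=-1, keepdims=True)        # softmax over experts
    chosen = probs.argmax(axis=-1)                    # top-1 expert per token

    out = np.zeros_like(tokens)
    for e in range(num_experts):
        idx = np.where(chosen == e)[0]                # tokens routed to expert e
        if idx.size == 0:
            continue
        h = np.maximum(tokens[idx] @ expert_w_in[e], 0.0)        # ReLU FFN
        # Scale by the router probability, as the gate would in a
        # differentiable implementation so the router receives gradient.
        out[idx] = (h @ expert_w_out[e]) * probs[idx, e:e + 1]
    return out

tokens = rng.normal(size=(num_tokens, d_model))
print(switch_ffn(tokens).shape)   # (10, 64)
```

Because only one expert runs per token, adding more experts increases the model’s total parameters without increasing the work done for any individual token, which is what lets the parameter count scale into the trillions.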
You can read the paper in detail here: Read paper.