Paper Review (14)
[Paper Review] Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Abstract: In this work, we present our techniques for training very large transformer models and implement a simple, efficient intra-layer model parallel approach that enables training transformer models with billions of parameters. We sustain 15.1 PetaFLOPs across the entire application with 76% scaling efficiency when compared to a strong single GPU baseline that sustains 39 TeraFLOPs, wh..
2022.06.07
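The intra-layer (tensor) model parallelism mentioned in the abstract splits the weight matrices of each transformer layer across GPUs rather than assigning whole layers to devices. A minimal single-process sketch of the idea for an MLP block, with the two "GPUs" simulated as plain NumPy arrays; the shapes, names, and 2-way split are illustrative assumptions, not Megatron-LM's actual API:

```python
# Sketch of intra-layer (tensor) parallelism for an MLP block: Z = GeLU(X A) B.
# A is split column-wise and B row-wise so GeLU needs no communication;
# only one sum (an all-reduce on real hardware) is needed at the end.
import numpy as np

def gelu(x):
    # tanh approximation of GeLU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
batch, d_model, d_ff = 4, 8, 32

X = rng.standard_normal((batch, d_model))
A = rng.standard_normal((d_model, d_ff))   # first MLP weight
B = rng.standard_normal((d_ff, d_model))   # second MLP weight

# Serial (single-GPU) reference.
Z_ref = gelu(X @ A) @ B

# 2-way tensor-parallel version: each rank holds one column block of A
# and the matching row block of B.
A1, A2 = np.split(A, 2, axis=1)
B1, B2 = np.split(B, 2, axis=0)

Z1 = gelu(X @ A1) @ B1   # partial output on "GPU 0"
Z2 = gelu(X @ A2) @ B2   # partial output on "GPU 1"
Z = Z1 + Z2              # the all-reduce in a real multi-GPU setup

assert np.allclose(Z, Z_ref)  # tensor-parallel output matches the serial MLP
```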
[Paper Review] LaMDA: Language Models for Dialog Applications
Abstract: We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of (1) safety and (2) factual grounding. We also explore the use of LaMDA in (3) the domains of education and content recommendations to investigate its potential and shortcomings. 1. Introduction According t..
2022.05.24