Paper Review (14)
[Paper Review] Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
Abstract: In this work, we present our techniques for training very large transformer models and implement a simple, efficient intra-layer model parallel approach that enables training transformer models with billions of parameters. We sustain 15.1 PetaFLOPs across the entire application with 76% scaling efficiency when compared to a strong single GPU baseline that sustains 39 TeraFLOPs, wh..
2022.06.07
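The intra-layer (tensor) model parallelism mentioned in the abstract splits the weight matrices of each transformer layer across GPUs rather than assigning whole layers to devices. A minimal single-process sketch of the idea for an MLP block, with the two "GPUs" simulated as plain NumPy arrays; the shapes, names, and 2-way split are illustrative assumptions, not Megatron-LM's actual API:

```python
# Sketch of intra-layer (tensor) parallelism for an MLP block: Z = GeLU(X A) B.
# A is split column-wise and B row-wise so GeLU needs no communication;
# only one sum (an all-reduce on real hardware) is needed at the end.
import numpy as np

def gelu(x):
    # tanh approximation of GeLU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

rng = np.random.default_rng(0)
batch, d_model, d_ff = 4, 8, 32

X = rng.standard_normal((batch, d_model))
A = rng.standard_normal((d_model, d_ff))   # first MLP weight
B = rng.standard_normal((d_ff, d_model))   # second MLP weight

# Serial (single-GPU) reference.
Z_ref = gelu(X @ A) @ B

# 2-way tensor-parallel version: each rank holds one column block of A
# and the matching row block of B.
A1, A2 = np.split(A, 2, axis=1)
B1, B2 = np.split(B, 2, axis=0)

Z1 = gelu(X @ A1) @ B1   # partial output on "GPU 0"
Z2 = gelu(X @ A2) @ B2   # partial output on "GPU 1"
Z = Z1 + Z2              # the all-reduce in a real multi-GPU setup

assert np.allclose(Z, Z_ref)  # tensor-parallel output matches the serial MLP
```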
[Paper Review] LaMDA: Language Models for Dialog Applications
Abstract: We demonstrate that fine-tuning with annotated data and enabling the model to consult external knowledge sources can lead to significant improvements towards the two key challenges of (1) safety and (2) factual grounding. We also explore the use of LaMDA in (3) the domains of education and content recommendations to investigate its potential and shortcomings. 1. Introduction According t..
2022.05.24