Paper Review (14)
-
[Paper Review] Language Models are Few-Shot Learners
Abstract In this work, the authors show that scaling up language models greatly improves few-shot performance across all tasks, and on several tasks the model even surpasses the previous state of the art. GPT-3 performs every task without any parameter updates or fine-tuning, using only a task description and demonstrations (example passages and their answers) supplied as text (i.e., input tokens). Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-..
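The "task description plus demonstrations as text" setup described above can be sketched as plain prompt construction. This is a minimal illustration, not GPT-3's actual API; the translation task and the word pairs below are made-up examples.

```python
# Minimal sketch of a few-shot prompt in the GPT-3 style:
# a task description followed by (input, answer) demonstrations,
# all concatenated as plain text. Task and examples are illustrative.

def build_few_shot_prompt(description, demonstrations, query):
    """Concatenate a task description, demonstration pairs, and the query."""
    lines = [description, ""]
    for text, answer in demonstrations:
        lines.append(f"Input: {text}")
        lines.append(f"Output: {answer}")
    # The model is asked to continue after the final "Output:".
    lines.append(f"Input: {query}")
    lines.append("Output:")
    return "\n".join(lines)

prompt = build_few_shot_prompt(
    "Translate English to French.",
    [("cheese", "fromage"), ("house", "maison")],
    "cat",
)
print(prompt)
```

The point of the paper is that no gradient update happens: the demonstrations live entirely inside this context string.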
2022.10.26 -
[Paper Review] Language Models are Unsupervised Multitask Learners
Abstract In this work, the authors show that a language model trained on WebText (millions of web pages) can perform a variety of tasks (e.g., QA, translation) without any supervised learning. Because performance on many tasks improves as the model grows, model size is a key factor in zero-shot task transfer. Based on these results, the paper points to a new direction: building language processing systems that learn to perform tasks from natural-language task descriptions (i.e., input tokens). We demonstrate that language models begin to learn these tasks without any explicit supervis..
2022.10.07 -
[Paper Review] Zero-Shot Text-to-Image Generation
1. Introduction Recent advances fueled by large-scale generative models suggest a possible route for further improvements. Specifically, when compute, model size, and data are scaled carefully, autoregressive transformers have achieved impressive results in several domains such as text, image, and audio. Recent work shows that when autoregressive transformer models (e.g., GPT-2) are trained with 1) compute, 2) model size, and 3) data size scaled up carefully, most ..
2022.08.19 -
[Paper Review] Neural Discrete Representation Learning
1. Introduction Motivation: challenging tasks such as few-shot learning, domain adaptation, or reinforcement learning heavily rely on learnt representations from raw data, but the usefulness of generic representations trained in an unsupervised fashion is still far from being the dominant approach. Maximum likelihood and reconstruction error are two common objectives used to train unsupervise..
2022.08.02 -
[Paper Review] Auto-Encoding Variational Bayes
1. Introduction How can we perform efficient approximate inference and learning with directed probabilistic models whose continuous latent variables and/or parameters have intractable posterior distributions? The goal of the paper is to estimate the intractable $p_{Z|X}(z|x) \ (z \ \text{is continuous})$. Reviewer's take: "In the AEVB algorithm we make inference and learning especially efficient by using the SGVB estimator to o..
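Since the true posterior is intractable, the paper introduces a recognition model $q_{\phi}(z|x)$ to approximate it and maximizes a variational lower bound (ELBO) instead. A standard statement of the bound, written here in the usual $p_{\theta}$/$q_{\phi}$ notation rather than the post's $p_{Z|X}$ notation:

$$\log p_{\theta}(x) \;\ge\; \mathcal{L}(\theta, \phi; x) \;=\; \mathbb{E}_{q_{\phi}(z|x)}\big[\log p_{\theta}(x|z)\big] \;-\; D_{\mathrm{KL}}\big(q_{\phi}(z|x) \,\|\, p_{\theta}(z)\big)$$

The SGVB estimator mentioned in the quote is a low-variance Monte Carlo estimate of this bound, made differentiable in $\phi$ via the reparameterization trick.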
2022.07.20 -
[Paper Review] ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
1. Extended Introduction DP (Data Parallelism) runs out of memory for models with more than 1.4B parameters on the current generation of GPUs with 32GB of memory. MP (Model Parallelism) requires model refactoring and has significant communication overhead. To overcome these limitations, we first analyze the full spectrum of memory consumption of the existing systems on model training and classify it into..
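The 1.4B-parameter limit for data parallelism follows from a back-of-the-envelope count. Assuming mixed-precision Adam training as accounted for in the ZeRO paper (2 bytes each for fp16 parameters and gradients, plus 12 bytes of fp32 optimizer state per parameter), every data-parallel replica holds a full copy of these model states:

```python
# Approximate per-parameter memory for mixed-precision Adam training,
# following the accounting in the ZeRO paper.
FP16_PARAM = 2   # bytes: fp16 parameter copy
FP16_GRAD = 2    # bytes: fp16 gradient
FP32_STATE = 12  # bytes: fp32 master params + Adam momentum + variance

bytes_per_param = FP16_PARAM + FP16_GRAD + FP32_STATE  # 16 bytes/param

params = 1.4e9
model_states_gb = params * bytes_per_param / 1024**3
print(f"{model_states_gb:.1f} GB")  # ~20.9 GB of model states alone
```

Model states alone consume roughly 21 GB on a 32GB GPU, and activations plus temporary buffers consume the rest, which is why plain DP fails past ~1.4B parameters. ZeRO removes this redundancy by partitioning the optimizer states, gradients, and parameters across the data-parallel ranks.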
2022.06.27