[LLM] A Collection of Foundation Model Papers
A flood of papers on foundation models
Large Language Models (LLM)
A collection of papers on Foundation Models
1. Foundation Models
Foundation Models and Applications
- On the Opportunities and Risks of Foundation Models
- Multimodal Foundation Models: From Specialists to General-Purpose Assistants
- Large Multimodal Models: Notes on CVPR 2023 Tutorial
- Towards Generalist Biomedical AI
- A Comprehensive Survey on Pretrained Foundation Models: A History from BERT to ChatGPT
- Interactive Natural Language Processing
- Towards Reasoning in Large Language Models: A Survey
Recurrent Neural Networks (RNNs) and Convolutional Neural Networks (CNNs)
- Recurrent Neural Networks (RNNs): A gentle Introduction and Overview
- Highway Networks
- Long Short-Term Memory
- Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
- Effective Approaches to Attention-based Neural Machine Translation
- An Introduction to Convolutional Neural Networks
- ImageNet Classification with Deep Convolutional Neural Networks
- U-Net: Convolutional Networks for Biomedical Image Segmentation
- Deep Residual Learning for Image Recognition
- Densely Connected Convolutional Networks
- Aggregated Residual Transformations for Deep Neural Networks
- A ConvNet for the 2020s
Natural Language Processing (NLP) and Computer Vision (CV)
- Sequence to Sequence Learning with Neural Networks
- Thumbs up? Sentiment Classification using Machine Learning Techniques
- A Survey of Named Entity Recognition and Classification
- Teaching Machines to Read and Comprehend
- Deep Neural Networks for Acoustic Modeling in Speech Recognition
- A Neural Attention Model for Sentence Summarization
- Microsoft COCO: Common Objects in Context
- Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation
- Fully Convolutional Networks for Semantic Segmentation
- DeepFace: Closing the Gap to Human-Level Performance in Face Verification
- DeepPose: Human Pose Estimation via Deep Neural Networks
- Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
2. The Transformer Architecture
Self-Attention and Transformers (minimal attention sketch below the list)
- Neural Machine Translation by Jointly Learning to Align and Translate
- Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
- Attention Is All You Need
- The Annotated Transformer
- Image Transformer
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
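To make the common thread of these papers concrete, here is a minimal NumPy sketch of the scaled dot-product attention described in "Attention Is All You Need". The toy shapes and random inputs are purely illustrative.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                     # (seq_q, seq_k) similarity scores
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)      # row-wise softmax
    return weights @ V                                  # weighted sum of value vectors

# Toy example: 4 query/key/value vectors of dimension 8.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((4, 8)) for _ in range(3))
print(scaled_dot_product_attention(Q, K, V).shape)      # (4, 8)
```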
Efficient Transformers (linear-attention sketch below the list)
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
- Perceiver: General Perception with Iterative Attention
- Random Feature Attention
- Longformer: The Long-Document Transformer
- Generating Long Sequences with Sparse Transformers
- Linformer: Self-Attention with Linear Complexity
- Efficiently Modeling Long Sequences with Structured State Spaces
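As a taste of what "efficient" means here, below is a rough sketch of the kernelized linear attention from "Transformers are RNNs", using the paper's feature map φ(x) = elu(x) + 1. The quadratic softmax is replaced by products whose cost is linear in sequence length; causal masking and multi-head details are omitted.

```python
import numpy as np

def elu_feature_map(x):
    # phi(x) = elu(x) + 1, the positive feature map used in the paper
    return np.where(x > 0, x + 1.0, np.exp(x))

def linear_attention(Q, K, V, eps=1e-6):
    """O(N) attention: phi(Q) @ (phi(K)^T V), normalized by phi(Q) @ sum(phi(K))."""
    Qp, Kp = elu_feature_map(Q), elu_feature_map(K)
    kv = Kp.T @ V                        # (d, d_v) summary, independent of sequence length
    z = Qp @ Kp.sum(axis=0)              # (seq,) normalizer
    return (Qp @ kv) / (z[:, None] + eps)

rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((16, 8)) for _ in range(3))
print(linear_attention(Q, K, V).shape)   # (16, 8)
```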
Parameter-Efficient Tuning (LoRA sketch below the list)
- Parameter-Efficient Transfer Learning for NLP
- LoRA: Low-Rank Adaptation of Large Language Models
- The Power of Scale for Parameter-Efficient Prompt Tuning
- It’s Not Just Size That Matters: Small Language Models Are Also Few-Shot Learners
- Making Pre-trained Language Models Better Few-shot Learners
- Few-Shot Parameter-Efficient Fine-Tuning is Better and Cheaper than In-Context Learning
- Towards a Unified View of Parameter-Efficient Transfer Learning
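The LoRA idea is simple enough to sketch in a few lines: keep the pretrained weight frozen and learn a low-rank update B A scaled by α/r. The initialization below follows the paper's description (A random, B zero), but the class name, shapes, and hyperparameters are illustrative.

```python
import numpy as np

class LoRALinear:
    """Frozen weight W plus a trainable low-rank update (alpha/r) * B @ A."""
    def __init__(self, W, r=4, alpha=8, seed=0):
        rng = np.random.default_rng(seed)
        d_out, d_in = W.shape
        self.W = W                                        # frozen pretrained weight
        self.A = rng.standard_normal((r, d_in)) * 0.01    # trainable, small random init
        self.B = np.zeros((d_out, r))                     # trainable, zero init: no change at start
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T @ self.B.T)

W = np.random.default_rng(1).standard_normal((32, 64))
layer = LoRALinear(W, r=4, alpha=8)
x = np.ones((2, 64))
print(layer(x).shape)  # (2, 32); identical to the frozen layer until B is trained
```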
Language Model Pre-training
- Deep Contextualized Word Representations
- BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- Unified Language Model Pre-training for Natural Language Understanding and Generation
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
- RoBERTa: A Robustly Optimized BERT Pretraining Approach
- ERNIE: Enhanced Language Representation with Informative Entities
- ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators
3. Large Language Models
Large Language Models
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- Language Models are Few-Shot Learners
- LaMDA: Language Models for Dialog Applications
- Language Models are Unsupervised Multitask Learners
- Evaluating Large Language Models Trained on Code
- PaLM: Scaling Language Modeling with Pathways
- Llama 2: Open Foundation and Fine-Tuned Chat Models
- Mixtral of Experts
Scaling Laws (compute-optimal back-of-the-envelope example below the list)
- Scaling Laws for Neural Language Models
- Scaling Laws for Transfer
- Emergent Abilities of Large Language Models
- Training Compute-Optimal Large Language Models
- Transcending Scaling Laws with 0.1% Extra Compute
- Inverse Scaling can become U-shaped
- Are Emergent Abilities of Large Language Models a Mirage?
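A quick worked example of the compute-optimal heuristic from "Training Compute-Optimal Large Language Models": with the standard approximation C ≈ 6·N·D training FLOPs and the rough "about 20 tokens per parameter" ratio, a Chinchilla-sized budget lands near 70B parameters and 1.4T tokens. The constants are rules of thumb, not exact fits.

```python
# Back-of-the-envelope Chinchilla-style sizing, assuming C ~= 6 * N * D FLOPs
# and the rough compute-optimal ratio of about 20 training tokens per parameter.
def compute_optimal_size(flops_budget, tokens_per_param=20):
    # C = 6 * N * D and D = k * N  =>  N = sqrt(C / (6 * k))
    n_params = (flops_budget / (6 * tokens_per_param)) ** 0.5
    n_tokens = tokens_per_param * n_params
    return n_params, n_tokens

C = 5.76e23  # roughly the Chinchilla training budget reported in the paper
N, D = compute_optimal_size(C)
print(f"{N / 1e9:.0f}B params, {D / 1e12:.1f}T tokens")  # ~69B params, ~1.4T tokens
```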
Instruction Tuning and Reinforcement Learning from Human Feedback (RLHF); DPO loss sketch below the list
- Training Language Models to Follow Instructions with Human Feedback
- Finetuned Language Models Are Zero-Shot Learners
- Multitask Prompted Training Enables Zero-Shot Task Generalization
- LIMA: Less Is More for Alignment
- Direct Preference Optimization: Your Language Model is Secretly a Reward Model
- Zephyr: Direct Distillation of LM Alignment
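The DPO objective is compact enough to write out directly: maximize the log-sigmoid of the margin between the policy-versus-reference log-ratio on the preferred answer and on the rejected one, scaled by β. A minimal sketch on per-example sequence log-probabilities, with toy numbers:

```python
import numpy as np

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Direct Preference Optimization loss on per-example sequence log-probs."""
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)       # implicit reward of preferred answer
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    margin = chosen_reward - rejected_reward
    return -np.log(1.0 / (1.0 + np.exp(-margin)))                # -log sigmoid(margin)

# Toy numbers: the policy already prefers the chosen answer slightly more than the reference does.
print(dpo_loss(logp_chosen=-12.0, logp_rejected=-15.0,
               ref_logp_chosen=-13.0, ref_logp_rejected=-14.0))  # ~0.60
```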
Efficient LLM Training (RoPE sketch below the list)
- Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism
- ZeRO: Memory Optimizations Toward Training Trillion Parameter Models
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation
- RoFormer: Enhanced Transformer with Rotary Position Embedding
- Fast Transformer Decoding: One Write-Head is All You Need
- GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints
- FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
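As one concrete example from this list, here is a minimal sketch of the rotary position embedding (RoPE) from RoFormer: each adjacent pair of dimensions of a query or key vector at position m is rotated by an angle m·θᵢ with θᵢ = base^(-2i/d). The interleaved-pair convention below follows the paper; some implementations split the vector in half instead.

```python
import numpy as np

def apply_rope(x, positions, base=10000.0):
    """Rotary position embedding: rotate consecutive dimension pairs by position-dependent angles."""
    seq, d = x.shape
    half = d // 2
    inv_freq = base ** (-np.arange(half) / half)      # theta_i = base^(-2i/d)
    angles = positions[:, None] * inv_freq[None, :]   # (seq, d/2)
    cos, sin = np.cos(angles), np.sin(angles)
    x1, x2 = x[:, 0::2], x[:, 1::2]                   # even / odd dimensions form the pairs
    out = np.empty_like(x)
    out[:, 0::2] = x1 * cos - x2 * sin                # 2D rotation of each pair
    out[:, 1::2] = x1 * sin + x2 * cos
    return out

x = np.random.default_rng(0).standard_normal((6, 8))
print(apply_rope(x, np.arange(6)).shape)  # (6, 8)
```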
Efficient LLM Inference
- BERT Loses Patience: Fast and Robust Inference with Early Exit
- Confident Adaptive Language Modeling
- Fast Inference from Transformers via Speculative Decoding
- Efficient Memory Management for Large Language Model Serving with PagedAttention
- Flash-Decoding for long-context inference
- Break the Sequential Dependency of LLM Inference Using Lookahead Decoding
LLM Compression and Sparsification (int8 quantization sketch below the list)
- Efficient Large Scale Language Modeling with Mixtures of Experts
- GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding
- CoLT5: Faster Long-Range Transformers with Conditional Computation
- LLM.int8(): 8-bit Matrix Multiplication for Transformers at Scale
- 8-bit Optimizers via Block-wise Quantization
- QLoRA: Efficient Finetuning of Quantized LLMs
- BitNet: Scaling 1-bit Transformers for Large Language Models
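A toy version of the absmax int8 quantization that several of these papers build on: scale each row so its largest magnitude maps to 127 and round. Real systems add block-wise scaling, outlier handling (as in LLM.int8()), and fused kernels; this shows only the core idea.

```python
import numpy as np

def quantize_absmax_int8(W):
    """Per-row absmax quantization: scale each row so its max magnitude maps to 127."""
    scale = np.abs(W).max(axis=1, keepdims=True) / 127.0
    q = np.clip(np.round(W / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

W = np.random.default_rng(0).standard_normal((4, 16)).astype(np.float32)
q, s = quantize_absmax_int8(W)
print("max abs error:", np.abs(W - dequantize(q, s)).max())
```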
LLM Prompting (chain-of-thought prompt example below the list)
- Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
- Self-Consistency Improves Chain of Thought Reasoning in Language Models
- Tree of Thoughts: Deliberate Problem Solving with Large Language Models
- Program of Thoughts Prompting: Disentangling Computation from Reasoning for Numerical Reasoning Tasks
- Least-to-Most Prompting Enables Complex Reasoning in Large Language Models
- Measuring and Narrowing the Compositionality Gap in Language Models
- ReAct: Synergizing Reasoning and Acting in Language Models
- Self-Refine: Iterative Refinement with Self-Feedback
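A minimal few-shot chain-of-thought prompt in the style of Wei et al.: the exemplar spells out its reasoning before giving the answer, which nudges the model to do the same on the new question. The model call is a hypothetical placeholder.

```python
# A minimal few-shot chain-of-thought prompt: the exemplar shows its reasoning
# before the final answer, encouraging the model to reason step by step.
COT_PROMPT = """\
Q: Roger has 5 tennis balls. He buys 2 cans of 3 tennis balls each. How many tennis balls does he have now?
A: Roger started with 5 balls. 2 cans of 3 balls each is 6 balls. 5 + 6 = 11. The answer is 11.

Q: A cafeteria had 23 apples. They used 20 for lunch and bought 6 more. How many apples do they have?
A:"""

# response = some_llm_client.complete(COT_PROMPT)  # hypothetical client; the prompt format is the point
print(COT_PROMPT)
```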
4. Multimodal Models
Vision Transformers (patch-embedding sketch below the list)
- An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
- Training data-efficient image transformers & distillation through attention
- Emerging Properties in Self-Supervised Vision Transformers
- SwiftFormer: Efficient Additive Attention for Transformer-based Real-time Mobile Vision Applications
- MetaFormer Is Actually What You Need for Vision
- Masked Autoencoders Are Scalable Vision Learners
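The ViT front end is easy to sketch: split the image into non-overlapping 16x16 patches, flatten each, and project linearly to the model dimension (the class token and position embeddings are omitted here). Shapes assume a 224x224 RGB input; the embedding dimension is illustrative.

```python
import numpy as np

def patchify(image, patch=16):
    """Split an (H, W, C) image into flattened non-overlapping patch vectors."""
    H, W, C = image.shape
    image = image.reshape(H // patch, patch, W // patch, patch, C)
    patches = image.transpose(0, 2, 1, 3, 4).reshape(-1, patch * patch * C)
    return patches                                # (num_patches, patch*patch*C)

rng = np.random.default_rng(0)
img = rng.standard_normal((224, 224, 3))
patches = patchify(img)                           # (196, 768) for a 224x224 RGB image
W_embed = rng.standard_normal((768, 512)) * 0.02  # learned projection in a real model
tokens = patches @ W_embed                        # token sequence fed to the Transformer
print(patches.shape, tokens.shape)                # (196, 768) (196, 512)
```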
Diffusion Models (forward-process sketch below the list)
- Maximum Likelihood Training of Score-Based Diffusion Models
- Score-Based Generative Modeling through Stochastic Differential Equations
- Denoising Diffusion Implicit Models
- Denoising Diffusion Probabilistic Models
- DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps
- Consistency Models
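A minimal sketch of the DDPM forward (noising) process in closed form, x_t = sqrt(ᾱ_t)·x₀ + sqrt(1-ᾱ_t)·ε, with the linear β schedule from the paper. The "image" here is just a random array; training the denoiser and the reverse process are out of scope.

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)          # linear noise schedule from the DDPM paper
alphas_bar = np.cumprod(1.0 - betas)        # \bar{alpha}_t = prod_{s<=t} (1 - beta_s)

def q_sample(x0, t, rng):
    """Sample x_t ~ q(x_t | x_0): sqrt(abar_t) * x0 + sqrt(1 - abar_t) * noise."""
    noise = rng.standard_normal(x0.shape)
    return np.sqrt(alphas_bar[t]) * x0 + np.sqrt(1.0 - alphas_bar[t]) * noise

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))            # stand-in for an image (or a latent)
print(q_sample(x0, t=500, rng=rng).std())   # mostly noise by the middle of the schedule
```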
Image Generation
- High-Resolution Image Synthesis with Latent Diffusion Models
- Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
- Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
- Hierarchical Text-Conditional Image Generation with CLIP Latents
- PIXART-α: Fast Training of Diffusion Transformer for Photorealistic Text-to-Image Synthesis
- Adversarial Diffusion Distillation
Multimodal Model Pre-training (contrastive loss sketch below the list)
- Learning Transferable Visual Models From Natural Language Supervision
- BLIP: Bootstrapping Language-Image Pre-training for Unified Vision-Language Understanding and Generation
- CoCa: Contrastive Captioners are Image-Text Foundation Models
- Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
- VinVL: Revisiting Visual Representations in Vision-Language Models
- Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
- Sigmoid Loss for Language Image Pre-Training
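CLIP's training objective is worth seeing in code: L2-normalize the paired image and text embeddings, form the temperature-scaled similarity matrix, and apply cross-entropy in both directions with the diagonal as targets. This follows the pseudocode in the paper, with random embeddings standing in for real encoders.

```python
import numpy as np

def clip_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings."""
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature              # (batch, batch) scaled cosine similarities
    labels = np.arange(len(logits))                 # matching pairs sit on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(y)), y].mean()

    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
print(clip_loss(rng.standard_normal((8, 64)), rng.standard_normal((8, 64))))
```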
Large Multimodal Models
- Flamingo: a Visual Language Model for Few-Shot Learning
- Visual Instruction Tuning
- InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning
- PaLI: A Jointly-Scaled Multilingual Language-Image Model
- PaLI-3 Vision Language Models: Smaller, Faster, Stronger
- Generative Multimodal Models are In-Context Learners
- InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks
- Gemini: A Family of Highly Capable Multimodal Models
5. Augmented Foundation Models
Tool Augmentation
- Toolformer: Language Models Can Teach Themselves to Use Tools
- ART: Automatic Multi-step Reasoning and Tool-use for Large Language Models
- ToolLLM: Facilitating Large Language Models to Master 16000+ Real-world APIs
- ToolkenGPT: Augmenting Frozen Language Models with Massive Tools via Tool Embeddings
- AgentBench: Evaluating LLMs as Agents
- CogAgent: A Visual Language Model for GUI Agents
- WebArena: A Realistic Web Environment for Building Autonomous Agents
Retrieval Augmentation (toy RAG sketch below the list)
- REALM: Retrieval-Augmented Language Model Pre-Training
- Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks
- Improving Language Models by Retrieving from Trillions of Tokens
- Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection
- REPLUG: Retrieval-Augmented Black-Box Language Models
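To close the list, a toy sketch of the retrieval-augmented pattern these papers share: score documents against the query (a simple lexical overlap stands in for a dense retriever or BM25), then stuff the best passages into the prompt before generation. The documents, query, and llm.generate call are all placeholders.

```python
def _tokens(text: str) -> set:
    """Lowercased word set with basic punctuation stripped."""
    return {w.strip(".,?") for w in text.lower().split()}

def score(query: str, doc: str) -> float:
    """Toy lexical relevance: fraction of query words that appear in the document."""
    q, d = _tokens(query), _tokens(doc)
    return len(q & d) / max(len(q), 1)

docs = [
    "LoRA adds trainable low-rank matrices to frozen weights.",
    "FlashAttention tiles attention computation to reduce memory traffic.",
    "Chinchilla suggests roughly 20 training tokens per parameter.",
]
query = "How many tokens per parameter should I train on?"

top = sorted(docs, key=lambda d: score(query, d), reverse=True)[:2]   # top-2 passages
prompt = "Context:\n" + "\n".join(f"- {d}" for d in top) + f"\n\nQuestion: {query}\nAnswer:"
# answer = llm.generate(prompt)  # hypothetical generation call
print(prompt)
```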
Happy reading!