Using and Finetuning Pretrained Transformers
Tips for LLM Pretraining and Evaluating Reward Models
Research Papers in February 2024: A LoRA Successor, Small Finetuned LLMs vs. Generalist LLMs, and Transparent LLM Research
Improving LoRA: Implementing Weight-Decomposed Low-Rank Adaptation (DoRA) from Scratch
Supporting Ahead of AI
Research Papers in January 2024: Model Merging, Mixtures of Experts, and Towards Smaller LLMs
Understanding and Coding Self-Attention, Multi-Head Attention, Cross-Attention, and Causal-Attention in LLMs