LLaMA-Cult-and-More

Keeping Track of Affordable LLMs

Base Models

Model Spec

| Project | Base Model | Data | Fine-tune | Hardware / Cost |
| --- | --- | --- | --- | --- |
| Stanford/Alpaca | LLaMA-7B | 52K instruction-following examples generated in self-instruct style with text-davinci-003 | SFT | 3 hours on 8× 80GB A100s; $500 (data) + $100 (training) |
| NLPCloud/instruct-gpt-j | GPT-J-6B | 52K Alpaca | SFT | fp16 model deploys well on a 16GB Tesla T4 |
| LianjiaTech/BELLE | BLOOMZ-7B1-mt | 2M Chinese examples generated Alpaca-style | SFT | 8-bit GPTQ quantization runs on a 12GB GPU |
| LianjiaTech/BELLE | LLaMA-7B | same as above | SFT | 4-bit ggml quantization works well on an M1 Mac |
| Alpaca-LoRA | LLaMA-7B | 52K Alpaca; updated to the MSFT LLaMA-GPT4 dataset | SFT with LoRA | a few hours on a single RTX 4090 (24GB) |
| Databricks/Dolly-v1-6B | GPT-J-6B | 52K Alpaca | SFT | |
| Databricks/Dolly-v2-12B | Pythia-12B | databricks-dolly-15k, written by Databricks employees across the capability domains from the InstructGPT paper | SFT | about 3.5 hours on 8× V100s in fp16 for 1 epoch |
| GPT4All | LLaMA-7B | ~800K GPT-3.5-Turbo generations | SFT with LoRA | |
| HIT&HFL/Chinese-LLaMA-Alpaca | LLaMA-7B/13B | about 2M Chinese and English examples | adds 20K Chinese SentencePiece tokens to the vocabulary to improve Chinese decoding efficiency; pre-trains with DeepSpeed ZeRO-2 on a 20GB general Chinese corpus; SFT with LoRA | 16× A100s for both pre-training and SFT |
| HIT&HFL/Chinese-LLaMA-Plus-7B | LLaMA-7B | re-pretrained on a larger (120GB) general corpus; fine-tuned on a 4M-example instruction dataset | SFT with LoRA (larger rank) | |
| THUDM/ChatGLM-6B | | | | |
| LLaMA-Adapter | LLaMA-7B | 52K Alpaca | SFT with LLaMA-Adapter | cuts training from 3 hours to 1 hour; 1.2M trainable parameters instead of 7B |
| FastChat/Vicuna | LLaMA-7B/13B | 70K user-shared conversations gathered from ShareGPT.com | SFT, with a 40× larger dataset and 4× sequence length vs. Alpaca | 4/8× A100s; $140/$300 for training; impresses GPT-4 with ~90% of ChatGPT quality |
| BAIR/Koala | LLaMA-13B | ~60K dialogues shared by users on ShareGPT; Human ChatGPT Comparison Corpus (HC3); open-source data… | SFT with JAX/Flax | 2 epochs in 6 hours on 8× A100s; beats ChatGPT on 180 real user queries |
| Baize | LLaMA-7B/13B/30B | 100K dialogues generated by letting ChatGPT chat with itself; QA and healthcare datasets | SFT with LoRA | runs on 80GB A100s |
| Firefly | bloom-1b4/2b6-zh | 1.1M-example instruction dataset built from 23 Chinese NLP tasks; BELLE-0.5M-cn | vocabulary reduced from 250K to 46K tokens; SFT | |
| Arxiv Chat | | | built on ChatGPT (QA), LangChain (main logic), and h2oai (UI) | |
| huggingface/StackLLaMA | LLaMA-7B | Stack Exchange dataset (10M < N < 100M) | SFT + RLHF | (2 bytes fp16 weights + 8 bytes Adam states) × 7B params ≈ 70GB, so a single 80GB A100 works fine; LoRA/PEFT makes 50-60B models feasible on a single A100 |
| MSFT/LLaMA-GPT4 | LLaMA-7B | the 52K Alpaca prompts with responses regenerated by GPT-4 | SFT, RM | |
| MSFT/DeepSpeed Chat | | | supports SFT, RM, RLHF | efficiency and affordability |
| ColossalAI/ColossalChat | | | supports SFT, RM, RLHF | quick preview |
| Phoenix | LLaMA-7B/13B | a large collection of popular multilingual open-source datasets | SFT | |
| fudan/MOSS-003 | MOSS-16B | ~1.1M text-davinci-003-generated self-instruct examples, including ~300K plugin examples (text-to-image, equations, etc.) | SFT | fp16 fine-tuning on 2× A100s, or 4/8-bit fine-tuning on a single 3090 |
| replit/replit-code-v1-3b | 2.7B model trained from scratch | entirely code, 525B tokens | | 10 days of training; benchmarks better than Codex |
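Many of the rows above boil down to the same affordable recipe: load the base model quantized, freeze it, and train small LoRA adapters on an instruction dataset. The sketch below, using Hugging Face transformers and peft, shows the shape of that setup; the model name, dataset and hyperparameters are illustrative assumptions, not the exact settings of any project listed.

```python
# Minimal sketch of the "quantized base model + SFT with LoRA" recipe shared by
# rows such as Alpaca-LoRA, BELLE and Baize. Model name and hyperparameters are
# assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "huggyllama/llama-7b"  # assumption: any decoder-only base model works the same way
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Load the frozen base model in 8-bit so it fits on a single 24GB consumer GPU.
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,
    device_map="auto",
    torch_dtype=torch.float16,
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable LoRA adapters to the attention projections; only these
# adapters receive gradients and optimizer states.
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of the 7B parameters are trainable

# From here, training is ordinary supervised fine-tuning: tokenize the 52K
# Alpaca-style prompt/response pairs and run them through the standard
# transformers Trainer (or an equivalent loop).
```

Because only the adapter weights carry gradients and optimizer states while the base model stays frozen and quantized, the memory footprint fits a single consumer GPU, which is why rows such as Alpaca-LoRA and MOSS-003 quote an RTX 4090 or a 3090 rather than an A100 cluster.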

Fine-tune Stages

Typology of Efficient LLM Training

Instruction Dataset

LLM Evaluation