| Stanford/Alpaca | LLaMA-7B | 52K instruction-following dataset, generated self-instruct style with text-davinci-003 | SFT | 3 hours on 8 80GB A100s; ~$500 (data) + ~$100 (training) |
| NLPCloud/instruct-gpt-j | GPT-J-6B | 52K Alpaca | SFT | fp16 model deploys well on a 16GB Tesla T4 |
| LianjiaTech/BELLE | BLOOMZ-7B1-mt | 2M Chinese instructions generated Alpaca-style | SFT | 8-bit GPTQ quantization runs on a 12GB GPU (see the 8-bit loading sketch after the table) |
| LianjiaTech/BELLE | LLaMA-7B | same as above | SFT | 4-bit ggml quantization works well on M1 Macs |
| Alpaca-LoRA | LLaMA-7B | 52K Alpaca; updated to the MSFT LLaMA-GPT4 dataset | SFT with LoRA (see the LoRA sketch after the table) | a few hours on a single RTX 4090 (24GB) |
| Databricks/Dolly-v1-6B | GPT-J-6B | 52K Alpaca | SFT | |
| Databricks/Dolly-v2-12B | Pythia-12B | databricks-dolly-15k, written by Databricks employees across the capability domains from the InstructGPT paper | SFT | ~3.5 hours for 1 epoch on 8 V100s with fp16 |
| GPT4All | LLaMA-7B | ~800K GPT-3.5-Turbo generations | SFT with LoRA | |
| HIT&HFL/Chinese-LLaMA-Alpaca | LLaMA-7B/13B | ~2M Chinese and English instruction data | adds 20K Chinese SentencePiece tokens to the vocabulary to improve Chinese decoding efficiency; uses DeepSpeed ZeRO-2 (see the ZeRO-2 config sketch after the table) | pre-training on a 20GB general Chinese corpus on 16 A100s; SFT with LoRA on 16 A100s |
| HIT&HFL/Chinese-LLaMA-Plus-7B | LLaMA-7B | re-pre-trains LLaMA on a larger (120GB) general corpus, then fine-tunes on a 4M instruction dataset | SFT with LoRA (larger rank) | |
| THUDM/ChatGLM-6B | | | | |
| LLaMA-Adapter | LLaMA-7B | 52K Alpaca | SFT with LLaMA-Adapter | cuts fine-tuning from 3 hours to 1 hour; 1.2M trainable parameters instead of 7B |
| FastChat/Vicuna | LLaMA-7B/13B | 70K user-shared conversations gathered from ShareGPT.com | SFT; 40x larger dataset and 4x longer sequences than Alpaca | 4/8 A100s, $140/$300 to train; judged by GPT-4 at ~90% of ChatGPT quality |
| BAIR/Koala | LLaMA-13B | ~60K dialogues shared by users on ShareGPT; Human ChatGPT Comparison Corpus (HC3); open-source data… | SFT with JAX/Flax | 2 epochs in 6 hours on 8 A100s; beats ChatGPT on 180 real user queries |
| Baize | LLaMA-7B/13B/30B | 100K dialogues generated by letting ChatGPT chat with itself, plus QA and healthcare datasets | SFT with LoRA | runs on 80GB A100s |
| Firefly | bloom-1b4/2b6-zh | 1.1M instructions built from 23 Chinese NLP tasks, plus BELLE-0.5M-cn | vocabulary trimmed from 250K to 46K tokens; SFT | |
| Arxiv Chat | | | built on ChatGPT (QA), LangChain (main logic), and h2oai (UI) | |
| huggingface/StackLLaMA | LLaMA-7B | Stack Exchange dataset (10M < N < 100M) | SFT + RLHF | (2+8) bytes per parameter * 7B params ≈ 70GB, so an 80GB A100 works fine; LoRA/PEFT makes a 50-60B model feasible on a single A100 (see the memory estimate and PPO sketch after the table) |
| MSFT/LLaMA-GPT4 | LLaMA-7B | 52K Alpaca instructions with responses regenerated by GPT-4 | SFT, RM | |
| MSFT/DeepSpeed Chat | | | supports SFT, RM, and RLHF | focuses on efficiency and affordability |
| ColossalAI/ColossalChat | | | supports SFT, RM, and RLHF | quick preview |
| Phoenix | LLaMA-7B/13B | a large collection of popular multilingual open-source datasets | SFT | |
| fudan/MOSS-003 | MOSS-16B | ~1.1M text-davinci-003-generated self-instruct dataset, including ~300K plugin examples (text-to-image, equations, etc.) | SFT | fp16 fine-tuning on 2 A100s, or 4/8-bit fine-tuning on a single 3090 |
| replit/replit-code-v1-3b | 2.7B | code only, 525B tokens | | trained in 10 days; benchmarks better than Codex |
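
The BELLE and MOSS rows above mention running or fine-tuning 7B-16B checkpoints on small GPUs via 8-bit/4-bit quantization. BELLE's own path is GPTQ; as an illustrative stand-in, here is a minimal sketch of 8-bit inference through transformers + bitsandbytes, which has a similar ~1 byte/parameter footprint. The checkpoint path, prompt, and generation settings are placeholders.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

name = "path/to/belle-bloomz-7b1-mt"   # placeholder checkpoint path
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(
    name,
    device_map="auto",                                            # place layers on available devices
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),    # ~1 byte/param, so 7B fits a 12GB card
)

prompt = "Explain instruction tuning in one paragraph."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```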
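
Several rows above ("SFT with LoRA" for Alpaca-LoRA, GPT4All, Baize, Chinese-LLaMA-Alpaca) rely on the same recipe: freeze the base model and train small low-rank adapters. A minimal sketch with Hugging Face transformers + peft follows; the checkpoint path, dataset id, prompt template, and hyperparameters are illustrative assumptions, not any project's exact configuration.

```python
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

base = "path/to/llama-7b-hf"                      # placeholder base checkpoint
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token          # LLaMA tokenizers have no pad token by default
model = AutoModelForCausalLM.from_pretrained(base)

# Attach low-rank adapters to the attention projections; r/alpha mirror common
# Alpaca-LoRA settings, so only a few million parameters are trainable.
lora_cfg = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.05,
                      target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora_cfg)
model.print_trainable_parameters()

# Turn the 52K Alpaca instructions into plain causal-LM training text.
data = load_dataset("tatsu-lab/alpaca")["train"]   # example instruction dataset

def tokenize(ex):
    prompt = f"### Instruction:\n{ex['instruction']}\n\n### Response:\n{ex['output']}"
    return tokenizer(prompt, truncation=True, max_length=512)

train = data.map(tokenize, remove_columns=data.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="alpaca-lora-out", num_train_epochs=3,
                           per_device_train_batch_size=4, gradient_accumulation_steps=8,
                           learning_rate=3e-4, fp16=True, logging_steps=20),
    train_dataset=train,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
model.save_pretrained("alpaca-lora-out")           # writes only the small adapter weights
```

In practice the Alpaca-LoRA repo also loads the base model in 8-bit to fit a 24GB RTX 4090; the sketch above keeps the base model in full precision for simplicity.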
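
The Chinese-LLaMA-Alpaca row notes training with DeepSpeed ZeRO-2, which shards optimizer state and gradients across GPUs. Below is a minimal sketch of such a configuration, written as the dict that Hugging Face Trainer's `deepspeed=` argument accepts; the values are illustrative, not the project's actual settings.

```python
# Minimal ZeRO stage-2 config: optimizer state and gradients are partitioned
# across data-parallel ranks, while full model weights stay replicated.
ds_config = {
    "train_micro_batch_size_per_gpu": "auto",
    "gradient_accumulation_steps": "auto",
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "allgather_partitions": True,
        "reduce_scatter": True,
        "overlap_comm": True,
        "contiguous_gradients": True,
    },
}
# Pass via TrainingArguments(deepspeed=ds_config, ...) and launch with the
# `deepspeed` launcher to use it across multiple GPUs.
```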
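
The "(2+8)*7B=70GB" note in the StackLLaMA row is a rule of thumb for full fine-tuning: roughly 2 bytes per parameter for fp16/bf16 weights plus roughly 8 bytes per parameter for AdamW optimizer state, before counting gradients and activations. A quick worked version:

```python
# Rough memory estimate for full fine-tuning: fp16/bf16 weights (~2 bytes/param)
# plus AdamW state (~8 bytes/param, two fp32 moments); gradients/activations extra.
def full_finetune_gb(params_in_billions, weight_bytes=2, optimizer_bytes=8):
    return params_in_billions * (weight_bytes + optimizer_bytes)

print(full_finetune_gb(7))    # 70 GB  -> a single 80GB A100 is enough
print(full_finetune_gb(13))   # 130 GB -> needs sharding, or LoRA/PEFT
```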
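
For the "SFT + RLHF" stage in the StackLLaMA row, the write-up uses trl's PPO trainer. The heavily simplified sketch below assumes trl's pre-1.0 `PPOTrainer` API; the checkpoint path, queries, reward function, and generation settings are placeholders rather than the actual StackLLaMA configuration.

```python
import torch
from transformers import AutoTokenizer
from trl import AutoModelForCausalLMWithValueHead, PPOConfig, PPOTrainer

name = "path/to/llama-7b-sft"                        # placeholder SFT checkpoint
tokenizer = AutoTokenizer.from_pretrained(name)
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLMWithValueHead.from_pretrained(name)      # policy + value head
ref_model = AutoModelForCausalLMWithValueHead.from_pretrained(name)  # frozen reference for the KL penalty

config = PPOConfig(model_name=name, learning_rate=1.4e-5, batch_size=2, mini_batch_size=1)
ppo_trainer = PPOTrainer(config, model, ref_model, tokenizer)

def reward_fn(texts):
    # Placeholder: StackLLaMA scores answers with a reward model trained on
    # Stack Exchange upvote comparisons; a dummy length-based score stands in here.
    return [torch.tensor(min(len(t), 200) / 200.0) for t in texts]

queries = ["How do I reverse a list in Python?", "What does git rebase do?"]
query_tensors = [tokenizer(q, return_tensors="pt").input_ids[0] for q in queries]

# Generate responses with the current policy, then take one PPO step on the
# (query, response, reward) triples.
response_tensors = ppo_trainer.generate(query_tensors, return_prompt=False, max_new_tokens=64)
responses = [tokenizer.decode(r, skip_special_tokens=True) for r in response_tensors]
stats = ppo_trainer.step(query_tensors, list(response_tensors), reward_fn(responses))
```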