Formosa Foundation Model Licensing List
Notes
- Inference settings: inference quantization = FP16, KV cache quantization = FP16/BF16, batch size = 1
- GPU memory includes model weights, activations, KV cache, and framework overhead.
| FFM Model Series | Model | Sequence Length | Min. GPU Memory (single-model deployment) |
|---|---|---|---|
| Llama3.3-FFM | Llama3.3-FFM-70B | 32K\* | 198 GB |
| Llama3.2-FFM | Llama3.2-FFM-11B-V | 32K\* | 52 GB |
| Llama3.1-FFM | Llama3.1-FFM-8B | 32K\* | 35 GB |
| | Llama3.1-FFM-70B | 32K\* | 185 GB |
| | Llama3.1-FFM-405B | 32K\* | 915 GB |
| Llama3-FFM | Llama3-FFM-8B | 8K | 27 GB |
| | Llama3-FFM-70B | 8K | 165 GB |
| FFM-Mistral | FFM-Mistral-7B | 32K | 34 GB |
| | FFM-Mixtral-8x7B | 32K | 48 GB |
| FFM-Llama2-v2 | FFM-Llama2-v2-7B | 4K | 17 GB |
| | FFM-Llama2-v2-13B | 4K | 30 GB |
| | FFM-Llama2-v2-70B | 4K | 154 GB |
| FFM-Llama2 | FFM-Llama2-7B | 4K | 17 GB |
| | FFM-Llama2-13B | 4K | 30 GB |
| | FFM-Llama2-70B | 4K | 154 GB |
| FFM-embedding | FFM-embedding-v2.1 | 8K | 2 GB |
| | FFM-embedding-v2 | 8K | 2 GB |
| | FFM-embedding | 2K | 2 GB |
| FFM-BLOOMZ | FFM-BLOOMZ-7B | 4K | 20 GB |
| | FFM-BLOOMZ-176B | 4K | 389 GB |
\* Sequence length can be extended to 128K, but the GPU memory requirement must then be re-estimated.
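The notes above state that GPU memory covers model weights, activations, KV cache, and framework overhead, which is why the requirement grows with sequence length. The sketch below is a hypothetical, illustrative lower-bound estimator (not the method used to produce the table): it counts only FP16 weights plus an FP16 KV cache, so real deployments need additional headroom for activations and framework overhead. The example model dimensions are assumptions, not FFM's published configuration.

```python
def estimate_gpu_memory_gb(params_b, n_layers, n_kv_heads, head_dim,
                           seq_len, batch_size=1, bytes_per_elem=2):
    """Rough lower bound in GiB: FP16 weights + FP16 KV cache only.

    Excludes activations and framework overhead, so the table's
    figures will be higher than this estimate.
    """
    # Weights: one FP16 value (2 bytes) per parameter.
    weights = params_b * 1e9 * bytes_per_elem
    # KV cache: 2 tensors (K and V) per layer, per cached token,
    # per KV head, each of size head_dim.
    kv_cache = (2 * n_layers * n_kv_heads * head_dim
                * seq_len * batch_size * bytes_per_elem)
    return (weights + kv_cache) / 1024**3

# Example with assumed 8B-class dimensions (32 layers, 8 KV heads,
# head_dim 128) at a 32K context and batch size 1.
print(round(estimate_gpu_memory_gb(8, 32, 8, 128, 32768), 1))  # → 18.9
```

The gap between such a lower bound (about 19 GiB here) and the table's 35 GB for Llama3.1-FFM-8B illustrates how much activations and framework overhead contribute on top of weights and KV cache.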