11.2 Popular Foundation Models

The open-source community, largely centered around the Hugging Face Hub, has produced a diverse range of powerful foundation models. These models serve as the backbone for countless applications and research projects. Understanding the key players and their characteristics is essential for selecting the right tool for a given task.

Key Model Families

Originally released by Meta, the LLaMA models set a new standard for openly released LLMs. Strong performance across a range of sizes (7B, 13B, and 70B parameters in Llama 2) made them a popular base for further fine-tuning.
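Mistral AI has released highly capable models, including Mistral 7B, known for its efficiency. Mixtral 8x7B uses a Sparse Mixture-of-Experts (SMoE) architecture: in each layer a router sends every token to 2 of 8 expert feed-forward blocks, so only about 13B of its roughly 47B parameters are active per token. The result is quality comparable to a much larger dense model at close to the inference cost of a smaller one.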
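
To make the sparse routing idea concrete, here is a minimal pure-Python sketch of top-2 expert selection in the style described for Mixtral 8x7B. It is an illustration under stated assumptions, not Mistral AI's implementation: the names `top_k_route`, `moe_layer`, and `make_expert` are hypothetical, the "experts" are toy functions that only scale their input, and the router logits are hard-coded rather than computed by a learned gating network.

```python
import math


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_route(router_logits, k=2):
    """Pick the k highest-scoring experts and normalize their scores.

    Returns a list of (expert_index, weight) pairs whose weights sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_layer(x, experts, router_logits, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    routes = top_k_route(router_logits, k)
    output = [0.0] * len(x)
    for index, weight in routes:
        expert_out = experts[index](x)  # the other experts are never evaluated
        output = [o + weight * y for o, y in zip(output, expert_out)]
    return output, [index for index, _ in routes]


def make_expert(scale):
    """Toy stand-in for an expert feed-forward block (assumption)."""
    return lambda x: [scale * v for v in x]


# Eight toy experts, as in an "8x" mixture; only two run per token.
experts = [make_expert(scale) for scale in range(1, 9)]
router_logits = [0.1, 2.0, -1.0, 0.5, 3.0, 0.0, -0.5, 1.0]  # made-up scores
output, used = moe_layer([1.0, 2.0], experts, router_logits)
print("experts used:", used)
print("output:", output)
```

Because only the selected experts are evaluated, the compute per token depends on `k` rather than on the total number of experts, which is the source of the speed advantage described above.
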
Developed by Google, Gemma models are built from the same research and technology as the Gemini models. The initial release came in two small sizes (2B and 7B), with an emphasis on efficient deployment and responsible AI development.
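The Text-to-Text Transfer Transformer (T5), developed by Google, casts every NLP task as a text-to-text problem: the input is a string, typically beginning with a task prefix, and the output is also a string. It is a versatile encoder-decoder model that remains influential for tasks like summarization and translation.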
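
The text-to-text framing can be sketched without loading a model at all. The snippet below shows how different tasks collapse into plain input strings with a task prefix. The helper `to_text_to_text` is hypothetical and written only for illustration; the prefixes follow the conventions reported for T5, and the example sentences are made up.

```python
def to_text_to_text(task, text, source=None, target=None):
    """Format one example as a single input string with a task prefix.

    Hypothetical helper: it only builds the string a text-to-text model
    would read, and does not call any model.
    """
    if task == "translate":
        if source is None or target is None:
            raise ValueError("translation needs source and target languages")
        return f"translate {source} to {target}: {text}"
    if task == "summarize":
        return f"summarize: {text}"
    if task == "cola":  # grammatical acceptability, answered as text
        return f"cola sentence: {text}"
    raise ValueError(f"unknown task: {task}")


# Three different tasks, one shared interface: string in, string out.
examples = [
    to_text_to_text("translate", "That is good.",
                    source="English", target="German"),
    to_text_to_text("summarize", "The city council met on Tuesday to ..."),
    to_text_to_text("cola", "The course is jumping well."),
]
for line in examples:
    print(line)
```

Since every task shares this interface, a single model, loss function, and decoding procedure can serve translation, summarization, and classification alike; the expected answer for each example is simply another string.
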