And a special shoutout to Sathwik, best co-lead anyone could ask for.
🧰 Engineers: fine-tune it
🧪 Builders: break it
Tell us what you find.
Apriel-5B models are permissively licensed (MIT) and ready to chat.
#Apriel #LLM #AI #OpenWeights #FastLLM #SLAM #ServiceNow #ServiceNowResearch
🧪 Fast, cheap, high-quality model training
📦 Compact models that generalize well
This is just the start.
🖥️ 480 × H100s
⏱️ ~91,000 H100-hours
🧮 4.8B params, bfloat16
💸 2.3× fewer GPU hours than OLMo-2-7B
Thanks to Fast-LLM (github.com/ServiceNow/F...), our custom training stack built for speed and scale. No hacks. Just better infra.
💥 Beats OLMo-2-7B-Instruct and Mistral-Nemo-12B-Instruct on average
💥 Competitive with Llama-3.1-8B-Instruct, and beats it on math benchmarks and IFEval
🧠 Apriel-5B-Base: pretrained, general-purpose decoder
🧑‍🏫 Apriel-5B-Instruct: chat-style variant for aligned outputs
Trained on 4.5T+ tokens.
👉 huggingface.co/ServiceNow-AI/Apriel-5B-Base
👉 huggingface.co/ServiceNow-AI/Apriel-5B-Instruct