Snowflake Arctic, quantized to FP8 by FriendliAI, is now available on Friendli!

FriendliAI · Advanced · 🧠 Large Language Models · 1y ago
We’re excited to announce that Snowflake Arctic Instruct, quantized to FP8 by FriendliAI, is now available on Friendli! Even without advanced optimizations, Friendli runs Arctic at about 100 tokens per second with a batch size of 1. Friendli also lets Arctic run with a large batch size on a single node. 🚀 Try it now 👉 https://suite.friendli.ai #FriendliAI #Friendli #Arctic #Snowflake #LLM #Inference #Serving