Cerebras Releases World’s Fastest AI Inference

– Cerebras’ Wafer-Scale Engine has proven faster than Groq at AI inference.

– With the Llama 3.1 8B model, Cerebras Inference processes up to 1,800 tokens per second; with the 70B model, up to 450 tokens per second.

– In contrast, Groq runs the 8B and 70B models at up to 750 and 250 tokens per second, respectively (see the quick comparison below).
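
To put the gap in perspective, here is a back-of-the-envelope comparison using only the throughput figures quoted above; the numbers are the article's claims, not independent benchmarks.

```python
# Speedup implied by the throughput figures quoted in this article (tokens/s).
CEREBRAS = {"llama3.1-8b": 1800, "llama3.1-70b": 450}
GROQ     = {"llama3.1-8b": 750,  "llama3.1-70b": 250}

for model in CEREBRAS:
    speedup = CEREBRAS[model] / GROQ[model]
    print(f"{model}: {CEREBRAS[model]} vs {GROQ[model]} tokens/s "
          f"-> {speedup:.1f}x faster on Cerebras")

# Output:
# llama3.1-8b: 1800 vs 750 tokens/s -> 2.4x faster on Cerebras
# llama3.1-70b: 450 vs 250 tokens/s -> 1.8x faster on Cerebras
```

In other words, Cerebras' claimed lead is roughly 2.4x on the 8B model and 1.8x on the 70B model.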

With Cerebras Inference, Cerebras has finally opened access to its Wafer-Scale Engine (WSE), serving both the Llama 3.1 8B model and the bigger Llama 3.1 70B variant. Prior to Cerebras, Groq held the title of fastest AI inference provider.
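
For readers who want to try the newly opened service, below is a minimal sketch of querying Cerebras Inference through an OpenAI-compatible chat endpoint. The base URL, model identifier, and environment-variable name are assumptions based on common convention, not details from this article; check the official Cerebras documentation before relying on them.

```python
# Minimal sketch: calling Cerebras Inference via an OpenAI-compatible client.
# ASSUMPTIONS (not from the article): the base URL, the model id
# "llama3.1-70b", and the CEREBRAS_API_KEY environment variable.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://api.cerebras.ai/v1",   # assumed endpoint
    api_key=os.environ["CEREBRAS_API_KEY"],  # assumed env var
)

response = client.chat.completions.create(
    model="llama3.1-70b",                    # assumed model id
    messages=[
        {"role": "user",
         "content": "Explain wafer-scale chips in one sentence."},
    ],
)
print(response.choices[0].message.content)
```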
