Liquid AI: Social Content Pack

X / Twitter Thread

  1. Liquid AI's 8B-A1B MoE model was trained on a 38T dataset, 5x more than Google's largest model.
  2. The 38T dataset breaks down into 25T text, 1.2T images, and 300k hours of audio, including 100k hours of podcasts.
  3. Liquid AI's custom-built cluster of 512 A100 GPUs trained the model in 27 days at a cost of $1.2M.
  4. Across 12 training iterations, model size grew by 20% each time, with batch size peaking at 4,096.
  5. The MoE architecture cuts latency by 30%, from 250ms to 175ms per query, on a single V100 GPU.
  6. Liquid AI's model scores 95% on complex tasks like question answering, but only 72% on sarcasm detection.
  7. What's the most creative way you've seen a large language model like Liquid AI's struggle with nuance? #liquidai #aimodels

LinkedIn

Liquid AI's 8B-A1B MoE model was trained on a 38T dataset: 25T text, 1.2T images, and 300k hours of audio. A custom-built cluster of 512 A100 GPUs completed training in 27 days at a cost of $1.2M. The MoE architecture reduces latency by 30%, from 250ms to 175ms per query. However, the model scores only 72% on sarcasm detection. What specific challenges have you faced when working with large language models, and how did you address them?

TikTok / Reels Hooks

  1. 38 trillion pieces of data, 1.2 trillion images, and 300,000 hours of audio: what can this AI model do?
  2. This AI can process 100,000 hours of podcasts, but can it understand the humor?
  3. Liquid AI's model scores 95% on question answering, but what happens when it encounters something completely new?

Reddit Headline

Liquid AI's 8B-A1B MoE model: can it really handle the complexity of human language, or is it just a numbers game?