Run 100B+ language models at home, BitTorrent‑style

  • Run large language models like BLOOM-176B
    collaboratively — you load a small part of the model, then team up with people serving the other parts
    to run inference or fine-tuning.
  • Single-batch inference runs at ≈ 1 sec per step (token) —
    up to 10x faster than offloading, enough for
    chatbots and other interactive apps.
    Parallel inference reaches hundreds of tokens/sec.
  • Beyond classic language model APIs —
    you can employ any fine-tuning and sampling methods, execute custom paths through the model, or see its hidden states.
    You get the comforts of an API with the flexibility of PyTorch.
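
To make the collaborative idea above concrete, here is a toy sketch in plain Python. It is not the Petals API: `Peer`, `split_across_peers`, and `run_chain` are hypothetical names made up for this illustration, and each "layer" is a trivial function rather than a transformer block. It only shows the shape of the scheme, assuming each participant holds a contiguous slice of layers and a client passes hidden states from one participant to the next.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# A "layer" here is just a function on a number; real layers act on tensors.
Layer = Callable[[float], float]


@dataclass
class Peer:
    """Hypothetical participant serving a contiguous slice of the model."""
    name: str
    start: int           # index of the first layer served (inclusive)
    end: int             # index past the last layer served (exclusive)
    layers: List[Layer]

    def forward(self, hidden: float) -> float:
        # Apply only the layers this peer holds.
        for layer in self.layers:
            hidden = layer(hidden)
        return hidden


def split_across_peers(layers: List[Layer], num_peers: int) -> List[Peer]:
    """Assign contiguous slices of layers to peers, as evenly as possible."""
    per_peer = -(-len(layers) // num_peers)  # ceiling division
    peers = []
    for i in range(num_peers):
        start = i * per_peer
        end = min(start + per_peer, len(layers))
        if start >= end:
            break  # more peers than layers; the rest hold nothing
        peers.append(Peer(f"peer-{i}", start, end, layers[start:end]))
    return peers


def run_chain(peers: List[Peer], hidden: float) -> Tuple[float, List[float]]:
    """Pass the hidden state through each peer in order.

    Also returns the intermediate hidden states seen between peers,
    which the client is free to inspect.
    """
    trace = [hidden]
    for peer in peers:
        hidden = peer.forward(hidden)
        trace.append(hidden)
    return hidden, trace


# Toy "model": six layers, where layer k adds k to the hidden state.
layers = [lambda h, k=k: h + k for k in range(1, 7)]
peers = split_across_peers(layers, num_peers=3)
output, trace = run_chain(peers, 0.0)
```

In this sketch no single peer holds more than two of the six layers, yet the chained result matches running all six locally, and the client sees the hidden state at every hand-off.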

Join our Discord or subscribe via email to follow Petals development.



This project is a part of the BigScience research workshop.
