Quickstart

Predibase provides the fastest way to fine-tune and serve open-source LLMs. It's built on top of open-source LoRAX.

  • Fine-tuning: Fine-tune and serve a model in just a few steps using the SDK or UI
  • Shared endpoints: Try the Python SDK or the Web Playground to prompt serverless endpoints for quick iteration and prototyping
  • Production-ready private serverless inference: Deploy your base model to serve an unlimited number of adapters using the SDK or UI

Deploy and prompt a model in 6 lines of code

Enterprise and VPC customers can deploy private serverless endpoints using the following six lines of code. Shared endpoints are available for all users.

pip install predibase
python
>>> from predibase import Predibase, DeploymentConfig
>>> pb = Predibase(api_token="<PREDIBASE API TOKEN>")
>>> pb.deployments.create(name='my-model', config=DeploymentConfig(base_model="mistral-7b"))
>>> print(pb.deployments.client('my-model').generate("What is a Large Language Model?", max_new_tokens=50).generated_text)
# "Large Language Models (LLMs) are a type of artificial intelligence (AI)..."
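The snippet above passes the API token inline. If you would rather keep the token out of your source files, one option is to read it from an environment variable. The following is a minimal sketch: the helper `load_api_token` and the variable name `PREDIBASE_API_TOKEN` are assumptions made for illustration, not part of the SDK.

```python
import os


def load_api_token(env=None, var_name="PREDIBASE_API_TOKEN"):
    """Return the API token stored in an environment variable.

    Both this helper and the variable name are hypothetical; pick whatever
    name fits your setup.
    """
    env = os.environ if env is None else env
    token = env.get(var_name, "").strip()
    if not token:
        raise RuntimeError(f"Set {var_name} before creating the client")
    return token


# Usage (requires the predibase package and a valid token):
# from predibase import Predibase
# pb = Predibase(api_token=load_api_token())
```

Failing early with a clear error when the variable is missing tends to be easier to debug than an authentication failure on the first request.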

Read on for more details!

Run inference using the SDK or REST

  1. Create an account here.
  2. Navigate to the Settings page and click Generate API Token.
  3. Set up a venv and install the Python SDK (if running locally; not required for Google Colab or similar):
python3.9 -m venv .venv
source .venv/bin/activate
pip install -U predibase
  4. See available shared deployments. (Note: Enterprise customers can deploy a private serverless deployment, and VPC customers are required to.)
from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE API TOKEN>")

# Optionally get a list of available models by calling pb.deployments.list()
lorax_client = pb.deployments.client("mistral-7b-instruct-v0-2") # Insert deployment name here
resp = lorax_client.generate("[INST] What are some popular tourist spots in San Francisco? [/INST]")
print(resp.generated_text)

Note the explicit use of special tokens before and after the prompt. These are used with instruction- and chat-tuned models to improve response quality. See Instruction Templates for details on how these should be applied for each of the serverless model endpoints.
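As an illustration of the note above, the `[INST] ... [/INST]` wrapping used in the example prompt can be applied with a small helper. This is a sketch only: `format_mistral_instruct` is a hypothetical name, and it covers just the template shown in this quickstart. Other models use different templates, so check Instruction Templates before reusing it.

```python
def format_mistral_instruct(prompt: str) -> str:
    """Wrap a user prompt in the [INST] ... [/INST] markers used in the example above."""
    return f"[INST] {prompt.strip()} [/INST]"


print(format_mistral_instruct("What are some popular tourist spots in San Francisco?"))
# prints: [INST] What are some popular tourist spots in San Francisco? [/INST]
```

The returned string can be passed to `lorax_client.generate(...)` in place of the hand-written prompt.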

Streaming

from predibase import Predibase

pb = Predibase(api_token="<PREDIBASE API TOKEN>")

lorax_client = pb.deployments.client("mistral-7b-instruct-v0-2") # Insert deployment name here

for resp in lorax_client.generate_stream("[INST] What are some popular tourist spots in San Francisco? [/INST]"):
    # Skip special tokens (for example, end-of-sequence markers)
    if not resp.token.special:
        print(resp.token.text, sep="", end="", flush=True)
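If you want the full response as a single string rather than printing tokens as they arrive, the same filtering logic can be collected into a helper. This is a minimal sketch: `collect_stream` is a hypothetical helper, and the `fake_stream` objects are stand-ins that only mimic the `token.text` and `token.special` fields used above, so the example runs without an API token.

```python
from types import SimpleNamespace


def collect_stream(responses):
    """Join the text of all non-special tokens from an iterable of stream responses."""
    return "".join(r.token.text for r in responses if not r.token.special)


# Stand-in for lorax_client.generate_stream(...), so the sketch runs offline.
fake_stream = [
    SimpleNamespace(token=SimpleNamespace(text="Golden", special=False)),
    SimpleNamespace(token=SimpleNamespace(text=" Gate", special=False)),
    SimpleNamespace(token=SimpleNamespace(text="</s>", special=True)),
]

print(collect_stream(fake_stream))  # prints "Golden Gate"
```

With a real client, you would pass the result of `lorax_client.generate_stream(...)` to `collect_stream` instead of `fake_stream`.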

Next steps

  • Try out the full example to fine-tune and prompt an adapter in Predibase using the SDK
  • Don't want to code at all? Use the UI to connect a dataset and start fine-tuning an adapter.
  • Coming from OpenAI? Check out our migration guides for serving
  • Explore additional complete examples
  • See how Predibase integrates with other frameworks in the ecosystem

Get in touch

Reach out to us at support@predibase.com or join us on Discord for any questions, comments, or feedback!