The Problem

Calling multiple LLM providers involves messy code: each provider has its own package and its own input/output formats. LangChain is too bloated and doesn’t provide consistent I/O across all LLM APIs.

I distinctly remember when we added support for Azure and Cohere to our ‘chat-with-your-data’ application. APIs can fail (e.g. Azure read-timeout errors), so we wrote a fallback strategy that iterates through a list of models in case one fails (e.g. if Azure fails, try Cohere first, then OpenAI, etc.).

Provider-specific implementations meant our for-loops grew increasingly large (think: multiple ~100-line if/else statements), and since we made LLM API calls in several places in our code, our debugging problems exploded, because we now had copies of those for-loop chunks scattered across our codebase.
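To make that concrete, here is a minimal sketch of the kind of per-provider fallback loop we mean, written against the pre-1.0 OpenAI SDK and the Cohere SDK. The function name, model names, and error handling are illustrative rather than our exact code (and a real Azure branch needs its own api_type/engine settings on top of this):

import os
import openai
import cohere

def call_with_fallbacks(prompt, providers=("azure", "cohere", "openai")):
    for provider in providers:
        try:
            if provider in ("azure", "openai"):
                # OpenAI-style chat API: expects a list of role/content messages
                resp = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    messages=[{"role": "user", "content": prompt}],
                )
                return resp["choices"][0]["message"]["content"]
            elif provider == "cohere":
                # Cohere: expects a raw prompt string, returns a generations list
                co = cohere.Client(os.environ["COHERE_API_KEY"])
                resp = co.generate(model="command-nightly", prompt=prompt)
                return resp.generations[0].text
        except Exception:
            continue  # this provider failed, try the next one
    raise RuntimeError("All providers failed")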

The Solution: simplified LLM API calls

Abstraction. That’s when we decided to abstract our API calls behind a single class. We needed I/O that just worked, so we could spend our time improving other parts of our system (error-handling/model-fallback logic, etc.).

This class needed to do 3 things really well:

That’s when we built LiteLLM - a simple package to call Azure, Anthropic, OpenAI, Cohere and Replicate.

import os
from litellm import completion

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)

It’s already live in production for us (and 500+ others) and has handled 50k+ queries.
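Because every provider now goes through the same completion() call and returns the same OpenAI-style response shape, the fallback loop from earlier collapses to a few lines. A minimal sketch; the model list is illustrative, and the Azure entry depends on your own deployment name and credentials:

import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

def completion_with_fallbacks(messages, models=("azure/my-deployment", "command-nightly", "gpt-3.5-turbo")):
    for model in models:
        try:
            response = completion(model=model, messages=messages)
            # identical output shape regardless of provider
            return response["choices"][0]["message"]["content"]
        except Exception:
            continue  # this model failed, try the next one
    raise RuntimeError("All models failed")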

LiteLLM manages:

In case of error, LiteLLM also provides:

Conclusion

LiteLLM simplifies calling LLM providers with a drop-in replacement for the OpenAI ChatCompletion endpoint, making it easy to add new models to your system in minutes (reusing the same exception-handling, token logic, etc. you already wrote for OpenAI).
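For example, because LiteLLM’s docs describe mapping provider errors onto the OpenAI exception types, an except block written against the pre-1.0 OpenAI SDK should also catch failures from Cohere or Azure calls made through completion(). A minimal sketch; the model name and handling are placeholders:

import openai  # pre-1.0 SDK; LiteLLM maps provider errors onto these exception types
from litellm import completion

messages = [{"role": "user", "content": "Hello, how are you?"}]

try:
    response = completion(model="command-nightly", messages=messages)
except openai.error.OpenAIError as e:
    # the same handler you already wrote for OpenAI errors
    print(f"provider call failed: {e}")
    response = None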