The Problem

Calling multiple LLM providers involves messy code: each provider has its own package and its own input/output formats. LangChain is too bloated and doesn’t provide consistent I/O across all LLM APIs.

I distinctly remember when we added support for Azure and Cohere to our ‘chat-with-your-data’ application. APIs can fail (e.g. Azure read-timeout errors), so we wrote a fallback strategy that iterates through a list of models in case one fails (e.g. if Azure fails, try Cohere first, then OpenAI, etc.).

Provider-specific implementations meant our for-loops grew increasingly large (think: multiple ~100-line if/else statements), and since we made LLM API calls in several places in our code, our debugging problems exploded, because we now had copies of those for-loop chunks scattered across our codebase.
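To make that concrete, here is a minimal sketch of the kind of per-provider fallback loop we mean, written against the pre-1.0 OpenAI SDK and the Cohere SDK. The function name, model names, and error handling are illustrative rather than our exact code (and a real Azure branch needs its own api_type/engine settings on top of this):

import os
import openai
import cohere

def call_with_fallbacks(prompt, providers=("azure", "cohere", "openai")):
    for provider in providers:
        try:
            if provider in ("azure", "openai"):
                # OpenAI-style chat API: expects a list of role/content messages
                resp = openai.ChatCompletion.create(
                    model="gpt-3.5-turbo",
                    messages=[{"role": "user", "content": prompt}],
                )
                return resp["choices"][0]["message"]["content"]
            elif provider == "cohere":
                # Cohere: expects a raw prompt string, returns a generations list
                co = cohere.Client(os.environ["COHERE_API_KEY"])
                resp = co.generate(model="command-nightly", prompt=prompt)
                return resp.generations[0].text
        except Exception:
            continue  # this provider failed, try the next one
    raise RuntimeError("All providers failed")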

The Solution: simplified LLM API calls

Abstraction. That’s when we decided to abstract our API calls behind a single class. We needed I/O that just worked, so we could spend our time improving other parts of our system (error-handling/model-fallback logic, etc.).

This class needed to do 3 things really well:

That’s when we built LiteLLM - a simple package to call Azure, Anthropic, OpenAI, Cohere and Replicate.

import os
from litellm import completion

## set ENV variables
os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

messages = [{ "content": "Hello, how are you?","role": "user"}]

# openai call
response = completion(model="gpt-3.5-turbo", messages=messages)

# cohere call
response = completion(model="command-nightly", messages=messages)

It’s already live in production for us (and 500+ others) and has handled 50k+ queries.
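Because every provider now goes through the same completion() call and returns the same OpenAI-style response shape, the fallback loop from earlier collapses to a few lines. A minimal sketch; the model list is illustrative, and the Azure entry depends on your own deployment name and credentials:

import os
from litellm import completion

os.environ["OPENAI_API_KEY"] = "openai key"
os.environ["COHERE_API_KEY"] = "cohere key"

def completion_with_fallbacks(messages, models=("azure/my-deployment", "command-nightly", "gpt-3.5-turbo")):
    for model in models:
        try:
            response = completion(model=model, messages=messages)
            # identical output shape regardless of provider
            return response["choices"][0]["message"]["content"]
        except Exception:
            continue  # this model failed, try the next one
    raise RuntimeError("All models failed")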

LiteLLM manages:

In case of error, LiteLLM also provides:

Conclusion

LiteLLM simplifies calling LLM providers with a drop-in replacement for the OpenAI ChatCompletion endpoint, making it easy to add new models to your system in minutes (reusing the same exception-handling, token logic, etc. you already wrote for OpenAI).
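For example, because LiteLLM’s docs describe mapping provider errors onto the OpenAI exception types, an except block written against the pre-1.0 OpenAI SDK should also catch failures from Cohere or Azure calls made through completion(). A minimal sketch; the model name and handling are placeholders:

import openai  # pre-1.0 SDK; LiteLLM maps provider errors onto these exception types
from litellm import completion

messages = [{"role": "user", "content": "Hello, how are you?"}]

try:
    response = completion(model="command-nightly", messages=messages)
except openai.error.OpenAIError as e:
    # the same handler you already wrote for OpenAI errors
    print(f"provider call failed: {e}")
    response = None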