Built-in Models
HuggingFace hub
Out of the box, we support models implemented with the transformers framework and hosted on the HuggingFace model hub.
Here is how to work with HuggingFace models:
from redlite.model.hf_model import HFModel
model = HFModel('mistralai/Mistral-7B-Instruct-v0.2', device_map='auto')
Please see Reference documentation for more detail and available parameters.
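Once constructed, a model instance is used as a callable: it takes a list of chat messages and returns the generated text (see the ParrotModel example below for the same convention). The toy function here is an illustrative stand-in, not part of redlite, so it runs without downloading any weights:

```python
# Illustrative stand-in for a model (not part of redlite): anything with
# this calling convention -- a list of {"role", "content"} messages in,
# a string out -- is used the same way as HFModel.
def toy_model(messages):
    """Echo back the last user message with a prefix."""
    last = messages[-1]["content"]
    return f"You said: {last}"

reply = toy_model([{"role": "user", "content": "Hello"}])
print(reply)  # -> You said: Hello
```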
OpenAI Conversational Models
OpenAI conversational models (such as gpt-3.5 or gpt-4) are available. Note that one needs a valid
OpenAI API key to run predictions with these models.
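A common way to supply the key, assuming OpenAIModel is backed by the official OpenAI SDK (which reads the key from the environment when none is passed explicitly), is to set an environment variable before constructing the model. The key value below is a placeholder:

```python
import os

# Assumption: the underlying OpenAI SDK picks up the key from the
# OPENAI_API_KEY environment variable. The value here is a placeholder.
os.environ["OPENAI_API_KEY"] = "sk-your-key-here"
```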
Here is how to use OpenAI models:
from redlite.model.openai_model import OpenAIModel
model = OpenAIModel(...)
Please see Reference documentation for more detail and available parameters.
Google Gemini models
A family of Google Gemini models is available. Please make sure that you have enabled generative API access in the Google Cloud console. You also need to create an API key, see https://aistudio.google.com/u/1/apikey.
Here is how to use a Gemini model:
from redlite.model.gemini_model import GeminiModel
model = GeminiModel(...)
Please see Reference documentation for more detail and available parameters.
Model servers compatible with OpenAI API
Use OpenAIModel to access third-party services that are compatible with the OpenAI API (for example, NVIDIA research). Set the base_url parameter to point to the alternative endpoint.
Here is how to use the OpenAI API to access third-party services:
from redlite.model.openai_model import OpenAIModel
base_url = "https://integrate.api.nvidia.com/v1"
model = OpenAIModel(base_url=base_url, model="meta/llama3-instruct", ...)
Please see Reference documentation for more detail and available parameters.
Anthropic Chat Models
Anthropic conversational models (e.g. Claude) are available. Note that one needs a valid Anthropic API key to run predictions with these models.
Here is how to use Anthropic models:
from redlite.model.anthropic_model import AnthropicModel
model = AnthropicModel(...)
Please see Reference documentation for more detail and available parameters.
AWS Bedrock Text Generation Models
Use AwsBedrockModel to access models hosted on AWS Bedrock. Note that one needs valid AWS credentials (an access key ID and secret access key).
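Credentials can be supplied in the standard AWS way. A sketch, assuming AwsBedrockModel relies on boto3's default credential resolution (environment variables, a shared credentials file, or an attached role); all values below are placeholders:

```python
import os

# Placeholder credentials -- boto3 (assumed to back AwsBedrockModel)
# resolves these environment variables, or falls back to profiles in
# ~/.aws/credentials and to instance roles.
os.environ["AWS_ACCESS_KEY_ID"] = "your-access-key-id"
os.environ["AWS_SECRET_ACCESS_KEY"] = "your-secret-access-key"
os.environ["AWS_DEFAULT_REGION"] = "us-east-1"
```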
Here is how to use AWS Bedrock models:
from redlite.model.aws_bedrock_model import AwsBedrockModel
model = AwsBedrockModel(...)
Please see Reference documentation for more detail and available parameters.
LlamaCpp Models
https://github.com/ggerganov/llama.cpp
Use LlamaCppModel to access models runnable by the llama.cpp inference engine (for example, models in GGUF format). Thanks to its highly optimized
inference path, many smaller models can be evaluated on a CPU with reasonable performance.
Here is how to use LlamaCppModel class:
from redlite.model.llamacpp_model import LlamaCppModel
model = LlamaCppModel('models/mistral-instruct-7b-Q4-K-M.gguf', n_ctx=512, max_tokens=512)
Please see Reference documentation for more detail and available parameters.
Dumb models for testing
ParrotModel
A model that parrots back the last user message. Useful to establish performance baselines.
from redlite.model import ParrotModel
model = ParrotModel()
assert model([{"role": "user", "content": "Hello"}]) == "Hello"
CannedModel
A model that returns the same (canned) response regardless of user input. Useful to establish performance baselines.
from redlite.model import CannedModel
model = CannedModel("Bye")
assert model([{"role": "user", "content": "Hello"}]) == "Bye"
Model Wrappers
See the Model Wrappers documentation for utility classes that wrap models to modify their behavior.
Custom models
Custom models can be easily integrated, see the Customization Guide.
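For a flavor of what a custom model can look like, here is a minimal sketch under the assumption (consistent with the ParrotModel example above) that a model is a callable taking a list of chat messages and returning a string. The class name and the name attribute are illustrative only; consult the Customization Guide for the exact interface:

```python
# Illustrative sketch, not the official interface -- see the
# Customization Guide. Assumption: a model is a callable that maps a
# list of {"role", "content"} messages to a response string.
class ReverseModel:
    """A toy custom model that reverses the last user message."""

    name = "reverse-model"  # hypothetical metadata attribute

    def __call__(self, messages):
        return messages[-1]["content"][::-1]

model = ReverseModel()
print(model([{"role": "user", "content": "Hello"}]))  # -> olleH
```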