Quick start
The best way to learn the redlite API is to run a few scripts. Let's do it!
Overview
- Install redlite and dependencies
- Write your script
- Load dataset from HuggingFace
- Define your own model
- Define your own metric
- Call the run function
- Run the benchmark
- Review the results
Installation
python3.11 -m venv .venv
. .venv/bin/activate
pip install redlite[all]
Note: we've chosen to install all dependencies for simplicity. In production we advise
installing only the necessary components, to avoid bloat and transitive dependency conflicts.
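For example, if all you need is the core library for this tutorial, a leaner install might look like the command below (which optional features live behind the [all] extra is a detail of the redlite package; check its documentation before slimming a production install):

```shell
# Leaner install: core package only, without the optional extras.
pip install redlite
```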
Write your script
Create a *.py file.
You may check the samples for inspiration.
Here we will write one from scratch.
Load dataset from HuggingFace
First, import the load_dataset function and call it to download a dataset:
from redlite import load_dataset
dataset = load_dataset('hf:innodatalabs/rt-factcc')
This loads the dataset from https://huggingface.co/datasets/innodatalabs/rt-factcc.
Define your model
We will create a simple model that always says "Hello, humans! I am alive!", regardless of the
context.
from redlite import NamedModel
def happy(messages):
    return "Hello, humans! I am alive!"
model = NamedModel('happy', happy)
We first defined a function that takes the conversation (which is a list of messages), and produces the response string.
Then we created a model, gave it the name "happy", and passed our function as the second argument.
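You can sanity-check the model function by calling it directly before wrapping it. The chat-style message dicts with "role" and "content" keys used here are an assumption for illustration; the exact message schema comes from the dataset:

```python
def happy(messages):
    return "Hello, humans! I am alive!"

# The function ignores the conversation entirely and always
# returns the same response string.
reply = happy([{"role": "user", "content": "What is 2 + 2?"}])
print(reply)  # Hello, humans! I am alive!
```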
Note: it is important to be disciplined when naming models. Analytical tools of redlite identify models by their names. If two different models have the same name, grouping and score aggregations will be messed up.
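One simple discipline (a suggested convention, not a redlite requirement) is to encode the variant or version directly into the name string you pass to NamedModel:

```python
# Hypothetical naming convention: base name plus a version suffix,
# so "happy-v1" and "happy-v2" are never confused in the analytics.
base, version = "happy", 2
name = f"{base}-v{version}"
print(name)  # happy-v2
```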
Define your metric
A metric is a function that takes the expected string and the actual response, and grades the response against the
expected one, returning a number from 0.0 (bad) to 1.0 (great).
from redlite import NamedMetric
def score(expected, actual):
    if expected == actual:
        return 1.0
    if 'happy' in actual:
        return 0.5
    return 0.0
metric = NamedMetric('simple-metric', score)
We first defined a function that computes the score.
Then we created a metric object with the name "simple-metric" and passed the scoring function as the second argument.
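Before wiring the metric into a run, it can help to spot-check the scoring function on a few pairs:

```python
def score(expected, actual):
    if expected == actual:
        return 1.0
    if 'happy' in actual:
        return 0.5
    return 0.0

# Exact match earns full credit, a mention of 'happy'
# earns partial credit, anything else scores zero.
assert score("hello", "hello") == 1.0
assert score("hello", "I am happy today") == 0.5
assert score("hello", "goodbye") == 0.0
```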
Note: it is important to be disciplined when naming your metrics. Make sure that the metric name is unique. Just like with model naming, analytical tools consider the name as the metric's identity. Having two different metrics use the same name will bring havoc into the analysis.
Call the run function
Finally, we take the dataset, model, and metric and pass them to the run() function.
Here is the complete script:
from redlite import load_dataset, run, NamedModel, NamedMetric

dataset = load_dataset('hf:innodatalabs/rt-factcc')

def happy(messages):
    return "Hello, humans! I am alive!"

model = NamedModel('happy', happy)

def score(expected, actual):
    if expected == actual:
        return 1.0
    if 'happy' in actual:
        return 0.5
    return 0.0

metric = NamedMetric('simple-metric', score)

run(
    model=model,
    dataset=dataset,
    metric=metric,
)
Run the benchmark
To run the benchmark, just execute the script.
Assuming that we named the script file my_script.py, here is the command:
python my_script.py
You should see it running. Since the model is pretty much fake, and the metric computation is very light, the benchmark will finish in a few seconds.
You may get the following output on your terminal screen:
RedLite run forward-coordinator-1:
model : happy
dataset: hf:innodatalabs/rt-factcc
metric : simple-metric
100%|█████████████████████| 100/100 [00:00<00:00, 382.20it/s]
Smile! All done!
Review the results
Start the redlite UI server:
redlite server
This command starts a server on port 8000. Open your browser and navigate to http://localhost:8000.
You should now see the UI.
Advanced: Re-scoring existing runs
If you want to apply a different metric to an existing run, use the rescore function.
Consider this scenario: you ran a benchmark on your dataset. It was a long and/or expensive run. The name of that
run is, say, "hello-gai-42".
Now you want to see how the same answers would be scored by a different metric.
This can be done efficiently like this:
from redlite import rescore

metric = MyNewExcitingMetric()

rescore(
    run="hello-gai-42",
    metric=metric,
)
This will create a new run from the existing one, re-computing all scores.
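Conceptually, re-scoring reuses the stored (expected, actual) pairs from the original run and applies the new metric to them, without calling the model again. A minimal sketch of the idea (not redlite's internals):

```python
# Saved (expected, actual) pairs from a previous run.
saved = [
    ("Consistent", "Consistent"),
    ("Inconsistent", "Hello, humans! I am alive!"),
]

def strict_score(expected, actual):
    # Hypothetical stricter metric: exact match only, no partial credit.
    return 1.0 if expected == actual else 0.0

# Re-scoring is just the new metric applied to the stored pairs;
# the (possibly expensive) model is never invoked.
scores = [strict_score(e, a) for e, a in saved]
print(scores)  # [1.0, 0.0]
```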