Quick start

The best way to learn the redlite API is to run a few scripts. Let's do it!

Overview

  1. Install redlite and dependencies
  2. Write your script
    • Load dataset from HuggingFace
    • Define your own model
    • Define your own metric
    • Call the run function
  3. Run the benchmark
  4. Review the results

Installation

python3.11 -m venv .venv
. .venv/bin/activate
pip install redlite[all]

Note: we've chosen to install all dependencies for simplicity. In production we advise installing only the necessary components, to avoid bloat and transitive dependency conflicts.

Write your script

Create a *.py file. You may check the samples for inspiration.

Here we will write one from scratch.

Load dataset from HuggingFace

First, import load_dataset function and call it to download a dataset:

from redlite import load_dataset

dataset = load_dataset('hf:innodatalabs/rt-factcc')

This loads the dataset from https://huggingface.co/datasets/innodatalabs/rt-factcc.
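Each item in the dataset pairs a conversation with a reference answer. As a rough sketch (the exact keys shown here are an assumption, not a documented contract; inspect the dataset card for the real schema), an item looks roughly like this:

```python
# Hypothetical illustration of a single dataset item (keys are assumed,
# not guaranteed by the redlite API): a list of chat messages plus the
# reference answer that the metric will compare against.
item = {
    "id": "sample-0",
    "messages": [
        {"role": "system", "content": "Judge the claim against the text."},
        {"role": "user", "content": "Text: ... Claim: ..."},
    ],
    "expected": "CORRECT",
}

# The model receives item["messages"]; the metric compares the
# model's reply with item["expected"].
print(item["expected"])
```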

Define your model

We will create a simple model that always says "Hello, humans! I am alive!", regardless of the context.

from redlite import NamedModel

def happy(messages):
    return "Hello, humans! I am alive!"

model = NamedModel('happy', happy)

We first defined a function that takes the conversation (a list of messages) and produces the response string.

Then we created a model, gave it the name "happy", and passed our function as the second argument.

Note: it is important to be disciplined when naming models. The analytical tools of redlite identify models by their names. If two different models share the same name, grouping and score aggregations will be messed up.
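The happy model ignores its input entirely. As a slightly more realistic sketch (not part of redlite), here is a model function that echoes the last user message back; it assumes each message is a dict with "role" and "content" keys, the usual chat format:

```python
# Sketch of a model function that echoes the last user message.
# Assumes each message is a dict with "role" and "content" keys;
# adapt if your dataset uses a different shape.
def echo(messages):
    users = [m["content"] for m in messages if m["role"] == "user"]
    return users[-1] if users else ""

# Wrap it exactly like the "happy" model:
# model = NamedModel("echo", echo)

print(echo([
    {"role": "system", "content": "Be terse."},
    {"role": "user", "content": "What is 2+2?"},
]))  # prints "What is 2+2?"
```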

Define your metric

A metric is a function that takes the expected string and the actual response, and grades the response against the expected one, returning a number from 0.0 (bad) to 1.0 (great).

from redlite import NamedMetric

def score(expected, actual):
    if expected == actual:
        return 1.0
    if 'happy' in actual:
        return 0.5
    return 0.0

metric = NamedMetric('simple-metric', score)

We first defined a function that computes the score.

Then we created a metric object named "simple-metric" and passed the scoring function as the second argument.

Note: it is important to be disciplined when naming your metrics. Make sure that each metric name is unique. Just like with model naming, the analytical tools treat the name as the metric's identity. Having two different metrics share the same name will bring havoc into the analysis.
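Exact-match scoring is brittle. As an illustration (this is plain Python, not part of redlite), here is a word-overlap F1 metric you could wrap in a NamedMetric the same way:

```python
# Sketch of a word-overlap F1 metric: harmonic mean of precision and
# recall over whitespace-separated, lowercased tokens.
def f1_score(expected, actual):
    exp = expected.lower().split()
    act = actual.lower().split()
    # Count tokens shared between the two strings (with multiplicity).
    common = sum(min(exp.count(w), act.count(w)) for w in set(act))
    if common == 0:
        return 0.0
    precision = common / len(act)
    recall = common / len(exp)
    return 2 * precision * recall / (precision + recall)

print(f1_score("hello world", "hello world"))        # 1.0
print(f1_score("hello world", "world hello there"))  # ~0.8
```

You would register it just like the simple metric above, e.g. NamedMetric('word-f1', f1_score).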

Call the run function

Finally, we take the dataset, model, and metric, and pass them to the run() function.

Here is the complete script:

from redlite import load_dataset, NamedModel, NamedMetric, run

dataset = load_dataset('hf:innodatalabs/rt-factcc')

def happy(messages):
    return "Hello, humans! I am alive!"

model = NamedModel('happy', happy)

def score(expected, actual):
    if expected == actual:
        return 1.0
    if 'happy' in actual:
        return 0.5
    return 0.0

metric = NamedMetric('simple-metric', score)

run(
    model=model,
    dataset=dataset,
    metric=metric,
)

Run the benchmark

To run the benchmark, just execute the script. Assuming we named the script file my_script.py, here is the command:

python my_script.py

You should see it running. Since the model is pretty much fake and the metric computation is very light, the benchmark will finish in a few seconds.

You should see output similar to this on your terminal:

RedLite run forward-coordinator-1:
        model  : happy
        dataset: hf:innodatalabs/rt-factcc
        metric : simple-metric
100%|█████████████████████| 100/100 [00:00<00:00, 382.20it/s]
Smile! All done!

Review the results

To browse the results, start the built-in server:

redlite server

This command will start a server on port 8000. Open your browser and navigate to http://localhost:8000.

You should now see the UI.

Advanced: Re-scoring existing runs

If you want to apply a different metric to an existing run, use the rescore function.

Consider this scenario: you ran a benchmark on your dataset. It was a long and/or expensive run. The name of that run is, say, "hello-gai-42". Now you want to see how the same answers would be scored by a different metric.

This can be done efficiently like this:

from redlite import rescore

metric = MyNewExcitingMetric()  # your new metric object

rescore(
    run="hello-gai-42",
    metric=metric,
)

This will create a new run from the existing one, re-computing all scores.
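For instance, you might rescore a run with a case-insensitive exact-match metric. The metric below is an illustrative sketch (the run name "hello-gai-42" is just the example name from above):

```python
# Sketch: a case-insensitive, whitespace-tolerant exact-match metric
# that could be applied to an existing run via rescore().
def ci_match(expected, actual):
    return 1.0 if expected.strip().lower() == actual.strip().lower() else 0.0

# metric = NamedMetric("ci-exact-match", ci_match)
# rescore(run="hello-gai-42", metric=metric)

print(ci_match("Hello", "  hello "))  # 1.0
```

Because rescore reuses the stored model answers, no model calls are made; only the metric runs again.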