Labels: bug (Something isn't working), repro (question and issues related to reproducibility)
Description
I ran the model on the BRIGHT benchmark using the following code:
```python
import torch
import mteb

# Instruction prompt used for BRIGHT queries
prompts_dict = {
    "BrightRetrieval": "Given a Post, retrieve relevant passages that help answer the post",
}

tasks = mteb.get_tasks(tasks=["BrightRetrieval"])
evaluation = mteb.MTEB(tasks=tasks)

# Load ReasonIR-8B in bfloat16
model = mteb.get_model(
    "ReasonIR/ReasonIR-8B",
    model_kwargs={"torch_dtype": torch.bfloat16},
    prompts_dict=prompts_dict,
)

evaluation.run(
    model,
    save_predictions=True,
    output_folder="results",
    encode_kwargs={"batch_size": 1},
)
```

The results are as follows:
| | Bio. | Earth. | Econ. | Psy. | Rob. | Stack. | Sus. | Leet. | Pony | AoPS | TheoQ. | TheoT. | Avg. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ReasonIR | 24.31 | 30.83 | 24.27 | 28.95 | 18.40 | 21.68 | 20.57 | 18.14 | 9.49 | 4.84 | 18.21 | 26.42 | 20.51 |
Originally posted by @whybe-choi in #3221 (comment)
A possible solution would be to create a separate task per subset, as sketched below.
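A minimal sketch of that per-subset approach: it assumes `mteb.get_task` accepts an `hf_subsets` filter (present in recent mteb releases) and that the subset names below match BRIGHT's dataset configs; both are assumptions and may need adjusting.

```python
import torch
import mteb

# BRIGHT domain subsets; names are assumed to match the dataset configs
SUBSETS = [
    "biology", "earth_science", "economics", "psychology", "robotics",
    "stackoverflow", "sustainable_living", "leetcode", "pony", "aops",
    "theoremqa_questions", "theoremqa_theorems",
]

prompts_dict = {
    "BrightRetrieval": "Given a Post, retrieve relevant passages that help answer the post",
}

model = mteb.get_model(
    "ReasonIR/ReasonIR-8B",
    model_kwargs={"torch_dtype": torch.bfloat16},
    prompts_dict=prompts_dict,
)

for subset in SUBSETS:
    # Assumption: get_task supports an `hf_subsets` filter in recent mteb versions
    task = mteb.get_task("BrightRetrieval", hf_subsets=[subset])
    evaluation = mteb.MTEB(tasks=[task])
    evaluation.run(
        model,
        save_predictions=True,
        output_folder=f"results/{subset}",  # one results folder per subset
        encode_kwargs={"batch_size": 1},
    )
```

Writing each subset to its own output folder also keeps the per-domain scores separate, which makes it easier to compare against the reported BRIGHT numbers.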