-
Notifications
You must be signed in to change notification settings - Fork 97
docs: reasoning quickstart #110
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
docs: reasoning quickstart #110
Conversation
✅ Deploy Preview for vllm-semantic-router ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
👥 vLLM Semantic Team NotificationThe following members have been identified for the changed files in this PR and have been automatically assigned: 📁
|
|
can refer to it: https://docusaurus.io/zh-CN/docs/next/migration/v3#common-mdx-problems |
Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
Signed-off-by: Jintao Zhang <zhangjintao9020@gmail.com>
6668536
to
927f1bc
Compare
|
||
# Map concrete model names to a reasoning family | ||
model_config: | ||
"deepseek-v3": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: deepseek-v31 (since v3 is not a reasoning model)
"deepseek-v3": | ||
reasoning_family: "deepseek" | ||
preferred_endpoints: ["endpoint1"] | ||
"qwen3-7b": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: qwen3-30b (which is a reasoning model)
score: 1.0 | ||
- model: deepseek-v3 | ||
score: 0.8 | ||
- model: qwen3-7b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qwen3-30b
model_scores: | ||
- model: openai/gpt-oss-20b | ||
score: 1.0 | ||
- model: deepseek-v3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deepseek-v31
- model: qwen3-7b | ||
score: 0.8 | ||
|
||
- name: general |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at the moment, all the categories must be from mmlu-pro. There is no general
there. You can create an issue to support generic, free style categories and we can map the mmlu-pro categories to them.
score: 0.8 | ||
|
||
# A safe default when no category is confidently selected | ||
default_model: qwen3-7b |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qwen3-30b
@tao12345666333 thanks for writing this up! Some nit but most looks good to me. |
- A model only gets reasoning fields if it has a model_config.<MODEL>.reasoning_family that maps to a reasoning_families entry. | ||
- DeepSeek/Qwen3 (chat_template_kwargs): the router injects chat_template_kwargs only when reasoning is enabled. When disabled, no chat_template_kwargs are added. | ||
- GPT/GPT-OSS (reasoning_effort): when reasoning is enabled, the router sets reasoning_effort based on the category (fallback to default_reasoning_effort). When reasoning is disabled, if the request already contains reasoning_effort and the model’s family type is reasoning_effort, the router preserves the original value; otherwise it is absent. | ||
- For more stable classification, you can add category descriptions in config and keep them semantically distinctive. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rootfs It seems like OpenAIRouter.CategoryDescriptions
Category.ReasoningDescription
Category.Description
aren't used in the code (i.e., don't affect performance) currently. Will we use it in the future?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nice catch! the reasoning description is no-op, information only.
Option B: Docker Compose | ||
- docker compose up -d | ||
- Exposes Envoy at http://localhost:8801 (proxying /v1/* to backends via the router) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you think #73 (review) can be satisfied here?
What type of PR is this?
docs: reasoning quickstart
What this PR does / why we need it:
Which issue(s) this PR fixes:
Fixes #51
Release Notes: Yes/No