-
Notifications
You must be signed in to change notification settings - Fork 30.3k
Open
Labels
Description
System Info
transformers
version: 4.53.3- Platform: Linux-6.14.0-1014-gcp-x86_64-with-glibc2.39
- Python version: 3.12.3
- Huggingface_hub version: 0.34.4
- Safetensors version: 0.6.2
- Accelerate version: not installed
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.7.1+cu126 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: No
- GPU type: NVIDIA L4
Who can help?
No response
Information
- The official example scripts
- My own modified scripts
Tasks
- An officially supported task in the
examples
folder (such as GLUE/SQuAD, ...) - My own task or dataset (give details below)
Reproduction
vllm command to run the model :
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 VLLM_USE_CUDA=1 vllm serve ibm-granite/granite-20b-functioncalling --tool-call-parser granite --enable-auto-tool-choice --tensor-parallel-size 2 --port 8000 --trust-remote-code --chat-template granite_template.jinja
with the chat-template file :
{% for message in messages %}
{% if message['role'] == 'system' %}
<|start_of_role|>system<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'user' %}
<|start_of_role|>user<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'assistant' %}
<|start_of_role|>assistant<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'tool' or message['role'] == 'function' %}
<|start_of_role|>tool<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% endif %}
{% endfor %}
<|start_of_role|>assistant<|end_of_role|>
curl request :
curl --location 'http://0.0.0.0:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer EMPTY' \
--data '{
"model": "ibm-granite/granite-20b-functioncalling",
"messages": [
{
"role": "system",
"content": "When outputting a tool call, include all the arguments in the tool call token stream. Don'\''t emit partial tool calls. For instance, for the add tool call, output both arguments x and y. Please use tools as much as possible, instead of relying on your own knowledge when it makes more sense to use the tools. You may ONLY call tools that are explicitly listed in the provided tool schema. Never invent tool names or parameters that are not in the schema. If no tool is appropriate, answer the user naturally without calling a tool. When calling a tool, include all required arguments exactly as specified in the schema."
},
{
"role": "user",
"content": "tell me the weather of San Fransisco today"
}
],
"temperature": 0.3,
"top_p": 0.9,
"tool_choice": "auto",
"tools": [
{
"type": "function",
"function": {
"name": "add",
"description": "adds two numbers",
"parameters": {
"type": "object",
"properties": {
"x": { "type": "number" },
"y": { "type": "number" }
},
"required": ["x", "y"]
}
}
},
{
"type": "function",
"function": {
"name": "multiply",
"description": "multiply two numbers",
"parameters": {
"type": "object",
"properties": {
"x": { "type": "number" },
"y": { "type": "number" }
},
"required": ["x", "y"]
}
}
},
{
"type": "function",
"function": {
"name": "search_web",
"description": "Search the internet for current information, news, and real-time data. MUST be used for any queries about recent events, current news, latest information, or anything that requires up-to-date data beyond the model'\''s training cutoff",
"parameters": {
"type": "object",
"properties": {
"query": {"type": "string"}
},
"required": ["query"]
}
}
}
]
}'
Expected behavior
the response to the curl request:
{
"id": "chatcmpl-6bf374155cef412383696be302539941",
"object": "chat.completion",
"created": 1757492261,
"model": "ibm-granite/granite-20b-functioncalling",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "<function_call> {\"name\": \"get_weather\", \"arguments\": '{\"location\": \"San Fransisco\", \"date\": \"today\"}'} ",
"refusal": null,
"annotations": null,
"audio": null,
"function_call": null,
"tool_calls": [],
"reasoning_content": null
},
"logprobs": null,
"finish_reason": "stop",
"stop_reason": null
}
],
"service_tier": null,
"system_fingerprint": null,
"usage": {
"prompt_tokens": 205,
"total_tokens": 239,
"completion_tokens": 34,
"prompt_tokens_details": null
},
"prompt_logprobs": null,
"kv_transfer_params": null
}
the response shows function call get_weather
while no such function call exists in the schema provided in the curl request. It should have instead used the search_web
function call.