Granite 20B Function Calling model hallucinating the tool call

### System Info

- `transformers` version: 4.53.3
- Platform: Linux-6.14.0-1014-gcp-x86_64-with-glibc2.39
- Python version: 3.12.3
- Huggingface_hub version: 0.34.4
- Safetensors version: 0.6.2
- Accelerate version: not installed
- Accelerate config: not found
- DeepSpeed version: not installed
- PyTorch version (accelerator?): 2.7.1+cu126 (CUDA)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using distributed or parallel set-up in script?: No
- Using GPU in script?: No
- GPU type: NVIDIA L4

### Who can help?

_No response_

### Information

- [ ] The official example scripts
- [ ] My own modified scripts

### Tasks

- [ ] An officially supported task in the `examples` folder (such as GLUE/SQuAD, ...)
- [ ] My own task or dataset (give details below)

### Reproduction

vllm command to run the model : 
```
VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 VLLM_USE_CUDA=1 vllm serve ibm-granite/granite-20b-functioncalling --tool-call-parser granite --enable-auto-tool-choice --tensor-parallel-size 2 --port 8000    --trust-remote-code --chat-template granite_template.jinja
```

with the chat-template file : 
```
{% for message in messages %}
{% if message['role'] == 'system' %}
<|start_of_role|>system<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'user' %}
<|start_of_role|>user<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'assistant' %}
<|start_of_role|>assistant<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'tool' or message['role'] == 'function' %}
<|start_of_role|>tool<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% endif %}
{% endfor %}
<|start_of_role|>assistant<|end_of_role|>
```

curl request : 
```
curl --location 'http://0.0.0.0:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer EMPTY' \
--data '{
  "model": "ibm-granite/granite-20b-functioncalling",
  "messages": [
    {
      "role": "system",
      "content": "When outputting a tool call, include all the arguments in the tool call token stream. Don'\''t emit partial tool calls. For instance, for the add tool call, output both arguments x and y. Please use tools as much as possible, instead of relying on your own knowledge when it makes more sense to use the tools. You may ONLY call tools that are explicitly listed in the provided tool schema. Never invent tool names or parameters that are not in the schema. If no tool is appropriate, answer the user naturally without calling a tool. When calling a tool, include all required arguments exactly as specified in the schema."
    },
    {
      "role": "user",
      "content": "tell me the weather of San Fransisco today"
    }
  ],
  "temperature": 0.3,
  "top_p": 0.9,
  "tool_choice": "auto",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "add",
        "description": "adds two numbers",
        "parameters": {
          "type": "object",
          "properties": {
            "x": { "type": "number" },
            "y": { "type": "number" }
          },
          "required": ["x", "y"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "multiply",
        "description": "multiply two numbers",
        "parameters": {
          "type": "object",
          "properties": {
            "x": { "type": "number" },
            "y": { "type": "number" }
          },
          "required": ["x", "y"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "search_web",
        "description": "Search the internet for current information, news, and real-time data. MUST be used for any queries about recent events, current news, latest information, or anything that requires up-to-date data beyond the model'\''s training cutoff",
        "parameters": {
          "type": "object",
          "properties": {
            "query": {"type": "string"}
          },
          "required": ["query"]
        }
      }
    }
  ]
}'
```

### Expected behavior

the response to the curl request: 
```
{
    "id": "chatcmpl-6bf374155cef412383696be302539941",
    "object": "chat.completion",
    "created": 1757492261,
    "model": "ibm-granite/granite-20b-functioncalling",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "<function_call> {\"name\": \"get_weather\", \"arguments\": '{\"location\": \"San Fransisco\", \"date\": \"today\"}'} ",
                "refusal": null,
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning_content": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null
        }
    ],
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 205,
        "total_tokens": 239,
        "completion_tokens": 34,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "kv_transfer_params": null
}
```

the response shows function call `get_weather` while no such function call exists in the schema provided in the curl request. It should have instead used the `search_web` function call. 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Granite 20B Function Calling model hallucinating the tool call #40785

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Granite 20B Function Calling model hallucinating the tool call #40785

Description

System Info

Who can help?

Information

Tasks

Reproduction

Expected behavior

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions