Granite 20B Function Calling model hallucinating the tool call #40785

@dvn8weil

Description

System Info

  • transformers version: 4.53.3
  • Platform: Linux-6.14.0-1014-gcp-x86_64-with-glibc2.39
  • Python version: 3.12.3
  • Huggingface_hub version: 0.34.4
  • Safetensors version: 0.6.2
  • Accelerate version: not installed
  • Accelerate config: not found
  • DeepSpeed version: not installed
  • PyTorch version (accelerator?): 2.7.1+cu126 (CUDA)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?: No
  • Using GPU in script?: No
  • GPU type: NVIDIA L4

Who can help?

No response

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

vLLM command used to serve the model:

VLLM_ALLOW_LONG_MAX_MODEL_LEN=1 VLLM_USE_CUDA=1 vllm serve ibm-granite/granite-20b-functioncalling --tool-call-parser granite --enable-auto-tool-choice --tensor-parallel-size 2 --port 8000 --trust-remote-code --chat-template granite_template.jinja

The chat template file (granite_template.jinja) passed via --chat-template:

{% for message in messages %}
{% if message['role'] == 'system' %}
<|start_of_role|>system<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'user' %}
<|start_of_role|>user<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'assistant' %}
<|start_of_role|>assistant<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'tool' or message['role'] == 'function' %}
<|start_of_role|>tool<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% endif %}
{% endfor %}
<|start_of_role|>assistant<|end_of_role|>
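One thing that may be worth checking when debugging this: the template above loops over `messages` only. The sketch below is a crude, regex-based check (not a real Jinja parser) that lists the identifiers used inside the template's `{{ ... }}` and `{% ... %}` blocks and reports whether `tools` is among them. The helper name `template_variables` is hypothetical, and whether a missing `tools` reference actually explains the behavior reported here is an assumption, not something confirmed in this issue.

```python
import re

# The chat template from this report, copied verbatim.
TEMPLATE = """{% for message in messages %}
{% if message['role'] == 'system' %}
<|start_of_role|>system<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'user' %}
<|start_of_role|>user<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'assistant' %}
<|start_of_role|>assistant<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% elif message['role'] == 'tool' or message['role'] == 'function' %}
<|start_of_role|>tool<|end_of_role|>{{ message['content'] }}<|end_of_text|>
{% endif %}
{% endfor %}
<|start_of_role|>assistant<|end_of_role|>"""


def template_variables(source: str) -> set[str]:
    """Collect identifiers used inside {{ ... }} and {% ... %} blocks.

    Hypothetical helper: a rough regex scan, so it also picks up Jinja
    keywords such as `for` and `if`. That is fine for a membership check.
    """
    names: set[str] = set()
    for block in re.findall(r"\{[{%](.*?)[}%]\}", source, flags=re.DOTALL):
        # Drop quoted string literals so 'role', 'system', etc. are not counted.
        block = re.sub(r"'[^']*'|\"[^\"]*\"", "", block)
        names.update(re.findall(r"[A-Za-z_]\w*", block))
    return names


used = template_variables(TEMPLATE)
print("references messages:", "messages" in used)
print("references tools:", "tools" in used)
```

If the second line prints `False`, the `tools` array sent in the request has no place in this template to be rendered into the prompt.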

curl request:

curl --location 'http://0.0.0.0:8000/v1/chat/completions' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer EMPTY' \
--data '{
  "model": "ibm-granite/granite-20b-functioncalling",
  "messages": [
    {
      "role": "system",
      "content": "When outputting a tool call, include all the arguments in the tool call token stream. Don'\''t emit partial tool calls. For instance, for the add tool call, output both arguments x and y. Please use tools as much as possible, instead of relying on your own knowledge when it makes more sense to use the tools. You may ONLY call tools that are explicitly listed in the provided tool schema. Never invent tool names or parameters that are not in the schema. If no tool is appropriate, answer the user naturally without calling a tool. When calling a tool, include all required arguments exactly as specified in the schema."
    },
    {
      "role": "user",
      "content": "tell me the weather of San Fransisco today"
    }
  ],
  "temperature": 0.3,
  "top_p": 0.9,
  "tool_choice": "auto",
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "add",
        "description": "adds two numbers",
        "parameters": {
          "type": "object",
          "properties": {
            "x": { "type": "number" },
            "y": { "type": "number" }
          },
          "required": ["x", "y"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "multiply",
        "description": "multiply two numbers",
        "parameters": {
          "type": "object",
          "properties": {
            "x": { "type": "number" },
            "y": { "type": "number" }
          },
          "required": ["x", "y"]
        }
      }
    },
    {
      "type": "function",
      "function": {
        "name": "search_web",
        "description": "Search the internet for current information, news, and real-time data. MUST be used for any queries about recent events, current news, latest information, or anything that requires up-to-date data beyond the model'\''s training cutoff",
        "parameters": {
          "type": "object",
          "properties": {
            "query": {"type": "string"}
          },
          "required": ["query"]
        }
      }
    }
  ]
}'

Expected behavior

Actual response to the curl request:

{
    "id": "chatcmpl-6bf374155cef412383696be302539941",
    "object": "chat.completion",
    "created": 1757492261,
    "model": "ibm-granite/granite-20b-functioncalling",
    "choices": [
        {
            "index": 0,
            "message": {
                "role": "assistant",
                "content": "<function_call> {\"name\": \"get_weather\", \"arguments\": '{\"location\": \"San Fransisco\", \"date\": \"today\"}'} ",
                "refusal": null,
                "annotations": null,
                "audio": null,
                "function_call": null,
                "tool_calls": [],
                "reasoning_content": null
            },
            "logprobs": null,
            "finish_reason": "stop",
            "stop_reason": null
        }
    ],
    "service_tier": null,
    "system_fingerprint": null,
    "usage": {
        "prompt_tokens": 205,
        "total_tokens": 239,
        "completion_tokens": 34,
        "prompt_tokens_details": null
    },
    "prompt_logprobs": null,
    "kv_transfer_params": null
}

The response contains a call to get_weather, but no such function exists in the tools schema provided in the curl request. The call is also emitted as raw text in content, while tool_calls is empty. The model should have called search_web instead.
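The mismatch can also be detected programmatically. The following is a minimal sketch, assuming the response shape shown above: it collects the tool names declared in the request and flags any called name that is not among them, looking both at the parsed `tool_calls` list and at a raw `<function_call>` left in `content`. The helper names (`declared_tool_names`, `called_tool_names`, `undeclared_calls`) are hypothetical and not part of vLLM or transformers, and the tool definitions are abbreviated to their names.

```python
import re


def declared_tool_names(tools):
    """Names of the functions declared in the request's `tools` array."""
    return {t["function"]["name"] for t in tools if t.get("type") == "function"}


def called_tool_names(message):
    """Tool names the assistant tried to call, parsed or raw."""
    names = [call["function"]["name"] for call in message.get("tool_calls") or []]
    content = message.get("content") or ""
    # Fallback: the call was not parsed and is still sitting in `content`.
    if "<function_call>" in content:
        names += re.findall(r'"name"\s*:\s*"([^"]+)"', content)
    return names


def undeclared_calls(message, tools):
    """Called names that do not appear in the declared schema."""
    allowed = declared_tool_names(tools)
    return [name for name in called_tool_names(message) if name not in allowed]


# Tool names from the curl request (parameters omitted for brevity).
tools = [
    {"type": "function", "function": {"name": "add"}},
    {"type": "function", "function": {"name": "multiply"}},
    {"type": "function", "function": {"name": "search_web"}},
]

# The assistant message from the response above, after JSON decoding.
message = {
    "role": "assistant",
    "content": """<function_call> {"name": "get_weather", "arguments": '{"location": "San Fransisco", "date": "today"}'} """,
    "tool_calls": [],
}

print(undeclared_calls(message, tools))  # ['get_weather']
```

A check like this only reports the problem on the client side. It does not change what the model generates.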
