Claude Tool Use: An AI Agent Guide for 2026

Updated: May 31, 2026

Claude tool use (also called function calling) is the mechanism that lets a Claude model decide to call external tools, such as an API, a database, or a function in our own codebase. We declare a schema in the tools parameter, the model returns a tool_use block for us to run, and we send a tool_result back into the loop until we get a final answer. In 2026 this feature is the core of building production AI agents on Claude Opus 4.8, Sonnet 4.6, and Haiku 4.5, and I will explain it with a focus on real evaluation, not just prompting and hoping it works.

Claude tool use is driven by an agentic loop that checks for stop_reason: "tool_use" and echoes a tool_result back until it sees end_turn, max_tokens, stop_sequence, or refusal (pause_turn from a server tool means re-send the conversation, not stop).
Current models that support tool use are claude-opus-4-8 (May 28, 2026, 1M context), claude-sonnet-4-6 (Feb 17, 2026, 1M), and claude-haiku-4-5-20251001. The older Sonnet 4 and Opus 4 retire on June 15, 2026.
Fine-grained tool streaming, programmatic tool calling, tool_search, and structured outputs (strict) are GA in 2026 and no longer need a beta header.
Changing a tool definition, even just its description, invalidates the prompt cache for tools + system + messages; changing only tool_choice invalidates just the messages cache.
tool_choice {"type": "any"} and {"type": "tool"} cannot be combined with extended thinking; the request fails immediately with a 400 error.

What Claude tool use is and how it differs from MCP

Claude tool use is the feature that lets the model request a call to a function we declared in the tools parameter of the Messages API, instead of only replying with text. It differs from MCP (Model Context Protocol) in that tool use is the contract between the model and our client, while MCP is a server-side protocol that wraps many tools so they can be shared across apps. When Claude actually calls an MCP server, it still uses tool use as the underlying mechanism; the call is just wrapped in an mcp_tool_use block instead.

In my testing, keeping these two terms apart matters a lot, because teams I have helped often ask, "If we use MCP, do we still need to learn tool use?" The answer is that you need to understand tool use first: the schema, tool_choice, the agentic loop, and stop_reason handling all stay the same, and MCP only changes how the tool catalog is delivered. If you cannot picture the loop yet, read the official How tool use works documentation first, then come back to the MCP section of this article.

Which Claude model to pick for tool use in 2026

As of May 2026, I recommend three main models for production agents: claude-opus-4-8 (released May 28, 2026, 1M context, 128k output), claude-sonnet-4-6 (Feb 17, 2026, 1M context), and claude-haiku-4-5-20251001 (alias claude-haiku-4-5, available since Oct 15, 2025). Older models that still work include claude-opus-4-7, claude-opus-4-6, claude-opus-4-5-20251101, claude-opus-4-1-20250805, and claude-sonnet-4-5-20250929. The 20250514 versions of Sonnet 4 and Opus 4, however, retire on June 15, 2026, so if you have not migrated yet, do it soon.

My selection rule: Haiku 4.5 for orchestration nodes that call tools often but have simple logic (for example, classify intent and then route); Sonnet 4.6 for general agents that need step-by-step thinking and manage 5-15 tools; and Opus 4.8 when the agent must plan over long horizons, has a large tool catalog, or handles reasoning tasks that previously broke on Sonnet. Do not start with Opus before you have evals, because the tool-use system prompt overhead differs by model (Opus 4.7 uses 675/804 tokens, Sonnet 4.6 uses 497/589, and Haiku 4.5 uses 496/588, for auto/none vs any/tool respectively), and it adds up to real cost faster than you expect.
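To see how that overhead accumulates, here is a rough back-of-the-envelope sketch. The token counts are the ones quoted above; the overhead_tokens helper and the monthly request volume are my own illustrative assumptions, and the result is a token count, not a price.

```python
# Tool-use system prompt overhead in tokens, as quoted above:
# (auto/none, any/tool) per model.
OVERHEAD = {
    "opus-4.7": (675, 804),
    "sonnet-4.6": (497, 589),
    "haiku-4.5": (496, 588),
}


def overhead_tokens(model: str, requests: int, forced: bool = False) -> int:
    """Total overhead tokens for `requests` calls.

    forced=True means tool_choice is any/tool; otherwise auto/none.
    """
    auto_none, any_tool = OVERHEAD[model]
    per_request = any_tool if forced else auto_none
    return per_request * requests


# Hypothetical volume: 1,000,000 requests per month.
for name in OVERHEAD:
    print(name, overhead_tokens(name, 1_000_000))
```

Multiply the result by your own input token price to turn it into a monthly cost line.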

The structure of tools, input_schema, and tool_choice you need to understand

Each entry in tools must have a name (matching the regex ^[a-zA-Z0-9_-]{1,64}$), a description (plain text), and an input_schema (a JSON Schema object), plus optional fields such as input_examples, cache_control, strict, defer_loading, and allowed_callers. tool_choice has four modes: auto (the default when tools are present), any (forces a call to at least one tool), tool with a name (forces a specific tool), and none (the default when there are no tools; added Feb 27, 2025, and usable even when the context contains tool_use/tool_result blocks).
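A cheap way to catch malformed definitions before they reach the API is a local check of the required fields. The sketch below is my own helper, not part of the SDK: validate_tool is a hypothetical name, and it only checks the three required fields described above, not the full JSON Schema.

```python
import re

# Name pattern quoted above; fullmatch avoids accepting a trailing newline.
NAME_RE = re.compile(r"[a-zA-Z0-9_-]{1,64}")


def validate_tool(tool: dict) -> list:
    """Return a list of problems found in one tool definition (empty if OK)."""
    problems = []
    name = tool.get("name")
    if not isinstance(name, str) or not NAME_RE.fullmatch(name):
        problems.append("name must match ^[a-zA-Z0-9_-]{1,64}$")
    description = tool.get("description")
    if not isinstance(description, str) or not description.strip():
        problems.append("description must be non-empty plain text")
    schema = tool.get("input_schema")
    if not isinstance(schema, dict) or schema.get("type") != "object":
        problems.append('input_schema must be a JSON Schema with "type": "object"')
    return problems


good = {
    "name": "get_order_status",
    "description": "Fetch the status of an order by its 8-digit order_id.",
    "input_schema": {"type": "object", "properties": {}},
}
bad = {"name": "get order status", "description": "", "input_schema": {}}

print(validate_tool(good))
print(validate_tool(bad))
```

Running this in CI over the whole tool catalog catches typos in names early, well before a 400 from the API does.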

Anthropic clearly recommends writing descriptions of 3-4 sentences and including input_examples, because a short one-line description is the most common cause of hallucinated parameters, especially with Haiku. The table below summarizes what each tool_choice mode can and cannot do, to give you the picture before you write code.

tool_choice	Behavior	Works with extended thinking	Where I use it most
`auto`	The model decides whether to call a tool or answer directly	Yes	General conversational agents
`any`	Forces a call to at least one tool (any of them)	No (400 error)	Structured extraction that must always return JSON
`{"type":"tool","name":"..."}`	Forces a call to the named tool	No (400 error)	Steps where you know exactly which tool to call
`none`	No tool calls; text only	Yes	The summary step when the loop ends
`disable_parallel_tool_use:true`	An option on tool_choice rather than a mode: limits tool_use to 1 block per turn	Depends on the main tool_choice	When tools have side effects that must be serialized
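The table can be turned into a pre-flight check so a bad combination fails in your own code rather than as a 400 from the API. This is a minimal sketch under my own assumptions: check_tool_choice is a hypothetical helper, and thinking_enabled stands for whatever flag your code uses to turn on extended thinking.

```python
VALID_MODES = {"auto", "any", "tool", "none"}
FORCED_MODES = {"any", "tool"}  # rejected together with extended thinking


def check_tool_choice(tool_choice: dict, thinking_enabled: bool) -> None:
    """Raise ValueError for combinations the table above marks as invalid."""
    mode = tool_choice.get("type")
    if mode not in VALID_MODES:
        raise ValueError(f"unknown tool_choice type: {mode!r}")
    if mode == "tool" and not tool_choice.get("name"):
        raise ValueError('tool_choice type "tool" requires a name')
    if thinking_enabled and mode in FORCED_MODES:
        raise ValueError(f"tool_choice {mode!r} cannot be used with extended thinking")


check_tool_choice({"type": "auto"}, thinking_enabled=True)  # passes
check_tool_choice({"type": "any"}, thinking_enabled=False)  # passes

try:
    check_tool_choice({"type": "any"}, thinking_enabled=True)
except ValueError as exc:
    print("rejected:", exc)
```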

How to write an agentic loop that neither falls apart nor runs forever

The heart of a Claude agent is a while-loop that reads the stop_reason of each response. If it equals "tool_use", extract every tool_use block, run the real functions, and send a new user message containing one {type:"tool_result", tool_use_id, content, is_error?} block per call. Repeat until you see end_turn, max_tokens, stop_sequence, or refusal. The one special case is pause_turn, which only server tools produce: send the same conversation back so Anthropic can continue, and do not treat it as end_turn.

The code below is the minimal loop I use as a template for real work. It guards against endless loops with a hard cap on iterations and logs stop_reason on every round for later debugging. (For work that needs smarter retrieval, I recommend reading more about agentic RAG with AI agents.)

import anthropic, json

client = anthropic.Anthropic()

# Declare the tool: the description is long enough for the model to know when to call it
tools = [{
    "name": "get_order_status",
    "description": (
        "Fetch the status of an order from the ERP system using an 8-digit order_id. "
        "Use it when the user asks about shipping status, refunds, or a tracking number. "
        "Do not call it unless the user has clearly provided an order_id."
    ),
    "input_schema": {
        "type": "object",
        "properties": {
            "order_id": {"type": "string", "pattern": "^[0-9]{8}$"}
        },
        "required": ["order_id"]
    },
    "input_examples": [{"order_id": "12345678"}]
}]

def run_tool(name, args):
    # In a real system this would call the ERP API
    if name == "get_order_status":
        return {"status": "shipped", "tracking": "TH9001234567"}
    raise ValueError(f"unknown tool: {name}")

messages = [{"role": "user", "content": "Please check the status of order 12345678"}]
MAX_ITERS = 8  # hard cap to prevent an endless loop

for step in range(MAX_ITERS):
    resp = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        tools=tools,
        tool_choice={"type": "auto"},
        messages=messages,
    )
    print(f"[step {step}] stop_reason={resp.stop_reason}")
    messages.append({"role": "assistant", "content": resp.content})

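    if resp.stop_reason == "pause_turn":
        continue  # server tool paused: re-send the same conversation so it can resume
    if resp.stop_reason != "tool_use":
        break  # loop ends on end_turn, max_tokens, stop_sequence, refusal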

    tool_results = []
    for block in resp.content:
        if block.type == "tool_use":
            try:
                output = run_tool(block.name, block.input)
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": json.dumps(output, ensure_ascii=False),
                })
            except Exception as e:
                tool_results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": str(e),
                    "is_error": True,
                })
    messages.append({"role": "user", "content": tool_results})
else:
    raise RuntimeError("agent loop exceeded MAX_ITERS")

When I review other people's agent code, I always stress three points: (1) check stop_reason explicitly rather than just checking whether a tool_use block exists, (2) always have MAX_ITERS, and (3) report tool errors as is_error: true instead of throwing an exception out of the loop, because otherwise the model never learns that its call was wrong.
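Points (1) and (3) can be isolated into two small pure functions that are easy to unit test without calling the API. This is a sketch of how I might factor them; next_action and safe_tool_result are hypothetical names of my own, and the action labels are arbitrary strings.

```python
import json
from typing import Callable

TERMINAL = {"end_turn", "max_tokens", "stop_sequence", "refusal"}


def next_action(stop_reason: str) -> str:
    """Map a stop_reason to what the loop should do next."""
    if stop_reason == "tool_use":
        return "run_tools"
    if stop_reason == "pause_turn":
        return "resend"  # server tool paused: send the same conversation back
    if stop_reason in TERMINAL:
        return "stop"
    raise ValueError(f"unhandled stop_reason: {stop_reason!r}")


def safe_tool_result(tool_use_id: str, fn: Callable[..., object], args: dict) -> dict:
    """Run a tool and always return a tool_result block, never raise."""
    try:
        output = fn(**args)
        content = json.dumps(output, ensure_ascii=False)
        return {"type": "tool_result", "tool_use_id": tool_use_id, "content": content}
    except Exception as exc:
        # Report the failure to the model instead of crashing the loop.
        return {
            "type": "tool_result",
            "tool_use_id": tool_use_id,
            "content": f"{type(exc).__name__}: {exc}",
            "is_error": True,
        }


def fake_lookup(order_id: str) -> dict:
    # Stand-in for a real tool implementation.
    return {"status": "shipped"}


print(next_action("pause_turn"))
print(safe_tool_result("toolu_1", fake_lookup, {"order_id": "12345678"}))
print(safe_tool_result("toolu_2", fake_lookup, {"wrong_arg": 1}))
```

Raising on an unknown stop_reason is deliberate: if a new value appears, I would rather see a loud failure than a loop that silently stops.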

Prompt caching with tools: where to put cache_control

When you build an agent with a large tool catalog, skipping the cache is burning money. Anthropic defines the cache prefix order explicitly as tools → system → messages, which means you should place cache_control: {"type": "ephemeral"} on the last tool in the array to cache the entire tools block. The point people often miss is that changing a tool definition (even editing a single word of a description) invalidates the cache at all three levels, while changing only tool_choice invalidates just the messages cache. I once saw a team writing Go whose JSON key order was not stable, which caused a 100% cache miss rate without anyone noticing (they debugged for hours before finding it).
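One way to catch that kind of silent drift is to log a fingerprint of the canonicalized tools array with every request and alert when it changes unexpectedly. The sketch below is a local check of my own (tools_fingerprint is a hypothetical helper); it is not how Anthropic computes cache keys, and it only tells you whether your own serialized definitions are stable.

```python
import hashlib
import json


def tools_fingerprint(tools: list) -> str:
    """Hash a canonical serialization of the tools array.

    sort_keys makes the result independent of dict key order; the order of
    the tools list itself is kept, since reordering tools changes the prefix.
    """
    canonical = json.dumps(tools, sort_keys=True, ensure_ascii=False, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


tool_a = {"name": "lookup_user", "description": "Look up a user by id."}
tool_b = {"description": "Look up a user by id.", "name": "lookup_user"}  # same content, other key order
tool_c = {"name": "lookup_user", "description": "Look up a user by ID."}  # one character changed

print(tools_fingerprint([tool_a]) == tools_fingerprint([tool_b]))
print(tools_fingerprint([tool_a]) == tools_fingerprint([tool_c]))
```

If the fingerprint changes between two deploys that were not supposed to touch the tools, expect a cold cache.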

The 1-hour TTL is GA, and since Feb 19, 2026 automatic caching is available without any configuration. I still recommend setting cache_control explicitly to control the breakpoints (4 at most). For details on cost and break-even calculations, I wrote a separate article on Claude API prompt caching; read it alongside Anthropic's official Prompt caching documentation.

import anthropic

client = anthropic.Anthropic()

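# Several large tools: put cache_control on the last one to cache the whole block.
# The "..." and {...} entries are elided placeholders; fill in real values before running.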
tools = [
    {"name": "search_kb", "description": "...", "input_schema": {...}},
    {"name": "create_ticket", "description": "...", "input_schema": {...}},
    {"name": "lookup_user", "description": "...", "input_schema": {...}},
    {
        "name": "escalate_to_human",
        "description": "Hand the case over to the support team when it is too complex for automation",
        "input_schema": {
            "type": "object",
            "properties": {"reason": {"type": "string"}},
            "required": ["reason"],
        },
        "cache_control": {"type": "ephemeral"},  # caches every tool above as well
    },
]

resp = client.messages.create(
    model="claude-opus-4-8",
    max_tokens=2048,
    tools=tools,
    tool_choice={"type": "auto"},
    system=[{
        "type": "text",
        "text": "You are a tier-1 support agent...",
        "cache_control": {"type": "ephemeral"},
    }],
    messages=[{"role": "user", "content": "Please open a ticket about a login failure"}],
)

# Check cache hits through usage
print(resp.usage)  # cache_creation_input_tokens, cache_read_input_tokens

Streaming tool calls and input_json_delta

When you stream tool use over SSE you see events in this order: content_block_start with type=tool_use and an empty input, followed by multiple input_json_delta events carrying partial_json (which you must concatenate yourself), and finally content_block_stop. The good news is that fine-grained tool streaming has been GA since Feb 5, 2026, so you no longer need to send the fine-grained-tool-streaming-2025-05-14 header. The feature streams parameter deltas without first buffering them for JSON validation, so they reach the client sooner, which cuts latency in UIs that show agent progress (I saw this myself on my latest agent UI, where it nearly halved the time-to-first-token users perceive). The trade-off is that an individual delta is a fragment, not valid JSON on its own, and a stream cut off by max_tokens can leave the input incomplete.

So be careful: with or without fine-grained streaming, and on any SDK version, you must accumulate partial_json from every delta and call json.loads only at content_block_stop. Parsing individual deltas gives you JSONDecodeError almost every time.

import anthropic, json

client = anthropic.Anthropic()

with client.messages.stream(
    model="claude-haiku-4-5",
    max_tokens=1024,
    tools=tools,  # same tools as the earlier example
    messages=[{"role": "user", "content": "Check order 12345678"}],
) as stream:
    accum = {}  # partial JSON buffer per tool_use block, keyed by content block index
    for event in stream:
        if event.type == "content_block_start" and event.content_block.type == "tool_use":
            accum[event.index] = {"name": event.content_block.name, "buf": ""}
        elif event.type == "content_block_delta" and event.delta.type == "input_json_delta":
            accum[event.index]["buf"] += event.delta.partial_json
        elif event.type == "content_block_stop" and event.index in accum:
            entry = accum[event.index]
            args = json.loads(entry["buf"]) if entry["buf"] else {}
            print(f"tool={entry['name']} args={args}")

    final = stream.get_final_message()
    print("stop_reason:", final.stop_reason)

Claude tool use vs MCP: when to use which

If you write the tools yourself inside a single app, plain tools are enough: they are fast, easy to debug, and give you full control over latency. But if your tool catalog must be shared across several clients (for example, a web app, an IDE plugin, and a Slack bot using the same tool set), MCP is the right answer. Anthropic currently exposes the MCP connector through the beta header mcp-client-2025-11-20 (the older mcp-client-2025-04-04 header is deprecated). The request carries mcp_servers: [{type:"url", url, name, authorization_token?}] alongside tools: [{type:"mcp_toolset", mcp_server_name, default_config?, configs?}], and the response contains mcp_tool_use / mcp_tool_result blocks with a server_name.
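To make the request shape concrete, here is a sketch that only builds the two request fields described above as plain Python data. The field names come from the paragraph above; build_mcp_request, the server name "orders", and the example.com URL are hypothetical, and how you attach the beta header depends on your SDK version, so I leave the actual API call out.

```python
from typing import Optional

MCP_BETA = "mcp-client-2025-11-20"  # beta header named above


def build_mcp_request(
    server_name: str,
    server_url: str,
    token: Optional[str] = None,
    local_tools: Optional[list] = None,
) -> dict:
    """Build the mcp_servers and tools fields for one MCP server."""
    server = {"type": "url", "url": server_url, "name": server_name}
    if token:
        server["authorization_token"] = token  # optional field
    # Ordinary client tools and the MCP toolset can sit in the same array.
    tools = list(local_tools or [])
    tools.append({"type": "mcp_toolset", "mcp_server_name": server_name})
    return {"mcp_servers": [server], "tools": tools}


# Hypothetical server; replace with your own name and URL.
params = build_mcp_request("orders", "https://mcp.example.com/mcp")
print(params)
```

The mcp_server_name in the toolset entry has to match the name in mcp_servers, which is why the helper takes the name once and uses it in both places.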

A simple way to compare them: tool use is like "a function embedded in the app", while MCP is "a service contract between apps". The two patterns do not replace each other, and you can use both in the same request. If you want to start building your own MCP server, see my separate article on building an MCP server with Python, which goes into the server-side schema details this article does not cover.

Production pitfalls: hallucinated tools, infinite loops, and schema drift

After shipping several Claude agents, I have collected the five most common pitfalls: (1) loops that never end because stop_reason is not checked explicitly; the fix is a MAX_ITERS hard cap plus tracking total tool tokens; (2) hallucinated tools when descriptions are too short; always write 3-4 sentences and include input_examples; (3) silent cache misses because tool serialization is not stable; (4) forcing tools in the wrong context, such as using tool_choice: "any" with extended thinking; and (5) treating pause_turn as end_turn, which cuts off answers from server tools midway. I was bitten by (5) myself when shipping an early web search feature and spent two full days debugging before I understood it.
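For pitfall (1), a hard cap on iterations is the minimum; I also like to stop on repeated tool errors and on a token budget. The sketch below is one way to bundle those checks. LoopGuard and its default thresholds are my own assumptions, and in a real loop the token count would come from the usage field of each response.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoopGuard:
    max_iters: int = 8
    max_consecutive_errors: int = 3
    max_total_tokens: int = 50_000
    iters: int = 0
    consecutive_errors: int = 0
    total_tokens: int = 0

    def record(self, had_error: bool, tokens: int) -> Optional[str]:
        """Record one loop round; return a stop reason, or None to continue."""
        self.iters += 1
        self.total_tokens += tokens
        # Any successful round resets the error streak.
        self.consecutive_errors = self.consecutive_errors + 1 if had_error else 0
        if self.consecutive_errors >= self.max_consecutive_errors:
            return "repeated tool errors"
        if self.total_tokens > self.max_total_tokens:
            return "token budget exceeded"
        if self.iters >= self.max_iters:
            return "max iterations reached"
        return None


guard = LoopGuard(max_consecutive_errors=2)
print(guard.record(had_error=True, tokens=1200))   # None: keep going
print(guard.record(had_error=True, tokens=1100))   # "repeated tool errors"
```

When the guard returns a reason, log it next to the last stop_reason and escalate to a human instead of retrying.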

For tool catalogs larger than 30-40 tools, use tool_search (GA Feb 17, 2026) so the model finds the relevant tools itself instead of having everything dumped into context; this saves thousands of tokens per request and also reduces hallucination. For work that orchestrates many tools with large intermediate data, I enable programmatic tool calling so Claude writes code in a sandbox that calls the tools itself. Intermediate results never flow back into the model context, which cuts token cost noticeably. See the GA announcement in the Claude Platform release notes.

How to build an AI agent with Claude tool use for production

I will not ship an agent without: (1) an eval harness with at least 50 real cases and their expected tool calls (without it, you only have a demo); (2) strict tool use (strict: true) on tools with side effects, together with tool_choice: {type:"any"}, to guarantee schema match; (3) a per-case token budget, with alerting when p95 exceeds it; (4) structured logging of every stop_reason and every tool input/output with its tool_use_id, for later replay; and (5) a circuit breaker for external tools that fail repeatedly.
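For item (1), the harness does not need to be elaborate to be useful. Here is a minimal sketch that compares the tool calls an agent makes against the expected ones. Everything in it is illustrative: the case format, run_eval, and fake_agent are my own assumptions, and in practice the agent callable would wrap your real loop and return the tool_use blocks it saw.

```python
from typing import Callable


def run_eval(cases: list, agent: Callable[[str], list]) -> float:
    """Return the fraction of cases whose tool calls match exactly, in order."""
    passed = 0
    for case in cases:
        actual = agent(case["prompt"])
        if actual == case["expected_calls"]:
            passed += 1
        else:
            print(f"FAIL {case['id']}: expected {case['expected_calls']}, got {actual}")
    return passed / len(cases)


cases = [
    {
        "id": "order-status",
        "prompt": "Check order 12345678",
        "expected_calls": [{"name": "get_order_status", "input": {"order_id": "12345678"}}],
    },
    {
        "id": "no-order-id",
        "prompt": "Where is my order?",
        "expected_calls": [],  # the agent should ask for the id, not call a tool
    },
]


def fake_agent(prompt: str) -> list:
    # Stand-in for the real loop: always calls the tool with a fixed id.
    return [{"name": "get_order_status", "input": {"order_id": "12345678"}}]


print(run_eval(cases, fake_agent))
```

Exact matching is strict on purpose; once it gets noisy, I would relax it per case (for example, ignore argument order or allow extra read-only calls) rather than globally.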

Designing tool boundaries matters just as much. I try not to build "do_everything" tools. Instead I split them into small tools with clear descriptions that take no more than 4-5 input fields and return output with a stable schema, because large tools make Claude choose wrongly more often and are hard to debug. For server tools such as web_search, web_fetch, code_execution, and tool_search, remember that any of them may return pause_turn; loop and send the same conversation back so Anthropic can continue. It is not an error. Finally, remember that the Anthropic-schema client tools (bash, text_editor, computer, memory) come with their own predefined schemas: you do not declare a new JSON Schema, you reference them by their standard type and name. For details, read Anthropic's Tool use overview documentation.

Frequently asked questions

How does Claude function calling differ from OpenAI function calling in 2026?

Both use JSON Schema as the common language for declaring tools, but Claude separates tool_use and tool_result into content blocks inside messages, which makes tool calls easier to trace, and its stop_reason values cover more cases (for example, pause_turn for server tools and refusal for safety). Claude also has Anthropic-hosted server tools such as web_search, code_execution, and tool_search that you do not host yourself. Claude's prompt caching covers the tools block directly, for which OpenAI does not yet have an equivalent that works the same way.

If I set tool_choice to any, can Claude still reply with text?

No. tool_choice: "any" forces the model to always emit at least one tool_use block. If you want the model to choose between calling a tool and answering itself, use auto; if you want a text summary after the loop, set none on the final request. An additional restriction is that any and tool cannot be combined with extended thinking; the request fails with a 400 error.

Do I still need to send the fine-grained-tool-streaming header in 2026?

No. Fine-grained tool streaming has been GA since Feb 5, 2026 and works through ordinary Messages API streaming. If you are on a very old SDK, though, update to the latest anthropic Python SDK so the new event types are handled correctly.

Which model should I start building an agent with to optimize cost?

Start with claude-haiku-4-5 for prototypes and tool-heavy orchestration. If evals fail, upgrade to claude-sonnet-4-6, which has the best balance of quality and price in the current lineup. Reserve claude-opus-4-8 for nodes that need heavy reasoning or have a genuinely large tool catalog. Jumping to Opus from the start without evals tends to inflate cost 5-10x without a clear quality gain.

If the agent loop never ends, where should I start debugging?

Start by logging stop_reason on every round and find out why it is stuck at tool_use. Most often a tool keeps returning is_error with no recovery, or a tool description says "retry until success" with no stop condition. Fix it by setting MAX_ITERS, adding a system prompt instruction to escalate after N attempts, and making sure the tool_result content is a string or valid JSON, not a raw object.