This guide shows you how to add inline citations to Knowledge Graph responses and customize query behavior using the query_config parameter. Inline citations give you source-verified responses, and the other query_config settings let you tune Knowledge Graph queries for specific use cases.

Response with inline citations

Below is an example of a response from a Knowledge Graph that includes inline citations:
Acme Corp's flagship product line includes three
main categories: industrial tools, consumer
electronics, and automotive parts. The industrial
tools division offers precision manufacturing
equipment with advanced automation capabilities
[Acme-Product-Catalog.pdf](a1b2c3d4-e5f6-7890-abcd-ef1234567890).

Corresponding references array

Below is the corresponding references object that contains a direct snippet from the source file that was used to support the response:
{"files": [
  {
    "text": "Industrial Tools Division: Our precision manufacturing equipment features advanced automation capabilities with real-time monitoring systems that reduce operational downtime by up to 40% through predictive maintenance algorithms.",
    "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "score": 0.95,
    "page": 12,
    "cite": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  }
]}

Understand query configuration

The query_config parameter allows you to add inline citations to Knowledge Graph responses and fine-tune how Knowledge Graphs search, rank, and retrieve content. Inline citations show which specific sources support each part of the response, enabling you to verify information and trace claims back to their origins. You can use this parameter with:
  • Chat completions with Knowledge Graph tools
  • Direct Knowledge Graph queries via the /v1/graphs/question endpoint

Example

Here’s a chat completion example that enables inline citations in Knowledge Graph responses. The inline_citations: true parameter adds source references directly in the response text, while other parameters control search behavior:
  • inline_citations: true - Enables inline citations showing which sources support each part of the response
  • grounding_level: 0.2 - Keeps responses closely tied to source material (20% creative interpretation allowed)
  • search_weight: 60 - Balances keyword and semantic search (60% keyword search, 40% semantic search)
  • keyword_threshold: 0.6 - Requires 60% keyword match for content to be included
  • semantic_threshold: 0.8 - Requires 80% semantic similarity for content to be included
curl --location --request POST 'https://api.writer.com/v1/chat' \
  --header "Authorization: Bearer $WRITER_API_KEY" \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "palmyra-x5",
    "messages": [
      {
        "role": "user",
        "content": "What are the key features of our product?"
      }
    ],
    "tools": [
      {
        "type": "graph",
        "function": {
          "description": "Search company knowledge base",
          "graph_ids": ["<GRAPH_ID>"],
          "subqueries": true,
          "query_config": {
            "grounding_level": 0.2,
            "search_weight": 60,
            "keyword_threshold": 0.6,
            "semantic_threshold": 0.8,
            "inline_citations": true
          }
        }
      }
    ]
  }'

Configure request parameters

| Name | Type | Range | Default | Description |
| --- | --- | --- | --- | --- |
| inline_citations | Boolean | True/False | False | Whether to include inline citations within the response text. See Inline citations for details. |
| max_subquestions | Integer | 1-10 | 6 | Maximum number of sub-questions to generate when processing complex queries. Set higher to improve detail, set lower to reduce response time. See Max sub-questions for details. |
| search_weight | Integer | 0-100 | 50 | Controls the balance between keyword and semantic search in ranking results. See Search weight for details. |
| grounding_level | Number | 0.0-1.0 | 0.0 | Controls how closely responses must match the source material. Set lower for grounded outputs, higher for creativity. See Grounding level for details. |
| max_snippets | Integer | 5-25 (recommended) | 30 | Maximum number of text snippets to retrieve from the Knowledge Graph for context. Works in concert with search_weight to control best matches vs broader coverage. Note: While the parameter technically supports 1-60, values below 5 may return no results due to RAG implementation. Recommended range is 5-25. See Max snippets for details. |
| max_tokens | Integer | 100-8000 | 4000 | Maximum number of tokens the model can generate in the response. See Max tokens for details. |
| keyword_threshold | Number | 0.0-1.0 | 0.7 | Threshold for keyword-based matching when searching Knowledge Graph content. Set higher for stricter relevance, lower for broader range. See Keyword threshold for details. |
| semantic_threshold | Number | 0.0-1.0 | 0.7 | Threshold for semantic similarity matching when searching Knowledge Graph content. Set higher for stricter relevance, lower for broader range. See Semantic threshold for details. |
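
The ranges in the table above can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: validate_query_config and QUERY_CONFIG_SPEC are hypothetical names, the max_snippets bounds use the technically supported 1-60 range rather than the recommended 5-25, and flagging unknown keys is an assumption about what is useful locally, not a statement about how the API treats them.

```python
from typing import Any

# Ranges and defaults copied from the parameter table.
# max_snippets uses the technically supported 1-60 range; 5-25 is the recommended range.
QUERY_CONFIG_SPEC: dict[str, tuple[type, float, float, Any]] = {
    "max_subquestions": (int, 1, 10, 6),
    "search_weight": (int, 0, 100, 50),
    "grounding_level": (float, 0.0, 1.0, 0.0),
    "max_snippets": (int, 1, 60, 30),
    "max_tokens": (int, 100, 8000, 4000),
    "keyword_threshold": (float, 0.0, 1.0, 0.7),
    "semantic_threshold": (float, 0.0, 1.0, 0.7),
}


def validate_query_config(config: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the config fits the documented ranges."""
    problems: list[str] = []
    for name, value in config.items():
        if name == "inline_citations":
            if not isinstance(value, bool):
                problems.append("inline_citations must be a boolean")
            continue
        if name not in QUERY_CONFIG_SPEC:
            # Assumption: unknown keys are reported locally; the API may behave differently.
            problems.append(f"unknown parameter: {name}")
            continue
        kind, low, high, _default = QUERY_CONFIG_SPEC[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif kind is int and not isinstance(value, int):
            problems.append(f"{name} must be an integer")
        elif not low <= value <= high:
            problems.append(f"{name}={value} is outside {low}-{high}")
    return problems
```

Calling validate_query_config with the query_config from the chat completion example returns an empty list, while a value such as search_weight: 150 produces a single message naming the documented range.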

Parameter details

Inline citations

If you enable inline_citations, inline citations in the response text correspond to entries in the references array. Each citation contains a file name and a cite ID that you can use to locate the specific source.

Correlate citations with references

Inline citations in the response text correspond to entries in the references array. Each citation uses the cite field from the references array as its identifier. If you turn off inline citations, the cite field is null.
Citation format: citations appear as [filename.pdf](cite), where the cite value matches the cite field from the references array.
Example correlation: the following example shows how inline citations correlate with the references array.
Response text:
Acme Corp's flagship product line includes three main categories: industrial tools, consumer electronics, and automotive parts. The industrial tools division offers precision manufacturing equipment with advanced automation capabilities [Acme-Product-Catalog.pdf](a1b2c3d4-e5f6-7890-abcd-ef1234567890). These tools feature real-time monitoring and predictive maintenance systems that reduce downtime by up to 40% [Acme-Product-Catalog.pdf](a1b2c3d4-e5f6-7890-abcd-ef1234567890).
References object:
{"files": [
  {
    "text": "Industrial Tools Division: Our precision manufacturing equipment features advanced automation capabilities with real-time monitoring systems that reduce operational downtime by up to 40% through predictive maintenance algorithms.",
    "fileId": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    "score": 0.95,
    "page": 12,
    "cite": "a1b2c3d4-e5f6-7890-abcd-ef1234567890"
  },
  {
    "text": "Consumer Electronics: Smart thermostat series with AI-powered learning algorithms that automatically adapt to user behavior patterns, optimizing energy consumption and comfort levels.",
    "fileId": "b2c3d4e5-f6g7-8901-bcde-f23456789012",
    "score": 0.88,
    "page": 25,
    "cite": "b2c3d4e5-f6g7-8901-bcde-f23456789012"
  }
]}
Citation mapping:
  • Both citations in the response text, [Acme-Product-Catalog.pdf](a1b2c3d4-e5f6-7890-abcd-ef1234567890), use the cite field from the first reference object
  • The second reference object is not cited in this response text; a citation that drew on it would carry its cite value, b2c3d4e5-f6g7-8901-bcde-f23456789012, in the same position
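
The correlation described above can be automated. The sketch below assumes the [filename](cite) citation format and the {"files": [...]} references shape shown in this section; match_citations and CITATION_PATTERN are hypothetical helper names, not part of any SDK.

```python
import re

# Matches citations of the form [filename.pdf](cite-id).
CITATION_PATTERN = re.compile(r"\[([^\]]+)\]\(([^)]+)\)")


def match_citations(answer: str, references: dict) -> list[dict]:
    """Pair each inline citation in `answer` with the reference that has the same cite value."""
    # Index file references by cite, skipping entries whose cite is null or missing.
    by_cite = {ref["cite"]: ref for ref in references.get("files", []) if ref.get("cite")}
    matches = []
    for file_name, cite in CITATION_PATTERN.findall(answer):
        matches.append({
            "file_name": file_name,
            "cite": cite,
            "reference": by_cite.get(cite),  # None when no reference carries this cite value
        })
    return matches
```

Each returned entry carries the cited file name, the cite ID, and the matching reference object, so the supporting snippet and page number can be shown next to the claim. A citation with no matching reference is returned with reference set to None rather than raising an error.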

Max sub-questions

Maximum number of sub-questions to generate when processing complex queries. Higher values allow the system to break down complex questions into more detailed sub-queries.
How it works:
  • Higher values: Improve detail by generating more sub-questions for thorough analysis
  • Lower values: Reduce response time by generating fewer sub-questions
When to adjust:
  • Increase for complex, multi-part questions that need thorough analysis
  • Decrease for simple, direct questions to reduce processing time

Search weight

Controls the balance between keyword and semantic search in ranking results.
How it works:
  • Higher values (closer to 100): Prioritize keyword-based matching
  • Lower values (closer to 0): Prioritize semantic similarity matching
When to adjust:
  • Increase for searches where exact keyword matches matter most
  • Decrease for searches where conceptual similarity is more important
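
To build intuition for the effect of search_weight, the sketch below blends two scores linearly. This is an illustration only: the actual ranking formula is not documented here, the linear blend is an assumption consistent with the "60% keyword search, 40% semantic search" description of search_weight: 60, and blended_score and both sample snippets are hypothetical.

```python
def blended_score(keyword_score: float, semantic_score: float, search_weight: int) -> float:
    """Blend two scores in [0, 1]; search_weight is the 0-100 share given to keyword matching."""
    if not 0 <= search_weight <= 100:
        raise ValueError("search_weight must be between 0 and 100")
    keyword_share = search_weight / 100
    return keyword_share * keyword_score + (1 - keyword_share) * semantic_score


# Two hypothetical snippets: one matches the query's exact words, the other its meaning.
exact_wording = {"keyword": 0.9, "semantic": 0.5}
same_meaning = {"keyword": 0.3, "semantic": 0.9}

for weight in (20, 80):
    a = blended_score(exact_wording["keyword"], exact_wording["semantic"], weight)
    b = blended_score(same_meaning["keyword"], same_meaning["semantic"], weight)
    print(weight, "exact wording ranks first" if a > b else "same meaning ranks first")
```

Under this assumed blend, a weight of 20 ranks the semantically similar snippet first, and a weight of 80 ranks the exact-wording snippet first.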

Grounding level

Controls how closely responses must be tied to source material. This is different from typical LLM temperature parameters - it specifically controls grounding to Knowledge Graph sources.
How it works:
  • Higher values (closer to 1.0): Higher creativity - allow more creative interpretation of source material
  • Lower values (closer to 0.0): Grounded outputs - stick closely to source material with minimal creativity
Examples:
  • 0.0: “According to the documentation, the API supports JSON responses” (direct quote/paraphrase)
  • 0.5: “The API documentation indicates that JSON responses are supported, which suggests additional capabilities” (interpretive)
  • 1.0: “Based on the available information, users can expect JSON responses, though other formats might be possible” (highly interpretive)
When to adjust:
  • Increase for higher creativity when you want more interpretive responses
  • Decrease for grounded outputs when factual reporting or accuracy is critical

Max snippets

Maximum number of text snippets to retrieve from the Knowledge Graph for context. Works in concert with search_weight to control best matches vs broader coverage.
How it works:
  • Lower values (5-15): Best matches - retrieve fewer, more relevant snippets
  • Higher values (15-25): Broader coverage - retrieve more snippets for comprehensive context
  • Values below 5: May return no results due to RAG implementation limitations
Important notes:
  • Recommended range: 5-25 (default is 30, which is higher than recommended)
  • Edge case: Due to RAG system behavior, you may see more snippets than requested. Use the recommended range unless you have a measured need to change it.
When to adjust:
  • Increase for broader coverage when you need comprehensive research or more context
  • Decrease for best matches when you want focused queries or concise results

Max tokens

Maximum number of tokens the model can generate in the response. This controls the length of the AI’s answer.
How it works:
  • Higher values: Allow longer, more detailed responses
  • Lower values: Generate shorter, more concise responses
When to adjust:
  • Increase for detailed analysis or comprehensive answers
  • Decrease for quick summaries or when you need faster responses

Keyword threshold

Threshold for keyword-based matching when searching Knowledge Graph content.
How it works:
  • Higher values: Stricter relevance - require stronger keyword matches
  • Lower values: Broader range - allow more lenient keyword matching
When to adjust:
  • Increase for stricter relevance when you want very precise keyword matches
  • Decrease for broader range when you want to include content with related but not exact keywords

Semantic threshold

Threshold for semantic similarity matching when searching Knowledge Graph content.
How it works:
  • Higher values: Stricter relevance - require stronger semantic similarity
  • Lower values: Broader range - allow more lenient semantic matching
Examples:
  • 0.9: Very strict - only content with high semantic similarity (for example, searching “user authentication” only finds content about “user authentication”, “login”, “sign-in”)
  • 0.7: Moderate - includes conceptually related content (for example, searching “user authentication” finds “authentication”, “security”, “access control”, “user management”)
  • 0.3: Lenient - includes tangentially related content (for example, searching “user authentication” finds “authentication”, “security”, “user management”, “database”, “API”, “web development”)
When to adjust:
  • Increase for stricter relevance when you want very semantically similar content
  • Decrease for broader range when you want to include tangentially related content

Parameter interactions and performance

How parameters work together

Some parameters interact in ways that affect both results and performance:
  • max_snippets and search_weight: Work together to control best matches vs broader coverage. max_snippets controls how many snippets are fed to the LLM in RAG.
  • max_snippets and max_tokens: max_snippets controls input context size, while max_tokens controls output response length.
  • keyword_threshold and semantic_threshold: Both filters are applied - content must pass both thresholds to be included.
  • Search weight and thresholds: Higher thresholds reduce the candidate pool, then search weight determines the balance between keyword and semantic ranking of remaining results.

Search weight vs semantic threshold

These two parameters work at different stages of the search process:
  • Semantic threshold is applied during the initial search phase, before ranking
  • Search weight is applied after filtering, when ranking the remaining results
Semantic threshold acts as a gatekeeper that determines which content gets included, while search weight controls the balance between keyword and semantic scoring in final ranking.
Example: with semantic_threshold = 0.8 and search_weight = 30, you get a small set of very relevant documents ranked more by semantic similarity than keyword matching.
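
The two stages described above can be sketched as a filter followed by a ranking step. This is a simplified model under stated assumptions, not the service's implementation: Candidate and retrieve are hypothetical names, the per-snippet scores are made up, thresholds are treated as inclusive lower bounds, and the weighted linear blend used for ranking is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    text: str
    keyword_score: float   # hypothetical 0-1 keyword match score
    semantic_score: float  # hypothetical 0-1 semantic similarity score


def retrieve(candidates, keyword_threshold=0.7, semantic_threshold=0.7,
             search_weight=50, max_snippets=30):
    """Stage 1: drop candidates below either threshold. Stage 2: rank the rest by a weighted blend."""
    kept = [c for c in candidates
            if c.keyword_score >= keyword_threshold and c.semantic_score >= semantic_threshold]
    share = search_weight / 100  # fraction of the ranking score taken from keyword matching
    kept.sort(key=lambda c: share * c.keyword_score + (1 - share) * c.semantic_score,
              reverse=True)
    return kept[:max_snippets]
```

The default argument values mirror the documented defaults. Because filtering happens first, a snippet with a very high keyword score is still excluded if it falls below semantic_threshold, whatever search_weight is set to.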

Performance considerations

  • Higher max_tokens: Increases processing time and cost
  • Lower thresholds: May return more results but with lower relevance
  • Higher max_subquestions: Increases processing time for complex queries
  • inline_citations: true: Minimal performance impact, slight increase in response size
For most use cases, use the default values and adjust only if you need specific behavior. The defaults are designed to work well for general Knowledge Graph queries.

Response format

Knowledge Graph responses include references whose structure depends on the source type: file objects for file-based sources and web objects for web-based sources. In the examples in this guide, file objects are listed under the files key of the references object.

File sources (references.file objects)

| Field | Type | Description |
| --- | --- | --- |
| text | string | The actual text snippet from the source document that supports the response |
| fileId | string | Unique identifier for the source file |
| score | number | Internal score used during the retrieval process for ranking and selecting relevant snippets |
| page | integer | Page number where the snippet was found in the source document |
| cite | string | Unique identifier used in inline citations within the response text |

Web sources (references.web objects)

| Field | Type | Description |
| --- | --- | --- |
| text | string | The actual text snippet from the web page that supports the response |
| url | string | URL of the source web page |
| title | string | Title of the web page |
| score | number | Internal score used during the retrieval process for ranking and selecting relevant snippets |
| cite | string | Unique identifier used in inline citations within the response text |
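
The field lists above map naturally onto typed records. The sketch below is one possible client-side representation: FileReference, WebReference, and parse_references are hypothetical names, the files key comes from the examples in this guide, and the web key is an assumption inferred from the "references.web" heading.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FileReference:
    text: str
    file_id: str
    score: float
    page: Optional[int]
    cite: Optional[str]  # null when inline citations are turned off


@dataclass
class WebReference:
    text: str
    url: str
    title: str
    score: float
    cite: Optional[str]


def parse_references(references: dict) -> tuple[list[FileReference], list[WebReference]]:
    """Convert a references object into typed lists; missing keys yield empty lists."""
    files = [
        FileReference(r["text"], r["fileId"], r["score"], r.get("page"), r.get("cite"))
        for r in references.get("files", [])
    ]
    # Assumption: web sources are listed under a "web" key.
    web = [
        WebReference(r["text"], r["url"], r["title"], r["score"], r.get("cite"))
        for r in references.get("web", [])
    ]
    return files, web
```

Typed records make it harder to confuse fileId with cite, which hold the same value in the examples here but are documented as separate fields.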

Usage examples

Chat completions with Knowledge Graph tool

curl --location --request POST 'https://api.writer.com/v1/chat' \
  --header "Authorization: Bearer $WRITER_API_KEY" \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "model": "palmyra-x5",
    "messages": [
      {
        "role": "user",
        "content": "What are the key features of our product?"
      }
    ],
    "tools": [
      {
        "type": "graph",
        "function": {
          "description": "Search company knowledge base",
          "graph_ids": ["<GRAPH_ID>"],
          "subqueries": true,
          "query_config": {
            "grounding_level": 0.2,
            "search_weight": 60,
            "keyword_threshold": 0.6,
            "semantic_threshold": 0.8,
            "inline_citations": true
          }
        }
      }
    ]
  }'

Direct Knowledge Graph query

curl --location --request POST 'https://api.writer.com/v1/graphs/question' \
  --header "Authorization: Bearer $WRITER_API_KEY" \
  --header 'Content-Type: application/json' \
  --data-raw '{
    "graph_ids": ["<GRAPH_ID>"],
    "question": "What are the key features of our product?",
    "query_config": {
      "grounding_level": 0.2,
      "keyword_threshold": 0.6,
      "semantic_threshold": 0.8,
      "inline_citations": true
    }
  }'
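
For readers who prefer Python to curl, the same request can be assembled with the standard library. This is a minimal sketch that mirrors the endpoint, headers, and body fields of the curl command above; build_question_request is a hypothetical helper name, and the function only builds the request without sending it.

```python
import json
import urllib.request

API_URL = "https://api.writer.com/v1/graphs/question"


def build_question_request(api_key: str, graph_ids: list[str], question: str,
                           query_config: dict) -> urllib.request.Request:
    """Build (but do not send) a POST request equivalent to the curl command above."""
    body = {"graph_ids": graph_ids, "question": question, "query_config": query_config}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

To send the request, pass the returned object to urllib.request.urlopen and decode the response body with json.loads. Keeping construction separate from sending makes the payload easy to inspect or log before any network call.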

Common configuration patterns

Research and analysis

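Use this configuration for comprehensive research tasks where you need thorough analysis with source verification. Higher sub-questions and snippets provide more context, while inline citations help track sources. Note that max_snippets is set to 50 here, which is above the recommended 5-25 range but within the supported 1-60 range.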
{
  "max_subquestions": 8,
  "search_weight": 60,
  "grounding_level": 0.1,
  "max_snippets": 50,
  "max_tokens": 7000,
  "keyword_threshold": 0.6,
  "semantic_threshold": 0.7,
  "inline_citations": true
}

Quick answers

Use this configuration for fast, focused responses where speed and precision matter more than comprehensive analysis. Strict thresholds ensure high relevance with minimal processing time.
{
  "max_subquestions": 3,
  "search_weight": 80,
  "grounding_level": 0.0,
  "max_snippets": 15,
  "max_tokens": 2000,
  "keyword_threshold": 0.8,
  "semantic_threshold": 0.8,
  "inline_citations": false
}

Creative content generation

Use this configuration when you want the AI to interpret and build upon source material creatively. Lower thresholds allow more diverse content, while higher grounding level enables interpretive responses.
{
  "max_subquestions": 6,
  "search_weight": 40,
  "grounding_level": 0.6,
  "max_snippets": 30,
  "max_tokens": 5000,
  "keyword_threshold": 0.5,
  "semantic_threshold": 0.6,
  "inline_citations": false
}
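
The three patterns above can be kept in one place and adjusted per request. The sketch below copies the values from the configurations in this section; PRESETS and query_config_for are hypothetical names, and the preset labels are chosen here for illustration.

```python
PRESETS = {
    "research": {
        "max_subquestions": 8, "search_weight": 60, "grounding_level": 0.1,
        "max_snippets": 50, "max_tokens": 7000,
        "keyword_threshold": 0.6, "semantic_threshold": 0.7, "inline_citations": True,
    },
    "quick": {
        "max_subquestions": 3, "search_weight": 80, "grounding_level": 0.0,
        "max_snippets": 15, "max_tokens": 2000,
        "keyword_threshold": 0.8, "semantic_threshold": 0.8, "inline_citations": False,
    },
    "creative": {
        "max_subquestions": 6, "search_weight": 40, "grounding_level": 0.6,
        "max_snippets": 30, "max_tokens": 5000,
        "keyword_threshold": 0.5, "semantic_threshold": 0.6, "inline_citations": False,
    },
}


def query_config_for(preset: str, **overrides) -> dict:
    """Return a copy of the named preset with any keyword overrides applied."""
    if preset not in PRESETS:
        raise KeyError(f"unknown preset: {preset}")
    # Build a new dict so the shared preset is never mutated.
    return {**PRESETS[preset], **overrides}
```

For example, query_config_for("quick", inline_citations=True) returns the quick-answer settings with citations switched on, leaving the stored preset unchanged.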

Best practices

  1. Start with defaults: Begin with the default configuration and adjust based on your specific needs
  2. Test incrementally: Change one parameter at a time to understand its effect
  3. Consider your use case: Different applications, like research, Q&A, and content generation, benefit from different configurations
  4. Monitor performance: Track how different configurations affect response quality and processing time
  5. Balance precision and recall: Higher thresholds give more precise results but may miss relevant content
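
The "test incrementally" practice above can be made systematic by generating configurations that each differ from a baseline in exactly one parameter. This is a small sketch with a hypothetical helper, one_at_a_time; how you score each variant (response quality, latency) is left to your own evaluation.

```python
def one_at_a_time(base: dict, candidates: dict[str, list]) -> list[tuple[str, dict]]:
    """Return (label, config) pairs that each differ from `base` in exactly one parameter."""
    variants = []
    for name, values in candidates.items():
        for value in values:
            if base.get(name) == value:
                continue  # identical to the baseline, nothing to compare
            variants.append((f"{name}={value}", {**base, name: value}))
    return variants
```

Running each returned configuration against the same set of questions, and comparing the results with the baseline, isolates the effect of a single parameter.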

Next steps