# Ensemble Reasoning

Ensemble Reasoning is a new methodology that joins reasoning LLMs with an ensemble of modules that can further the reasoning process. These modules include supporting LLMs, machine learning models (such as GNNs), knowledge graph query services, and others. By moving much of "tool use" into the reasoning process of the LLM, Ensemble Reasoning can dramatically improve the speed, cost, and capability of AI agents.

For a high-level discussion of Ensemble Reasoning, see this article: <https://blog.vital.ai/2025/01/13/agents-and-ensemble-reasoning/>

The core functionality is implemented in this GitHub repository:\
<https://github.com/vital-ai/vital-llm-reasoner>

A deployable server that incorporates the core functionality is available in this GitHub repository:\
<https://github.com/vital-ai/vital-llm-reasoner-server>

The server can be deployed as a Docker container in an ARM Linux environment that includes NVIDIA GPU(s).

The current implementation uses the QwQ 32B Preview Model:\
<https://huggingface.co/Qwen/QwQ-32B-Preview>

Other reasoning models will be supported going forward.

Inference for the reasoning model runs on vLLM or Llama.cpp, and the same server infrastructure is used to serve the ensemble.

Reasoning tokens are consumed in a streaming context from the primary reasoning model and piped to the ensemble.  Tokens produced by the ensemble are streamed back into the primary model and appended to the current reasoning trace to further the reasoning process.
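The loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the project's implementation: the `<ensemble>` and `<result>` markers, `scripted_model`, `toy_ensemble`, and `run_reasoning` are all hypothetical names, and the "model" is a scripted generator standing in for a real streaming backend such as vLLM or Llama.cpp.

```python
from typing import Callable, Iterator, List

# Hypothetical delimiters; the real project may mark requests differently.
CALL_OPEN, CALL_CLOSE = "<ensemble>", "</ensemble>"
RESULT_OPEN, RESULT_CLOSE = "<result>", "</result>"


def scripted_model(trace: List[str]) -> Iterator[str]:
    """Stand-in for a streaming reasoning model that continues a trace."""
    if RESULT_OPEN not in trace:
        # No knowledge injected yet, so the "model" asks the ensemble.
        yield from ["Need", "capital", "of", "France.",
                    CALL_OPEN, "capital_of", "France", CALL_CLOSE]
    else:
        yield from ["Answer:", "Paris."]


def toy_ensemble(request: str) -> List[str]:
    """Stand-in ensemble member: answers a request with result tokens."""
    facts = {"capital_of France": "Paris"}
    return [RESULT_OPEN, facts.get(request, "unknown"), RESULT_CLOSE]


def run_reasoning(
    model: Callable[[List[str]], Iterator[str]],
    ensemble: Callable[[str], List[str]],
    max_rounds: int = 4,
) -> str:
    """Consume the model stream, pipe requests to the ensemble, and append
    the ensemble's tokens to the same reasoning trace."""
    trace: List[str] = []
    for _ in range(max_rounds):
        pending = None
        request_tokens: List[str] = []
        in_request = False
        for token in model(trace):
            trace.append(token)
            if token == CALL_OPEN:
                in_request = True
                request_tokens = []
            elif token == CALL_CLOSE and in_request:
                pending = " ".join(request_tokens)
                break  # pause the primary stream at the request boundary
            elif in_request:
                request_tokens.append(token)
        if pending is None:
            break  # the model finished without asking the ensemble
        # Ensemble tokens join the trace; the next round resumes from it.
        trace.extend(ensemble(pending))
    return " ".join(trace)
```

Calling `run_reasoning(scripted_model, toy_ensemble)` returns a single trace in which the ensemble's result sits between the model's request and its final answer. A real implementation would operate on the inference engine's token stream rather than restarting a generator each round.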

There may be highly specific dependencies on CUDA-specific functions, versions of vLLM/Llama.cpp, versions of PyTorch, or the specific reasoning model version in order to support manipulating the token inference stream of the primary model, especially in a performant way.

Ensemble members (also called "ensemble tools") are being implemented in the core GitHub repository. These implementations are generally wrappers or connectors to tools such as KGraphService for knowledge graph queries.
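As a rough picture of what such a wrapper could look like, the sketch below defines a small tool interface and a registry that dispatches requests by tool name. Every name here (`EnsembleTool`, `InMemoryGraphTool`, `ToolRegistry`, the "subject predicate" request format) is hypothetical and chosen for illustration; it does not reflect the actual classes in the repository, the KGraphService API, or KGraphLang syntax.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterable, Tuple

Triple = Tuple[str, str, str]


class EnsembleTool(ABC):
    """Hypothetical common interface for an ensemble member."""

    name: str

    @abstractmethod
    def handle(self, request: str) -> str:
        """Turn a request from the reasoner into text for the trace."""


class InMemoryGraphTool(EnsembleTool):
    """Toy stand-in for a knowledge graph connector (not KGraphService)."""

    name = "kgraph"

    def __init__(self, triples: Iterable[Triple]) -> None:
        self._triples = list(triples)

    def handle(self, request: str) -> str:
        # Assumed request format: "<subject> <predicate>".
        parts = request.split(maxsplit=1)
        if len(parts) != 2:
            return "malformed request"
        subject, predicate = parts
        objects = [o for s, p, o in self._triples
                   if s == subject and p == predicate]
        return ", ".join(objects) if objects else "no match"


class ToolRegistry:
    """Routes a request to the ensemble member registered under a name."""

    def __init__(self) -> None:
        self._tools: Dict[str, EnsembleTool] = {}

    def register(self, tool: EnsembleTool) -> None:
        self._tools[tool.name] = tool

    def dispatch(self, tool_name: str, request: str) -> str:
        tool = self._tools.get(tool_name)
        if tool is None:
            return f"unknown tool: {tool_name}"
        return tool.handle(request)
```

Keeping every member behind one narrow interface means the code that manages the token stream only needs to know a tool name and a request string, whatever service sits behind the wrapper.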

A framework is currently in development in the core GitHub repository to manage requests to the ensemble tools and to optimize the flow of information back into the primary model without stalling inference. This may involve reordering ensemble requests and tuning the reasoner prompts to arrive at a just-in-time (JIT) framework that delivers knowledge to the reasoner just as the reasoner requires it. Wherever possible, ensemble members should run entirely in the same container as the primary model, use a cache of off-container data (including warming or pre-populating it), and/or issue highly optimized queries when those queries leave the container and go over the network.
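One way to picture the caching and non-stalling ideas above is a small `asyncio` request manager that starts lookups early, shares in-flight work, and serves repeats from a local cache. This is a minimal sketch under assumptions, not the framework under development: `EnsembleRequestManager`, `prefetch`, `warm`, and `get` are hypothetical names, and the `fetch` callable stands in for whatever off-container query a real ensemble member would make.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable


class EnsembleRequestManager:
    """Hypothetical manager that caches and deduplicates ensemble requests."""

    def __init__(self, fetch: Callable[[str], Awaitable[str]]) -> None:
        self._fetch = fetch                      # assumed async lookup function
        self._cache: Dict[str, str] = {}         # completed results
        self._in_flight: Dict[str, asyncio.Task] = {}  # started, not finished

    def prefetch(self, query: str) -> None:
        """Start a lookup without waiting for it.

        Must be called from inside a running event loop.
        """
        if query in self._cache or query in self._in_flight:
            return
        self._in_flight[query] = asyncio.create_task(self._fetch(query))

    async def get(self, query: str) -> str:
        """Return a result, reusing the cache or an in-flight lookup."""
        if query in self._cache:
            return self._cache[query]
        self.prefetch(query)
        task = self._in_flight[query]
        try:
            result = await task
        finally:
            self._in_flight.pop(query, None)
        self._cache[query] = result
        return result

    async def warm(self, queries: Iterable[str]) -> None:
        """Pre-populate the cache, running the lookups concurrently."""
        pending = list(queries)
        for query in pending:
            self.prefetch(query)
        await asyncio.gather(*(self.get(query) for query in pending))
```

In this sketch, a caller could invoke `prefetch` as soon as a request begins to appear in the token stream and call `get` only when the result is needed, so the lookup overlaps with ongoing inference. Reordering requests and prompt optimization, which the paragraph above also mentions, are outside the scope of this example.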

## Ensemble Members

* KGraphService via [KGraphLang](/knowledge-graph/kgraphlang.md)


