Hybrid Retrieval

CoordiNode's defining capability: graph traversal, vector similarity, and full-text search in a single atomic query. No application-side join. No cross-database round-trip.

The Problem with Separate Databases

A typical AI retrieval stack looks like:

1. Graph DB query → get related node IDs
2. Vector DB query (Pinecone / Weaviate) → filter by embedding similarity
3. Application code → intersect result sets
4. Hope that nothing changed between steps (no transactional guarantee)

This is slow, error-prone, and expensive. Any data written between steps 1 and 2 may create inconsistencies.
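The failure mode can be reproduced with a toy simulation. The sketch below is purely illustrative: the two dictionaries are hypothetical in-memory stand-ins for a graph store and a vector store, not real client libraries. A write that lands between the two reads leaves the application holding one stale ID and missing one valid match.

```python
# Hypothetical in-memory stand-ins for two independent stores.
graph_store = {"ml": {"doc1", "doc2", "doc3"}}            # topic -> related document IDs
vector_store = {"doc1": 0.2, "doc2": 0.35, "doc3": 0.9}   # document ID -> distance to the query


def graph_query(topic):
    """Step 1: read related document IDs from the graph store."""
    return set(graph_store[topic])


def vector_query(max_distance):
    """Step 2: read document IDs within max_distance from the vector store."""
    return {doc_id for doc_id, dist in vector_store.items() if dist < max_distance}


def concurrent_write():
    """Another client deletes doc2 and inserts doc4 in both stores."""
    graph_store["ml"].discard("doc2")
    vector_store.pop("doc2", None)
    graph_store["ml"].add("doc4")
    vector_store["doc4"] = 0.1


related = graph_query("ml")      # step 1 sees doc1, doc2, doc3
concurrent_write()               # a write lands between the two reads
similar = vector_query(0.4)      # step 2 sees doc1, doc4
result = related & similar       # step 3: application-side intersection

stale_ids = related - set(vector_store)              # IDs from step 1 that no longer exist
missed = (graph_store["ml"] & similar) - result      # current matches the intersection lost

print(result, stale_ids, missed)
```

Neither read is wrong on its own; the two reads simply observe different states of the data, and the application has no way to detect that.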

One Query, Three Modalities

cypher

MATCH (topic:Concept {name: "machine learning"})-[:RELATED_TO*1..2]->(related)
MATCH (related)<-[:ABOUT]-(doc:Document)
WHERE vector_distance(doc.embedding, $query_vec) < 0.4
  AND text_match(doc.body, "transformer attention")
RETURN doc.title, vector_distance(doc.embedding, $query_vec) AS relevance
ORDER BY relevance LIMIT 5

This query:

Graph traversal — follows RELATED_TO edges 1-2 hops from "machine learning"
Vector filter — keeps only documents whose embedding is within L2 distance 0.4 of $query_vec
Full-text filter — keeps only documents whose body contains "transformer" or "attention"
Ranks and returns — sorted by ascending vector distance (closest first), top 5

All three filters apply within a single MVCC snapshot — consistent by construction.
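To make the four steps concrete, here is a conceptual emulation over a tiny in-memory dataset. It is a sketch of the query's semantics as described above, not of how CoordiNode executes it; the data, the helper names, and the whitespace-based term matching are all assumptions made for illustration.

```python
import math

# Made-up data: RELATED_TO edges, ABOUT edges (concept -> documents), and documents.
related_to = {"machine learning": ["deep learning"], "deep learning": ["transformers"]}
about = {"deep learning": ["d1"], "transformers": ["d2", "d3"]}
docs = {
    "d1": {"title": "Backprop basics", "body": "gradient descent", "embedding": [0.1, 0.0]},
    "d2": {"title": "Attention explained", "body": "self attention in a transformer", "embedding": [0.0, 0.1]},
    "d3": {"title": "Accelerators", "body": "transformer hardware", "embedding": [1.0, 1.0]},
}


def concepts_within(start, max_hops):
    """Concepts reachable from start in 1..max_hops RELATED_TO hops."""
    frontier, seen = [start], set()
    for _ in range(max_hops):
        frontier = [n for c in frontier for n in related_to.get(c, []) if n not in seen]
        seen.update(frontier)
    return seen


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def hybrid_search(topic, query_vec, terms, max_distance=0.4, limit=5):
    # Graph traversal: documents attached to concepts 1 to 2 hops away.
    candidates = {d for c in concepts_within(topic, 2) for d in about.get(c, [])}
    hits = []
    for doc_id in candidates:
        doc = docs[doc_id]
        distance = l2(doc["embedding"], query_vec)        # vector filter
        words = set(doc["body"].lower().split())          # naive full-text filter
        if distance < max_distance and words & set(terms):
            hits.append((distance, doc["title"]))
    return sorted(hits)[:limit]                           # closest first, top N


results = hybrid_search("machine learning", [0.0, 0.0], ["transformer", "attention"])
print(results)
```

Here `d1` fails the text filter and `d3` fails the distance filter, so only `d2` survives. Because every step reads the same dictionaries at the same moment, the emulation has the single-snapshot property for free; the database has to provide it through MVCC.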

Extension Functions

Function	Description	Example
`vector_distance(a, b)`	L2 (Euclidean) distance	`WHERE vector_distance(n.emb, $q) < 0.5`
`vector_similarity(a, b)`	Cosine similarity (higher = more similar)	`ORDER BY vector_similarity(n.emb, $q) DESC`
`vector_dot(a, b)`	Dot product	`RETURN vector_dot(n.emb, $q) AS score`
`vector_manhattan(a, b)`	Manhattan distance	`WHERE vector_manhattan(n.emb, $q) < 2.0`
`text_match(prop, "query")`	Full-text BM25 match	`WHERE text_match(n.body, "attention transformer")`
`text_score(prop, "query")`	BM25 relevance score (0.0–1.0)	`RETURN text_score(n.body, "attention") AS score`
`point.distance(p1, p2)`	Haversine distance in metres	`WHERE point.distance(n.location, $pt) < 1000`

See Cypher Extensions for the full syntax reference including spatial predicates and encrypted search.
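For readers who want to sanity-check thresholds such as `< 0.5` or `< 2.0` outside the database, the four vector metrics in the table have standard textbook definitions. The plain-Python sketch below follows those definitions; it assumes equal-length, non-zero vectors and makes no claim about normalisation or precision inside CoordiNode itself.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: lower means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: higher means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def dot_product(a, b):
    """Unnormalised dot product."""
    return sum(x * y for x, y in zip(a, b))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences: lower means closer."""
    return sum(abs(x - y) for x, y in zip(a, b))


a = [1.0, 2.0, 3.0]
b = [4.0, 6.0, 3.0]

print(l2_distance(a, b))         # 5.0
print(manhattan_distance(a, b))  # 7.0
print(dot_product(a, b))         # 25.0
print(cosine_similarity(a, b))
```

Note the direction of each metric: the two distances are filtered with `<` and sorted ascending, while cosine similarity and dot product are sorted descending.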

Execution Plan

CoordiNode's query planner chooses which filter to apply first based on selectivity:

If a vector index exists → apply vector filter early (ANN pre-filter)
If a text index exists → apply BM25 filter in tandem
Graph traversal limits the candidate set before applying modality filters

Use EXPLAIN to inspect the chosen plan:

bash

curl -X POST http://localhost:7081/v1/query/cypher/explain \
  -H "Content-Type: application/json" \
  -d '{"query": "MATCH ...", "parameters": {}}'
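The same request can be issued from application code. The following is a minimal sketch using only the Python standard library; the endpoint and the `query` / `parameters` body fields come from the curl example above, while the helper names, the way a vector parameter is encoded, and the treatment of the response as plain text are assumptions.

```python
import json
import urllib.request

EXPLAIN_URL = "http://localhost:7081/v1/query/cypher/explain"  # endpoint from the curl example


def build_explain_request(query, parameters=None):
    """Build (but do not send) the POST request for the EXPLAIN endpoint."""
    payload = json.dumps({"query": query, "parameters": parameters or {}}).encode("utf-8")
    return urllib.request.Request(
        EXPLAIN_URL,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def explain(query, parameters=None, timeout=10):
    """Send the request and return the raw response body as text.

    The response format is not described here, so it is returned undecoded.
    """
    request = build_explain_request(query, parameters)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


# Building the request needs no running server; passing a vector as a
# JSON array is an assumption about the parameter encoding.
request = build_explain_request(
    "MATCH (doc:Document) WHERE vector_distance(doc.embedding, $q) < 0.5 RETURN doc.title",
    {"q": [0.1, 0.2, 0.3]},
)
print(request.get_method(), request.full_url)
```

Calling `explain(...)` requires a CoordiNode instance listening on `localhost:7081`.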

Practical Pattern: GraphRAG

A common retrieval pattern for AI applications:

cypher

// Find documents semantically related to a query,
// then expand the graph context around each document.
MATCH (doc:Document)
WHERE vector_distance(doc.embedding, $query_vec) < 0.5
WITH doc ORDER BY vector_distance(doc.embedding, $query_vec) LIMIT 20

// Expand graph neighbourhood for richer context
MATCH (doc)-[:CITES|AUTHORED_BY|TAGGED_WITH*1..2]->(context)
RETURN doc.title, doc.body, collect(context) AS graph_context

The retrieved documents plus their graph neighbourhood form the context window for an LLM prompt — richer than pure vector retrieval, cheaper than fetching entire subgraphs blindly.
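On the application side, the remaining work is to turn the returned rows into prompt text. The sketch below assumes each row arrives as a dictionary keyed by the `RETURN` column names (`doc.title`, `doc.body`, `graph_context`) and that context nodes can be rendered with `str()`; both are assumptions about the client, and `build_prompt` is a hypothetical helper.

```python
def build_prompt(question, rows, max_chars=2000):
    """Assemble an LLM prompt from GraphRAG rows.

    rows: list of dicts with keys "doc.title", "doc.body", "graph_context"
    (an assumed shape, mirroring the RETURN clause of the query).
    """
    sections = []
    for row in rows:
        neighbours = ", ".join(str(item) for item in row["graph_context"])
        sections.append(
            f"Title: {row['doc.title']}\n"
            f"Body: {row['doc.body']}\n"
            f"Related: {neighbours}"
        )
    # Crude character budget so the context cannot grow without bound.
    context = "\n\n".join(sections)[:max_chars]
    return f"Context:\n{context}\n\nQuestion: {question}"


rows = [
    {
        "doc.title": "Attention explained",
        "doc.body": "Self-attention lets each token weigh every other token.",
        "graph_context": ["Transformers", "NLP"],
    }
]
prompt = build_prompt("What is attention?", rows)
print(prompt)
```

A production version would budget by tokens rather than characters and would likely rank or deduplicate the graph neighbours before including them.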

Next Steps

Data Model — how nodes carry vector and text properties
MVCC Transactions — consistency guarantees
Cypher Extensions — full syntax reference
