ClusterService

Cluster management operations.

Proto source

admin/cluster.proto

Transport

Methods are available via the following transports, unless a method states otherwise:

gRPC on port 7080 — native high-performance API
REST/JSON on port 7081 — HTTP/JSON transcoding via embedded structured-proxy

Methods

GetClusterStatus

Get the current cluster status.

HTTP: GET /v1/admin/cluster/status

Request: GetClusterStatusRequest

Empty request (no fields).

Response: ClusterStatus

Field	Type	Description
`nodes`	`ClusterNode[]`	—
`leader_id`	`string`	Node ID of the current Raft leader. Empty string if no leader elected.
`raft_term`	`uint64`	Current Raft term. Monotonically increasing across leader elections.
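A minimal sketch of calling this method over the REST/JSON transport and reading the response. The host name is a placeholder, and the helper names (`status_url`, `summarize_status`, `get_cluster_status`) are hypothetical. Whether the proxy emits the proto field names as documented (`leader_id`) or the canonical proto3 JSON form (`leaderId`, with uint64 values as strings) is not stated here, so the sketch accepts both.

```python
import json
import urllib.request

REST_PORT = 7081  # REST/JSON port from the Transport section


def status_url(host: str, port: int = REST_PORT) -> str:
    """Build the GetClusterStatus URL (the host is an assumption of this sketch)."""
    return f"http://{host}:{port}/v1/admin/cluster/status"


def summarize_status(payload: dict) -> dict:
    """Extract leader and term, accepting snake_case or lowerCamelCase keys."""
    leader = payload.get("leader_id", payload.get("leaderId", ""))
    term = payload.get("raft_term", payload.get("raftTerm", 0))
    return {
        "has_leader": leader != "",  # empty string means no leader elected
        "leader_id": leader,
        "raft_term": int(term),      # uint64 may arrive as a JSON string
        "node_count": len(payload.get("nodes", [])),
    }


def get_cluster_status(host: str) -> dict:
    """Perform the GET and summarize the response (requires a running server)."""
    with urllib.request.urlopen(status_url(host), timeout=5) as resp:
        return summarize_status(json.load(resp))
```

Since `raft_term` only increases, a caller polling this endpoint could treat a change in `raft_term` as a sign that a leader election occurred between polls.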

JoinNode

Initiate a cluster join lifecycle for a new node. Adds the target node as a Learner (non-voting) and starts a background monitor that promotes it to Voter once replication lag drops below READINESS_LAG_THRESHOLD (1000 entries). Must be called on the leader. Returns immediately after the Learner is added. Use JoinProgress to stream phase events until COMPLETE or FAILED.

HTTP: POST /v1/admin/cluster/join

Request: JoinNodeRequest

Field	Type	Description
`node_id`	`uint64`	Numeric node ID to assign. Must be unique across the cluster. The target node must already be running and listening on address.
`address`	`string`	gRPC address of the new node in "host:port" format. The leader will replicate log entries to this address.
`pre_seeded`	`bool`	If true, the new node has pre-seeded data from an offline snapshot backup. Used as a hint: the leader logs "Tier 3 skip expected" in the initial progress event. Actual Tier-3 skip behavior runs on the joining node side.

Response: JoinNodeResponse

Field	Type	Description
`node_id`	`uint64`	Numeric node ID of the joining node (echoed from request).
`status`	`string`	Human-readable status. Currently always "JOIN_INITIATED".
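The request above could be assembled as follows. This is a sketch under assumptions: the host name is a placeholder for the current leader, `build_join_request` is a hypothetical helper, and the JSON body is assumed to carry the whole JoinNodeRequest message using the documented field names. The function only builds the request; sending it is left to the caller.

```python
import json
import urllib.request


def build_join_request(host: str, node_id: int, address: str,
                       pre_seeded: bool = False) -> urllib.request.Request:
    """Build (but do not send) a JoinNode POST aimed at the leader."""
    if node_id < 0 or node_id >= 2**64:
        raise ValueError("node_id must fit in uint64")
    if ":" not in address:
        raise ValueError('address must be in "host:port" format')
    body = {"node_id": node_id, "address": address, "pre_seeded": pre_seeded}
    return urllib.request.Request(
        url=f"http://{host}:7081/v1/admin/cluster/join",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result to `urllib.request.urlopen` would send it. Because the call returns as soon as the Learner is added, a successful response only means the join has started, not that the node is a Voter.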

JoinProgress

Stream join progress events for a node join in progress. Returns a server-streaming response of JoinStatus messages until the join reaches COMPLETE or FAILED phase. The stream closes automatically after either terminal phase is emitted. Returns NOT_FOUND if JoinNode was not called for this node_id or if the join has already completed.

Transport: Server-streaming gRPC only (no HTTP transcoding)

Request: JoinProgressRequest

Field	Type	Description
`node_id`	`uint64`	Node ID whose join lifecycle to observe (must match a JoinNode call).

Response: stream JoinStatus

Field	Type	Description
`node_id`	`uint64`	Node ID of the joining node.
`phase`	`JoinPhase`	Current join phase.
`lag_entries`	`uint64`	Number of Raft log entries behind the leader. Zero when unknown or complete.
`percent`	`uint32`	Estimated completion percentage (0–100). 100 means complete.
`message`	`string`	Human-readable status message for display in CLI --follow output.
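To illustrate how a client might consume this stream, the sketch below models JoinStatus events as plain Python objects and stops after a terminal phase, mirroring the stream-close behaviour described above. The classes are local stand-ins rather than generated gRPC stubs, and `follow_join` and its output format are hypothetical.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Iterable, Iterator


class JoinPhase(IntEnum):
    """Local mirror of the JoinPhase enum values listed under Enums."""
    UNSPECIFIED = 0
    LEARNER = 1
    READY_CHECK = 2
    PROMOTING = 3
    COMPLETE = 4
    FAILED = 5


@dataclass
class JoinStatus:
    node_id: int
    phase: JoinPhase
    lag_entries: int = 0
    percent: int = 0
    message: str = ""


TERMINAL = {JoinPhase.COMPLETE, JoinPhase.FAILED}


def follow_join(events: Iterable[JoinStatus]) -> Iterator[str]:
    """Yield one display line per event, stopping after a terminal phase."""
    for ev in events:
        line = (f"[{ev.percent:3d}%] node {ev.node_id} {ev.phase.name} "
                f"lag={ev.lag_entries} {ev.message}")
        yield line.rstrip()
        if ev.phase in TERMINAL:
            return  # the server closes the stream at this point as well
```

With a real gRPC client, `events` would be the iterator returned by the server-streaming call; the loop body would stay the same.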

AddNode

Add a node to the cluster (low-level, auto-assigned ID). Deprecated: use JoinNode with an explicit node_id and address instead. This RPC is retained for API compatibility but returns UNIMPLEMENTED.

HTTP: POST /v1/admin/cluster/nodes

Request: AddNodeRequest

Field	Type	Description
`address`	`string`	—

Response: ClusterNode

Field	Type	Description
`node_id`	`string`	Numeric node ID as string (matches --id flag in CLI).
`address`	`string`	gRPC address of the node (host:port). May be empty for follower self-report.
`role`	`NodeRole`	—
`state`	`NodeState`	—
`lag_entries`	`uint64`	Number of Raft log entries this node is behind the leader. Zero for the leader itself. Used for read routing staleness checks.

RemoveNode

Remove a node from the cluster.

HTTP: DELETE /v1/admin/cluster/nodes/{node_id}

Request: RemoveNodeRequest

Field	Type	Description
`node_id`	`string`	—

Response: RemoveNodeResponse

Empty response (no fields).

DecommissionNode

Gracefully decommission a node from the cluster. Implements the Phase 0-2 decommission protocol:

Phase 0: Quorum gate. Verify that removing node_id still leaves ≥ 2 voters. Aborts with FAILED_PRECONDITION if quorum would be lost.
Phase 1: Leadership transfer. If node_id is the current Raft leader, leadership is transferred to a peer before removal. For self-decommission (leader removing itself), the request is internally forwarded to the new leader after transfer.
Phase 2: Membership remove. change_membership(remove: node_id). The node stops receiving log replication and is removed from quorum.

CE behaviour: single Raft group, all nodes hold full data. Pruning (Phase 3) is advisory: the operator must delete data on the decommissioned node manually.

Must be called on the current Raft leader, or on the node being decommissioned when that node is the leader (self-decommission path). Use force=true for emergency removal of an unreachable node (may lose data).

HTTP: POST /v1/admin/cluster/decommission

Request: DecommissionNodeRequest

Field	Type	Description
`node_id`	`uint64`	Numeric node ID to decommission. Must be in the current voter set. The node will be removed from Raft membership after quorum and drain checks pass.
`pruning`	`bool`	If true, the decommissioned node's data should be wiped after removal. In CE, this is advisory — operator must delete data on node_id manually. The response will set operator_cleanup_required=true as a reminder. In EE, storage is freed progressively as each shard move completes (Phase 1).
`force`	`bool`	Emergency decommission: skip quorum gate and drain, force membership remove even if node_id is unreachable. May cause permanent data loss if node_id held the only copy of any shard. Requires skip_confirmation=true to guard against accidental invocation.
`skip_confirmation`	`bool`	Must be set to true when force=true to confirm awareness of potential data loss. Prevents accidental use of --force without understanding the consequences. Ignored when force=false.

Response: DecommissionNodeResponse

Field	Type	Description
`node_id`	`uint64`	Numeric node ID that was decommissioned.
`message`	`string`	Human-readable status message describing the phases executed.
`operator_cleanup_required`	`bool`	Set when pruning=true was requested. In CE, operator must manually delete data on the decommissioned node. In EE, storage is freed automatically during Phase 1 shard moves.
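The request rules above (the Phase 0 quorum gate and the force/skip_confirmation pairing) can be sketched as a client-side pre-check. This is illustrative only: `precheck_decommission` is a hypothetical helper, the server remains the authority, and the sketch assumes that force=true bypasses the voter-set check along with the quorum gate.

```python
from typing import Set


def precheck_decommission(voters: Set[int], node_id: int,
                          force: bool = False,
                          skip_confirmation: bool = False) -> str:
    """Return "OK" or the reason the request would likely be rejected."""
    if force:
        # force=true skips the quorum gate but must be confirmed explicitly.
        if skip_confirmation:
            return "OK"
        return "force requires skip_confirmation=true"
    if node_id not in voters:
        return "node_id is not in the current voter set"
    remaining = len(voters - {node_id})
    if remaining < 2:
        # Phase 0 quorum gate: the server aborts with FAILED_PRECONDITION.
        return "FAILED_PRECONDITION: fewer than 2 voters would remain"
    return "OK"
```

For a three-voter cluster `{1, 2, 3}`, removing node 3 passes the gate because two voters remain; removing a node from a two-voter cluster does not.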

Types

AddNodeRequest

Field	Type	Description
`address`	`string`	—

ClusterNode

Field	Type	Description
`node_id`	`string`	Numeric node ID as string (matches --id flag in CLI).
`address`	`string`	gRPC address of the node (host:port). May be empty for follower self-report.
`role`	`NodeRole`	—
`state`	`NodeState`	—
`lag_entries`	`uint64`	Number of Raft log entries this node is behind the leader. Zero for the leader itself. Used for read routing staleness checks.
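As an illustration of how `lag_entries` might feed a staleness check, the sketch below filters ClusterNode records (as JSON-style dicts) by a caller-chosen lag bound. The helper `eligible_for_reads` and the `max_lag` parameter are hypothetical; this page does not document the threshold the server itself uses for read routing.

```python
from typing import Dict, List


def eligible_for_reads(nodes: List[Dict], max_lag: int) -> List[str]:
    """Return node_ids that are not DOWN and lag by at most max_lag entries."""
    return [
        n["node_id"]
        for n in nodes
        if int(n.get("lag_entries", 0)) <= max_lag
        and n.get("state") != "NODE_STATE_DOWN"
    ]
```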

ClusterStatus

Field	Type	Description
`nodes`	`ClusterNode[]`	—
`leader_id`	`string`	Node ID of the current Raft leader. Empty string if no leader elected.
`raft_term`	`uint64`	Current Raft term. Monotonically increasing across leader elections.

DecommissionNodeRequest

Request to gracefully decommission a node from the cluster.

Field	Type	Description
`node_id`	`uint64`	Numeric node ID to decommission. Must be in the current voter set. The node will be removed from Raft membership after quorum and drain checks pass.
`pruning`	`bool`	If true, the decommissioned node's data should be wiped after removal. In CE, this is advisory — operator must delete data on node_id manually. The response will set operator_cleanup_required=true as a reminder. In EE, storage is freed progressively as each shard move completes (Phase 1).
`force`	`bool`	Emergency decommission: skip quorum gate and drain, force membership remove even if node_id is unreachable. May cause permanent data loss if node_id held the only copy of any shard. Requires skip_confirmation=true to guard against accidental invocation.
`skip_confirmation`	`bool`	Must be set to true when force=true to confirm awareness of potential data loss. Prevents accidental use of --force without understanding the consequences. Ignored when force=false.

DecommissionNodeResponse

Response from DecommissionNode.

Field	Type	Description
`node_id`	`uint64`	Numeric node ID that was decommissioned.
`message`	`string`	Human-readable status message describing the phases executed.
`operator_cleanup_required`	`bool`	Set when pruning=true was requested. In CE, operator must manually delete data on the decommissioned node. In EE, storage is freed automatically during Phase 1 shard moves.

GetClusterStatusRequest

No fields.

JoinNodeRequest

Request to initiate a cluster join for a new node.

Field	Type	Description
`node_id`	`uint64`	Numeric node ID to assign. Must be unique across the cluster. The target node must already be running and listening on address.
`address`	`string`	gRPC address of the new node in "host:port" format. The leader will replicate log entries to this address.
`pre_seeded`	`bool`	If true, the new node has pre-seeded data from an offline snapshot backup. Used as a hint: the leader logs "Tier 3 skip expected" in the initial progress event. Actual Tier-3 skip behavior runs on the joining node side.

JoinNodeResponse

Response to JoinNode — the join lifecycle has been initiated.

Field	Type	Description
`node_id`	`uint64`	Numeric node ID of the joining node (echoed from request).
`status`	`string`	Human-readable status. Currently always "JOIN_INITIATED".

JoinProgressRequest

Request to subscribe to join progress events.

Field	Type	Description
`node_id`	`uint64`	Node ID whose join lifecycle to observe (must match a JoinNode call).

JoinStatus

A single join progress event, emitted during the join lifecycle. The stream emits an event at each phase transition and on each lag poll (roughly every 500 ms). The stream closes after COMPLETE or FAILED is emitted.

Field	Type	Description
`node_id`	`uint64`	Node ID of the joining node.
`phase`	`JoinPhase`	Current join phase.
`lag_entries`	`uint64`	Number of Raft log entries behind the leader. Zero when unknown or complete.
`percent`	`uint32`	Estimated completion percentage (0–100). 100 means complete.
`message`	`string`	Human-readable status message for display in CLI --follow output.

RemoveNodeRequest

Field	Type	Description
`node_id`	`string`	—

RemoveNodeResponse

No fields.

Enums

JoinPhase

Phase progression for a node join lifecycle.

Value	Number	Description
`JOIN_PHASE_UNSPECIFIED`	`0`	—
`JOIN_PHASE_LEARNER`	`1`	Node added as Learner — receiving log replication, replication lag closing.
`JOIN_PHASE_READY_CHECK`	`2`	Lag below threshold (1000 entries) — promoting to Voter imminently.
`JOIN_PHASE_PROMOTING`	`3`	change_membership in progress — node becoming a Voter.
`JOIN_PHASE_COMPLETE`	`4`	Node is now a Voter. Join complete. Stream closes after this event.
`JOIN_PHASE_FAILED`	`5`	Join failed (see message). Stream closes after this event.

NodeRole

Value	Number	Description
`NODE_ROLE_UNSPECIFIED`	`0`	—
`NODE_ROLE_LEADER`	`1`	Leader — handles all writes and coordinates replication.
`NODE_ROLE_FOLLOWER`	`2`	Voting follower — participates in elections and quorum.
`NODE_ROLE_LEARNER`	`3`	Non-voting learner — receives replication but does not vote.

NodeState

Value	Number	Description
`NODE_STATE_UNSPECIFIED`	`0`	—
`NODE_STATE_HEALTHY`	`1`	Healthy — lag within acceptable threshold.
`NODE_STATE_DEGRADED`	`2`	Degraded — lag exceeds threshold or heartbeat delayed.
`NODE_STATE_DOWN`	`3`	Down — not reachable or no heartbeat received.
