
Core concepts.

OriginChain is a hash-keyed key/value substrate with a Plan tree on top. SQL, vector, full-text, and graph are not separate engines — they are different key shapes and different Plan operators over the same store. Understand the substrate, the keys, and the Plan, and the rest of the surface follows.

01 · the substrate

A single hash-keyed k/v store.

The engine is a single B-tree-free, hash-indexed key/value store fronted by a write-ahead log. Every write is appended to the WAL, fsynced, then applied. Reads go through a process-wide page cache. There is no row-store / column-store / vector-engine split — every domain is a different prefix on the same keyspace.
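
A minimal sketch of that ordering, with a std HashMap standing in for the hash-indexed page store and a plain file for the WAL. The framing and names are illustrative, not the engine's real types.

use std::collections::HashMap;
use std::fs::File;
use std::io::{Result, Write};

struct Engine {
    wal: File,                        // append-only write-ahead log
    store: HashMap<Vec<u8>, Vec<u8>>, // stand-in for the hash-indexed page store
}

impl Engine {
    fn put(&mut self, key: Vec<u8>, value: Vec<u8>) -> Result<()> {
        // 1. Append the frame: length-prefixed key and value.
        self.wal.write_all(&(key.len() as u32).to_le_bytes())?;
        self.wal.write_all(&key)?;
        self.wal.write_all(&(value.len() as u32).to_le_bytes())?;
        self.wal.write_all(&value)?;
        // 2. fsync before acknowledging anything.
        self.wal.sync_data()?;
        // 3. Only then apply the write to the store.
        self.store.insert(key, value);
        Ok(())
    }
}

In production the fsync is group-committed across concurrent writers; the sketch syncs per write for clarity.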

Each tenant gets a single, region-isolated EC2 instance. No shared compute, no noisy neighbour. Writes go to one primary; an optional sync follower ships WAL frames in lockstep, giving RPO=0 on paid tiers. The follower bootstraps from a Frame::Snapshot transfer and then tails.

Fig. — request path: client HTTP → Plan tree → Engine (page cache) → WAL (fsync, group commit) → sync follower (frames) and tail-shipper (continuous) → S3 archive (checkpoints + sealed segments). One EC2 instance per tenant, region-isolated, one VPC, one EBS volume.
02 · schemas

Declared in TOML.

A schema manifest declares columns, indexes, relations to other schemas, and extractions (chunked text fields, vector fields, FTS fields). The catalog is itself stored as rows — adding a field is a write, not a downtime migration.

# schemas/orders.toml
name    = "orders"
version = 1

columns = [
  { name = "id",       type = "ulid", pk = true },
  { name = "customer", type = "ulid" },
  { name = "amount",   type = "decimal" },
  { name = "status",   type = "string" },
  { name = "placed",   type = "timestamp" },
]

[[indexes]]
columns = ["status"]
[[indexes]]
columns = ["customer", "placed"]

[[relations]]
edge   = "customer"
target = "customers"     # rel|fwd|orders|customer|<order>|<customer>

[[extractions.fts]]
field    = "notes"
analyzer = "english"     # snowball stem + diacritics fold + stop-words

[[extractions.vector]]
field  = "summary_embedding"
dim    = 1024
metric = "cosine"

Indexes, relations, and extractions are honoured at write time — no separate "build index" step. Online migrations follow a strict contract: monotonic version int, one of four allowed shapes per migration, eager 10% backfill, dual-read transform, atomic cutover. See ops → migrations.
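
The dual-read transform is easiest to see in a sketch: a v1 → v2 migration that adds a column upgrades old rows as they are read, while the eager backfill rewrites the rest in the background. The column, default, and names below are hypothetical.

// Hypothetical v2 of the orders manifest adds a `currency` column with a
// default. Rows written before the cutover still decode as v1; dual-read
// upgrades them on the fly.
struct OrderRow {
    version: u32,             // schema version the row was written under
    currency: Option<String>, // hypothetical column added in v2
}

fn dual_read(row: OrderRow) -> OrderRow {
    match row.version {
        2 => row,                                                   // already the new shape
        1 => OrderRow { version: 2, currency: Some("USD".into()) }, // apply the v2 default
        v => panic!("schema version {v} is newer than this binary"),
    }
}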

03 · key shapes

Ten domain prefixes in production.

Every byte stored on disk lives under one of these prefixes. SQL reads row|* and idx|*. Graph reads rel|*. Full-text reads fts*. Vector reads vec*. The plan cache is intent|*.

row · row|<schema>|<pk_bytes>
The primary user-facing record. PK can be ULID, UUID, string, or composite. Encodes a single row as MessagePack.

idx · idx|<schema>|<column>|<value_bytes>|<pk_bytes>
Secondary index entries. Hash-keyed range scans work via prefix iteration on (schema, column, value).

rel · rel|fwd|<schema>|<edge>|<src_pk>|<dst_pk> · rel|rev|<schema>|<edge>|<dst_pk>|<src_pk>
Edges between rows. Always written in pairs (forward + reverse) so neighbours and reverse-neighbours are both O(prefix-scan).

chunk · chunk|<schema>|<row_pk>|<seq_u32>
Document chunks for FTS / vector — splits long text fields into addressable units while keeping the parent row intact.

fts · fts|<schema>|<field>|<token>|<row_pk>|<positions>
Inverted-index posting list. Token is post-tokeniser (UAX #29 + optional Snowball stem). Positions back phrase queries.

fts_doclen · fts_doclen|<schema>|<field>|<row_pk>
Per-doc length cache for BM25 scoring. Updated atomically with each fts insert.

fts_corpus · fts_corpus|<schema>|<field>
Corpus-wide stats: doc count, total token count, average doc length. One key per (schema, field).

vec · vec|<schema>|<field>|<row_pk>
Raw f32 embedding vector — the value the SIMD distance kernel reads.

vec_idx · vec_idx|<schema>|<field>|<segment_id>
Serialised HNSW graph segment. Loaded once per process into the graph cache, evicted on schema migration.

intent · intent|<question_hash>
Plan cache entry — the compiled Plan tree for a question template. Skips the LLM compile on cache hit.
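
To make the fan-out concrete, here is a sketch of what a single orders write produces under those shapes. Helper names are illustrative, and joining parts with a literal | is shape only; a real encoding has to escape the separator inside values.

fn key(parts: &[&[u8]]) -> Vec<u8> {
    parts.join(&b'|') // shape only; values containing '|' would need escaping
}

// One put of an order row fans out into row, idx, and rel entries.
fn writes_for_order(pk: &[u8], customer: &[u8], status: &[u8]) -> Vec<(Vec<u8>, Vec<u8>)> {
    vec![
        // primary record (MessagePack body elided)
        (key(&[b"row", b"orders", pk]), b"<msgpack>".to_vec()),
        // secondary index entry on status
        (key(&[b"idx", b"orders", b"status", status, pk]), Vec::new()),
        // relation edges, always written as a forward/reverse pair
        (key(&[b"rel", b"fwd", b"orders", b"customer", pk, customer]), Vec::new()),
        (key(&[b"rel", b"rev", b"orders", b"customer", customer, pk]), Vec::new()),
    ]
}

This is the write-time fan-out section 02 describes: the idx and rel entries come from the same put as the row, not from a background index builder.
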
04 · the plan tree

Eleven operators, one tree.

Both /sql and /ask compile to the same Plan tree. The tree is JSON-serialisable, cached by question hash under intent|*, and replayable. Every shipped query shape is one of these operators or a composition.

Scan
Full prefix scan of a row keyspace. The fallback when no index applies.
ColumnScan
Projection-aware scan that decodes only the requested fields from the MessagePack body.
IndexScan
Hash-keyed lookup on an idx prefix. Used when WHERE has an indexed equality.
Filter
Predicate evaluation. Pushed under projection where possible by the optimiser.
Project
Column selection — drops fields the user did not ask for before they reach the wire.
Limit
Truncates the stream. Pushed below sort when the sort key admits a top-K shortcut.
Sort
External-merge sort with spill-to-disk over the sealed-segment cache.
Aggregate
GROUP BY with COUNT / SUM / AVG / MIN / MAX. HAVING evaluates after the aggregate buckets finalise.
HashJoin
Build-side hash table on the smaller input, probe with the larger. INNER joins between two tables.
OuterJoin
LEFT, RIGHT, and FULL variants — emits NULL-filled rows for unmatched probe entries. Up to 5-table left-deep chains.
RelationHop
Walks rel|fwd or rel|rev edges. Powers neighbours, BFS, path, and Dijkstra.
-- SELECT c.name, SUM(o.amount) FROM orders o
-- JOIN customers c ON c.id = o.customer
-- WHERE o.status = 'paid' GROUP BY c.name HAVING SUM(o.amount) > 1000;

Aggregate { group: [c.name], having: SUM(o.amount) > 1000 }
└── Project { c.name, o.amount }
    └── HashJoin { o.customer = c.id }
        ├── IndexScan { idx|orders|status = "paid" }
        └── Scan { row|customers }
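
The tree is a plain recursive data type, which is what makes it cacheable and replayable. A sketch of the eleven operators as a serialisable Rust enum, assuming a serde-style derive; the field shapes are guesses, only the operator set comes from the list above.

use serde::{Deserialize, Serialize};

// Sketch only: a JSON-serialisable Plan tree, tagged by operator name.
#[derive(Serialize, Deserialize)]
#[serde(tag = "op")]
enum Plan {
    Scan        { prefix: String },
    ColumnScan  { prefix: String, columns: Vec<String> },
    IndexScan   { index: String, value: String },
    Filter      { predicate: String, input: Box<Plan> },
    Project     { columns: Vec<String>, input: Box<Plan> },
    Limit       { n: u64, input: Box<Plan> },
    Sort        { by: Vec<String>, input: Box<Plan> },
    Aggregate   { group: Vec<String>, having: Option<String>, input: Box<Plan> },
    HashJoin    { on: String, build: Box<Plan>, probe: Box<Plan> },
    OuterJoin   { kind: String, on: String, left: Box<Plan>, right: Box<Plan> }, // left | right | full
    RelationHop { edge: String, reverse: bool, input: Box<Plan> },
}

A cache hit on intent|<question_hash> deserialises this tree and executes it directly, skipping the LLM compile.
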
05 · replication model

Active-passive, sync.

One primary, one optional sync follower. WAL frames replicate before the primary returns 200. A follower joining a running cluster bootstraps via Frame::Snapshot — a chunked transfer of every key in the store — then tails the live frame stream from the snapshot's LSN.

Primary only · RPO ~0.5 s (WAL fsync) · RTO ~5–10 min (S3 restore)
Whisper tier default. Restore replays sealed segments + tail.

Sync follower · RPO 0 · RTO ~25 s (drilled)
Thunder, Storm, and Enterprise tiers. Verified end-to-end with snapshot bootstrap.

Active-passive sync replication is the production path. A commit is acknowledged only after the follower has the frame durably on disk. RPO=0, RTO ~25 s. See ops → failover for the promotion procedure.
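
The ack rule as a sketch; the trait and transport are illustrative, only the ordering comes from the text above.

use std::io::{Result, Write};

// Illustrative follower handle: ship() returns only once the frame is
// durable on the follower's disk.
trait Follower {
    fn ship(&mut self, frame: &[u8]) -> Result<()>;
}

fn commit<W: Write, F: Follower>(wal: &mut W, follower: &mut F, frame: &[u8]) -> Result<()> {
    wal.write_all(frame)?; // local WAL append (+ fsync, elided; see section 01)
    follower.ship(frame)?; // block until the follower has the frame on disk
    Ok(())                 // only now may the HTTP layer return 200
}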

06 · versioning

Single-row optimistic CAS.

Every row carries an internal _oc_row_version field. The API exposes put_row_cas, get_row_versioned, and delete_row_cas for optimistic concurrency. A CAS that loses the race fails the entire batch with a deterministic error — no partial application. Idempotency keys make retries safe: the same key plus the same body returns the original response; a different body with the same key returns 409.
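
A read–modify–write loop against that surface might look like this. The client trait and row type are sketches; only the call names come from the API above.

struct Order { amount: i64 }
enum CasError { VersionMismatch, Other(String) }

// Illustrative client surface for the CAS API.
trait OcClient {
    fn get_row_versioned(&self, schema: &str, pk: &str) -> Result<(Order, u64), CasError>;
    fn put_row_cas(&self, schema: &str, pk: &str, row: &Order, version: u64) -> Result<(), CasError>;
}

// A CAS that loses the race fails cleanly, so the caller re-reads and retries.
fn add_to_amount(client: &impl OcClient, pk: &str, delta: i64) -> Result<(), CasError> {
    loop {
        let (mut row, version) = client.get_row_versioned("orders", pk)?;
        row.amount += delta;
        match client.put_row_cas("orders", pk, &row, version) {
            Ok(()) => return Ok(()),
            Err(CasError::VersionMismatch) => continue, // lost the race; retry
            Err(e) => return Err(e),
        }
    }
}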

07 · backups & pitr

Sealed segments + continuous tail.

Two streams flow to S3 in parallel: sealed WAL segments shipped on roll, and a continuous tail-shipper that flushes the open segment every few hundred milliseconds. WAL frame v2 carries an embedded timestamp_micros so restore-to-timestamp resolves below the segment boundary.

# restore an instance to a wall-clock timestamp
oc-pitr restore \
  --tenant acme \
  --target "2026-04-29T18:42:00Z" \
  --into   acme-restore-001
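
The timestamp resolution behind restore-to-timestamp is simple to sketch: walk frames in LSN order and stop at the last one stamped at or before the target. The frame fields below are illustrative.

struct FrameV2 {
    lsn: u64,              // log sequence number, monotonically increasing
    timestamp_micros: u64, // wall-clock stamp embedded in WAL frame v2
}

fn cutoff_lsn(frames: &[FrameV2], target_micros: u64) -> Option<u64> {
    frames
        .iter()
        .take_while(|f| f.timestamp_micros <= target_micros)
        .map(|f| f.lsn)
        .last() // last frame at or before the target instant
}

Everything at or before the returned LSN replays into the restored instance; later frames in the segment are discarded.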

Sealed-segment PITR (segment-boundary granularity, ~5–10 min restore window) is included on every tier. Intra-segment LSN-precise PITR (sub-second granularity, ~0.5–1.5 s data-loss window) is a paid add-on — see pricing.