OriginChain docs
how-to · insert

Insert data into OriginChain

Every write in OriginChain commits atomically across rows, secondary indexes, vector embeddings, full-text postings, and graph relations on a single WAL frame. Here's how to insert each shape — first independently, then all four together as the differentiator.

Insert a row.

The most common write. Either send SQL or POST a typed JSON body. Both paths hit the same atomic write through the substrate, so pick whichever is more ergonomic.

SQL

POST /v1/tenants/:t/sql
curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/sql" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "INSERT INTO shop.customers (id, email, region) VALUES ('\''c_1'\'', '\''ada@example.com'\'', '\''IN'\'')"
  }'

Typed HTTP + Python SDK

POST /v1/tenants/:t/rows/:schema
curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/rows/shop.customers" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: $(uuidgen)" \
  -d '{
    "id":     "c_1",
    "email":  "ada@example.com",
    "region": "IN"
  }'
response
{ "ok": true, "lsn": { "segment": 4, "offset": 8421007 } }
why one frame

The row, every secondary index entry it touches, every vector or full-text extraction declared on the manifest, and every relation it points at land on a single WAL frame. The fsync acks once. On crash either the whole row appears, or none of it. The index can never lag the data. See core concepts → substrate.
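The all-or-nothing semantics can be modeled in a few lines of plain Python. This is a toy illustration of the frame contract, not the engine: every mutation the row implies is staged together, then applied in one step, so nothing is visible until the "fsync" moment.

```python
# Toy model of single-frame commit semantics (illustration only, not the engine).
# The row and every index entry it touches are staged together, then applied
# in one step -- a crash before apply leaves the store untouched.

store = {}

def commit_frame(row_key, row, index_entries):
    """Stage all mutations the row implies, then apply all-or-nothing."""
    staged = {row_key: row}
    for idx_key, idx_val in index_entries:
        staged[idx_key] = idx_val
    # A crash before this line: `store` is untouched, nothing is visible.
    store.update(staged)  # the "fsync acks once" moment

commit_frame(
    "row|shop.customers|c_1",
    {"id": "c_1", "email": "ada@example.com", "region": "IN"},
    [("idx|shop.customers|email|ada@example.com", "c_1")],
)
```

The key shapes (`row|…`, `idx|…`) are made up for the sketch; only the `rel|fwd|…` / `rel|rev|…` edge-key format shown later in this page is documented.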

Bulk insert.

Three ways to land many rows fast. Pick by source shape: a hand-batched SQL list, a JSON array body, or an NDJSON stream. The 8 MiB body cap is lifted on the streaming batch route — chunk size controls how many rows go in one WAL frame.

Multi-row SQL — up to 1000 rows per call

POST /v1/tenants/:t/sql
curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/sql" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "sql": "INSERT INTO shop.customers (id, email, region) VALUES ('\''c_1'\'', '\''ada@example.com'\'', '\''IN'\''), ('\''c_2'\'', '\''hopper@example.com'\'', '\''US'\''), ('\''c_3'\'', '\''lovelace@example.com'\'', '\''GB'\'')"
  }'

JSON-array batch — atomic across the whole array

POST /v1/tenants/:t/rows/:schema/_batch
curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/rows/shop.customers/_batch" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -H "Idempotency-Key: bulk-2026-05-02-01" \
  -d '[
    { "id": "c_1", "email": "ada@example.com",      "region": "IN" },
    { "id": "c_2", "email": "hopper@example.com",   "region": "US" },
    { "id": "c_3", "email": "lovelace@example.com", "region": "GB" }
  ]'

NDJSON stream — millions of rows, chunked

POST /v1/tenants/:t/rows/:schema/_batch?chunk=1000
# customers.ndjson — one row per line
{"id":"c_1","email":"ada@example.com","region":"IN"}
{"id":"c_2","email":"hopper@example.com","region":"US"}
{"id":"c_3","email":"lovelace@example.com","region":"GB"}

curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/rows/shop.customers/_batch?chunk=1000" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/x-ndjson" \
  --data-binary @customers.ndjson
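An NDJSON body is nothing more than one JSON object per line. A minimal builder (the helper name `to_ndjson` is ours, not part of the SDK):

```python
import json

def to_ndjson(rows):
    """Serialize one JSON object per line -- the application/x-ndjson body."""
    return "\n".join(json.dumps(r, separators=(",", ":")) for r in rows) + "\n"

body = to_ndjson([
    {"id": "c_1", "email": "ada@example.com", "region": "IN"},
    {"id": "c_2", "email": "hopper@example.com", "region": "US"},
])
```

The trailing newline and compact separators match the `customers.ndjson` example above; any valid JSON-per-line encoding works.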

Python — chunked generator

db.rows.put_batch
def gen_customers():
    for i in range(50_000):
        yield { "id": f"c_{i}", "email": f"u{i}@example.com", "region": "IN" }

inserted = db.rows.put_batch(
    "shop.customers",
    gen_customers(),
    chunk=1_000,                    # 1k rows per atomic WAL frame
    idempotency_key="bulk-2026-05-02-01",
)
print(f"{inserted} rows accepted")

When to switch to streaming inserts. Past a few hundred MB of source data, send NDJSON. Each chunk lands as one atomic WAL frame; partial-failure retries pick up from the chunk boundary the server confirmed. See ops → observability for backfill metrics during heavy ingest.
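The grouping that `chunk=1000` implies can be sketched in plain Python. This assumes only what the docs state: each chunk of N rows becomes one atomic WAL frame, and a retry resumes at the last confirmed chunk boundary.

```python
from itertools import islice

def chunked(rows, size):
    """Group an iterable into lists of `size` -- how chunk=N groups rows into frames."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch

# 10 rows at chunk=3 -> frames of 3, 3, 3, and a final partial frame of 1.
frames = list(chunked(({"id": f"c_{i}"} for i in range(10)), 3))
```

Because each frame commits independently, a mid-stream failure only re-sends frames after the last one the server acknowledged.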

Insert a vector.

Vectors live in the same instance as rows — different key prefix, same WAL, same backups. Send the embedding plus optional metadata; the HNSW graph segment for that table mutates on the same fsync. Per-table dimensionality is enforced.

POST /v1/tenants/:t/vector/:table/put
curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/vector/shop.products/put" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id":        "sku-9281",
    "embedding": [0.0124, -0.0883, 0.0451, /* ... 768 floats ... */],
    "dim":       768,
    "metric":    "cosine",
    "metadata":  { "category": "running-shoes", "price": 129.0 }
  }'
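The per-table dimensionality check is enforced server-side; a client-side sketch of the same invariant, plus the cosine metric declared above, looks like this (the `TABLE_DIM` map and helper names are ours, for illustration):

```python
import math

TABLE_DIM = {"shop.products": 768}  # per-table dim, mirroring the manifest

def check_embedding(table, embedding):
    """Reject a vector whose length disagrees with the table's declared dim."""
    dim = TABLE_DIM[table]
    if len(embedding) != dim:
        raise ValueError(f"{table} expects dim={dim}, got {len(embedding)}")
    return True

def cosine(a, b):
    """Cosine similarity of two equal-length vectors -- the declared metric."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm

check_embedding("shop.products", [0.1] * 768)
```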

Inserting a vector during a row write is also supported — declare an extraction on the manifest and the embedding lands on the same WAL frame as the row. See core concepts → atomic writes.

Insert a full-text document.

Index a text field for BM25 retrieval. Re-indexing the same doc_id retires stale postings in the same frame — no ghost matches. Tokenizer and analyzer pipeline are declared on the manifest.

POST /v1/tenants/:t/fts/:table/:field
curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/fts/shop.products/description" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "doc_id": "sku-9281",
    "text":   "Lightweight road runner with a carbon plate, designed for marathon pace."
  }'

Tokenizer options (unicode, ascii) and the analyzer pipeline live in the manifest. Browse them on schemas → full-text fields.
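The shape of an analyzer pipeline (tokenize, then run each stage over the tokens) can be sketched in a few lines. This is an illustrative lowercase-only pipeline, not OriginChain's actual tokenizer or stemmer:

```python
import re

def analyze(text):
    """Crude ascii tokenizer + lowercase stage -- shape of a pipeline, not the real one."""
    return re.findall(r"[a-z0-9]+", text.lower())

postings = analyze("Lightweight road runner with a carbon plate.")
```

A real `stem:english` stage would additionally fold "runner" toward its stem; the manifest controls which stages run.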

Insert a graph relationship.

Edges aren't a separate write. Declare a [[relations]] block on the manifest and the row that holds the foreign-key column emits the forward + reverse edge automatically when you put it. Self-relations work — direction tags resolve the collision. Overwriting the row retires the old edges in the same frame.

POST /v1/tenants/:t/rows/:schema — edge written atomically
curl -X POST "https://acme.ap-south-1.db.originchain.ai/v1/tenants/$T/rows/shop.products" \
  -H "Authorization: Bearer $OC_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "id":          "sku-9281",
    "name":        "Carbon Marathon",
    "supplier_id": "sup-44",
    "price":       129.0
  }'

With a relation declared on supplier_id, the same row write lands the row plus a forward edge under rel|fwd|shop.products|supplied_by|sku-9281|sup-44 and the reverse edge under rel|rev|shop.suppliers|supplied_by|sup-44|sku-9281. Walk them with neighbors / BFS / Dijkstra.
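The two key shapes follow directly from the relation declaration; building them is pure string formatting (the helper name is ours, the key format is from this page):

```python
def edge_keys(src_table, relation, src_id, target_table, target_id):
    """Forward and reverse edge keys a relation emits on a row write."""
    fwd = f"rel|fwd|{src_table}|{relation}|{src_id}|{target_id}"
    rev = f"rel|rev|{target_table}|{relation}|{target_id}|{src_id}"
    return fwd, rev

fwd, rev = edge_keys("shop.products", "supplied_by", "sku-9281",
                     "shop.suppliers", "sup-44")
```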

Atomic multi-shape insert.

The differentiator. A single product enters the catalog as a row, a 768-dimensional embedding, a BM25-indexed description, and a supplier relationship. One database, one bearer token, one endpoint — no ETL between Postgres, Pinecone, Elasticsearch, and Neo4j. Declare the shape once on the manifest; every write that touches the row keeps all four projections in lockstep.


The manifest

# manifest.toml — one product. Row, vector, full-text, graph all declared once.
id = "shop.products"

[primary_key]
columns = ["id"]

[[columns]]
name = "id"
type = "str"

[[columns]]
name = "name"
type = "str"

[[columns]]
name = "supplier_id"
type = "str"

[[columns]]
name = "price"
type = "f64"

[[columns]]
name = "description"
type = "str"

[[indexes]]
columns = ["supplier_id"]

[[relations]]
name        = "supplied_by"
column      = "supplier_id"
target      = "shop.suppliers"

[[extractions.fts]]
field    = "description"
analyzer = ["lowercase", "stem:english"]

[[extractions.vector]]
field  = "description"
dim    = 768
metric = "cosine"

Ingest a product

row + vector + full-text + supplier edge
# One product → row + 768d embedding + BM25 description + supplier edge.
# Each call lands in one WAL frame; ordered against the same instance.

product_id  = "sku-9281"
description = "Lightweight road runner with a carbon plate, designed for marathon pace."

# 1. Row + the supplier graph edge (declared as a relation on the manifest)
db.rows.put("shop.products", {
    "id":          product_id,
    "name":        "Carbon Marathon",
    "supplier_id": "sup-44",
    "price":       129.0,
    "description": description,
})

# 2. Vector embedding of the description
db.vector.put(
    table="shop.products",
    id=product_id,
    embedding=embed(description),     # 768d f32
    metric="cosine",
    metadata={ "category": "running-shoes", "price": 129.0 },
)

# 3. BM25 inverted index on description
db.fts.index(
    table="shop.products",
    field="description",
    doc_id=product_id,
    text=description,
)

The same product is now visible to a SQL SELECT, a vector topk, a BM25 search, and a graph neighbors walk — the moment the WAL fsyncs. No reconciliation jobs, no eventual consistency, no second-system staleness.