# CODEX-STORE
| Field | Value |
|---|---|
| Name | Codex Store Module |
| Slug | 80 |
| Status | raw |
| Category | Standards Track |
| Editor | Codex Team |
| Contributors | Filip Dimitrijevic <[email protected]> |
## Timeline

- 2026-01-19 — `f24e567` — Chore/updates mdbook (#262)
- 2026-01-16 — `89f2ea8` — Chore/mdbook updates (#258)
## Abstract
This specification describes the Store Module, the core storage abstraction in Codex, providing a unified interface for storing and retrieving content-addressed blocks and associated metadata.
The Store Module decouples storage operations from underlying datastore semantics
by introducing the BlockStore interface,
which standardizes methods for storing and retrieving both ephemeral
and persistent blocks across different storage backends.
The module integrates a maintenance engine responsible for cleaning up
expired ephemeral data according to configured policies.
The Store Module is built on top of the generic DataStore (DS) interface, which is implemented by multiple backends such as SQLite, LevelDB, and the filesystem.
## Background / Rationale / Motivation
The primary design goal is to decouple storage operations from the underlying
datastore semantics by introducing the BlockStore interface.
This interface standardizes methods for storing and retrieving both ephemeral
and persistent blocks,
ensuring a consistent API across different storage backends.
The DataStore provides a KV-store abstraction with Get, Put, Delete,
and Query operations, with backend-dependent guarantees.
At a minimum, row-level consistency and basic batching are expected.
The DataStore supports:
- Namespace mounting for isolating backend usage
- Layering backends (e.g., caching in front of persistent stores)
- Flexible stacking and composition of storage proxies
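To make the composition model concrete, the following is a conceptual Nim sketch of the KV interface and a tiered (cache-in-front) layer. The names `Key`, `Datastore`, and `TieredDatastore` are illustrative and do not reproduce the exact nim-datastore API.

```nim
import pkg/chronos
import pkg/questionable
import pkg/questionable/results

type
  Key = distinct string
  Datastore = ref object of RootObj

# Base KV operations; concrete backends (SQLite, LevelDB, filesystem)
# override these with backend-specific guarantees.
method get(self: Datastore, key: Key): Future[?!seq[byte]] {.base.} =
  raiseAssert "abstract"

method put(self: Datastore, key: Key, data: seq[byte]): Future[?!void] {.base.} =
  raiseAssert "abstract"

type
  # A layered store is itself a Datastore, so proxies stack freely,
  # e.g. an in-memory cache mounted in front of a persistent backend.
  TieredDatastore = ref object of Datastore
    front: Datastore   # fast layer, e.g. an in-memory cache
    back: Datastore    # persistent layer

method get(self: TieredDatastore, key: Key): Future[?!seq[byte]] {.async.} =
  let hit = await self.front.get(key)
  if hit.isOk:
    return hit                             # served from the cache layer
  without data =? await self.back.get(key), err:
    return failure(err)
  discard await self.front.put(key, data)  # warm the cache on a miss
  return success(data)
```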
The current implementation has several limitations:
- No dataset-level operations or advanced batching support
- Lack of consistent locking and concurrency control, which may lead to inconsistencies during crashes or long-running operations on block groups (e.g., reference count updates, expiration updates)
## Theory / Semantics

### BlockStore Interface

The `BlockStore` interface provides the following methods:
| Method | Description | Input | Output |
|---|---|---|---|
| `getBlock(cid: Cid)` | Retrieve block by CID | CID | `Future[?!Block]` |
| `getBlock(treeCid: Cid, index: Natural)` | Retrieve block from a Merkle tree by leaf index | Tree CID, index | `Future[?!Block]` |
| `getBlock(address: BlockAddress)` | Retrieve block via unified address | BlockAddress | `Future[?!Block]` |
| `getBlockAndProof(treeCid: Cid, index: Natural)` | Retrieve block with Merkle proof | Tree CID, index | `Future[?!(Block, CodexProof)]` |
| `getCid(treeCid: Cid, index: Natural)` | Retrieve leaf CID from tree metadata | Tree CID, index | `Future[?!Cid]` |
| `getCidAndProof(treeCid: Cid, index: Natural)` | Retrieve leaf CID with inclusion proof | Tree CID, index | `Future[?!(Cid, CodexProof)]` |
| `putBlock(blk: Block, ttl: Duration)` | Store block with quota enforcement | Block, optional TTL | `Future[?!void]` |
| `putCidAndProof(treeCid: Cid, index: Natural, blkCid: Cid, proof: CodexProof)` | Store leaf metadata with ref counting | Tree CID, index, block CID, proof | `Future[?!void]` |
| `hasBlock(...)` | Check block existence (CID or tree leaf) | CID / Tree CID + index | `Future[?!bool]` |
| `delBlock(...)` | Delete block/tree leaf (with ref count checks) | CID / Tree CID + index | `Future[?!void]` |
| `ensureExpiry(...)` | Update expiry for block/tree leaf | CID / Tree CID + index, expiry timestamp | `Future[?!void]` |
| `listBlocks(blockType: BlockType)` | Iterate over stored blocks | Block type | `Future[?!SafeAsyncIter[Cid]]` |
| `getBlockExpirations(maxNumber, offset)` | Retrieve block expiry metadata | Pagination params | `Future[?!SafeAsyncIter[BlockExpiration]]` |
| `blockRefCount(cid: Cid)` | Get block reference count | CID | `Future[?!Natural]` |
| `reserve(bytes: NBytes)` | Reserve storage quota | Bytes | `Future[?!void]` |
| `release(bytes: NBytes)` | Release reserved quota | Bytes | `Future[?!void]` |
| `start()` | Initialize store | — | `Future[void]` |
| `stop()` | Gracefully shut down store | — | `Future[void]` |
| `close()` | Close underlying datastores | — | `Future[void]` |
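In Nim, the interface is a set of base methods that each store implementation overrides. A representative subset, sketched from the table above (pragmas and optional parameters are simplified relative to the actual codebase):

```nim
import pkg/chronos
import pkg/questionable/results

# Cid, Block, BlockAddress, CodexProof, and NBytes are Codex/libp2p types
# assumed to be in scope here.
type
  BlockStore* = ref object of RootObj

method getBlock*(self: BlockStore, cid: Cid): Future[?!Block] {.base.} =
  raiseAssert "getBlock not implemented"

method getBlockAndProof*(self: BlockStore, treeCid: Cid, index: Natural): Future[?!(Block, CodexProof)] {.base.} =
  raiseAssert "getBlockAndProof not implemented"

method putBlock*(self: BlockStore, blk: Block, ttl: Duration): Future[?!void] {.base.} =
  raiseAssert "putBlock not implemented"

method delBlock*(self: BlockStore, cid: Cid): Future[?!void] {.base.} =
  raiseAssert "delBlock not implemented"

method hasBlock*(self: BlockStore, cid: Cid): Future[?!bool] {.base.} =
  raiseAssert "hasBlock not implemented"

method reserve*(self: BlockStore, bytes: NBytes): Future[?!void] {.base.} =
  raiseAssert "reserve not implemented"
```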
### Store Implementations
The Store module provides three concrete implementations of the BlockStore
interface,
each optimized for a specific role in the Codex architecture:
RepoStore, NetworkStore, and CacheStore.
#### RepoStore
The RepoStore is a persistent BlockStore implementation
that interfaces directly with low-level storage backends,
such as hard drives and databases.
It uses two distinct DataStore backends:
- FileSystem — for storing raw block data
- LevelDB — for storing associated metadata
This separation ensures optimal performance, allowing block data operations to run efficiently while metadata updates benefit from a fast key-value database.
Characteristics:
- Persistent storage via datastore backends
- Quota management with precise usage tracking
- TTL (time-to-live) support with automated expiration
- Metadata storage for block size, reference count, and expiry
- Transaction-like operations implemented through reference counting
Configuration:
- `quotaMaxBytes`: Maximum storage quota
- `blockTtl`: Default TTL for stored blocks
- `postFixLen`: CID key postfix length for sharding
```
┌─────────────────────────────────────────────────┐
│                    RepoStore                    │
├─────────────────────────────────────────────────┤
│  ┌───────────────┐   ┌───────────────────────┐  │
│  │    repoDs     │   │        metaDs         │  │
│  │  (Datastore)  │   │   (TypedDatastore)    │  │
│  │               │   │                       │  │
│  │  Block Data:  │   │  Metadata:            │  │
│  │  - Raw bytes  │   │  - BlockMetadata      │  │
│  │  - CID-keyed  │   │  - LeafMetadata       │  │
│  │               │   │  - QuotaUsage         │  │
│  │               │   │  - Block counts       │  │
│  └───────────────┘   └───────────────────────┘  │
└─────────────────────────────────────────────────┘
```
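As a rough wiring sketch (constructor names, paths, and parameter values here are assumptions, not the verified Codex API):

```nim
# Hypothetical RepoStore wiring: one backend for raw block data, one for
# metadata. Consult the Codex sources for the exact constructors.
let
  repoDs = FSDatastore.new("/var/lib/codex/repo")       # raw block data
  metaDs = LevelDbDatastore.new("/var/lib/codex/meta")  # block metadata

let repo = RepoStore.new(
  repoDs = repoDs,
  metaDs = metaDs,
  quotaMaxBytes = (20'i64 * 1024 * 1024 * 1024).NBytes, # 20 GiB default
  blockTtl = 24.hours,   # default TTL for ephemeral blocks
  postFixLen = 2)        # CID key sharding postfix length
```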
#### NetworkStore
The NetworkStore is a composite BlockStore that combines local persistence
with network-based retrieval for distributed content access.
It follows a local-first strategy — attempting to retrieve or store blocks locally first, and falling back to network retrieval via the Block Exchange Engine if the block is not available locally.
Characteristics:
- Integrates local storage with network retrieval
- Works seamlessly with the block exchange engine for peer-to-peer access
- Transparent block fetching from remote sources
- Local caching of blocks retrieved from the network for future access
```
┌───────────────────────────────────────────────────────┐
│                     NetworkStore                      │
├───────────────────────────────────────────────────────┤
│                                                       │
│  ┌──────────────────┐      ┌──────────────────────┐   │
│  │  LocalStore - RS │      │    BlockExcEngine    │   │
│  │  • Store blocks  │      │  • Request blocks    │   │
│  │  • Get blocks    │      │  • Resolve blocks    │   │
│  └──────────────────┘      └──────────────────────┘   │
│           │                            │              │
│           └─────────────┬──────────────┘              │
│                         │                             │
│                 ┌───────┴──────┐                      │
│                 │ BS Interface │                      │
│                 │              │                      │
│                 │  • getBlock  │                      │
│                 │  • putBlock  │                      │
│                 │  • hasBlock  │                      │
│                 │  • delBlock  │                      │
│                 └──────────────┘                      │
└───────────────────────────────────────────────────────┘
```
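Condensed, the local-first path looks like the sketch below; the real implementation also distinguishes "not found" from other failures, and the engine's `requestBlock` call is assumed here:

```nim
# Simplified local-first getBlock for NetworkStore.
method getBlock*(self: NetworkStore, address: BlockAddress): Future[?!Block] {.async.} =
  let local = await self.localStore.getBlock(address)
  if local.isOk:
    return local   # served from local storage
  # Fall back to the network: the Block Exchange Engine locates and
  # resolves the block from remote peers.
  return await self.engine.requestBlock(address)
```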
#### CacheStore
The CacheStore is an in-memory BlockStore implementation
designed for fast access to frequently used blocks.
This store maintains two separate LRU caches:
- Block Cache — `LruCache[Cid, Block]`
  - Stores actual block data indexed by CID
  - Acts as the primary cache for block content
- CID/Proof Cache — `LruCache[(Cid, Natural), (Cid, CodexProof)]`
  - Maps `(treeCid, index)` to `(blockCid, proof)`
  - Supports direct access to block proofs keyed by `treeCid` and index
Characteristics:
- O(1) access times for cached data
- LRU eviction policy for memory management
- Configurable maximum cache size
- No persistence — cache contents are lost on restart
- No TTL — blocks remain in cache until evicted
Configuration:
- `cacheSize`: Maximum total cache size (bytes)
- `chunkSize`: Minimum block size unit
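Insertion with size-bounded eviction can be sketched as follows; `removeLru` is assumed to pop the least recently used entry (the cache library may expose this differently):

```nim
# Sketch of CacheStore insertion with LRU eviction under a byte budget.
proc putBlockSketch(self: CacheStore, blk: Block) =
  let blkSize = blk.data.len.NBytes
  if blkSize > self.size:
    return                                # larger than the cache: skip
  while self.currentSize + blkSize > self.size:
    let evicted = self.cache.removeLru()  # assumed eviction helper
    self.currentSize -= evicted.data.len.NBytes
  self.cache[blk.cid] = blk
  self.currentSize += blkSize
```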
### Storage Layout
| Key Pattern | Data Type | Description | Example |
|---|---|---|---|
| `repo/manifests/{XX}/{full-cid}` | Raw bytes | Manifest block data | `repo/manifests/Cd/bafy...Cd → [data]` |
| `repo/blocks/{XX}/{full-cid}` | Raw bytes | Block data | `repo/blocks/Ab/bafy...Ab → [data]` |
| `meta/ttl/{cid}` | BlockMetadata | Expiry, size, refCount | `meta/ttl/bafy... → {...}` |
| `meta/proof/{treeCid}/{index}` | LeafMetadata | Merkle proof for leaf | `meta/proof/bafy.../42 → {...}` |
| `meta/total` | Natural | Total stored blocks | `meta/total → 12039` |
| `meta/quota/used` | NBytes | Used quota | `meta/quota/used → 52428800` |
| `meta/quota/reserved` | NBytes | Reserved quota | `meta/quota/reserved → 104857600` |
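The `{XX}` shard segment keeps key fan-out manageable; judging by the examples above, it is the last `postFixLen` characters of the CID string. A sketch of key construction under that assumption:

```nim
# Build a sharded block key such as "repo/blocks/Ab/bafy...Ab".
proc makeBlockKey(cid: Cid, postFixLen = 2): string =
  let s = $cid   # textual CID, e.g. "bafy...Ab"
  "repo/blocks/" & s[^postFixLen..^1] & "/" & s
```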
## Workflows
The following flow charts summarize how put, get, and delete operations interact with the shared block storage, metadata store, and quota management systems.
### PutBlock
The following flow chart shows how a block is stored with metadata and quota management:
```
putBlock: blk, ttl
 │
 ├─> Calculate expiry = now + ttl
 │
 ├─> storeBlock: blk, expiry
 │
 ├─> Block empty?
 │    ├─> Yes: Return AlreadyInStore
 │    └─> No: Create metadata & block keys
 │
 ├─> Block metadata exists?
 │    ├─> Yes: Size matches?
 │    │    ├─> Yes: Return AlreadyInStore
 │    │    └─> No: Return Error
 │    └─> No: Create new metadata
 │
 ├─> Store block data
 │
 ├─> Store successful?
 │    ├─> No: Return Error
 │    └─> Yes: Update quota usage
 │
 ├─> Quota update OK?
 │    ├─> No: Rollback: Delete block → Return Error
 │    └─> Yes: Update total blocks count
 │
 ├─> Trigger onBlockStored callback
 │
 └─> Return Success
```
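The notable step is the quota rollback: if quota accounting fails after the block data was written, the write is undone. A condensed sketch (the helpers `blockKey` and `updateQuotaUsage` are illustrative; metadata and expiry bookkeeping are elided):

```nim
# Condensed putBlock: write the data, then account for it; undo on failure.
proc putBlockSketch(self: RepoStore, blk: Block, ttl: Duration): Future[?!void] {.async.} =
  let key = blockKey(blk.cid)   # illustrative key-derivation helper
  if err =? (await self.repoDs.put(key, blk.data)).errorOption:
    return failure(err)
  if err =? (await self.updateQuotaUsage(blk.data.len.NBytes)).errorOption:
    discard await self.repoDs.delete(key)   # rollback the block write
    return failure(err)
  self.totalBlocks.inc   # update total blocks count
  return success()
```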
### GetBlock
The following flow chart explains how a block is retrieved by CID or tree reference, resolving metadata if necessary, and returning the block or an error:
```
getBlock: cid/address
 │
 ├─> Input type?
 │    ├─> BlockAddress with leaf
 │    │    └─> getLeafMetadata: treeCid, index
 │    │         ├─> Leaf metadata found?
 │    │         │    ├─> No: Return BlockNotFoundError
 │    │         │    └─> Yes: Extract block CID from metadata
 │    └─> CID: Direct CID access
 │
 ├─> CID empty?
 │    ├─> Yes: Return empty block
 │    └─> No: Create prefix key
 │
 ├─> Query datastore: repoDs.get
 │
 ├─> Block found?
 │    ├─> No: Error type?
 │    │    ├─> DatastoreKeyNotFound: Return BlockNotFoundError
 │    │    └─> Other: Return Error
 │    └─> Yes: Create Block with verification
 │
 └─> Return Block
```
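Before the datastore lookup, a leaf address must be resolved to the concrete block CID; the dispatch can be sketched as:

```nim
# Resolve a BlockAddress to the CID actually stored in the repo.
proc resolveCid(self: RepoStore, address: BlockAddress): Future[?!Cid] {.async.} =
  if address.leaf:
    # (treeCid, index) -> LeafMetadata -> block CID
    without meta =? await self.getLeafMetadata(address.treeCid, address.index), err:
      return failure(err)   # surfaces as BlockNotFoundError
    return success(meta.blkCid)
  return success(address.cid)
```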
### DelBlock
The following flow chart shows how a block is deleted when it is unused or expired, including metadata cleanup and quota/counter updates:
```
delBlock: cid
 │
 ├─> delBlockInternal: cid
 │
 ├─> CID empty?
 │    ├─> Yes: Return Deleted
 │    └─> No: tryDeleteBlock: cid, now
 │
 ├─> Metadata exists?
 │    ├─> No: Check if block exists in repo
 │    │    ├─> Block exists?
 │    │    │    ├─> Yes: Warn & remove orphaned block
 │    │    │    └─> No: Return NotFound
 │    │    └─> Return NotFound
 │    └─> Yes: refCount = 0 OR expired?
 │         ├─> No: Return InUse
 │         └─> Yes: Delete block & metadata → Return Deleted
 │
 ├─> Handle result
 │
 ├─> Result type?
 │    ├─> InUse: Return Error: Cannot delete dataset block
 │    ├─> NotFound: Return Success: Ignore
 │    └─> Deleted: Update total blocks count
 │         └─> Update quota usage
 │              └─> Return Success
 │
 └─> Return Success
```
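The deletion guard reduces to a simple predicate over the block's metadata:

```nim
# A block is physically deletable only when nothing references it anymore
# or its TTL has lapsed; otherwise delBlock reports it as in use.
func canDelete(md: BlockMetadata, now: SecondsSince1970): bool =
  md.refCount == 0 or md.expiry < now
```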
## Data Models

### Stores
```nim
type
  RepoStore* = ref object of BlockStore
    postFixLen*: int
    repoDs*: Datastore
    metaDs*: TypedDatastore
    clock*: Clock
    quotaMaxBytes*: NBytes
    quotaUsage*: QuotaUsage
    totalBlocks*: Natural
    blockTtl*: Duration
    started*: bool

  NetworkStore* = ref object of BlockStore
    engine*: BlockExcEngine
    localStore*: BlockStore

  CacheStore* = ref object of BlockStore
    currentSize*: NBytes
    size*: NBytes
    cache: LruCache[Cid, Block]
    cidAndProofCache: LruCache[(Cid, Natural), (Cid, CodexProof)]
```
### Metadata Types
```nim
type
  BlockMetadata* {.serialize.} = object
    expiry*: SecondsSince1970
    size*: NBytes
    refCount*: Natural

  LeafMetadata* {.serialize.} = object
    blkCid*: Cid
    proof*: CodexProof

  BlockExpiration* {.serialize.} = object
    cid*: Cid
    expiry*: SecondsSince1970

  QuotaUsage* {.serialize.} = object
    used*: NBytes
    reserved*: NBytes
```
## Functional Requirements

### Available Today

- Atomic Block Operations
  - Store, retrieve, and delete operations must be atomic.
  - Support retrieval via:
    - Direct CID
    - Tree-based addressing (`treeCid` + index)
    - Unified block address
- Metadata Management
  - Store protocol-level metadata (e.g., storage proofs, quota usage).
  - Store block-level metadata (e.g., reference counts, total block count).
- Multi-Datastore Support
  - Pluggable datastore interface supporting various backends.
  - Typed datastore operations for metadata type safety.
- Lifecycle & Maintenance (see the sketch after this list)
  - BlockMaintainer service for removing expired data.
  - Configurable maintenance intervals (default: 10 min).
  - Batch processing (default: 1000 blocks/cycle).
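A single maintenance cycle under those defaults can be sketched as follows (`SafeAsyncIter` iteration is simplified, and the proc name is illustrative):

```nim
# One BlockMaintainer cycle: scan a batch of expiry records and delete
# blocks whose TTL has lapsed; scheduled every interval (default 10 min).
proc maintenanceCycle(store: BlockStore, clock: Clock, batch = 1000) {.async.} =
  without iter =? await store.getBlockExpirations(maxNumber = batch, offset = 0), err:
    return   # backend error; retry on the next cycle
  for beFut in iter:   # SafeAsyncIter yields futures of ?!BlockExpiration
    without be =? await beFut, itemErr:
      continue
    if be.expiry < clock.now():
      discard await store.delBlock(be.cid)   # expired: reclaim space
```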
### Future Requirements

- Transaction Rollback & Error Recovery
  - Rollback support for failed multi-step operations.
  - Consistent state restoration after failures.
- Dataset-Level Operations
  - Handle dataset-level metadata.
  - Batch operations for dataset block groups.
- Concurrency Control
  - Consistent locking and coordination mechanisms to prevent inconsistencies during crashes or long-running operations.
- Lifecycle & Maintenance
  - Cooperative scheduling to avoid blocking.
  - State tracking for large datasets.
## Non-Functional Requirements

### Currently Implemented

- Security
  - Verify block content integrity upon retrieval.
  - Enforce quotas to prevent disk exhaustion.
  - Safe orphaned data cleanup.
- Scalability
  - Configurable storage quotas (default: 20 GiB).
  - Pagination for metadata queries.
  - Reference counting–based garbage collection.
- Reliability
  - Metrics collection (`codex_repostore_*`).
  - Graceful shutdown with resource cleanup.
### Planned Enhancements

- Performance
  - Batch metadata updates.
  - Efficient key lookups with configurable prefix lengths.
  - Support for both fast and slower storage tiers.
  - Streaming APIs optimized for extremely large datasets.
- Security
  - Finer-grained quota enforcement across tenants/namespaces.
- Reliability
  - Stronger rollback semantics for multi-node consistency.
  - Auto-recovery from inconsistent states.
## Wire Format Specification / Syntax
The Store Module does not define a wire format specification. It provides an internal storage abstraction for Codex and relies on underlying datastore implementations for serialization and persistence.
## Security/Privacy Considerations
- Block Integrity: The Store Module verifies block content integrity upon retrieval to ensure data has not been corrupted or tampered with.
- Quota Enforcement: Storage quotas are enforced to prevent disk-exhaustion attacks. The default quota is 20 GiB, but this is configurable.
- Safe Data Cleanup: The maintenance engine safely removes expired ephemeral data and orphaned blocks without compromising data integrity.
- Reference Counting: Reference counting–based garbage collection ensures that blocks are not deleted while they are still in use by other components.
Future security enhancements include finer-grained quota enforcement across tenants/namespaces and stronger rollback semantics for multi-node consistency.
## Rationale
The Store Module design prioritizes:
- Decoupling: By introducing the `BlockStore` interface, the Store Module decouples storage operations from underlying datastore semantics, allowing for flexible backend implementations.
- Performance: The separation of block data (filesystem) and metadata (LevelDB) in RepoStore ensures optimal performance for both types of operations.
- Flexibility: The three store implementations (RepoStore, NetworkStore, CacheStore) provide different trade-offs between persistence, network access, and performance, allowing Codex to optimize for different use cases.
- Scalability: Reference counting, quota management, and pagination enable the Store Module to scale to large datasets while preventing resource exhaustion.
The current limitations (lack of dataset-level operations, inconsistent locking) are acknowledged and will be addressed in future versions.
## Copyright
Copyright and related rights waived via CC0.
## References

### normative

### informative
- nim-datastore
- DataStore Interface
- chronos - Async runtime
- libp2p - P2P networking and CID types
- questionable - Error handling
- lrucache - LRU cache implementation