@kriskowal
Member

Description

This change introduces an experimental CBOR codec as an alternative to Syrup for OCapN message serialization. The implementation enables OCapN messages to be parsed by any RFC 8949 compliant CBOR decoder, allowing more of the ecosystem's existing tools to come to bear and making the wire format strictly a compact, machine-readable format.

New: CBOR Codec Implementation

  • RFC 8949 compliant canonical encoding
  • All integers encoded as bignums (Tags 2/3) for arbitrary precision
  • Float64 values with canonical NaN representation
  • Symbols via Tag 280
  • Records via Tag 27 (generic array)
  • Tagged values via Tag 55799 (self-described CBOR)
  • BMP-only string validation (rejects surrogate code points)
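As a rough illustration of the "all integers as bignums" rule above, the following sketch encodes a BigInt using Tag 2 or Tag 3. The function name `encodeBignum` is hypothetical, and the choice to strip leading zero bytes (so zero becomes an empty byte string) is an assumption borrowed from RFC 8949's preferred bignum serialization, not something confirmed from this PR's code.

```javascript
// Sketch only: encode a BigInt as a CBOR bignum.
// Tag 2 (0xC2) wraps the magnitude n for n >= 0.
// Tag 3 (0xC3) wraps the magnitude of (-1 - n) for n < 0.
function encodeBignum(value) {
  const negative = value < 0n;
  let magnitude = negative ? -1n - value : value;
  // Big-endian magnitude bytes with no leading zeros (zero yields no bytes).
  const bytes = [];
  while (magnitude > 0n) {
    bytes.unshift(Number(magnitude & 0xffn));
    magnitude >>= 8n;
  }
  if (bytes.length > 23) {
    throw new RangeError('sketch only handles magnitudes up to 23 bytes');
  }
  // 0x40 | length is the byte-string head for lengths 0..23.
  return Uint8Array.from([negative ? 0xc3 : 0xc2, 0x40 | bytes.length, ...bytes]);
}

const toHexA = bytes =>
  Array.from(bytes, b => b.toString(16).padStart(2, '0')).join(' ');

console.log(toHexA(encodeBignum(256n))); // c2 42 01 00
console.log(toHexA(encodeBignum(-257n))); // c3 42 01 00
```

Because every integer takes this form, a decoder never has to choose between major types 0/1 and bignums for the same value, which is one way to get a single canonical encoding per integer.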

New: CBOR Diagnostic Notation Codec

A text-based codec for human-readable CBOR representation:

  • src/cbor/diagnostic/encode.js - CBOR bytes → diagnostic string
  • src/cbor/diagnostic/decode.js - Diagnostic string → JavaScript values
  • src/cbor/diagnostic/util.js - Hex conversion and comparison helpers

This enables writing test cases in readable form and debugging encoding issues.

New: Codec Interface Abstraction

src/codec-interface.js defines abstract OcapnReader and OcapnWriter interfaces that both Syrup and CBOR implement. This enables:

  • Swapping codecs without changing application code
  • Type-safe codec injection
  • Future codec negotiation in netlayers

Codec Architecture: Dependency Injection

Applications import only the codec they need:

// For CBOR
import { makeCborWriter, makeCborReader } from '@endo/ocapn/cbor/index.js';

// For Syrup
import { makeSyrupWriter, makeSyrupReader } from '@endo/ocapn/syrup/index.js';

This replaces a hypothetical central factory, ensuring unused implementations are tree-shaken.
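A minimal sketch of what this injection pattern might look like from the application side. Everything here is illustrative: `makeFakeWriter`, `writeString`, and `getOutput` are hypothetical stand-ins, and the real `OcapnWriter` method names may differ.

```javascript
// Hypothetical stand-in for a codec's writer factory. In an application this
// would be makeCborWriter or makeSyrupWriter, imported from a single module.
const makeFakeWriter = () => {
  const chunks = [];
  return {
    writeString: text => {
      chunks.push(`str:${text}`);
    },
    getOutput: () => chunks.join(' '),
  };
};

// Application code depends only on the injected factory, never on a codec
// module, so an unused codec is never imported.
const makeGreeter = ({ makeWriter }) => name => {
  const writer = makeWriter();
  writer.writeString('hello');
  writer.writeString(name);
  return writer.getOutput();
};

const greet = makeGreeter({ makeWriter: makeFakeWriter });
console.log(greet('world')); // str:hello str:world
```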

Test Infrastructure Enhancements

  • Dual-codec testing: All codec tests now run with both Syrup and CBOR
  • AVA macros: Table-driven interop tests using AVA macro pattern
  • Interop tests: Validates CBOR output parses with the cbor npm package
  • Snapshot updates: All codec snapshots now include both Syrup and CBOR variants

Documentation

  • docs/cbor-encoding.md - Complete specification covering:
    • All OCapN type mappings to CBOR
    • Canonical encoding rules
    • CapTP operation and descriptor formats
    • Passable data and slot reference design
  • docs/codec-usage.md - Usage patterns for both codecs
  • src/cbor/README.md - Implementation overview
  • src/syrup/README.md - Updated with codec ID

Security Considerations

  • Canonical encoding: CBOR output is deterministic, enabling signature verification without re-serialization
  • BMP string validation: Rejects strings with surrogate code points (outside Basic Multilingual Plane), consistent with Syrup
  • Tag 24 for signed envelopes: The desc:sig-envelope design wraps signed content in embedded CBOR (Tag 24) to enable signature verification against bytewise representation
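To make the Tag 24 point concrete, here is a sketch of wrapping already-encoded CBOR bytes as an embedded data item. The name `wrapEmbeddedCbor` is hypothetical, and the sketch assumes payloads of at most 65535 bytes, matching the message limit stated in this description.

```javascript
// Sketch only: wrap encoded CBOR bytes in Tag 24 (Encoded CBOR data item).
function wrapEmbeddedCbor(encoded) {
  const n = encoded.length;
  let header;
  if (n < 24) {
    header = [0x40 | n]; // byte string, length in the initial byte
  } else if (n < 0x100) {
    header = [0x58, n]; // byte string, one-byte length
  } else if (n < 0x10000) {
    header = [0x59, n >> 8, n & 0xff]; // byte string, two-byte length
  } else {
    throw new RangeError('payload exceeds 65535 bytes');
  }
  // 0xD8 0x18 is the two-byte head for tag number 24.
  return Uint8Array.from([0xd8, 0x18, ...header, ...encoded]);
}

const toHexC = bytes =>
  Array.from(bytes, b => b.toString(16).padStart(2, '0')).join(' ');

// 0x81 0x01 is the CBOR array [1].
console.log(toHexC(wrapEmbeddedCbor(Uint8Array.from([0x81, 0x01]))));
// d8 18 42 81 01
```

A verifier can check a signature over the inner byte string exactly as received, without decoding and re-encoding it.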

Scaling Considerations

  • 65535-byte message limit consistent with Noise Protocol framing
  • Separating slot references from the passable body enables forwarding without re-serialization

Documentation Considerations

The docs/cbor-encoding.md specification is comprehensive and self-contained. Users wanting to implement an OCapN codec in another language should be able to do so from this document alone.

Testing Considerations

Existing Syrup tests parameterized to cover CBOR.

Includes tests that verify generated CBOR messages are legible to an off-the-shelf, generic CBOR implementation.

Round-trip and canonicalization tests.

Compatibility Considerations

None.

Upgrade Considerations

None.

@kriskowal kriskowal requested a review from kumavis January 1, 2026 07:38
@kriskowal kriskowal force-pushed the kriskowal-ocapn-cbor branch 8 times, most recently from 1c5f113 to 4d057a5 Compare January 2, 2026 10:11
@kriskowal kriskowal added the agenda Topics for next Endo meeting agenda label Jan 4, 2026
@kriskowal kriskowal force-pushed the kriskowal-ocapn-cbor branch from 4d057a5 to cd93ea0 Compare January 4, 2026 06:40
Member

@gibson042 gibson042 left a comment

These comments are limited to documentation; I haven't looked at any implementing code.

Comment on lines 37 to 39
| Target (in-band) | Record marker | `27([280("target")])` |
| Promise (in-band) | Record marker | `27([280("promise")])` |
| Error (in-band) | Record with message | `27([280("error"), "TypeError"])` |
Member

CBOR tag number 27 suggests (if not requires) that the first element be a string ("The typename [in array [typename, constructargs...]] is usually a class name, or another string that indicates the name of the type"), and I think we should honor that expectation.

Suggested change
-| Target (in-band) | Record marker | `27([280("target")])` |
-| Promise (in-band) | Record marker | `27([280("promise")])` |
-| Error (in-band) | Record with message | `27([280("error"), "TypeError"])` |
+| Target (in-band) | Record marker | `27(["target"])` |
+| Promise (in-band) | Record marker | `27(["promise"])` |
+| Error (in-band) | Record with message | `27(["error", "TypeError"])` |

Comment on lines +350 to +361
OCapN messages contain "passable" data that may include references to remote
objects (targets), promises, and errors. To enable efficient message forwarding
through intermediaries without re-serialization, references are encoded using
**in-band markers** within an **embedded CBOR body**, with **parallel arrays**
mapping each marker to its CapTP table position.
Member

So references don't bear any identifying metadata such as a slot index, and the idea is instead to use straightforward counting with repeats as necessary, such that targets [remotable1, remotable2] vs. [remotable1, remotable1] have the same in-band representation but target arrays that vary like e.g.

-[27(["desc:export", 1]), 27(["desc:export", 2])]
+[27(["desc:export", 1]), 27(["desc:export", 1])]

?

Member Author

should be merely [27(["desc:export"]), 27(["desc:export"])] in-band.

Member

@gibson042 gibson042 Jan 6, 2026

Right, that covers "same in-band representation"... I'm asking about what the (presumably out-of-band) target arrays would look like for [remotable1, remotable2] vs. [remotable1, remotable1].
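The thread leaves this question open. Purely as one possible reading of the "parallel arrays" wording in the quoted documentation, the sketch below replaces each remotable with an identical marker and records its table position in a separate array, in order of appearance. The names `separateSlots` and `TARGET_MARKER` and the position values are hypothetical, and the sketch only handles a flat list.

```javascript
// Hypothetical stand-in for the in-band marker 27(["target"]).
const TARGET_MARKER = Object.freeze({ record: 'target' });

// Sketch only: split a flat list into a body of markers and a parallel
// array of table positions, one entry per marker, repeats included.
function separateSlots(values, isRemotable, positionOf) {
  const body = [];
  const targets = [];
  for (const value of values) {
    if (isRemotable(value)) {
      body.push(TARGET_MARKER);
      targets.push(positionOf(value));
    } else {
      body.push(value);
    }
  }
  return { body, targets };
}

const remotable1 = {};
const remotable2 = {};
const positions = new Map([[remotable1, 1], [remotable2, 2]]);
const isRemotable = value => positions.has(value);
const positionOf = value => positions.get(value);

const distinct = separateSlots([remotable1, remotable2], isRemotable, positionOf);
const repeated = separateSlots([remotable1, remotable1], isRemotable, positionOf);
console.log(distinct.targets); // [ 1, 2 ]
console.log(repeated.targets); // [ 1, 1 ]
```

Under this reading the two bodies are identical and only the out-of-band arrays differ, which is the property that would let an intermediary forward the body bytes unchanged.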

Comment on lines +369 to +378
The body of passable data is encoded as **Tag 24** (Encoded CBOR data item)
wrapping a byte string. This standard CBOR tag indicates that the byte string
contains valid CBOR data.
Member

What is the structure of the envelope containing this body and some number of reference arrays? Is it itself a CBOR array or map, are there CBOR tags relevant to that level, etc.?

Member Author

The body is a CBOR encoded array. There is no additional tag, as writ.

@kriskowal kriskowal force-pushed the kriskowal-ocapn-cbor branch 2 times, most recently from bc494f5 to de7e849 Compare January 6, 2026 17:24
A reference to a remote object (target) in the CapTP tables.

```
27([280("target")])
```
Contributor

why a target string here? why not a CBOR tag number?

(likewise promise and error markers)

Member Author

Using a CBOR tag number would require a trip to IANA. 27 is a CBOR tag number we’re using to denote a Syrup record, which is analogous to a single-tier Cap’n Web array, e.g., ["date",ms]. CBOR tag number 280 indicates a symbol. So, this is a lot like the existing <desc:import-object> family of descriptors, but we’re collapsing all the targets, promises, and errors to place-holders and all other distinctions moving to slots, as is consistent with Endo’s marshaling. We could add back a gratuitous slot index to each of these, which would make it easier for a human to parse the CBOR diagnostic format.

Contributor

> Using a CBOR tag number would require a trip to IANA.

'nuff said.
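For readers following the tag numbers discussed in this thread, here is a sketch of the bytes behind the diagnostic form `27([280("target")])`. The helper names are hypothetical, and the sketch covers only tag numbers below 65536 and labels of up to 23 UTF-8 bytes.

```javascript
const markerUtf8 = new TextEncoder();

// Head bytes for a CBOR tag (major type 6), limited to the sizes needed here.
function tagHead(tag) {
  if (tag < 24) return [0xc0 | tag];
  if (tag < 0x100) return [0xd8, tag];
  if (tag < 0x10000) return [0xd9, tag >> 8, tag & 0xff];
  throw new RangeError('sketch only handles tag numbers below 65536');
}

// A CBOR text string (major type 3) of at most 23 bytes.
function shortText(text) {
  const bytes = markerUtf8.encode(text);
  if (bytes.length > 23) throw new RangeError('sketch only handles short labels');
  return [0x60 | bytes.length, ...bytes];
}

// Tag 27 wrapping a one-element array (0x81) whose element is a Tag 280 symbol.
function encodeMarker(label) {
  return Uint8Array.from([...tagHead(27), 0x81, ...tagHead(280), ...shortText(label)]);
}

const toHexF = bytes =>
  Array.from(bytes, b => b.toString(16).padStart(2, '0')).join(' ');

console.log(toHexF(encodeMarker('target')));
// d8 1b 81 d9 01 18 66 74 61 72 67 65 74
```

Dropping the Tag 280 wrapper, as suggested elsewhere in this review, would remove the three bytes `d9 01 18` and leave a plain text string as the record label.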

@kriskowal kriskowal force-pushed the kriskowal-ocapn-cbor branch from de7e849 to bac01ff Compare January 9, 2026 07:52
Comment on lines +208 to +210
case 'number-prefix':
// Could be integer or float - check further or read as appropriate
const num = reader.readInteger(); // or readFloat64()
Member

Wouldn't readInteger() misbehave if the value is actually floating point, and readFloat64() misbehave if the value is actually an integer? I see no value in conflating the two distinct types via a single value from peekTypeHint.

Member Author

These methods assert the type matches the expected type by direct observation. This is an artifact of our desire to, in many cases, avoid reïfying certain values, and in particular, reïfying symbols as strings when it suits us (often).

Member Author

The explanation is sloppy. Working on that.
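For the CBOR side of this discussion, a hint could in principle tell the two number types apart from the first byte alone, assuming the encoding stated in the PR description (integers always as Tag 2/3 bignums, floats always as float64). The function `peekNumberKind` below is a hypothetical sketch, not the PR's `peekTypeHint`.

```javascript
// Sketch only: classify a number by its initial byte, assuming integers are
// always bignums (Tag 2 = 0xC2, Tag 3 = 0xC3) and floats are always
// float64 (initial byte 0xFB).
function peekNumberKind(bytes, offset = 0) {
  const initial = bytes[offset];
  if (initial === 0xc2 || initial === 0xc3) return 'integer';
  if (initial === 0xfb) return 'float64';
  return undefined; // not a number under these assumptions
}

console.log(peekNumberKind(Uint8Array.from([0xc2, 0x41, 0x01]))); // integer
console.log(peekNumberKind(Uint8Array.from([0xfb, 0x3f, 0xf0, 0, 0, 0, 0, 0, 0]))); // float64
```

A caller could then dispatch to `readInteger()` or `readFloat64()` without relying on either method to reject the other type.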

Comment on lines +36 to +39
| Error | Record with string label | `27(["error", "message"])` |
| Target (in-band) | Record marker | `27(["target"])` |
| Promise (in-band) | Record marker | `27(["promise"])` |
| Error (in-band) | Record with message | `27(["error", "TypeError"])` |
Member

Why does Error appear twice?

Comment on lines +46 to +47
| 0 | Unsigned int | Length prefixes only (not for OCapN integers) |
| 1 | Negative int | Length prefixes only (not for OCapN integers) |
Member

"not for OCapN integers"?

Comment on lines +341 to +353
### Error

An error is a Record with string label "error" containing a message string.

```
D8 1B # Tag 27 (Record)
82 # Array of 2 elements
65 # Text string, 5 bytes
65 72 72 6F 72 # "error"
<message string>
```

**Diagnostic**: `27(["error", "Error message here"])`
Member

This seems inconsistent with Quick Reference—is the element after "error" a message or a type?

Member Author

My intention is that the second value in the array is a message and that it be carried in-band, while the respective identifier be carried out-of-band. There should actually be a third element with an OCapN Struct with additional data.
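As a sketch of the two-element layout in the quoted hex dump (label `"error"` followed by a message), the reader below extracts the message string. It follows the document as reviewed, not the three-element form proposed in the reply above. The function names are hypothetical, and only text strings of up to 255 bytes are handled.

```javascript
const errorUtf8 = new TextDecoder();

// Read a CBOR text string with an inline (0..23) or one-byte length.
function readShortText(bytes, offset) {
  const initial = bytes[offset];
  let length;
  let start;
  if (initial >= 0x60 && initial <= 0x77) {
    length = initial & 0x1f;
    start = offset + 1;
  } else if (initial === 0x78) {
    length = bytes[offset + 1];
    start = offset + 2;
  } else {
    throw new TypeError('expected a short text string');
  }
  const end = start + length;
  if (end > bytes.length) throw new RangeError('text string is truncated');
  return { text: errorUtf8.decode(bytes.subarray(start, end)), next: end };
}

// Expect D8 1B (Tag 27), 82 (array of 2), "error", then the message.
function readErrorMessage(bytes) {
  if (bytes[0] !== 0xd8 || bytes[1] !== 0x1b || bytes[2] !== 0x82) {
    throw new TypeError('expected Tag 27 wrapping a 2-element array');
  }
  const label = readShortText(bytes, 3);
  if (label.text !== 'error') throw new TypeError('expected an "error" record');
  return readShortText(bytes, label.next).text;
}

const sample = Uint8Array.from([
  0xd8, 0x1b, 0x82, // Tag 27, array of 2
  0x65, 0x65, 0x72, 0x72, 0x6f, 0x72, // "error"
  0x64, 0x6f, 0x6f, 0x70, 0x73, // "oops"
]);
console.log(readErrorMessage(sample)); // oops
```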

@kriskowal kriskowal mentioned this pull request Jan 13, 2026