Datasets: Qri’s Building Block

Datasets are recorded structured data. By design, Qri can only store datasets. Unlike a general-purpose version control system, every dataset stored in Qri is the same kind of document, so all datasets can interoperate. Datasets are stored and transmitted in standard data formats (e.g. JSON, CSV), which lets outside systems bypass Qri entirely and interact directly with the datasets Qri produces and consumes.
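
Because a dataset body is plain CSV or JSON, any tool that understands those formats can read it without going through Qri. The following is a minimal sketch of that idea using only the Python standard library. The CSV content, the column names, and the csv_to_records helper are invented for illustration and are not part of Qri.

```python
import csv
import io
import json

# Hypothetical CSV body; the columns and values are made up for this example.
csv_body = "name,count\napples,3\npears,5\n"

def csv_to_records(text):
    """Parse CSV text into a list of dicts keyed by the header row."""
    return list(csv.DictReader(io.StringIO(text)))

# Convert the same data to JSON, another standard format.
records = csv_to_records(csv_body)
json_body = json.dumps(records, indent=2)
print(json_body)
```

Note that csv.DictReader returns every value as a string; mapping columns to numeric types would need a schema, which this sketch deliberately leaves out.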

Datasets are defined to have the following properties by default:

  • Versioned: datasets have git-like version histories that track changes
  • Attributed: all changes are signed with keypair cryptography
  • Archival: all datasets are immutable, timestamped, and identified by their hash
  • Interoperable: datasets can be exported and converted to different data formats
  • Tolerant: datasets are designed to still work when data is invalid or when they have little to no schema definition
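
The "Versioned" and "Archival" properties can be sketched roughly as follows. This is an illustration under stated assumptions, not Qri's actual storage format or hashing scheme: the make_version and history helpers, the record layout, and the use of SHA-256 over canonical JSON are all hypothetical. Signing ("Attributed") is left out because it needs a cryptography library beyond the standard library.

```python
import hashlib
import json
from datetime import datetime, timezone

def make_version(body, previous=None, timestamp=None):
    """Build a version record whose id is the hash of its content (hypothetical layout)."""
    ts = timestamp or datetime.now(timezone.utc).isoformat()
    content = {"body": body, "previous": previous, "timestamp": ts}
    # Canonical JSON (sorted keys, fixed separators) so equal content hashes equally.
    encoded = json.dumps(content, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {"id": hashlib.sha256(encoded).hexdigest(), **content}

def history(version, store):
    """Walk back through the `previous` links, newest version first."""
    while version is not None:
        yield version
        version = store.get(version["previous"])

# Usage: two versions of a made-up dataset body, the second pointing at the first.
store = {}
v1 = make_version([["apples", 3]], timestamp="2020-01-01T00:00:00+00:00")
store[v1["id"]] = v1
v2 = make_version([["apples", 4]], previous=v1["id"], timestamp="2020-01-02T00:00:00+00:00")
store[v2["id"]] = v2
print([v["id"][:8] for v in history(v2, store)])
```

Since the id is derived from the content, changing the body, the timestamp, or the parent link yields a different id. That is why a version, once written, is treated as immutable: an edit produces a new version instead of altering an old one.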

A dataset can theoretically be any size, but in these early stages we’re targeting datasets of 1 GB and under. We’re working now to allow datasets to function at any scale.