Create a Dataset from a CSV
Introduction
Your local Qri repository is a place where you can manage and version datasets. In this guide, we'll create a new dataset from a CSV, add some metadata, and commit an updated version of the data.
Prerequisites
The Qri CLI installed
An Initalized Qri Repository - Setup Your Local Qri Repository
Directions
Step 1: Start with a CSV file
For this example, we'll use a simple CSV named us-top-10-cities-pop.csv
:
us-top-10-cities-pop.csvcity, state, population_2020"New York City", "NY", 8622357"Los Angeles", "CA", 4085014"Chicago", "IL", 2670406"Houston", "TX", 2378146"Phoenix", "AZ", 1743469"Philadelphia", "PA", 1590402"San Antonio", "TX", 1579504"San Diego", "CA", 1469490"Dallas", "TX", 1400337"San Jose", "CA", 1036242
The csv file can live anywhere on your computer, you'll just need to know its absolute path or its path relative to your working directory.
Step 2: Run Qri Save to Create the Dataset
To create the dataset, run qri save
with the --body
flag followed by the path to the CSV and the name of the dataset you want to create.
qri save --body /path/to/file.csv me/some-dataset-name
Why do we need the --body
flag? Qri datasets contain more than just data, so we store the data in a component called body
. Learn more about Qri Components
Remember, qri datasets always have names formatted as {username}/{datasetname}
. You can use the me
placeholder instead of your own username.
In this example, we give the qri dataset a slightly more verbose name of top-ten-us-cities-by-population
than what is in the CSV's filename.
$ qri save --body ./us-top-10-cities-pop.csv me/top-ten-us-cities-by-populationsaving done [==============================================================================]dataset saved: chriswhong-demo/top-ten-us-cities-by-population@/ipfs/QmQ83pxykRjPDD39hKk1RW7KRLCLPQMAAThTsv5cnzveHT
Voila, a new dataset is born!
Running qri list
will list your collection of datasets showing the new dataset.
$ qri list1 chriswhong-demo/top-ten-us-cities-by-population/ipfs/QmQ83pxykRjPDD39hKk1RW7KRLCLPQMAAThTsv5cnzveHT310 B, 10 entries, 0 errors, 1 version
Running qri log
shows the version history of the dataset. For each version, we can see the commit hash, timestamp, size, and message. Notice that Qri has auto-generated the message created dataset from us-top-10-cities-pop.csv
for this commit because we didn't specify one when running qri save
.
$ qri log me/top-ten-us-cities-by-population1 Commit: /ipfs/QmQ83pxykRjPDD39hKk1RW7KRLCLPQMAAThTsv5cnzveHTDate: Wed Oct 6 16:04:17 EDT 2021Storage: localSize: 310 Bcreated dataset from us-top-10-cities-pop.csv
Step 3: Add Some Metadata
We can store metadata in the dataset's meta
component. Metadata fields like title
and description
are common, and to add them to this dataset, we set up a JSON file meta.json
meta.json
Just as we did with the CSV, running qri save
and providing a file will yield a new version of the dataset. Because we aren't changing the body, we use the --file
flag.
We can also use the --title
flag to describe how this commit changes the dataset.
$ qri save --file meta.json --title "add a title and description to meta" me/top-ten-us-cities-by-populationsaving done [==============================================================================]dataset saved: chriswhong-demo/top-ten-us-cities-by-population@/ipfs/QmP6RSBh1du1Zztqh2Ak9Q4JnPU8nAvLAZHe6yqdmnZghw
Now the dataset has two commits:
$ qri log me/top-ten-us-cities-by-population1 Commit: /ipfs/QmP6RSBh1du1Zztqh2Ak9Q4JnPU8nAvLAZHe6yqdmnZghwDate: Wed Oct 6 16:38:56 EDT 2021Storage: localSize: 310 Badd a title and description to metameta added2 Commit: /ipfs/QmQ83pxykRjPDD39hKk1RW7KRLCLPQMAAThTsv5cnzveHTDate: Wed Oct 6 16:04:17 EDT 2021Storage: localSize: 310 Bcreated dataset from us-top-10-cities-pop.csv
You can continue modifying the csv and json files, and using qri save
to make new versions of this dataset. Each version is immutable, and you can inspect and export the older versions if you ever need them again.
Some things to try:
qri diff me/top-ten-us-cities-by-population
will show you what changed between the latest version and the previous versionqri get body me/top-ten-us-cities-by-population
will get the body as a CSV, which you can save to file or pipe into another command
Additional Resources
- Once your dataset is in good shape, learn how to push it to Qri Cloud so other Qri users can find it.