The transform component binds code to a dataset. A transform script is executed whenever a dataset version is saved with a transform specified, before persisting the dataset itself. Transform scripts by default only execute once, "on the way in". The script itself is embedded within the dataset version it's saved with, and is versioned along with all of the other components of the dataset.
Qri executes transforms in a sandbox, with no access to the local filesystem and staged internet access. The sandbox is intended to make scripts portable: in Qri you can fetch a dataset that someone else has written a transform for, recall that script, and re-execute it to produce new dataset versions.
Scripts are written in Starlark, a dialect of Python 3 with a number of features removed.
A transform script must define a function called transform. Qri will call this function as the "main function" of the script. Here's an example of a transform script that does nothing:
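A minimal sketch of such a no-op script, assuming an empty function body is all that is required (the function accepts both arguments and leaves the dataset untouched):

```python
def transform(ds, ctx):
    # intentionally empty: no changes are made to the dataset
    pass
```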
The two arguments to transform are ds, a dataset object, and ctx, a transformation context.
The dataset argument represents the latest saved version of the dataset this script is running on. Any modifications to ds are saved in the version that's being created.
Here's an example of a transform that sets the dataset's title in its meta component:
def transform(ds, ctx):
    ds.set_meta("title", "My Great Dataset")
There are many other methods available for mutating a dataset (most transforms will call ds.set_body to update the body). For more details, check the Starlark dataset package docs or the transform examples docs page.
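As a rough illustration of how a transform mutates the ds object, the sketch below pairs a transform function with a hypothetical StubDataset class. The stub is plain Python, not part of Qri or Starlark; it only mimics the set_meta and set_body calls named above so the function can be exercised outside Qri. The body rows are made-up sample data.

```python
class StubDataset:
    """Hypothetical stand-in for Qri's dataset object (not a Qri API)."""

    def __init__(self):
        self.meta = {}
        self.body = None

    def set_meta(self, key, value):
        self.meta[key] = value

    def set_body(self, data):
        self.body = data


def transform(ds, ctx):
    # in Qri, changes made to ds here end up in the version being created
    ds.set_meta("title", "My Great Dataset")
    ds.set_body([["apples", 3], ["pears", 5]])


# exercise the transform locally; ctx is unused in this sketch
ds = StubDataset()
transform(ds, None)
```

Only the transform function would go into a real script; the stub exists purely to make the example runnable on its own.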
The context argument provides additional data a script can use during execution. A classic example is secrets, such as API keys. You can provide secrets to a save command like this:
qri save --file transform.star --secrets random_word,apples
To get the secrets provided to a dataset, we use the context argument:
def transform(ds, ctx):
    seed = ctx.get_secret("random_word")
Context can also be used for other things, like getting the results of a download function.
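To make the secrets flow concrete, here is a small sketch that runs a transform against hypothetical StubContext and StubDataset classes. Both stubs are plain-Python assumptions for local experimentation, not Qri APIs; only get_secret and set_body come from the text above. The transform stores the length of the secret rather than the secret itself, since a real API key should not be written into a dataset.

```python
class StubContext:
    """Hypothetical stand-in for the transform context (not a Qri API)."""

    def __init__(self, secrets):
        self._secrets = secrets

    def get_secret(self, name):
        return self._secrets[name]


class StubDataset:
    """Hypothetical stand-in for the dataset object (not a Qri API)."""

    def __init__(self):
        self.body = None

    def set_body(self, data):
        self.body = data


def transform(ds, ctx):
    # read the secret supplied at save time
    seed = ctx.get_secret("random_word")
    # use the secret without persisting its value
    ds.set_body([["word_length", len(seed)]])


# mirrors the key,value pair from: --secrets random_word,apples
ctx = StubContext({"random_word": "apples"})
ds = StubDataset()
transform(ds, ctx)
```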
Transform scripts can also access the web. This allows you to create transforms that scrape websites or consume JSON or XML APIs. To call a URL, we need to define the download function. Here's an example:
# load the http.star package available as 'http' in our transform script
load("http.star", "http")

# download runs before transform(), and its return value becomes ctx.download
def download(ctx):
    res = http.get("https://example.com/asset.json")
    return res.json()

def transform(ds, ctx):
    # set the dataset's body to the json we got from the web
    ds.set_body(ctx.download)
If our script defines a download function, Qri will call it before transform and set ctx.download to its return value. The above script passes the result of the download function directly to ds.set_body(), setting the body of the dataset to the parsed JSON from the HTTP response.
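The call order (download first, its return value stored on ctx.download, then transform) can be sketched in plain Python. Everything here besides the download and transform function names is hypothetical: run_script, StubContext and StubDataset are illustrative stand-ins rather than Qri APIs, and the returned rows are sample data in place of a real HTTP response.

```python
class StubContext:
    """Hypothetical context; only the download attribute is modeled."""

    def __init__(self):
        self.download = None


class StubDataset:
    """Hypothetical dataset; only set_body is modeled."""

    def __init__(self):
        self.body = None

    def set_body(self, data):
        self.body = data


def run_script(download, transform):
    """Hypothetical runner that mimics the order the two functions run in."""
    ctx = StubContext()
    ds = StubDataset()
    if download is not None:
        # step 1: run download and keep its return value on the context
        ctx.download = download(ctx)
    # step 2: run transform, which can read ctx.download
    transform(ds, ctx)
    return ds


def download(ctx):
    # stands in for http.get(...).json(); this sketch makes no network call
    return [{"id": 1}, {"id": 2}]


def transform(ds, ctx):
    ds.set_body(ctx.download)


result = run_script(download, transform)
```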
The only place where a script has open access to the internet is within the download function. The download function deliberately does not grant access to local datasets.
It is perfectly fine in Qri to include both manual edits and scripted edits in the same commit, with one caveat: manual edits and transform scripts can't change the same field. If you try to edit the title of a dataset by hand and also save a script that edits the title, Qri will report an error.
We do this to preserve a meaningful audit trail. If Qri allowed both types of edits to the same field in one commit, there would be no way to know how that edit was generated, which would weaken the provenance provided by scripted transforms. Transform scripts document in exacting detail how a dataset changed over time. By requiring manual and scripted edits within a commit to touch mutually exclusive fields, Qri can provide stronger auditability for datasets that use transforms.
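The rule can be pictured as a set intersection over the fields each kind of edit touched. The sketch below is an illustration of that idea only, not Qri's implementation: find_conflicts, check_commit and the dotted field paths such as "meta.title" are all hypothetical names.

```python
def find_conflicts(manual_fields, script_fields):
    """Return the sorted field paths changed by both kinds of edit."""
    return sorted(set(manual_fields) & set(script_fields))


def check_commit(manual_fields, script_fields):
    """Raise if a manual edit and a scripted edit touch the same field."""
    conflicts = find_conflicts(manual_fields, script_fields)
    if conflicts:
        raise ValueError(
            "fields edited both manually and by transform: "
            + ", ".join(conflicts)
        )


# disjoint fields: both kinds of edit can share one commit
check_commit(["meta.description"], ["meta.title", "body"])

# overlapping field: the commit is rejected
try:
    check_commit(["meta.title"], ["meta.title", "body"])
    rejected = False
except ValueError:
    rejected = True
```

Keeping the two sets disjoint means every changed field in a commit can be attributed to exactly one source, either a person or the script.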