Starlark Standard Library Modules


qri module

you can access these methods from the qri module, e.g. qri.get_config()

To load:

load("qri.sky", "qri")

Function Definitions:

get_config

qri.get_config(key)

returns the value of a config variable, declared in the dataset file:

# in the dataset.yaml file:
transform:
  scriptpath: transform.sky
  config:
    key: value

get_secret

qri.get_secret(key)

returns the value of a secrets variable, declared in the dataset file:

# in the dataset.yaml file:
transform:
  scriptpath: transform.sky
  secrets:
    key: value

list_datasets

qri.list_datasets()

returns a list of dataset references available on your qri node

load("qri.sky", "qri")

def transform(ds):
  datasets = qri.list_datasets()
  #
  # prints a list of string dataset references
  print(datasets) 
  #
  # create a dataset that contains a list of your datasets:
  ds.set_body(datasets)
  return ds

load_dataset_body

qri.load_dataset_body(dataset_reference)

returns the body of the specified dataset as a list or dictionary. Read more about dataset references

load("qri.sky", "qri")

def transform(ds):
  # let's say there is a dataset named "2017_billboard_top_100" and a dataset named "2018_billboard_top_100"
  # let's create a dataset of the artists that are on both lists:
  billboard_2017 = qri.load_dataset_body("me/2017_billboard_top_100")
  billboard_2018 = qri.load_dataset_body("me/2018_billboard_top_100")
  #
  artists = []
  for i in range(0, len(billboard_2017)):
    artist = billboard_2017[i]['artist']
    #
    # if we've already encountered this artist,
    # move on to the next one
    if artist in artists:
      continue
    #
    # iterate through billboard_2018, if this artist
    # appears there, add it to the list of artists
    # and break out of the for loop
    for j in range(0, len(billboard_2018)):
      if artist == billboard_2018[j]['artist']:
        artists.append(artist)
        break
  #
  # ensure the list is unique
  artists = list(set(artists))
  ds.set_body(artists)
  return ds
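
The same "artists on both lists" logic can be sketched in plain Python. This is not qri or Starlark code: the function name artists_on_both and the sample rows are hypothetical, and it assumes each row is a dictionary with an 'artist' key, as in the example above. It keeps the order of the first list and never adds an artist twice.

```python
def artists_on_both(list_a, list_b):
    """Return artists present in both lists, in list_a order, without duplicates."""
    in_b = {row["artist"] for row in list_b}  # fast membership test
    seen = set()
    result = []
    for row in list_a:
        artist = row["artist"]
        if artist in in_b and artist not in seen:
            seen.add(artist)
            result.append(artist)
    return result


# hypothetical sample data standing in for the two dataset bodies
billboard_2017 = [{"artist": "A"}, {"artist": "B"}, {"artist": "A"}, {"artist": "C"}]
billboard_2018 = [{"artist": "C"}, {"artist": "A"}, {"artist": "D"}]
print(artists_on_both(billboard_2017, billboard_2018))  # prints ['A', 'C']
```

Building the lookup set once avoids rescanning the second list for every row of the first.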

load_dataset_head

qri.load_dataset_head(dataset_reference)

loads all the parts of the dataset, except for the body, as a dictionary with all or some of these keys: meta, structure, commit, transform, viz. If the dataset does not contain a transform, for example, then the dataset head dictionary will not contain a transform field.

load("qri.sky", "qri")

def transform(ds):
  # let's say you want to create a dataset that contains some
  # descriptive elements of a previous dataset
  # in this case, the title, the description, and the format
  head = qri.load_dataset_head("me/previous_dataset")
  #
  title = ""
  description = ""
  format = ""
  #
  if "meta" in head:
    if "title" in head["meta"]:
      title = head["meta"]["title"]
    if "description" in head["meta"]:
      description = head["meta"]["description"]
  #
  if "structure" in head:
    if "format" in head["structure"]:
      format = head["structure"]["format"]
  #
  ds.set_body({"title":title, "description": description, "format": format})
  return ds
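
Because any of the head's keys may be missing, the nested "in" checks above can also be written with dictionary lookups that fall back to a default. The following is a plain Python sketch of that idea, not qri code; the describe function and the sample head dictionary are hypothetical.

```python
def describe(head):
    """Pull title, description and format out of a head dict, defaulting to ""."""
    meta = head.get("meta", {})            # missing "meta" becomes an empty dict
    structure = head.get("structure", {})  # same for "structure"
    return {
        "title": meta.get("title", ""),
        "description": meta.get("description", ""),
        "format": structure.get("format", ""),
    }


# a hypothetical head with no structure and no description
head = {"meta": {"title": "Previous Dataset"}}
print(describe(head))
# prints {'title': 'Previous Dataset', 'description': '', 'format': ''}
```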

dataset object - ds

you can access these methods from the dataset object. A dataset object, usually referred to as ds, gets passed into and returned from the transform and download functions


Function Definitions:

get_body

ds.get_body()

returns the body of the current dataset as a list or dictionary

set_body

ds.set_body(body, raw)

body should usually be a list or a dictionary. raw is a boolean value. If true, it expects body to be a string and will store the body as byte data. Returns None.

set_meta

ds.set_meta(field, value)

Sets a specific field of the meta to the value. field and value are both strings. Returns None.

load("qri.sky", "qri")

def transform(ds):
  ds.set_meta("title", "Reference Transform")
  return ds

set_schema

ds.set_schema(value)

value is a dictionary written as a JSON schema. Returns None.

load("qri.sky", "qri")

def transform(ds):
  schema = {
    "type": "array",
    "items": {
      "type": "array",
      "items": [{
          "description": "type of animal",
          "title": "Animal",
          "type": "string"
        }, {
          "description": "number of legs this animal has",
          "title": "Number of Legs",
          "type": "integer"
        }
      ]
    }
  }

  ds.set_schema(schema)
  ds.set_body([
    ["cat", 4],
    ["bird", 2],
    ["snake", 0]
  ])
  return ds
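
To see what a schema like this describes, here is a small plain Python sketch that checks rows against it by hand. It is not how qri validates data: the row_errors function is hypothetical, it only understands the "string" and "integer" types used above, and it is stricter than JSON Schema because it also requires every row to have exactly one value per column.

```python
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # bool is a subclass of int in Python, so exclude it explicitly
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}


def row_errors(schema, body):
    """Return a list of human-readable problems found in body."""
    columns = schema["items"]["items"]  # one sub-schema per column
    errors = []
    for r, row in enumerate(body):
        if len(row) != len(columns):
            errors.append("row %d: expected %d values, got %d" % (r, len(columns), len(row)))
            continue
        for c, (value, col) in enumerate(zip(row, columns)):
            if not TYPE_CHECKS[col["type"]](value):
                errors.append("row %d, column %d (%s): expected %s" % (r, c, col["title"], col["type"]))
    return errors


schema = {
    "type": "array",
    "items": {
        "type": "array",
        "items": [
            {"title": "Animal", "type": "string"},
            {"title": "Number of Legs", "type": "integer"},
        ],
    },
}

print(row_errors(schema, [["cat", 4], ["bird", 2], ["snake", 0]]))  # prints []
print(len(row_errors(schema, [["cat", "four"], ["bird"]])))  # prints 2
```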

html module

can only be used in the download function. You must use the html function to parse an html response into a document object that you can traverse

To load and parse:

load("html.sky", "html")
load("http.sky", "http")

def download(ds):
  res = http.get("https://some-website.com")
  # parses the response body into something that 
  # can be traversed using a jquery-like syntax
  doc = html(res.content())
  return ds

The following methods can be used on the document object that gets returned by using the html function:


Function Definitions:

attr

selection.attr(attribute)

Returns a string of the given attribute for that selection in the document. attribute is a string.

load("html.sky", "html")

def transform(ds):
  example_html = '<div title="hello world" class="example_class_name"><p>test</p></div>'
  #
  # find returns a selection of the elements that match the search string:
  divs = html(example_html).find("div")
  #
  # get the first element in the list:
  div = divs.first()
  #
  # get the attr with the key "title"
  attr = div.attr("title")
  print(attr) # prints "hello world"
  #
  ds.set_body([attr])
  return ds

children

selection.children()

gets the child elements of each element in the Selection. It returns a new Selection object containing these elements

load("html.sky", "html")

def transform(ds):
  example_html = '<body><p class="A">a</p><p class="B">b</p><p class="C">c</p><p class="D">d</p></body>'
  doc = html(example_html)
  # gives you a selection of the body element
  body = doc.find("body")
  # gives you a selection made up of the children of body
  children_len = body.children().len()
  print(children_len) # prints 4, specifically the 4 p elements
  return ds

children_filtered

selection.children_filtered(filter)

gets the child elements of each element in the selection, filtered by the specified filter string. It returns a new Selection object containing these elements

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="A">a</p><p class="B">b</p></div></body>'
  doc = html(example_html)
  # find the body element and select its two div element children
  divs = doc.find("body").children()
  # filter the divs to get the 2 p elements with class name ".A"
  p_as = divs.children_filtered(".A") 
  print(p_as.text()) # prints "aa"
  return ds

contents

selection.contents()

gets the children of each element in the Selection, including text and comment nodes. It returns a new Selection object containing these elements

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
  doc = html(example_html)
  # get the body
  body = doc.contents()
  print(body.len()) # prints "2", the 2 div elements
  print(body.text()) # prints "abcd"
  return ds

eq

selection.eq(index)

returns the node at the given index as a new selection. index is an integer, starting at 0

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
  doc = html(example_html)
  #
  # get the p elements (the variable is named body, but it holds the 4 p elements)
  body = doc.find("p")
  print(body.len()) # prints "4", the 4 p elements
  #
  # create list of the text in each element:
  texts = []
  for i in range(body.len()):
    # use `eq` function to access each node
    texts.append(body.eq(i).text())
  print(texts) # prints ["a", "b", "c", "d"]
  #
  ds.set_body(texts)
  return ds

find

selection.find(selector)

gets the descendants of each element in the current set of matched elements, filtered by a given selector string. It returns a new Selection containing these matched elements.

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="A">a</p><p class="B">b</p></div></body>'
  doc = html(example_html)
  # get the elements with the class ".B"
  p_Bs = doc.find(".B")
  print(p_Bs.len()) # prints "2", the 2 p elements with class ".B"
  print(p_Bs.text()) # prints "bb"
  return ds
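
For readers who want to see what a class selector such as ".B" matches, here is a rough plain Python sketch using the standard library's html.parser. It is unrelated to the qri html module: the ClassTextCollector class is hypothetical, it handles only a single class name rather than general selector strings, and it assumes every start tag has a matching end tag (so void elements such as br would confuse it).

```python
from html.parser import HTMLParser


class ClassTextCollector(HTMLParser):
    """Collects the text of every element that carries a given class name."""

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0   # greater than 0 while inside a matching element
        self.texts = []  # one entry per matching element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth > 0:
            self.depth += 1  # nested tag inside a match
        elif self.class_name in classes:
            self.depth = 1
            self.texts.append("")

    def handle_endtag(self, tag):
        if self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.texts[-1] += data


example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="A">a</p><p class="B">b</p></div></body>'
collector = ClassTextCollector("B")
collector.feed(example_html)
print(len(collector.texts))      # prints 2
print("".join(collector.texts))  # prints bb
```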

first

selection.first()

returns the first element of the selection as a new selection

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
  doc = html(example_html)
  # get the body
  body = doc.find("body")
  # get the children of the body, in this case 2 divs
  divs = body.children()
  # get the first div
  div = divs.first()
  print(div.text()) # prints "ab"
  return ds

filter

selection.filter(selector)

reduces the set of matched elements to those that match the selector string. It returns a new Selection object for this subset of matching elements

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="A">a</p><p class="B">b</p></div></body>'
  doc = html(example_html)
  # get a list of all the p tags
  ps = doc.find("p")
  print(ps.len()) # prints "4", the 4 p elements
  # filter ps to find elements with class ".B"
  p_Bs = ps.filter(".B")
  print(p_Bs.len()) # prints "2", the 2 p elements with class ".B"
  print(p_Bs.text()) # prints "bb"
  return ds

has

selection.has(selector)

reduces the set of matched elements to those that have a descendant that matches the selector string. It returns a new Selection object with the matching elements

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
  doc = html(example_html)
  # get the div children of the body element
  divs = doc.find("body").children()
  # get div that has a child with class ".A"
  div = divs.has(".A")
  ps = div.children()
  print(ps.text()) # prints "ab", 2 p elements that exist in the div element
  return ds

last

selection.last()

returns the last element of the selection as a new selection

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
  doc = html(example_html)
  # get the body
  body = doc.find("body")
  # get the children of the body, in this case 2 divs
  divs = body.children()
  # get the last div
  div = divs.last()
  print(div.text()) # prints "cd"
  return ds

len

selection.len()

returns the number of nodes in the selection as an integer

load("html.sky", "html")

def transform(ds):
  example_html = '<body><p class="A">a</p><p class="B">b</p><p class="C">c</p><p class="D">d</p></body>'
  doc = html(example_html)
  # get the p elements
  ps = doc.find("p")
  print(ps.len()) # prints "4", the 4 p elements
  return ds

parent

selection.parent()

gets the parent of each element in the Selection. It returns a new Selection object containing the matched elements

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div title="hi"><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
  doc = html(example_html)
  # select p elements
  ps = doc.find("p")
  # get parents of p elements, in this case 2 divs
  divs = ps.parent()
  print(divs.len()) # 2 div parent elements
  title = divs.first().attr("title")
  print(title) # prints "hi"
  return ds

parents_until

selection.parents_until(selector)

gets the ancestors of each element in the Selection, up to but not including the element matched by the selector string. It returns a new Selection object containing the matched elements

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div title="hi"><span title="bye"><p class="A">a</p><p class="B">b</p></span></div><div title="woo"><span title="weee"><p class="C">c</p><p class="D">d</p></span></div></body>'
  doc = html(example_html)
  # select p elements
  ps = doc.find("p")
  # get all ancestors of the p elements, up to but not including the body element
  parents = ps.parents_until("body")
  print(parents.len()) # prints "4", the 2 span and 2 div ancestor elements
  print(parents.eq(0).attr("title")) # prints "bye"
  print(parents.eq(1).attr("title")) # prints "hi"
  print(parents.eq(2).attr("title")) # prints "weee"
  print(parents.eq(3).attr("title")) # prints "woo"
  return ds

siblings

selection.siblings()

gets the siblings of each element in the Selection. It returns a new Selection object containing the matched elements

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
  doc = html(example_html)
  # select elements with class ".A"
  a = doc.find(".A")
  # get the siblings of the element with class ".A"
  b = a.siblings()
  print(b.len()) # prints "1", the p element with class ".B"
  print(b.text()) # prints "b"
  return ds

text

selection.text()

gets the combined text contents of each element in the set of matched elements, including their descendants

load("html.sky", "html")

def transform(ds):
  example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
  doc = html(example_html)
  a = doc.text()
  print(a) # prints "abcd"
  return ds

http module

you can access these methods from the http module

response object

you can access these methods from a response object


Function Definitions:

delete

http.delete(url, params, headers, data, jsondata, auth)

sends a DELETE request to the given url. Returns a response.

param     type                                          optional?
url       string url                                    no
params    dictionary of param names to param values     yes
headers   dictionary of header names to header values   yes
data      dictionary                                    yes
jsondata  dictionary or list                            yes
auth      string                                        yes

get

http.get(url, params, headers, data, jsondata, auth)

sends a GET request to the given url. Returns a response.

param     type                                          optional?
url       string url                                    no
params    dictionary of param names to param values     yes
headers   dictionary of header names to header values   yes
data      dictionary                                    yes
jsondata  dictionary or list                            yes
auth      string                                        yes

load("http.sky", "http")

def download(ds):
  res = http.get("https://www.fake-json-response-endpoint.com")
  ds.set_body(res.json())
  return ds
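
A params dictionary of names to values is conventionally sent as the query string of the url. The plain Python sketch below shows that conventional encoding with the standard library. Whether the qri http module encodes params in exactly this way is an assumption, and the with_params helper and the /search path are hypothetical.

```python
from urllib.parse import urlencode, urlsplit, urlunsplit


def with_params(url, params):
    """Return url with params appended to its query string."""
    parts = urlsplit(url)
    query = urlencode(params)  # spaces become "+", pairs are joined with "&"
    if parts.query:
        query = parts.query + "&" + query
    return urlunsplit((parts.scheme, parts.netloc, parts.path, query, parts.fragment))


print(with_params("https://www.fake-json-response-endpoint.com/search", {"q": "top 100", "page": 2}))
# prints https://www.fake-json-response-endpoint.com/search?q=top+100&page=2
```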

options

http.options(url, params, headers, data, jsondata, auth)

Sends an OPTIONS request to the given url. Returns a response.

param     type                                          optional?
url       string url                                    no
params    dictionary of param names to param values     yes
headers   dictionary of header names to header values   yes
data      dictionary                                    yes
jsondata  dictionary or list                            yes
auth      string                                        yes

patch

http.patch(url, params, headers, data, jsondata, auth)

Sends a PATCH request to the given url. Returns a response.

param     type                                          optional?
url       string url                                    no
params    dictionary of param names to param values     yes
headers   dictionary of header names to header values   yes
data      dictionary                                    yes
jsondata  dictionary or list                            yes
auth      string                                        yes

post

http.post(url, params, headers, data, jsondata, auth)

Sends a POST request to the given url. Returns a response.

param     type                                          optional?
url       string url                                    no
params    dictionary of param names to param values     yes
headers   dictionary of header names to header values   yes
data      dictionary                                    yes
jsondata  dictionary or list                            yes
auth      string                                        yes

put

http.put(url, params, headers, data, jsondata, auth)

Sends a PUT request to the given url. Returns a response.

param     type                                          optional?
url       string url                                    no
params    dictionary of param names to param values     yes
headers   dictionary of header names to header values   yes
data      dictionary                                    yes
jsondata  dictionary or list                            yes
auth      string                                        yes

response

content

response.content()

returns the raw data as a string. This string can be passed to html(content_string) to return a document that can be parsed by the html functions.

load("http.sky", "http")
load("html.sky", "html")

def download(ds):
  res = http.get("https://some-website-to-get-html.com")
  doc = html(res.content())
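  # do stuff here with the html document, for example
  # collect the combined text of all the p elements:
  some_data = [doc.find("p").text()]
  ds.set_body(some_data)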
  return ds

encoding

response.encoding

a string, the different forms of encoding used in the response.

headers

response.headers

a dictionary of the response headers.

json

response.json()

attempts to parse the response body as json and returns the parsed value.

status_code

response.status_code

an integer, the status code of the response.

text

response.text()

returns the raw data as a string. This string can be passed to html(text) to return a document that can be parsed by the html functions.

url

response.url

a string representation of the url


time module

you can access these methods from the time module:

time object

you can access these methods from a time object:


Function Definitions:

time

time.time(time_string, format_string, location_string)

converts time_string, which must be in the layout described by format_string, to a time object. Returns a time object. If no format or location is given, it assumes you mean to use RFC3339: “2006-01-02T15:04:05Z07:00”. To learn more about format strings, check out the golang time/format page, or this helpful blog post from flaviocopes

load("time.sky", "time")

def transform(ds):
  # an example with no format or location string
  time_string = "2018-10-31T00:00:00Z"
  t = time.time(time_string)
  print(t.month()) # prints 10
  print(t.day()) # prints 31
  print(t.year()) # prints 2018
  return ds

load("time.sky", "time")

def transform(ds):
  # an example with a format and location string:
  # basically, as long as you use the date
  # Mon Jan 2 15:04:05 -0700 MST 2006 as a reference, you are good
  print(time.location("America/New_York"))
  t = time.time("November 15, 2018", "January 2, 2006", "America/New_York")
  print(t.month()) # prints 11
  print(t.day()) # prints 15
  print(t.year()) # prints 2018
  return ds 
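
For comparison, the same two inputs can be parsed in plain Python, where the format is spelled with % directives instead of the reference date used by the format strings above. This sketch uses only Python's datetime module and is not the qri time module. The location argument is left out, so the second value carries no time zone.

```python
from datetime import datetime, timezone

# RFC3339-style input; the trailing Z means UTC
t1 = datetime.strptime("2018-10-31T00:00:00Z", "%Y-%m-%dT%H:%M:%SZ")
t1 = t1.replace(tzinfo=timezone.utc)
print(t1.month, t1.day, t1.year)  # prints 10 31 2018

# "January 2, 2006" in the reference-date style corresponds to "%B %d, %Y" here
t2 = datetime.strptime("November 15, 2018", "%B %d, %Y")
print(t2.month, t2.day, t2.year)  # prints 11 15 2018
```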

duration

time.duration(duration_string)

converts a string in ‘00h0m0s’ format to a duration object. Returns a duration.

load("time.sky", "time")

def transform(ds):
  duration_1_str = "450h79m300s"
  duration_2_str = "0h0m1s"
  #
  d1 = time.duration(duration_1_str)
  print(d1) # prints "451h24m0s", notice how it parsed the string and converted 300 seconds into 5 min, and 79 + 5 min into 1hr 24 min
  #
  d2 = time.duration(duration_2_str)
  print(d2) # prints "1s"
  #
  print(d1 + d2) # prints "451h24m1s"
  print(d1 - d2) # prints "451h23m59s"
  #
  # create another duration, same length as d2
  #
  d3 = time.duration(duration_2_str)
  print(d2 == d3) # prints True
  return ds
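
The normalization shown in the comments above (300 seconds folded into minutes, 84 minutes folded into hours) can be reproduced with a short plain Python sketch. It is not the qri implementation: parse_duration and format_duration are hypothetical helpers, they accept only whole, non-negative hours, minutes and seconds, and the output style simply mirrors the strings printed in the example.

```python
import re
from datetime import timedelta

_DURATION = re.compile(r"^(?:(\d+)h)?(?:(\d+)m)?(?:(\d+)s)?$")


def parse_duration(text):
    """Parse strings such as "450h79m300s" into a timedelta."""
    match = _DURATION.match(text)
    if not text or not match:
        raise ValueError("not an h/m/s duration: %r" % text)
    hours, minutes, seconds = (int(g or 0) for g in match.groups())
    return timedelta(hours=hours, minutes=minutes, seconds=seconds)


def format_duration(d):
    """Format a non-negative timedelta, dropping leading zero units."""
    total = int(d.total_seconds())
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return "%dh%dm%ds" % (hours, minutes, seconds)
    if minutes:
        return "%dm%ds" % (minutes, seconds)
    return "%ds" % seconds


d1 = parse_duration("450h79m300s")
d2 = parse_duration("0h0m1s")
print(format_duration(d1))       # prints 451h24m0s
print(format_duration(d2))       # prints 1s
print(format_duration(d1 + d2))  # prints 451h24m1s
print(format_duration(d1 - d2))  # prints 451h23m59s
```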

You also get a duration when you subtract two times:

load("time.sky", "time")

def transform(ds):
  oct_1 = "2018-10-01T00:00:00Z"
  halloween = "2018-10-31T00:00:00Z"
  f = time.time(oct_1)
  h = time.time(halloween)
  duration = h - f # subtracting two time values gives you a duration
  print(duration) # prints "720h0m0s"
  return ds
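
The 720 hour figure is just 30 days times 24 hours. A quick check in plain Python, using the standard datetime module rather than the qri time module:

```python
from datetime import datetime, timezone

oct_1 = datetime(2018, 10, 1, tzinfo=timezone.utc)
halloween = datetime(2018, 10, 31, tzinfo=timezone.utc)

elapsed = halloween - oct_1  # subtracting two datetimes gives a timedelta
print(elapsed.days)                    # prints 30
print(elapsed.total_seconds() / 3600)  # prints 720.0
```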

location

time.location(location_string)

loads location based on string. Empty string returns “UTC”

load("time.sky", "time")

def transform(ds):
  loc = time.location("EST")
  print(loc) # prints "EST"
  return ds

now

time.now()

returns the current time

load("time.sky", "time")

def transform(ds):
  now = time.now()
  print(now) # prints the current time
  #
  # print date in MM/DD/YYYY format:
  print(str(now.month()) + "/" + str(now.day()) + "/" + str(now.year()))
  return ds

year

t.year()

returns year as int

month

t.month()

returns month as int

day

t.day()

returns day as int

hour

t.hour()

returns hour as int

minute

t.minute()

returns minute as int

second

t.second()

returns second as int

nanosecond

t.nanosecond()

returns nanosecond as int


xlsx module

you can access these methods from the xlsx module. xlsx can only be used in the download function:


Function Definitions:

xlsx.get_url

xlsx.get_url(url)

makes a GET request to the url and attempts to return the body as an xlsx file. Can only be used in the download function

This example shows how to use get_url, get_sheets, and get_rows:

load("xlsx.sky", "xlsx")

def download(ds):
  data = xlsx.get_url("https://www.ntia.doc.gov/files/ntia/publications/225-5000-composite-inventory_2015-12-16.xlsx")
  #
  # get the sheets of the xlsx file
  # and print. Will print out a map of ints to sheet names:
  sheets = data.get_sheets()
  print(sheets)
  #
  # get name of first sheet and print:
  sheet1 = sheets[1]
  print(sheet1) # prints "Sheet1"
  #
  # get data from sheet 1 and print:
  rows = data.get_rows(sheet1)
  print(rows) # prints 2d list of data
  #
  # set the first 10 rows as the body:
  ds.set_body(rows[:10])
  return ds

get_sheets

x.get_sheets()

returns a map of ints to sheet names, indexing starts with 1. See above get_url example for use.

get_rows

x.get_rows(sheet_name)

returns a 2 dimensional list of data from the specified sheet. See above get_url example for use
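
A 2 dimensional list of rows is often easier to work with as a list of dictionaries keyed by column name. The plain Python sketch below shows one way to do that. It assumes the first row of the sheet is a header row, which may not hold for every spreadsheet, and the rows_to_records helper and sample rows are hypothetical.

```python
def rows_to_records(rows):
    """Turn [[header...], [values...], ...] into a list of dicts."""
    if not rows:
        return []
    header, data = rows[0], rows[1:]
    # zip stops at the shorter of header and row, so extra cells are dropped
    return [dict(zip(header, row)) for row in data]


# hypothetical rows, shaped like the 2d list described above
rows = [
    ["animal", "legs"],
    ["cat", 4],
    ["bird", 2],
]
print(rows_to_records(rows))
# prints [{'animal': 'cat', 'legs': 4}, {'animal': 'bird', 'legs': 2}]
```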