qri and dataset modules
In Qri, two “nonstandard” modules specific to Qri are available: qri and dataset. These modules are not considered part of the standard library project and are defined in a different repository. They’re described here to keep the documentation complete:
Modules
html module
The `html` module can only be used in the `download` function. You must use the `html` function to parse an HTML response into a document object that you can traverse.
To load and parse:
```python
load("html.sky", "html")
load("http.sky", "http")

def download(ds):
    res = http.get("https://some-website.com")
    # parses the response body into something that
    # can be traversed using a jquery-like syntax
    doc = html(res.content())
    return ds
```
The following methods can be used on the document object returned by the `html` function:
- attr
- children
- children_filtered
- contents
- eq
- find
- first
- filter
- get
- has
- last
- len
- parent
- parents_until
- siblings
- text
Function Definitions:
attr
selection.attr(attribute)
Returns the value of the given attribute for the selection, as a string. `attribute` is a string.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<div title="hello world" class="example_class_name"><p>test</p></div>'
    #
    # find returns a selection of the elements that match the selector:
    divs = html(example_html).find("div")
    #
    # get the first element in the selection:
    div = divs.first()
    #
    # get the value of the "title" attribute:
    attr = div.attr("title")
    print(attr)  # prints "hello world"
    #
    ds.set_body([attr])
    return ds
```
children
selection.children()
Gets the child elements of each element in the selection. It returns a new Selection object containing these elements.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><p class="A">a</p><p class="B">b</p><p class="C">c</p><p class="D">d</p></body>'
    doc = html(example_html)
    # gives you a selection of the body element
    body = doc.find("body")
    # gives you a selection made up of the children of body
    children_len = body.children().len()
    print(children_len)  # prints 4, specifically the 4 p elements
    return ds
```
children_filtered
selection.children_filtered(filter)
Gets the child elements of each element in the selection, filtered by the given selector string. It returns a new Selection object containing these elements.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="A">a</p><p class="B">b</p></div></body>'
    doc = html(example_html)
    # find the body element and select its two div children
    divs = doc.find("body").children()
    # get the children of both divs, keeping the 2 p elements with class ".A"
    p_as = divs.children_filtered(".A")
    print(p_as.text())  # prints "aa"
    return ds
```
contents
selection.contents()
Gets the children of each element in the selection, including text and comment nodes. It returns a new Selection object containing these nodes.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
    doc = html(example_html)
    # get the contents of the body element
    contents = doc.find("body").contents()
    print(contents.len())   # prints "2", the 2 div elements
    print(contents.text())  # prints "abcd"
    return ds
```
eq
selection.eq(index)
Returns the node at the given index as a new selection.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
    doc = html(example_html)
    #
    # select all the p elements
    ps = doc.find("p")
    print(ps.len())  # prints "4", the 4 p elements
    #
    # create a list of the text in each element:
    texts = []
    for i in range(ps.len()):
        # use the `eq` function to access each node
        texts.append(ps.eq(i).text())
    print(texts)  # prints ["a", "b", "c", "d"]
    #
    ds.set_body(texts)
    return ds
```
find
selection.find(selector)
Gets the descendants of each element in the current set of matched elements, filtered by the given selector string. It returns a new Selection containing these matched elements.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="A">a</p><p class="B">b</p></div></body>'
    doc = html(example_html)
    # get the elements with the class ".B"
    p_Bs = doc.find(".B")
    print(p_Bs.len())   # prints "2", the 2 p elements with class ".B"
    print(p_Bs.text())  # prints "bb"
    return ds
```
first
selection.first()
Returns the first element of the selection as a new selection.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
    doc = html(example_html)
    # get the body
    body = doc.find("body")
    # get the children of the body, in this case 2 divs
    divs = body.children()
    # get the first div
    div = divs.first()
    print(div.text())  # prints "ab"
    return ds
```
filter
selection.filter(selector)
Reduces the set of matched elements to those that match the selector string. It returns a new Selection object for this subset of matching elements.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="A">a</p><p class="B">b</p></div></body>'
    doc = html(example_html)
    # get a list of all the p tags
    ps = doc.find("p")
    print(ps.len())  # prints "4", the 4 p elements
    # filter ps to find elements with class ".B"
    p_Bs = ps.filter(".B")
    print(p_Bs.len())   # prints "2", the 2 p elements with class ".B"
    print(p_Bs.text())  # prints "bb"
    return ds
```
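`find` and `filter` are easy to confuse. `find` searches the descendants of each selected element, while `filter` only narrows the elements already in the selection. The plain-Python sketch below is not the html module itself. It models that difference with the standard library's `xml.etree.ElementTree`, treats a selection as a list of elements, and supports only single-class selectors such as `.B`. The helper names `find_by_class` and `filter_by_class` are hypothetical.

```python
import xml.etree.ElementTree as ET

def has_class(el, cls):
    # True if cls appears in the element's space-separated class attribute
    return cls in el.get("class", "").split()

def find_by_class(selection, cls):
    # Like selection.find("." + cls): search the descendants of each
    # selected element, excluding the selected elements themselves
    found = []
    for el in selection:
        for node in el.iter():
            if node is not el and has_class(node, cls) and node not in found:
                found.append(node)
    return found

def filter_by_class(selection, cls):
    # Like selection.filter("." + cls): keep only elements already selected
    return [el for el in selection if has_class(el, cls)]

example_html = ('<body><div><p class="A">a</p><p class="B">b</p></div>'
                '<div><p class="A">a</p><p class="B">b</p></div></body>')
body = ET.fromstring(example_html)
ps = list(body.iter("p"))  # roughly doc.find("p")

print(len(find_by_class([body], "B")))    # 2: descendants of body with class B
print(len(filter_by_class([body], "B")))  # 0: body itself has no class B
print(len(filter_by_class(ps, "B")))      # 2: the p elements with class B
```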
has
selection.has(selector)
Reduces the set of matched elements to those that have a descendant matching the selector string. It returns a new Selection object with the matching elements.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
    doc = html(example_html)
    # get the div children of the body element
    divs = doc.find("body").children()
    # keep the div that has a descendant with class ".A"
    div = divs.has(".A")
    ps = div.children()
    print(ps.text())  # prints "ab", the text of the 2 p elements in that div
    return ds
```
last
selection.last()
Returns the last element of the selection as a new selection.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
    doc = html(example_html)
    # get the body
    body = doc.find("body")
    # get the children of the body, in this case 2 divs
    divs = body.children()
    # get the last div
    div = divs.last()
    print(div.text())  # prints "cd"
    return ds
```
len
selection.len()
Returns the number of nodes in the selection, as an integer.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><p class="A">a</p><p class="B">b</p><p class="C">c</p><p class="D">d</p></body>'
    doc = html(example_html)
    # select the p elements
    ps = doc.find("p")
    print(ps.len())  # prints "4", the 4 p elements
    return ds
```
parent
selection.parent()
Gets the parent of each element in the selection. It returns a new Selection object containing the matched elements.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div title="hi"><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
    doc = html(example_html)
    # select the p elements
    ps = doc.find("p")
    # get the parents of the p elements, in this case 2 divs
    divs = ps.parent()
    print(divs.len())  # prints "2", the 2 div parent elements
    title = divs.first().attr("title")
    print(title)  # prints "hi"
    return ds
```
parents_until
selection.parents_until(selector)
Gets the ancestors of each element in the selection, up to but not including the element matched by the selector string. It returns a new Selection object containing the matched elements.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div title="hi"><span title="bye"><p class="A">a</p><p class="B">b</p></span></div><div title="woo"><span title="weee"><p class="C">c</p><p class="D">d</p></span></div></body>'
    doc = html(example_html)
    # select the p elements
    ps = doc.find("p")
    # get all the ancestors of the p elements, excluding the body element
    parents = ps.parents_until("body")
    print(parents.len())  # prints "4": the 2 span and 2 div ancestors
    print(parents.eq(0).attr("title"))  # prints "bye"
    print(parents.eq(1).attr("title"))  # prints "hi"
    print(parents.eq(2).attr("title"))  # prints "weee"
    print(parents.eq(3).attr("title"))  # prints "woo"
    return ds
```
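The order of the results above is consistent with a simple rule. Walk each selected element's ancestors from the nearest outward, stop at the `body` element, and skip any ancestor that is already collected. The plain-Python sketch below reproduces that order with `xml.etree.ElementTree`. It illustrates the comments in the example and is not the html module's implementation. `parents_until` here is a hypothetical helper that matches the stop element by tag name only.

```python
import xml.etree.ElementTree as ET

def parents_until(selection, root, stop_tag):
    # ElementTree has no parent pointers, so build a child -> parent map first
    parent_of = {child: parent for parent in root.iter() for child in parent}
    found = []
    for el in selection:
        node = parent_of.get(el)
        # walk outward until the stop element (or the top of the tree)
        while node is not None and node.tag != stop_tag:
            if node not in found:  # skip ancestors shared with earlier elements
                found.append(node)
            node = parent_of.get(node)
    return found

example_html = ('<body><div title="hi"><span title="bye"><p class="A">a</p>'
                '<p class="B">b</p></span></div><div title="woo"><span title="weee">'
                '<p class="C">c</p><p class="D">d</p></span></div></body>')
body = ET.fromstring(example_html)
ps = list(body.iter("p"))
parents = parents_until(ps, body, "body")
print([el.get("title") for el in parents])  # ['bye', 'hi', 'weee', 'woo']
```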
siblings
selection.siblings()
Gets the siblings of each element in the selection. It returns a new Selection object containing the matched elements.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
    doc = html(example_html)
    # select the element with class ".A"
    a = doc.find(".A")
    # get the siblings of that element
    b = a.siblings()
    print(b.len())   # prints "1", the p element with class ".B"
    print(b.text())  # prints "b"
    return ds
```
text
selection.text()
Gets the combined text contents of each element in the set of matched elements, including their descendants.
```python
load("html.sky", "html")

def transform(ds):
    example_html = '<body><div><p class="A">a</p><p class="B">b</p></div><div><p class="C">c</p><p class="D">d</p></div></body>'
    doc = html(example_html)
    print(doc.text())  # prints "abcd"
    return ds
```
http module
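The following functions are available from the `http` module: `delete`, `get`, `options`, `patch`, `post`, and `put`. Each one returns a response.
response object
The following fields and methods are available on a response object: `content`, `encoding`, `headers`, `json`, `status_code`, `text`, and `url`.
Function Definitions: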
delete
http.delete(url, params, headers, data, jsondata, auth)
Sends a DELETE request to the given url. Returns a response.
param | type | optional? |
---|---|---|
url | string url | no |
params | dictionary of param names to param values | yes |
headers | dictionary of header names to header values | yes |
data | dictionary | yes |
jsondata | dictionary or list | yes |
auth | string | yes |
get
http.get(url, params, headers, data, jsondata, auth)
Sends a GET request to the given url. Returns a response.
param | type | optional? |
---|---|---|
url | string url | no |
params | dictionary of param names to param values | yes |
headers | dictionary of header names to header values | yes |
data | dictionary | yes |
jsondata | dictionary or list | yes |
auth | string | yes |
```python
load("http.sky", "http")

def download(ds):
    res = http.get("https://www.fake-json-response-endpoint.com")
    ds.set_body(res.json())
    return ds
```
options
http.options(url, params, headers, data, jsondata, auth)
Sends an OPTIONS request to the given url. Returns a response.
param | type | optional? |
---|---|---|
url | string url | no |
params | dictionary of param names to param values | yes |
headers | dictionary of header names to header values | yes |
data | dictionary | yes |
jsondata | dictionary or list | yes |
auth | string | yes |
patch
http.patch(url, params, headers, data, jsondata, auth)
Sends a PATCH request to the given url. Returns a response.
param | type | optional? |
---|---|---|
url | string url | no |
params | dictionary of param names to param values | yes |
headers | dictionary of header names to header values | yes |
data | dictionary | yes |
jsondata | dictionary or list | yes |
auth | string | yes |
post
http.post(url, params, headers, data, jsondata, auth)
Sends a POST request to the given url. Returns a response.
param | type | optional? |
---|---|---|
url | string url | no |
params | dictionary of param names to param values | yes |
headers | dictionary of header names to header values | yes |
data | dictionary | yes |
jsondata | dictionary or list | yes |
auth | string | yes |
put
http.put(url, params, headers, data, jsondata, auth)
Sends a PUT request to the given url. Returns a response.
param | type | optional? |
---|---|---|
url | string url | no |
params | dictionary of param names to param values | yes |
headers | dictionary of header names to header values | yes |
data | dictionary | yes |
jsondata | dictionary or list | yes |
auth | string | yes |
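All six functions share the parameter table above. To illustrate how those parameters usually map onto an HTTP request, the plain-Python sketch below builds an equivalent request with the standard library's `urllib.request.Request` but does not send it. It assumes that `params` are appended to the URL as a query string and that `jsondata` is sent as a JSON-encoded body with a `Content-Type: application/json` header. These are assumptions about typical client behavior, not documented qri behavior. `build_request` is a hypothetical helper, and the `data` and `auth` parameters are left out.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_request(method, url, params=None, headers=None, jsondata=None):
    # Hypothetical analogue of http.<method>(url, params, headers, jsondata).
    # Assumption: params become the query string, jsondata a JSON body.
    if params:
        sep = "&" if "?" in url else "?"
        url = url + sep + urlencode(params)
    all_headers = dict(headers or {})
    body = None
    if jsondata is not None:
        body = json.dumps(jsondata).encode("utf-8")
        all_headers["Content-Type"] = "application/json"
    # Request only builds the request object; nothing is sent over the network
    return Request(url, data=body, headers=all_headers, method=method)

req = build_request("POST", "https://example.com/api",
                    params={"page": 2}, jsondata={"name": "qri"})
print(req.get_method())  # POST
print(req.full_url)      # https://example.com/api?page=2
print(req.data)          # b'{"name": "qri"}'
```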
response
content
response.content()
Returns the raw data as a string. This string can be passed to `html(content_string)` to return a document that can be parsed by the `html` functions.
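```python
load("http.sky", "http")
load("html.sky", "html")

def download(ds):
    res = http.get("https://some-website-to-get-html.com")
    doc = html(res.content())
    # work with the html document here, for example
    # by collecting the text of the page's <title> element:
    title = doc.find("title").text()
    ds.set_body([title])
    return ds
```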
encoding
response.encoding
A string listing the form(s) of encoding used in the response.
headers
response.headers
A dictionary of the response headers.
json
response.json()
Attempts to parse the response body as JSON and returns the result.
status_code
response.status_code
An integer, the status code of the response.
text
response.text()
Returns the raw data as a string. This string can be passed to `html(text)` to return a document that can be parsed by the `html` functions.
url
response.url
The URL of the response, as a string.
math module
Docs coming soon. In the meantime, see https://godoc.org/github.com/qri-io/starlib/math
re module
Docs coming soon. In the meantime, see https://godoc.org/github.com/qri-io/starlib/re
time module
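The following functions are available from the `time` module: `time`, `duration`, `location`, and `now`.
time object
The following methods are available on a time object: `year`, `month`, `day`, `hour`, `minute`, `second`, and `nanosecond`.
Function Definitions: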
time
time.time(time_string, format_string, location_string)
Parses `time_string` according to `format_string` and `location_string`, and returns a time object. If no format or location is given, the string is assumed to be in RFC3339 format: “2006-01-02T15:04:05Z07:00”. Format strings use Go's reference-time layouts. To learn more about them, see the Go time/format documentation or this helpful blog post from flaviocopes.
```python
load("time.sky", "time")

def transform(ds):
    # an example with no format or location string
    time_string = "2018-10-31T00:00:00Z"
    t = time.time(time_string)
    print(t.month())  # prints 10
    print(t.day())    # prints 31
    print(t.year())   # prints 2018
    return ds
```
```python
load("time.sky", "time")

def transform(ds):
    # an example with a format and location string:
    # write the format as an example of Go's reference time,
    # Mon Jan 2 15:04:05 -0700 MST 2006
    print(time.location("America/New_York"))
    t = time.time("November 15, 2018", "January 2, 2006", "America/New_York")
    print(t.month())  # prints 11
    print(t.day())    # prints 15
    print(t.year())   # prints 2018
    return ds
```
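Go layouts work by example. Each part of the reference time marks where that field appears: `2006` for the year, `January` for the month name, `2` for the day, and `15:04:05` for the clock. The plain-Python sketch below makes the correspondence concrete by translating a small subset of Go layout tokens into Python `strptime` directives. It handles only the tokens listed in `GO_TOKENS` and does not handle time zones such as `MST` or `-0700`. `go_layout_to_strptime` is a hypothetical helper, not part of the time module.

```python
from datetime import datetime

# A subset of Go reference-time tokens and their strptime equivalents.
# Longer tokens come before the shorter tokens they could be confused with.
GO_TOKENS = [
    ("January", "%B"), ("Monday", "%A"), ("2006", "%Y"),
    ("Jan", "%b"), ("Mon", "%a"),
    ("01", "%m"), ("02", "%d"), ("15", "%H"),
    ("04", "%M"), ("05", "%S"), ("06", "%y"),
    ("2", "%d"),
]

def go_layout_to_strptime(layout):
    # Scan left to right, replacing the first matching token at each position
    out = []
    i = 0
    while i < len(layout):
        for token, directive in GO_TOKENS:
            if layout.startswith(token, i):
                out.append(directive)
                i += len(token)
                break
        else:
            # Not a token: copy the character, escaping a literal %
            out.append("%%" if layout[i] == "%" else layout[i])
            i += 1
    return "".join(out)

fmt = go_layout_to_strptime("January 2, 2006")
print(fmt)  # %B %d, %Y
t = datetime.strptime("November 15, 2018", fmt)
print(t.month, t.day, t.year)  # 11 15 2018
```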
duration
time.duration(duration_string)
Converts a string in ‘00h0m0s’ format (hours, minutes, and seconds) to a duration object and returns the duration.
```python
load("time.sky", "time")

def transform(ds):
    duration_1_str = "450h79m300s"
    duration_2_str = "0h0m1s"
    #
    d1 = time.duration(duration_1_str)
    # prints "451h24m0s": the 300s are normalized to 5m,
    # and 79m + 5m = 84m becomes 1h24m
    print(d1)
    #
    d2 = time.duration(duration_2_str)
    print(d2)  # prints "1s"
    #
    print(d1 + d2)  # prints "451h24m1s"
    print(d1 - d2)  # prints "451h23m59s"
    #
    # create another duration, the same length as d2
    d3 = time.duration(duration_2_str)
    print(d2 == d3)  # prints True
    return ds
```
You also get a duration when you subtract two times:
```python
load("time.sky", "time")

def transform(ds):
    oct_1 = "2018-10-01T00:00:00Z"
    halloween = "2018-10-31T00:00:00Z"
    f = time.time(oct_1)
    h = time.time(halloween)
    duration = h - f  # subtracting two time values gives you a duration
    print(duration)  # prints "720h0m0s"
    return ds
```
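The normalization in the duration examples is plain arithmetic on a total number of seconds. 450h + 79m + 300s is 1,620,000 + 4,740 + 300 = 1,625,040 seconds, which is 451h 24m 0s. Likewise, the 30 days between October 1 and October 31 are 30 × 24 = 720 hours. The plain-Python sketch below is not the time module. It reproduces that arithmetic and imitates the way Go prints durations, but only for whole, non-negative seconds written with `h`, `m`, and `s` units. `parse_duration` and `format_duration` are hypothetical helpers.

```python
import re

UNITS = {"h": 3600, "m": 60, "s": 1}

def parse_duration(s):
    # Sum "<n>h", "<n>m", "<n>s" components into a total number of seconds.
    # Whole numbers only; Go's parser also accepts ms, us, ns, and fractions.
    if not re.fullmatch(r"(\d+[hms])+", s):
        raise ValueError("unsupported duration: " + s)
    return sum(int(n) * UNITS[u] for n, u in re.findall(r"(\d+)([hms])", s))

def format_duration(total):
    # Mimic Go's Duration.String() for whole, non-negative seconds:
    # leading zero units are dropped, e.g. "1s", "24m0s", "451h24m0s"
    hours, rest = divmod(total, 3600)
    minutes, seconds = divmod(rest, 60)
    if hours:
        return "%dh%dm%ds" % (hours, minutes, seconds)
    if minutes:
        return "%dm%ds" % (minutes, seconds)
    return "%ds" % seconds

d1 = parse_duration("450h79m300s")
d2 = parse_duration("0h0m1s")
print(format_duration(d1))                # 451h24m0s
print(format_duration(d2))                # 1s
print(format_duration(d1 + d2))           # 451h24m1s
print(format_duration(d1 - d2))           # 451h23m59s
print(format_duration(30 * 24 * 3600))    # 720h0m0s
```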
location
time.location(location_string)
Loads a location by name, such as “America/New_York” or “EST”. An empty string returns “UTC”.
```python
load("time.sky", "time")

def transform(ds):
    loc = time.location("EST")
    print(loc)  # prints "EST"
    return ds
```
now
time.now()
Returns the current time.
```python
load("time.sky", "time")

def transform(ds):
    now = time.now()
    print(now)  # prints the current time
    #
    # print the date in M/D/YYYY format:
    print(str(now.month()) + "/" + str(now.day()) + "/" + str(now.year()))
    return ds
```
year
t.year()
Returns the year as an int.
month
t.month()
Returns the month as an int.
day
t.day()
Returns the day as an int.
hour
t.hour()
Returns the hour as an int.
minute
t.minute()
Returns the minute as an int.
second
t.second()
Returns the second as an int.
nanosecond
t.nanosecond()
Returns the nanosecond as an int.
xlsx module
The `xlsx` module provides `get_url`, and the object that `get_url` returns provides `get_sheets` and `get_rows`. `xlsx` can only be used in the `download` function.
Function Definitions:
xlsx.get_url
xlsx.get_url(url)
Makes a GET request to the url and attempts to return the body as an xlsx file. Can only be used in the `download` function.
This example shows how to use `get_url`, `get_sheets`, and `get_rows`:
```python
load("xlsx.sky", "xlsx")

def download(ds):
    data = xlsx.get_url("https://www.ntia.doc.gov/files/ntia/publications/225-5000-composite-inventory_2015-12-16.xlsx")
    #
    # get the sheets of the xlsx file and print them.
    # This prints a map of ints to sheet names:
    sheets = data.get_sheets()
    print(sheets)
    #
    # get the name of the first sheet and print it
    # (sheet indexes start at 1):
    sheet1 = sheets[1]
    print(sheet1)  # prints "Sheet1"
    #
    # get the data from the first sheet and print it:
    rows = data.get_rows(sheet1)
    print(rows)  # prints a 2d list of data
    #
    # set the first 10 rows as the body:
    ds.set_body(rows[:10])
    return ds
```
get_sheets
x.get_sheets()
Returns a map of ints to sheet names; indexing starts at 1. See the `get_url` example above for usage.
get_rows
x.get_rows(sheet_name)
Returns a 2-dimensional list of data from the specified sheet. See the `get_url` example above for usage.
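`get_sheets` and `get_rows` return plain data, so reshaping the results needs no special API. The sketch below shows two hypothetical helpers. One lists sheet names in index order (indexes start at 1). The other turns rows into a list of dictionaries, assuming the first row holds column names. The sketch is plain Python and sticks to features Starlark also has (`sorted`, `zip`, `dict`, list concatenation and repetition), so it may also work inside a `download` function, though that is untested. The `sheets` and `rows` values are made-up stand-ins for what `get_sheets` and `get_rows` return.

```python
def sheet_names_in_order(sheets):
    # get_sheets() maps ints (starting at 1) to sheet names
    return [sheets[i] for i in sorted(sheets)]

def rows_to_records(rows):
    # Use the first row as column names and turn every later row into a dict.
    # Short rows are padded with None so each record has every column.
    if not rows:
        return []
    header = rows[0]
    records = []
    for row in rows[1:]:
        padded = list(row) + [None] * (len(header) - len(row))
        records.append(dict(zip(header, padded)))
    return records

# made-up stand-ins for data.get_sheets() and data.get_rows(...)
sheets = {2: "Notes", 1: "Sheet1"}
rows = [["name", "value"], ["a", "1"], ["b"]]
print(sheet_names_in_order(sheets))  # ['Sheet1', 'Notes']
print(rows_to_records(rows))  # [{'name': 'a', 'value': '1'}, {'name': 'b', 'value': None}]
```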
zipfile module
Docs coming soon. In the meantime, see https://godoc.org/github.com/qri-io/starlib/zipfile