Stride API Documentation

Stride is a realtime analytics API. A fully managed service, Stride enables developers to collect streams of events in realtime and construct networks of continuous queries on top of those streams that perform high-throughput continuous aggregation, distinct counting, and sliding-window computations, fire webhooks, run massive retroactive batch queries, and more.

Realtime events can also be streamed out of Stride by subscribing to streams of changes as they're applied throughout your setup, although you can issue on-demand queries too. Events are schemaless JSON, and queries are defined using Stride SQL, a simplified dialect of SQL designed for productivity and ease of use.

Stride also gives you a beautiful web interface to manage your environment and visualize your data. Let's start with the basics of the Stride API and then dive into how you can process your data by composing continuous queries into arbitrary processing pipelines.

Security

We provide each account with two types of API keys: secret and open, both of which can be retrieved from the account page within the Stride web interface. Your secret key can be used to access any endpoint as well as create and destroy resources. Open keys can only be used to write events to the /collect endpoint.

Your secret key is meant to be kept, well, secret, and should never be included in client-side code. Your open key can be used in client-side code, at the risk of someone using it to post junk data under your key.

Furthermore, no request should ever include an API key in the URL. Stride uses request headers for authentication, and the API is only available via HTTPS, so none of this will ever be transmitted in the clear. We will also never ask you for any of your API keys; they are yours to protect!

Headers

Authentication

Stride uses HTTP Basic Auth to authenticate incoming requests. Basic Auth base64 encodes a username:password pair, prepends the encoded string with Basic and sends the result string in the Authorization header field. When using the Stride API, the account's API key should be used as the username and the password should be left blank. So if your API key is helloworld, the final header will be: Authorization: Basic aGVsbG93b3JsZDo=.
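For instance, here's a minimal Python sketch (standard library only) that constructs this header; the helloworld key is just the placeholder from the example above:

import base64

API_KEY = "helloworld"  # placeholder; use your actual API key

# Basic Auth: base64-encode "<api_key>:" (blank password) and prepend "Basic "
token = base64.b64encode(f"{API_KEY}:".encode()).decode()
print(f"Authorization: Basic {token}")  # prints: Authorization: Basic aGVsbG93b3JsZDo=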

Content-Type

All requests made to the Stride API must set the Content-Type header to be application/json.

Response Codes

We use conventional HTTP response status codes to indicate the result of an API request.

Limits

There are two kinds of limits associated with your Stride account: resource limits and rate limits. All limits can be viewed on your account page when logged in to the Stride web application, and we'll also briefly cover some of the important ones here.

Resource limits

Resource limits cap the number of each type of resource that you can create. The current default limits for your account are listed on your account page.

Rate limits

Rate limits specify a few things for each of Stride's endpoints: the maximum number of requests per minute, bytes in and out per minute, and the number of concurrent subscribers.

While all of these limits can be viewed on your account page within the Stride web application, here are some of the important defaults:

| Method | Endpoint | Requests/minute | Bytes in/minute | Bytes out/minute | Concurrent subscribers |
| --- | --- | --- | --- | --- | --- |
| POST | /collect/$stream | 100,000 | 100MB | N/A | N/A |
| GET | /analyze/$saved_query/results | 1,000 | N/A | 1GB | N/A |
| POST | /analyze/$saved_query/results | 1,000 | N/A | 1GB | N/A |
| POST | /analyze | 1,000 | N/A | 1GB | N/A |
| GET | /process/$process/subscribe | 100 | N/A | 1GB | 8 |
| GET | /collect/$stream/subscribe | 100 | N/A | 1GB | 8 |

If you need any of your resource or rate limits increased, feel free to email us at support@stride.io and we'll get you taken care of.

Endpoints

/collect

The collect endpoint is used for collecting events. An event is a JSON object with any number of properties which describe the context in which the event took place. All JSON value types are supported, including nested objects. A sample event looks like:

{
  "$timestamp": "2015-05-05T23:40:27Z",
  "path": "/url/to/page",
  "ip": "50.233.123.210",
  "user": {
    "id": "deadbeef",
    "name": "Alyssa Hacker",
    "age": 31
  },
  "product": {
    "id": "1ceb00da",
    "name": "Qubit",
    "price": 3.99
  }
}

$timestamp is an optional property which describes the time at which the event took place. If omitted, the server automatically sets it to the time the event was received.

Events are collected into streams. The stream which an event should be emitted into is specified as part of the request. The consumers of these streams are continuous processing tasks which you will learn more about in the process section.

Events can either be sent individually or in bulk for increased throughput.

Collecting Individual Events

To collect individual events, append the stream name to the collect endpoint path and set the request body to the event. The following request will send a single event to the commits stream.

POST https://api.stride.io/v1/collect/commits

{
  "$timestamp": "2015-05-05T23:40:27Z",
  "repo": "pipelinedb/pipelinedb",
  "username": "usmanm",
  "sha1": "690e6814144a174d38ff501c5d89bfff5ff8d6de"
}
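As a sketch of what this looks like from client code, here's the same request in Python using the requests library (the API key value is a placeholder):

import requests

API_KEY = "your_secret_key"  # placeholder

event = {
    "$timestamp": "2015-05-05T23:40:27Z",
    "repo": "pipelinedb/pipelinedb",
    "username": "usmanm",
    "sha1": "690e6814144a174d38ff501c5d89bfff5ff8d6de",
}

# json= serializes the body and sets Content-Type: application/json;
# auth= performs HTTP Basic Auth with the key as username and a blank password.
resp = requests.post(
    "https://api.stride.io/v1/collect/commits",
    json=event,
    auth=(API_KEY, ""),
)
print(resp.status_code)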

Collecting Events In Bulk

There are two ways events can be collected in bulk. If you want to send multiple events to the same stream, use the same endpoint logic as above but send an array of events in the body.

POST https://api.stride.io/v1/collect/commits

[
  {
    "$timestamp": "2015-05-05T23:40:27Z",
    "repo": "pipelinedb/pipelinedb",
    "username": "usmanm",
    "sha1": "690e6814144a174d38ff501c5d89bfff5ff8d6de"
  },
  {
    "$timestamp": "2015-05-05T23:48:03Z",
    "repo": "pipelinedb/pipelinedb",
    "username": "derekjn",
    "sha1": "95bbf000808c8e7493d3c4cdd5aa3d26e91f6f6e"
  }
]
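A client-side sketch of this batching pattern in Python (requests library; the key, batch size, and helper names are all illustrative), buffering events and flushing them in a single request:

import requests

API_KEY = "your_secret_key"  # placeholder

BATCH_SIZE = 100
buffer = []

def collect(event):
    """Buffer an event, flushing the batch once it's full."""
    buffer.append(event)
    if len(buffer) >= BATCH_SIZE:
        flush()

def flush():
    """POST the buffered events to the stream as a single JSON array."""
    if buffer:
        requests.post(
            "https://api.stride.io/v1/collect/commits",
            json=buffer,  # an array of events, as shown above
            auth=(API_KEY, ""),
        )
        buffer.clear()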

To collect events into multiple streams, make a request to the collect endpoint with no suffix. The body should be a nested JSON object where each top-level property name maps to a stream and its corresponding value to an array of events to be emitted into the stream.

POST https://api.stride.io/v1/collect

{
  "commits": [
    {
      "$timestamp": "2015-05-05T23:40:27Z",
      "repo": "pipelinedb/pipelinedb",
      "username": "usmanm",
      "sha1": "690e6814144a174d38ff501c5d89bfff5ff8d6de"
    },
    {
      "$timestamp": "2015-05-05T23:48:03Z",
      "repo": "pipelinedb/pipelinedb",
      "username": "derekjn",
      "sha1": "95bbf000808c8e7493d3c4cdd5aa3d26e91f6f6e"
    }
  ],
  "pull_requests": [
    {
      "$timestamp": "2015-05-05T23:42:53Z",
      "repo": "pipelinedb/pipelinedb",
      "action": "opened",
      "number": 1576,
      "username": "usmanm"

  ]
}

Subscribing to Streams

You can subscribe to a stream to see a sampled feed of the events being inserted into it in realtime. We use persistent HTTP connections with chunked transfer encoding for streaming events down to the client, similar to how Twitter's Streaming API works. The body of the response contains a series of newline-delimited events, where each event is a JSON-encoded string and the delimiter is \r\n. Events may contain the newline character \n, but will never contain the carriage return character \r.

GET https://api.stride.io/v1/collect/commits/subscribe

  {
    "$timestamp": "2015-05-05T23:40:27Z",
    "repo": "pipelinedb/pipelinedb",
    "username": "usmanm",
    "sha1": "690e6814144a174d38ff501c5d89bfff5ff8d6de"
  }
  \r\n
  {
    "$timestamp": "2015-05-05T23:48:03Z",
    "repo": "pipelinedb/pipelinedb",
    "username": "derekjn",
    "sha1": "95bbf000808c8e7493d3c4cdd5aa3d26e91f6f6e"
  }
  \r\n
  ...

Note: In the above response, you wouldn't see \r\n as 4 ASCII characters. We've just written them as such to make it clear that there is a boundary between messages.
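As an illustration, here's a minimal Python sketch (requests library; the API key is a placeholder) that consumes such a subscription by buffering the chunked response and splitting on \r\n:

import json
import requests

API_KEY = "your_secret_key"  # placeholder

resp = requests.get(
    "https://api.stride.io/v1/collect/commits/subscribe",
    auth=(API_KEY, ""),
    stream=True,  # keep the connection open; read chunks as they arrive
)

buffer = b""
for chunk in resp.iter_content(chunk_size=None):
    buffer += chunk
    # \r\n delimits events; events may contain \n but never \r
    while b"\r\n" in buffer:
        raw, buffer = buffer.split(b"\r\n", 1)
        if raw.strip():
            print(json.loads(raw))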

Retrieving Streams

You can query the Stride API to get a list of all the streams that exist in your database. A stream is automatically created the first time an event is collected into it.

GET https://api.stride.io/v1/collect

[
  "commits",
  "pull_requests",
  "app_events",
  "web_logs"
]

You can also check if a named stream exists in the system.

GET https://api.stride.io/v1/collect/commits

A 200 response indicates that the stream exists, while a 404 response indicates the stream doesn't exist.

Deleting Streams

Streams can be deleted by making a DELETE HTTP request.

DELETE https://api.stride.io/v1/collect/commits

A 200 response indicates that the stream was successfully deleted, while a 404 response indicates the stream didn't exist.

/process

The process endpoint is used to create processing tasks which run continuous queries over streams. A processing task consists of two parts: a SQL query to continuously run, and an action that specifies what to do with its output.

Query:

A SQL query that SELECTs from a stream. For example, we could compute the number of commits made by each user per day by running the following query on the commits stream we saw above.

SELECT
  username         AS user,
  date($timestamp) AS day,
  count(*)         AS count
FROM commits
GROUP BY user, day

Or we could filter out commits for a certain repository.

SELECT
  username   AS user,
  $timestamp AS commit_time,
  sha1       AS commit_hash
FROM commits
WHERE repo = 'pipelinedb/pipelinedb'

Stride provides a wide variety of useful aggregate and non-aggregate functions for common operations within queries, which are all covered in the built-in section.

Note: We use a slightly simplified version of SQL, which we call Stride SQL, to describe queries. Check out the Stride SQL section to learn more about it.

Action:

An action describes what should be done with the results produced by the continuous query.

Stride currently supports three different actions:

MATERIALIZE

Store the result of the query durably so that it can be analyzed later.

The results of a continuous MATERIALIZE process are incrementally updated in realtime as incoming data is read. MATERIALIZE processes must contain aggregates. Here's a trivial MATERIALIZE process that counts unique users by hour:

POST https://api.stride.io/v1/process/uniques

{
  "query": "
    SELECT hour($timestamp), count(distinct user_id) FROM stream GROUP BY hour",
  "action": "MATERIALIZE"
}
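Once created, the materialized results can be read back at any time with the analyze endpoint (covered below). Assuming the unaliased output columns are exposed as hour and count, a request might look like:

POST https://api.stride.io/v1/analyze

{
  "query": "SELECT hour, count FROM uniques"
}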

Time-to-Live (TTL)

A common pattern with MATERIALIZE processes is to include a timestamp component in the grouping, and it is often desirable for rows older than a certain age to be automatically removed in order to keep MATERIALIZE process output compact over time.

MATERIALIZE processes take an optional ttl object parameter to define such behavior, which requires a duration in interval form (such as "1 day") as well as the column used to determine row age.

A ttl can be specified at MATERIALIZE process create time:

POST https://api.stride.io/v1/process/ttl

{
  "query": "
    SELECT hour($timestamp), count(distinct user_id) FROM stream GROUP BY hour",
  "action": "MATERIALIZE",
  "ttl": {
    "column": "hour",
    "duration": "1 month"
  }
}

TTLs can also be added, modified, or removed after a MATERIALIZE process has already been created by issuing a PUT request containing the ttl parameter:

PUT https://api.stride.io/v1/process/ttl

{
  "ttl": {
    "column": "hour",
    "duration": "1 week"
  }
}

To delete a ttl, simply use null in place of a ttl object.
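For example, the following request would remove the ttl from the process created above:

PUT https://api.stride.io/v1/process/ttl

{
  "ttl": null
}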

TRANSFORM

SELECT from a stream (possibly joined with one or more MATERIALIZE processes), and write the resulting events out to a new stream.

TRANSFORM processes are stateless, and therefore must not contain aggregates. Here's a transform that joins a stream on a MATERIALIZE process and writes out resulting rows to a new stream:

POST https://api.stride.io/v1/process/join_transform

{
  "query": "
    SELECT stream.url, proc.user_id FROM stream JOIN proc ON stream.user_id = proc.user_id",
  "action": "TRANSFORM"
}

The resulting stream with joined rows will now be accessible by simply using the TRANSFORM name as a stream name:

POST https://api.stride.io/v1/process/join_consumer

{
  "query": "
    SELECT user_id, count(distinct url) FROM join_transform GROUP BY user_id",
  "action": "MATERIALIZE"
}

WEBHOOK

Encode the result as JSON and send it as an HTTP POST payload to a webhook URL.

The url for a WEBHOOK is specified by the url field of the WEBHOOK process' args dictionary. For example, suppose we had a MATERIALIZE process doing some aggregation, and we wanted to fire a WEBHOOK whenever one of that process' aggregate values exceeded a certain threshold. Here's the process:

POST https://api.stride.io/v1/process/my_proc

{
  "query": "
    SELECT sum(x) AS value FROM stream",
  "action": "MATERIALIZE"
}

And the WEBHOOK that reads from it:

POST https://api.stride.io/v1/process/webhook

{
  "query": "
    SELECT new.value FROM my_proc WHERE old.value <= 10 AND new.value > 10",
  "args": {
    "url": "http://website.com/webhook1"
  },
  "action": "WEBHOOK"
}

Note that if no action is specified in a process' definition, it will be determined automatically: queries containing aggregates default to MATERIALIZE, while queries without aggregates default to TRANSFORM.

Building Processing Pipelines

So far we've only seen how we can run continuous queries on events emitted into streams via the /collect endpoint. But Stride lets us do much more. The output of processing tasks is exposed as streams which can be queried by other processing tasks. That's how networks of computations can be built.

Let's put all of this together by creating such a network on top of the commits stream we saw earlier. First, we'll create a task that calculates per-day aggregates for each repository. The aggregates we're interested in are the number of commits made and the set of users who made them.

POST https://api.stride.io/v1/process/repo_aggs_per_day

{
  "query": "
    SELECT
      repo              AS repo,
      date($timestamp)  AS day,
      set_agg(username) AS users,
      count(*)          AS num_commits
    FROM commits
    GROUP BY repo, day",
  "action": {
    "type": "MATERIALIZE"
   }
}

Next, we want a webhook to be invoked whenever the number of commits made to a repository in a day exceeds 100. To that end, we'll run a processing task that queries the output of the repo_aggs_per_day query. Notice how we use new and old to access the previous and updated values of the record. Needless to say, the values of the columns being grouped on will be identical.

POST https://api.stride.io/v1/process/popular_repos_per_day

{
  "query": "
    SELECT
      new.day  AS day,
      new.repo AS repo
    FROM repo_aggs_per_day
    WHERE new.num_commits > 100 AND old.num_commits <= 100",
  "action": {
    "type": "WEBHOOK",
    "args": {
      "url": "http://mysite.com/webhook" 
    }
  }
}

The webhook should see data that looks like:

{
  "$timestamp": "2015-05-05T23:48:04Z",
  "day": "2015-05-05",
  "repo": "pipelinedb/pipelinedb"
}
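On the receiving side, any endpoint that accepts a JSON POST body will do. Here's a minimal Python sketch of such a receiver using only the standard library (the port and handler name are illustrative):

import json
from http.server import BaseHTTPRequestHandler, HTTPServer

class WebhookHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read and decode the JSON payload POSTed to the webhook URL
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length))
        print("webhook fired:", payload)
        self.send_response(200)
        self.end_headers()

HTTPServer(("", 8080), WebhookHandler).serve_forever()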

Subscribing to Processes

Just like we can subscribe to streams, we can also subscribe to processes. The response is a realtime stream of the output of the process. Let's say we want to get notified whenever a user makes a commit to the pipelinedb/pipelinedb repository for the first time on a given day. We start by creating a process that does just that.

POST https://api.stride.io/v1/process/users_per_day

{
  "query": "
    SELECT
      time()            AS timestamp,
      unnest(new.users) AS username
    FROM repo_aggs_per_day
    WHERE NOT username = ANY(old.users) AND new.repo = 'pipelinedb/pipelinedb'",
  "action": {
    "type": "MATERIALIZE"
  }
}

This query first unnests the new set of users and then filters out users that were present in the old set. We can now subscribe to this process to get a realtime stream of users making commits to our repo.

GET https://api.stride.io/v1/process/users_per_day/subscribe

  {
    "timestamp": "2015-05-05T23:40:27Z",
    "username": "usmanm"
  }
  \r\n
  {
    "timestamp": "2015-05-05T23:48:03Z",
    "username": "derekjn"
  }
  \r\n
  ...

Retrieving Processes

You can query the Stride API to get a list of all registered processes.

GET https://api.stride.io/v1/process

[
  {
    "name": "repo_aggs_per_day",
    "query": "
      SELECT
        repo              AS repo,
        date($timestamp)  AS day,
        set_agg(username) AS users,
        count(*)          AS num_commits
      FROM commits
      GROUP BY repo, day",
    "action": {
      "type": "MATERIALIZE"
     }
  },
  {
    "name": "popular_repos_per_day",
    "query": "
      SELECT
        new.day  AS day,
        new.repo AS repo
      FROM repo_aggs_per_day
      WHERE new.num_commits > 100 AND old.num_commits <= 100",
    "action": {
      "type": "WEBHOOK",
      "args": {
        "url": "http://mysite.com/webhook" 
      }
    }
  },
  {
    "name": "users_per_day",
    "query": "
      SELECT
        time()            AS timestamp,
        unnest(new.users) AS username
      FROM repo_aggs_per_day
      WHERE NOT username = ANY(old.users) AND new.repo = 'pipelinedb/pipelinedb'",
    "action": {
      "type": "MATERIALIZE"
    }
  }
]

You can also get the metadata for a single process.

GET https://api.stride.io/v1/process/users_per_day

{
  "name": "users_per_day",
  "query": "
    SELECT
      time()            AS timestamp,
      unnest(new.users) AS username
    FROM repo_aggs_per_day
    WHERE NOT username = ANY(old.users) AND new.repo = 'pipelinedb/pipelinedb'",
  "action": {
    "type": "MATERIALIZE"
  }
}

Deleting Processes

Processes can be deleted by making a DELETE HTTP request.

DELETE https://api.stride.io/v1/process/users_per_day

A 200 response indicates that the process was successfully deleted, while a 404 response indicates the process didn't exist.

Sliding-Window Queries

One of the unique benefits of Stride's continuous API is that it allows you to perform computations over sliding windows of time. That is, you can keep track of results for only the last minute, the last hour, the last day, or whatever frame of time is of interest. Sliding-window results only reflect data received between the beginning of the window and the current moment; all out-of-window data is automatically excluded.

Sliding windows can be applied to both WEBHOOK and MATERIALIZE processes by including the "over" argument in the process definition. Let's look at an example to illustrate how this can help you. The following example represents a common monitoring use case in which we want to fire an alert any time the number of errors in our logs stream exceeds 10 over a 10-second window. Without sliding windows, this problem is actually deceptively hard. But with a simple sliding window, Stride makes it dead simple.

We'll use two processes: the first is a MATERIALIZE process that keeps track of the error count over the last 10 seconds; the second fires a WEBHOOK whenever the error threshold is exceeded.

POST https://api.stride.io/v1/process/logs_errors_10s

{
  "query": "
    SELECT
      count(*)    AS errors
    FROM logs
    WHERE type = 'ERROR'",
  "over": "10 seconds",
  "action": {
    "type": "MATERIALIZE"
  }
}

Now let's read from the above process' output and fire a WEBHOOK whenever the count crosses from fewer than 10 to 10 or more:

POST https://api.stride.io/v1/process/errors_alarm

{
  "query": "
    SELECT
      new.errors  AS errors
    FROM logs_errors_10s
    WHERE old.errors < 10 AND new.errors >= 10",
  "action": {
    "type": "WEBHOOK"
    "args": {
      "url": "http://some.monitoring.service/errors_alarm" 
    }
   }
}

Note that we have access to the previous and new values when reading from the output of another process.

Tables

"Tables" in Stride are simply a special form of a MATERIALIZE process. Since aggregates are required for materialization, the latest aggregate can be used to achieve the behavior of a static table of rows. The value of the latest aggregate is simply the most recent value seen for the aggregation group it belongs to, so a GROUP BY clause can be used as the table's "primary key".

Let's consider an example. Suppose we needed a table to store relatively static data about user accounts in Stride, so that other continuous MATERIALIZE processes could JOIN incoming streaming events on it. Using the user_id column as a "primary key", this should give us what we want:

POST https://api.stride.io/v1/process/users_table

{
  "query": "
    SELECT user_id, latest(name) AS name, latest(address) AS address
      FROM users_table_stream
    GROUP BY user_id",
  "action": "MATERIALIZE"
}

We can now "insert" rows into our "table" by simply writing events to users_table_stream:

POST https://api.stride.io/v1/collect/users_table_stream

{
  "user_id": 42,
  "name": "Derek Nelson",
  "address": "The Matrix"
}
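Because latest() always returns the most recent value seen for each group, "updating" a row is just a matter of writing another event with the same user_id (the new address value here is purely illustrative):

POST https://api.stride.io/v1/collect/users_table_stream

{
  "user_id": 42,
  "name": "Derek Nelson",
  "address": "Zion"
}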

/analyze

The analyze endpoint lets you query the continuously updating results of processing tasks created using the process endpoint. Queries are described using Stride SQL, much like for the process endpoint, except that instead of reading from streams, they read from processing tasks whose results are materialized. Say we want to find out which repositories derekjn contributed to in 2016; we would issue the following request.

POST https://api.stride.io/v1/analyze

{
  "query": "
    SELECT
      DISTINCT repo AS repo
    FROM repo_aggs_per_day
    WHERE year(day) = 2016 AND 'derekjn' = ANY(users)"
}

The response is an array of JSON encoded records that the query outputs.

[
  { "repo": "pipelinedb/pipelinedb" },
  { "repo": "pipelinedb/docs" }
]
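Programmatically, an ad-hoc analyze query is just an authenticated POST. Here's a sketch in Python using the requests library (the API key is a placeholder):

import requests

API_KEY = "your_secret_key"  # placeholder

query = """
SELECT DISTINCT repo AS repo
FROM repo_aggs_per_day
WHERE year(day) = 2016 AND 'derekjn' = ANY(users)
"""

resp = requests.post(
    "https://api.stride.io/v1/analyze",
    json={"query": query},
    auth=(API_KEY, ""),
)
for row in resp.json():
    print(row["repo"])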

Named Queries

You can create named queries and then retrieve results for them later without having to specify the query every time. To create a named query, append the name to the analyze endpoint path.

POST https://api.stride.io/v1/analyze/total_commits

{
  "query": "
    SELECT
      sum(num_commits) AS total
    FROM repo_aggs_per_day"
}

Now, to fetch the result of this query at any time, you can issue a GET request.

GET https://api.stride.io/v1/analyze/total_commits/results

[
  { "total": 134723 }
]

Retrieving Analyze Queries

You can query the Stride API to get a list of all saved analyze queries.

GET https://api.stride.io/v1/analyze

[
  {
    "name": "total_commits",
    "query": "
      SELECT
        sum(num_commits) AS total
      FROM repo_aggs_per_day"
  }
]

You can also get the metadata for a single analyze query.

GET https://api.stride.io/v1/analyze/total_commits

{
  "name": "total_commits",
  "query": "
    SELECT
      sum(num_commits) AS total
    FROM repo_aggs_per_day"
}

Deleting Analyze Queries

Saved analyze queries can be deleted by making a DELETE HTTP request.

DELETE https://api.stride.io/v1/analyze/total_commits

A 200 response indicates that the analyze query was successfully deleted, while a 404 response indicates that it didn't exist.

Naming Limitations

All resource names must conform to the following rules:

Stride SQL

Overview

Stride SQL only supports SELECT statements. These SELECT statements are used in two contexts: continuous queries in process definitions, and on-demand queries issued via the analyze endpoint.

Stride SQL aspires to be as minimal and standard as possible. The main thing that's unique about it is that when defining continuous processes, Stride SQL allows you to dereference the schemaless JSON objects that are read from streams. For example, if events written to a stream named user_events looked something like,

{
  "url": "https://stride.io",
  "user": {
    "id": "aef1259e",
    "segment": "b",
    ...
  },
  ...
}

And if you wanted to keep track of the number of unique user IDs with a MATERIALIZE process, you could simply dereference the event to get to the id field, similarly to how you would in JavaScript:

SELECT COUNT(DISTINCT user.id) FROM user_events

Also unique to Stride SQL is that every column of a SELECT query must have an alias. Column names are often unambiguous and thus don't require explicit aliasing using the AS syntax, as in the above example. But aliasing is required when more complex expressions are used. For example, the following is an invalid Stride SQL query because the name of a + b in the output is ambiguous:

SELECT a + b FROM some_stream

We just need to name the expression using an AS alias:

SELECT a + b AS sum FROM some_stream

Other than object dereference support and explicit column naming, Stride SQL is just a minimal subset of standard SQL's SELECT syntax.

Grammar

Since Stride SQL only supports SELECT statements, its grammar is minimal:

SELECT [ DISTINCT [ ON ( expression [, ...] ) ] ]
    [ * | expression [ [ AS ] output_name ] [, ...] ]
    [ FROM from_item [, ...] ]
    [ WHERE condition ]
    [ GROUP BY ( expression [, ...] ) ]
    [ HAVING condition [, ...] ]
    [ ORDER BY expression [ ASC | DESC ] ]
    [ LIMIT { count | ALL } ]

-- Where from_item can be one of:
    stream_name
    materialize_process_name
    [ LATERAL ] ( subselect ) [ AS ] alias [ ( column [, ...] ) ]
    from_item JOIN from_item ON join_condition

Built-in Functionality

Stride provides the following functions that can be invoked by process definitions as well as queries against MATERIALIZE processes:

Aggregates

| Signature | Description |
| --- | --- |
| avg(expression) | Take the average of the given expression |
| combine(aggregate column) | Combine the aggregate values of the given MATERIALIZE process column |
| count([ DISTINCT ] expression or *) | Count the number of values (or distinct values, with DISTINCT) seen for the given expression |
| latest(expression) | Return the most recently seen input value |
| min(expression) | Return the minimum of all input values |
| num_uniques(expression) | Count the number of distinct values for the given expression (shorthand for COUNT DISTINCT) |
| oldest(expression) | Return the first input value seen |
| percentile(expression, percentile, ...) | Build an array of values corresponding to the given percentiles to keep track of |
| stddev(expression) | Compute the standard deviation of all input values |
| sum(expression) | Sum all input values |
| top_k(expression, k) | Build an array of [value, frequency] tuples of the top k values seen |
| variance(expression) | Compute the variance of all input values |

Non-aggregates

| Signature | Description |
| --- | --- |
| ceil(expression) | Round the value up to the nearest integer |
| coalesce(expression, value) | Return expression, or value when expression is NULL |
| concat(string, string, ...) | Concatenate all of the given strings into a single string |
| day(timestamp) | Truncate the given timestamp to the day |
| floor(expression) | Round the value down to the nearest integer |
| hour(timestamp) | Truncate the given timestamp to the hour |
| lower(string) | Return the lower-cased version of the input string |
| matches(string, regex) | Return true iff the input string matches the given regex |
| md5(string) | Return the MD5 hash of the input string |
| minute(timestamp) | Truncate the given timestamp to the minute |
| month(timestamp) | Truncate the given timestamp to the month |
| now() | Return the current UTC time |
| round(expression) | Round the given value |
| second(timestamp) | Truncate the given timestamp to the second |
| strlen(string) | Return the length of the input string |
| substr(string, start, length) | Return a substring of the input string starting at start having length length |
| timestamp_bucket(timestamp, interval) | Truncate the given timestamp down to the nearest interval, e.g. timestamp_bucket($timestamp, '1 day') |
| trim(string) | Trim whitespace from either side of the input string |
| upper(string) | Return the upper-cased version of the input string |
| year(timestamp) | Truncate the given timestamp to the year |
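As an illustration, a query that groups events into five-minute buckets (the stream and column names here are hypothetical) might look like:

SELECT
  timestamp_bucket($timestamp, '5 minutes') AS bucket,
  count(*)                                  AS events
FROM some_stream
GROUP BY bucket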

If you need some functionality that's not here, please don't hesitate to let us know at support@stride.io and we'll consider adding it.