Stride API Documentation

Stride is a realtime analytics API. A fully managed service, Stride enables developers to collect streams of events in realtime and construct networks of continuous queries on top of those streams that perform high-throughput continuous aggregation, distinct counting, and sliding-window computations, fire webhooks, run massive retroactive batch queries, and more. Realtime events can also be streamed out of Stride by subscribing to streams of changes as they're applied throughout your setup, and you can issue on-demand queries too. Events are schemaless JSON, and queries are defined using Stride SQL, a simplified dialect of SQL designed for productivity and ease of use.

Stride also gives you a beautiful web interface to manage and visualize your data. Let's start with the basics of the Stride API and then dive into how you can process your data by building continuous processing pipelines and analyze the results later.

Headers

Authentication

We provide each account with two API keys: a su-key and a write-key. The su-key can be used to access any endpoint, while the write-key can only be used with the collect endpoint. The su-key is meant to be kept secret and should never be included in client-side code. The write-key can be used in client-side code, at the risk of someone posting junk data using your key. However, this isn't any different from other analytics APIs.

We use HTTP Basic Auth to authenticate incoming requests. Basic Auth base64 encodes a username:password pair, prefixes the encoded string with Basic, and sends the resulting string in the Authorization header field. When using the Stride API, the account's API key (either write-key or su-key) should be used as the username and the password should be left blank. So if your API key is helloworld, the final header will look like Authorization: Basic aGVsbG93b3JsZDo=. Our API is only accessible via HTTPS, so none of this will ever be transmitted in the clear.
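
For example, here's a minimal sketch of constructing this header in Python (illustrative client code, not an official SDK; the requests library builds the header for you when given an auth tuple):

import base64

import requests

API_KEY = "helloworld"  # your write-key or su-key

# Constructing the header by hand: base64 encode "key:" (blank password).
token = base64.b64encode(f"{API_KEY}:".encode()).decode()
headers = {"Authorization": f"Basic {token}"}  # "Basic aGVsbG93b3JsZDo="

# With requests, passing an auth tuple produces the same header.
resp = requests.get("https://api.stride.io/v1/collect", auth=(API_KEY, ""))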

Content-Type

All requests made to the Stride API must set the Content-Type header to be application/json.

Errors

We use conventional HTTP status codes to indicate the result of an API request.

Limits

Coming soon...

Endpoints

/v1/collect

The collect endpoint is used for collecting events. An event is a JSON object with any number of properties which describe the context in which the event took place. All JSON value types are supported, including nested objects. A sample event looks like:

{
  "$timestamp": "2015-05-05T23:40:27Z",
  "path": "/url/to/page",
  "ip": "50.233.123.210",
  "user": {
    "id": "deadbeef",
    "name": "Alyssa Hacker",
    "age": 31
  },
  "product": {
    "id": "1ceb00da",
    "name": "Qubit",
    "price": 3.99
  }
}

$timestamp is an optional property which describes the time at which the event took place. If omitted, the server automatically sets it to the time the event was received.

Events are collected into streams. The stream into which an event should be emitted is specified as part of the request. The consumers of these streams are continuous processing tasks, which you will learn more about in the process section.

Events can either be sent individually or in bulk for increased throughput.

Collecting Individual Events

To collect individual events, append the stream name to the collect endpoint path and set the request body to the event. The following request will send a single event to the commits stream.

POST https://api.stride.io/v1/collect/commits

{
  "$timestamp": "2015-05-05T23:40:27Z",
  "repo": "pipelinedb/pipelinedb",
  "username": "usmanm",
  "sha1": "690e6814144a174d38ff501c5d89bfff5ff8d6de"
}
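
As a sketch, the same request in Python might look like the following (hypothetical client code, assuming the requests library):

import requests

event = {
    "$timestamp": "2015-05-05T23:40:27Z",
    "repo": "pipelinedb/pipelinedb",
    "username": "usmanm",
    "sha1": "690e6814144a174d38ff501c5d89bfff5ff8d6de",
}

# json= serializes the event and sets Content-Type: application/json.
resp = requests.post(
    "https://api.stride.io/v1/collect/commits",
    auth=("YOUR_API_KEY", ""),
    json=event,
)
resp.raise_for_status()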

Collecting Events In Bulk

There are two ways events can be collected in bulk. If you want to send multiple events to the same stream, use the same endpoint as above, but send an array of events in the body.

POST https://api.stride.io/v1/collect/commits

[
  {
    "$timestamp": "2015-05-05T23:40:27Z",
    "repo": "pipelinedb/pipelinedb",
    "username": "usmanm",
    "sha1": "690e6814144a174d38ff501c5d89bfff5ff8d6de"
  },
  {
    "$timestamp": "2015-05-05T23:48:03Z",
    "repo": "pipelinedb/pipelinedb",
    "username": "derekjn",
    "sha1": "95bbf000808c8e7493d3c4cdd5aa3d26e91f6f6e"
  }
]

To collect events into multiple streams, make a request to the collect endpoint with no stream suffix. The body should be a nested JSON object where each top-level property name is a stream and its corresponding value is an array of events to be emitted into that stream.

POST https://api.stride.io/v1/collect

{
  "commits": [
    {
      "$timestamp": "2015-05-05T23:40:27Z",
      "repo": "pipelinedb/pipelinedb",
      "username": "usmanm",
      "sha1": "690e6814144a174d38ff501c5d89bfff5ff8d6de"
    },
    {
      "$timestamp": "2015-05-05T23:48:03Z",
      "repo": "pipelinedb/pipelinedb",
      "username": "derekjn",
      "sha1": "95bbf000808c8e7493d3c4cdd5aa3d26e91f6f6e"
    }
  ],
  "pull_requests": [
    {
      "$timestamp": "2015-05-05T23:42:53Z",
      "repo": "pipelinedb/pipelinedb",
      "action": "opened",
      "number": 1576,
      "username": "usmanm"

  ]
}
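
A common pattern is to buffer events locally, keyed by stream name, and flush them in a single multi-stream request. A minimal sketch, assuming the requests library:

import requests

# Events buffered by stream name; the whole dict is the request body.
buffered = {
    "commits": [
        {
            "repo": "pipelinedb/pipelinedb",
            "username": "usmanm",
            "sha1": "690e6814144a174d38ff501c5d89bfff5ff8d6de",
        }
    ],
    "pull_requests": [
        {
            "repo": "pipelinedb/pipelinedb",
            "action": "opened",
            "number": 1576,
            "username": "usmanm",
        }
    ],
}

resp = requests.post(
    "https://api.stride.io/v1/collect",
    auth=("YOUR_API_KEY", ""),
    json=buffered,
)
resp.raise_for_status()
buffered.clear()  # reset the buffer once the flush succeeds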

Subscribing to Streams

You can subscribe to a stream to see a sampled feed of events as they're inserted into the stream in realtime. We use persistent HTTP connections with chunked transfer encoding to stream events down to the client. This is similar to how Twitter's Streaming API works. The body of the response contains a series of newline-delimited events, where each event is a JSON-encoded string and the delimiter is \r\n. Events may contain the newline \n character, but will never contain the carriage return \r character.

GET https://api.stride.io/v1/collect/commits/subscribe

  {
    "$timestamp": "2015-05-05T23:40:27Z",
    "repo": "pipelinedb/pipelinedb",
    "username": "usmanm",
    "sha1": "690e6814144a174d38ff501c5d89bfff5ff8d6de"
  }
  \r\n
  {
    "$timestamp": "2015-05-05T23:48:03Z",
    "repo": "pipelinedb/pipelinedb",
    "username": "derekjn",
    "sha1": "95bbf000808c8e7493d3c4cdd5aa3d26e91f6f6e"
  }
  \r\n
  ...

Note: In the above response, you wouldn't see \r\n as 4 ASCII characters. We've just written them as such to make it clear that there is a boundary between messages.
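
As a sketch of how a client might consume this stream, the following Python loop buffers the chunked response and splits it on \r\n boundaries (illustrative code, assuming the requests library):

import json

import requests

resp = requests.get(
    "https://api.stride.io/v1/collect/commits/subscribe",
    auth=("YOUR_API_KEY", ""),
    stream=True,  # keep the connection open and read chunks as they arrive
)

buffer = b""
for chunk in resp.iter_content(chunk_size=None):
    buffer += chunk
    # \r\n is the message boundary; events never contain a raw \r.
    while b"\r\n" in buffer:
        raw, buffer = buffer.split(b"\r\n", 1)
        if raw.strip():
            event = json.loads(raw)
            print(event["username"])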

Retrieving Streams

You can query the Stride API to get a list of all the streams that exist in your database. A stream is automatically created the first time an event is collected into it.

GET https://api.stride.io/v1/collect

[
  "commits",
  "pull_requests",
  "app_events",
  "web_logs"
]

You can also check if a named stream exists in the system.

GET https://api.stride.io/v1/collect/commits

A 200 response indicates that the stream exists, while a 404 response indicates the stream doesn't exist.

Deleting Streams

Streams can be deleted by making a DELETE HTTP request.

DELETE https://api.stride.io/v1/collect/commits

A 200 response indicates that the stream was successfully deleted, while a 404 response indicates the stream didn't exist.

/v1/process

The process endpoint is used to create processing tasks which run continuous queries over streams. A processing task consists of two parts:

Query:

A SQL query that SELECTs from a stream. For example, we could compute the number of commits made by each user per day by running the following query on the commits stream we saw above.

SELECT
  username         AS user,
  date($timestamp) AS day,
  count(*)         AS count
FROM commits
GROUP BY user, day

Or we could filter out commits for a certain repository.

SELECT
  username   AS user,
  $timestamp AS commit_time,
  sha1       AS commit_hash
FROM commits
WHERE repo = 'pipelinedb/pipelinedb'

Note: We use a slightly simplified version of SQL, which we call Stride SQL, to describe queries. Check out the Stride SQL section to learn more about it.

Action:

An action describes what should be done with the results produced by the continuous query.

We provide two different actions:

MATERIALIZE

Store the results of the query durably so that they can be analyzed later. The results of a continuous MATERIALIZE process are incrementally updated in realtime as incoming data is read.

WEBHOOK

Encode the result as JSON and send it as an HTTP POST payload to a webhook URL. For queries that perform aggregations, the result is a JSON-serialized object containing the old record and the new record. So for the first query stated above, you would see something like:

POST http://website.com/webhook1

{
  "$timestamp": "2015-05-05T23:48:03Z",
  "new": {
    "user": "usmanm",
    "day": "2015-05-05",
    "count": 5
  },
  "old": {
    "user": "usmanm",
    "day": "2015-05-05",
    "count": 4
  }
}

If the query performs no aggregation, the result is simply the record serialized as JSON. For the second query above, the request would look as follows.

POST http://website.com/webhook2

{
  "$timestamp": "2015-05-05T23:48:04Z",
  "user": "derekjn",
  "commit_time": "2015-05-05T23:48:03Z",
  "commit_hash": "95bbf000808c8e7493d3c4cdd5aa3d26e91f6f6e"
}
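
For illustration, a webhook receiver for the two payload shapes above might look like this minimal Flask sketch (the route and port are hypothetical):

from flask import Flask, request

app = Flask(__name__)

@app.route("/webhook1", methods=["POST"])
def handle_result():
    payload = request.get_json()
    if "new" in payload:
        # Aggregating query: payload carries the old and new records.
        print("count went from", payload["old"]["count"], "to", payload["new"]["count"])
    else:
        # Non-aggregating query: payload is the record itself.
        print("new commit:", payload["commit_hash"])
    return "", 200

if __name__ == "__main__":
    app.run(port=8080)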

Building Processing Pipelines

So far we've only seen how we can run continuous queries on events emitted into streams via the collect endpoint. But Stride lets us do much more. The output of processing tasks is exposed as streams which can be queried by other processing tasks. That's how networks of computations can be built.

Let's put all of this together by creating such a network on top of the commits stream we saw earlier. First, we'll create a task that calculates per-day aggregates for each repository. The aggregates we're interested in are the number of commits made and the set of users who made them.

POST https://api.stride.io/v1/process/repo_aggs_per_day

{
  "query": "
    SELECT
      repo              AS repo,
      date($timestamp)  AS day,
      set_agg(username) AS users,
      count(*)          AS num_commits
    FROM commits
    GROUP BY repo, day",
  "action": {
    "type": "MATERIALIZE"
  }
}
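
Note that the query strings in these examples are wrapped across multiple lines for readability; literal newlines aren't valid inside JSON strings, so a real request body needs them escaped. A JSON library handles this for you. For instance, a minimal Python sketch of the request above (assuming the requests library):

import requests

query = """
SELECT
  repo              AS repo,
  date($timestamp)  AS day,
  set_agg(username) AS users,
  count(*)          AS num_commits
FROM commits
GROUP BY repo, day
"""

# json= escapes the newlines in the query string automatically.
resp = requests.post(
    "https://api.stride.io/v1/process/repo_aggs_per_day",
    auth=("YOUR_SU_KEY", ""),
    json={"query": query, "action": {"type": "MATERIALIZE"}},
)
resp.raise_for_status()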

Next, we want a webhook to be invoked whenever the number of commits made to a repository in a day exceeds 100. To that end, we'll run a processing task that queries the output of the repo_aggs_per_day query. Notice how we use new and old to access the updated and previous values of the record. Needless to say, the values of the columns being grouped on will be identical in both.

POST https://api.stride.io/v1/process/popular_repos_per_day

{
  "query": "
    SELECT
      new.day  AS day,
      new.repo AS repo
    FROM repo_aggs_per_day
    WHERE new.num_commits > 100 AND old.num_commits <= 100",
  "action": {
    "type": "WEBHOOK",
    "url": "http://mysite.com/webhook"
  }
}

The webhook should see data that looks like:

{
  "$timestamp": "2015-05-05T23:48:04Z",
  "day": "2015-05-05",
  "repo": "pipelinedb/pipelinedb"
}

Subscribing to Processes

Just like we can subscribe to streams, we can also subscribe to processes. The response is a realtime stream of the output of the process. Let's say we want to get notified whenever a user makes a commit to the pipelinedb/pipelinedb repository for the first time on a given day. We start by creating a process that does that.

POST https://api.stride.io/v1/process/users_per_day

{
  "query": "
    SELECT
      time()            AS timestamp,
      unnest(new.users) AS username
    FROM repo_aggs_per_day
    WHERE NOT username = ANY(old.users) AND new.repo = 'pipelinedb/pipelinedb'",
  "action": {
    "type": "MATERIALIZE"
  }
}

This query first unnests the new set of users and then filters out users that were present in the old set. We can now subscribe to this process to get a realtime stream of users making commits to our repo.

GET https://api.stride.io/v1/process/users_per_day/subscribe

  {
    "timestamp": "2015-05-05T23:40:27Z",
    "username": "usmanm"
  }
  \r\n
  {
    "timestamp": "2015-05-05T23:48:03Z",
    "username": "derekjn"
  }
  \r\n
  ...

Retrieving Processes

You can query the Stride API to get a list of all registered processes.

GET https://api.stride.io/v1/process

[
  {
    "name": "repo_aggs_per_day",
    "query": "
      SELECT
        repo              AS repo,
        date($timestamp)  AS day,
        set_agg(username) AS users,
        count(*)          AS num_commits
      FROM commits
      GROUP BY repo, day",
    "action": {
      "type": "MATERIALIZE"
    }
  },
  {
    "name": "popular_repos_per_day",
    "query": "
      SELECT
        new.day  AS day,
        new.repo AS repo
      FROM repo_aggs_per_day
      WHERE new.num_commits > 100 AND old.num_commits <= 100",
    "action": {
      "type": "WEBHOOK",
      "url": "http://mysite.com/webhook"
    }
  },
  {
    "name": "users_per_day",
    "query": "
      SELECT
        time()            AS timestamp,
        unnest(new.users) AS username
      FROM repo_aggs_per_day
      WHERE NOT username = ANY(old.users) AND new.repo = 'pipelinedb/pipelinedb'",
    "action": {
      "type": "MATERIALIZE"
    }
  }
]

You can also get the metadata for a single process.

GET https://api.stride.io/v1/process/users_per_day

{
  "name": "users_per_day",
  "query": "
    SELECT
      time()            AS timestamp,
      unnest(new.users) AS username
    FROM repo_aggs_per_day
    WHERE NOT username = ANY(old.users) AND new.repo = 'pipelinedb/pipelinedb'",
  "action": {
    "type": "MATERIALIZE"
  }
}

Deleting Processes

Processes can be deleted by making a DELETE HTTP request.

DELETE https://api.stride.io/v1/process/users_per_day

A 200 response indicates that the process was successfully deleted, while a 404 response indicates the process didn't exist.

Sliding-Window Queries

One of the unique benefits of Stride's continuous API is that it allows you to perform computations over sliding windows of time. That is, you can keep track of results for only the last minute, the last hour, the last day, or whatever frame of time is of interest. Sliding-window results only reflect data received between the beginning of the window and the current moment in time. All out-of-window data is automatically excluded.

Sliding windows can be applied to both WEBHOOK and MATERIALIZE processes by including the "over" argument in the process definition. Let's look at an example to illustrate how this can help you. The following example represents a common monitoring use case in which we want to fire an alert any time the number of errors in our logs stream goes above 100 over a 10-second window. Without sliding windows, this problem is actually deceptively hard. But with a simple sliding window, Stride makes it dead simple.

We'll use two processes: the first is a MATERIALIZE process that keeps track of the error count over the last 10 seconds; the second fires a WEBHOOK whenever the error threshold is exceeded.

POST https://api.stride.io/v1/process/logs_errors_10s

{
  "query": "
    SELECT
      count(*) AS errors
    FROM logs
    WHERE type = 'ERROR'",
  "over": "10 seconds",
  "action": {
    "type": "MATERIALIZE"
  }
}

Now let's read from the above process's output and fire a WEBHOOK whenever the count crosses from fewer than 100 to 100 or more:

POST https://api.stride.io/v1/process/logs_errors_alarm

{
  "query": "
    SELECT
      new.errors AS errors
    FROM logs_errors_10s
    WHERE old.errors < 100 AND new.errors >= 100",
  "action": {
    "type": "WEBHOOK",
    "url": "http://some.monitoring.service/errors_alarm"
  }
}

Note that we have access to the previous and new values when reading from the output of another process.

/v1/analyze

The analyze endpoint lets you query the continuously updated results of processing tasks created using the process endpoint. Queries are described using Stride SQL, much like with the process endpoint, except that instead of reading from streams, they read from processing tasks whose results are materialized. Say we want to find out which repositories derekjn contributed to in 2016; we would issue the following request.

POST https://api.stride.io/v1/analyze

{
  "query": "
    SELECT
      DISTINCT repo AS repo
    FROM repo_aggs_per_day
    WHERE year(day) = 2016 AND 'derekjn' = ANY(users)"
}

The response is an array of the JSON-encoded records that the query outputs.

[
  { "repo": "pipelinedb/pipelinedb" },
  { "repo": "pipelinedb/docs" }
]
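
As a sketch, issuing this query from Python might look like the following (hypothetical client code, assuming the requests library):

import requests

resp = requests.post(
    "https://api.stride.io/v1/analyze",
    auth=("YOUR_SU_KEY", ""),
    json={
        "query": (
            "SELECT DISTINCT repo AS repo "
            "FROM repo_aggs_per_day "
            "WHERE year(day) = 2016 AND 'derekjn' = ANY(users)"
        )
    },
)
for record in resp.json():
    print(record["repo"])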

Named Queries

You can create named queries and then retrieve results for them later without having to specify the query every time. To create a named query, append the name to the analyze endpoint path.

POST https://api.stride.io/v1/analyze/total_commits

{
  "query": "
    SELECT
      sum(num_commits) AS total
    FROM repo_aggs_per_day"
}

Now, to fetch the result of this query at any time, you can issue a GET request.

GET https://api.stride.io/v1/analyze/total_commits/results

[
  { "total": 134723 }
]
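
Since the underlying results are continuously updated, a client can simply poll this endpoint. A minimal sketch, assuming the requests library:

import time

import requests

while True:
    resp = requests.get(
        "https://api.stride.io/v1/analyze/total_commits/results",
        auth=("YOUR_SU_KEY", ""),
    )
    print("total commits:", resp.json()[0]["total"])
    time.sleep(5)  # poll at whatever cadence suits your use case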

Retrieving Analyze Queries

You can query the Stride API to get a list of all saved analyze queries.

GET https://api.stride.io/v1/analyze

[
  {
    "name": "total_commits",
    "query": "
      SELECT
        sum(num_commits) AS total
      FROM repo_aggs_per_day"
  }
]

You can also get the metadata for a single analyze query.

GET https://api.stride.io/v1/analyze/total_commits

{
  "name": "total_commits",
  "query": "
    SELECT
      sum(num_commits) AS total
    FROM repo_aggs_per_day"
}

Deleting Analyze Queries

Saved analyze queries can be deleted by making a DELETE HTTP request.

DELETE https://api.stride.io/v1/analyze/total_commits

A 200 response indicates that the analyze query was successfully deleted, while a 404 response indicates that it didn't exist.

Naming Limitations

All resource names must adhere to the following rules:

Stride SQL

Coming soon...