This document describes the asynchronous, WebSockets-based API of BoilingData. JSON messages are sent (published) as per this specification over authenticated WebSocket connection(s), and the client receives (subscribes to) messages that carry JSON query results.
The service also sends query processing information, such as query run times and Lambda container state events, that lets you follow the lifecycle of data and queries if you like.
Building in a JavaScript environment? First, create an account for BoilingData. Your username and password can then be used to log in to the service (AWS Cognito). The resulting AWS credentials are used to sign the BoilingData application WebSockets URL and connect to the service. See the NodeJS/JS SDK as an example.
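For example, once you have AWS credentials from Cognito, the WebSocket URL can be presigned with SigV4 before connecting. The sketch below is a minimal illustration using the AWS JS SDK v3 signing utilities; the endpoint hostname, region, and service name are placeholder assumptions, not the real BoilingData values (the NodeJS/JS SDK does this for you).

// Minimal sketch: SigV4-presign a WebSocket URL with credentials obtained from Cognito.
// NOTE: hostname, region, and service below are placeholder assumptions.
import { SignatureV4 } from "@aws-sdk/signature-v4";
import { HttpRequest } from "@aws-sdk/protocol-http";
import { Sha256 } from "@aws-crypto/sha256-js";
import { formatUrl } from "@aws-sdk/util-format-url";

export async function signWebSocketUrl(credentials: {
  accessKeyId: string;
  secretAccessKey: string;
  sessionToken?: string;
}): Promise<string> {
  const hostname = "<boilingdata-websocket-endpoint>"; // placeholder
  const signer = new SignatureV4({
    credentials,
    region: "eu-west-1",    // assumption
    service: "execute-api", // assumption
    sha256: Sha256,
  });
  const request = new HttpRequest({
    method: "GET",
    protocol: "wss:",
    hostname,
    path: "/",
    headers: { host: hostname },
  });
  const signed = await signer.presign(request, { expiresIn: 60 });
  return formatUrl(signed); // pass this URL to your WebSocket client
}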
As the API is asynchronous, an ID must be assigned to each query, and this ID is returned with all response messages. BoilingData may send multiple responses for each query. Each response carries batch information so that the API consumer can collect all response batches and declare the query finished (please note that multiple identical batches may be received). Communication is done using SQL. For instance:
-- Show user's configuration metadata
SELECT * FROM boilingdata;
-- (optional) Based on the configuration metadata, create an IAM Role on your AWS
-- Account that BoilingData can assume
PRAGMA s3AccessRoleArn='arn:aws:iam::123456789012:role/bdS3';
-- List accessible S3 Buckets and their contents
-- (optional) BoilingData uses the IAM Role to read your S3 Bucket(s)
SELECT * FROM list('s3://');
SELECT * FROM list('s3://mybucket/');
-- List BoilingData specific PRAGMAs, like the s3AccessRoleArn
-- NOTE: The list does not contain real in-use values, only examples!
SELECT * FROM pragmas;
-- Get all shared data sets from/to you
SELECT * FROM boilingshares;
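To make the request/response flow concrete, here is a minimal client-side sketch in TypeScript. It assumes you already have a signed WebSocket URL (see the SDK), uses the "ws" package, and sends the SQL_QUERY message format documented below; the helper name is our own.

// Minimal sketch: send a SQL_QUERY message and correlate responses by requestId.
import WebSocket from "ws";
import { randomUUID } from "crypto";

// signedUrl: the SigV4-signed WebSocket URL (see the signing sketch above / the SDK).
export function runQuery(signedUrl: string, sql: string): void {
  const ws = new WebSocket(signedUrl);
  const requestId = randomUUID();

  ws.on("open", () => {
    ws.send(JSON.stringify({ messageType: "SQL_QUERY", sql, requestId }));
  });

  ws.on("message", (raw) => {
    const msg = JSON.parse(raw.toString());
    if (msg.requestId !== requestId) return; // responses are correlated by requestId
    console.log(msg.messageType, msg);
  });
}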
You can also get the schema of your response data by prepending DESCRIBE to a query that produces data. In that case we return the schema rows instead of the data rows. Note that metadata queries currently do not support DESCRIBE.
DESCRIBE SELECT * FROM parquet_scan('s3://boilingdata-demo/demo.parquet');
Parquet files have first-class support in BoilingData and are accessed with parquet_scan(). CSV files are read with read_csv_auto().
The data in your S3 Buckets can be queried directly in-place, or you can query data sets shared to you, even without any AWS access of your own. For accessing your own S3 Bucket(s), an IAM Role and Policy need to be created on your AWS Account, assumable by the BoilingData AWS Account and requiring the externalId to be set to your BoilingData account's externalId parameter. In other words, the IAM Role should be assumable from the BoilingData AWS Account only if the externalId parameter matches your BoilingData externalId parameter.
As the IAM role is created by you, you always control the access to your data (via the IAM role permissions). BoilingData will use the externalId parameter, populated with a hash value based on your Cognito username (i.e. unique to your username), in the sts:AssumeRole API call when assuming your IAM role.
Here is an example trust policy that you need to provide for the IAM Role. Replace the placeholders with real values you get from the BoilingData API (see below).
Note that BDCLI can do all this for you, such as creating the IAM Role with the help of your Boiling account details.
{
"Version": "2012-10-17",
"Statement": {
"Effect": "Allow",
"Principal": {
"AWS": "AWS_ACCOUNT_ID"
},
"Action": "sts:AssumeRole",
"Condition": {
"StringEquals": {
"sts:ExternalId": "BOILINGDATA_EXTERNAL_ID"
}
}
}
}
The BoilingData service AWS Account ID and your own externalId parameter are available in the app via the API.
SELECT * FROM boilingdata;
The response looks like this (not real values):
{
"awsAccountId": "589434896614",
"externalId": "MjEzNDZiZjItNmMzMS00Y2FmLThlN2UtOTgzMjIwNWZmZGFhCg==",
}
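If you prefer not to use BDCLI, the IAM Role can also be created programmatically with these two values. A hedged sketch with the AWS JS SDK v3 follows; the role name is an arbitrary example and must match the ARN you later set with PRAGMA s3AccessRoleArn.

// Sketch: create the cross-account IAM Role using the awsAccountId and externalId
// returned by SELECT * FROM boilingdata;. Role name "bdS3" is an example only.
import { IAMClient, CreateRoleCommand } from "@aws-sdk/client-iam";

export async function createBoilingDataRole(awsAccountId: string, externalId: string) {
  const iam = new IAMClient({});
  const trustPolicy = {
    Version: "2012-10-17",
    Statement: {
      Effect: "Allow",
      Principal: { AWS: awsAccountId },
      Action: "sts:AssumeRole",
      Condition: { StringEquals: { "sts:ExternalId": externalId } },
    },
  };
  return iam.send(new CreateRoleCommand({
    RoleName: "bdS3",
    AssumeRolePolicyDocument: JSON.stringify(trustPolicy),
  }));
}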
To give your newly created IAM role permission to access your files on S3, an IAM Policy will need to be created.
Note that BDCLI can do all this for you, such as creating the IAM Role with the help of your Boiling account details and a YAML configuration file you created. It also supports multiple profiles, so you can configure multiple users at the same time.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BoilingData0",
"Effect": "Allow",
"Action": [
"s3:PutObject",
"s3:GetObject",
"s3:ListBucket",
"s3:GetBucketLocation",
"s3:GetBucketRequestPayment"
],
"Resource": [
"arn:aws:s3:::BUCKET-NAME/*",
"arn:aws:s3:::BUCKET-NAME"
]
},
{
"Sid": "BoilingData1",
"Effect": "Allow",
"Action": "s3:ListAllMyBuckets",
"Resource": "*"
}
]
}
(replace BUCKET-NAME with your bucket name)
A production environment will ideally have the most restrictive permissions. In this case, the S3 object paths will be known in advance, and traversing buckets or uploading is not needed. The minimum required permissions to allow querying of a known S3 object are s3:GetObject, s3:GetBucketLocation, and s3:GetBucketRequestPayment.
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "BoilingData0",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:GetBucketLocation",
"s3:GetBucketRequestPayment"
],
"Resource": [
"arn:aws:s3:::BUCKET-NAME",
"arn:aws:s3:::BUCKET-NAME/*"
]
}
]
}
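The policy can likewise be attached as an inline policy of the role programmatically. A sketch (the role and policy names are illustrative):

// Sketch: attach the minimal S3 read policy as an inline policy of the role created above.
import { IAMClient, PutRolePolicyCommand } from "@aws-sdk/client-iam";

export async function attachS3ReadPolicy(bucketName: string) {
  const iam = new IAMClient({});
  const policy = {
    Version: "2012-10-17",
    Statement: [{
      Sid: "BoilingData0",
      Effect: "Allow",
      Action: ["s3:GetObject", "s3:GetBucketLocation", "s3:GetBucketRequestPayment"],
      Resource: [`arn:aws:s3:::${bucketName}`, `arn:aws:s3:::${bucketName}/*`],
    }],
  };
  return iam.send(new PutRolePolicyCommand({
    RoleName: "bdS3",                // example name, see the trust policy above
    PolicyName: "BoilingDataS3Read", // illustrative name
    PolicyDocument: JSON.stringify(policy),
  }));
}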
We have added some additional PRAGMA statements on top of standard SQL support, for instance:
PRAGMA s3AccessRoleArn='arn:aws:iam::123456789012:role/bdS3';
This PRAGMA can be used to set the IAM assume role ARN for accessing S3 from the BoilingData service. The setting is persisted on the service side, so it needs to be set only once.
All of the custom BoilingData PRAGMA statements can be queried with
SELECT * FROM pragmas;
Send SQL queries.
With this API call you send the SQL queries, including any PRAGMA statements.
Accepts the following message:
User/client sends SQL queries with this message
BoilingData PRAGMA to set IAM role for accessing your data on your S3 Data Lake.
{
"messageType": "SQL_QUERY",
"sql": "PRAGMA s3AccessRoleArn='arn:aws:iam::123456789012:role/bdS3';",
"requestId": "reqId1"
}
Special function to list S3 bucket(s)
{
"messageType": "SQL_QUERY",
"sql": "SELECT * FROM list('s3://');",
"requestId": "reqId2"
}
Special function (non-recursive) to list S3 bucket contents when bucket is defined.
{
"messageType": "SQL_QUERY",
"sql": "SELECT * FROM list('s3://myBucket/');",
"requestId": "reqId3"
}
parquet_scan function to access Parquet files on S3.
{
"messageType": "SQL_QUERY",
"sql": "SELECT COUNT(*) FROM parquet_scan('s3://boilingdata-demo/demo2.parquet');",
"requestId": "reqId4",
"readCache": "NONE",
"tags": [
{
"name": "CostCenter",
"value": "930"
},
{
"name": "ProjectId",
"value": "Top secret Area 53"
}
]
}
Include the additional key "keys", which is a list of S3 URLs including the s3:// prefix and full key path.
{
"messageType": "SQL_QUERY",
"sql": "SELECT * FROM duckdb('s3://KEY','referers') LIMIT 1;",
"keys": [
"s3://boilingdata-demo/referers.1.duckdb.zst",
"s3://boilingdata-demo/referers.49.duckdb.zst"
],
"requestId": "reqId5"
}
Include the additional key "keys", which is a list of Glue Tables with the "glue." prefix.
{
"messageType": "SQL_QUERY",
"sql": "SELECT * FROM parquet_scan('s3://KEY','referers') LIMIT 1;",
"keys": [
"glue.default.nyctaxis1",
"glue.default.nyctaxis2"
],
"requestId": "reqId6"
}
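For reference, the shape of the SQL_QUERY message as inferred from the examples above can be typed roughly as follows; the optionality of the fields is our interpretation, not an authoritative schema.

// Approximate TypeScript shape of the SQL_QUERY message, inferred from the examples above.
interface SqlQueryMessage {
  messageType: "SQL_QUERY";
  sql: string;
  requestId: string;
  readCache?: string;                       // e.g. "NONE"
  tags?: { name: string; value: string }[]; // optional tagging metadata
  keys?: string[];                          // S3 URLs or "glue."-prefixed Glue Tables
}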
Query response
Query responses, at least one for each requestId.
Accepts the following message:
These messages carry the actual response data for the query. Responses may come in multiple batches and sub batches to match with the distributed streaming design.
{
"messageType": "DATA",
"requestId": "test1",
"batchSerial": 1,
"totalBatches": 1024,
"subBatchSerial": 3,
"totalSubBatches": 16,
"data": [
{
"VendorID": 1,
"tpep_pickup_datetime": 1556669690000,
"tpep_dropoff_datetime": 1556669808000,
"passenger_count": 1,
"trip_distance": 0,
"RatecodeID": 1,
"store_and_fwd_flag": "N",
"PULocationID": 145,
"DOLocationID": 145,
"payment_type": 2,
"fare_amount": 3,
"extra": 0.5,
"mta_tax": 0.5,
"tip_amount": 0,
"tolls_amount": 0,
"improvement_surcharge": 0.3,
"total_amount": 4.3,
"congestion_surcharge": 0
},
{
"VendorID": 1,
"tpep_pickup_datetime": 1556670954000,
"tpep_dropoff_datetime": 1556671047000,
"passenger_count": 1,
"trip_distance": 1.5,
"RatecodeID": 1,
"store_and_fwd_flag": "N",
"PULocationID": 145,
"DOLocationID": 145,
"payment_type": 2,
"fare_amount": 3,
"extra": 0.5,
"mta_tax": 0.5,
"tip_amount": 0,
"tolls_amount": 0,
"improvement_surcharge": 0.3,
"total_amount": 4.3,
"congestion_surcharge": 0
}
]
}
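One way to decide when a query has finished is to track which (batchSerial, subBatchSerial) pairs have arrived and compare against totalBatches and totalSubBatches, discarding duplicate batches. A sketch, assuming the totals stay constant across the batches of one request:

// Sketch: collect DATA batches per requestId and detect completion.
// Duplicate batches (same batchSerial/subBatchSerial) are counted only once.
interface DataMessage {
  messageType: "DATA";
  requestId: string;
  batchSerial: number;
  totalBatches: number;
  subBatchSerial: number;
  totalSubBatches: number;
  data: Record<string, unknown>[];
}

const seenBatches = new Map<string, Set<string>>();              // requestId -> received batch keys
const collectedRows = new Map<string, Record<string, unknown>[]>();

export function onData(msg: DataMessage): boolean {
  const key = `${msg.batchSerial}/${msg.subBatchSerial}`;
  const batches = seenBatches.get(msg.requestId) ?? new Set<string>();
  if (batches.has(key)) return false;                            // identical batch received again, ignore
  batches.add(key);
  seenBatches.set(msg.requestId, batches);
  collectedRows.set(msg.requestId, [...(collectedRows.get(msg.requestId) ?? []), ...msg.data]);
  // Finished when every batch/sub-batch combination has been seen.
  return batches.size >= msg.totalBatches * msg.totalSubBatches;
}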
Query processing information events
You get query processing information events from the service.
Accepts the following message:
Information about query progress and processing
{
"messageType": "INFO",
"requestId": "test1",
"info": [
{
"name": "PROCESSING_SQL",
"value": "SELECT 1;"
}
]
}
{
"messageType": "INFO",
"requestId": "test1",
"info": [
{
"name": "DATA_LOAD_MS",
"value": "71"
}
]
}
{
"messageType": "INFO",
"requestId": "test1",
"info": [
{
"name": "QUERY_TIME_MS",
"value": "31"
}
]
}
{
"messageType": "INFO",
"requestId": "test1",
"info": [
{
"name": "LAMBDA_TOTAL_EXECUTION_TIME_ESTIMATE_MS",
"value": "97"
}
]
}
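You could, for example, collect these name/value pairs per requestId to report end-to-end timings. A sketch based on the event names shown above:

// Sketch: accumulate INFO events (e.g. DATA_LOAD_MS, QUERY_TIME_MS) per requestId.
interface InfoMessage {
  messageType: "INFO";
  requestId: string;
  info: { name: string; value: string }[];
}

const queryStats = new Map<string, Record<string, string>>();

export function onInfo(msg: InfoMessage): void {
  const entry = queryStats.get(msg.requestId) ?? {};
  for (const { name, value } of msg.info) entry[name] = value;
  queryStats.set(msg.requestId, entry);
  // e.g. queryStats.get("test1") -> { PROCESSING_SQL: "SELECT 1;", QUERY_TIME_MS: "31", ... }
}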
Information events about Lambda Assured Warm Concurrency
With these events you can follow what happens with the Lambda containers behind the scenes. These events reflect the state of the warm Lambdas, how many there are and their lifecycles.
Accepts the following message:
Information about Lambda containers, their lifecycles, and related hot datasets.
{
"messageType": "LAMBDA_EVENT",
"lambdaEvent": {
"instanceId": "1638791651869__808",
"username": "aac5c1d9-a0a9-4855-b896-0f3998b2f16b",
"dataset": "aac5c1d9-a0a9-4855-b896-0f3998b2f16b__s3://myBucket/demo.parquet",
"status": "warm"
}
}
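If you want to visualize the warm pool, a simple tracker keyed by instanceId is enough. A sketch based on the event shape above:

// Sketch: keep the latest status per Lambda instance, keyed by instanceId.
interface LambdaEventMessage {
  messageType: "LAMBDA_EVENT";
  lambdaEvent: { instanceId: string; username: string; dataset: string; status: string };
}

const warmLambdas = new Map<string, LambdaEventMessage["lambdaEvent"]>();

export function onLambdaEvent(msg: LambdaEventMessage): void {
  warmLambdas.set(msg.lambdaEvent.instanceId, msg.lambdaEvent);
  console.log(`warm lambdas: ${warmLambdas.size}`);
}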
General logging messages
Logging messages with varying levels.
Accepts the following message:
General logging information. May also be unrelated to query.
{
"messageType": "LOG_MESSAGE",
"logLevel": "ERROR",
"logMessage": "string",
"requestId": "string"
}
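Putting it together, a single message handler can dispatch on messageType. A sketch that reuses the handlers sketched above (onData, onInfo, onLambdaEvent):

// Sketch: dispatch incoming WebSocket messages on messageType.
// onData / onInfo / onLambdaEvent refer to the sketches earlier in this document.
export function onMessage(raw: string): void {
  const msg = JSON.parse(raw);
  switch (msg.messageType) {
    case "DATA":
      if (onData(msg)) console.log(`query ${msg.requestId} finished`);
      break;
    case "INFO":
      onInfo(msg);
      break;
    case "LAMBDA_EVENT":
      onLambdaEvent(msg);
      break;
    case "LOG_MESSAGE":
      console.log(`[${msg.logLevel}] ${msg.logMessage} (${msg.requestId})`);
      break;
    default:
      console.warn("unknown messageType", msg.messageType);
  }
}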