Scalable on-the-fly analytics from S3

Virtual Data Warehouses On-Demand, Automated

High-performance, multi-user analytics in milliseconds from the cold data in your S3 buckets, with no pre-caching, no cluster management, and no problems (white paper).

Managed, distributed DuckDB on AWS Lambda functions for highly cost-efficient, interactive speed.

Python client, usable e.g. in notebooks. Queries run seamlessly both locally and remotely.

JS client for interacting with Boiling from Node.js or directly from the browser.

BDCLI for full management of your Boiling account and for integration with IaC and automation.

BI Proxy for connecting the BI tool of your choice through a Presto-compatible interface.

See also Data Taps, a BoilingData product.

(Experimental) Boiling interface over HTTPFS+Parquet directly from DuckDB, with a BDCLI-generated TABLE MACRO (sketch below).
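As a rough sketch of the idea only — the macro name, endpoint URL, and parameterisation below are illustrative, not actual BDCLI output; BDCLI generates the real macro with your account-specific endpoint and authentication:

    -- Hypothetical shape of a BDCLI-generated DuckDB table macro.
    INSTALL httpfs; LOAD httpfs;
    CREATE OR REPLACE MACRO boiling(sql_text) AS TABLE
      SELECT * FROM read_parquet(
        'https://httpfs.example.boilingdata.com/?sql=' || sql_text
      );
    -- Usage: SELECT * FROM boiling('SELECT * FROM my_dataset LIMIT 10');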


Deployed regions: eu-west-1, eu-north-1, us-west-2, us-east-2
(your AWS S3 Buckets need to be in these regions).

Analytics For SaaS Dashboarding

Run aggregation queries to create analytics dashboards instantly using the full power of Lambda's 18,000 CPUs.

BoilingData supports all standard SQL filtering syntax and aggregation functions, and can join multiple datasets to create high-level insights.
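For instance, a dashboard aggregation joining two data sets could look like this (bucket, paths, and columns are illustrative, not real data sets):

    -- Illustrative only: substitute your own buckets, paths, and columns.
    SELECT t.plan,
           date_trunc('day', e.ts) AS day,
           count(*)                AS events,
           avg(e.duration_ms)      AS avg_duration_ms
    FROM parquet_scan('s3://your-bucket/events/*.parquet') e
    JOIN parquet_scan('s3://your-bucket/tenants.parquet')  t
      ON t.tenant_id = e.tenant_id
    WHERE e.ts >= DATE '2023-01-01'
    GROUP BY 1, 2
    ORDER BY day;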

On The Fly Querying

Files are warmed the moment they are queried, eliminating the need to preemptively manage warming lifecycles.

Each query is executed in a dedicated environment, meaning that you can scale up to hundreds of queries per second without pre-caching queries or predicting what information users will look at.

From Cold S3 Data

Data is warmed and queried in-place in the same AWS region as your data, meaning lightning-fast responsiveness with no egress costs.

Queries can run on a single file, in parallel across multiple files (e.g. generating time-series trends), or concurrently to create roll-up aggregations from multiple files.
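In DuckDB SQL terms, the difference is just a single file path versus a glob over many files (paths here are illustrative):

    -- A single file:
    SELECT count(*)
    FROM parquet_scan('s3://your-bucket/data/day=2023-06-01/part-0.parquet');

    -- Many files in parallel, e.g. a time-series roll-up over a month:
    SELECT date_trunc('hour', ts) AS hour, sum(bytes) AS total_bytes
    FROM parquet_scan('s3://your-bucket/data/day=2023-06-*/*.parquet')
    GROUP BY 1
    ORDER BY 1;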

BoilingData over your S3 Data Lake

Using BoilingData, you know exactly the caching status of your data sets and always get predictably high query performance.

Your AWS IAM Role, your control

Hosted DuckDB* - a high-performance embedded OLAP SQL engine

Rapidly warmed data and lifecycle management

Fully managed service and per-millisecond billing



Instant Computing Power For Realtime Dashboarding

Our innovative routing layer coordinates Lambda functions globally to achieve realtime aggregations with blazing-fast filtering.

State-of-the-art DuckDB* OLAP

Open source DuckDB* is disrupting the analytics data processing world: an embedded SQL engine written in C++ with state-of-the-art algorithms. We run a fully hosted DuckDB service on AWS Lambda for high performance and scalability. Depending on the source data size and query requirements, we slice and dice the data files and distribute the queries over tens of Lambda functions, optimised for both the network and the Lambda service.
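Conceptually, this distribution works like classic partial aggregation followed by a merge step; a simplified SQL sketch (names are illustrative, not our internal query plan):

    -- Step 1: each Lambda worker aggregates its own slice of the files, e.g.
    --   SELECT tenant_id, count(*) AS cnt, sum(amount) AS total
    --   FROM parquet_scan('s3://your-bucket/part-07.parquet')
    --   GROUP BY tenant_id;
    -- Step 2: the workers' partial results are merged into the final answer:
    SELECT tenant_id, sum(cnt) AS cnt, sum(total) AS total
    FROM partial_results              -- union of all worker outputs
    GROUP BY tenant_id;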

Boiling Lifecycle Management

We optimise data availability for a typical dashboard session, up to a configurable time (the free test app uses 5 minutes). You gain the full instant computing power of Lambda (18,000 CPUs) for running queries, while hot in-memory data is maintained between queries for instant subsequent access. You only pay for the milliseconds your queries are actually running.

...

Pay-per-Millisecond Pricing

Fast, scalable analytics over cold data

Starter
Free

Get started creating analytics from your S3 data right now!

FREE TIER
1h total query exec. time / month

500 MB max file size with 5 min in-memory caching.

Lightweight (non-)distributed SQL query execution.

High performance in-memory SQL database from Parquet.

Try Now

Premium

$ 29* monthly

STARTER plus:

10GB max single query working set size.

Distributed SQL queries with tens or hundreds of Lambda functions lifting data from S3 at 5-6 GB/s.

Support for querying AWS Glue (Hive) tables and S3 folders.

BI tool integration through a Presto-compatible interface.

*) Pay-as-you-go pricing above the free tier: $0.000001 per worker execution millisecond.
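As an illustration of the rate: a query that fans out to 20 workers, each running for 1,500 ms, consumes 20 × 1,500 = 30,000 worker-milliseconds, i.e. $0.03 on top of the free tier.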

Try Now

Boiling Ponds!

$ 129* monthly

PREMIUM plus:

Optimise your S3 prefix data! Boiling imports and optimises the data onto a dedicated EC2 instance running DuckDB*.

Fully managed: Boiling routes queries for the optimised S3 prefix to the EC2 instance and starts/stops it automatically.

Up to 6 TB compressed (30 TB uncompressed) Parquet data as the single-query working set size!

*) Plus pay-as-you-go pricing for the EC2 instance.

Try Now

What are some use cases for BoilingData?

To serve any number of customer data sets, fast and cost-efficiently, with arbitrary dynamic queries (e.g. filtering) directly from your S3 Data Lake. That is, your customers access your application and work with their data simultaneously, all of them getting the same dedicated, single-tenant query power.

To serve any number of customer data sets and queries with caching: never run the same query twice, yet serve the results immediately, regardless of how complex and time-consuming the original query might have been.

To move from ETL to EL(T) with live queries, while also enabling very efficient pre-generation of reports/analytics (cached responses) for fast interaction with heavy Analytics Dashboards (e.g. trend data, pivot tables), including dynamic filtering and paged row-level access to the underlying data. Get your data hot with BoilingData and run hundreds of queries so the results are immediately ready when your customers open their dashboards.



Can I use BoilingData on my existing S3 Data Lake?

Absolutely. You create an IAM Role (with the help of BDCLI) in your AWS account that has access only to the S3 bucket(s) and prefix(es) you want to use with BoilingData. Then you set the ARN of the role on your BoilingData user account (with an SQL PRAGMA command; sketch below). BoilingData assumes that IAM Role to access your data. You have full control of the IAM Role and your data.
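As a sketch only — the exact PRAGMA name comes from the BoilingData docs and BDCLI; the name and ARN below are illustrative:

    -- Hypothetical command and role ARN, for illustration only:
    PRAGMA boilingdata_set_iam_role('arn:aws:iam::123456789012:role/BoilingDataRead');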


Can I query data shared to me, or can I share data for others?

Absolutely. You don't even need your own AWS account to start if you only consume data from others. You can consume data shared to you with row-, column-, and time-level secure access (an SQL-based view with a cron vending schedule and token lifetime).

We use de facto standard security token vending: we exchange your login (AWS Cognito) token for BoilingData STS tokens that are either based on the IAM Role you configured in your AWS account (your data) or shared by others (their data). You can list data sets shared to you, and you can also share data sets with others, with an accompanying SQL clause to achieve row- and column-based security (like segmenting data for tenants).

If you like, you can also attach a cron schedule and specify a token lifetime to control when data access can happen. And since this is built on top of AWS IAM, you're in full control at the AWS IAM level as well: you set the boundaries with the IAM Role, and for each share you use Boiling-level SQL to bind the share to the query results only. Data Warehouse level security with an additional time dimension.
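As an illustrative sketch of the SQL side of a share — every identifier below is made up, and the cron schedule and token lifetime are configured separately when the share is created:

    -- Row- and column-level security expressed as an SQL view:
    CREATE VIEW tenant_42_events AS
    SELECT ts, event_type, duration_ms                     -- column restriction
    FROM parquet_scan('s3://your-bucket/events/*.parquet')
    WHERE tenant_id = 42;                                   -- row restriction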


Why should I use BoilingData?

To get in-memory and compute caching for all the data in your Data Lake, without running any clusters or worrying about scalability. You get visible and predictable query performance and data warm-up times. Warm-up is the time it takes BoilingData to read the data into memory, after which the data is hot and queries are blazing fast.


Can I run BoilingData on-premises or in my AWS account?

For enterprises, the BoilingData data plane can run inside a dedicated AWS account, but the control plane cannot. This means all data and query results stay inside the dedicated AWS account, while BoilingData infrastructure plans queries and orchestrates the Lambda events. Contact us for more information, as data/control plane separation is a separate deployment scenario and requires a support contract.


Is BoilingData secure?

Yes, we take security seriously. Each and every query runs in a warm, dedicated Lambda function where your data is already hot; no other user gets access to that Lambda function. We apply RBAC and least-privilege principles in our deployments. We use de facto standard JWT token-based security, where each request carries a token inside an encrypted channel; for every request, we verify the token and the claims it carries. Each Boiling Pond EC2 instance likewise runs queries for a single customer only.



What makes BoilingData different?

You run queries over your Data Lake without any clusters. No data imports needed: just SQL over your S3 Data Lake with Data Warehouse level security. You also have the option to auto-import and optimise an S3 prefix with Boiling Ponds, in which case the data is copied from the S3 prefix onto an EC2 instance running DuckDB. For optimised S3 prefixes, Boiling routes queries to that instance instead of distributed AWS Lambda functions; queries with JOINs and hard-to-distribute aggregations run faster on a dedicated EC2 instance.

Our platform monitors the lifecycle of AWS Lambda containers and implements data-based query routing. We ensure that the required number of concurrent warm Lambda containers is running for your dataset while it is in use.

We support streaming realtime analytics and data ingestion with Data Taps. See the live demo of embedded analytics on the Data Taps front page.

We implement a practical Data Mesh architecture (secure multi-user, concurrent data access and sharing) globally: queries run where the data resides. See the supported AWS regions at the top of this page.

BoilingData's innovative distributed query processing layer is evolving towards a Data Fabric, meaning users can connect data to the Boiling Data Fabric even outside of an AWS S3 Data Lake. Users will be able to connect their any-cloud or on-premises data to the Boiling Fabric, and even their laptops, browsers, and mobile devices if they like. Overall, this improves security, builds on the Data Mesh architecture, and avoids long data pipelines with their high latency, assorted overheads, and single points of failure.

The Boiling Data Fabric will support periodic and event-based queries from data sources to online consuming users, such as operational dashboards. This is practical, genuinely low-latency data streaming, like database replication at the per-query level.

BoilingData reinvents user-facing analytics with a Data Mesh and Data Fabric architecture and a global multi-user, concurrent data processing layer. Data producers join Boiling to easily share their data with current and potential new customers without having to deploy or pay for any compute (consumers pay, not producers); data contracts and data sharing happen with the highest security and with a single, simple, understandable command. Data consumers do not need tens of different data ingestion plugins, tools, and services to get analytical access to their own data.

Boiling plays well with data and aligns with data gravity! Today, many companies physically bring data into a single place through tens if not hundreds of error-prone pipelines, where data quality is an open question, nobody understands the whole end-to-end data journey/lineage, and cross-team collaboration is a big organisational challenge, and not only from a data-competence perspective. Boiling aligns with the promise of Data Mesh modularity, where data products can be published and consumed.