In October 2020, Cloudera made a strategic acquisition of a company called Eventador. The acquisition was primarily intended to enhance the streaming capabilities of Cloudera DataFlow. Eventador excelled at simplifying the process of building streaming applications. Its flagship product, SQL Stream Builder, made it easy to access real-time data streams using only SQL (Structured Query Language). Cloudera's customers were trying to solve the same challenge: querying a large number of real-time data streams with a simple method such as SQL.
Today, within 5 months of acquiring Eventador, Cloudera is very happy to announce that SQL Stream Builder has been relaunched as Cloudera SQL Stream Builder. This became possible once it was fully integrated with the Shared Data Experience (SDX) of the Cloudera Data Platform (CDP). With this integration, SQL Stream Builder benefits from the same unified security and governance as the rest of the platform.
What is SQL Stream Builder?
Cloudera SQL Stream Builder now augments the powerful stream processing capabilities of the Cloudera DataFlow (CDF) streaming platform. It provides a slick user interface for writing SQL queries against real-time data streams in Apache Kafka or Apache Flink. This enables developers, data analysts, and data scientists to write streaming applications using only SQL. They no longer need to rely on skilled Java or Scala developers to write custom programs to access these data streams.
SQL Stream Builder runs SQL continuously through Flink. Through a simple, visual user interface, it provides syntax checking, error reporting, schema detection, query creation, result sampling, and output creation. It also provides an advanced materialized view engine, which lets other applications access real-time aggregated data sets through a simple REST API.
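To make the idea of a materialized view more concrete, here is a minimal conceptual sketch in Python. It is not SQL Stream Builder's implementation or API. The class `MaterializedView`, its methods, and the plant readings are all hypothetical names invented for illustration. The sketch only shows the general pattern: an aggregate is kept up to date as each message arrives, so a reader can fetch the current result at any time without rescanning the stream.

```python
import json


class MaterializedView:
    """Hypothetical illustration: a running aggregate per key, updated per message."""

    def __init__(self):
        self._rows = {}

    def apply(self, key, value):
        # Update the stored aggregate in place as each message arrives.
        row = self._rows.setdefault(key, {"count": 0, "total": 0.0})
        row["count"] += 1
        row["total"] += value

    def snapshot_json(self):
        # What a REST endpoint could return: the current state, serialized as JSON.
        return json.dumps(self._rows, sort_keys=True)


view = MaterializedView()
# Made-up messages: (plant name, reading)
for plant, reading in [("plant_a", 2.0), ("plant_b", 5.0), ("plant_a", 4.0)]:
    view.apply(plant, reading)

print(view.snapshot_json())
```

In a real deployment the aggregate would be maintained by the streaming engine and served over HTTP. Here the snapshot is simply printed.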
Data has a shelf life. In today's business environment, the data you receive must be processed immediately so that you can understand its business impact and take action. If you can ingest all the data in real time but cannot make sense of what it means to you, then streaming analytics solutions bring little benefit. Imagine a scenario: a manufacturer receives a data stream with millions of messages every day from a dozen or more manufacturing plants. If they need to know where a specific surge in a stream comes from, or need to detect a specific anomaly in the stream, they should be able to query the stream in real time. Sending everything to storage and analyzing it the next day to find actionable insights does not work, because by then the data has lost its value. The ability to perform such real-time queries usually rests with a few people in an organization who have specialized skills such as Scala or Java and can write code to get answers. This is not a scalable model.
SQL is a universal language
Over the past three decades, SQL has become the accepted method for querying across many database systems. It is also one of the most widely held skills among users of key enterprise data tools. Since data analysts and data scientists struggle to gain easy access to real-time data streams, SQL is a natural choice for this task. However, there is a key challenge. Unlike database tables, which have a fixed number of rows at any given point in time, streams are unbounded. They are continuous in nature, have no end, and do not necessarily arrive in order. Some messages may also arrive late or out of order, which makes it challenging to query a data stream with SQL.
The data stream must therefore be processed in small time slices called "windows", for example 5 seconds long. Each message on the stream also carries a timestamp, which can be used to determine the order in which messages should be processed. So SQL is used as the base, and additional keywords are added to process the stream in the context of time windows. This is how Streaming SQL, or Continuous SQL, was born. Its syntax and functions are similar to regular SQL, but it has additional constructs for grouping streams within a specific time frame. It also supports a range of aggregation functions to compute values over the stream, such as averages, sums, and counts. These capabilities allow data analysts and data scientists to start querying data streams with SQL right away. This is what we call the democratization of real-time data within the organization.
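The windowing idea can be sketched in a few lines of Python. This is a simplified batch illustration, not how Flink or SQL Stream Builder is implemented. The function name `tumbling_window_stats` and the sample readings are assumptions made up for this example. It assigns each message to a fixed 5-second window by its own timestamp rather than by arrival order, then computes count, sum, and average per window.

```python
from collections import defaultdict

WINDOW_SECONDS = 5  # window size taken from the example in the text


def tumbling_window_stats(messages, window_seconds=WINDOW_SECONDS):
    """Group (timestamp, value) messages into fixed, non-overlapping windows.

    Returns {window_start: {"count": n, "sum": s, "avg": a}} ordered by window start.
    """
    buckets = defaultdict(list)
    for timestamp, value in messages:
        # Use the message's own timestamp, so out-of-order arrivals
        # still land in the correct window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[window_start].append(value)
    return {
        start: {"count": len(vals), "sum": sum(vals), "avg": sum(vals) / len(vals)}
        for start, vals in sorted(buckets.items())
    }


# Hypothetical readings: (event time in seconds, value), deliberately out of order.
readings = [(1, 10.0), (7, 30.0), (3, 20.0), (12, 5.0), (9, 50.0)]
stats = tumbling_window_stats(readings)
for start, agg in stats.items():
    print(start, agg)
```

A real streaming engine cannot wait for the stream to end, since it never does. It emits each window's result continuously and needs a policy for deciding when a window is complete and how to treat late messages. This sketch sidesteps that by working on a finite list.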
Figure: SQL Stream Builder brings the simplicity of SQL to unlocking the value of real-time streaming data
Why are users excited about SQL Stream Builder?
Open up access to real-time data for all user roles -- Data analysts and data scientists can use SQL Stream Builder to run ad hoc queries by themselves.
Simplify the process of building streaming applications -- SQL Stream Builder provides an interactive user interface that supports streaming SQL. This allows users to run continuous queries on the data stream within a specific time window. You can also join multiple data streams and perform aggregations.
Expose aggregated data streams to other applications -- SQL Stream Builder allows the creation of materialized views, which can be easily exposed to other applications through the REST API. This in turn unlocks the value held in real-time data streams for more applications throughout the enterprise.
Speed up queries with minimal impact on the core system -- The real power of SQL Stream Builder lies in its underlying engine, which executes these queries very fast without burdening the core system that stores the data streams, for example the Kafka brokers that hold the streams in their topics.