Get Started Free
‹ Back to courses
course: ksqlDB 101

Introduction to ksqlDB

2 min
Allison

Allison Walther

Integration Architect (Presenter)

Robin Moffatt

Robin Moffatt

Principal Developer Advocate (Author)

ksqlDB Introduction

ksqlDB allows you to build stream processing applications on top of Apache Kafka with the ease of building traditional applications on a relational database. Using SQL to describe what you want to do rather than how, it makes it easy to build Kafka-native applications for processing streams of real-time data. Some key ksqlDB use cases include:

  • Materialized caches
  • Streaming ETL pipelines
  • Event-driven microservices

ksqlDB is designed from the principle of simplicity. While many streaming architectures require a grab-bag of components pieced together from many projects, ksqlDB provides a single platform on which you can build streaming ETL and streaming applications, and all with just a single dependency—Apache Kafka.

KSQLDB

Stream Processing

Regardless of how we choose to store it later, much of the data that we work with in our systems begins life as part of an unbounded event stream. Consider some common examples of event streams:

  • Customer interactions with a shopping cart on an e-commerce site
  • Network packets on a router
  • IoT device sensor readings
  • Manufacturing telemetry

We want to use this data for various things. Businesses drive processes in reaction to events, and they also want to analyze that data retrospectively.

Stream Processing in Action

Consider the example of a manufacturing plant, and the kind of data stream emitted by a production line:

{
    "reading_ts": "2021-06-24T09:30:00-05:00",
    "sensor_id": "aa-101",
    "production_line": "w01",
    "widget_type": "acme94",
    "temp_celcius": 23,
    "widget_weight_g": 100
}

Using this data, we may want to do multiple things:
  • Alert if the line produces an item that is over a threshold weight
  • Monitor the rate of item production
  • Detect anomalies in the equipment
  • Store the data for analytics dashboards and ad hoc querying

Stream processing is a method of processing events in an event stream, as they are happening. It is distinct from batch processing, in which events are processed at some delayed point in time, after they are created. With stream processing, we can react to events happening in the real world when they occur, rather than afterwards.

In the example of a production line, it is more desirable to alert as soon as the equipment shows abnormal temperatures. If the monitoring system waited to batch process the readings, the equipment may already be damaged or the production process impacted.

With ksqlDB, we can apply this concept of stream processing using SQL. The ability to use SQL, unlike other options that require complex coding to perform even simple tasks, makes stream processing accessible to many more developers and analysts.

  • Alert if the line produces an item that is over a threshold weight

       SELECT *
       FROM WIDGETS
       WHERE WEIGHT_G > 120
  • Monitor the rate of item production

       SELECT COUNT(*)
       FROM WIDGETS
       WINDOW TUMBLING (SIZE 1 HOUR)
       GROUP BY PRODUCTION_LINE
  • Detect anomalies in the equipment

       SELECT AVG(TEMP_CELCIUS) AS TEMP
       FROM WIDGETS
       WINDOW TUMBLING (SIZE 5 MINUTES)
       GROUP BY SENSOR_ID
       HAVING TEMP>20
  • Store the data for analytics dashboards and ad-hoc querying

       CREATE SINK CONNECTOR dw WITH (
       connector.class = S3Connector,
       topics = widgets
       []
       );

How does ksqlDB work?

ksqlDB separates its distributed compute layer from its distributed storage layer, for which it uses Apache Kafka.

ksqldb-kafka

ksqlDB allows us to read, filter, transform, or otherwise process streams and tables of events, which are backed by Kafka topics. We can also join streams and/or tables to meet the needs of our application. And we can do all of this using familiar SQL syntax.

ksqldb-computer

ksqlDB can also build stateful aggregations on event streams. How many orders have been placed in the last hour? Errors in the last five minutes? Current balance on an account? These aggregates are held in a state store within ksqlDB, and external applications can query the store directly.

Errata

  • The Confluent Cloud signup process illustrated in this video includes a step to enter payment details. This requirement has been eliminated since the video was recorded. You can now sign up without entering any payment information.

Use the promo code KSQLDB101 to get $25 of free Confluent Cloud usage

Be the first to get updates and new content

We will only share developer content and updates, including notifications when new content is added. We will never send you sales emails. 🙂 By subscribing, you understand we will process your personal information in accordance with our Privacy Statement.

Introduction to ksqlDB

Hi, I'm Allison Walther with Confluent. Welcome to the ksqlDB course. ksqlDB is a database purpose built for building real-time applications that leverage stream processing. Its architecture separates its distributed compute layer from its distributed storage layer, for which it uses and tightly integrates with Apache Kafka. Using a lightweight SQL syntax ksqlDB provides everything that a developer needs to quickly and efficiently create a complete real-time application, enabling you to unlock real-time business insights, and rich customer experiences. Let's discuss a bit more about what ksqlDB offers. Let's say you have a stream or Kafka topic that captures when a widget is manufactured by your company. And each data event in the stream also captures the color of the widget created. Now, perhaps you want a separate stream that filters only events that are built for blue widgets. ksqlDB would allow you to do that in real time. Perhaps you also want to merge two topics together, which is common for many stream processing use cases. ksqlDB can easily join topics to one another. Not all that dissimilar to how you could join two tables in a relational database. In this example, we can join that original topic to another topic that focuses on green and yellow widgets, and then filter out the messages that we care less about. ksqlDB can be used to aggregate streams into tables and capture summary statistics over a window of time within the stream. These tables can then be subsequently joined with other tables and streams. In this example, we're using stream processing to get a count of the number of widgets by color type within the topic. Again, these examples are very elementary and stream processing is of course capable of much more advanced logic, but this provides you with a high level idea of how ksqlDB can help you build applications that action on new events happening within your business immediately. Let's take a look at what this might look like in action. With ksqlDB, developers can use a lightweight SQL syntax to build a complete real-time application. This includes the ability to enrich input streams, to derive new event streams and creating aggregations of event streams that are continuously updated as new events occur. And that can serve both push and pull queries to applications. We will dive more into the various constructs that make up ksqlDB in a bit. By bringing the same feel of traditional application development on a relational database, ksqlDB opens up stream processing and real time application development up to many more developers within an organization than if only Kafka streams was used. Here's a look at the ksqlDB UI in Confluent Cloud. We have a graphical editor that allows us to quickly try out new queries or to create streams and tables. There are also pages for viewing details about streams, tables and the persistent queries that make them up. One very powerful feature is the flow page. With the flow page, we get an at-a-glance view of the components of our application and how they are interacting. And if you're building these event streaming applications, it's likely that you're going to want to work with data that exists in a variety of other data stores. Confluent provides 100 plus pre-built connectors to easily start moving your data in and out of Kafka, but it still requires deploying a separate Kafka connect cluster, which can take time. That's why ksqlDB supports running connectors directly on its servers, rather than needing to run a separate Kafka connect cluster for capturing events, ksqlDB can run pre-built connectors in embedded mode. Best of all, you can do this entirely with SQL syntax, making it much easier to work with data throughout your organization. Let's get started, but first here's a promo code for $101 of free Confluent Cloud usage. Head over there and get signed up. Then meet me back here to continue onward. Now let's get you signed up on Confluent Cloud. You'll land on the initial page and enter things like your name and your email. Click Agree, and the Email Me checkboxes. And now you can click the Submit button. You should check your email for confirmation. It should pop up within about a second. We'll be prompted next about what type of cluster we want. Let's use standard for now. Then we'll be asked about what cloud provider we would like to launch our cluster in. For this course let's use Google Cloud US West 4 in a single AZ. Now we'll navigate our way to the credit card page and fill in your own information. You can go ahead and expand the promo code field because we do have a promo code for you today. It's 101KSQLDB, Review your information that you've put in and go ahead and click Launch and look, a new cluster has appeared.