Tim Berglund

VP Developer Relations

ksqlDB

Kafka Streams works very well as a Java-based stream processing API, both to build scalable, standalone stream processing applications and to enrich Java applications with stream processing functionality that complements their other functions. But what if you don’t have an existing commitment to Java? Or what if you find it advantageous from an architectural or operational perspective to deploy a pure stream processing job without its own web interface or API to expose results to the front end? This is where ksqlDB comes in.

ksqlDB is a highly specialized kind of database that is optimized for stream processing applications. It runs on a scalable, fault-tolerant cluster of its own, exposing a REST interface to applications, which can then submit new stream processing jobs to run and query the results. The language in which those stream processing jobs and queries are defined is SQL. With REST and command-line interface options, it doesn’t matter what language you use to build your applications. And it’s easy to get started in development mode, either running in Docker or running natively on a single node on a development machine.
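To make the REST interface a little more concrete, here is a minimal Python sketch of how an application might package a SQL statement for a ksqlDB server. It only builds the HTTP request and does not send it. The server address (a local development server on port 8088), the `/ksql` endpoint, the JSON field names, and the content type are assumptions based on ksqlDB's documented REST API and should be checked against the version you run; `build_statement_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Assumption: a local development server listening on port 8088.
KSQLDB_URL = "http://localhost:8088"


def build_statement_request(statement: str, base_url: str = KSQLDB_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying one SQL statement."""
    # Assumed body shape: the statement text plus optional property overrides.
    body = json.dumps({"ksql": statement, "streamsProperties": {}}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url}/ksql",
        data=body,
        headers={
            "Content-Type": "application/vnd.ksql.v1+json; charset=utf-8",
            "Accept": "application/vnd.ksql.v1+json",
        },
        method="POST",
    )


request = build_statement_request("SHOW STREAMS;")

# To submit it to a running server, something like this should work:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```

Because the exchange is plain HTTP and JSON, the same request could be produced from any language with an HTTP client, which is the point of exposing ksqlDB this way.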

Here’s some example ksqlDB code that does substantially the same thing as the Kafka Streams code we looked at previously:

CREATE TABLE rated_movies AS
   SELECT  title,
           release_year,
           sum(rating) / count(rating) AS avg_rating
   FROM ratings
   INNER JOIN movies ON ratings.movie_id = movies.movie_id
   GROUP BY title,
            release_year;

This query would result in a table whose key would be the composite of movie title and release year, and the value would be the average rating for the movie—and ksqlDB would provide query access to that table over its REST API. ksqlDB also provides an integration with Kafka Connect, allowing you to connect to external data sources from within the ksqlDB interface, running Connect either embedded in the cluster or in its own standalone cluster.
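As a rough mental model of what that table contains, the following Python sketch performs the same join, grouping, and averaging over a small in-memory data set. The sample movies and ratings are made up for illustration, and `rated_movies` is a hypothetical function name. The important difference is that this code runs once over a fixed list, whereas ksqlDB keeps the table up to date continuously as new rating events arrive.

```python
from collections import defaultdict

# Hypothetical sample data standing in for the movies table and the ratings stream.
movies = {
    1: {"title": "Movie A", "release_year": 2001},
    2: {"title": "Movie B", "release_year": 2015},
}
ratings = [
    {"movie_id": 1, "rating": 8.0},
    {"movie_id": 2, "rating": 6.0},
    {"movie_id": 1, "rating": 9.0},
    {"movie_id": 3, "rating": 5.0},  # no matching movie, so the inner join drops it
]


def rated_movies(ratings, movies):
    """Join ratings to movies, group by (title, release_year), and average the ratings."""
    totals = defaultdict(lambda: [0.0, 0])  # key -> [sum of ratings, count of ratings]
    for event in ratings:
        movie = movies.get(event["movie_id"])
        if movie is None:
            continue  # inner join: skip ratings with no matching movie
        key = (movie["title"], movie["release_year"])
        totals[key][0] += event["rating"]
        totals[key][1] += 1
    # sum(rating) / count(rating), as in the SQL above
    return {key: total / count for key, (total, count) in totals.items()}


table = rated_movies(ratings, movies)
print(table[("Movie A", 2001)])  # 8.5
```

Looking up a value requires both parts of the composite key, title and release year, which mirrors how a lookup against the ksqlDB table would have to supply both.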

Overall, you can think of ksqlDB as a standalone, SQL-powered stream processing engine that performs continuous processing of event streams and exposes the results to applications in a database-like way. It aims to provide one mental model for most Kafka-based stream processing application workloads.

For a more detailed introduction to ksqlDB, check out the ksqlDB 101 course.


ksqlDB

Hey, Tim Berglund with Confluent here to tell you about ksqlDB. Now, you know Kafka Streams works very well as a Java-based stream-processing API. It's kind of built on a functional paradigm. A lot of complex operations are single method calls, and you can use it to build scalable, stand-alone stream-processing applications and to take existing Java applications, usually microservices, and enrich them with stream-processing functionality that complements whatever else it is that they're doing. But what if you don't have an existing commitment to Java? I mean, I know that's some of you, and, believe me, we can still be friends. Or what if you find it advantageous from an architectural or operational perspective to just deploy a pure stream-processing job without its own web interface or API, or it doesn't have anything else to do? It's not a service on its own. It's just getting stuff from a topic, computing, and putting stuff back out in a topic. Sometimes that's what you've got and what you want to do. This is where ksqlDB comes in. Now, we describe ksqlDB as the event-streaming database for Apache Kafka. That's not a standard term yet, so I'm gonna spend some time describing what I mean. It really is a highly specialized kind of database that's optimized for stream-processing applications. It runs on a cluster of its own nodes. So again, it's this thing outside the Kafka cluster. I deploy the ksqlDB process to a server or a cluster of servers that look like clients to Kafka, to those Kafka brokers. So really, you've got now a scalable, fault-tolerant cluster of these ksqlDB servers exposing a REST interface to applications, which can then use that interface to do a few things. They can submit new stream-processing jobs. They look like SQL queries. They're really like these little stream-processing programs written in SQL. And they can query the results of those stream-processing programs, also in SQL.
The language, if you haven't quite caught this yet, in which all of these stream-processing jobs are written and in which those queries are defined is SQL, and since you've got the REST interface and a command-line interface to be able to actually talk to the ksqlDB server, it just doesn't matter what language you use to build your applications. It can be anything, and it's super easy to run in development mode. There's a standard Docker image, or you can download it, build from source, and just run natively on your laptop. It doesn't need to be some big set of infrastructure. It can run comfortably on a local machine. Here's some example ksqlDB code that does substantially the same thing as my example in my lesson on Kafka Streams. You'll notice that Kafka Streams code, if you looked at that, was pretty long. We had to scroll through it. With this, I've been pretty generous with the newlines, and it still doesn't really take up a lot of space, and you can kinda read this. Now, there are some concepts lurking in there that require some explanation, and it's out of scope for me to give you all that explanation right now. This is not a complete ksqlDB tutorial, but you can read this. You probably know SQL, and you can kinda get that into your head and see, oh, okay, we're doing a join from a thing called ratings to a thing called movies, and if you thought about it a little bit, you might think ratings seems like a stream of events and movies seems like a table of entities, but hey, it's a join, and you get that, and you're grouping, and computing an average, and all that stuff's pretty readable. And I said this isn't a complete tutorial, but I can't help myself, just briefly. This query would result in a table whose key would be the composite of movie title and release year. Why those two? Well, 'cause those are my group by parameters.
When you group by, I mean, you're making a table, so it picks the key for you and that's title and release year, and the value would be the average rating for the movie, that avg_rating column that we computed, and ksqlDB would provide query access to that table over its REST API. So if I wanted to know the average rating for "Tree of Life," I can't recall, it was 2008, maybe, the release year? I'd have to know that. That's a part of the key. I could send that query to the ksqlDB server and very quickly get a result back, and if I don't wanna fuss with the API, and I am writing code in Java, there is a very nice Java library for this that provides a Java-idiomatic wrapper for all those functions as well. Additionally, ksqlDB provides an integration with Kafka Connect, allowing you to connect to external data sources from within the ksqlDB interface, so I can actually create a connector with a statement called, mysteriously, create connector, which runs Kafka Connect either embedded inside the ksqlDB cluster, or if I already have a stand-alone Kafka Connect cluster, I can tell ksqlDB to go use that one, and I get to configure Kafka Connect from within the SQL environment. It's pretty cool. Overall, you can think of ksqlDB as a stand-alone, SQL-powered stream-processing engine that performs continuous processing of event streams and exposes the results to applications in a very database-like way. Now, it's not gonna banish all of your Postgres instances into outer darkness and replace all of them with ksqlDB servers. That is not the intent of the project, but there is this database-like functionality that we want around event streams, and it's exactly that that ksqlDB is trying to give you. It aims to provide you a single mental model for most Kafka-based stream-processing application workloads in a single easy-to-use and familiar interface.