Course: Kafka Streams 101

Basic Operations

6 min

Sophie Blee-Goldman

Senior Software Engineer (Presenter)


Bill Bejeck

Integration Architect (Author)

Basic Operations

This module covers the basic operations of Kafka Streams—such as mapping and filtering—but begins with its primary building blocks: event streams and topologies.

Event Streams

An event represents data that corresponds to an action, such as a notification or state transfer. An event stream is an unbounded collection of event records.

Key-Value Pairs

Apache Kafka® works in key-value pairs, so it’s important to understand that in an event stream, records that have the same key don't have anything to do with one another. For example, the image below shows four independent records, even though two of the keys are identical:

(Image: four independent records, two of which share the same key)

Topologies

To represent the flow of stream processing, Kafka Streams uses topologies, which are directed acyclic graphs ("DAGs").

(Image: a Kafka Streams processor topology)

Each Kafka Streams topology has a source processor, where records are read in from Kafka. Below that are any number of child processors that perform some sort of operation. A child node can have multiple parent nodes, and a parent node can have multiple child nodes. It all feeds down to a sink processor, which writes out to Kafka.
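
To make the shape of a topology concrete, here is a minimal sketch in the DSL; the topic names and the individual operations are illustrative assumptions, not code from the course:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.Topology;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.Produced;

public class TopologySketch {
    public static void main(String[] args) {
        StreamsBuilder builder = new StreamsBuilder();

        builder
            // source processor: reads records from an (assumed) input topic
            .stream("input-topic", Consumed.with(Serdes.String(), Serdes.String()))
            // child processor: transforms each value
            .mapValues(value -> value.toUpperCase())
            // another child processor: keeps only non-empty values
            .filter((key, value) -> !value.isEmpty())
            // sink processor: writes the results back out to Kafka
            .to("output-topic", Produced.with(Serdes.String(), Serdes.String()));

        // describe() prints the source -> processor -> sink DAG
        Topology topology = builder.build();
        System.out.println(topology.describe());
    }
}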

Streams

You define a stream with a StreamsBuilder, for which you specify an input topic as well as SerDes configurations, via a Consumed configuration object. These settings—covered in detail in Module 4—tell Kafka Streams how to read the records, which have String-typed keys and values in this example:

StreamsBuilder builder = new StreamsBuilder();
KStream<String, String> firstStream =
    builder.stream(inputTopic, Consumed.with(Serdes.String(), Serdes.String()));

A KStream is part of the Kafka Streams DSL, and it’s one of the main constructs you'll be working with.
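
If you want to see that stream in the context of a complete application, here is a minimal sketch that configures and starts it; the application id, bootstrap server, topic name, and the peek call are assumptions added for illustration:

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.Consumed;
import org.apache.kafka.streams.kstream.KStream;

public class BasicStreamApp {
    public static void main(String[] args) {
        // Hypothetical topic name and connection settings for illustration
        final String inputTopic = "basic-operations-input";

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "basic-operations-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        StreamsBuilder builder = new StreamsBuilder();

        // Consumed tells Kafka Streams how to deserialize the keys and values
        KStream<String, String> firstStream =
            builder.stream(inputTopic, Consumed.with(Serdes.String(), Serdes.String()));

        // peek is a side-effect-only operation, handy for watching records flow through
        firstStream.peek((key, value) -> System.out.println("key: " + key + " value: " + value));

        KafkaStreams kafkaStreams = new KafkaStreams(builder.build(), props);
        kafkaStreams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(kafkaStreams::close));
    }
}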

Stream Operations

Once you've created a stream, you can perform basic operations on it, such as mapping and filtering.

Mapping

With mapping, you take an input object of one type, apply a function to it, and then output it as a different object, potentially of another type.

(Image: a mapping operation)

For example, using mapValues, you could convert a string to another string that omits the first five characters:

mapValues(value -> value.substring(5))

Map, on the other hand, lets you change the key and the value:

map((key, value) -> ...)
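
As a sketch of both operators applied to the stream defined earlier, here is a small example; the masking logic and the choice of new key are assumptions made up for illustration:

import org.apache.kafka.streams.KeyValue;
import org.apache.kafka.streams.kstream.KStream;

public class MappingExample {
    // Applies both map operators to the stream defined earlier (e.g. firstStream)
    static KStream<String, String> maskAndRekey(KStream<String, String> firstStream) {
        // mapValues changes only the value: drop the first five characters,
        // e.g. to mask the start of a credit card number (assumed use case)
        KStream<String, String> masked = firstStream.mapValues(value -> value.substring(5));

        // map can change the key as well as the value; here the new key is simply
        // the masked value itself (an illustrative choice). Prefer mapValues when the
        // key doesn't need to change, since changing the key can trigger a repartition.
        return masked.map((key, value) -> KeyValue.pair(value, value));
    }
}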

Filtering

With a filter, you evaluate each record's key and value against a predicate, and only the records that match the predicate make it through to the other side.

filter((key, value) -> Long.parseLong(value) > 1000)

(Image: a filtering operation)
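
Putting the filter into the same style of sketch, the snippet below keeps only records whose value parses to a number greater than 1,000 and writes them on to an output topic; the topic name is an assumption for illustration:

import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.Produced;

public class FilteringExample {
    // Keeps only records whose string value parses to a number greater than 1,000,
    // then writes them to a hypothetical output topic
    static void keepLargeValues(KStream<String, String> firstStream) {
        firstStream
            .filter((key, value) -> Long.parseLong(value) > 1000)
            .to("filtered-output-topic", Produced.with(Serdes.String(), Serdes.String()));
    }
}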



Basic Operations

Hi, I'm Sophie Blee-Goldman with Confluent. In this module, we're going to be talking about some basic operations of Kafka Streams. Kafka Streams works on unbounded sequences of events called event streams. Event streams are like the topics that you have on the brokers in Kafka. They are a sequence of records, which are key-value pairs in Kafka, and therefore also in Kafka Streams. So in this example, we have two records with key A and two records with key B. Now, normally in a key-value store, you might think of subsequent records as overwriting the previous record with that same key. The difference is that with an event stream, each record is a standalone event, and that means that two records with the same key don't really have any relationship to each other. So in this example, we have a total of four events, not just two.

So what does Kafka Streams actually do with those event streams? Well, when you define a Kafka Streams application, what you're really doing is defining a processor topology. A Kafka Streams processor topology is just a directed acyclic graph, or DAG, where you have processing nodes and edges that represent the flow of the event stream. Each processor topology typically has source nodes, user processor nodes, and sink processor nodes. The source nodes are where the data comes in: when a record is consumed from the topic on the broker, it gets passed into the source node. The source node then forwards it through one or more user processor nodes, which are defined by some custom logic. And finally, the records get sent to the sink processor node, where the producer sends the data back to Kafka, to a new topic. That's what's actually happening when you write a Kafka Streams application.

But how do you actually write that application? The Kafka Streams topology is going to be defined using the StreamsBuilder class. So to define a stream, you first create an instance of this StreamsBuilder class. Then you can use this builder to create a KStream, which is just the Kafka Streams abstraction over the event stream, and kind of the fundamental building unit of Kafka Streams. You create this KStream using builder.stream. Now builder.stream is just going to take in an input topic, which is the name of the topic that you want to stream these events from, and a Consumed configuration object. We're going to get a bit more into this in later modules, but for now, all you need to know is that this Consumed just specifies the type of the key and of the value of the record. So in this case, we have a key and a value that are both strings, and now we have our String-String KStream.

So now that you have your KStream, what can you actually do with it? One of the things that people usually like to do is transform their data, and Kafka Streams provides a mapping operation for that. Kafka Streams provides the typical functionality that you'd expect from a map: it takes an input record and lets you produce a new key or value, or even a new type entirely. For this, Kafka Streams has two map operators: mapValues and map itself. The mapValues operator takes in a ValueMapper. A ValueMapper is just a functional interface that defines a map method with a single input, which is the value, and the output is of course the value that you would like to produce after the mapping. Because it's a functional interface, you can express it with just a lambda, as we have here.

And so all you need to do to map the values of this event stream is pass in something like this, where you take the substring and drop the first five characters of your string. This could be useful in the case where you have some personal data, like credit card transactions, and you want to scrub out those first five characters in that string: mapValues is the way to go for you. Similarly, map takes the key and the value, and it lets you produce a new key and value, the difference being, of course, that the key can be changed. The important thing to note here is that anytime in Kafka Streams you can use mapValues, you should prefer it, unless you really have to change the key. The reason for this we'll get into a little bit later, it has to do with repartitioning, but you can think of it as a performance optimization: you should always prefer mapValues over map whenever possible.

Another operator that we have in Kafka Streams is the filter. You can define a filter on a KStream, and it determines which events should be filtered out of the event stream and which ones should be kept. A filter really creates a new event stream with only those events that you care about, similar to the red widgets in our previous module. In the filter, you have access to the key and the value, and you define a predicate on these values and return true or false, depending on whether or not you want to keep them. So for example, you might have a long value and you only care about values that are greater than 1,000. In this case, you can drop all the events you don't care about and continue on with only those values which you do. It's important to note that you should not actually modify the key or the value in the filter operator, or in any of the other operators. You should think of this as creating a new event stream, not as modifying an existing one. We'll go into more detail on how to build a Kafka Streams application in the first exercise.