Sandbox for learning Druid + Kafka

Docker + Druid + Kafka

Objective today

In this article I want to give you the tools you need to get started with Apache Druid + Apache Kafka, so you can jump in and start learning how great these two are together.

By the end of this article you should have a Druid+Kafka sandbox that you can start up and shut down whenever you like, and do your learning inside it.

What you are going to need

Not much. You're going to need to have Docker installed wherever you want your sandbox to run. You can follow these instructions if, like me, you're using a Mac.

You're going to need command line access to this environment. Well, that's pretty easy - right?

And you're going to need a mug of tea and of course a few biscuits. I would estimate this to be a two-biscuit-tutorial. I'll take you through the steps one biscuit at a time.

Biscuit#1

From the command line create yourself a new directory. Maybe call it something like test. Download these two files, kd.yml and environment, and copy them into it:

mkdir test
cp ~/Downloads/kd.yml test/
cp ~/Downloads/environment test/

cd test

Quick sip of tea.

Biscuit#2

Let's use Docker to start Druid and Kafka up.

docker-compose -f kd.yml up -d

docker-compose -f kd.yml ps
WARNING: Compose V1 is no longer supported and will be removed from Docker Desktop in an upcoming release. See https://docs.docker.com/go/compose-v1-eol/
     Name                   Command               State                                                                                   Ports
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
broker           /druid.sh broker                 Up      0.0.0.0:8082->8082/tcp
coordinator      /druid.sh coordinator            Up      0.0.0.0:8081->8081/tcp
historical1      /druid.sh historical             Up      0.0.0.0:8083->8083/tcp
kafkabroker      /etc/confluent/docker/run        Up      0.0.0.0:9091->9091/tcp, 9092/tcp, 0.0.0.0:9101->9101/tcp
middlemanager1   /druid.sh middleManager          Up      0.0.0.0:8091->8091/tcp, 0.0.0.0:8100->8100/tcp, 0.0.0.0:8101->8101/tcp, 0.0.0.0:8102->8102/tcp, 0.0.0.0:8103->8103/tcp, 0.0.0.0:8104->8104/tcp, 0.0.0.0:8105->8105/tcp
postgres         docker-entrypoint.sh postgres    Up      5432/tcp
router           /druid.sh router                 Up      0.0.0.0:8888->8888/tcp
zookeeper        /docker-entrypoint.sh zkSe ...   Up      0.0.0.0:2181->2181/tcp, 2888/tcp, 3888/tcp, 8080/tcp
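
I promised a sandbox you can shut down as well as start up, so here is a minimal sketch of two helpers for that. It assumes the same kd.yml file as above. The function names and the COMPOSE_CMD and SANDBOX_FILE variables are made up for illustration and are not part of Docker; only the stop and down subcommands are real docker-compose commands.

```shell
# Hypothetical helpers: the names and variables are illustrative only.
# COMPOSE_CMD and SANDBOX_FILE may be overridden; the defaults match this article.

sandbox_stop() {
  # Stop the containers but keep them around, so the next "up" is quicker.
  ${COMPOSE_CMD:-docker-compose} -f "${SANDBOX_FILE:-kd.yml}" stop
}

sandbox_down() {
  # Stop and remove the containers and the network that compose created.
  ${COMPOSE_CMD:-docker-compose} -f "${SANDBOX_FILE:-kd.yml}" down
}
```

Run sandbox_stop when you just want a break, and sandbox_down when you want to clear the containers away. Bringing everything back is the same up -d command as before.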

Proof of the pudding....

That's it. You're done - we have Apache Druid up and running, Apache Kafka is working, and they are both on the same network and able to talk to each other (we'll confirm this with more biscuits in later tutorials).

First let's show that Druid is working. Point your browser at localhost:8888

You should see the console above. It can take a few minutes for everything to start up, so be patient.
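
If you'd rather not keep hitting refresh while you wait, a small retry loop can do the waiting for you. This is only a sketch: wait_for is a helper name I've made up, and the example call assumes curl is installed and that the router on port 8888 answers on Druid's /status/health endpoint once it is ready.

```shell
# wait_for ATTEMPTS DELAY COMMAND...
# Runs COMMAND until it succeeds, sleeping DELAY seconds between tries.
# Returns 0 on the first success, 1 if every attempt fails.
wait_for() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # No point sleeping after the final failed attempt.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Example (commented out so that sourcing this file does nothing):
# wait_for 60 5 curl -sf http://localhost:8888/status/health && echo "Druid is up"
```

The example call would try up to 60 times, 5 seconds apart, which is roughly five minutes of patience. Adjust the numbers to suit your machine and your biscuit supply.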

Next let's check that Kafka is working. That's an easy command we can run:

docker exec kafkabroker kafka-topics --bootstrap-server localhost:9091 --list
__consumer_offsets
_confluent-command
_confluent-metrics
_confluent-telemetry-metrics
_confluent_balancer_api_state
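
Listing topics only shows the internal ones so far. If you want a slightly stronger check, you could create a topic of your own and list again. Here is a hedged sketch that wraps the same kafka-topics command: the function names and the DOCKER_CMD variable are made up for illustration, and it assumes the container name and port shown above.

```shell
# Hypothetical wrappers around the kafka-topics command used above.

kafka_topics() {
  # Pass any kafka-topics arguments through to the broker container.
  ${DOCKER_CMD:-docker} exec kafkabroker kafka-topics --bootstrap-server localhost:9091 "$@"
}

create_topic() {
  # One partition and one replica is enough for a single-broker sandbox.
  kafka_topics --create --topic "$1" --partitions 1 --replication-factor 1
}
```

For example, create_topic sandbox-test followed by kafka_topics --list should show the new topic alongside the internal ones (sandbox-test is just a name I picked).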

And there you go. Druid's working. Kafka's running. Our work here is done!

What's next?

In the next tutorial we'll make this environment a bit more useful: we'll start to generate some data in Kafka, ingest it into Druid, and put some pretty visualisations on top. Watch this space!