A Journey of a Thousand Miles Begins with a Single Step

Apache Kafka Management and Monitoring

Monitoring Apache Kafka is crucial for knowing when to act on or scale out your Kafka clusters. Besides the CLI commands, metrics are also accessible over JMX, for example with jconsole. A more convenient way is a GUI that displays them. This post focuses on Kafka Manager, an administration GUI for Kafka developed by Yahoo.

Testing YAML

YAML (YAML Ain’t Markup Language) is an essential part of Ansible playbooks. If you have really long options to pass to programs, YAML offers several ways to keep them readable and thus maintainable. This article demonstrates how to split a string over multiple lines in YAML.
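
Two common options here are YAML's block scalars: the literal style (`|`) keeps every line break, while the folded style (`>`) turns single line breaks into spaces, which suits long command-line options. To illustrate the difference, here is a simplified Python model of both rules, not a YAML parser. It assumes the content lines are already de-indented, uses the default "clip" chomping (exactly one trailing newline), and ignores more-indented lines as well as the `-`/`+` chomping indicators.

```python
from __future__ import annotations


def _clip(lines: list[str]) -> list[str]:
    """Drop trailing empty lines; clip chomping keeps only one final newline."""
    while lines and lines[-1] == "":
        lines = lines[:-1]
    return lines


def literal_block(lines: list[str]) -> str:
    """Simplified model of a literal block scalar (|): line breaks are kept."""
    lines = _clip(lines)
    return "\n".join(lines) + "\n" if lines else ""


def folded_block(lines: list[str]) -> str:
    """Simplified model of a folded block scalar (>), no more-indented lines.

    A single line break between two text lines becomes a space; each empty
    line between them (or before the first text line) becomes a newline.
    """
    lines = _clip(lines)
    if not lines:
        return ""
    out = ""
    empty_run = 0
    seen_text = False
    for line in lines:
        if line == "":
            empty_run += 1
            continue
        if seen_text and empty_run == 0:
            out += " "                 # single break folds into a space
        else:
            out += "\n" * empty_run    # empty lines survive as newlines
        out += line
        seen_text = True
        empty_run = 0
    return out + "\n"


# A long command split over three lines, as it might appear under `>`:
cmd = folded_block(["docker run --rm", "--name web", "nginx:latest"])
```

With `>`, the three lines above collapse into the single string `"docker run --rm --name web nginx:latest\n"`, whereas `|` would keep them as three separate lines.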

Apache ZooKeeper in Production: Replicated ZooKeeper

Apache Kafka needs coordination, and Apache ZooKeeper is the piece of software that provides it: coordinating distributed applications is ZooKeeper’s job. As part of my Kafka evaluation, I investigated how to run Apache ZooKeeper in production for Apache Kafka. This is a detailed write-up and summary of my observations. I won’t go into detail on how Kafka uses ZooKeeper for coordination; I might explain that in another article. This article focuses on running ZooKeeper in a production environment with regard to high availability.
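
In replicated mode, every ensemble member gets the same server list in its `zoo.cfg`, plus a `myid` file in its `dataDir` that contains only its own server number, and the ensemble stays available only while a majority (quorum) of servers is up. A minimal Python sketch of both aspects follows: it renders such a configuration and computes how many failures an ensemble of a given size tolerates. The hostnames, `dataDir`, and timing values are placeholder assumptions; 2888 (peer traffic to the leader) and 3888 (leader election) are the ports conventionally used in the ZooKeeper documentation.

```python
from __future__ import annotations


def zoo_cfg(hosts: list[str], data_dir: str = "/var/lib/zookeeper") -> str:
    """Render a replicated zoo.cfg; host names and data_dir are placeholders."""
    lines = [
        "tickTime=2000",        # basic time unit in milliseconds
        f"dataDir={data_dir}",  # each server also needs a myid file in here
        "clientPort=2181",      # port that clients such as Kafka connect to
        "initLimit=5",          # ticks a follower may take to connect and sync
        "syncLimit=2",          # ticks a follower may lag behind the leader
    ]
    # server.N=host:peerPort:electionPort; N must match that host's myid file.
    for n, host in enumerate(hosts, start=1):
        lines.append(f"server.{n}={host}:2888:3888")
    return "\n".join(lines) + "\n"


def quorum(ensemble_size: int) -> int:
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size: int) -> int:
    """How many servers may fail while a majority is still up."""
    return ensemble_size - quorum(ensemble_size)


cfg = zoo_cfg(["zk1", "zk2", "zk3"])
```

The quorum arithmetic shows why odd ensemble sizes are the usual recommendation: three servers tolerate one failure, four servers still tolerate only one, and five tolerate two.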

Live Debugging Elasticsearch

For troubleshooting, Elasticsearch lets you change the log level at runtime. I had some problems with TLS, and this was really helpful. The best part: no cluster downtime! Elasticsearch uses Apache Log4j 2 for logging.
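
Log levels are dynamic cluster settings of the form `logger.<package>`, changed through the cluster settings API (`PUT _cluster/settings`). The Python sketch below only builds that request with the standard library's `urllib.request` and does not send it. The cluster URL and the logger name `org.elasticsearch.transport` are example assumptions; sending the value `null` resets the logger to its default level once the troubleshooting is done.

```python
from __future__ import annotations

import json
import urllib.request


def log_level_request(base_url: str, logger: str,
                      level: str | None) -> urllib.request.Request:
    """Build (but do not send) a cluster settings update for one logger.

    level=None serializes to JSON null, which resets the logger's level.
    """
    body = {"persistent": {f"logger.{logger}": level}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


# Example: raise the transport logger to DEBUG while investigating TLS issues.
req = log_level_request("https://localhost:9200", "org.elasticsearch.transport", "DEBUG")
# urllib.request.urlopen(req) would send it; a secured cluster also needs credentials.
```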

Configure Git Credentials

GitLab and GitHub offer personal access tokens for Git access over HTTPS. They are the only accepted authentication method over HTTPS when you have two-factor authentication (2FA) enabled. Since I use a YubiKey, I have to use a personal access token whenever SSH is not viable, e.g. when working in a safeguarded environment. A token, however, has the advantage that it can expire, which forces me to rotate it more frequently and narrows the window for attacks.
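
One way to avoid typing the token on every push is Git's `store` credential helper (enabled with `git config --global credential.helper store`). It reads credentials from `~/.git-credentials`, one URL per line in the form `https://user:token@host`, and keeps them unencrypted on disk, protected only by file permissions. The Python sketch below merely formats such a line and percent-encodes the username and token so that characters like `@` or `:` cannot break the URL; the host, user name, and token are hypothetical placeholders.

```python
from urllib.parse import quote


def git_credentials_line(host: str, user: str, token: str) -> str:
    """Format one ~/.git-credentials entry for the 'store' helper (sketch).

    Username and token are percent-encoded; all values are placeholders.
    """
    return f"https://{quote(user, safe='')}:{quote(token, safe='')}@{host}"


line = git_credentials_line("gitlab.example.com", "jdoe", "example-token")
```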

Jaegertracing with Elasticsearch Storage

Distributed tracing with Jaeger by Uber Technologies is pretty impressive. By default, Jaeger uses Apache Cassandra as storage, but it is also capable of using Elasticsearch 5/6. It took me some time and some code diving on GitHub to find the respective options for Elasticsearch, but I finally put it all together in a docker-compose.yml. My Elasticsearch cluster runs with a commercial X-Pack license, so we have to pass credentials for authentication.
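
Jaeger selects its storage backend through the `SPAN_STORAGE_TYPE` environment variable, and its Elasticsearch options can be given as environment variables such as `ES_SERVER_URLS`, `ES_USERNAME`, and `ES_PASSWORD`, the environment form of the `--es.server-urls`, `--es.username`, and `--es.password` flags. As a sketch only, the Python below assembles that environment for a collector or query container and renders it as `KEY=VALUE` lines, e.g. for an `env_file` in docker-compose. The node URLs and credentials are placeholders, and the exact set of options you need may differ between Jaeger versions.

```python
from __future__ import annotations


def jaeger_es_env(server_urls: list[str], username: str, password: str) -> dict[str, str]:
    """Environment for a Jaeger component storing spans in Elasticsearch (sketch).

    With X-Pack security enabled, the cluster expects username and password.
    """
    return {
        "SPAN_STORAGE_TYPE": "elasticsearch",
        "ES_SERVER_URLS": ",".join(server_urls),  # comma-separated node list
        "ES_USERNAME": username,
        "ES_PASSWORD": password,
    }


env = jaeger_es_env(["http://es1:9200", "http://es2:9200"], "jaeger", "changeme")
# Render as KEY=VALUE lines, the format docker-compose accepts in an env_file.
env_file = "\n".join(f"{key}={value}" for key, value in env.items()) + "\n"
```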

Accessing Mustache Array Elements

The QA (Quality Assurance) team uses simulators like Astrex to check and test changes and features. I was asked whether I could bring the simulator logs into our Elasticsearch in real time. Tailing log files is still difficult, unless you can use bash.

Parse XML content with Logstash

A customer of mine requires XML data as separate fields for further investigation. The data itself is part of a log message that is processed by Logstash, which provides the powerful XML filter plugin for further parsing.
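
Conceptually, the XML filter reads the XML from a `source` field and stores the parsed result under a `target` field. The Python sketch below mimics that idea with the standard library's `xml.etree.ElementTree`; it is not Logstash. It assumes the XML has already been isolated into its own field (in Logstash, for example, with grok), and the field names `message`/`doc` and the flattening into dotted keys are illustrative assumptions rather than the plugin's exact output.

```python
import xml.etree.ElementTree as ET


def parse_xml_field(event: dict, source: str = "message", target: str = "doc") -> dict:
    """Parse the XML in event[source] into flat 'target.path' fields (sketch).

    Repeated sibling tags overwrite each other here; a real pipeline would
    collect them into arrays instead.
    """
    root = ET.fromstring(event[source])
    parsed = dict(event)  # keep the original fields, including the raw XML

    def walk(elem: ET.Element, path: str) -> None:
        # Attributes become sub-fields, e.g. doc.order.id
        for name, value in elem.attrib.items():
            parsed[f"{path}.{name}"] = value
        children = list(elem)
        # Leaf elements with text become a field holding that text.
        if not children and elem.text and elem.text.strip():
            parsed[path] = elem.text.strip()
        for child in children:
            walk(child, f"{path}.{child.tag}")

    walk(root, f"{target}.{root.tag}")
    return parsed


event = {"message": '<order id="42"><customer>ACME</customer>'
                    '<total currency="EUR">99.50</total></order>'}
result = parse_xml_field(event)
```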

Ship Docker Container Logs to Elasticsearch with Fluentd

By default, Docker captures the standard output (and standard error) of all your containers and writes them to files in JSON format. It is advisable to set a maximum size; otherwise you can run out of disk space. Unified logging with Elasticsearch lets you investigate the logs of all containers from a single point of view. Sending the logs from Docker containers to Elasticsearch is quite easy: Fluentd is a data collector that a Docker container can use by passing the option --log-driver=fluentd.
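
Both settings are log options of the respective logging driver: the default `json-file` driver accepts `max-size` and `max-file`, and the `fluentd` driver accepts, among others, `fluentd-address` and `tag`. The Python sketch below only assembles a `docker run` argument list that uses `--log-driver=fluentd`, plus a `daemon.json`-style size cap for the default driver. The Fluentd address, tag, image, and size values are placeholder assumptions; the list could be handed to `subprocess.run` on a host with Docker installed.

```python
from __future__ import annotations


def docker_run_with_fluentd(image: str, fluentd_address: str = "localhost:24224",
                            tag: str = "docker.{{.Name}}") -> list[str]:
    """Build a `docker run` argv that ships the container's stdout/stderr to Fluentd."""
    return [
        "docker", "run", "--detach",
        "--log-driver=fluentd",                             # use the fluentd logging driver
        "--log-opt", f"fluentd-address={fluentd_address}",  # where Fluentd listens
        "--log-opt", f"tag={tag}",                          # {{.Name}} expands to the container name
        image,
    ]


argv = docker_run_with_fluentd("nginx:alpine")
# subprocess.run(argv, check=True) would start the container on a Docker host.

# For containers that keep the default json-file driver, a cap like this
# (e.g. in /etc/docker/daemon.json; values must be strings) bounds disk usage.
JSON_FILE_LIMITS = {"log-driver": "json-file",
                    "log-opts": {"max-size": "10m", "max-file": "3"}}
```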
