1. 2018-03-08 - Apache ZooKeeper in Production: Replicated ZooKeeper; Tags: Apache ZooKeeper in Production: Replicated ZooKeeper
    Loading...

    Apache ZooKeeper in Production: Replicated ZooKeeper

    Apache Kafka uses Apache ZooKeeper. Apache Kafka needs coordination and Apache ZooKeeper is the piece of software which provides it. Coordinating distributed applications is ZooKeeper’s job. As part of my Kafka evaluation I investigated how to run Apache ZooKeeper in a production scenario for Apache Kafka. This a detailed documentation and summary of my observations. I won’t go into detail how coordination is done for Apache Kafka with ZooKeeper. I might explain it in another article. This article focus on ZooKeeper in a production environment concerning High Availability scenarios.

    General

    Some details about my scenario:

    • Using 9 nodes (Yes that my production, btw. Raspberry Pi is pretty cheap in case you wanna try it out yourself)
    • Node List: 1. alpha, 2. beta, 3. gamma, 4. delta, 5. epsilon, 6. zeta, 7. eta, 8. theta, 9. omega
    • Deployment in Docker Containers with Ansible to above nodes, using this ZooKeeper Docker image
    • Ports:
      • 2181 - the client port for Apache Kafka or other clients
      • 2888 - port for ZooKeeper to connect to other ZooKeeper peers to coordinate
      • 3888 - the port for leader election, if the cluster needs to determine who is in charge
      • port 2888 and 3888 must not be exposed regarding Docker Networking

    Getting Started

    • From the Apache Zookeeper Guide: Using Apache ZooKeeper in production, you should run it in replicated mode.

    What does replicated mode mean?

    • A replicated group of servers in the same application is called a quorum,
    • and in replicated mode, all servers in the quorum have copies of the same configuration file.

    Some notes on Kafka’s ZooKeeper usage:

    • You have more than one ZooKeeper instance for Apache Kafka.
    • If Kafka uses a ZooKeeper cluster, some called it ensemble (Kafka in Action).

    In the Kafka server.properties you can provide a connection string with all the ZooKeeper instances. What about a Load Balancer? Let’s leave that out of the equation :wink:.

    Kafka’s server.properties

    >zookeeper.connect="alpha:2181,beta:2181,gamma:2181,delta:2181,epsilon:2181,zeta:2181,eta:2181,theta:2181,omega:2181"

    Architecture Design

    Some information from Apache ZooKeeper

    Minimum Requirement

    Apache ZooKeeper Getting Started:

    For replicated mode, a minimum of three servers are required, and it is strongly recommended that you have an odd number of servers. If you only have two servers, then you are in a situation where if one of them fails, there are not enough machines to form a majority quorum. Two servers is inherently less stable than a single server, because there are two single points of failure.

    Apache ZooKeeper Administration Guide:

    Usually three servers is more than enough for a production install, but for maximum reliability during maintenance, you may wish to install five servers. With three servers, if you perform maintenance on one of them, you are vulnerable to a failure on one of the other two servers during that maintenance. If you have five of them running, you can take one down for maintenance, and know that you’re still OK if one of the other four suddenly fails.

    Summary:

    • minimum 3 servers
    • odd numbers is better for majority election
    • recommendation is 5 servers

    To my specific scenario:

    • Having 9 servers the majority election takes place with 5 servers!
    • 3 ZooKeeper servers are not the majority in a cluster of 9!

    Deployment

    As I have stated Ansible is used to ship ZooKeeper in Docker containers.

    Ansible Playbook

    Here is the playbook.yml and explained in detail.

    # playbook for Apache Zookeeper deployment with docker containers
    # some problems with exposed ports, switch to network_mode host as workaround
    ---
    
    - hosts: all
      vars:
        image: zookeeper
        ports:
          - "2181:2181"
          - "2888:2888"
          - "3888:3888"
        ids:
          alpha:    1
          beta:     2
          gamma:    3
          delta:    4
          epsilon:  5
          zeta:     6
          eta:      7
          theta:    8
          omega:    9
        mappings:
          ZOO_SERVERS: >
            server.1=alpha:2888:3888
            server.2=beta:2888:3888
            server.3=gamma:2888:3888
            server.4=delta:2888:3888
            server.5=epsilon:2888:3888
            server.6=zeta:2888:3888
            server.7=eta:2888:3888
            server.8=theta:2888:3888
            server.9=omega:2888:3888
          ZOO_MY_ID: "{{ids[ansible_hostname]}}"
      tasks:
        - name: deploy
          docker_container:
            env:
              "{{mappings}}"
            name: zookeeper
            image: zookeeper
            network_mode: host
            pull: true
            log_opt:
              max-file: "3"
              max-size: 25m
            state: started
            restart: yes
            restart_policy: always
            restart_retries: 10
            volumes:
              - "/var/opt/zookeeper:/data"
              - "/var/log/zookeeper:/datalog"
    

    The environment variables are important.

    • ZOO_MY_ID = Each ZooKeeper server needs a unique id. We pass it by using a dictionary, i.e. alpha has the id 1.
    • ZOO_SERVERS = The server list is mandatory for Apache ZooKeeper. Each line is concatenated to a single line and pass as docker environment argument.

    Run to deploy ansible-playbook playbook.yml.

    Inspect Deployment

    Each ZooKeeper server is shipped in a docker container with the name zookeeper. Inspecting the alpha container:

    tan@alpha:~> docker exec -it zookeeper /bin/bash
    bash-4.4# cat /conf/
    configuration.xsl  log4j.properties   zoo.cfg            zoo_sample.cfg
    bash-4.4# cat /conf/zoo.cfg
    clientPort=2181
    dataDir=/data
    dataLogDir=/datalog
    tickTime=2000
    initLimit=5
    syncLimit=2
    maxClientCnxns=60
    server.1=alpha:2888:3888
    server.2=beta:2888:3888
    server.3=gamma:2888:3888
    server.4=delta:2888:3888
    server.5=epsilon:2888:3888
    server.6=zeta:2888:3888
    server.7=eta:2888:3888
    server.8=theta:2888:3888
    server.9=omega:2888:3888
    
    • The passed arguments are used to write the ZooKeeper configuration in /conf.
    • Pay attention that the docker image does not used the conf directory within the ZooKeeper shipment.
    bash-4.4# cd conf/
    bash-4.4# ls
    bash-4.4# pwd
    /zookeeper-3.4.11/conf
    

    We have seen that the server list is written in the ZooKeeper configuration, but where is ZOO_MY_ID is stored?

    ZooKeeper stores it in myid in the data directory. This is the mounted volume /var/opt/zookeeper.

    On the docker host system:

    tan@alpha:/var/opt/zookeeper> cat myid
    1
    

    Leader Election

    How does ZooKeeper elect its leader? As we know this is a majority election. For demonstration purposes, I will start node by node to illustrate the behavior of ZooKeeper, since I have found some bogus information on various blog pages, I want to prevent you from misinformation. As we know 5 servers are mandatory for leader election.

    Initial

    Start Server 1: alpha

    tan@alpha:/opt/ansible/zookeeper> ansible-playbook -l alpha playbook.yml
    
    PLAY [all] *********************************************************************
    
    TASK [setup] *******************************************************************
    ok: [alpha]
    
    TASK [deploy] ******************************************************************
    changed: [alpha]
    
    PLAY RECAP *********************************************************************
    alpha: ok=2    changed=1    unreachable=0    failed=0
    

    ZooKeeper allows us to issue four letter commands via telnet or nc (netcat) to check its status with the stats command.

    tan@alpha:/opt/ansible/zookeeper> telnet alpha 2181
    Trying 10.22.62.124...
    Connected to alpha.
    Escape character is '^]'.
    stat
    This ZooKeeper instance is not currently serving requests
    Connection closed by foreign host.
    

    The message This ZooKeeper instance is not currently serving requests is important for us, though this node is up but not operational. Another command is ruok (are you ok?).

    tan@alpha:/opt/ansible/zookeeper> telnet alpha 2181
    Trying 10.22.62.124...
    Connected to fo-ppd01-dc1.
    Escape character is '^]'.
    ruok
    imok
    Connection closed by foreign host.
    

    ZooKeeper responds imok (I am ok :smile:). The alpha node is up.

    Start other nodes

    Starting beta ZooKeeper server 2 and check with netcat

    ansible-playbook -l beta playbook.yml
    echo stats | nc beta 2181
    This ZooKeeper instance is not currently serving requests
    

    Repeat this until server 5 (epsilon).

    tan@alpha:/opt/ansible/zookeeper> echo stats | nc epsilon 2181
    Zookeeper version: 3.4.11-37e277162d567b55a07d1755f0b31c32e93c01a0, built on 11/01/2017 18:06 GMT
    Clients:
     /10.22.62.128:56598[0](queued=0,recved=1,sent=0)
    
    Latency min/avg/max: 0/0/0
    Received: 2
    Sent: 1
    Connections: 1
    Outstanding: 0
    Zxid: 0x600000000
    Mode: leader
    Node count: 4
    

    We see that ZooKeeper was able to elect the leader as the majority vote could take place. Check the logs:

    2018-03-08 10:42:59,758 [myid:5] - INFO  [LearnerHandler-/10.22.62.124:55684:LearnerHandler@535] - Received NEWLEADER-ACK message from 1
    2018-03-08 10:42:59,758 [myid:5] - INFO  [LearnerHandler-/10.22.190.126:39461:LearnerHandler@535] - Received NEWLEADER-ACK message from 4
    2018-03-08 10:42:59,758 [myid:5] - INFO  [LearnerHandler-/10.22.190.121:49735:LearnerHandler@535] - Received NEWLEADER-ACK message from 3
    2018-03-08 10:42:59,775 [myid:5] - INFO  [LearnerHandler-/10.22.62.126:39509:LearnerHandler@535] - Received NEWLEADER-ACK message from 2
    2018-03-08 10:42:59,776 [myid:5] - INFO  [QuorumPeer[myid=5]/0:0:0:0:0:0:0:0:2181:Leader@962] - Have quorum of supporters, sids: [ 1,2,3,4,5 ]; starting up and setting last processed zxid: 0x600000000
    

    Check the previous nodes:

    tan@alpha:~> for i in alpha beta gamma delta; do echo $i: $(echo stats | nc $i 2181 | grep "Mode"); done
    alpha: Mode: follower
    beta: Mode: follower
    gamma: Mode: follower
    delta: Mode: follower
    

    I have written a small gist how to check for all nodes who is the leader.

    Important: The startup order of a ZooKeeper server is not relevant, i.e. it does not matter that alpha has to be started first.

    High Availability Scenarios

    • Having 9 nodes, means that you must operate with 5 nodes, in order to keep the cluster operational.
    • 4 nodes can be altered or upgraded in the mean time.

    • The recommended scenario is 5 nodes, that means 3 nodes must be alive.
    • If only 2 nodes are alive, ZooKeeper will stop serving requests until the third node is up again.

    Summary

    • Running Zookeeper in Replicated Mode is simple.
    • Ansible und Docker are great and essential for maintaining a production cluster.
    • Some details where intentionally left out, e.g. how much disk space must or should a ZooKeeper server have. This is really dependent on your use case with Apache Kafka or Apache Cassandra using Apache ZooKeeper.

    Comments


    Leave a comment