How to Set Up an Apache Cassandra Cluster

This article is a guide to setting up an Apache Cassandra cluster. The cluster runs on local CentOS virtual machines using VirtualBox. I use this to have a local environment for development and testing.

Prerequisites

It assumes you are using the following software versions.

  • MacOS 10.11.3
  • Vagrant 1.8.5
  • Java 1.8.0
  • Apache Cassandra 3.9

Here are the steps I used:

  1. First, create a workspace.

    mkdir -p ~/vagrant_boxes/cassandra

    cd ~/vagrant_boxes/cassandra

  2. Next, create a new vagrant box. I'm using a minimal CentOS vagrant box.

    vagrant box add "CentOS 6.5 x86_64" https://github.com/2creatives/vagrant-centos/releases/download/v6.5.3/centos65-x86_64-20140116.box

  3. I need a basic VM to install the packages. This command creates one.

    vagrant init -m "CentOS 6.5 x86_64" cassandra_base

  4. Next, change the Vagrantfile to the following:

      Vagrant.configure(2) do |config|
        config.vm.box = "CentOS 6.5 x86_64"
        config.vm.box_url = "cassandra_base"
        config.ssh.insert_key = false
      end
    

  5. Now, install Cassandra and its dependencies.

    vagrant up

    vagrant ssh

    sudo yum install java-1.8.0-openjdk-devel

    sudo yum install wget

    wget -P ~ http://www-us.apache.org/dist/cassandra/3.9/apache-cassandra-3.9-bin.tar.gz

    gunzip -c *gz | tar xvf -

  6. Open up your ~/.bash_profile and append the following lines.

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
      export PATH=$PATH:$JAVA_HOME/bin
      export CASSANDRA_HOME=~/apache-cassandra-3.9
      export PATH=$PATH:$CASSANDRA_HOME/bin
      export CASSANDRA_CONF_DIR=$CASSANDRA_HOME/conf
    

  7. Source the profile.

    source ~/.bash_profile

  8. Create a ~/.ssh/config file to disable host key checking for SSH. Since these are DEV servers, this is ok. The indentation before StrictHostKeyChecking is only for readability.

      Host *
            StrictHostKeyChecking no
    

  9. Now run these commands to finish the password-less authentication.

    chmod 600 ~/.ssh/config

    ssh-keygen -f ~/.ssh/id_rsa -t rsa -P ""

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  10. In /etc/hosts, add the following lines.

      192.168.50.41 cassandra1.example.com
      192.168.50.42 cassandra2.example.com
      192.168.50.43 cassandra3.example.com
      192.168.50.44 cassandra4.example.com
      192.168.50.45 cassandra5.example.com
    

  11. In ~/apache-cassandra-3.9/conf/cassandra.yaml, comment out the line below. With rpc_address unset, Cassandra binds to the node's own address instead of localhost, which allows our host machine to connect to it.

    rpc_address: localhost

  12. It's a good idea to install Python 2.7 here, since the CQL shell (cqlsh) requires that version. The CentOS image we are using ships with Python 2.6. Follow the instructions in the Python section later in this guide to update Python.

  13. Exit the SSH session and copy the VM for the other Cassandra nodes.

    exit

    vagrant halt

    vagrant package

    vagrant box add cassandra ~/vagrant_boxes/cassandra/package.box

  14. Edit the Vagrantfile to look like the following. This will create 5 Cassandra nodes.

      Vagrant.configure("2") do |config|
        (1..5).each do |i|
          config.vm.define "cassandra#{i}" do |node|
            node.vm.box = "cassandra"
            node.vm.box_url = "cassandra#{i}.example.com"
            node.vm.hostname = "cassandra#{i}.example.com"
            node.vm.network :private_network, ip: "192.168.50.4#{i}"
            node.ssh.insert_key = false
    
            # Set cluster_name, seeds, and listen_address in the conf file.
            node.vm.provision "shell", inline: "sed -i 's/^cluster_name: .*/cluster_name: \"My Cluster\"/' ~/apache-cassandra-3.9/conf/cassandra.yaml", privileged: false
            node.vm.provision "shell", inline: "sed -i 's/- seeds: .*/- seeds: \"192.168.50.41\"/' ~/apache-cassandra-3.9/conf/cassandra.yaml", privileged: false
            node.vm.provision "shell", inline: "sed -i 's/listen_address: .*/listen_address: \"192.168.50.4#{i}\"/' ~/apache-cassandra-3.9/conf/cassandra.yaml", privileged: false
          end
        end
      end
    

  15. Bring the new Vagrant VMs up.

    vagrant up --no-provision

    vagrant provision

  16. Start Cassandra. This part is tricky because each node has to start up completely before the next one can join the cluster. I ssh into each VM individually, start Cassandra, and wait about 60 seconds until it has completely started (a scripted version of the same sequence is sketched after these commands).

    vagrant ssh cassandra1

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra2

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra3

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra4

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra5

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit
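
If you would rather not repeat that by hand, here is a minimal sketch of the same start-up sequence, scripted from the host machine. It assumes the vagrant machine names defined above and a flat 60-second wait between nodes; nohup and the log redirect are only there so the Cassandra process survives the short-lived ssh session.

    # Start Cassandra on each node in order, seed node (cassandra1) first
    for node in cassandra1 cassandra2 cassandra3 cassandra4 cassandra5; do
      vagrant ssh "$node" -c "nohup ~/apache-cassandra-3.9/bin/cassandra > cassandra.log 2>&1"
      sleep 60   # give each node time to finish joining before starting the next
    done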

To verify Cassandra

  1. Verify that Cassandra is running correctly.

    vagrant ssh cassandra1

    ~/apache-cassandra-3.9/bin/nodetool status
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    -- Address Load Tokens Owns (effective) Host ID Rack
    UN 192.168.50.41 108.49 KiB 256 38.0% 74b9bb75-e941-4e4c-9900-0b30c7b204b4 rack1
    UN 192.168.50.42 108.2 KiB 256 41.8% 8ee92d98-2a83-491d-b39f-fc3263217dc7 rack1
    UN 192.168.50.43 84.1 KiB 256 41.0% f30a37fd-8763-4f11-a4be-7f413e062bd9 rack1
    UN 192.168.50.44 115.62 KiB 256 38.8% 15356519-30e0-435e-a865-47abfbef5a81 rack1
    UN 192.168.50.45 96.62 KiB 256 40.4% 213e082a-92fd-4948-84be-fe941a0801b1 rack1
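
With Python 2.7 in place (step 12), you can also check the cluster from cqlsh. Here is a minimal sketch; the dev_test keyspace name is just an example.

    ~/apache-cassandra-3.9/bin/cqlsh 192.168.50.41

    -- inside cqlsh: create a small keyspace replicated across 3 of the 5 nodes, then list keyspaces
    CREATE KEYSPACE IF NOT EXISTS dev_test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
    DESCRIBE KEYSPACES;
    exit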

Setting up Python on Linux and using Virtualenv

I am writing this because I found out that there are a couple of different ways to install Python on a *nix-based computer. I ran around in circles for a while before I figured it out.

Basically, I realized that in order to have multiple versions of Python, I could either:

A) Install the new Python version and then alias to it, or

B) Use something like virtualenv to switch between versions.

After reading this post, virtualenv seemed like a powerful option. It follows a pattern similar to nvm and sdkman, which are really great tools for developers who want to rapidly switch between different projects.

So, here is how I install Python now.

This is on a CentOS 6.5 machine which has Python 2.6 installed on it.

  1. From the command prompt, install the libraries needed to build Python.

    sudo yum install zlib-devel openssl openssl-devel

  2. Install wget.

    sudo yum install wget

  3. Install Python 2.7. I use the --prefix option to install it to an isolated location.

    cd ~

    wget http://www.python.org/ftp/python/2.7.8/Python-2.7.8.tar.xz

    xz -d Python-2.7.8.tar.xz

    tar -xvf Python-2.7.8.tar

    cd ~/Python-2.7.8

    ./configure --prefix=/opt/python2.7

    make

    sudo make altinstall

  4. Now, create an alias in your ~/.bash_profile to the new version of Python. Add the following line to the bottom of the file.

      alias python="/opt/python2.7/bin/python2.7"
    

  5. Source the profile.

    source ~/.bash_profile

Now, when you type python at the command prompt, it should bring you into the 2.7 shell and not the 2.6 one. The downside here is that all of your Python packages are shared between versions, so you risk running a package against the wrong interpreter when you update Python.
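
A quick way to confirm the alias took effect (the version numbers assume the 2.7.8 build from step 3):

    type python                  # should say: python is aliased to `/opt/python2.7/bin/python2.7'
    python --version             # Python 2.7.8
    /usr/bin/python --version    # the stock system interpreter is still 2.6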

Virtualenv

Using virtualenv, you can skip having to mess with aliases. Another benefit is that pip packages stay isolated from the other Python versions.

So, from the last step, here's how we can install virtualenv.

  1. In the ~/.bash_profile, remove the alias for python that we just added.

  2. Source the profile.

    source ~/.bash_profile

  3. Install virtualenv.

    sudo yum install python-pip

    sudo pip install virtualenv

  4. Create the virtual environment.

    virtualenv --system-site-packages -p /opt/python2.7/bin/python2.7 ~/python-vms/python2.7

    cd ~/python-vms/python2.7

    source bin/activate
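
Once the environment is activated, the prompt gains a (python2.7) prefix and anything installed with pip stays inside ~/python-vms/python2.7. A short usage sketch (the requests package is only an example):

    python --version        # should report Python 2.7.8
    pip install requests    # installs into the virtualenv, not the system site-packages
    deactivate              # drop back to the system Python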

How to set up a basic HBase cluster

This article is a guide to setting up an HBase cluster. The cluster runs on local CentOS virtual machines using VirtualBox. I use this to have a local environment for development and testing.

Prerequisites

This setup guide assumes you have gone through the Hadoop Setup Guide and the Zookeeper Setup Guide.

It assumes you are using the following software versions.

  • MacOS 10.11.3
  • Vagrant 1.8.5
  • Java 1.8.0
  • Zookeeper 3.4.8
  • Hadoop 2.7.3
  • HBase 1.2.2

Here are the steps I used:

  1. First, create a workspace.

    mkdir -p ~/vagrant_boxes/hbase

    cd ~/vagrant_boxes/hbase

  2. Next, create a new vagrant box. I'm using a minimal CentOS vagrant box.

    vagrant box add "CentOS 6.5 x86_64" https://github.com/2creatives/vagrant-centos/releases/download/v6.5.3/centos65-x86_64-20140116.box

  3. We are going to create a vagrant box with the packages we need. So, first we initialize the vagrant box.

    vagrant init -m "CentOS 6.5 x86_64" hbase_base

  4. Next, change the Vagrantfile to the following:

      Vagrant.configure(2) do |config|
        config.vm.box = "CentOS 6.5 x86_64"
        config.vm.box_url = "hbase_base"
        config.ssh.insert_key = false
      end
    

  5. Now, install HBase and its dependencies.

    vagrant up

    vagrant ssh

    sudo yum install java-1.8.0-openjdk-devel

    sudo yum install wget

    wget -P ~ https://www.apache.org/dist/hbase/stable/hbase-1.2.2-bin.tar.gz

    gunzip -c *gz | tar xvf -

  6. Open up your ~/.bash_profile and append the following lines.

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
      export PATH=$PATH:$JAVA_HOME/bin
      export HBASE_HOME=~/hbase-1.2.2
      export PATH=$PATH:$HBASE_HOME/bin
      export HBASE_CONF_DIR=$HBASE_HOME/conf
    

  7. Source the profile.

    source ~/.bash_profile

  8. Create a ~/.ssh/config file to disable host key checking for SSH. Since these are DEV servers, this is ok. The indentation before StrictHostKeyChecking is only for readability.

      Host *
            StrictHostKeyChecking no
    

  9. Now run these commands to finish the password-less authentication.

    chmod 600 ~/.ssh/config

    ssh-keygen -f ~/.ssh/id_rsa -t rsa -P ""

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  10. In /etc/hosts, remove any lines starting with 127.0.*, and add the following lines.

      192.168.50.11 zoo1.example.com
      192.168.50.12 zoo2.example.com
      192.168.50.13 zoo3.example.com
      192.168.50.14 zoo4.example.com
      192.168.50.15 zoo5.example.com
      192.168.50.21 hdfs-namenode.example.com
      192.168.50.22 hdfs-datanode1.example.com
      192.168.50.23 hdfs-datanode2.example.com
      192.168.50.24 hdfs-datanode3.example.com
      192.168.50.25 hdfs-datanode4.example.com
      192.168.50.31 hbase-master.example.com
      192.168.50.32 hbase-region1.example.com
      192.168.50.33 hbase-region2.example.com
      192.168.50.34 hbase-region3.example.com
      192.168.50.35 hbase-region4.example.com
    

  11. In ~/hbase-1.2.2/conf/hbase-env.sh, append the following lines to the bottom of the file.

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
      export HBASE_MANAGES_ZK=false
    

  12. Edit ~/hbase-1.2.2/conf/hbase-site.xml to contain the following:

      <configuration>
        <property>
          <name>hbase.rootdir</name>
          <value>hdfs://hdfs-namenode.example.com:9000/hbase</value>
        </property>
        <property>
          <name>hbase.master.hostname</name>
          <value>hbase-master.example.com</value>
        </property>
        <property>
          <name>hbase.cluster.distributed</name>
          <value>true</value>
        </property>
        <property>
          <name>hbase.zookeeper.quorum</name>
          <value>zoo1.example.com,zoo2.example.com,zoo3.example.com,zoo4.example.com,zoo5.example.com</value>
        </property>
        <property>
          <name>hbase.zookeeper.property.dataDir</name>
          <value>/tmp/zookeeper</value>
        </property>
      </configuration>
    

  13. In ~/hbase-1.2.2/conf/regionservers, remove localhost and add the following lines:

      hbase-region1.example.com
      hbase-region2.example.com
      hbase-region3.example.com
      hbase-region4.example.com
    

  14. The docs say you can create a "backup-masters" file in the conf directory, but I had a problem starting my cluster when I did. So, I skipped this step.

  15. Exit the SSH session and copy the VM for the other hbase nodes.

    exit

    vagrant halt

    vagrant package

    vagrant box add hbase ~/vagrant_boxes/hbase/package.box

  16. Edit the Vagrantfile to look like the following. This will create 5 HBase nodes for us using the new HBase VM.

      Vagrant.configure("2") do |config|
        config.vm.define "hbase-master" do |node|
          node.vm.box = "hbase"
          node.vm.box_url = "hbase-master.example.com"
          node.vm.hostname = "hbase-master.example.com"
          node.vm.network :private_network, ip: "192.168.50.31"
          node.ssh.insert_key = false
    
          # Change hostname
          node.vm.provision "shell", inline: "hostname hbase-master.example.com", privileged: true
        end
    
        (1..4).each do |i|
          config.vm.define "hbase-region#{i}" do |node|
            node.vm.box = "hbase"
            node.vm.box_url = "hbase-region#{i}.example.com"
            node.vm.hostname = "hbase-region#{i}.example.com"
            node.vm.network :private_network, ip: "192.168.50.3#{i+1}"
            node.ssh.insert_key = false
    
            # Change hostname
            node.vm.provision "shell", inline: "hostname hbase-region#{i}.example.com", privileged: true
          end
        end
      end
    

  17. Bring the new Vagrant VMs up.

    vagrant up --no-provision

  18. Start HBase. For some reason, I can't start HBase from the provisioner, so I ssh in and start it up.

    vagrant provision

    vagrant ssh hbase-master

    ~/hbase-1.2.2/bin/start-hbase.sh

To test the cluster:

  1. Log into the Master Server and run 'jps' on the command line. You should see at least these two processes.

    jps
    Jps
    HMaster

  2. Log into one of the Region Servers and run 'jps' on the command line. You should see at least these three processes.

    jps
    Jps
    HMaster
    HRegionServer

  3. Go to http://192.168.50.31:16010/ and you should see all of the Region Servers running.

  4. From the Master Server, start the HBase shell.

    vagrant ssh hbase-master

    sudo ~/hbase-1.2.2/bin/hbase shell

  5. At the command prompt, you should be able to create a table.

    create 'test', 'cf'

  6. And you should be able to list the table.

    list

  7. And you should be able to put data into the table.

    put 'test', 'row1', 'cf:a', 'value1'

    put 'test', 'row2', 'cf:b', 'value2'

    put 'test', 'row3', 'cf:c', 'value3'

  8. And you should be able to view all the data in the table.

    scan 'test'

  9. Or just get one row.

    get 'test', 'row1'
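
  10. When you are done, you can clean up the test table. HBase requires disabling a table before it can be dropped.

    disable 'test'

    drop 'test'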

How to set up a basic Hadoop cluster

This article is a guide to setting up a Hadoop cluster. The cluster runs on local CentOS virtual machines using VirtualBox. I use this to have a local environment for development and testing.

I followed many of the steps Austin Ouyang laid out in the blog post here. Hopefully, next I can document moving these virtual machines to a cloud provider.

Prerequisites

It assumes you are using the following software versions.

  • MacOS 10.11.3
  • Vagrant 1.8.5
  • Java 1.8.0 (the JRE is fine; use the JDK if you want to run the MapReduce examples later)
  • Hadoop 2.7.3

Here are the steps I used:

  1. First, create a workspace.

    mkdir -p ~/vagrant_boxes/hadoop

    cd ~/vagrant_boxes/hadoop

  2. Next, create a new vagrant box. I'm using a minimal CentOS vagrant box.

    vagrant box add "CentOS 6.5 x86_64" https://github.com/2creatives/vagrant-centos/releases/download/v6.5.3/centos65-x86_64-20140116.box

  3. We are going to create a vagrant box with the packages we need. So, first we initialize the vagrant box.

    vagrant init -m "CentOS 6.5 x86_64" hadoop_base

  4. Next, change the Vagrantfile to the following:

      Vagrant.configure(2) do |config|
        config.vm.box = "CentOS 6.5 x86_64"
        config.vm.box_url = "hadoop_base"
        config.ssh.insert_key = false
      end
    

  5. Now, install Hadoop and its dependencies.

    vagrant up

    vagrant ssh

    sudo yum install java-1.8.0-openjdk-devel

    sudo yum install wget

    wget -P ~ http://apache.claz.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

    gunzip -c *gz | tar xvf -

  6. Open up your ~/.bash_profile and append the following lines.

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
      export PATH=$PATH:$JAVA_HOME/bin
      export HADOOP_HOME=~/hadoop-2.7.3
      export PATH=$PATH:$HADOOP_HOME/bin
      export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    

  7. Source the profile.

    source ~/.bash_profile

  8. In /etc/hosts, add the following lines:

      192.168.50.21 namenode.example.com
      192.168.50.22 datanode1.example.com
      192.168.50.23 datanode2.example.com
      192.168.50.24 datanode3.example.com
      192.168.50.25 datanode4.example.com
    

  9. In $HADOOP_CONF_DIR/hadoop-env.sh, replace the ${JAVA_HOME} reference with the explicit JDK path.

      # The java implementation to use.
      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
    

  10. Edit the $HADOOP_CONF_DIR/core-site.xml file to have the following XML:

      <configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://namenode.example.com:9000</value>
        </property>
      </configuration>
    

  11. Edit the $HADOOP_CONF_DIR/yarn-site.xml file to have the following XML:

      <configuration>
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>namenode.example.com</value>
        </property>
      </configuration>
    

  12. Now, copy the mapred-site.xml file from a template.

    cp $HADOOP_CONF_DIR/mapred-site.xml.template $HADOOP_CONF_DIR/mapred-site.xml

  13. Edit the $HADOOP_CONF_DIR/mapred-site.xml to have the following XML:

      <configuration>
        <property>
          <name>mapreduce.jobtracker.address</name>
          <value>namenode.example.com:54311</value>
        </property>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
      </configuration>
    

  14. Edit the $HADOOP_CONF_DIR/hdfs-site.xml file to have the following XML:

      <configuration>
        <property>
          <name>dfs.replication</name>
          <value>3</value>
        </property>
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>/data/hadoop/hdfs/namenode</value>
        </property>
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>/data/hadoop/hdfs/datanode</value>
        </property>
      </configuration>
    

  15. Make the data directories.

    sudo mkdir -p /data/hadoop/hdfs/namenode

    sudo mkdir -p /data/hadoop/hdfs/datanode

    sudo chown -R vagrant:vagrant /data/hadoop

  16. In $HADOOP_CONF_DIR/masters, add the following line:

      namenode.example.com
    

  17. In $HADOOP_CONF_DIR/slaves, add the following lines:

      datanode1.example.com
      datanode2.example.com
      datanode3.example.com
      datanode4.example.com
    

  18. Create a ~/.ssh/config file to disable host key checking for SSH. Since these are DEV servers, this is ok. The indentation before StrictHostKeyChecking is only for readability.

      Host *
            StrictHostKeyChecking no
    

  19. Now run these commands to finish the password-less authentication.

    chmod 600 ~/.ssh/config

    sudo hostname namenode.example.com

    ssh-keygen -f ~/.ssh/id_rsa -t rsa -P ""

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  20. Exit the SSH session and copy the VM for the other hadoop nodes.

    exit

    vagrant halt

    vagrant package

    vagrant box add hadoop ~/vagrant_boxes/hadoop/package.box

  21. Edit the Vagrantfile to look like the following. This will create 5 Hadoop nodes for us using the new Hadoop VM.

      Vagrant.configure("2") do |config|
        config.vm.define "hadoop-namenode" do |node|
          node.vm.box = "hadoop"
          node.vm.box_url = "namenode.example.com"
          node.vm.hostname = "namenode.example.com"
          node.vm.network :private_network, ip: "192.168.50.21"
          node.ssh.insert_key = false
    
          # Start Hadoop
          node.vm.provision "shell", inline: "hdfs namenode -format -force", privileged: false
          node.vm.provision "shell", inline: "~/hadoop-2.7.3/sbin/start-dfs.sh", privileged: false
          node.vm.provision "shell", inline: "~/hadoop-2.7.3/sbin/start-yarn.sh", privileged: false
          node.vm.provision "shell", inline: "~/hadoop-2.7.3/sbin/mr-jobhistory-daemon.sh start historyserver", privileged: false
        end
    
        (1..4).each do |i|
          config.vm.define "hadoop-datanode#{i}" do |node|
            node.vm.box = "hadoop"
            node.vm.box_url = "datanode#{i}.example.com"
            node.vm.hostname = "datanode#{i}.example.com"
            node.vm.network :private_network, ip: "192.168.50.2#{i+1}"
            node.ssh.insert_key = false
          end
        end
      end
    

  22. Bring the new Vagrant VMs up.

    vagrant up --no-provision

  23. Start Hadoop up on the namenode.

    vagrant provision

To test whether Hadoop is working, you can do the following.

First, from your local machine, you should be able to access the Web UI (http://192.168.50.21:50070/). You should see all of your datanodes listed as live nodes. Follow the MapReduce Tutorial to test your cluster further.

The main overview screen of the Hadoop admin console.
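
To exercise HDFS and YARN directly from the namenode, here is a minimal smoke test. It assumes the examples jar that ships inside the Hadoop 2.7.3 tarball; the pi job arguments (2 maps, 10 samples per map) are just small example values.

    vagrant ssh hadoop-namenode

    # Write a file into HDFS and list it back
    hdfs dfs -mkdir -p /user/vagrant
    hdfs dfs -put ~/hadoop-2.7.3/etc/hadoop/core-site.xml /user/vagrant/
    hdfs dfs -ls /user/vagrant

    # Run the bundled pi estimator on YARN
    yarn jar ~/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 2 10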

How to set up a basic ZooKeeper ensemble

This article is a guide to setting up a ZooKeeper ensemble. I use this to have a local environment for development and testing.

For Zookeeper to work, you really only need to configure a couple of things. The first is the zoo.cfg file, and the second is a myid file in the dataDir. See this link for more info.
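
For example, on the first server the myid file contains just the digit 1; a minimal sketch (the Vagrantfile later in this guide writes these files during provisioning):

    # on server.1 only; servers 2 through 5 get 2, 3, 4 and 5 respectively
    mkdir -p /tmp/zookeeper
    echo '1' > /tmp/zookeeper/myid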

Prerequisites

It assumes you are using the following software versions.

  • MacOS 10.11.3
  • Vagrant 1.8.5
  • Java 1.8.0
  • Zookeeper 3.4.8

Here are the steps

  1. First, create a workspace.

    mkdir -p ~/vagrant_boxes/zookeeper

    cd ~/vagrant_boxes/zookeeper

  2. Next, create a new vagrant box. I'm using a minimal CentOS vagrant box.

    vagrant box add "CentOS 6.5 x86_64" https://github.com/2creatives/vagrant-centos/releases/download/v6.5.3/centos65-x86_64-20140116.box

  3. We are going to create a vagrant box with the packages we need. So, first we initialize the vagrant box.

    vagrant init -m "CentOS 6.5 x86_64" zoo_base

  4. Next, change the Vagrantfile to the following:

      Vagrant.configure(2) do |config|
        config.vm.box = "CentOS 6.5 x86_64"
        config.vm.box_url = "zoo_base"
        config.ssh.insert_key = false
      end
    

  5. Now, install Zookeeper and its dependencies.

    vagrant up

    vagrant ssh

    sudo yum install java-1.8.0-openjdk-devel

    sudo yum install wget

    wget -P ~ http://apache-mirror.rbc.ru/pub/apache/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz

    gunzip -c *gz | tar xvf -

  6. Open up your ~/.bash_profile and append the following lines.

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
      export PATH=$PATH:$JAVA_HOME/bin
      export ZOOKEEPER_HOME=~/zookeeper-3.4.8
      export PATH=$PATH:$ZOOKEEPER_HOME/bin
      export ZOOKEEPER_CONF_DIR=$ZOOKEEPER_HOME/conf
    

  7. Source the profile.

    source ~/.bash_profile

  8. Create a ~/zookeeper-3.4.8/conf/zoo.cfg file with the following contents.

      tickTime=2000
      dataDir=/tmp/zookeeper/
      clientPort=2181
      initLimit=5
      syncLimit=2
      server.1=192.168.50.11:2888:3888
      server.2=192.168.50.12:2888:3888
      server.3=192.168.50.13:2888:3888
      server.4=192.168.50.14:2888:3888
      server.5=192.168.50.15:2888:3888
    

  9. Exit the SSH session and copy the VM for the other zookeeper nodes.

    exit

    vagrant halt

    vagrant package

    vagrant box add zookeeper ~/vagrant_boxes/zookeeper/package.box

  10. Edit the Vagrantfile to look like the following. This will create 5 Zookeeper nodes for us using the new Zookeeper VM.

      Vagrant.configure("2") do |config|
        (1..5).each do |i|
          config.vm.define "zoo#{i}" do |node|
            node.vm.box = "zookeeper"
            node.vm.box_url = "zoo#{i}"
            node.vm.hostname = "zoo#{i}"
            node.vm.network :private_network, ip: "192.168.50.1#{i}"
    
            # Zookeeper needs an ID file for each node
            node.vm.provision "shell", inline: "mkdir -p /tmp/zookeeper; echo '#{i}' >> /tmp/zookeeper/myid", privileged: false
    
            # Start Zookeeper
            node.vm.provision "shell", inline: "~/zookeeper-3.4.8/bin/zkServer.sh start", privileged: false
    
            node.ssh.insert_key = false
          end
        end
      end
    

  11. Bring the new Vagrant VMs up.

    vagrant up --no-provision

    vagrant provision

Running ZooKeeper

To test whether Zookeeper is working, you can do the following.

  1. SSH into zoo1.

    vagrant ssh zoo1

  2. Start Zookeeper CLI.

    ~/zookeeper-3.4.8/bin/zkCli.sh -server 192.168.50.11:2181

  3. Create a new znode and associate the string "my_data" with it.

    create /zk_test my_data

  4. Now exit the CLI and SSH session and log into zoo4.

    quit

    exit

    vagrant ssh zoo4

  5. Connect to the Zookeeper CLI again (notice the IP changed).

    ~/zookeeper-3.4.8/bin/zkCli.sh -server 192.168.50.14:2181

  6. You should be able to see the /zk_test znode with an ls command (the output should look like this: "[zookeeper, zk_test]").

    ls /
    [zookeeper, zk_test]
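
You can also ask each server for its role in the ensemble. A minimal sketch, run from inside any of the VMs; the nc check assumes the nc package is installed (sudo yum install nc if it is not).

    # one server should report "leader" and the others "follower"
    ~/zookeeper-3.4.8/bin/zkServer.sh status

    # or query a node over its client port; a healthy server answers "imok"
    echo ruok | nc 192.168.50.11 2181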