How to Set Up an Apache Cassandra Cluster

This article is a guide to setting up an Apache Cassandra cluster. The cluster runs on local CentOS virtual machines using VirtualBox. I use it as a local environment for development and testing.

Prerequisites

This guide assumes the following software versions.

  • MacOS 10.11.3
  • Vagrant 1.8.5
  • Java 1.8.0
  • Apache Cassandra 3.9

Here are the steps I used:

  1. First, create a workspace.

    mkdir -p ~/vagrant_boxes/cassandra

    cd ~/vagrant_boxes/cassandra

  2. Next, add a base Vagrant box. I'm using a minimal CentOS box.

    vagrant box add "CentOS 6.5 x86_64" https://github.com/2creatives/vagrant-centos/releases/download/v6.5.3/centos65-x86_64-20140116.box

  3. I need a basic VM to install the packages on. This command generates a minimal Vagrantfile for it.

    vagrant init -m "CentOS 6.5 x86_64" cassandra_base

  4. Next, change the Vagrantfile to the following:

      Vagrant.configure(2) do |config|
        config.vm.box = "CentOS 6.5 x86_64"
        config.vm.box_url = "cassandra_base"
        config.ssh.insert_key = false
      end
    

  5. Now, install Cassandra and its dependencies.

    vagrant up

    vagrant ssh

    sudo yum install java-1.8.0-openjdk-devel

    sudo yum install wget

    wget -P ~ http://www-us.apache.org/dist/cassandra/3.9/apache-cassandra-3.9-bin.tar.gz

    gunzip -c *gz | tar xvf -

  6. Open up your ~/.bash_profile and append the following lines.

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
      export PATH=$PATH:$JAVA_HOME/bin
      export CASSANDRA_HOME=~/apache-cassandra-3.9
      export PATH=$PATH:$CASSANDRA_HOME/bin
      export CASSANDRA_CONF_DIR=$CASSANDRA_HOME/conf
    

  7. Source the profile.

    source ~/.bash_profile
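
Before moving on, it can help to confirm that the new variables actually took effect. The following is a minimal shell sketch, not part of Cassandra or Vagrant; `check_cassandra_env` is a hypothetical helper name, and it only checks that each variable from step 6 is set and points at an existing directory.

```shell
# Hypothetical helper (not part of Cassandra): report whether the variables
# exported in ~/.bash_profile are set and point at existing directories.
check_cassandra_env() {
  local status=0 pair name value
  for pair in \
    "JAVA_HOME=${JAVA_HOME:-}" \
    "CASSANDRA_HOME=${CASSANDRA_HOME:-}" \
    "CASSANDRA_CONF_DIR=${CASSANDRA_CONF_DIR:-}"
  do
    name=${pair%%=*}    # text before the first "="
    value=${pair#*=}    # text after the first "="
    if [ -z "$value" ]; then
      echo "$name is not set"
      status=1
    elif [ ! -d "$value" ]; then
      echo "$name points to a missing directory: $value"
      status=1
    else
      echo "$name=$value"
    fi
  done
  return "$status"
}
```

Running `check_cassandra_env` after sourcing the profile prints one line per variable and returns a non-zero status if any of them is missing or wrong.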

  8. Create a ~/.ssh/config file to disable host key checking for SSH. Since these are DEV servers, this is OK. The indentation before StrictHostKeyChecking is optional; it only makes the file easier to read.

      Host *
            StrictHostKeyChecking no
    

  9. Now run these commands to finish setting up passwordless SSH authentication.

    chmod 600 ~/.ssh/config

    ssh-keygen -f ~/.ssh/id_rsa -t rsa -P ""

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  10. In /etc/hosts, add the following lines.

      192.168.50.41 cassandra1.example.com
      192.168.50.42 cassandra2.example.com
      192.168.50.43 cassandra3.example.com
      192.168.50.44 cassandra4.example.com
      192.168.50.45 cassandra5.example.com
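
Since the entries follow a fixed pattern, they can also be generated rather than typed. This is a small sketch under that assumption; `hosts_entries` is a hypothetical helper name, and the pattern only holds for 1 to 9 nodes because the node number is the last digit of the address.

```shell
# Print one /etc/hosts entry per node, following the address and hostname
# pattern used in this guide. The pattern only holds for 1 to 9 nodes.
hosts_entries() {
  local count=${1:-5} i=1
  while [ "$i" -le "$count" ]; do
    printf '192.168.50.4%d cassandra%d.example.com\n' "$i" "$i"
    i=$((i + 1))
  done
}

hosts_entries 5
```

To append the output instead of pasting it, something like `hosts_entries 5 | sudo tee -a /etc/hosts` should work inside the VM.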
    

  11. In ~/apache-cassandra-3.9/conf/cassandra.yaml, comment out the line below. With rpc_address unset, Cassandra binds client connections to the address associated with the node's hostname instead of localhost, which allows our host machine to connect to it.

    rpc_address: localhost
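
If you would rather not open an editor, the same change can be made with sed. This is a sketch, assuming the file contains an active line that starts with `rpc_address:`; `comment_out_rpc_address` is a hypothetical helper name.

```shell
# Hypothetical helper: comment out the active rpc_address line in the given
# cassandra.yaml. sed keeps a copy of the original file with a .bak suffix.
comment_out_rpc_address() {
  local conf=$1
  sed -i.bak 's/^rpc_address:/# rpc_address:/' "$conf"
}
```

Inside the VM it would be called as `comment_out_rpc_address ~/apache-cassandra-3.9/conf/cassandra.yaml`. Lines that are already commented out are left alone because the pattern is anchored to the start of the line.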

  12. It's a good idea to install Python 2.7 at this point, since the CQL shell (cqlsh) requires it and the CentOS image we are using ships with Python 2.6. Follow the instructions here to update Python.

  13. Exit the SSH session and copy the VM for the other Cassandra nodes.

    exit

    vagrant halt

    vagrant package

    vagrant box add cassandra ~/vagrant_boxes/cassandra/package.box

  14. Edit the Vagrantfile to look like the following. This will create 5 Cassandra nodes.

      Vagrant.configure("2") do |config|
        (1..5).each do |i|
          config.vm.define "cassandra#{i}" do |node|
            node.vm.box = "cassandra"
            node.vm.box_url = "cassandra#{i}.example.com"
            node.vm.hostname = "cassandra#{i}.example.com"
            node.vm.network :private_network, ip: "192.168.50.4#{i}"
            node.ssh.insert_key = false
    
            # Set the cluster name, seed node, and listen address in the conf file.
            node.vm.provision "shell", inline: "sed -i 's/^cluster_name: .*/cluster_name: \"My Cluster\"/' ~/apache-cassandra-3.9/conf/cassandra.yaml", privileged: false
            node.vm.provision "shell", inline: "sed -i 's/- seeds: .*/- seeds: \"192.168.50.41\"/' ~/apache-cassandra-3.9/conf/cassandra.yaml", privileged: false
            node.vm.provision "shell", inline: "sed -i 's/listen_address: .*/listen_address: \"192.168.50.4#{i}\"/' ~/apache-cassandra-3.9/conf/cassandra.yaml", privileged: false
          end
        end
      end
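
To see what the three provisioners do before running them on real nodes, the same substitutions can be tried on a copy of cassandra.yaml. This is a sketch that mirrors the sed expressions in the Vagrantfile; `configure_node` is a hypothetical helper name, and the second argument stands in for the `#{i}` node number.

```shell
# Hypothetical helper: apply the same three substitutions as the Vagrantfile
# provisioners to the cassandra.yaml at $1, for node number $2.
# sed keeps a copy of the original file with a .bak suffix.
configure_node() {
  local conf=$1 node=$2
  sed -i.bak \
    -e 's/^cluster_name: .*/cluster_name: "My Cluster"/' \
    -e 's/- seeds: .*/- seeds: "192.168.50.41"/' \
    -e "s/listen_address: .*/listen_address: \"192.168.50.4${node}\"/" \
    "$conf"
}
```

For example, `cp cassandra.yaml /tmp/test.yaml && configure_node /tmp/test.yaml 3` followed by `diff /tmp/test.yaml.bak /tmp/test.yaml` shows exactly which lines would change for cassandra3.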
    

  15. Bring the new Vagrant VMs up.

    vagrant up --no-provision

    vagrant provision

  16. Start Cassandra one node at a time. Each node has to finish starting before the next one joins the cluster, so I SSH into each VM in turn, start Cassandra, and wait about 60 seconds until it has completely started.

    vagrant ssh cassandra1

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra2

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra3

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra4

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra5

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit
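
The fixed 60-second wait is a rule of thumb. As an alternative, here is a sketch of a helper that polls `nodetool status` inside the VM until the node reports itself as UN (Up/Normal). `wait_for_node` and the `NODETOOL` override variable are hypothetical names introduced for this example, and the default of 30 attempts spaced 5 seconds apart is an arbitrary choice.

```shell
# Hypothetical helper: poll `nodetool status` until the node with the given
# address is reported as UN (Up/Normal), or give up after a number of tries.
# NODETOOL is an assumed override for the nodetool path, used for testing.
wait_for_node() {
  local address=$1 attempts=${2:-30} delay=${3:-5} n=0
  local nodetool=${NODETOOL:-$HOME/apache-cassandra-3.9/bin/nodetool}
  while [ "$n" -lt "$attempts" ]; do
    # nodetool fails while Cassandra is still starting; treat that as "not yet".
    if "$nodetool" status 2>/dev/null |
      awk -v addr="$address" '$1 == "UN" && $2 == addr { up = 1 } END { exit !up }'
    then
      return 0
    fi
    n=$((n + 1))
    sleep "$delay"
  done
  return 1
}
```

For example, on cassandra1 you would start Cassandra as shown above and then run `wait_for_node 192.168.50.41` before exiting; a non-zero return status means the node did not reach UN within the allowed attempts.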

To verify Cassandra

  1. Verify that Cassandra is running correctly.

    vagrant ssh cassandra1

    ~/apache-cassandra-3.9/bin/nodetool status
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
    UN  192.168.50.41  108.49 KiB  256     38.0%             74b9bb75-e941-4e4c-9900-0b30c7b204b4  rack1
    UN  192.168.50.42  108.2 KiB   256     41.8%             8ee92d98-2a83-491d-b39f-fc3263217dc7  rack1
    UN  192.168.50.43  84.1 KiB    256     41.0%             f30a37fd-8763-4f11-a4be-7f413e062bd9  rack1
    UN  192.168.50.44  115.62 KiB  256     38.8%             15356519-30e0-435e-a865-47abfbef5a81  rack1
    UN  192.168.50.45  96.62 KiB   256     40.4%             213e082a-92fd-4948-84be-fe941a0801b1  rack1
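
Each node line starts with a two-letter code: the first letter is the status (U for Up, D for Down) and the second is the state (N, L, J, or M), so UN means Up and Normal. If you want to check this from a script rather than by eye, a sketch like the following may help. Both function names are hypothetical, and both simply read `nodetool status` output from standard input.

```shell
# Count the nodes reported as UN (Up/Normal) in `nodetool status` output
# read from standard input.
count_up_normal() {
  awk '$1 == "UN" { n++ } END { print n + 0 }'
}

# Print the address of every node whose two-letter code is not UN.
nodes_not_up_normal() {
  awk '$1 ~ /^[UD][NLJM]$/ && $1 != "UN" { print $2 }'
}
```

Usage would look like `~/apache-cassandra-3.9/bin/nodetool status | count_up_normal`. For the output shown above, `count_up_normal` prints 5 and `nodes_not_up_normal` prints nothing.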
