How to Set Up an Apache Cassandra Cluster

This article is a guide to setting up an Apache Cassandra cluster. The cluster runs on local CentOS virtual machines using VirtualBox. I use this to have a local environment for development and testing.

Prerequisites

This guide assumes you are using the following software versions.

  • MacOS 10.11.3
  • Vagrant 1.8.5
  • Java 1.8.0
  • Apache Cassandra 3.9

Here are the steps I used:

  1. First, create a workspace.

    mkdir -p ~/vagrant_boxes/cassandra

    cd ~/vagrant_boxes/cassandra

  2. Next, create a new vagrant box. I'm using a minimal CentOS vagrant box.

    vagrant box add "CentOS 6.5 x86_64" https://github.com/2creatives/vagrant-centos/releases/download/v6.5.3/centos65-x86_64-20140116.box

  3. I need a basic VM to install the packages. This command creates one.

    vagrant init -m "CentOS 6.5 x86_64" cassandra_base

  4. Next, change the Vagrantfile to the following:

      Vagrant.configure(2) do |config|
        config.vm.box = "CentOS 6.5 x86_64"
        config.vm.box_url = "cassandra_base"
        config.ssh.insert_key = false
      end
    

  5. Now, install Cassandra and its dependencies.

    vagrant up

    vagrant ssh

    sudo yum install java-1.8.0-openjdk-devel

    sudo yum install wget

    wget -P ~ http://www-us.apache.org/dist/cassandra/3.9/apache-cassandra-3.9-bin.tar.gz

    gunzip -c *gz | tar xvf -

  6. Open up your ~/.bash_profile and append the following lines.

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
      export PATH=$PATH:$JAVA_HOME/bin
      export CASSANDRA_HOME=~/apache-cassandra-3.9
      export PATH=$PATH:$CASSANDRA_HOME/bin
      export CASSANDRA_CONF_DIR=$CASSANDRA_HOME/conf
    

  7. Source the profile.

    source ~/.bash_profile

  8. Create a ~/.ssh/config file to avoid host key checking for SSH. Since these are DEV servers, this is ok. Note that the indentation here before StrictHostKeyChecking must be a tab.

      Host *
            StrictHostKeyChecking no
    

  9. Now run these commands to finish the password-less authentication.

    chmod 600 ~/.ssh/config

    ssh-keygen -f ~/.ssh/id_rsa -t rsa -P ""

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  10. In /etc/hosts, add the following lines.

      192.168.50.41 cassandra1.example.com
      192.168.50.42 cassandra2.example.com
      192.168.50.43 cassandra3.example.com
      192.168.50.44 cassandra4.example.com
      192.168.50.45 cassandra5.example.com
    

  11. In ~/apache-cassandra-3.9/conf/cassandra.yaml, comment out the line below. This tells Cassandra to bind to the node's IP address instead of localhost and allows our host machine to connect to it.

    rpc_address: localhost
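
    One way to comment the line out without opening an editor is a quick sed, run from inside the VM (a sketch; editing the file by hand works just as well):

    sed -i 's/^rpc_address: localhost/# rpc_address: localhost/' ~/apache-cassandra-3.9/conf/cassandra.yaml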

  12. It's a good idea to install Python 2.7 here, since the CQL shell (cqlsh) requires that version. The CentOS image we are using ships with Python 2.6. Follow the instructions in the Python setup guide later in this document to update Python.
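
    For example, you can check which interpreter is on the PATH before and after the upgrade:

    python --version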

  13. Exit the SSH session and copy the VM for the other Cassandra nodes.

    exit

    vagrant halt

    vagrant package

    vagrant box add cassandra ~/vagrant_boxes/cassandra/package.box

  14. Edit the Vagrantfile to look like the following. This will create 5 Cassandra nodes.

      Vagrant.configure("2") do |config|
        (1..5).each do |i|
          config.vm.define "cassandra#{i}" do |node|
            node.vm.box = "cassandra"
            node.vm.box_url = "cassandra#{i}.example.com"
            node.vm.hostname = "cassandra#{i}.example.com"
            node.vm.network :private_network, ip: "192.168.50.4#{i}"
            node.ssh.insert_key = false
    
            # Set the cluster name, seed list, and listen address in cassandra.yaml.
            node.vm.provision "shell", inline: "sed -i 's/^cluster_name: .*/cluster_name: \"My Cluster\"/' ~/apache-cassandra-3.9/conf/cassandra.yaml", privileged: false
            node.vm.provision "shell", inline: "sed -i 's/- seeds: .*/- seeds: \"192.168.50.41\"/' ~/apache-cassandra-3.9/conf/cassandra.yaml", privileged: false
            node.vm.provision "shell", inline: "sed -i 's/listen_address: .*/listen_address: \"192.168.50.4#{i}\"/' ~/apache-cassandra-3.9/conf/cassandra.yaml", privileged: false
          end
        end
      end
    

  15. Bring the new Vagrant VMs up.

    vagrant up --no-provision

    vagrant provision

  16. Start Cassandra. This is a little tricky because each node has to finish starting completely before the next one can join the cluster. I SSH into each VM individually, start Cassandra, and wait about 60 seconds until it has fully started. (A scripted version of this sequence is sketched after the commands below.)

    vagrant ssh cassandra1

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra2

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra3

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra4

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit

    vagrant ssh cassandra5

    ~/apache-cassandra-3.9/bin/cassandra

    # Wait 60 seconds

    exit
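
If you prefer not to repeat those commands by hand, the same sequence can be scripted from the host machine. This is a minimal, untested sketch that assumes the machine names cassandra1 through cassandra5 from the Vagrantfile above:

    # Start Cassandra on each node in turn, giving it time to join the ring before moving on.
    for i in 1 2 3 4 5; do
      vagrant ssh "cassandra$i" -c "nohup ~/apache-cassandra-3.9/bin/cassandra > ~/cassandra.log 2>&1 &"
      sleep 60
    done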

To verify Cassandra

  1. Verify that Cassandra is running correctly.

    vagrant ssh cassandra1

    ~/apache-cassandra-3.9/bin/nodetool status
    Datacenter: datacenter1
    =======================
    Status=Up/Down
    |/ State=Normal/Leaving/Joining/Moving
    --  Address        Load        Tokens  Owns (effective)  Host ID                               Rack
    UN  192.168.50.41  108.49 KiB  256     38.0%             74b9bb75-e941-4e4c-9900-0b30c7b204b4  rack1
    UN  192.168.50.42  108.2 KiB   256     41.8%             8ee92d98-2a83-491d-b39f-fc3263217dc7  rack1
    UN  192.168.50.43  84.1 KiB    256     41.0%             f30a37fd-8763-4f11-a4be-7f413e062bd9  rack1
    UN  192.168.50.44  115.62 KiB  256     38.8%             15356519-30e0-435e-a865-47abfbef5a81  rack1
    UN  192.168.50.45  96.62 KiB   256     40.4%             213e082a-92fd-4948-84be-fe941a0801b1  rack1
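
You can also connect with cqlsh from one of the nodes to confirm that CQL works (this assumes Python 2.7 is in place, as described in step 12):

    ~/apache-cassandra-3.9/bin/cqlsh 192.168.50.41

    # At the cqlsh prompt, for example:
    # CREATE KEYSPACE test WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 3};
    # DESCRIBE KEYSPACES;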

Setting up Python on Linux and using Virtualenv

I am writing this because I found out that there are a couple of different ways to install Python on a *nix based computer. I ran around in circles for a while before I figured it out.

Basically, I realized that in order to have multiple versions of Python, I could either:

A) Install each Python version and then alias to the one I want, or

B) Use something like virtualenv to switch between versions.

After reading this post, virtualenv seemed like a powerful option. It follows a pattern similar to nvm and SDKMAN, which are really great tools for developers who want to rapidly switch between different projects.

So, here is how I install Python now.

This is on a CentOS 6.5 machine which has Python 2.6 installed on it.

  1. From the command prompt, install the libraries needed to build Python.

    sudo yum install zlib-devel openssl openssl-devel

  2. Install wget.

    sudo yum install wget

  3. Install Python 2.7. I use the --prefix option to install it to an isolated location.

    cd ~

    wget http://www.python.org/ftp/python/2.7.8/Python-2.7.8.tar.xz

    xz -d Python-2.7.8.tar.xz

    tar -xvf Python-2.7.8.tar

    cd ~/Python-2.7.8

    ./configure --prefix=/opt/python2.7

    make

    sudo make altinstall

  4. Now, create an alias in your ~/.bash_profile to the new version of Python. Add the following line to the bottom of the file.

      alias python="/opt/python2.7/bin/python2.7"
    

  5. Source the profile.

    source ~/.bash_profile

Now, when you type python at the command prompt, it should bring you into the 2.7 shell and not the 2.6 one. The downside here is that all of your Python packages are shared across versions, so you risk running the wrong package when you update Python.

Virtualenv

Using virtualenv, you can skip having to mess with aliases. Another benefit is that the pip packages stay isolated from the other versions.

So, from the last step, here's how we can install virtualenv.

  1. In the ~/.bash_profile, remove the alias for python that we just added.

  2. Source the profile.

    source ~/.bash_profile

  3. Install virtualenv.

    sudo yum install python-pip

    sudo pip install virtualenv

  4. Create the virtual environment.

    virtualenv --system-site-packages -p /opt/python2.7/bin/python2.7 ~/python-vms/python2.7

    cd ~/python-vms/python2.7

    source bin/activate
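
Once the environment is activated, python and pip point at the isolated 2.7 install. A quick check, and how to leave the environment when you are done (requests here is just an example package):

    python --version

    pip install requests

    deactivate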

How to set up a basic SpringBoot application

I spent some time trying to figure out how SpringBoot worked recently. I thought I would document the steps I took to get a project up and running.

Here is the final repository on GitHub.

Setup Steps

  1. First, create a workspace.

    mkdir -p ~/projects

    cd ~/projects

    mvn archetype:generate -DgroupId=com.tonyzampogna -DartifactId=base-springboot-app -DarchetypeArtifactId=maven-archetype-quickstart -DinteractiveMode=false

    cd ~/projects/base-springboot-app

  2. In the pom.xml, add the Spring Boot dependencies. First, add the parent block. I put it above the dependencies block.

      <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>1.4.0.RELEASE</version>
        <relativePath/> <!-- lookup parent from repository -->
      </parent>
    

  3. Then, in the dependencies block, add the following dependencies.

      <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
      </dependency>
    
      <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
      </dependency>
    

  4. Add the plugin for building SpringBoot with maven. This goes either above or below the dependencies block. This solves an error stating: "no main manifest attribute".

      <build>
        <plugins>
          <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
            <executions>
              <execution>
                <goals>
                  <goal>repackage</goal>
                </goals>
              </execution>
            </executions>
          </plugin>
        </plugins>
      </build>
    

  5. Move the App.java file in src/main/java/com/tonyzampogna to a new package called "main".

    mkdir -p src/main/java/com/tonyzampogna/main

    mv src/main/java/com/tonyzampogna/App.java src/main/java/com/tonyzampogna/main

  6. Replace the contents of the App.java file with the text below.

      package com.tonyzampogna.main;
    
      import org.slf4j.Logger;
      import org.slf4j.LoggerFactory;
      import org.springframework.boot.SpringApplication;
      import org.springframework.boot.autoconfigure.SpringBootApplication;
      import org.springframework.context.annotation.ComponentScan;
    
      /**
       * Main application file.
       */
      @SpringBootApplication
      @ComponentScan("com.tonyzampogna")
      public class App {
        private static final Logger logger = LoggerFactory.getLogger(App.class);
    
        /////////////////////////////////////////////////
        // Application Start Methods
        /////////////////////////////////////////////////
    
        /**
         * This method runs on applications start.
         */
        public static void main(String[] args) {
          logger.info("Starting application");
          SpringApplication.run(App.class, args);
        }
      }
    

  7. Make a controller so that we can test our server. First, create the package directory.

    mkdir -p src/main/java/com/tonyzampogna/controller

  8. Then, add HomeController.java code to the controller package.

      package com.tonyzampogna.controller;
    
      import org.springframework.web.bind.annotation.RequestMapping;
      import org.springframework.web.bind.annotation.RestController;
    
      @RestController
      public class HomeController {
    
        @RequestMapping("/home")
        public String getHome() {
          return "Hello";
        }
      }
    

  9. Create a resources directory for a log4j properties file.

    mkdir -p src/main/resources

  10. Add the log4j.properties file below to src/main/resources.

      #################################################################
      ## Log Levels
      #################################################################
    
      log4j.rootLogger=INFO, STDOUT
      log4j.logger.com.tonyzampogna=DEBUG, STDOUT, FILE
    
    
      #################################################################
      ## Appenders
      #################################################################
    
      # Console
      log4j.appender.STDOUT=org.apache.log4j.ConsoleAppender
      log4j.appender.STDOUT.layout=org.apache.log4j.PatternLayout
      log4j.appender.STDOUT.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
    
      # File
      log4j.appender.FILE=org.apache.log4j.RollingFileAppender
      log4j.appender.FILE.File=application.log
      log4j.appender.FILE.MaxFileSize=10MB
      log4j.appender.FILE.MaxBackupIndex=10
      log4j.appender.FILE.layout=org.apache.log4j.PatternLayout
      log4j.appender.FILE.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %-5p %c{1}:%L - %m%n
    

To run the project (from either Eclipse, Intellij, or the command line):

From the command line

  1. In the project directory, package the jar file and then run it.

    mvn package

    java -jar target/base-springboot-app-1.0-SNAPSHOT.jar
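
With the server running, you can hit the controller we added earlier to confirm everything is wired up (Spring Boot serves on port 8080 by default):

    curl http://localhost:8080/home
    Hello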

From Eclipse

  1. First, run the Maven Eclipse target to set up some essential Eclipse files.

    mvn eclipse:eclipse

  2. Open Eclipse. Make sure you get the SpringSource Tool Suite from the Eclipse Marketplace. Then, import the project into the workspace as an existing eclipse project.

  3. Now, you can right-click on the project and choose either "Run as SpringBoot application" or "Debug as SpringBoot application".

From Intellij

  1. Open Intellij and choose File -> New Project.

  2. Select Java project and choose Next.

  3. For project name, type in base-springboot-app. For project location, type in ~/projects/base-springboot-app.

  4. In the Run menu, choose Edit Configurations. Click the plus sign and select JAR Application. Fill in the information below.

    • Name: Server
    • Path to JAR: ~/projects/base-springboot-app/target/base-springboot-app-1.0-SNAPSHOT.jar
    • VM Arguments: -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=1044
    • Working directory: ~/projects/base-springboot-app
  5. Select OK.

  6. Now, from the Run menu, you can choose either Run 'Server' or Debug 'Server'.

How to set up a basic HBase cluster

This article is a guide to setting up an HBase cluster. The cluster runs on local CentOS virtual machines using VirtualBox. I use this to have a local environment for development and testing.

Prerequisites

This setup guide assumes you have gone through the Hadoop Setup Guide and the Zookeeper Setup Guide.

It assumes you are using the following software versions.

  • MacOS 10.11.3
  • Vagrant 1.8.5
  • Java 1.8.0
  • Zookeeper 3.4.8
  • Hadoop 2.7.3
  • HBase 1.2.2

Here are the steps I used:

  1. First, create a workspace.

    mkdir -p ~/vagrant_boxes/hbase

    cd ~/vagrant_boxes/hbase

  2. Next, create a new vagrant box. I'm using a minimal CentOS vagrant box.

    vagrant box add "CentOS 6.5 x86_64" https://github.com/2creatives/vagrant-centos/releases/download/v6.5.3/centos65-x86_64-20140116.box

  3. We are going to create a vagrant box with the packages we need. So, first we initialize the vagrant box.

    vagrant init -m "CentOS 6.5 x86_64" hbase_base

  4. Next, change the Vagrantfile to the following:

      Vagrant.configure(2) do |config|
        config.vm.box = "CentOS 6.5 x86_64"
        config.vm.box_url = "hbase_base"
        config.ssh.insert_key = false
      end
    

  5. Now, install HBase and its dependencies.

    vagrant up

    vagrant ssh

    sudo yum install java-1.8.0-openjdk-devel

    sudo yum install wget

    wget -P ~ https://www.apache.org/dist/hbase/stable/hbase-1.2.2-bin.tar.gz

    gunzip -c *gz | tar xvf -

  6. Open up your ~/.bash_profile and append the following lines.

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
      export PATH=$PATH:$JAVA_HOME/bin
      export HBASE_HOME=~/hbase-1.2.2
      export PATH=$PATH:$HBASE_HOME/bin
      export HBASE_CONF_DIR=$HBASE_HOME/conf
    

  7. Source the profile.

    source ~/.bash_profile

  8. Create a ~/.ssh/config file to avoid host key checking for SSH. Since these are DEV servers, this is ok. Note that the indentation here before StrictHostKeyChecking must be a tab.

      Host *
            StrictHostKeyChecking no
    

  9. Now run these commands to finish the password-less authentication.

    chmod 600 ~/.ssh/config

    ssh-keygen -f ~/.ssh/id_rsa -t rsa -P ""

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  10. In /etc/hosts, remove any lines starting with 127.0.*, and add the following lines.

      192.168.50.11 zoo1.example.com
      192.168.50.12 zoo2.example.com
      192.168.50.13 zoo3.example.com
      192.168.50.14 zoo4.example.com
      192.168.50.15 zoo5.example.com
      192.168.50.21 hdfs-namenode.example.com
      192.168.50.22 hdfs-datanode1.example.com
      192.168.50.23 hdfs-datanode2.example.com
      192.168.50.24 hdfs-datanode3.example.com
      192.168.50.25 hdfs-datanode4.example.com
      192.168.50.31 hbase-master.example.com
      192.168.50.32 hbase-region1.example.com
      192.168.50.33 hbase-region2.example.com
      192.168.50.34 hbase-region3.example.com
      192.168.50.35 hbase-region4.example.com
    

  11. In ~/hbase-1.2.2/conf/hbase-env.sh, append the following lines to the bottom of the file.

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
      export HBASE_MANAGES_ZK=false
    

  12. Edit ~/hbase-1.2.2/conf/hbase-site.xml to contain the following:

      <configuration>
        <property>
          <name>hbase.rootdir</name>
          <value>hdfs://hdfs-namenode.example.com:9000/hbase</value>
        </property>
        <property>
          <name>hbase.master.hostname</name>
          <value>hbase-master.example.com</value>
        </property>
        <property>
          <name>hbase.cluster.distributed</name>
          <value>true</value>
        </property>
        <property>
          <name>hbase.zookeeper.quorum</name>
          <value>zoo1.example.com,zoo2.example.com,zoo3.example.com,zoo4.example.com,zoo5.example.com</value>
        </property>
        <property>
          <name>hbase.zookeeper.property.dataDir</name>
          <value>/tmp/zookeeper</value>
        </property>
      </configuration>
    

  13. In ~/hbase-1.2.2/conf/regionservers, remove localhost and add the following lines:

      hbase-region1.example.com
      hbase-region2.example.com
      hbase-region3.example.com
      hbase-region4.example.com
    

  14. The docs say you can create a "backup-masters" file in the conf directory, but I had a problem starting my cluster when I did. So, I skipped this step.

  15. Exit the SSH session and copy the VM for the other hbase nodes.

    exit

    vagrant halt

    vagrant package

    vagrant box add hbase ~/vagrant_boxes/hbase/package.box

  16. Edit the Vagrantfile to look like the following. This will create 5 HBase nodes for us using the new HBase VM.

      Vagrant.configure("2") do |config|
        config.vm.define "hbase-master" do |node|
          node.vm.box = "hbase"
          node.vm.box_url = "hbase-master.example.com"
          node.vm.hostname = "hbase-master.example.com"
          node.vm.network :private_network, ip: "192.168.50.31"
          node.ssh.insert_key = false
    
          # Change hostname
          node.vm.provision "shell", inline: "hostname hbase-master.example.com", privileged: true
        end
    
        (1..4).each do |i|
          config.vm.define "hbase-region#{i}" do |node|
            node.vm.box = "hbase"
            node.vm.box_url = "hbase-region#{i}.example.com"
            node.vm.hostname = "hbase-region#{i}.example.com"
            node.vm.network :private_network, ip: "192.168.50.3#{i+1}"
            node.ssh.insert_key = false
    
            # Change hostname
            node.vm.provision "shell", inline: "hostname hbase-region#{i}.example.com", privileged: true
          end
        end
      end
    

  17. Bring the new Vagrant VMs up.

    vagrant up --no-provision

  18. Start HBase. For some reason, I can't start HBase from the provisioner. So, I SSH in and start it up.

    vagrant provision

    vagrant ssh hbase-master

    ~/hbase-1.2.2/bin/start-hbase.sh

To test the cluster:

  1. Log into the Master Server and run 'jps' on the command line. You should see at least these two processes.

    jps
    Jps
    HMaster

  2. Log into one of the Region Servers and run 'jps' on the command line. You should see at least these two processes.

    jps
    Jps
    HRegionServer

  3. Go to http://192.168.50.31:16010/ and you should see all of the Region Servers running.

  4. From the Master Server, start the HBase shell.

    vagrant ssh hbase-master

    sudo ~/hbase-1.2.2/bin/hbase shell

  5. At the command prompt, you should be able to create a table.

    create 'test', 'cf'

  6. And you should be able to list the table.

    list

  7. And you should be able to put data into the table.

    put 'test', 'row1', 'cf:a', 'value1'

    put 'test', 'row2', 'cf:b', 'value2'

    put 'test', 'row3', 'cf:c', 'value3'

  8. And you should be able to view all the data in the table.

    scan 'test'

  9. Or just get one row.

    get 'test', 'row1'
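
When you are done testing, you can clean the table up. The HBase shell also accepts commands piped over stdin, so a quick non-interactive sketch looks like this:

    echo -e "disable 'test'\ndrop 'test'" | ~/hbase-1.2.2/bin/hbase shell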

How to set up a basic Hadoop cluster

This article is a guide to setting up a Hadoop cluster. The cluster runs on local CentOS virtual machines using VirtualBox. I use this to have a local environment for development and testing.

I followed many of the steps Austin Ouyang laid out in his blog post. Hopefully, next I can document moving these virtual machines to a cloud provider.

Prerequisites

It assumes you are using the following software versions.

  • MacOS 10.11.3
  • Vagrant 1.8.5
  • Java 1.8.0 (Using JRE is fine, use the JDK to run MapReduce examples later)
  • Hadoop 2.7.3

Here are the steps I used:

  1. First, create a workspace.

    mkdir -p ~/vagrant_boxes/hadoop

    cd ~/vagrant_boxes/hadoop

  2. Next, create a new vagrant box. I'm using a minimal CentOS vagrant box.

    vagrant box add "CentOS 6.5 x86_64" https://github.com/2creatives/vagrant-centos/releases/download/v6.5.3/centos65-x86_64-20140116.box

  3. We are going to create a vagrant box with the packages we need. So, first we initialize the vagrant box.

    vagrant init -m "CentOS 6.5 x86_64" hadoop_base

  4. Next, change the Vagrantfile to the following:

      Vagrant.configure(2) do |config|
        config.vm.box = "CentOS 6.5 x86_64"
        config.vm.box_url = "hadoop_base"
        config.ssh.insert_key = false
      end
    

  5. Now, install Hadoop and its dependencies.

    vagrant up

    vagrant ssh

    sudo yum install java-1.8.0-openjdk-devel

    sudo yum install wget

    wget -P ~ http://apache.claz.org/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz

    gunzip -c *gz | tar xvf -

  6. Open up your ~/.bash_profile and append the following lines.

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
      export PATH=$PATH:$JAVA_HOME/bin
      export HADOOP_HOME=~/hadoop-2.7.3
      export PATH=$PATH:$HADOOP_HOME/bin
      export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
    

  7. Source the profile.

    source ~/.bash_profile

  8. In /etc/hosts, add the following lines:

      192.168.50.21 namenode.example.com
      192.168.50.22 datanode1.example.com
      192.168.50.23 datanode2.example.com
      192.168.50.24 datanode3.example.com
      192.168.50.25 datanode4.example.com
    

  9. In $HADOOP_CONF_DIR/hadoop-env.sh, replace the ${JAVA_HOME} variable.

      # The java implementation to use.
      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
    

  10. Edit the $HADOOP_CONF_DIR/core-site.xml file to have the following XML:

      <configuration>
        <property>
          <name>fs.defaultFS</name>
          <value>hdfs://namenode.example.com:9000</value>
        </property>
      </configuration>
    

  11. Edit the $HADOOP_CONF_DIR/yarn-site.xml file to have the following XML:

      <configuration>
        <property>
          <name>yarn.nodemanager.aux-services</name>
          <value>mapreduce_shuffle</value>
        </property>
        <property>
          <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
          <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
          <name>yarn.resourcemanager.hostname</name>
          <value>namenode.example.com</value>
        </property>
      </configuration>
    

  12. Now, copy the mapred-site.xml file from a template.

    cp $HADOOP_CONF_DIR/mapred-site.xml.template $HADOOP_CONF_DIR/mapred-site.xml

  13. Edit the $HADOOP_CONF_DIR/mapred-site.xml to have the following XML:

      <configuration>
        <property>
          <name>mapreduce.jobtracker.address</name>
          <value>namenode.example.com:54311</value>
        </property>
        <property>
          <name>mapreduce.framework.name</name>
          <value>yarn</value>
        </property>
      </configuration>
    

  14. Edit the $HADOOP_CONF_DIR/hdfs-site.xml file to have the following XML:

      <configuration>
        <property>
          <name>dfs.replication</name>
          <value>3</value>
        </property>
        <property>
          <name>dfs.namenode.name.dir</name>
          <value>/data/hadoop/hdfs/namenode</value>
        </property>
        <property>
          <name>dfs.datanode.data.dir</name>
          <value>/data/hadoop/hdfs/datanode</value>
        </property>
      </configuration>
    

  15. Make the data directories.

    sudo mkdir -p /data/hadoop/hdfs/namenode

    sudo mkdir -p /data/hadoop/hdfs/datanode

    sudo chown -R vagrant:vagrant /data/hadoop

  16. In $HADOOP_CONF_DIR/masters, add the following line:

      namenode.example.com
    

  17. In $HADOOP_CONF_DIR/slaves, add the following lines:

      datanode1.example.com
      datanode2.example.com
      datanode3.example.com
      datanode4.example.com
    

  18. Create a ~/.ssh/config file to avoid host key checking for SSH. Since these are DEV servers, this is ok. Note that the indentation here before StrictHostKeyChecking must be a tab.

      Host *
            StrictHostKeyChecking no
    

  19. Now run these commands to finish the password-less authentication.

    chmod 600 ~/.ssh/config

    sudo hostname namenode.example.com

    ssh-keygen -f ~/.ssh/id_rsa -t rsa -P ""

    cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

  20. Exit the SSH session and copy the VM for the other hadoop nodes.

    exit

    vagrant halt

    vagrant package

    vagrant box add hadoop ~/vagrant_boxes/hadoop/package.box

  21. Edit the Vagrantfile to look like the following. This will create 5 Hadoop nodes for us using the new Hadoop VM.

      Vagrant.configure("2") do |config|
        config.vm.define "hadoop-namenode" do |node|
          node.vm.box = "hadoop"
          node.vm.box_url = "namenode.example.com"
          node.vm.hostname = "namenode.example.com"
          node.vm.network :private_network, ip: "192.168.50.21"
          node.ssh.insert_key = false
    
          # Start Hadoop
          node.vm.provision "shell", inline: "hdfs namenode -format -force", privileged: false
          node.vm.provision "shell", inline: "~/hadoop-2.7.3/sbin/start-dfs.sh", privileged: false
          node.vm.provision "shell", inline: "~/hadoop-2.7.3/sbin/start-yarn.sh", privileged: false
          node.vm.provision "shell", inline: "~/hadoop-2.7.3/sbin/mr-jobhistory-daemon.sh start historyserver", privileged: false
        end
    
        (1..4).each do |i|
          config.vm.define "hadoop-datanode#{i}" do |node|
            node.vm.box = "hadoop"
            node.vm.box_url = "datanode#{i}.example.com"
            node.vm.hostname = "datanode#{i}.example.com"
            node.vm.network :private_network, ip: "192.168.50.2#{i+1}"
            node.ssh.insert_key = false
          end
        end
      end
    

  22. Bring the new Vagrant VMs up.

    vagrant up --no-provision

  23. Start Hadoop up on the namenode.

    vagrant provision

To test whether Hadoop is working, you can do the following.

First, from your local machine, you should be able to access the web UI (http://192.168.50.21:50070/). You should see the 4 datanodes listed as live nodes. Follow the MapReduce Tutorial to test your cluster further.
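
You can also run a quick smoke test against HDFS from the namenode. This is a minimal sketch using configuration files that ship with the Hadoop distribution:

    vagrant ssh hadoop-namenode

    hdfs dfs -mkdir -p /user/vagrant

    hdfs dfs -put ~/hadoop-2.7.3/etc/hadoop/*.xml /user/vagrant

    hdfs dfs -ls /user/vagrant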

The main overview screen of the Hadoop admin console.

How to set up a basic ZooKeeper ensemble

This article is a guide to setting up a ZooKeeper ensemble. I use this to have a local environment for development and testing.

For ZooKeeper to work, you really only need to configure a couple of things. The first is the zoo.cfg file, and the second is a myid file in the dataDir. See the ZooKeeper Getting Started guide for more info.
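
For example, with the dataDir used later in this guide (/tmp/zookeeper), the myid file for the first node would be created like this:

    mkdir -p /tmp/zookeeper

    echo "1" > /tmp/zookeeper/myid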

Prerequisites

It assumes you are using the following software versions.

  • MacOS 10.11.3
  • Vagrant 1.8.5
  • Java 1.8.0
  • Zookeeper 3.4.8

Here are the steps

  1. First, create a workspace.

    mkdir -p ~/vagrant_boxes/zookeeper

    cd ~/vagrant_boxes/zookeeper

  2. Next, create a new vagrant box. I'm using a minimal CentOS vagrant box.

    vagrant box add "CentOS 6.5 x86_64" https://github.com/2creatives/vagrant-centos/releases/download/v6.5.3/centos65-x86_64-20140116.box

  3. We are going to create a vagrant box with the packages we need. So, first we initialize the vagrant box.

    vagrant init -m "CentOS 6.5 x86_64" zoo_base

  4. Next, change the Vagrantfile to the following:

      Vagrant.configure(2) do |config|
        config.vm.box = "CentOS 6.5 x86_64"
        config.vm.box_url = "zoo_base"
        config.ssh.insert_key = false
      end
    

  5. Now, install Zookeeper and its dependencies.

    vagrant up

    vagrant ssh

    sudo yum install java-1.8.0-openjdk-devel

    sudo yum install wget

    wget -P ~ http://apache-mirror.rbc.ru/pub/apache/zookeeper/zookeeper-3.4.8/zookeeper-3.4.8.tar.gz

    gunzip -c *gz | tar xvf -

  6. Open up your ~/.bash_profile and append the following lines.

      export JAVA_HOME=/usr/lib/jvm/java-1.8.0-openjdk.x86_64
      export PATH=$PATH:$JAVA_HOME/bin
      export ZOOKEEPER_HOME=~/zookeeper-3.4.8
      export PATH=$PATH:$ZOOKEEPER_HOME/bin
      export ZOOKEEPER_CONF_DIR=$ZOOKEEPER_HOME/conf
    

  7. Source the profile.

    source ~/.bash_profile

  8. Create a ~/zookeeper-3.4.8/conf/zoo.cfg file with the following contents.

      tickTime=2000
      dataDir=/tmp/zookeeper/
      clientPort=2181
      initLimit=5
      syncLimit=2
      server.1=192.168.50.11:2888:3888
      server.2=192.168.50.12:2888:3888
      server.3=192.168.50.13:2888:3888
      server.4=192.168.50.14:2888:3888
      server.5=192.168.50.15:2888:3888
    

  9. Exit the SSH session and copy the VM for the other zookeeper nodes.

    exit

    vagrant halt

    vagrant package

    vagrant box add zookeeper ~/vagrant_boxes/zookeeper/package.box

  10. Edit the Vagrantfile to look like the following. This will create 5 ZooKeeper nodes for us using the new ZooKeeper VM.

      Vagrant.configure("2") do |config|
        (1..5).each do |i|
          config.vm.define "zoo#{i}" do |node|
            node.vm.box = "zookeeper"
            node.vm.box_url = "zoo#{i}"
            node.vm.hostname = "zoo#{i}"
            node.vm.network :private_network, ip: "192.168.50.1#{i}"
    
            # Zookeeper needs an ID file for each node
            node.vm.provision "shell", inline: "mkdir -p /tmp/zookeeper; echo '#{i}' >> /tmp/zookeeper/myid", privileged: false
    
            # Start Zookeeper
            node.vm.provision "shell", inline: "~/zookeeper-3.4.8/bin/zkServer.sh start", privileged: false
    
            node.ssh.insert_key = false
          end
        end
      end
    

  11. Bring the new Vagrant VMs up.

    vagrant up --no-provision

    vagrant provision

Running ZooKeeper

To test whether ZooKeeper is working, you can do the following.

  1. SSH into zoo1.

    vagrant ssh zoo1

  2. Start Zookeeper CLI.

    ~/zookeeper-3.4.8/bin/zkCli.sh -server 192.168.50.11:2181

  3. Create a new znode and associate the string "my_data" with it.

    create /zk_test my_data

  4. Now exit the CLI and SSH session and log into zoo4.

    quit

    exit

    vagrant ssh zoo4

  5. Connect to the Zookeeper CLI again (notice the IP changed).

    ~/zookeeper-3.4.8/bin/zkCli.sh -server 192.168.50.14:2181

  6. You should be able to see the /zk_test znode with an ls command (it should look like so: "[zookeeper, zk_test]")

    ls /
    [zookeeper, zk_test]
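
You can also check each node's role in the ensemble. Run this on any node; one node should report itself as the leader and the rest as followers:

    ~/zookeeper-3.4.8/bin/zkServer.sh status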

iOS Safari Enhancement

I have run into an odd issue with iOS Safari on the iPad recently. In the current application I am working on, we have a list of items that can be dragged and dropped on the page. This works great with the mouse. However, I run into issues when I start handling the touchstart, touchmove, and touchend events.

The Problem

When a user starts to move their finger around on the iPad, normally this will cause the iPad to "pan" the window around (assuming there is area to scroll). Let's say the page is long, though, so the window can scroll down.

Apple does not trigger any JavaScript events for this pan (see Apple's documentation). The window just scrolls around. However, if the user happens to touch an item that is capturing events, one of two things can happen:

1. The touchmove handler returns false (or calls preventDefault), in which case the window will never scroll up or down when a user swipes on one of those items.

2. The touchmove handler returns true, in which case the window may scroll up or down, but if you are "dragging" the item, it will scroll along with the page as well.

I have created a jsFiddle here to show you what I mean. List 1 prevents the default behavior, whereas List 2 returns true and allows dragging and scrolling.

Neither of these is ideal.

The Solution

I think iOS needs to provide a way to cancel the default pan behavior that a user initiates. I have opened a bug report with Apple, and I will keep this article updated if anything changes. Let's hope they fix it.

I'm not sure why the browser doesn't provide at least some way to cancel this event.

From a pseudo-code perspective, what I would like to do is allow the user to scroll (pan) around the screen as long as the touch-and-hold is shorter than, say, 500ms. In that case, I would return true (but not move the dragged item). After 500ms, though, if the user is still moving around, I would cancel the scroll (pan) event and move the dragged item around instead.

Anyway, for now, we are making a small grab region on the item. However, I think it would be nice to fix this in the future.

XSS Sanitizer Plugin (v0.2) released

Well, after shamefully waiting over a year to do any kind of updates to this plugin, I've finally made some changes and merged in pull requests from others.

Next steps are to fix some of the open issues. Some great suggestions have come up in the Issues area on GitHub. In fact, I plan on releasing a patch later today.

I chose version 0.2 after some long debate with myself (Hello, me). I don't really want to call this a 1.0 release quite yet. I think some things like not being able to override the ESAPI.properties file as well as not enough unit tests make this still a beta plugin. I'd love to know if others are using it, too. If so, and people are having success, then maybe a 1.0 release is in order. Until then, there's still some work left to do.

XSS Sanitizer Grails Plugin

Well, earlier this week I published my first Grails plugin. I'm hoping people will find it useful as a general security plugin to filter out and prevent XSS attacks on their websites. It's a long way from being done, but I think it's a good start.

It uses OWASP's ESAPI to strip out any unwanted script, iframe, and img tags that come in on the request. It also has the added benefit of doing this in a Java filter (in case you access the request via the HttpRequest) and the Grails "params" attribute.

Next steps are to write tests for each of the potential hacks on http://ha.ckers.org/xss.html to make sure they all pass. Also, right now this is just a blanket replacement of all values. There are times when you might want to submit something that falls into one of these categories and you feel it's safe not to filter it. So, I'd like to allow users to annotate methods to allow or disallow the filter for a given action.

Here's a link to the source code:

https://github.com/tonyzampogna/XssSanitizer

If you would like to install it, just type:

grails install-plugin xss-sanitizer

If you are interested in contributing, please let me know. I'd love to have some collaboration.

How to configure “Cross-cell single sign-on” in WebSphere with Jython

To configure "Cross-cell single sign-on" in the WebSphere 6.1 admin console with a Jython script, you can use the script below. This assumes that you've exported the keys from the server you are going to connect to.

import java.lang.String as jstr
import java.util.Properties as jprops
import java.io as jio
import javax.management as jmgmt

keyfilepassword = "somepassword"

# Import LTPA Keys
AdminConfig.save(); # This needs to happen so you can write to the Security file.
keyFile = "C:/projects/custom-security/was61keys";
fin = jio.FileInputStream(keyFile);
wasdev61keys = jprops();
wasdev61keys.load(fin);
fin.close();
password = jstr(keyfilepassword).getBytes();
securityAdmin = AdminControl.queryNames('*:*,name=SecurityAdmin');
securityObjectName = jmgmt.ObjectName(securityAdmin);
params = [wasdev61keys, password];
signature = ['java.util.Properties', '[B'];
AdminControl.invoke_jmx(securityObjectName, 'importLTPAKeys', params, signature);

# Save Config at the end.
AdminConfig.save();
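
To run it, save the script to a file and pass it to wsadmin with the Jython language flag. This is a sketch that assumes the script is saved as importLTPAKeys.py; depending on your security settings you may also need connection options such as -host, -port, -user, and -password. On Unix, use wsadmin.sh instead.

wsadmin.bat -lang jython -f importLTPAKeys.py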