Installing Datastax Cassandra and Python Driver on CentOS 5



Cassandra can run on CentOS 5.x, but there is no yum repo support.

If you can’t upgrade your Linux distribution, here’s how to install Datastax Cassandra Community Edition and the Python Cassandra driver on CentOS 5.x.

It’s not difficult, but there are several steps, including updating Java.

(The following steps would make a complete chef or puppet recipe for a non-SSL install with vnodes.)


# setup environment
groupadd -g 602 cassandra
useradd -u 602 -g cassandra -m -s /sbin/nologin cassandra
mkdir /var/lib/cassandra /var/log/cassandra /var/run/cassandra
touch /var/log/cassandra/system.log
chown -R cassandra:cassandra /var/lib/cassandra /var/log/cassandra /var/run/cassandra
mkdir -p /opt && cd /opt


cat >> /etc/security/limits.conf <<EOD
cassandra soft memlock unlimited
cassandra hard memlock unlimited
cassandra soft nofile 8192
cassandra hard nofile 10240
EOD
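
Because `cat >>` appends, running the block above twice leaves duplicate entries in limits.conf. If these steps end up in a script or recipe, a check along these lines could decide which entries still need adding. This is only a sketch: `missing_limits` is a made-up helper name, and it compares whitespace-normalized lines rather than interpreting limits.conf semantics.

```python
# Sketch: work out which limits.conf entries still need to be appended.
# missing_limits() is a hypothetical helper, not part of any Cassandra tooling.

WANTED = [
    "cassandra soft memlock unlimited",
    "cassandra hard memlock unlimited",
    "cassandra soft nofile 8192",
    "cassandra hard nofile 10240",
]


def normalize(line):
    """Drop a trailing comment and collapse runs of whitespace."""
    return " ".join(line.split("#", 1)[0].split())


def missing_limits(existing_text, wanted=WANTED):
    """Return the wanted entries that are not already in existing_text."""
    present = set(normalize(line) for line in existing_text.splitlines())
    return [w for w in wanted if normalize(w) not in present]
```

Feed it the contents of /etc/security/limits.conf and append only the lines it returns.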


# upgrade java
yum remove java
# download, then install JDK 7.x from oracle.com
rpm -Uvh jdk-7u67-linux-x64.rpm
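# download, then install a recent jna.jar from https://github.com/twall/jna
mv jna.jar /usr/share/java
# /opt/cassandra does not exist until the tarball is unpacked below, so run this ln after that step
ln -s /usr/share/java/jna.jar /opt/cassandra/lib/
# update environment variables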
cat >> /etc/profile <<"EOD"
export JAVA_HOME=/usr/java/default
export JRE_HOME=/usr/java/default/jre
export CASSANDRA_HOME=/opt/cassandra
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:$CASSANDRA_HOME/bin
EOD


# get Datastax Community Edition (run from /opt)
curl -L http://downloads.datastax.com/community/dsc.tar.gz >dsc-cassandra-2.0.9.tar.gz
tar zxvf - < dsc-cassandra-2.0.9.tar.gz
ln -s /opt/dsc-cassandra-2.0.9 /opt/cassandra
chown -R root:root /opt/cassandra/
bash cassandra/switch_snappy 1.0.4

# open cassandra firewall ports if necessary (not needed if using an internal interface on most servers)
vi /etc/sysconfig/iptables
# add this rule to the file:
-A INPUT -i eth0 -m state --state NEW -m multiport -p tcp --dport 7000,7199,9042,9160 -j ACCEPT
service iptables restart
# configure /opt/cassandra/conf/cassandra.yaml before starting the server
# (at least listen_address, rpc_address, seeds and tokens).
# If you need a do-over, clean the cassandra data with: rm -fr /var/lib/cassandra/*
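
To show roughly what those settings look like, and to confirm they are actually set before the first start, here is a small sketch. The `SAMPLE` text is illustrative only: the addresses are borrowed from the nodetool output further down, and which node acts as the seed is an assumption. `read_settings` is a made-up helper that scans line by line instead of parsing YAML properly, so it ignores nesting and does not strip trailing comments.

```python
# Sketch: pull the settings named above out of cassandra.yaml text without
# needing a YAML library. read_settings() is a hypothetical helper; the
# sample addresses are illustrative and must be replaced with your own.
import re

SAMPLE = """\
num_tokens: 256
seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "10.0.1.2"
listen_address: 10.0.1.3
rpc_address: 10.0.1.3
"""

KEYS = ("num_tokens", "seeds", "listen_address", "rpc_address")


def read_settings(yaml_text):
    """Return {key: value} for the keys in KEYS that are set (uncommented)."""
    found = {}
    for line in yaml_text.splitlines():
        # The optional leading "- " handles the seeds entry under seed_provider.
        m = re.match(r"^\s*(?:-\s*)?(\w+):\s*(\S.*)$", line)
        if m and m.group(1) in KEYS:
            found[m.group(1)] = m.group(2).strip().strip("'\"")
    return found
```

Any key missing from the returned dict is still commented out or empty in the file you passed in.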

# download startup script:
wget http://jebriggs.com/php/start_cassandra.txt -O /etc/init.d/cassandra
chown root:root /etc/init.d/cassandra
chmod 755 /etc/init.d/cassandra
chkconfig --add cassandra

# start cassandra server (if it is standalone or a seed server; otherwise start it after the seed servers):
service cassandra start

# cat /etc/redhat-release 
CentOS release 5.10 (Final)

[root@www1 conf]# nodetool status
Datacenter: datacenter1
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address   Load       Tokens  Owns   Host ID                               Rack
UN  10.0.1.2  71.87 KB   256     66.8%  8302c6d5-4c88-4695-bbf4-762bc7f24544  rack1
UN  10.0.1.3  136.63 KB  256     69.9%  eddb03b2-98d3-46ff-be63-95435414a883  rack1
UN  10.0.1.4  100.08 KB  256     63.3%  2a8dde5e-29b0-4a67-8204-40769376c44a  rack1

If you only see the node on localhost, then you have a problem:

  • read and fix any errors in /var/log/cassandra/system.log until there are zero errors. snappy-related errors are from /tmp being noexec or not running the switch_snappy 1.0.4 command above.
  • disable iptables firewall, test and reenable later
  • in log4j-server.properties, increase log4j.rootLogger to DEBUG
  • if you have multiple NICs, JMX (i.e. nodetool) can bind to the wrong interface. You likely need to set the -Djava.rmi.server.hostname=[address] option in cassandra-env.sh to the address you want to listen on
  • public/private IP address problems in AWS EC2. You may need to set broadcast_address: [public_ec2_address]
  • normally rmiregistry is not needed unless you have some atypical firewalling or routing (NAT).
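
Before digging through the list above, it can help to confirm from another node that the Cassandra ports accept TCP connections at all. The following is a minimal sketch; `port_open` and `check_node` are made-up helper names, and the port list is simply the one from the iptables rule earlier. It only tests reachability, not whether Cassandra is healthy.

```python
# Sketch: test whether the Cassandra TCP ports accept connections.
# port_open() and check_node() are hypothetical helper names.
import socket

# Ports from the iptables rule earlier in this post:
# 7000 inter-node, 7199 JMX, 9042 CQL native, 9160 Thrift.
CASSANDRA_PORTS = (7000, 7199, 9042, 9160)


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        sock = socket.create_connection((host, port), timeout)
    except (socket.error, socket.timeout):
        return False
    sock.close()
    return True


def check_node(host, ports=CASSANDRA_PORTS):
    """Return a list of (port, is_open) pairs for one node."""
    return [(port, port_open(host, port)) for port in ports]
```

Calling `check_node('10.0.1.2')` from a neighbouring node gives one pair per port. If a port is closed there but open when checked on the node itself, look at the firewall or the listen address first.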

Datastax Opscenter 5.0

You can install the binary from yum or tarball, but the important things to know are:

  • the monitoring agent will be installed on each cassandra node and uses port 61621. The init script is called datastax-agent.
  • the UI only needs to be installed once, but needs port 61620, plus port 8888 for HTTP.
  • to allow Opscenter to remotely manage nodes with ssh, remove old ssh entries from .ssh/known_hosts first, connect manually to each node, then Opscenter should be happy
  • by default, Opscenter listens for agents on 0.0.0.0, phones home to Datastax.com each day, and does not require web authentication, so you likely want to change those.

Python also needs to be upgraded if you want to use cqlsh or the python client cassandra driver.


# install python 2.6 and dependencies
yum install gcc python26 python26-devel libev libev-devel


# install python's pip module
curl --silent --show-error --retry 5 https://bootstrap.pypa.io/get-pip.py | python26


# install cassandra driver for python
pip install cassandra-driver


# install blist.py
tar zxvf - < blist-1.3.6.tar.gz
cd blist-1.3.6
python26 setup.py install
cd ..

# cluster.py - test installation

from cassandra.cluster import Cluster

cluster = Cluster(['127.0.0.1'])

def dump(obj):
    # print every readable attribute of obj with its value
    for attr in dir(obj):
        if hasattr(obj, attr):
            print("obj.%s = %s" % (attr, getattr(obj, attr)))

dump(cluster)

# python26 cluster.py

obj.__class__ = <class 'cassandra.cluster.Cluster'>
[...]
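
Dumping the Cluster object only proves the driver imports. A slightly stronger check is to open a session and read a row from the system.local table. The sketch below assumes a node listening on 127.0.0.1 and the driver's default row type, which allows attribute access. `node_summary` and `check_local_node` are made-up helper names.

```python
# Sketch: read basic facts about the connected node from system.local.
# node_summary() and check_local_node() are hypothetical helper names.


def node_summary(session):
    """Return a one-line description of the node behind an open session."""
    rows = session.execute(
        "SELECT cluster_name, release_version, data_center, rack "
        "FROM system.local")
    for row in rows:
        return "%s: Cassandra %s in %s/%s" % (
            row.cluster_name, row.release_version, row.data_center, row.rack)
    return "no rows returned from system.local"


def check_local_node(host="127.0.0.1"):
    """Connect to one node, describe it, and always shut the cluster down."""
    # Imported here so the module loads even where the driver is missing.
    from cassandra.cluster import Cluster
    cluster = Cluster([host])
    try:
        return node_summary(cluster.connect())
    finally:
        cluster.shutdown()
```

Add `print(check_local_node())` at the bottom and run the file with python26. A connection error at this point usually means rpc_address or port 9042 needs attention.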

Troubleshooting connection problems in JConsole
datastax.com: Storing OpsCenter Data in a Separate Cluster
