Friday, December 30, 2011

How to Set Up smstools on Debian

A 3G dongle is the only Internet connection I have at home (unless my neighbor turns on his WiFi). Due to heavy usage of mobile broadband, my 3G dongle frequently gets suspended for exceeding the reserved data bundle, and most of the time I had to call the 3G service provider to restore the connection temporarily just to make an online payment.

Normally the service provider sends an SMS when you reach the reserved quota (probably at 75%). But since I can't use the Mobile Partner software on Linux, I always missed those SMSs unless I plugged the dongle into a Windows machine (and as I hardly use Windows, I was usually in trouble).

As a solution, I installed and configured smstools to avoid this mess and to stop ending up stranded without the Internet.

Install smstools
  1. sudo apt-get install smstools (install smstools)
  2. dmesg | grep usb (find the device, e.g. /dev/ttyUSB0)
  3. sudo vim /etc/smsd.conf (edit the smstools config file)
  4. sudo /etc/init.d/smstools start (start smstools after changing the configuration)
  5. cd /var/spool/sms/incoming (incoming SMS messages land here)

Configuration  (/etc/smsd.conf)

devices = GSM1
logfile = /var/log/smstools/smsd.log
infofile = /var/run/smstools/smsd.working
pidfile = /var/run/smstools/smsd.pid
outgoing = /var/spool/sms/outgoing
checked = /var/spool/sms/checked
failed = /var/spool/sms/failed
incoming = /var/spool/sms/incoming
sent = /var/spool/sms/sent
receive_before_send = no
autosplit = 3

[GSM1]
device = /dev/ttyUSB0
incoming = yes
baudrate = 19200
memory_start = 1

If you have configured smstools properly, incoming SMS messages will appear in /var/spool/sms/incoming.

Test your settings.
  1. Send a test SMS to your 3G dongle from a mobile phone
  2. Run ls /var/spool/sms/incoming (if the dongle received the SMS, you will see a file with a name similar to GSM1.AuvV6s)
  3. vim /var/spool/sms/incoming/$filename to read the SMS
Debug your settings
  1. sudo tail -f /var/log/smstools/smsd.log
Note: I couldn't test sending SMS through the 3G dongle, as my service provider has blocked that facility.
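Once messages land in the incoming directory, a small shell sketch can pull out the sender and body of each one. This assumes the usual smstools incoming-file layout (header lines such as From:, then a blank line, then the message body); adjust if your version differs.

```shell
# print_sms DIR -- print the sender and body of each smstools
# incoming file found in DIR (assumes "From:" header + blank line + body)
print_sms() {
  for f in "$1"/*; do
    [ -f "$f" ] || continue
    # The first "From:" header line is the sender's number
    sender=$(sed -n 's/^From: //p' "$f" | head -n 1)
    # The body is everything after the first blank line
    body=$(sed '1,/^$/d' "$f")
    printf 'From %s: %s\n' "$sender" "$body"
  done
}

# Example: print_sms /var/spool/sms/incoming
```

Running this from cron would let you spot the quota warning without opening the spool directory by hand.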


Friday, December 9, 2011

Install Repcached (Memcached Replication) For High-Availability

When you run a dynamic website that handles lots of user queries, your top priorities as a webmaster are to keep the site up with minimum downtime (I mean zero downtime) and to keep it healthy enough to respond to users very, very quickly.

Keeping those two tasks in mind, I was able to track down a problem that had been haunting us for some time.

It was non-optimized queries running through our WSO2 Developer Portal. Because of them, the portal's MySQL load was always high, so the answer was to reduce the MySQL load.

I used Memcached to minimize database load. Memcached increases the performance and scalability of dynamic MySQL-driven websites by caching data and objects in memory.

Setting up Memcached is fairly simple. You can install it using APT (on Debian-based systems) or download the tarball and compile it on the server.

After installing Memcached alongside MySQL, it gave our Developer Portal a good performance boost, until a node's cache expired.

I noticed that some users couldn't upload/attach files to our forum or to new articles. After a series of testing and debugging sessions, I was able to confirm that we had a problem with Memcached when it was accessed in a cluster environment.

The reason: say users are accessing the site at peak time and the first request is served from node1. If the second request gets routed to node3 or node4 (the backup nodes) due to high load in the cluster, Drupal cannot access the cache objects created during the first request, so the user sees lots of unexpected results.

Then I installed Repcached to add replication to Memcached. Repcached keeps the memcached nodes redundant, and that was exactly the solution I was looking for.

Installing Repcached
  1. Download the latest version of repcached from http://repcached.lab.klab.org/
  2. Install the build dependencies on Debian (apt-get install libevent-dev g++ make)
  3. Extract the tarball (tar xvf memcached-1.2.8-repcached-2.2.tar)
  4. Go to the directory (cd memcached-1.2.8-repcached-2.2/)
  5. Enable replication before building (./configure --enable-replication)
  6. Build and install (make && make install)
Configure repcached
  1. Create the options file (vim /etc/memcachedrep)
  2. Create the init script (vim /etc/init.d/memcachedrep)
  3. chmod +x /etc/init.d/memcachedrep
  4. update-rc.d memcachedrep defaults

Copy code 1 into the file from step 1 (/etc/memcachedrep)
## extra commandline options to start memcached in replicated mode
# -x < ip_addr > hostname or IP address of the master replication server
# -X < num > TCP port number of the master (default: 11212)
DAEMON_ARGS="-m 128 -p 11211 -u root -P /var/run/memcachedrep.pid -d -x 10.100.1.10"
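Replication is symmetric, so the other master needs the same file with -x pointing back the other way. A sketch, where 10.100.1.11 stands in for this server's own address (a placeholder, not from the original post):

```shell
## /etc/memcachedrep on the peer server -- identical options, but -x
## names the other master (10.100.1.11 is a made-up example address)
DAEMON_ARGS="-m 128 -p 11211 -u root -P /var/run/memcachedrep.pid -d -x 10.100.1.11"
```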

Copy code 2 into the init script from step 2 (/etc/init.d/memcachedrep)
#! /bin/sh
### BEGIN INIT INFO
# Provides:             memcached
# Required-Start:       $syslog
# Required-Stop:        $syslog
# Should-Start:         $local_fs
# Should-Stop:          $local_fs
# Default-Start:        2 3 4 5
# Default-Stop:         0 1 6
# Short-Description:    memcached - Memory caching daemon replicated
# Description:          memcached - Memory caching daemon replicated
### END INIT INFO
# Author: Michael 
#
# Please remove the "Author" lines above and replace them
# with your own name if you copy and modify this script.
# Do NOT "set -e"
# PATH should only include /usr/* if it runs after the mountnfs.sh script
PATH=/sbin:/usr/sbin:/bin:/usr/bin
DESC="memcachedrep"
NAME=memcached
DAEMON=/usr/local/bin/$NAME
DAEMON_ARGS="--options args"
PIDFILE=/var/run/memcachedrep.pid
SCRIPTNAME=/etc/init.d/$DESC
VERBOSE="yes"
# Exit if the package is not installed
[ -x "$DAEMON" ] || exit 0
# Read configuration variable file if it is present
[ -r /etc/$DESC ] && . /etc/$DESC
# Load the VERBOSE setting and other rcS variables
. /lib/init/vars.sh
# Define LSB log_* functions.
# Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
. /lib/lsb/init-functions
#
# Function that starts the daemon/service
#
do_start()
{
# Return
#   0 if daemon has been started
#   1 if daemon was already running
#   2 if daemon could not be started
start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON --test > /dev/null \
|| return 1
start-stop-daemon --start --quiet --pidfile $PIDFILE --exec $DAEMON -- \
$DAEMON_ARGS \
|| return 2
# Add code here, if necessary, that waits for the process to be ready
# to handle requests from services started subsequently which depend
# on this one.  As a last resort, sleep for some time.
}
#
# Function that stops the daemon/service
#
do_stop()
{
# Return
#   0 if daemon has been stopped
#   1 if daemon was already stopped
#   2 if daemon could not be stopped
#   other if a failure occurred
    start-stop-daemon --stop --quiet --retry=TERM/30/KILL/5 --pidfile $PIDFILE --name $NAME
    RETVAL="$?"
    [ "$RETVAL" = 2 ] && return 2
# Wait for children to finish too if this is a daemon that forks
# and if the daemon is only ever run from this initscript.
# If the above conditions are not satisfied then add some other code
# that waits for the process to drop all resources that could be
# needed by services started subsequently.  A last resort is to
# sleep for some time.
start-stop-daemon --stop --quiet --oknodo --retry=0/30/KILL/5 --exec $DAEMON
[ "$?" = 2 ] && return 2
# Many daemons don't delete their pidfiles when they exit.
rm -f $PIDFILE
return "$RETVAL"
}
#
# Function that sends a SIGHUP to the daemon/service
#
do_reload() {
#
# If the daemon can reload its configuration without
# restarting (for example, when it is sent a SIGHUP),
# then implement that here.
#
start-stop-daemon --stop --signal 1 --quiet --pidfile $PIDFILE --name $NAME
return 0
}
case "$1" in
  start)
[ "$VERBOSE" != no ] && log_daemon_msg "Starting $DESC" "$NAME"
do_start
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
  stop)
[ "$VERBOSE" != no ] && log_daemon_msg "Stopping $DESC" "$NAME"
do_stop
case "$?" in
0|1) [ "$VERBOSE" != no ] && log_end_msg 0 ;;
2) [ "$VERBOSE" != no ] && log_end_msg 1 ;;
esac
;;
  #reload|force-reload)
#
# If do_reload() is not implemented then leave this commented out
# and leave 'force-reload' as an alias for 'restart'.
#
#log_daemon_msg "Reloading $DESC" "$NAME"
#do_reload
#log_end_msg $?
#;;
  restart|force-reload)
#
# If the "reload" option is implemented then remove the
# 'force-reload' alias
#
log_daemon_msg "Restarting $DESC" "$NAME"
do_stop
case "$?" in
 0|1)
do_start
case "$?" in
0) log_end_msg 0 ;;
1) log_end_msg 1 ;; # Old process is still running
*) log_end_msg 1 ;; # Failed to start
esac
;;
 *)
  # Failed to stop
log_end_msg 1
;;
esac
;;
  *)
#echo "Usage: $SCRIPTNAME {start|stop|restart|reload|force-reload}" >&2
echo "Usage: $SCRIPTNAME {start|stop|restart|force-reload}" >&2
exit 3
;;
esac
:

Test repcached

On server 1
  1. telnet 127.0.0.1 11211
  2. set foo 0 0 3
  3. bar
On server 2
  1. telnet 127.0.0.1 11211
  2. get foo (you will get bar as the output)

Thursday, December 8, 2011

Drupal scaling and performance tuning - Part 3

Running a developer portal is not as easy a task as I thought. When it comes to WSO2 Oxygentank it is even more critical, as the WSO2 community depends heavily on the portal.

To be frank, if the site went offline for one minute, I would get more than 20 emails, IM chats, or calls. Most of the solutions described on the web really didn't work for us, so we had to find our own way to stabilize the system. After spending so much time configuring and tuning, we ended up with a scalable and stable solution. In parts 1 and 2 I described the methods we used.

During peak hours our portal started to return 502 Bad Gateway messages from Nginx. The reason was that, due to high MySQL load on the master1 server, Nginx didn't receive a response within the defined timeout.

The solution was to configure Nginx with backup nodes instead of weight-balancing the load across all 4 nodes.

Sample nginx.conf

upstream wso2.org {
    server node1server:80 weight=5 fail_timeout=20s;
    server node2server.org:80 weight=5 fail_timeout=20s;
    server node3server:80 backup;
    server node4server:80 backup;
  }

With this setup:
  1. Nginx always sends traffic to the node1 and node2 servers
  2. The primary data source is the master1 MySQL instance
  3. The secondary data source is the master2 MySQL instance
When node1 or node2 can't respond to Nginx within 20 seconds, traffic is routed to the backup nodes:
  1. Nginx sends traffic to node3 and node4 when node1 and node2 cannot respond
  2. The primary data source is the master2 MySQL instance
  3. The secondary data source is the master1 MySQL instance
With that setting we were able to keep the site at 0% downtime.


(Overview of WSO2 Oxygentank)

Nginx looked after load balancing when HTTP/S traffic got high, and with the backup-node method we were able to keep our MySQL instances up and running without a meltdown.


Friday, October 7, 2011

Drupal scaling and performance tuning - Part 2

Drupal is highly relational. When it comes to performance, MySQL has a big role to play in the Drupal world. As I explained in Drupal scaling and performance tuning - Part 1, we were able to tune Apache to handle much more load during high-traffic hours. But MySQL didn't give us a chance to rest.

As the first step, we decided to look at the MySQL slow query log to identify bad queries. The server logged all queries that took more than 2 seconds to process; most of them involved node permissions, user sessions, the access log, cache, comments, watchdog, and node contents.
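The 2-second threshold comes from the slow-query settings in my.cnf; a minimal sketch (variable names differ between MySQL versions — older releases use log_slow_queries instead of slow_query_log / slow_query_log_file):

```ini
[mysqld]
# Log every statement that runs longer than 2 seconds
long_query_time     = 2
slow_query_log      = 1
slow_query_log_file = /var/log/mysql/mysql-slow.log
```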

The server load had gone up, and the server was almost frozen most of the time. As a solution, I knew we would end up with a modified MySQL configuration file.

But which parameters did I need to change, and what values should they have, like I had worked out for Apache?

Then I came across a tool called MySQLTuner-perl, a script written in Perl that reviews a MySQL installation quickly and suggests adjustments to increase performance and stability. It retrieves the current configuration variables and status data and presents them in a brief format along with some basic performance suggestions.

Tools used
  1. MySQLTuner-perl
  2. Maatkit (power tools for open-source databases)

MySQL optimization - I will not list all the configurations here, as MySQL Tuner gives you a good guide
  1. MySQL Tuner
    1. Download MySQL Tuner (wget https://github.com/rackerhacker/MySQLTuner-perl/blob/master/mysqltuner.pl)
    2. Make it executable (chmod +x mysqltuner.pl)
    3. Run MySQL Tuner - you need your MySQL root password to execute this (./mysqltuner.pl)
    4. Read the output carefully, especially the recommendations at the end. It shows exactly which variables you should adjust in the [mysqld] section of your my.cnf (on Debian and Ubuntu the full path is /etc/mysql/my.cnf). Whenever you change my.cnf, make sure you restart MySQL. You can then run MySQLTuner again to see if it has further recommendations. This way you can optimize MySQL step by step.
  2. Maatkit
    1. mk-duplicate-key-checker helped me find duplicate indexes and foreign keys on MySQL tables.
    2. Removing all duplicate indexes and foreign keys helped MySQL process queries smoothly.
    3. mk-query-digest and mk-query-profiler helped to profile and test new configurations/modifications to the database.
    4. mk-variable-advisor double-checked the changes and recommendations made by MySQL Tuner.
  3. Converted the comments, node, and users tables from MyISAM to InnoDB
  4. Optimized the other tables in the Drupal database with mysqlcheck -o -A -p
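The MyISAM-to-InnoDB conversion in step 3 is one ALTER TABLE per table — a sketch using the default Drupal table names (take a backup first, and expect each table to be locked while it rebuilds):

```sql
ALTER TABLE comments ENGINE=InnoDB;
ALTER TABLE node     ENGINE=InnoDB;
ALTER TABLE users    ENGINE=InnoDB;
```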
After a few cycles of the steps above, we got the MySQL server to a level where it could handle heavy load without any hiccups. The slow query log didn't report any slow queries, and SHOW PROCESSLIST always stayed below 20-30 entries.

Other optimizations used
  1. Disabled the watchdog and statistics modules in Drupal to reduce the read/write load on MySQL
  2. Uninstalled all unused modules.
Finally I got our Oxygentank database to a state where it could breathe freely without running at full power all the time.

But I was not happy with this setup, since the second MySQL master server was always running silently without helping the primary master handle its load.

I used NGINX to solve this problem.

I will discuss how I used NGINX to share the MySQL load between databases/Drupal nodes, with the help of Memcached, in part 3.


Thursday, September 29, 2011

Drupal scaling and performance tuning - Part 1

As WSO2Con 2011 was a huge hit in the IT sector, I faced a problem with wso2.org (the Oxygentank developer portal): the site couldn't handle large, sustained traffic.

The old system was NGINX (as load balancer) fronting 4 Drupal nodes running Apache2, with master-master MySQL replication.

But during high traffic our servers took more than 1 minute (on average) to respond to a request. As NGINX's fail timeout was 45 seconds, users got 504 Gateway Timeout most of the time, while the Apache servers' load went above 30-70.

As a quick first fix we spun up 2 new servers and routed load across them. That helped us for a few hours, but again there was a huge delay when the site was loading. Then we found that the MySQL server load had also gone up and the process list had grown.

I found that the MySQL server was taking longer to process queries than before. The root cause of this new problem was that 6 Drupal nodes had started stressing the MySQL servers continually.

Then I started to ask myself some questions.

1. OK, then why didn't we see this problem before we plugged in the new nodes?
The answer was simple: the Drupal nodes couldn't stress the DB, because they effectively killed themselves (froze) during high traffic.

Before jumping into Apache, our WSO2 infrastructure team revisited our monitoring systems (Ganglia) and found that the servers were running on swap most of the time.

As a solution I started tuning Apache to avoid the memory swapping problem. I saw that there were some misconfigurations that consumed extra memory once Apache processes started handling traffic, and because of them the server's memory could easily be exhausted.

MaxClients was one of the major parameters I had to change during the Apache memory optimization process.

Apache Memory Optimization
  1. Find the non-swapped physical memory Apache uses (RES)
    1. Run top and press shift + m to find the highest RES value used by an Apache process
  2. Find the available Apache memory pool
    1. Run service apache2 stop and type free -m
    2. Note the used memory and subtract it from the total (this gives you the free memory pool)
    3. Multiply the free pool by 0.8 to get the available Apache pool (this keeps a 20% memory reserve for burst periods)
  3. Calculate MaxClients
    1. Divide the available Apache pool by the highest RES memory used by Apache (step 1). That is your MaxClients number
    2. Open apache2.conf and change the MaxClients value
  4. Other tweaks
    1. Set KeepAlive On (keep it Off if you haven't kept the 20% memory reserve)
    2. Set KeepAliveTimeout to a low value (this prevents connections hanging; if you experience high latency to your server, set it to 2-5 seconds)
    3. Set Timeout to a reasonable value like 10-40 (keep it low)
    4. Set MaxKeepAliveRequests within 70-150 (if you have a good idea of the object count on a page, set it to match)
    5. Set MinSpareServers to 10-25% of MaxClients
    6. Set MaxSpareServers to 25-50% of MaxClients
    7. Set StartServers equal to MaxSpareServers
    8. Set MaxRequestsPerChild between 400-700 if you see rapid Apache child process memory growth; otherwise 10000
  5. Apply the changes
    1. Run service apache2 start
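The arithmetic in steps 1-3 is easy to script. A sketch with made-up example numbers (2 GB of RAM, 512 MB used with Apache stopped, 25 MB RES per Apache process — substitute your own readings from top and free -m):

```shell
# Example figures -- replace with your own measurements
total_mb=2048       # total RAM from `free -m`
used_mb=512         # used memory from `free -m` with Apache stopped
apache_res_mb=25    # highest Apache RES value from `top`

free_pool=$(( total_mb - used_mb ))        # free memory pool
apache_pool=$(( free_pool * 80 / 100 ))    # keep a 20% reserve for bursts
max_clients=$(( apache_pool / apache_res_mb ))

echo "MaxClients $max_clients"
```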
Once I restarted our servers with the above settings, Apache could handle 150% of the traffic, and the server load stayed below 2 (1-2 during bursts).

This solved my Apache hanging problem. Then I ran another load test to verify the settings. I was amazed by the improvement, but it hadn't solved the whole problem:

the server was still taking a bit long to respond, though I didn't see connections drop.


2. Now, what is wrong with my settings? Why is the server response slow? MySQL!!

I will discuss how I addressed this problem in part 2.



Monday, June 27, 2011

Pressflow 6 with mysql Replication

Pressflow 6 should support database replication according to the Pressflow wiki, but when I configured a master and a slave, the Pressflow instance always picked the master. If the master was down, the site couldn't load, and users were redirected to the site-offline error page.

I made some modifications to database.inc, database.mysqli.inc, and settings.php so the site keeps working when the master MySQL server is down.

Scenario

  1. If the MySQL master is inactive, connect to the slave
  2. If the slave is inactive but the master is active, connect to the master
  3. If both master and slave are inactive, redirect the user to the offline page


The modifications are summarised in the notes before each code block.


database.inc

  1. The change below routes the db_connect call to the master instance by passing "master" as the 2nd parameter, and assigns $db_slave_conns[$name] to $active_db so the slave is used when the master is down.
  2. Validate before creating the slave connection: the guard $db_conns[$name]->connect_errno > 0 means the slave is only contacted when the master connection has failed.

function db_set_active($name = 'default') {
  global $db_url, $db_slave_url, $db_type, $active_db, $active_slave_db;
  static $db_conns, $db_slave_conns, $active_name = FALSE;


  if (empty($db_url)) {
    include_once 'includes/install.inc';
    install_goto('install.php');
  }


  if (!isset($db_conns[$name])) {
    // Initiate a new connection, using the named DB URL specified.
    if (is_array($db_url)) {
      $connect_url = array_key_exists($name, $db_url) ? $db_url[$name] : $db_url['default'];
      if (is_array($db_slave_url[$name])) {
        $slave_index = mt_rand(0, count($db_slave_url[$name]) - 1);
        $slave_connect_url = $db_slave_url[$name][$slave_index];
      }
      else {
        $slave_connect_url = $db_slave_url[$name];        
      }
    }
    else {
      $connect_url = $db_url;
      if (is_array($db_slave_url)) {
        $slave_index = mt_rand(0, count($db_slave_url) - 1);
        $slave_connect_url = $db_slave_url[$slave_index];
      }
      else {
        $slave_connect_url = $db_slave_url;        
      }
    }


    $db_type = substr($connect_url, 0, strpos($connect_url, '://'));
    $handler = "./includes/database.$db_type.inc";


    if (is_file($handler)) {
      include_once $handler;
    }
    else {
      _db_error_page("The database type '". $db_type ."' is unsupported. Please use either 'mysql' or 'mysqli' for MySQL, or 'pgsql' for PostgreSQL databases.");
    }


    $db_conns[$name] = db_connect($connect_url,'master');
    if ($db_conns[$name]->connect_errno > 0 && !empty($slave_connect_url)) {
      $db_slave_conns[$name] = db_connect($slave_connect_url); 
    }
  }


  $previous_name = $active_name;
  // Set the active connection.
  $active_name = $name;
  $active_db = $db_conns[$name];
  if (isset($db_slave_conns[$name])) {
    $active_slave_db = $db_slave_conns[$name];
    $active_db = $db_slave_conns[$name];    
  }
  else {
    unset($active_slave_db);
  }


  return $previous_name;
}
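For completeness, the settings.php side of this is the standard Pressflow pair of variables — the hostnames and credentials below are placeholders, not the real ones:

```php
<?php
// settings.php -- placeholder hosts/credentials, not from the original post
// Primary (master) connection:
$db_url = 'mysqli://drupal:secret@master.example.com/drupal';

// One or more slaves; db_set_active() above picks one at random
// and falls back to it when the master connection fails:
$db_slave_url = array(
  'mysqli://drupal:secret@slave.example.com/drupal',
);
```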