Monitoring a Web Server on AWS


Travers Annan

Web servers break. Thankfully, we have tools like EC2 Auto Scaling and Elastic Load Balancing to minimize the impact of server downtime, but it remains a fact of IT life that servers can fail for any number of reasons. To get ahead of a failure event, it's good practice to set up some monitoring on your machines. In this article, we'll walk through some basic monitoring you can deploy on an Apache EC2 instance and integrate with CloudWatch metrics.

AWS has many built-in monitoring tools, but it can also be useful to extract data yourself and pipe it to a custom CloudWatch metric, allowing you to create custom alarms and actions. These metrics can be published to CloudWatch using the command line interface and scheduled with cron to collect data at regular intervals. This solution uses two scripts: one to set up the cron jobs and another to capture the desired metrics.
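To get a sense of the shape of the CLI call we'll be building on, here is a minimal, hypothetical publish. The namespace, metric name, and value are all illustrative, and the guard simply skips the call on machines where the AWS CLI isn't installed:

```shell
#!/bin/bash
# Minimal sketch of publishing one custom metric data point.
# "Custom/Demo" and "example-metric" are placeholder names.
METRIC_VALUE=42

# Skip quietly where the AWS CLI is not available.
if command -v aws >/dev/null; then
  aws cloudwatch put-metric-data \
    --region us-east-1 \
    --namespace "Custom/Demo" \
    --metric-name "example-metric" \
    --unit Count \
    --value "$METRIC_VALUE"
fi
```

The full script below follows this same pattern, just with real values scraped from Apache.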

For reference, the EC2 instances run a production multi-node WordPress website, using Elastic File System (EFS) to share files. The instances are deployed in an Auto Scaling group, and we use predictive scaling to manage the number of instances required at any one time.

This solution also uses an S3 bucket to store the most up-to-date script files, for easy updating.
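The one-time upload to that bucket can look something like this. The bucket name and local path are assumptions, and the guard just makes the sketch safe to run where the AWS CLI isn't present:

```shell
#!/bin/bash
# Hypothetical bucket holding the monitoring scripts the instances will sync.
S3_BUCKET="my-monitoring-scripts-bucket"

# Mirror the local scripts/ directory into the bucket's scripts/ prefix.
if command -v aws >/dev/null; then
  aws s3 sync ./scripts "s3://$S3_BUCKET/scripts/" --delete
fi
```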

Let’s start with the metric script.

Metric Collection Script:

#!/bin/bash

# Get the instance and Auto Scaling group IDs
REGION=us-east-1
INSTANCE_ID=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)
ASG_ID=$(aws autoscaling describe-auto-scaling-instances --instance-ids "$INSTANCE_ID" --region $REGION | grep AutoScalingGroupName | cut -d'"' -f 4)

# Get Busy and Idle workers
BUSYWORKERS=$(wget -q -O - http://localhost/server-status?auto | grep BusyWorkers | awk '{ print $2 }')
IDLEWORKERS=$(wget -q -O - http://localhost/server-status?auto | grep IdleWorkers | awk '{ print $2 }')

# Process counts (pgrep avoids counting the grep itself)
PHP_FPM_PROCESSES=$(pgrep -c php-fpm)
HTTPD_PROCESSES=$(pgrep -c httpd)

# Other Metrics
UPTIME=$(wget -q -O - http://localhost/server-status?auto | grep ServerUptimeSeconds | awk '{ print $2 }')
TOTAL_ACCESSES=$(wget -q -O - http://localhost/server-status?auto | grep 'Total Accesses' | awk '{ print $3 }')
REQ_SEC=$(bc <<<"scale=2;$TOTAL_ACCESSES/$UPTIME")

# Namespace Setup
PROPERTY=WordPress
WEBSERVER=Apache
FPM=PHP-FPM

# Put Busy and Idle workers into CloudWatch as custom metrics
aws cloudwatch put-metric-data --region $REGION --metric-name "httpd-busyWorkers" \
--unit Count --value $BUSYWORKERS --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$WEBSERVER

aws cloudwatch put-metric-data --region $REGION --metric-name "httpd-idleWorkers" \
--unit Count --value $IDLEWORKERS --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$WEBSERVER

# Put calculated Request/Sec
aws cloudwatch put-metric-data --region $REGION --metric-name "httpd-total-accesses" --unit Count \
--value $TOTAL_ACCESSES --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$WEBSERVER

# For each Instance
aws cloudwatch put-metric-data --region $REGION --metric-name "httpd-requests-per-second" --unit Count \
--value $REQ_SEC --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$WEBSERVER

# For the ASG
aws cloudwatch put-metric-data --region $REGION --metric-name "httpd-requests-per-second" --unit Count \
--value $REQ_SEC --dimensions AutoScalingGroupName=$ASG_ID --namespace $PROPERTY:$WEBSERVER

# Put Processes into CloudWatch as custom metrics
aws cloudwatch put-metric-data --region $REGION --metric-name "asg-instance-httpd-Processes" \
--unit Count --value $HTTPD_PROCESSES --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$WEBSERVER

aws cloudwatch put-metric-data --region $REGION --metric-name "asg-instance-php-fpm-processes" \
--unit Count --value $PHP_FPM_PROCESSES --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$FPM

The first thing we do is collect metadata from our EC2 instance, using wget against the instance metadata endpoint and an autoscaling describe command to get the instance and Auto Scaling group IDs.

# Get the instance and Auto Scaling group IDs
REGION=us-east-1
INSTANCE_ID=$(wget -q -O - http://169.254.169.254/latest/meta-data/instance-id)
ASG_ID=$(aws autoscaling describe-auto-scaling-instances --instance-ids "$INSTANCE_ID" --region $REGION | grep AutoScalingGroupName | cut -d'"' -f 4)
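One caveat worth knowing: instances launched with IMDSv2 enforced require a session token for metadata requests, so a plain wget like the one above will be refused. A hedged sketch of the token flow follows; the fallback placeholder ID is purely for illustration when running outside EC2:

```shell
#!/bin/bash
# IMDSv2: request a short-lived token, then pass it with metadata requests.
# Falls back to a placeholder when the metadata endpoint is unreachable
# (i.e., when this runs outside EC2).
if TOKEN=$(curl -sf -m 2 -X PUT "http://169.254.169.254/latest/api/token" \
    -H "X-aws-ec2-metadata-token-ttl-seconds: 300" 2>/dev/null) && [ -n "$TOKEN" ]; then
  INSTANCE_ID=$(curl -sf -m 2 -H "X-aws-ec2-metadata-token: $TOKEN" \
    http://169.254.169.254/latest/meta-data/instance-id)
else
  INSTANCE_ID="i-0000placeholder"   # placeholder, for illustration only
fi
echo "$INSTANCE_ID"
```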

Next, we use wget against Apache's mod_status endpoint, piping the output through grep and awk to extract the metrics we're interested in: busy and idle workers, process counts, server uptime, and total accesses.

# Get Busy and Idle workers
BUSYWORKERS=$(wget -q -O - http://localhost/server-status?auto | grep BusyWorkers | awk '{ print $2 }')
IDLEWORKERS=$(wget -q -O - http://localhost/server-status?auto | grep IdleWorkers | awk '{ print $2 }')

# Process counts (pgrep avoids counting the grep itself)
PHP_FPM_PROCESSES=$(pgrep -c php-fpm)
HTTPD_PROCESSES=$(pgrep -c httpd)

# Other Metrics
UPTIME=$(wget -q -O - http://localhost/server-status?auto | grep ServerUptimeSeconds | awk '{ print $2 }')
TOTAL_ACCESSES=$(wget -q -O - http://localhost/server-status?auto | grep 'Total Accesses' | awk '{ print $3 }')
REQ_SEC=$(bc <<<"scale=2;$TOTAL_ACCESSES/$UPTIME")
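A quick prerequisite: the ?auto queries above only work if Apache's mod_status module is enabled and the status endpoint is reachable from localhost. A minimal config fragment is sketched below; the conf.d path in the comment is an assumption based on a typical Amazon Linux Apache layout:

```shell
#!/bin/bash
# Minimal mod_status configuration, built as a here-doc so it can be
# inspected before writing it anywhere.
STATUS_CONF=$(cat <<'EOF'
<Location "/server-status">
    SetHandler server-status
    Require local
</Location>
ExtendedStatus On
EOF
)
echo "$STATUS_CONF"
# To install it (path assumed for Amazon Linux), something like:
# echo "$STATUS_CONF" | sudo tee /etc/httpd/conf.d/server-status.conf
# sudo systemctl reload httpd
```

ExtendedStatus must be on for counters like Total Accesses to appear in the ?auto output.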

The final part of the script uses the AWS CLI's put-metric-data command to publish the collected data to CloudWatch, sorting it into custom namespaces based on the type of metric.

# Namespace Setup
PROPERTY=WordPress
WEBSERVER=Apache
FPM=PHP-FPM

# Put Busy and Idle workers into CloudWatch as custom metrics
aws cloudwatch put-metric-data --region $REGION --metric-name "httpd-busyWorkers" \
--unit Count --value $BUSYWORKERS --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$WEBSERVER

aws cloudwatch put-metric-data --region $REGION --metric-name "httpd-idleWorkers" \
--unit Count --value $IDLEWORKERS --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$WEBSERVER

# Put calculated Request/Sec
aws cloudwatch put-metric-data --region $REGION --metric-name "httpd-total-accesses" --unit Count \
--value $TOTAL_ACCESSES --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$WEBSERVER

# For each Instance
aws cloudwatch put-metric-data --region $REGION --metric-name "httpd-requests-per-second" --unit Count \
--value $REQ_SEC --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$WEBSERVER

# For the Autoscaling Group
aws cloudwatch put-metric-data --region $REGION --metric-name "httpd-requests-per-second" --unit Count \
--value $REQ_SEC --dimensions AutoScalingGroupName=$ASG_ID --namespace $PROPERTY:$WEBSERVER

# Put Processes into CloudWatch as custom metrics
aws cloudwatch put-metric-data --region $REGION --metric-name "asg-instance-httpd-Processes" \
--unit Count --value $HTTPD_PROCESSES --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$WEBSERVER

aws cloudwatch put-metric-data --region $REGION --metric-name "asg-instance-php-fpm-processes" \
--unit Count --value $PHP_FPM_PROCESSES --dimensions AutoScalingGroupName=$ASG_ID,InstanceId=$INSTANCE_ID --namespace $PROPERTY:$FPM
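Once the script has run at least once, you can confirm the metrics actually landed. A quick check, assuming the Apache namespace defined above (guarded so the sketch is harmless where the AWS CLI isn't installed):

```shell
#!/bin/bash
# List the custom metrics published under the WordPress:Apache namespace.
NAMESPACE="WordPress:Apache"

if command -v aws >/dev/null; then
  aws cloudwatch list-metrics --region us-east-1 --namespace "$NAMESPACE"
fi
```

New data points can take a minute or two to become visible after publishing.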

Next is the cron setup script. It installs cron jobs that trigger the monitoring script at regular intervals, pushing the metrics into CloudWatch every five minutes, plus an hourly job that pulls the latest script versions from the S3 bucket, in case you want to add more metrics later.

Cron Setup Script:

#!/bin/bash
S3_BUCKET="YOUR_BUCKET_CONTAINING_SCRIPTS_HERE"

# Go to the home directory
cd

# Download files for monitoring solution
sudo aws s3 sync s3://$S3_BUCKET/scripts/ monitoring/scripts/ --delete

# Write out the current crontab (don't fail if none exists yet)
sudo crontab -l > mycron 2>/dev/null || true

# Append the new cron entries, avoiding duplicates
if ! grep -Fxq "*/5 * * * * sudo bash monitoring/scripts/mon-script.sh >/dev/null 2>&1" mycron; then
  echo "*/5 * * * * sudo bash monitoring/scripts/mon-script.sh >/dev/null 2>&1" >> mycron
fi
if ! grep -Fxq "0 * * * * sudo bash monitoring/scripts/mon-updater.sh >/dev/null 2>&1" mycron; then
  echo "0 * * * * sudo bash monitoring/scripts/mon-updater.sh >/dev/null 2>&1" >> mycron
fi

# Install the new cron file (into root's crontab, matching the read above)
sudo crontab mycron
rm mycron

# Send logs to cloudwatch
sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:AmazonCloudWatch-linux
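With the data flowing, a natural next step is an alarm. Below is a sketch of an alarm on the per-ASG requests-per-second metric; the alarm name, ASG name, and threshold are all illustrative, and the guard skips the call where the AWS CLI isn't available:

```shell
#!/bin/bash
# Hypothetical alarm: fire when the ASG-level requests-per-second metric
# averages above 50 for two consecutive 5-minute periods.
ASG_ID="my-wordpress-asg"   # placeholder Auto Scaling group name

ALARM_ARGS=(
  --alarm-name "wordpress-req-sec-high"
  --namespace "WordPress:Apache"
  --metric-name "httpd-requests-per-second"
  --dimensions "Name=AutoScalingGroupName,Value=$ASG_ID"
  --statistic Average
  --period 300
  --evaluation-periods 2
  --threshold 50
  --comparison-operator GreaterThanThreshold
)

if command -v aws >/dev/null; then
  aws cloudwatch put-metric-alarm "${ALARM_ARGS[@]}" --region us-east-1
fi
```

Adding an --alarm-actions parameter pointing at an SNS topic would turn this into an actual notification.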

And there you have it: a monitoring solution for Apache web servers that sends data directly from your web server to CloudWatch metrics. Once these metrics are in CloudWatch, you have multiple options to help you keep on top of them, such as custom dashboards or notification alarms. You can also customize the metrics you collect if you want to get fancy with the Bash shell. Regardless, having more data at your disposal can make all the difference when you're dealing with server downtime, and it can help you prevent future failure events. If you're anything like me, you'll sleep sounder knowing you have that much more monitoring on your production machines.

