Using CloudWatch Logs Insights and AWS Athena to troubleshoot network issues


Chuck Ingram

Enabling and configuring logs is an important step in securing your AWS environment, but you also have to use them effectively. Monitoring your environment isn’t enough on its own; you should also be proactive with the data you’re collecting. Today I’ll focus on VPC Flow Logs: where and how they’re stored, and how to use that data to troubleshoot common network issues.

Finding your Logs

First off, determine whether VPC Flow Logs are enabled. Flow log data can be delivered to one of two destinations: CloudWatch Logs or an S3 bucket. If a flow log already exists, you’ll be able to see its destination.

Otherwise you’ll need to create a new one. For this example I’ve set the filter to ALL traffic, but it can be narrowed down to Accept only or Reject only. Whichever destination you choose has to exist ahead of time, so create the CloudWatch Log Group or S3 bucket before you point Flow Logs at it. You’ll also need an IAM role, which you can set up beforehand or generate by selecting “Set Up Permissions”.

Once the flow log has been created and has collected some data, we can start querying it.

Using queries with Flow Log Data

CloudWatch Logs

Here’s what the raw flow log data looks like in a CloudWatch Log group. While using the console is a good start, we can do better.
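
To make the raw format concrete, here is a minimal Python sketch that splits one record into named fields. It assumes the default (version 2) record format, where the fields are space-separated in a fixed order. The sample line is an illustrative record, and the names FIELDS, INT_FIELDS, PROTOCOLS, and parse_flow_log are made up for this example.

```python
from datetime import datetime, timezone

# Field order of the default (version 2) flow log record format.
FIELDS = [
    "version", "account_id", "interface_id", "srcaddr", "dstaddr",
    "srcport", "dstport", "protocol", "packets", "bytes",
    "start", "end", "action", "log_status",
]
INT_FIELDS = {"version", "srcport", "dstport", "protocol",
              "packets", "bytes", "start", "end"}
PROTOCOLS = {1: "ICMP", 6: "TCP", 17: "UDP"}  # IANA protocol numbers


def parse_flow_log(line):
    """Turn one space-separated flow log record into a dict."""
    values = line.split()
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    record = dict(zip(FIELDS, values))
    for name in INT_FIELDS:
        # NODATA and SKIPDATA records use "-" where there is no value.
        if record[name] != "-":
            record[name] = int(record[name])
    return record


# Illustrative sample record (not taken from a real account).
sample = ("2 123456789010 eni-1235b8ca123456789 172.31.16.139 172.31.16.21 "
          "20641 22 6 20 4249 1418530010 1418530070 ACCEPT OK")
record = parse_flow_log(sample)
print(record["srcaddr"], "->", record["dstaddr"],
      PROTOCOLS.get(record["protocol"], record["protocol"]), record["action"])
# start and end are Unix timestamps in seconds.
print(datetime.fromtimestamp(record["start"], tz=timezone.utc).isoformat())
```

The same field order is what the Athena table later in this post maps onto its columns.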

With CloudWatch Logs Insights we can get AWS to do the heavy lifting for us. Select Insights under Logs and then choose your log group. Example queries are also available from the folder button on the right.

Simple Storage Service (S3)

When an S3 bucket is the destination, you first have to set up a table that queries can run against. We’ll use AWS Athena both to create that table and to run the queries, so those familiar with SQL will feel right at home. I’ll provide the code here, but it’s also available in the AWS documentation.

CREATE EXTERNAL TABLE IF NOT EXISTS vpc_flow_logs (
  version int,
  account string,
  interfaceid string,
  sourceaddress string,
  destinationaddress string,
  sourceport int,
  destinationport int,
  protocol int,
  numpackets int,
  numbytes bigint,
  starttime int,
  endtime int,
  action string,
  logstatus string
)  
PARTITIONED BY (dt string)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ' '
LOCATION 's3://<bucket_name>/AWSLogs/<account_id>/vpcflowlogs/<region>/'
TBLPROPERTIES ("skip.header.line.count"="1");

The important part is editing the LOCATION so that it points to the S3 bucket that stores the VPC flow logs you want to query.

You may also need to create a partition to be able to read the data as mentioned in step 4 of the documentation. The code you’ll need looks like this:

ALTER TABLE vpc_flow_logs
ADD PARTITION (dt ='YYYY-MM-dd')
location 's3://<bucket_name>/AWSLogs/<account_id>/vpcflowlogs/<region>/YYYY/MM/dd';
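
Since each day of logs needs its own partition, it can help to generate the statement rather than edit it by hand. The sketch below is one way to do that in Python. The function name partition_statement and the bucket, account, and region values in the usage line are placeholders invented for this example; the statement itself follows the template above.

```python
from datetime import date


def partition_statement(bucket, account_id, region, day, table="vpc_flow_logs"):
    """Build the ALTER TABLE statement that adds one day's partition."""
    # Flow logs in S3 are laid out under .../<region>/YYYY/MM/dd
    location = (f"s3://{bucket}/AWSLogs/{account_id}/vpcflowlogs/"
                f"{region}/{day:%Y/%m/%d}")
    return (
        f"ALTER TABLE {table}\n"
        f"ADD PARTITION (dt = '{day:%Y-%m-%d}')\n"
        f"LOCATION '{location}';"
    )


# Placeholder values; substitute your own bucket, account ID, and region.
print(partition_statement("my-flow-log-bucket", "123456789012",
                          "us-east-1", date(2019, 7, 4)))
```

The output can be pasted into the Athena query editor like the hand-written version.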

AWS provides some example queries in the documentation, but I’ll list a few others here as well.

Queries

Basic/Test Queries

While these two queries are basic, they will confirm whether your Log Group or Athena table is set up correctly. The ‘limit’ caps the number of rows returned so the query comes back quickly, and it can optionally be added to any other query in both CloudWatch Logs and Athena.

CloudWatch Logs

fields @timestamp, interfaceId, srcAddr, srcPort, dstAddr, dstPort, protocol, action, packets, bytes, version |
sort @timestamp desc |
limit 50

NOTE: In Insights, a pipe (|) separates each command in a query, so every filter, sort, or limit you add is chained on with a pipe.

Athena
SELECT *
FROM vpc_flow_logs
LIMIT 50;

AWS Elastic Network Interface (ENI) focus

These queries return all traffic passing through a specific AWS ENI. This is useful when you’re troubleshooting a specific instance or network connection and want to narrow down its traffic and errors.

CloudWatch Logs Search on specific ENI
fields @timestamp, interfaceId, srcAddr, dstAddr |
filter interfaceId = "<AWS ENI ID>" |
limit 50
Athena Search on specific ENI
SELECT * 
FROM vpc_flow_logs 
WHERE interfaceid = '<AWS ENI ID>'
LIMIT 50;

Port Scanner

A source address that touches many different destination ports, especially when those connections are rejected, is often running a port scan. These queries rank source addresses by that activity.

CloudWatch Logs
filter (action="REJECT") |
stats count_distinct(dstPort) as portcount by srcAddr |
sort portcount desc
Athena
SELECT COUNT(DISTINCT destinationport) AS Hits, sourceaddress
FROM vpc_flow_logs
WHERE action = 'REJECT'
GROUP BY sourceaddress
ORDER BY Hits DESC;
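
For readers who think more easily in code than in query syntax, here is a rough Python sketch of the same aggregation, mirroring the CloudWatch Logs Insights query: keep only REJECT records, then count distinct destination ports per source address. It assumes default-format records; the function name rejected_port_counts and the sample lines are invented for illustration.

```python
from collections import defaultdict


def rejected_port_counts(lines):
    """Rank source addresses by distinct destination ports that were rejected."""
    ports_by_source = defaultdict(set)
    for line in lines:
        fields = line.split()
        if len(fields) != 14:  # not a default-format record
            continue
        srcaddr, dstport, action = fields[3], fields[6], fields[12]
        if action == "REJECT":
            ports_by_source[srcaddr].add(dstport)
    counts = [(src, len(ports)) for src, ports in ports_by_source.items()]
    return sorted(counts, key=lambda item: item[1], reverse=True)


# Made-up sample records using documentation IP ranges.
sample_lines = [
    "2 123456789012 eni-0abc 203.0.113.5 10.0.0.10 40000 22 6 1 40 1 2 REJECT OK",
    "2 123456789012 eni-0abc 203.0.113.5 10.0.0.10 40001 23 6 1 40 1 2 REJECT OK",
    "2 123456789012 eni-0abc 203.0.113.5 10.0.0.10 40002 3389 6 1 40 1 2 REJECT OK",
    "2 123456789012 eni-0abc 198.51.100.7 10.0.0.10 50000 443 6 5 300 1 2 ACCEPT OK",
    "2 123456789012 eni-0abc 198.51.100.7 10.0.0.10 50001 25 6 1 40 1 2 REJECT OK",
]
for source, port_count in rejected_port_counts(sample_lines):
    print(source, port_count)
```

A source with a high count relative to the others is the one worth a closer look.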

Vulnerability Scanners

These queries count how many times an inbound IP address has hit the instance in question from the same source port.

CloudWatch Logs
filter (srcAddr != "<INSTANCE IP>" and srcPort > 1024) |
stats count(*) as hits by srcAddr,srcPort |
sort hits desc |
limit 50
Athena
SELECT sourceaddress, sourceport, COUNT(*) as Hits
FROM vpc_flow_logs
WHERE sourceport > 1024 AND sourceaddress != '<INSTANCE IP>'
GROUP BY sourceaddress, sourceport
ORDER BY Hits DESC;
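
The grouping in these queries can also be sketched in Python: skip the instance’s own address, keep source ports above 1024, and count hits per (source address, source port) pair. This is only an illustration over default-format records; repeated_source_ports and the sample lines are hypothetical.

```python
from collections import Counter


def repeated_source_ports(lines, instance_ip, limit=50):
    """Count hits per (source address, source port) pair, most frequent first."""
    hits = Counter()
    for line in lines:
        fields = line.split()
        # Skip headers, malformed lines, and records with "-" as the port.
        if len(fields) != 14 or not fields[5].isdigit():
            continue
        srcaddr, srcport = fields[3], int(fields[5])
        if srcaddr != instance_ip and srcport > 1024:
            hits[(srcaddr, srcport)] += 1
    return hits.most_common(limit)


# Made-up sample records; 10.0.0.10 stands in for the instance IP.
sample_lines = [
    "2 123456789012 eni-0abc 203.0.113.5 10.0.0.10 44321 80 6 1 40 1 2 ACCEPT OK",
    "2 123456789012 eni-0abc 203.0.113.5 10.0.0.10 44321 443 6 1 40 1 2 ACCEPT OK",
    "2 123456789012 eni-0abc 203.0.113.5 10.0.0.10 44321 8080 6 1 40 1 2 REJECT OK",
    "2 123456789012 eni-0abc 10.0.0.10 203.0.113.5 50123 443 6 1 40 1 2 ACCEPT OK",
    "2 123456789012 eni-0abc 198.51.100.7 10.0.0.10 53 5353 17 1 40 1 2 ACCEPT OK",
]
for (source, port), count in repeated_source_ports(sample_lines, "10.0.0.10"):
    print(source, port, count)
```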

Summary

Today you’ve learned about VPC Flow Logs, where they can be delivered, and how to query them in meaningful ways. There are plenty of other queries you could try, and AWS lets you monitor pretty much every service it provides in some way. Remember that monitoring is just the first step; it’s how you use that data that counts.
