Building a Sentiment Analysis Solution With AWS Comprehend
At the last re:Invent conference, AWS announced several additions to its Artificial Intelligence and Machine Learning portfolio. One of the new services in particular caught my attention - AWS Comprehend. In previous posts, I’ve talked about how important AWS’ AI/ML solutions are to organizations, as well as how easy they are to use.
AWS Comprehend is no different. It’s an easy-to-use Natural Language Processing (NLP) service that allows you to analyze text. Since re:Invent I’ve had a few ideas on how to test out the service. This week I landed on a plan to examine sentiment in tweets to @awscloud.
Here’s how it all came together:
EC2 Instance (Python script)
You’ll notice there’s an EC2 instance in the mix - if you’ve been following along for a while you know how I feel about EC2 instances and operating systems, but in this case it made sense, since we’re leveraging the Twitter Stream API and need a way to continuously pull updates from the stream.
One thing to note: I used an Instance profile and IAM role to provide the Python script access to the Firehose service (more on this in a bit). By using an IAM Role, we’re able to avoid hardcoded Access Keys.
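Here is a minimal sketch of how the script might hand each update to Firehose, assuming boto3. The stream name `twitter-updates` and the helper names are hypothetical, not taken from my script. Note that no credentials appear anywhere: boto3 picks them up from the instance profile.

```python
import json


def encode_record(tweet):
    """Serialize one update as a newline-terminated JSON record.

    The trailing newline keeps records separable after Firehose
    concatenates them into a single S3 object.
    """
    return (json.dumps(tweet) + "\n").encode("utf-8")


def send_tweet(firehose, tweet, stream_name="twitter-updates"):
    """Put a single update onto the (hypothetical) delivery stream."""
    return firehose.put_record(
        DeliveryStreamName=stream_name,
        Record={"Data": encode_record(tweet)},
    )


# Usage on the instance (credentials come from the IAM role):
#   import boto3
#   send_tweet(boto3.client("firehose"), {"id_str": "1", "text": "Hello @awscloud"})
```

Passing the client in as an argument keeps the helper easy to exercise without touching AWS.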
Outside of the Python script running on the EC2 instance, the rest of the solution is 100% serverless. That means there are no servers to manage, and the solution will scale quickly to handle any increase in load.
Firehose is one of the offerings available within the Kinesis service, and it allows you to load streaming data into storage services - in our case we’re using Firehose to store Twitter updates in an S3 bucket.
When Firehose uploads an object to the S3 bucket, I’m using S3 Event notifications to invoke a Lambda function. The function parses the Twitter update and passes the text field to AWS Comprehend to be analyzed. For now, I’m merely determining sentiment. Here’s the code snippet that identifies sentiment:
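The parsing step could look roughly like the sketch below. It assumes each line of the delivered object holds one JSON-encoded update. The function names and the optional `s3` argument are illustrative, not the code I deployed.

```python
import json
from urllib.parse import unquote_plus


def s3_objects(event):
    """Yield (bucket, key) for every record in an S3 event notification."""
    for record in event.get("Records", []):
        s3_info = record["s3"]
        # Object keys arrive URL-encoded in the notification payload.
        yield s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def parse_tweets(body):
    """Parse newline-delimited JSON, skipping blank lines."""
    return [json.loads(line) for line in body.splitlines() if line.strip()]


def handler(event, context, s3=None):
    """Hypothetical Lambda entry point: returns the text of every update."""
    if s3 is None:
        import boto3  # imported lazily so the helpers work without AWS access
        s3 = boto3.client("s3")
    texts = []
    for bucket, key in s3_objects(event):
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
        texts.extend(tweet["text"] for tweet in parse_tweets(body))
    return texts
```

Each returned string is what would then be passed to Comprehend.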
response = comprehend.detect_sentiment(
    Text=tweet['text'],
    LanguageCode='en'
)
score['mixScore'] = response['SentimentScore']['Mixed']
score['posScore'] = response['SentimentScore']['Positive']
score['negScore'] = response['SentimentScore']['Negative']
score['neuScore'] = response['SentimentScore']['Neutral']
As with the Python script on the EC2 instance, I created an IAM role to allow the Lambda function to interact with S3, Comprehend and DynamoDB. Following the principle of least privilege, I limited the actions the function can perform.
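To give a sense of what "least privilege" might mean here, the sketch below builds a policy document with one action per service. The bucket name and table ARN are placeholders, and the exact actions are my assumption about the minimum this function needs. It also assumes `comprehend:DetectSentiment` takes a wildcard resource.

```python
import json


def lambda_policy(bucket, table_arn):
    """Build a least-privilege policy document for the function's role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {   # read the objects Firehose delivers
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {   # sentiment only; assumed to need a wildcard resource
                "Effect": "Allow",
                "Action": ["comprehend:DetectSentiment"],
                "Resource": "*",
            },
            {   # write processed updates, nothing else
                "Effect": "Allow",
                "Action": ["dynamodb:PutItem"],
                "Resource": table_arn,
            },
        ],
    }


# Placeholder names; substitute your own bucket and table ARN.
policy = lambda_policy(
    "example-tweet-bucket",
    "arn:aws:dynamodb:us-east-1:123456789012:table/example-tweets",
)
print(json.dumps(policy, indent=2))
```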
Simple, eh? That is why services like Comprehend are so essential - they have the potential to provide insight to organizations with very little heavy lifting.
Finally, since Lambda is stateless, I needed somewhere to store the processed updates. The easiest way to do this is to create a table in DynamoDB. I’m using the update ID as the partition key and the timestamp as a sort key.
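A sketch of how a processed update might be shaped for the table follows. The attribute names `id` and `timestamp`, and the use of the tweet fields `id_str` and `timestamp_ms`, are assumptions for illustration. One detail worth knowing: the boto3 Table resource does not accept Python floats, so the scores are converted to `Decimal` first.

```python
from decimal import Decimal


def build_item(tweet, score):
    """Shape one processed update for DynamoDB.

    Scores go through str() before Decimal() so the stored value
    matches the printed float rather than its binary expansion.
    """
    item = {
        "id": tweet["id_str"],               # partition key (name assumed)
        "timestamp": tweet["timestamp_ms"],  # sort key (name assumed)
        "text": tweet["text"],
    }
    for name, value in score.items():
        item[name] = Decimal(str(value))
    return item


# Usage inside the function (table name is a placeholder):
#   import boto3
#   table = boto3.resource("dynamodb").Table("example-tweets")
#   table.put_item(Item=build_item(tweet, score))
```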
There’s still some work to do here.
While the Lambda function is relatively simple, I’d like to break it into smaller pieces and use Step Functions to tie it all together.
Finally, I’m using just one piece of Comprehend in this solution. I’d like to extend the functionality by extracting phrases and topics.
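For key phrases, the change could be as small as the sketch below, which calls Comprehend's `detect_key_phrases` and keeps only confident matches. The helper name and the 0.8 threshold are arbitrary choices of mine. Topics are a different story: topic modeling in Comprehend runs as an asynchronous job over a collection of documents, so it would not be a per-update call like this one.

```python
def key_phrases(comprehend, text, min_score=0.8):
    """Return the key phrases Comprehend finds with at least min_score confidence."""
    response = comprehend.detect_key_phrases(Text=text, LanguageCode="en")
    return [
        phrase["Text"]
        for phrase in response["KeyPhrases"]
        if phrase["Score"] >= min_score
    ]


# Usage:
#   import boto3
#   key_phrases(boto3.client("comprehend"), tweet["text"])
```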