Real-Time Social Media Monitoring
Twitter is a powerful tool for enterprises to learn how people perceive their brand and for marketing teams to track sentiment for messaging and campaign development. Real-time Twitter monitoring then allows them to keep a close eye on the results of those marketing efforts.
This use case required a real-time pipeline that ingests the full Twitter stream and then cleanses, transforms and runs sentiment analysis on the tweets related to the campaign, delivering a valuable real-time decision-making platform. The aggregated data was exposed through REST APIs to an internal visualization tool, making the output easy for marketing teams to consume.
The pipeline was built using Storm, HBase, MySQL and JBoss. Storm is used to ingest and process the stream of tweets; the tweets are then analyzed using Natural Language Processing (NLP) algorithms to determine sentiment; and finally, the analyzed tweets are aggregated on multiple dimensions, such as number of re-tweets and attitude (positive, negative or neutral). The aggregations are stored in HBase, and twice a day the data from HBase is moved into MySQL. JBoss exposes REST APIs for accessing the data in MySQL.
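The aggregation step described above can be illustrated with a minimal Python sketch. It is not the customer's implementation: the `Tweet` type, the keyword lists and the `score_sentiment` function are hypothetical stand-ins for the real NLP scoring, and the in-memory dictionary stands in for the HBase table. It only shows the idea of rolling up scored tweets by attitude and re-tweet count.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical keyword lists; a crude stand-in for the real NLP scoring step.
POSITIVE = {"love", "great", "awesome"}
NEGATIVE = {"hate", "awful", "broken"}


@dataclass
class Tweet:
    text: str
    retweets: int


def score_sentiment(text: str) -> str:
    """Label a tweet positive, negative or neutral by counting keywords."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    pos = len(words & POSITIVE)
    neg = len(words & NEGATIVE)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"


def aggregate(tweets):
    """Count tweets and sum re-tweets per attitude."""
    totals = defaultdict(lambda: {"tweets": 0, "retweets": 0})
    for tweet in tweets:
        bucket = totals[score_sentiment(tweet.text)]
        bucket["tweets"] += 1
        bucket["retweets"] += tweet.retweets
    return dict(totals)


sample = [
    Tweet("I love the new campaign!", 12),
    Tweet("The app is broken again", 3),
    Tweet("Saw the ad today", 0),
    Tweet("Great launch, awesome team", 5),
]
summary = aggregate(sample)
print(summary)
```

In a streaming deployment the same roll-up would be applied incrementally as tweets arrive, with the per-attitude totals written to the store that backs the REST API.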
The goal was to reduce the overall complexity of the pipeline by eliminating the need to maintain a separate cluster for processing the real-time Twitter stream, integrating NLP scoring algorithms for sentiment analysis, and exposing the aggregated data from HBase with lower latency between the data becoming available in HBase and its delivery via the REST API. The result is an easy-to-build, easy-to-deploy and easy-to-manage real-time pipeline with better operational insights and easy consumption for line-of-business leaders.
Benefits of Cask Solution
- Cask Hydrator was used to build the pipeline for processing the full Twitter stream, and it was completed in two weeks.
- The pipeline processes tweets in flight at ~6K/sec, quickly giving marketing teams sentiment analysis for their campaigns and a valuable real-time decision-making platform.
- Consolidated infrastructure into a single Hadoop cluster.
- Java developers were able to build the pipeline and plugin with a smaller learning curve.
- The customer’s organization was able to make decisions faster with the analytics provided.
- CDAP and Cask Hydrator provided operational insights through custom dashboards and aggregated logs for debugging.