Case Study

Security Analytics and Reporting


Challenge

A Fortune 50 financial institution ran a legacy pipeline that loaded batched data onto a secured Hadoop cluster to produce daily aggregates and reports. Although the pipeline performed multiple transformations that created new datasets, the organization faced several issues:

• The data pipeline was inefficient, taking up to six hours to run and requiring manual intervention almost daily
• Reports were misaligned with day boundaries
• Any point of failure required reconfiguring and restarting the pipeline, which was time-consuming and frustrating
• Adding new sources required significant setup and development time
• The team was unable to test and validate the pipeline prior to deployment, so testing was conducted directly on the production cluster, a poor use of resources

Solution

Using CDAP, the organization’s data development team created independent, parallel pipelines that moved the data from SQL into Time Partitioned Datasets. Transformations were then performed in-flight with the ability to handle error records. After completing the initial transfers, another pipeline combined the data into a single Time Partitioned Dataset and fed it into an aggregation and reporting pipeline.
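
The flow described above can be sketched in a few lines of plain Python. This is only an illustration of the pattern (per-source ingestion into daily partitions, in-flight transforms that set error records aside, a combine step, then aggregation). It does not use CDAP's API and is not the institution's code; the function names, the record fields `ts` and `amount`, and the choice of UTC calendar days as partition boundaries are all assumptions made for the example.

```python
from collections import defaultdict
from datetime import datetime, timezone


def partition_key(ts: datetime) -> str:
    """Return the daily partition (UTC calendar day) a record belongs to."""
    if ts.tzinfo is None:
        # Assumption for this sketch: timestamps without an offset are UTC.
        ts = ts.replace(tzinfo=timezone.utc)
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%d")


def parse_event(row: dict) -> dict:
    """Hypothetical in-flight transform: parse raw string fields into typed values."""
    return {
        "ts": datetime.fromisoformat(row["ts"]),
        "amount": float(row["amount"]),
    }


def ingest(rows, transform):
    """Transform each row; good records go to daily partitions, bad ones to an error list."""
    partitions = defaultdict(list)
    errors = []
    for row in rows:
        try:
            record = transform(row)
            partitions[partition_key(record["ts"])].append(record)
        except (KeyError, ValueError, TypeError) as exc:
            # Error records are kept aside instead of failing the whole run.
            errors.append({"row": row, "reason": repr(exc)})
    return dict(partitions), errors


def combine(*sources):
    """Merge the per-source partitions into a single partitioned dataset."""
    merged = defaultdict(list)
    for source in sources:
        for day, records in source.items():
            merged[day].extend(records)
    return dict(merged)


def daily_totals(partitions):
    """Aggregate each daily partition into one reporting figure."""
    return {day: sum(r["amount"] for r in records) for day, records in partitions.items()}
```

Because every record is assigned to a partition by its own timestamp rather than by the time the batch happened to run, the resulting report lines up with day boundaries, and a failed source can be re-ingested on its own without touching the others.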

Benefits of Cask Solution

Rapid Time to Value
  • In-house Java developers with limited knowledge of Hadoop built and ran the complex pipelines at scale within two weeks after only four hours of training.
  • The new data pipeline ran in approximately two hours, compared with up to six hours before, and required no manual intervention.
Improved Operations
  • Transforms were performed in-flight with the ability to handle error records.
  • The visual interface enabled the team to develop, test, debug, deploy, run, automate and view pipelines during operations.
  • The development experience was improved by reducing unnecessary cluster utilization.
Simplified Management
  • Tracking tools made it easy to rerun the process from any point of failure.
  • The new process reduced system complexity, which simplified pipeline management.