Extract, Encrypt, Mask and Reload

The customer, a Fortune 50 company in the telecom sector, relied on a legacy custom data pipeline that performed format-preserving encryption and data masking on a Kerberos-secured Hadoop cluster. The pipeline extracted data from Teradata to HDFS, performed transformations, and loaded the results back into Teradata on a daily basis. Built by a third-party service, this pipeline was operationally unstable and required constant, costly intervention to keep it running.

CDAP allowed the in-house team to reproduce and replace the existing pipeline. The new process performed the extraction, encryption, masking and reload to and from Teradata in-flight, and created a copy of the data on HDFS so the team could run complex ad-hoc queries using Hive. The new pipeline, built using CDAP, was far more robust, more stable and much easier to manage.
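To make the encrypt-and-mask step concrete, here is a minimal Python sketch of what a per-record transformation of this kind might look like. It is not the customer's pipeline, which was built in CDAP's visual interface rather than in code. The field names (phone, account), the helper names and the key are hypothetical, and toy_fpe is only a keyed digit substitution that stands in for real format-preserving encryption; a production system would use a vetted algorithm such as NIST FF1.

```python
import hashlib
import hmac

DIGITS = "0123456789"


def mask_keep_last(value: str, keep: int = 4, mask_char: str = "X") -> str:
    """Mask every alphanumeric character except the last `keep` ones.

    Separators such as '-' are left in place so the field keeps its layout.
    """
    total = sum(ch.isalnum() for ch in value)
    to_mask = max(total - keep, 0)
    out = []
    for ch in value:
        if ch.isalnum() and to_mask > 0:
            out.append(mask_char)
            to_mask -= 1
        else:
            out.append(ch)
    return "".join(out)


def _digit_offsets(key: bytes, count: int) -> list:
    """Derive one offset in 0..9 per digit position from the key."""
    return [
        hmac.new(key, str(i).encode(), hashlib.sha256).digest()[0] % 10
        for i in range(count)
    ]


def toy_fpe(value: str, key: bytes, decrypt: bool = False) -> str:
    """Toy stand-in for format-preserving encryption (NOT secure).

    Each ASCII digit is shifted by a key-derived offset modulo 10, so the
    output has the same length and separators as the input and can be
    reversed with the same key.
    """
    offsets = _digit_offsets(key, sum(ch in DIGITS for ch in value))
    sign = -1 if decrypt else 1
    out, i = [], 0
    for ch in value:
        if ch in DIGITS:
            out.append(str((int(ch) + sign * offsets[i]) % 10))
            i += 1
        else:
            out.append(ch)
    return "".join(out)


def transform_record(record: dict, key: bytes) -> dict:
    """Apply the hypothetical encrypt and mask rules to one extracted row."""
    return {
        **record,
        "phone": toy_fpe(record["phone"], key),
        "account": mask_keep_last(record["account"]),
    }
```

In a flow like the one described above, a function such as transform_record would sit between the extract from Teradata and the two writes (the reload into Teradata and the copy on HDFS), so sensitive values are never stored in the clear along the way.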

Benefits of Cask Solution

Rapid Time to Value and Reduced Costs
  • Using the code-free drag-and-drop visual interface in CDAP, the in-house team built the pipeline in five days.
  • Additional complex ad-hoc queries were offloaded from Teradata, further reducing overall cost.
Scalability and Reliability
  • With CDAP, the company scaled the pipeline easily and was able to monitor and meet its SLAs.
  • IT gained immediate insights into the performance of the data pipeline and was able to easily determine and handle failure scenarios.
