Extract, Encrypt, Mask and Reload
The customer, a Fortune 50 company in the telecom sector, relied on a legacy custom data pipeline that performed format-preserving encryption and data masking on a Kerberos-secured Hadoop cluster. The pipeline extracted data from Teradata to HDFS, performed transformations, and loaded the results back into Teradata on a daily basis. Built by a third-party service, it was operationally unstable and required constant, costly intervention to keep it running.
CDAP allowed the in-house team to reproduce and replace the existing pipeline. The new process performed the extraction, encryption, masking, and reload to and from Teradata in-flight and created a copy of the data on HDFS so the team could run complex ad-hoc queries using Hive. The new pipeline, built using CDAP, was far more robust, more stable, and much easier to manage.
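To make the masking step concrete, here is a minimal Python sketch of what a format-preserving mask on a single record might look like. It is an illustration only: the function and field names are hypothetical, it is not CDAP's API or the customer's actual implementation, and it shows simple masking rather than format-preserving encryption (which requires a keyed cipher such as those in NIST SP 800-38G).

```python
def mask_preserving_format(value: str, keep_last: int = 4) -> str:
    """Hide all but the last `keep_last` letters/digits of `value`.

    Length, separators, and character classes are kept, so the masked
    value still fits the original column format. Hypothetical helper.
    """
    # Number of alphanumeric characters that must be hidden.
    total = sum(ch.isalnum() for ch in value)
    to_hide = max(total - keep_last, 0)

    out = []
    for ch in value:
        if ch.isalnum() and to_hide > 0:
            # Replace digits with "0" and letters with "X".
            out.append("0" if ch.isdigit() else "X")
            to_hide -= 1
        else:
            # Separators and the trailing characters pass through unchanged.
            out.append(ch)
    return "".join(out)


def transform(record: dict, sensitive_fields: set) -> dict:
    """Mask only the fields named in `sensitive_fields` (assumed to hold strings)."""
    return {
        key: mask_preserving_format(val) if key in sensitive_fields else val
        for key, val in record.items()
    }


if __name__ == "__main__":
    row = {"customer_id": "A1001", "phone": "555-867-5309"}
    print(transform(row, {"phone"}))
```

Because the masked value keeps its length and separators, it can be loaded back into the same target column without schema changes, which is the property the reload step depends on.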
Benefits of the Cask Solution
- Using the code-free drag-and-drop visual interface in CDAP, the in-house team built the pipeline in five days.
- Additional complex ad-hoc queries were offloaded from Teradata, further reducing overall cost.
- The company scaled the pipeline easily with CDAP and was able to monitor and meet its SLAs.
- IT gained immediate insights into the performance of the data pipeline and was able to easily determine and handle failure scenarios.