Data Warehouse Offload
A health insurance company was using Netezza to aggregate and report on multidimensional data related to healthcare services. The company was looking for offload alternatives to reduce the workload on Netezza, and eventually to transition completely to Hadoop for cost savings. Data administrators, analysts, scientists and engineers operated and supported the loading, cleansing and reporting efforts, and they faced several difficulties while transitioning to Hadoop:
• The original method of moving data in and out of Hadoop was error prone, and it took a great deal of time and effort to add new aggregations or reports.
• Data administrators, analysts, scientists and engineers were spending much of their time building complex workflows rather than analyzing data and reports.
• Every new report, or ingestion of a new data source from Netezza, took months to deliver.
• Due to health care regulations, the organization required a reliable solution to track the flow of data into and out of the cluster.
• Integrating the data and making it available within Impala was challenging.
Using CDAP, the company was able to extract data from Netezza and other SQL sources, perform complex joins and transformations, and load the results into HDFS. They were then able to perform further aggregations and joins to generate the final report. Loading the final report data back into Netezza was seamless. The company’s in-house team built a data pipeline in less than a week using the drag-and-drop visual interface in CDAP, scheduled it to run daily, and configured it to report on errors, giving them the visibility into the data that they needed. Beyond those capabilities, they were able to build a pipeline-level dashboard that gave them deep insight into how the offloading and report generation process was functioning.
Benefits of Cask Solution
- CDAP empowered the company’s data administrators, analysts, scientists and engineers to quickly and seamlessly build, deploy and operationalize the Netezza pipelines in less than a week.
- The visual interface of CDAP enabled the team to develop, test, debug, deploy, run, automate and view pipelines during normal operations.
- The pipelines automatically made the data available in Impala, making the ingested data immediately available for ad-hoc queries to validate reports or gain other insights.
- Using CDAP, the customer was able to deliver on new reporting requirements within days or weeks rather than months.
- The system complexity was reduced, simplifying pipeline management and providing more insights into the process that generates critical reports for the company.
- With CDAP, the organization was able to seamlessly track the flow of data into and out of the cluster with zero integration effort.