Managed Data Lake


While data warehousing is highly structured and requires a schema to be defined before data is stored, data lakes take the opposite approach. They collect information in near-native form without regard to semantics, quality, or consistency. Data lakes follow a schema-on-read model: the structure of the collected data is not known upfront and must be discovered when the data is read. Data lakes unlock value that was previously unattainable or hard to achieve, but they also pose substantial challenges that, left unaddressed, can turn them into unmanageable data swamps.
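
To make schema-on-read concrete, the following is a minimal Python sketch, not CDAP code. It assumes raw events are stored as untyped JSON lines; the sample records, the `SCHEMA` mapping, and the `read_with_schema` helper are all hypothetical names introduced for illustration. Nothing is validated when the data is written; structure is applied only when the data is read.

```python
import json

# Raw records land in the lake as-is; no schema is enforced on write.
# These sample lines are invented for illustration.
RAW_EVENTS = [
    '{"user": "alice", "amount": "19.99", "ts": "2016-03-01T10:00:00"}',
    '{"user": "bob", "amount": 5}',
    'not valid json',
]

# Hypothetical read-time schema: field name -> (converter, default value).
SCHEMA = {
    "user": (str, None),
    "amount": (float, 0.0),
    "ts": (str, None),
}


def read_with_schema(raw_lines, schema):
    """Apply the schema while reading; skip lines that cannot be parsed."""
    rows = []
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable input is exactly what a data lake may contain
        if not isinstance(record, dict):
            continue  # only JSON objects can be mapped onto named fields
        row = {}
        for field, (convert, default) in schema.items():
            value = record.get(field)
            # Missing fields fall back to the default instead of failing.
            row[field] = convert(value) if value is not None else default
        rows.append(row)
    return rows


rows = read_with_schema(RAW_EVENTS, SCHEMA)
for row in rows:
    print(row)
```

The same raw lines could be read later with a different schema without rewriting the stored data, which is the flexibility the schema-on-read model trades against upfront quality guarantees.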

With CDAP, Cask offers a Unified Integration Platform for Big Data that covers all aspects of data lake management, including data integration, metadata and lineage, security, and operations, as well as application development on Hadoop and Spark. It helps avoid the large amounts of custom integration code that typically result from stitching together point solutions, so companies can focus on application logic and insights instead of infrastructure and integration.

Unified Integration Platform

CDAP is a unified integration platform that combines application management, data integration, security and governance, and a self-service environment, speeding up the process of building and running a data lake. CDAP provides a broad set of ecosystem integrations for runtime, transport, and storage, including MapReduce, Spark, Spark Streaming, Tigon, Kafka, and HBase.

Mitigates Risk

CDAP provides a comprehensive collection of pre-built building blocks for data manipulation, data storage, and insight extraction, so teams can build smarter end-to-end solutions with maximum flexibility in the fast-evolving big data ecosystem. This mitigates risk by letting users move quickly from Hadoop ideation to deployment using our CLI or visual interface, reducing cost and delays.

Rapid Time to Value

CDAP enables developers to get started quickly with built-in data ingestion, exploration, and transformation capabilities available through a rich user interface and interactive shell.

Ensures Data Consistency

CDAP makes all data in Hadoop available for real-time and batch access without the need to write code, manage metadata, or copy data. Scale-out, high-throughput, real-time ingestion and transactional event processing preserve data consistency and enable new use cases.