Managed Data Lake
While data warehousing is highly structured and requires a schema to be defined before data is stored, data lakes take the opposite approach. They collect information in near-native form without regard to semantics, quality, or consistency. Data lakes follow a schema-on-read model: the structure of the collected data is not known upfront and must be discovered when the data is read. Data lakes unlock value that was previously unattainable or hard to achieve, but they also pose substantial challenges that, if unaddressed, can turn them into unmanageable data swamps.
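To make the schema-on-read idea concrete, the following is a minimal sketch in plain Python, not CDAP code. It assumes raw events arrive as JSON lines; the field names (`user`, `amount`, `ts`), the `READ_SCHEMA` mapping, and the `read_with_schema` helper are hypothetical and chosen only for illustration. Nothing is validated when the data is stored; the reader applies a schema at read time and skips records that do not fit.

```python
import json

# Raw events land in the lake as-is; no schema is enforced at write time.
# (Sample data and field names are illustrative assumptions.)
RAW_EVENTS = [
    '{"user": "alice", "amount": "12.50", "ts": "2016-03-01"}',
    '{"user": "bob", "amount": 7}',
    'not valid json',
    '{"user": "carol"}',
]

# The reader decides the schema at query time: field name -> type converter.
READ_SCHEMA = {"user": str, "amount": float}


def read_with_schema(raw_lines, schema):
    """Yield records projected onto `schema`; skip records that do not fit."""
    for line in raw_lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # unparseable line: stays in the lake but is not returned
        if not isinstance(record, dict):
            continue
        try:
            # Project and cast only the fields this reader cares about.
            yield {field: cast(record[field]) for field, cast in schema.items()}
        except (KeyError, TypeError, ValueError):
            continue  # missing field or value that cannot be converted


rows = list(read_with_schema(RAW_EVENTS, READ_SCHEMA))
for row in rows:
    print(row)
```

A different reader could apply a different schema to the same raw events, which is the flexibility a data lake offers. The skipped records also show why an unmanaged lake drifts toward a data swamp: without quality checks and metadata, nobody knows how much of the stored data is actually usable.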
With CDAP, Cask offers a Unified Integration Platform for Big Data that covers all aspects of data lake management, including data integration, metadata and lineage, security and operations, as well as app development on Hadoop and Spark. It helps companies avoid the large amounts of custom integration code that point solutions typically require, so they can focus on application logic and insights instead of infrastructure and integration.
Unified Integration Platform
CDAP is a unified integration platform that combines application management, data integration, security and governance, and a self-service environment, speeding up the process of building and running a data lake. CDAP provides a broad set of ecosystem integrations for runtime, transport, and storage, including MapReduce, Spark, Spark Streaming, Tigon, Kafka, and HBase.
CDAP provides a comprehensive collection of pre-built building blocks for data manipulation, data storage, and key insight extraction, so teams can build smarter end-to-end solutions with maximum flexibility in the fast-evolving big data ecosystem. This mitigates risk by empowering users to go quickly from Hadoop ideation to deployment using our CLI or sleek visual interface, reducing cost and delays.
Rapid Time to Value
CDAP enables developers to get started quickly with built-in data ingestion, exploration, and transformation capabilities available through a rich user interface and interactive shell.
Ensures Data Consistency
CDAP makes all data in Hadoop available for access in real time and in batch without the need to write code, manage metadata, or copy data. Advanced capabilities for scale-out, high-throughput real-time ingestion and transactional event processing enable new use cases while maintaining data consistency.
Case Study: Thomson Reuters Cuts Time for App Development on Hadoop with CDAP
Case Study: Data Cleansing and Validation
Case Study: Data Discovery for Data Science
Case Study: Security Analytics and Reporting
Case Study: XML Ingest and Transform
Case Study: Security Reporting
Case Study: Data Lake