The first unified integration platform for big data that cuts down the time to production for data applications and data lakes by 80%.

“The team at Cask do some fantastic work in product and embrace an open source strategy that’s just what developers need to help them build big data applications on Cloudera.”

– Mike Olson, Co-founder and Chief Strategy Officer, Cloudera

Application Framework

CDAP helps you build, test, and run distributed data applications quickly and with confidence across their entire life cycle.

Modern Data

With Cask Hydrator, you can ingest, blend, normalize, index, and track data from many data sources and data types.

Security & Governance

CDAP and Cask Tracker integrate with existing authentication solutions, offer fine-grained access control, and enable comprehensive audit trails for all data movement.

User Experience

With Cask Hydrator and Cask Tracker, you can build, run, and automate data pipelines, and discover data and metadata through self-service.

Cask Data Application Platform (CDAP): CDAP 4 Now in Preview!

CDAP accelerates time to value from Hadoop through standardized APIs, configurable templates, and visual interfaces, and it increases efficiency through reusable and portable components.

With CDAP and its included extensions, Cask Hydrator for data pipelines and Cask Tracker for data discovery and metadata, IT organizations can empower their business to make better decisions faster through seamless, well-governed self-service analytics.

CDAP removes barriers to innovation as an extensible and future-proof platform that provides consistency across environments and easily integrates with existing MDM, BI, and security solutions.

100% Open Source and Fully Extensible

CDAP is 100% open source and supports the latest open source projects. Through its plug-in architecture and multiple integration points, CDAP provides a highly extensible and future-proof application and integration framework.

Runs Natively on All Hadoop Distros – On-Premises and in the Cloud

CDAP separates application logic from data logic and integration logic, creating reusable components that are portable across runtime environments, applications, and Hadoop distributions, both on-premises and in the cloud. It has been certified by Cloudera, Hortonworks, and MapR.

Designed for Interoperability

CDAP offers enterprise-grade big data lifecycle management and easy integration with existing enterprise solutions for security, MDM, and BI, protecting the business's existing investments in these solutions.

Features & Benefits

Rapidly Innovate

Deliver reliable, operational Data Lakes and production Data Applications faster. Extensible libraries and components promote reuse and further accelerate the pace of innovation.

Provide Self-Service

Broaden the user base of your big data platform with a radically simplified developer experience and code-free Extensions for non-developers. Reusable libraries can be assembled and run as data pipelines through drag-and-drop interfaces.

Enable Governance

Automatically track all audit events and data lineage, with built-in discovery and search. Integrate with existing security and governance systems through built-in authentication, authorization, and audit.

CDAP Architecture

CDAP provides a container architecture for your data and applications on Hadoop. Simplified abstractions and deep integrations with diverse Hadoop technologies dramatically increase productivity and quality, accelerating development and reducing time to production so your Hadoop projects reach the market faster.

Data Containers

CDAP Datasets provide a standardized, logical container and runtime framework for data in varied storage engines. They integrate with other systems for instant data access and allow the creation of complex, reusable data patterns.

Program Containers

CDAP Programs provide a standardized, logical container and runtime framework for computation in varied processing engines. They simplify testing and operations with a standard lifecycle and standard operational APIs, and they can consistently interact with any data container.

Application Containers

CDAP Applications provide a standardized packaging system and runtime framework for Datasets and Programs. They manage the lifecycle of data and apps and simplify the painful integration and operation processes in heterogeneous infrastructure.


CDAP Extensions are powerful vertical applications built using public CDAP APIs, similar to the custom applications you would build on CDAP. They help you solve data ingestion and data tracking problems for your Data Lake right away, without writing any code. They are available as part of the standard CDAP distribution.

Cask Hydrator, powered by CDAP, is a code-free visual application for building complex data pipelines and managing them on your Data Lake. With Cask Hydrator, you can ingest data from varied sources and in formats such as CSV, XML, and Excel; cleanse, normalize, and transform data; build machine learning models on the fly; perform aggregations; run custom scripts; and more.


Cask Tracker, powered by CDAP, is a data management application that provides data governance capabilities for your Data Lake. With Cask Tracker, you can search for datasets based on technical, business, or operational metadata. You can also annotate datasets and manage an ontology of tags, understand how your datasets are used on the cluster, use lineage to debug data issues, and more.
