The first unified integration platform for big data that cuts down the time to production for data applications and data lakes by 80%.
"The team at Cask do some fantastic work in product and embrace an open source strategy that's just what developers need to help them build big data applications on Cloudera."
- Mike Olson, Co-founder and Chief Strategy Officer, Cloudera
CDAP helps you build, test, and run distributed data applications quickly and with confidence across their entire life cycle.
With Cask Hydrator, you can ingest, blend, normalize, index, and track data from many data sources and data types.
Security & Governance
CDAP and Cask Tracker integrate with existing authentication solutions, offer fine-grained access control, and enable comprehensive audit trails for all data movement.
With Cask Hydrator and Cask Tracker, you can build, run, and automate data pipelines, and perform self-service data and metadata discovery.
CDAP accelerates time to value from Hadoop through standardized APIs, configurable templates and visual interfaces, and it increases efficiencies through reusable and portable components.
With CDAP and its included extensions, Cask Hydrator for data pipelines and Cask Tracker for data discovery and metadata, IT organizations empower their business to make better decisions faster by enabling seamless and well-governed self-service analytics.
CDAP removes barriers to innovation as an extensible and future-proof platform that provides consistency across environments and easily integrates with existing MDM, BI, and security solutions.
100% Open Source and Fully Extensible
CDAP is 100% open source and supports the latest open source projects. Through its plug-in architecture and multiple integration points, CDAP provides a highly extensible and future-proof application and integration framework.
Runs Natively on All Hadoop Distros - On-Premises and in the Cloud
CDAP separates application logic from data logic and integration logic, creating reusable components that are portable across different runtime environments, applications and Hadoop distributions on-premises and in the cloud. It has been certified by Cloudera, Hortonworks, and MapR.
Designed for Interoperability
CDAP offers enterprise-grade big data lifecycle management and easy integration with existing enterprise solutions for security, MDM and BI, protecting past investments by the business in these solutions.
Features & Benefits
Rapidly deliver reliable, operational Data Lakes and production Data Applications. Extensible libraries and components promote reuse and further accelerate the pace of innovation.
Broaden the user base of your big data platform with a radically simplified developer experience and code-free Extensions for non-developers. Reusable libraries can be assembled and run as data pipelines through drag-and-drop interfaces.
Automatically track audit events and data lineage, with discovery and search. Integrate with existing security and governance systems using built-in authentication, authorization, and auditing.
CDAP provides a container architecture for your data and applications on Hadoop. Simplified abstractions and deep integrations with diverse Hadoop technologies increase productivity and quality, accelerating development and reducing time to production for your Hadoop projects.
CDAP Datasets provide a standardized, logical container and runtime framework for data in varied storage engines. They integrate with other systems for instant data access and allow the creation of complex, reusable data patterns.
CDAP Programs provide a standardized, logical container and runtime framework for compute in varied processing engines. They simplify testing and operations with standard lifecycle and operational APIs and can consistently interact with any data container.
CDAP Applications provide a standardized packaging system and runtime framework for Datasets and Programs. They manage the lifecycle of data and apps and simplify the painful integration and operation processes in heterogeneous infrastructure.
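To make the relationship between the three containers concrete, here is a minimal conceptual sketch in Python. It is not the CDAP API: the class names, methods, and state strings are hypothetical, a plain list stands in for a storage engine, and a Python function stands in for a processing engine. It only illustrates the idea that an application packages datasets and programs, and that a program follows a standard lifecycle while reading from and writing to any dataset.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Dataset:
    """Hypothetical logical container for data (a list stands in for storage)."""
    name: str
    records: List[dict] = field(default_factory=list)

    def write(self, record: dict) -> None:
        self.records.append(record)

    def read(self) -> List[dict]:
        return list(self.records)


@dataclass
class Program:
    """Hypothetical logical container for compute with a simple lifecycle."""
    name: str
    logic: Callable[[Dataset, Dataset], None]
    state: str = "STOPPED"

    def run(self, source: Dataset, sink: Dataset) -> None:
        self.state = "RUNNING"
        try:
            self.logic(source, sink)
        except Exception:
            self.state = "FAILED"
            raise
        self.state = "COMPLETED"


@dataclass
class Application:
    """Hypothetical package that bundles datasets and programs together."""
    name: str
    datasets: Dict[str, Dataset] = field(default_factory=dict)
    programs: Dict[str, Program] = field(default_factory=dict)

    def add_dataset(self, dataset: Dataset) -> None:
        self.datasets[dataset.name] = dataset

    def add_program(self, program: Program) -> None:
        self.programs[program.name] = program

    def run(self, program: str, source: str, sink: str) -> None:
        # Programs address datasets by name, not by storage engine.
        self.programs[program].run(self.datasets[source], self.datasets[sink])


def uppercase_names(source: Dataset, sink: Dataset) -> None:
    """Example program logic: copy records, upper-casing the 'name' field."""
    for record in source.read():
        sink.write({**record, "name": record["name"].upper()})


app = Application("demo")
app.add_dataset(Dataset("raw"))
app.add_dataset(Dataset("clean"))
app.add_program(Program("normalize", uppercase_names))
app.datasets["raw"].write({"name": "ada"})
app.run("normalize", source="raw", sink="clean")
```

Because the program only sees the dataset interface, swapping the list for another store would not change the program logic in this sketch, which is the portability idea described above.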
CDAP Extensions are powerful vertical applications built using public CDAP APIs, similar to the custom applications you would build on CDAP. They immediately help with solving data ingestion and data tracking problems for your Data Lake without writing any code. They are available as part of the standard CDAP distribution.
Cask Hydrator, powered by CDAP, is a code-free visual application for building complex data pipelines and managing them on your Data Lake. With Cask Hydrator, you can ingest data from varied sources and in varied formats (CSV, XML, Excel, etc.), cleanse, normalize, and transform data, build machine learning models on the fly, perform aggregations, run custom scripts, and more.
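Hydrator itself requires no code, but the stages it names (ingest, cleanse, normalize, aggregate) can be pictured as a chain of small functions. The following Python sketch is purely illustrative and is not how Hydrator is implemented or configured; the function names, the `region` and `amount` columns, and the sample data are all assumptions made for the example.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, List


def ingest_csv(text: str) -> List[dict]:
    """Ingest stage: parse CSV text into one dict per row."""
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows: List[dict]) -> List[dict]:
    """Cleanse stage: drop rows whose 'amount' is missing or not numeric."""
    cleaned = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except (KeyError, TypeError, ValueError):
            continue
        cleaned.append({**row, "amount": amount})
    return cleaned


def normalize(rows: List[dict]) -> List[dict]:
    """Normalize stage: trim and lower-case the 'region' field."""
    return [
        {**row, "region": (row.get("region") or "").strip().lower()}
        for row in rows
    ]


def aggregate(rows: List[dict]) -> Dict[str, float]:
    """Aggregate stage: sum 'amount' per region."""
    totals: Dict[str, float] = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


def run_pipeline(text: str) -> Dict[str, float]:
    """Chain the stages in order, as a visual pipeline would connect them."""
    return aggregate(normalize(cleanse(ingest_csv(text))))


# Hypothetical sample input with inconsistent casing and one bad value.
SAMPLE = "region,amount\nEast,10.5\n east ,4.5\nWest,n/a\nWest,3\n"

totals = run_pipeline(SAMPLE)
```

With the sample input, the row with the non-numeric amount is dropped during cleansing, and the two spellings of "east" are merged during normalization before the totals are computed.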
Cask Tracker, powered by CDAP, is a data management application that provides data governance capabilities for your Data Lake. With Cask Tracker, you can create datasets based on technical, business, or operational metadata. You can also annotate datasets and manage an ontology of tags, understand the usage of your datasets on the cluster, use lineage to debug data issues, and more.