The first unified integration platform for big data that cuts down the time to production for data applications and data lakes by 80%.
“The team at Cask do some fantastic work in product and embrace an open source strategy that’s just what developers need to help them build big data applications on Cloudera.”
– Mike Olson, Co-founder and Chief Strategy Officer, Cloudera
CDAP helps you build, test, and run distributed applications across their entire lifecycle. Simple and easy-to-use APIs help maximize developer productivity, reducing the time to deliver big data solutions.
CDAP helps you ingest, transform, egress, blend, normalize, index, and track data from diverse sources, types, and formats.
CDAP provides extensive security, compliance, and comprehensive audit capabilities to mitigate risk, improve visibility of data, and identify provenance and lineage.
CDAP provides a scalable and reliable production runtime environment and operational tools for easy deployment and management of solutions on Hadoop.
Extensions are rich, intuitive, visual tools that empower users to accomplish tasks with minimal IT support.
Cask Data Application Platform (CDAP) is the first Unified Platform for Big Data. It provides standardization and deep integrations with diverse Hadoop technologies, allowing companies to focus on application logic and insights rather than infrastructure and integration. The platform is 100% open source and highly extensible, and it delivers enterprise-class features that help accelerate building, deploying, and managing data-centric applications and data lakes on Hadoop and Spark.
Three extensions are packaged with CDAP: Cask Hydrator for data pipelines, Cask Wrangler for data wrangling, and Cask Tracker for data discovery and metadata. CDAP Extensions are self-service, purpose-built applications on CDAP designed to solve common and critical big data challenges.
CDAP removes barriers to innovation as an extensible and future-proof platform that provides consistency across environments and easily integrates with existing MDM, BI, and security solutions.
100% Open Source and Fully Extensible
CDAP is 100% open source and supports the latest open source projects. Through its plug-in architecture and multiple integration points, CDAP provides a highly extensible and future-proof application and integration framework.
Runs Natively on All Hadoop Distros – On-Premises and in the Cloud
CDAP separates application logic from data logic and integration logic, creating reusable components that are portable across different runtime environments, applications and Hadoop distributions on-premises and in the cloud. It has been certified by Cloudera, Hortonworks, and MapR.
Designed for Interoperability
CDAP offers enterprise-grade big data lifecycle management and easy integration with existing enterprise solutions for security, MDM, and BI, protecting the business's past investments in those solutions.
Features & Benefits
Deliver reliable, operational Data Lakes and production Data Applications faster. Extensible libraries and components promote reuse and further accelerate the pace of innovation.
Broaden the user base of your big data platform with a radically simplified developer experience and code-free Extensions for non-developers. Reusable libraries can be assembled and run as data pipelines through drag-and-drop interfaces.
Track audits and data lineage automatically, with discovery and search. Integrate with existing security and governance systems using built-in authentication, authorization, and audit.
CDAP provides a container architecture for your data and applications on Hadoop. Simplified abstractions and deep integrations with diverse Hadoop technologies increase productivity and quality, accelerating development and reducing time to production for your Hadoop projects.
CDAP Datasets provide a standardized, logical container and runtime framework for data in varied storage engines. They integrate with other systems for instant data access and allow the creation of complex, reusable data patterns.
CDAP Programs provide a standardized, logical container and runtime framework for compute in varied processing engines. They simplify testing and operations with a standard lifecycle, and can interact consistently with any data container.
CDAP Applications provide a standardized packaging system and runtime framework for Datasets and Programs. They manage the lifecycle of data and apps and simplify the painful integration and operation processes in heterogeneous infrastructure.
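To make the relationship between the three containers concrete, here is a minimal Python sketch of the idea. It is not the CDAP API (CDAP's real APIs are Java-based); the names `Dataset`, `InMemoryTable`, `Program`, `Application`, and `count_words` are hypothetical and chosen only for illustration. It shows application logic written against a logical data container, with the storage engine swappable behind it.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Protocol


class Dataset(Protocol):
    """Logical data container: programs see read/write, not the storage engine."""

    def write(self, key: str, value: int) -> None: ...
    def read(self, key: str) -> int: ...


class InMemoryTable:
    """Hypothetical backing store; another engine could implement the same methods."""

    def __init__(self) -> None:
        self._rows: Dict[str, int] = {}

    def write(self, key: str, value: int) -> None:
        self._rows[key] = value

    def read(self, key: str) -> int:
        # Missing keys read as 0 so counters can be incremented without setup.
        return self._rows.get(key, 0)


@dataclass
class Program:
    """Logical compute container with a minimal lifecycle status."""

    name: str
    logic: Callable[[Dataset], None]
    status: str = "CREATED"

    def run(self, dataset: Dataset) -> None:
        self.status = "RUNNING"
        self.logic(dataset)
        self.status = "COMPLETED"


@dataclass
class Application:
    """Packages datasets and programs so they are managed as one unit."""

    name: str
    datasets: Dict[str, Dataset] = field(default_factory=dict)
    programs: List[Program] = field(default_factory=list)

    def run_all(self, dataset_name: str) -> None:
        # Run every packaged program against the named dataset.
        for program in self.programs:
            program.run(self.datasets[dataset_name])


def count_words(table: Dataset) -> None:
    # Application logic only: no knowledge of where the table is stored.
    for word in "big data big lakes".split():
        table.write(word, table.read(word) + 1)


app = Application(
    name="WordCountApp",
    datasets={"counts": InMemoryTable()},
    programs=[Program("counter", count_words)],
)
app.run_all("counts")
```

Under these assumptions, replacing `InMemoryTable` with a class backed by a different store would leave `count_words` unchanged, which is the kind of portability the container model is meant to provide.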
CDAP Extensions are powerful vertical applications built using public CDAP APIs, similar to the custom applications you would build on CDAP. They immediately help with solving data ingestion and data tracking problems for your Data Lake without writing any code. They are available as part of the standard CDAP distribution.
Cask Hydrator, powered by CDAP, is a code-free visual application for building complex data pipelines and managing them on your Data Lake. With Cask Hydrator, you can ingest data from varied sources and formats (CSV, XML, Excel, etc.), cleanse, normalize, and transform data, build machine learning models on the fly, perform aggregations, run custom scripts, and more.
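For readers who want a feel for what such a pipeline does under the hood, the following Python sketch chains ingest, cleanse, normalize, and aggregate stages over a small CSV sample. It is an illustration only and does not use Cask Hydrator or any CDAP plugin; the stage functions, the `run_pipeline` helper, and the sample data are all hypothetical.

```python
import csv
import io
from collections import defaultdict
from functools import reduce

# Hypothetical sample input with messy whitespace, mixed case, and a missing value.
RAW_CSV = """region,amount
 East ,10
west,5
EAST,
West,7
"""


def ingest(text):
    # Parse CSV text into a list of dict records keyed by the header row.
    return list(csv.DictReader(io.StringIO(text)))


def cleanse(rows):
    # Drop records whose amount is missing or blank.
    return [r for r in rows if r["amount"].strip()]


def normalize(rows):
    # Trim and lowercase region names, and convert amounts to integers.
    return [
        {"region": r["region"].strip().lower(), "amount": int(r["amount"])}
        for r in rows
    ]


def aggregate(rows):
    # Sum amounts per region.
    totals = defaultdict(int)
    for r in rows:
        totals[r["region"]] += r["amount"]
    return dict(totals)


def run_pipeline(source, stages):
    # Feed the output of each stage into the next, in order.
    return reduce(lambda data, stage: stage(data), stages, source)


totals = run_pipeline(RAW_CSV, [ingest, cleanse, normalize, aggregate])
```

Each stage is an independent function, so stages can be reordered or reused across pipelines, loosely mirroring how a visual tool assembles reusable components.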
Cask Tracker, powered by CDAP, is a data management application that provides data governance capabilities for your Data Lake. With Cask Tracker, you can create datasets based on technical, business, or operational metadata. You can also annotate datasets and manage an ontology of tags, understand the usage of your datasets on the cluster, use lineage to debug data issues, and more.
Cask Wrangler, powered by CDAP, provides an easy and interactive way to visualize, transform, and cleanse data. It helps data scientists and data engineers derive new schemas and operationalize data preparation with a few clicks.
Cask Market is Cask’s “Big Data App Store”, offering push-button deployment of pre-built applications, pipelines, and plugins from within CDAP. It provides step-by-step wizards to help configure and deploy new entities within the platform. Companies can easily reuse common components and use cases built internally or published by Cask and its partners. The pre-built components and solutions include:
- Use Cases: solutions built using pipelines and plugins
- Pipelines: data transformation and processing pipelines
- Applications: big data applications
- Plugins: Cask Hydrator plugins
- Datapacks: sample data that can be loaded onto the platform
- Drivers: drivers giving access to external data sources