Cask Data App Platform

The Cask Data Application Platform (CDAP) is an open source, integrated platform for developers and organizations to build, deploy, and manage data applications.

Integrates with major Apache Hadoop distributions

Supports the latest open source technologies

Fully open source and highly extensible

CDAP enables developers and organizations to unify batch and real-time processing to achieve better business results.

Data Integration

  • Data Ingestion: Both real-time and batch, push and pull. Support for Flume, Kafka and REST.
  • ETL and Workflow: UI-based pluggable ETL framework with simple programmatic workflow and scheduling engine.
  • Data as a Service: Expose data via REST APIs for simple microservices or query with SQL/JDBC.

Application Development

  • Simple Java APIs: Provide application packaging, reusable data patterns, and standardized application templates.
  • Local SDK: Standalone mode enables local development with IDE integration, rich UI and interactive shell.
  • Real-time and Batch: Transactions provide framework-level correctness and guaranteed consistency.

Production Operations

  • Standardized Model: Datasets, Programs, Logs and Metrics.
  • Runtime Services: Deployment, Discovery, and Monitoring. Metadata and Security Management. Configuration, Log and Metrics Collection and Aggregation.
  • DevOps Tools: Rich UI Console, Interactive Shell, full REST APIs.

Cask Hydrator provides self-service ingestion and ETL for the rapid enablement of Hadoop Data Lakes.

It is a code-free framework and user interface for configuring, deploying and managing production ETL pipelines.

It supports common data prep transformations and a wide variety of sources and sinks, including HDFS, HBase, Amazon S3, traditional RDBMS and EDWs, Kafka, Cassandra and Elasticsearch.

A customizable framework with a rich user interface combines self-service simplicity with flexibility. Build new sources, sinks and transforms in Java, configure them with JSON and expose them in the Hydrator UI.

Powered by CDAP, Hydrator Pipelines are horizontally scalable, ready for production and governed through automated metadata and lineage provided by the platform.

How CDAP can help you

Application Developer Challenges

  • You must learn low-level APIs that require you to understand the architecture of the underlying systems and build your business logic around them.
  • Significant effort is spent writing integration code instead of business logic, and simple tasks like ingestion and ETL are complicated and time-consuming.
  • Lack of support for the full application lifecycle makes testing and debugging cumbersome, and getting into production requires even more support infrastructure.

CDAP for Application Developers

  • A Java-based abstraction layer that separates infrastructure from applications and provides reusable data patterns and application templates.
  • Integrated ingestion and programmatic workflow capabilities, together with an extensible ETL framework, let you focus time and effort on business logic instead of plumbing.
  • A local development environment with end-to-end application packaging and testing along with a production runtime environment for deploying and managing data and applications.

Systems Engineer Challenges

  • A lack of separation between application logic and infrastructure APIs makes solutions tightly coupled and difficult to maintain.
  • Simple tasks often require writing code and specialized knowledge, putting tremendous strain on Hadoop developers to enable others.
  • Adopting new open source technologies in production means a battle with operations to support them.

CDAP for Systems Engineers

  • A Java-based abstraction layer that separates infrastructure from applications and provides reusable application templates.
  • A rich user interface and interactive shell with out-of-the-box ingestion, SQL and ETL capabilities for self-service.
  • A standardized operational model around Datasets and Programs with common lifecycle, logs/metrics, and testing facilities.

Administrator Challenges

  • Operating applications using multiple frameworks is complicated and requires expertise in deploying, configuring, monitoring and troubleshooting each one.
  • Packaging and deploying applications is difficult to automate, and there is a significant dependence on manual work with basic command-line tools.
  • Managing, monitoring and troubleshooting distributed applications, their supporting services, and the underlying infrastructure is hard and makes new projects difficult to adopt.

CDAP for Administrators

  • Standardized containers for Datasets and Programs provide consistent management of data and applications, regardless of the environment or framework.
  • Application packaging provides distributable artifacts that simplify deployment, configuration, upgrades, and versioning of apps.
  • Unified platform provides application-centric dashboards and a consistent set of APIs for all aspects of applications and services.

Data Analyst Challenges

  • Rather than analyzing data, significant amounts of time are spent on data integration and ETL tasks, manually moving, parsing, cleansing and cataloging datasets.
  • A lack of basic capabilities around data quality and complexity in managing schemas slows down experimentation.
  • Complete separation of the science/lab platform from the production/factory platform makes it difficult to get insights into production.

CDAP for Data Analysts

  • A rich user interface and interactive shell with out-of-the-box ingestion, SQL and ETL capabilities for self-service.
  • Built-in and extensible ETL framework with simple workflow and scheduling capabilities makes it easy to set up, run and manage ETL pipelines.
  • Common platform that enables better sharing and collaboration between data scientists, developers and operations to get apps into production.

CIO / CTO Challenges

  • Lack of talent and training necessary to effectively implement Hadoop solutions in a reasonable amount of time, especially within the line-of-business.
  • No standard development methodologies or reference architecture for Hadoop applications, leading to wasted time and multiple solutions to the same problems.
  • It takes significant time to go from initial adoption of Hadoop to a production application, and it is difficult to put Hadoop in the hands of the line-of-business.

CDAP for CIOs / CTOs

  • Higher-level framework with abstractions, services and tools enables developers and operations to quickly and easily build and deploy solutions on Hadoop.
  • An integrated, standardized and consistent framework that supports common tasks and enforces best practices for developers and operations.
  • Production runtime environment dramatically accelerates time-to-market for your applications and serves as a common platform across your organization.

CDAP Features

Data Ingestion

CDAP provides integrated support for scalable and reliable real-time and batch data ingestion through Streams.

Push events using HTTP, Flume, or client libraries. Pull events using Kafka or custom sources.

Automatically process ingested data in real-time, batch, or with SQL queries using Hive.
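As a rough illustration of pushing an event over HTTP, the sketch below builds a POST request with the JDK's built-in HTTP client. The base URL, port, URL path layout, namespace `default` and stream name `purchases` are all assumptions made for this example, not details stated on this page; check the CDAP REST documentation for the endpoint that matches your version.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

final class StreamPushSketch {

    // Builds (but does not send) a POST request carrying one event body.
    // The "/v3/namespaces/.../streams/..." layout is an assumed path for
    // illustration only.
    static HttpRequest buildEvent(String baseUrl, String namespace, String stream, String body) {
        URI uri = URI.create(baseUrl + "/v3/namespaces/" + namespace + "/streams/" + stream);
        return HttpRequest.newBuilder(uri)
                .header("Content-Type", "text/plain; charset=utf-8")
                .POST(HttpRequest.BodyPublishers.ofString(body, StandardCharsets.UTF_8))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical host, port and stream name.
        HttpRequest request = buildEvent(
                "http://localhost:10000", "default", "purchases", "alice bought 3 apples");
        System.out.println(request.method() + " " + request.uri());
        // To send it against a running instance:
        // HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.discarding());
    }
}
```

The request is only constructed here, so the example runs without a server; sending it is a one-line change once a real endpoint is known.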

ETL Pipelines

CDAP enables code-free configuration, deployment and management of ETL pipelines through ETL Templates.

Create, run and monitor ETL pipelines directly from the CDAP UI or shell.

The framework is pluggable and supports custom real-time and batch sources, sinks and transformations.
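To make the source, transform and sink idea concrete, here is a minimal, self-contained sketch of a pluggable pipeline. It does not use the CDAP plugin API: the standard `Supplier`, `Function` and `Consumer` interfaces stand in for source, transform and sink plugins, and the convention that a transform returns `null` to drop a record is an assumption of this example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Supplier;

final class EtlPipelineSketch {

    // Reads every record from the source, applies the transforms in order,
    // and writes surviving records to the sink. Returns the number written.
    static <T> int run(Supplier<List<T>> source,
                       List<Function<T, T>> transforms,
                       Consumer<T> sink) {
        int written = 0;
        for (T record : source.get()) {
            T current = record;
            for (Function<T, T> transform : transforms) {
                current = transform.apply(current);
                if (current == null) {
                    break; // a null result means the record was filtered out
                }
            }
            if (current != null) {
                sink.accept(current);
                written++;
            }
        }
        return written;
    }

    // Example wiring: trim, drop empty records, upper-case.
    static List<String> demo() {
        Supplier<List<String>> source = () -> List.of(" alice ", "", "Bob");
        List<Function<String, String>> transforms = new ArrayList<>();
        transforms.add(String::trim);
        transforms.add(s -> s.isEmpty() ? null : s);
        transforms.add(String::toUpperCase);
        List<String> out = new ArrayList<>();
        run(source, transforms, out::add);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints [ALICE, BOB]
    }
}
```

Because each stage is just an interface implementation, swapping in a different source or adding a transform does not touch the pipeline runner, which is the property a pluggable ETL framework relies on.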

Transaction Processing

CDAP integrates a scalable, distributed transaction engine for HBase and HDFS called Tephra.

ACID semantics for batches of non-idempotent read and write operations.

Batch and SQL jobs see consistent snapshots of data while real-time processing runs concurrently.
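Why transactions matter for non-idempotent operations can be shown with a toy example. The sketch below is not Tephra and does not use its API; it is a simplified in-memory store, written for this page, that buffers writes per transaction and rejects a commit when another transaction has already written the same key. All class and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class TinyTxStore {
    private final Map<String, Long> committed = new HashMap<>();
    private final Map<String, Long> lastWriteVersion = new HashMap<>();
    private long version = 0;

    // A transaction buffers its writes until commit.
    final class Tx {
        private final long startVersion;
        private final Map<String, Long> writes = new HashMap<>();

        private Tx(long startVersion) {
            this.startVersion = startVersion;
        }

        // Reads the transaction's own pending write, else the committed value.
        long read(String key) {
            Long own = writes.get(key);
            return own != null ? own : committed.getOrDefault(key, 0L);
        }

        // Non-idempotent: applying it twice changes the result.
        void increment(String key, long delta) {
            writes.put(key, read(key) + delta);
        }
    }

    synchronized Tx begin() {
        return new Tx(version);
    }

    // Commits all buffered writes atomically, or none of them if any key
    // was written by a transaction that committed after this one began.
    synchronized boolean commit(Tx tx) {
        for (String key : tx.writes.keySet()) {
            if (lastWriteVersion.getOrDefault(key, 0L) > tx.startVersion) {
                return false; // write-write conflict: caller should retry
            }
        }
        version++;
        for (Map.Entry<String, Long> e : tx.writes.entrySet()) {
            committed.put(e.getKey(), e.getValue());
            lastWriteVersion.put(e.getKey(), version);
        }
        return true;
    }

    synchronized long get(String key) {
        return committed.getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        TinyTxStore store = new TinyTxStore();
        Tx a = store.begin();
        Tx b = store.begin();
        a.increment("clicks", 1);
        b.increment("clicks", 1);
        System.out.println("a committed: " + store.commit(a));
        System.out.println("b committed: " + store.commit(b));
        Tx retry = store.begin(); // b's work is redone in a fresh transaction
        retry.increment("clicks", 1);
        store.commit(retry);
        System.out.println("clicks = " + store.get("clicks"));
    }
}
```

Without the conflict check, both transactions would write the value 1 and one increment would be silently lost; with it, the second commit fails and the retry produces the correct total. A production engine adds much more (snapshot reads, durability, distributed coordination), which this sketch deliberately omits.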

Real-time, Batch and SQL

CDAP exposes all data for real-time, batch and SQL access automatically without any copying of data.

Datasets encapsulate access patterns across real-time, batch and SQL.

Schemas for Streams and Datasets are automatically mirrored in HCatalog.

Runtime Portability

CDAP supports three different runtime modes: in-memory, local and distributed.

In-memory and local (standalone) modes for simple testing and debugging.

Distributed mode provides the same functionality at scale.

Complex Data Patterns

CDAP supports complex data patterns that span multiple tables, rows and non-idempotent operations.

Complex, multi-table HBase schemas with transactional guarantees.

Expose domain-specific APIs for app development and data exploration.

CDAP Benefits


CDAP provides a higher-level integrated framework that frees developers and operations from learning, integrating, and managing each individual open source project. Applications built on CDAP separate business logic from infrastructure APIs, drastically reducing complexity and total cost of ownership.

  • Integrated capabilities let you avoid wasting time with common tasks
  • Conceptual integrity means you only integrate once with a standard layer
  • Framework correctness allows you to write application logic without worrying


CDAP enables developers to get started quickly with built-in data ingestion, exploration, and transformation capabilities available through a rich user interface and interactive shell. Reusable abstractions expose simple APIs for developers to quickly build data-centric applications and get them into production.

  • Rich user interface and interactive shell let you use Hadoop without writing code
  • Application templates and data patterns so you write logic against use-case specific APIs
  • Runtime services and tools so you can test reliably and move to production quickly


CDAP makes all data in Hadoop available for real-time access, batch access and ad-hoc SQL analysis without the need to write code, manage metadata or copy any data. Advanced functionality for scale-out, high-throughput real-time ingestion and transactional event processing, all while maintaining data consistency, enables new use cases.

  • Real-time, batch and SQL integration means data is available for all access patterns
  • Flexible real-time and batch ingestion lets you ingest any type of data, from any source
  • Transactions and consistency for you to build user-facing, real-time applications at scale

CDAP Courses

Learn directly from the engineers who build CDAP. Cask offers live training courses and workshops, on-site or web-based.

Intro to CDAP

The Introduction to CDAP course delivers the key concepts and expertise needed to build real-time and batch applications on CDAP. At the end of this course, participants will be able to write CDAP applications, integrate applications into their CI environments, and test and debug applications.

Duration: 8 hours

  • CDAP Concepts & Capabilities
  • Data Ingestion & Exploration
  • Understanding Datasets
  • Data Serving
  • Batch Processing
  • Scheduling & Sequencing Jobs using Workflow
  • Real-time processing with Tigon
  • Testing and Debugging Strategies

Advanced CDAP

The Advanced CDAP course gives participants deeper insight into CDAP building blocks and advanced features so they can build CDAP-optimized applications.

Duration: 8 hours

  • CDAP Architecture and internals
  • Building Custom Datasets
  • Building and extending ETL pipelines
  • Building Application templates and adapters
  • Optimize CDAP applications
  • Deeper understanding of transactions

CDAP for Administrators

The CDAP Administration course covers the concepts required to run and manage CDAP and CDAP applications in a production environment. At the end of this course, participants will be able to install, configure and operationalize CDAP.

Duration: 8 hours

  • CDAP concepts & capabilities
  • Architecture deep-dive
  • Installing and configuring CDAP
  • CDAP Application overview
  • Operational aspects of a CDAP application

Contact Sales to schedule training with Cask

CDAP Subscriptions

The Cask team is here to help make your Hadoop projects successful. From development through production, CDAP subscriptions get you direct support from the engineers who build CDAP.

CDAP Community and CDAP Enterprise

  • Based on a 100% open source CDAP under Apache 2.0 license
  • Pricing model: Free/Unlimited use (Community); annual subscription based on cluster size and CDAP server count (Enterprise)
  • Includes datasets, programs, runtime services, tools and console
  • Included support: Community support (Community); 24x7x365 web portal support, e-mail and phone support (Enterprise)
  • Response time for Severity 1 issues: 1 hour
  • Diagnostic support for Hadoop environment
  • Support for Tigon and Coopr included
  • Validated for production with added stress, performance, and compatibility testing
  • Certification with Hadoop distributions
  • Installation and updates done by or with Cask
  • Hot-fixes for security issues and bug fixes
  • Legal assurance

Contact Sales for more information.