Cask Data Application Platform
Virtualization for Hadoop data and apps.
CDAP is an open source application development platform for the Hadoop ecosystem that provides developers with data and application virtualization to accelerate application development, address a broader range of real-time and batch use cases, and deploy applications into production while satisfying enterprise requirements.
Logical representations of physical data as CDAP Datasets within the CDAP Runtime Environment
- Streams for data ingestion
- Supports Kafka, Flume, REST, and custom-implemented protocols
- Time-stamped, ordered and horizontally scalable
- Reusable libraries for common Big Data access patterns
- Secondary indexes, Time Series, Key-Value, Objects, Geospatial, OLAP Cube and more
- Libraries expose each pattern as RPCs, Batch Scans, and SQL Tables
- Data available to multiple applications and different paradigms
- Unified batch and real-time processing with the same data used concurrently by MapReduce, Hive, Spark, Flows and more
- Expose data as REST services to quickly enable data as a service
- Simplify data ingestion and Extract, Transform, Load (ETL) to accelerate time to value
- Maximize value of data by making it easy to find and easy to explore through multiple query methods
- Protect the data through security, audit, lineage, and reporting
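To make the idea of a reusable access pattern concrete, here is a minimal, hypothetical sketch of a time-series pattern that offers both a point lookup and a batch scan over the same rows. It is not the CDAP Dataset API: the class name `TimeSeriesDataset` and its methods are assumptions made for illustration, and an in-memory sorted list stands in for the physical storage.

```python
import bisect


class TimeSeriesDataset:
    """Hypothetical time-series pattern over rows sorted by timestamp.

    Illustration only; this is not the CDAP Dataset API.
    """

    def __init__(self):
        self._rows = []  # (timestamp, value) tuples, kept sorted

    def write(self, timestamp, value):
        # Insert while preserving timestamp order.
        bisect.insort(self._rows, (timestamp, value))

    def read(self, timestamp):
        # Point lookup, the kind of call an RPC-style interface would expose.
        i = bisect.bisect_left(self._rows, (timestamp,))
        if i < len(self._rows) and self._rows[i][0] == timestamp:
            return self._rows[i][1]
        return None

    def scan(self, start, end):
        # Batch scan: all rows with start <= timestamp < end, in order.
        lo = bisect.bisect_left(self._rows, (start,))
        hi = bisect.bisect_left(self._rows, (end,))
        return self._rows[lo:hi]


series = TimeSeriesDataset()
for ts, value in [(30, 3.0), (10, 1.0), (20, 2.0)]:
    series.write(ts, value)

print(series.read(20))
print(series.scan(10, 30))
```

The application code only sees `write`, `read`, and `scan`; how and where the rows are physically stored stays behind the abstraction.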
Applications deployed as CDAP Containers within the CDAP Runtime Environment
- Framework-level guarantees
- Integrated transactions mean applications aren’t required to be idempotent
- Ingestion capabilities and processing engines provide partitioning, ordering and exactly-once execution
- Full development lifecycle and production deployment
- Portable and scalable from laptop to cluster with support for testing and continuous integration
- Logging, metrics, security, and management with low developer overhead
- Standardization of applications across programming paradigms
- Take advantage of Spark, Cascading, Hive, etc. and their User APIs without worrying about the details of how to integrate with each system
- Real-time and batch applications can be packaged, deployed, and managed together
- Developers can build a broader range of apps focusing on business logic, not writing integration code or building core system services
- Shortens the path from development through testing to production deployment
- Take advantage of new technology with less need for training and expertise
Provides a single point of access for data, app, service, and management APIs with integrated discovery, load balancing, and horizontal scalability
Enables data operations with ACID properties from within any program container, whether real-time or batch
Includes services for apps and data like security, discovery, and management throughout the app and data lifecycles
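The link between transactions and idempotency can be sketched with a toy example. The code below is an assumption-laden illustration, not CDAP's transaction system: `TransactionalStore`, `run_in_transaction`, and `count_event` are hypothetical names, and a Python dict stands in for the data store. It shows a non-idempotent increment that is retried after a failure without being counted twice, because the failed attempt's writes are never committed.

```python
class TransactionalStore:
    """Toy in-memory store with all-or-nothing commits (illustration only)."""

    def __init__(self):
        self._data = {}

    def get(self, key, default=0):
        return self._data.get(key, default)

    def run_in_transaction(self, operation, max_attempts=3):
        """Run operation against a working copy; commit only if it succeeds."""
        for attempt in range(1, max_attempts + 1):
            working = dict(self._data)  # private view for this attempt
            try:
                operation(working, attempt)
            except Exception:
                continue  # discard the working copy, then retry
            self._data = working  # commit all writes at once
            return attempt
        raise RuntimeError("operation failed after %d attempts" % max_attempts)


def count_event(data, attempt):
    # Non-idempotent write: running it twice would count the event twice.
    data["events"] = data.get("events", 0) + 1
    if attempt == 1:
        # Simulated failure after the write has already happened.
        raise IOError("simulated failure")
    data["status"] = "ok"


store = TransactionalStore()
attempts = store.run_in_transaction(count_event)
print(attempts, store.get("events"))
```

Without the transaction, the first attempt's increment would persist and the retry would add a second one, so the application itself would have to detect and suppress duplicates.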
Extract, Transform, Load (ETL)
ETL is often a tedious and complex task, but it is a critical first step for organizations seeking to gain value from their data. CDAP can help, from day one to data lake.
Unified real-time and batch processing
Many Big Data solutions demand that insights from retrospective data be applied to real-time streams of data, but the batch and real-time systems involved are often separate. CDAP enables developers to unify batch and real-time processing to achieve better business results.
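As a rough sketch of that pattern, the example below has a batch step compute per-user baselines from historical records and a real-time step consult the same shared data for each incoming event. Everything here is hypothetical and independent of any CDAP API: the function names, the `shared_dataset` dict, the sample records, and the factor-of-three threshold are assumptions chosen for illustration.

```python
from collections import defaultdict


def batch_compute_baselines(history):
    """Batch step: average amount per user over historical records."""
    totals = defaultdict(lambda: [0.0, 0])  # user -> [sum, count]
    for user, amount in history:
        totals[user][0] += amount
        totals[user][1] += 1
    return {user: total / count for user, (total, count) in totals.items()}


def process_event(event, baselines, factor=3.0):
    """Real-time step: flag an event far above the user's baseline."""
    user, amount = event
    baseline = baselines.get(user)
    # Users with no history are never flagged in this sketch.
    return baseline is not None and amount > factor * baseline


# One shared data set, written by the batch step and read by the stream step.
shared_dataset = {}

history = [("alice", 10.0), ("alice", 20.0), ("bob", 100.0)]
shared_dataset.update(batch_compute_baselines(history))

stream = [("alice", 50.0), ("bob", 120.0), ("carol", 5.0)]
flags = [process_event(event, shared_dataset) for event in stream]
print(flags)
```

When both steps read and write the same data set, the real-time path picks up refreshed baselines as soon as the batch step commits them, with no export or copy between separate systems.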