Cask Data Application Platform
The Cask Data Application Platform (CDAP) is an open source, integrated platform that developers and organizations use to build, deploy, and manage data applications.
- Streams for data ingestion
  - Supports Kafka, Flume, REST, and custom-implemented protocols
  - Time-stamped, ordered, and horizontally scalable
- Reusable libraries for common Big Data access patterns
  - Secondary indexes, Time Series, Key-Value, Objects, Geospatial, OLAP Cube, and more
  - Libraries expose each pattern as RPCs, Batch Scans, and SQL Tables
- Data available to multiple applications and different paradigms
  - Unified batch and real-time processing, with the same data used concurrently by MapReduce, Hive, Spark, Flows, and more
- Expose data as REST services to quickly enable data as a service
- Simplify data ingestion and extract, transform, load (ETL) work to accelerate time to value
- Maximize value of data by making it easy to find and easy to explore through multiple query methods
- Protect the data through security, audit, lineage, and reporting
- Framework-level guarantees
  - Integrated transactions mean applications aren’t required to be idempotent
  - Ingestion capabilities and processing engines provide partitioning, ordering, and exactly-once execution
- Full development lifecycle and production deployment
  - Portable and scalable from laptop to cluster, with support for testing and continuous integration
  - Logging, metrics, security, and management with low developer overhead
- Standardization of applications across programming paradigms
  - Take advantage of Spark, Cascading, Hive, and similar systems and their user APIs without worrying about the details of how to integrate with each one
- Real-time and batch applications can be packaged, deployed, and managed together
- Developers can build a broader range of apps by focusing on business logic instead of writing integration code or building core system services
- Speed up the path from development through testing to production deployment
- Take advantage of new technology with less need for training and expertise
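To illustrate the transactional guarantee listed above, here is a minimal sketch in plain Python. It does not use any CDAP API; the names `run_in_transaction`, `flaky_increment`, and the dictionary-based `store` are hypothetical and exist only for this example. The idea shown is that when a non-idempotent write and the queue offset are committed or rolled back together, a retried event is not counted twice.

```python
def run_in_transaction(store, handler):
    """Run handler against store; undo every change if the handler fails."""
    snapshot = dict(store)  # shallow copy is enough: all values are immutable
    try:
        handler(store)
        return True
    except RuntimeError:
        # Roll back the write and the offset together.
        store.clear()
        store.update(snapshot)
        return False


attempts = {"count": 0}


def flaky_increment(store):
    """Non-idempotent handler that crashes once, after it has already written."""
    event = store["queue"][store["offset"]]
    store["total"] += event  # would be double-counted on retry without rollback
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("simulated crash after the write")
    store["offset"] += 1  # acknowledge the event


# Hypothetical in-memory state: an event queue, a read offset, and a running sum.
store = {"queue": (5, 7), "offset": 0, "total": 0}
while store["offset"] < len(store["queue"]):
    run_in_transaction(store, flaky_increment)
```

In this sketch the first attempt fails after adding 5, the rollback restores the total to 0, and the retry then processes each event once. Without the rollback, the application code itself would have to detect and skip the replayed event.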
Extract, Transform, Load (ETL)
ETL is often a tedious and complex task, but it is a critical first step for organizations seeking to gain value from their data. CDAP can help, from day one to data lake.
Unified real-time and batch processing
Many Big Data solutions need insights derived from historical data to be applied to real-time data streams, yet batch and real-time processing often run on separate systems. CDAP enables developers to unify batch and real-time processing on the same data to achieve better business results.
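The pattern described here, a batch computation over historical data feeding a real-time check, can be sketched in a few lines of plain Python. This is an illustration of the idea only and does not use any CDAP API; the function names, the sensor keys, and the threshold `factor` are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def batch_baseline(history):
    """Batch step: average value per key over historical (key, value) records."""
    grouped = defaultdict(list)
    for key, value in history:
        grouped[key].append(value)
    return {key: mean(values) for key, values in grouped.items()}


def flag_anomalies(stream, baseline, factor=2.0):
    """Real-time step: yield events whose value exceeds factor times the baseline."""
    for key, value in stream:
        expected = baseline.get(key)
        # Keys with no history are skipped rather than guessed at.
        if expected is not None and value > factor * expected:
            yield key, value


# Hypothetical historical records and live events.
history = [("sensor-a", 10.0), ("sensor-a", 12.0), ("sensor-b", 5.0)]
live = [("sensor-a", 11.0), ("sensor-a", 30.0), ("sensor-b", 11.0), ("sensor-c", 99.0)]

baseline = batch_baseline(history)
alerts = list(flag_anomalies(live, baseline))
```

When the two steps run on separate systems, the baseline has to be exported from one and loaded into the other. Running both against the same data removes that hand-off, which is the point the paragraph above makes.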