The first unified integration platform for big data that cuts down the time to production for data applications and data lakes by 80%.
“The team at Cask do some fantastic work in product and embrace an open source strategy that’s just what developers need to help them build big data applications on Cloudera.” – Mike Olson, Co-founder and Chief Strategy Officer, Cloudera
CDAP provides a data ingestion service that simplifies and automates the difficult and time-consuming tasks of building, running, and managing data pipelines. The interactive studio interface lets you drag and drop various sources, transforms, analytics, sinks, and actions.
- Drag-and-drop graphical studio with a variety of sources, transforms, analytics (including machine learning algorithms), sinks, and actions
- Unified interface to preview, debug, deploy, run, and manage data pipelines
- Separation of the logical data pipeline from the execution environment, making it easy to run the same pipeline on MapReduce or Spark
- Pipeline triggers for chaining pipelines and starting them based on events
- Extensible, pluggable architecture for integrating new sources, sinks, and transforms (see the plugin sketch below)
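To give a sense of the pluggable architecture, here is a minimal sketch of a custom transform written against the CDAP pipeline plugin API. The `Uppercase` plugin itself is illustrative; package names follow the Cask-era `co.cask.cdap` APIs:

```java
import co.cask.cdap.api.annotation.Description;
import co.cask.cdap.api.annotation.Name;
import co.cask.cdap.api.annotation.Plugin;
import co.cask.cdap.api.data.format.StructuredRecord;
import co.cask.cdap.api.data.schema.Schema;
import co.cask.cdap.etl.api.Emitter;
import co.cask.cdap.etl.api.Transform;

// A minimal custom transform: upper-cases every string field of each record.
// Deployed as a plugin artifact, it shows up in the studio palette next to
// the built-in transforms.
@Plugin(type = Transform.PLUGIN_TYPE)
@Name("Uppercase")
@Description("Upper-cases all string fields of a record.")
public class UppercaseTransform extends Transform<StructuredRecord, StructuredRecord> {

  @Override
  public void transform(StructuredRecord input, Emitter<StructuredRecord> emitter) {
    Schema schema = input.getSchema();
    StructuredRecord.Builder builder = StructuredRecord.builder(schema);
    for (Schema.Field field : schema.getFields()) {
      Object value = input.get(field.getName());
      builder.set(field.getName(),
                  value instanceof String ? ((String) value).toUpperCase() : value);
    }
    emitter.emit(builder.build());
  }
}
```

Because the plugin deals only in StructuredRecords, the same artifact runs unchanged whether the pipeline executes on MapReduce or on Spark.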
CDAP provides an easy and interactive way to visualize, transform, and cleanse data. It helps data scientists and data engineers derive new schemas and operationalize data preparation with a few clicks.
- Interactive, self-service data prep to work with messy data
- Easily connect to external data (RDBMS, Kafka, etc.)
- Apply transformations using merge, delete, and substring operations
- Quickly visualize patterns both within and across columns
- User-Defined Directives (UDDs) for developing, deploying, and using custom data processing directives (see the sketch after this list)
- Operationalize effortlessly into production pipelines
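As a sketch of what a User-Defined Directive looks like, the example below reverses the string value of a column, modeled on the `text-reverse` example from the Wrangler documentation. The `co.cask.wrangler.api` package and exact signatures here are assumptions based on the Cask-era API:

```java
import java.util.List;

import co.cask.cdap.api.annotation.Description;
import co.cask.cdap.api.annotation.Name;
import co.cask.cdap.api.annotation.Plugin;
import co.cask.wrangler.api.Arguments;
import co.cask.wrangler.api.Directive;
import co.cask.wrangler.api.DirectiveExecutionException;
import co.cask.wrangler.api.DirectiveParseException;
import co.cask.wrangler.api.ExecutorContext;
import co.cask.wrangler.api.Row;
import co.cask.wrangler.api.parser.ColumnName;
import co.cask.wrangler.api.parser.TokenType;
import co.cask.wrangler.api.parser.UsageDefinition;

// A custom directive that reverses the string value of one column. Once
// deployed, it is invoked from a recipe like any built-in directive, e.g.:
//   text-reverse :body
@Plugin(type = Directive.Type)
@Name(TextReverse.NAME)
@Description("Reverses the string value of a column.")
public class TextReverse implements Directive {
  public static final String NAME = "text-reverse";
  private String column;

  @Override
  public UsageDefinition define() {
    UsageDefinition.Builder builder = UsageDefinition.builder(NAME);
    builder.define("column", TokenType.COLUMN_NAME);  // one required column argument
    return builder.build();
  }

  @Override
  public void initialize(Arguments args) throws DirectiveParseException {
    this.column = ((ColumnName) args.value("column")).value();
  }

  @Override
  public List<Row> execute(List<Row> rows, ExecutorContext context)
      throws DirectiveExecutionException {
    for (Row row : rows) {
      int idx = row.find(column);
      if (idx != -1 && row.getValue(idx) instanceof String) {
        String value = (String) row.getValue(idx);
        row.setValue(idx, new StringBuilder(value).reverse().toString());
      }
    }
    return rows;
  }
}
```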
High-level, Easy-to-Use APIs and Reusable Libraries
CDAP provides an integrated application development framework for Hadoop. It standardizes and deeply integrates diverse Hadoop technologies, with easy-to-use APIs to build, deploy, and manage complex data analytics applications in the cloud or on premises.
- High-level Java APIs help maximize developer productivity, reducing the time to deliver big data solutions (see the application sketch after this list)
- Build, test, and run distributed applications across their entire lifecycle
- Open, standards-based architecture, and REST APIs to integrate and extend existing infrastructure
- Automate deployment and monitoring of solutions in Continuous Integration/Continuous Deployment (CI/CD) workflows using comprehensive DevOps tools
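As a rough illustration of those Java APIs, the sketch below defines a minimal application; the `AnalyticsApp` name and its `results` dataset are hypothetical:

```java
import co.cask.cdap.api.app.AbstractApplication;
import co.cask.cdap.api.dataset.table.Table;

// A minimal application: a single declarative configure() method describes
// the datasets and programs that make up the application.
public class AnalyticsApp extends AbstractApplication {
  @Override
  public void configure() {
    setName("AnalyticsApp");
    setDescription("A minimal sketch of a CDAP application.");
    // Creates a Table dataset named "results" when the app is deployed.
    createDataset("results", Table.class);
    // A real application would also register its programs here, e.g.:
    //   addMapReduce(new AggregateMapReduce());  // hypothetical MapReduce job
    //   addService(new ResultsService());        // hypothetical HTTP service
  }
}
```

Deploying the resulting artifact through the UI or the REST API creates the declared datasets and makes the programs available to run, schedule, and monitor.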
Security & Operations
Robust Security and a Portable Production Runtime Environment
- Sophisticated authentication, authorization, and encryption to meet compliance needs and mitigate risk
- Deep enterprise integrations for security and authentication, such as LDAP, Active Directory, Kerberos, JASPI, Apache Sentry, and Apache Ranger
- Isolation of data and operations between users, with access control pushed down to the lower layers
- Robust production runtime environment for easy, secure deployment and management on Hadoop
- High availability, disaster recovery, and replication to support business-critical production usage
Abstraction, Standardization and Future-Proofing
CDAP provides a container architecture for your data and applications on Hadoop. High-level abstractions and deep integrations with diverse Hadoop technologies dramatically increase productivity and quality, accelerating development and reducing the time to production so your Hadoop projects get to market faster.
- 100% open source, 100% Hadoop native
- Flexible, multi-tenant deployment capabilities to accommodate shared data and application infrastructure
- Packaging of data and applications simplifies the full production lifecycle
- Encapsulation of data and programs stored and running in systems like HDFS, HBase, Spark, and MapReduce enables portability of big data solutions on premises, in the cloud, and in hybrid environments (see the dataset sketch after this list)
- Standardization of data across varied storage engines and of compute across varied processing engines promotes reusability and simplifies security, operations, and governance across projects and environments
- Maximum flexibility and reduced risk, with insulation from changes in the fast-evolving big data ecosystem
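To make the encapsulation point concrete, here is a sketch of a service handler reading from a Table dataset through the CDAP dataset API; the `results` dataset and the handler are hypothetical. Because the handler codes against the dataset abstraction rather than a storage engine directly, the same code runs whether CDAP maps the dataset to a local store in the sandbox or to HBase on a production cluster:

```java
import java.nio.charset.StandardCharsets;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;

import co.cask.cdap.api.annotation.UseDataSet;
import co.cask.cdap.api.dataset.table.Get;
import co.cask.cdap.api.dataset.table.Table;
import co.cask.cdap.api.service.http.AbstractHttpServiceHandler;
import co.cask.cdap.api.service.http.HttpServiceRequest;
import co.cask.cdap.api.service.http.HttpServiceResponder;

// Serves values from the "results" Table dataset over HTTP.
public class ResultsHandler extends AbstractHttpServiceHandler {

  @UseDataSet("results")  // injected by the CDAP runtime
  private Table results;

  @GET
  @Path("results/{key}")
  public void getResult(HttpServiceRequest request, HttpServiceResponder responder,
                        @PathParam("key") String key) {
    byte[] value = results.get(new Get(key)).get("value");
    if (value == null) {
      responder.sendStatus(404);
    } else {
      responder.sendString(new String(value, StandardCharsets.UTF_8));
    }
  }
}
```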