Interactive Application for Building, Running and Managing Data Pipelines for Enterprise Data Lakes.

Integrate, Prepare and Blend

Ingest data of any type from any source in minutes, without writing code. Prepare, cleanse, and enrich data using the code-free data wrangler and built-in transformation plugins. Blend data from traditional RDBMSs, data warehouses, and Hadoop.

Aggregate and Analyze

Perform step-by-step aggregation and analytics in batch or real time. Leverage plugins that use state-of-the-art Spark ML to build and score models in a unified environment.

Automate and Operationalize

Use REST APIs or CLI tools to automate the deployment and management of pipelines across environments. Schedule pipelines to run at periodic intervals with the built-in enterprise scheduler, aggregate pipeline logs and metrics, and compare pipeline runs to diagnose problems.

Deploy, Audit and Govern

Deploy pipelines to run as MapReduce or Spark jobs for batch processing, or as Spark Streaming jobs for real-time processing. Catalog all datasets and metadata to support data governance. Secure your data with fine-grained access control, and monitor and track user activity through audit logs.

Cask Data Application Platform (CDAP) 4 is now in preview! It includes a beta of the new Cask Wrangler as part of Cask Hydrator.

Cask Hydrator is a code-free visual application for building complex data pipelines and managing them on your Data Lake. With Cask Hydrator, you can ingest data from varied sources and in formats such as CSV, XML, and Excel; cleanse, normalize, and transform data; build machine learning models on the fly; perform aggregations; run custom scripts; and more.

Open Source and Extensible

Cask Hydrator is 100% open source and highly extensible.

Batch and Real-time on Spark

Cask Hydrator supports MapReduce, Spark, Spark Streaming, and Tigon Flows.

Connects to Anything

Cask Hydrator integrates with existing enterprise solutions for security, MDM, and BI, protecting the business's past investments in them.

Features & Benefits

Accelerate Time to Value

Deliver reliable, operational Data Lakes and production Data Applications faster. Extensible libraries and components promote reuse and further accelerate the pace of innovation.

Provide Self-Service

Broaden the user base of your big data platform with a radically simplified developer experience and code-free extensions for non-developers. Reusable libraries can be assembled into data pipelines and run through drag-and-drop interfaces.

Enable Governance

Automatically track all audit events and data lineage, with built-in discovery and search. Integrate with existing security and governance systems through built-in authentication, authorization, and auditing.

CDAP accelerates time to value from Hadoop through standardized APIs, configurable templates, and visual interfaces, and it increases efficiency through reusable, portable components. As an extensible, future-proof platform, CDAP removes barriers to innovation, provides consistency across environments, and integrates easily with existing MDM, BI, and security solutions.


Cask Tracker, powered by CDAP, is a data management application that provides data governance capabilities for your Data Lake. With Cask Tracker, you can create datasets based on technical, business, or operational metadata. You can also annotate datasets and manage an ontology of tags, understand how your datasets are used on the cluster, use lineage to debug data issues, and more.

