Interactive Application for Building, Running and Managing Data Pipelines for Enterprise Data Lakes.
Ingest data of any type from any source in minutes, without writing code. Prepare, cleanse, and enrich it using the code-free data wrangler and built-in transformation plugins. Blend data across traditional RDBMSs, data warehouses, and Hadoop.
Perform step-by-step aggregation and analytics in batch or real time. Leverage plugins that use state-of-the-art Spark ML to build and score models in a unified environment.
Use REST APIs or CLI tools to automate the deployment and management of pipelines across environments. Use the built-in enterprise scheduler to run pipelines at periodic intervals, aggregate pipeline logs and metrics, and compare different pipeline runs to diagnose problems.
Deploy pipelines to run as MapReduce or Spark jobs, or as Spark Streaming jobs for real-time processing. Catalog all datasets and metadata to support data governance. Secure your data with fine-grained access control, and monitor and track user activity through audit logs.
Cask Hydrator is a code-free visual application for building complex data pipelines and managing them on your Data Lake. With Cask Hydrator, you can ingest data from varied sources and in varied formats (CSV, XML, Excel, and others); cleanse, normalize, and transform it; build machine learning models on the fly; perform aggregations; run custom scripts; and more.
Open Source and Extensible
Cask Hydrator is 100% open source and highly extensible.
Batch and Real-time on Spark
Cask Hydrator offers support for MapReduce, Spark, Spark Streaming, and Tigon Flows.
Connects to Anything
Cask Hydrator integrates with existing enterprise solutions for security, MDM, and BI, protecting past investments by the business in these solutions.
Features & Benefits
Accelerate Time to Value
Rapidly deliver reliable, operational Data Lakes and production Data Applications. Extensible libraries and components promote reuse and further accelerate the pace of innovation.
Broaden the user base of your big data platform with a radically simplified developer experience and code-free Extensions for non-developers. Reusable libraries can be assembled and run as data pipelines through drag-and-drop interfaces.
Automatically track audits and data lineage, with discovery and search. Integrate with existing security and governance systems through built-in authentication, authorization, and auditing.
CDAP accelerates time to value from Hadoop through standardized APIs, configurable templates, and visual interfaces, and it increases efficiencies through reusable and portable components. CDAP removes barriers to innovation as an extensible and future-proof platform that provides consistency across environments and easily integrates with existing MDM, BI, and security solutions.
Cask Tracker, powered by CDAP, is a data management application that provides data governance capabilities for your Data Lake. With Cask Tracker, you can create datasets based on technical, business, or operational metadata. You can also annotate datasets and manage an ontology of tags, understand how your datasets are used on the cluster, use lineage to debug data issues, and more.