An enterprise data lake provides a centralized repository for storing vast amounts of data from various sources and in various formats. It includes capabilities that make it easy for data scientists, developers, and analysts to process data and gain insights, along with a reliable, repeatable, and fully operational data management system covering ingestion, storage, transformation, cleansing, tracking, and distribution. The data management system must capture data flows in various ways and support varied data types and formats: structured, unstructured, and semi-structured. Your business will evolve, and so will your data; the data lake should handle that evolution seamlessly.
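As a rough illustration of handling structured, semi-structured, and unstructured inputs in one ingestion path, the following Python sketch wraps each input in a common record and notes the stages it has passed through. It is not CDAP code: `LakeRecord`, `ingest`, and `cleanse` are hypothetical names, and the format labels ("csv", "json", anything else) are assumptions made for the example.

```python
import csv
import io
import json
from dataclasses import dataclass, field
from typing import Any


@dataclass
class LakeRecord:
    """Hypothetical common envelope for data landed in the lake."""
    source: str
    kind: str  # "structured", "semi-structured", or "unstructured"
    payload: Any
    lineage: list = field(default_factory=list)  # stages the record has passed through


def ingest(source: str, raw: str, fmt: str) -> list:
    """Parse raw text into LakeRecords according to its declared format."""
    if fmt == "csv":
        rows = csv.DictReader(io.StringIO(raw))
        records = [LakeRecord(source, "structured", dict(row)) for row in rows]
    elif fmt == "json":
        records = [LakeRecord(source, "semi-structured", json.loads(raw))]
    else:
        # Anything else is kept verbatim as unstructured text.
        records = [LakeRecord(source, "unstructured", raw)]
    for record in records:
        record.lineage.append("ingested")
    return records


def cleanse(record: LakeRecord) -> LakeRecord:
    """Strip surrounding whitespace from string values in dict payloads."""
    if isinstance(record.payload, dict):
        record.payload = {
            key: value.strip() if isinstance(value, str) else value
            for key, value in record.payload.items()
        }
    record.lineage.append("cleansed")
    return record
```

For example, `ingest("crm", "name,city\n Ada , London \n", "csv")` yields one structured record, and passing it through `cleanse` trims the padded values while appending "cleansed" to its lineage. Keeping the lineage on the record itself is only one possible design; a production system would more likely record it in a separate metadata store.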
A major IT requirement for building an operational data lake is to eliminate data silos and organizational boundaries by enabling access to data for different types of users and business units. The key challenges are acquiring data from varied sources and formats, securing it, establishing SLAs for ingestion, continually ensuring the quality and proper transport of the data, and creating appropriate retention policies. The separation of concerns between teams whose primary focus is ingesting data and the lines of business (LOBs) that consume it is an important consideration. It must be accounted for while guaranteeing a well-governed and secure self-service environment for bringing data into the lake or processing data once it has landed. A data lake is the backbone for organizations that offer data as a service, whether that means surfacing data to internal customers within the confines of the organization or offering it externally as a value-added service through REST APIs, BI tools, and visualization tools.
CDAP is a unified platform that integrates application management, data integration, security and governance, and a self-service environment, speeding up the process of building and running a data lake. CDAP provides a broad set of ecosystem integrations for runtime, transport, and storage, including MapReduce, Spark, Spark Streaming, Tigon, Kafka, and HBase.
CDAP provides a comprehensive collection of pre-built building blocks for data manipulation, data storage, and key insight extraction, so that teams can build smarter end-to-end solutions with maximum flexibility in the fast-evolving big data ecosystem. This mitigates risk by letting users move quickly from Hadoop ideation to deployment using our CLI or visual interface, reducing cost and delays.
Rapid Time to Value
Cask Hydrator enables developers to get started quickly with built-in data ingestion, exploration, and transformation capabilities available through a rich user interface and interactive shell.
Ensures Data Consistency
CDAP makes all data in Hadoop available for real-time and batch access without the need to write code, manage metadata, or copy data. Advanced functionality for scale-out, high-throughput, real-time ingestion and transactional event processing enables new use cases while maintaining data consistency.
Case Study: Data Lake
Case Study: Data Cleansing and Validation
Case Study: Data Discovery for Data Science
Case Study: Security Analytics and Reporting
Case Study: XML Ingest and Transform
Case Study: Security Reporting