Ankur_Jain.jpg
Vinoth Chandar, VP @ Apache Hudi, ASF

Vinoth Chandar is the original creator & VP of the Apache Hudi project, which has changed the face of data lake architectures over the past few years. Vinoth has a keen interest in unified data storage/processing architectures. He drove various efforts around stream processing/Kafka at Confluent. In the past, Vinoth has built large-scale, mission-critical infrastructure systems at companies like Uber and LinkedIn.

Change Data Capture to Data Lakes Using Apache Pulsar and Apache Hudi

Wed Jun 16, 11:30 AM - 12:15 PM, PT

Apache Hudi is an open data lake platform, designed around the streaming data model. At its core, Hudi provides a transactions, upserts, deletes on data lake storage, while also enabling CDC capabilities. Hudi also provides a coherent set of table services, which can clean, compact, cluster and optimize storage layout for better query performance. Finally, Hudi's data services provide out-of-box support for streaming data from event systems into lake storage in near real-time.

In this talk, we will walk through an end-end use case for change data capture from a relational database, starting with capture changes using the Pulsar CDC connector and then demonstrate how you can use the Hudi deltastreamer tool to then apply these changes into a table on the data lake. We will discuss various tips to operationalizing and monitoring such pipelines. We will conclude with some guidance on future integrations between the two projects including a native Hudi/Pulsar connector and Hudi tiered storage.