March 6, 2019 - 5:00 pm
March 6, 2019 - 6:00 pm
Address: Online Webinar
IDEAS Online Free Webinar
IDEAS & Data Application Lab co-host this live webinar.
IDEAS is a global nonprofit organization dedicated to fostering the data engineering and data science ecosystems and broadening the adoption of their underlying technologies to accelerate the innovations data can bring to society. Our goal is to create a community to connect AI, Blockchain, and Data Science enthusiasts. All of the conferences that IDEAS hosts demonstrate cutting-edge technology and feature a variety of AI, Blockchain, and Data Science experts covering topics including industry trends, real-world applications, open-source software, solutions-based case studies, and many others.
Bin Fan is a founding member of Alluxio, Inc. and a PMC member of the Alluxio open source project. Prior to Alluxio, he worked at Google building next-generation storage infrastructure. Bin received his Ph.D. in Computer Science from Carnegie Mellon University, where he worked on the design and implementation of distributed systems and algorithms, including the cuckoo filter and MemC3.
Approaches to managing data have evolved, starting from HDFS and now moving to newer options such as cloud storage. With all the possible combinations of ways to access data, data engineering has become increasingly complex, particularly in hybrid and multi-cloud environments. More companies and users have realized the importance of adding a new data abstraction layer to the stack for flexibility and performance.
This is the fundamental problem Alluxio (www.alluxio.org) solves. Alluxio is an open-source virtual distributed file system that provides a unified data access layer for the big data and ML stack and fits hybrid and multi-cloud environments. Alluxio enables distributed compute engines like Spark and Presto, or machine learning frameworks like TensorFlow, to transparently access data from different persistent storage systems (including HDFS, S3, and Azure) while actively leveraging an in-memory cache to accelerate data access.
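The idea of a unified access layer with a read-through cache can be illustrated with a small toy model. This is only a sketch under stated assumptions: it is not Alluxio's API, the class name `UnifiedNamespace`, its methods, and the mount points `/hdfs` and `/s3` are all hypothetical, and plain Python dictionaries stand in for the real storage systems.

```python
from typing import Dict, Tuple


class UnifiedNamespace:
    """Toy model (hypothetical, not Alluxio's API): one path namespace
    over several backing stores, with an in-memory read-through cache."""

    def __init__(self) -> None:
        # Mount point -> backing store (a dict stands in for HDFS, S3, ...).
        self._mounts: Dict[str, Dict[str, bytes]] = {}
        # Full path -> cached bytes.
        self._cache: Dict[str, bytes] = {}
        # Counts how often a backing store was actually contacted.
        self.backend_reads = 0

    def mount(self, mount_point: str, store: Dict[str, bytes]) -> None:
        """Attach a backing store under a path prefix."""
        self._mounts[mount_point.rstrip("/")] = store

    def _resolve(self, path: str) -> Tuple[Dict[str, bytes], str]:
        """Find the store owning `path` using longest-prefix matching."""
        best = None
        for mount_point in self._mounts:
            if path.startswith(mount_point + "/"):
                if best is None or len(mount_point) > len(best):
                    best = mount_point
        if best is None:
            raise FileNotFoundError(path)
        return self._mounts[best], path[len(best) + 1:]

    def read(self, path: str) -> bytes:
        """Serve from the cache when possible, else fetch and cache."""
        if path in self._cache:
            return self._cache[path]
        store, key = self._resolve(path)
        if key not in store:
            raise FileNotFoundError(path)
        self.backend_reads += 1
        data = store[key]
        self._cache[path] = data
        return data


# Two different "storage systems" exposed through one namespace.
ns = UnifiedNamespace()
ns.mount("/hdfs", {"logs/a.txt": b"from hdfs"})
ns.mount("/s3", {"models/m.bin": b"from s3"})

first = ns.read("/hdfs/logs/a.txt")   # fetched from the backing store
second = ns.read("/hdfs/logs/a.txt")  # served from the in-memory cache
other = ns.read("/s3/models/m.bin")   # same call, different backend
```

The caller uses one `read` call regardless of where the data lives, and repeated reads of the same path do not touch the backing store again. A real system also has to handle eviction, consistency, and writes, which this sketch leaves out.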
Alluxio originated at the UC Berkeley AMPLab as the research project “Tachyon”. It now has 900+ contributors and is used by 100+ companies worldwide, with the largest production deployment exceeding 1,000 nodes.
This presentation focuses on how Alluxio helps the big data and ML stack become cloud-native. Bin will present the basic concepts and take a deep dive into the architecture, the data and metadata paths, and important features on the roadmap.
About Data Application Lab: