February 24, 2022 Meeting

Ray, Docker, and Kubernetes

This month, we'll have a talk from Jules Damji about Ray, the open source platform for distributed programming. We will also have a lightning talk from Tyler Suard about Docker and Kubernetes. Come join us!

Lightning talk: Docker and Kubernetes: Getting Doom to Run on Your Friends Computer in 1993

The absolute simplest approach to Docker and Kubernetes ever, with an example from the flyest time in history: the 90’s!

Speaker Bio: Tyler Suard

Tyler Suard is a software engineer at Facebook. He likes cats, his girlfriend, and inventing things.

Main Talk: Scaling AI Workloads with the Ray Ecosystem

Today, AI applications are becoming pervasive across all sectors of our industry. Driven by a few fundamental trends, there is no indication of slowing down. In fact, the trend continues rapidly, making distributed computing at scale a norm and necessity.

But distributed computing is not easy. It has its challenges. Building distributed applications today requires tons of expertise. For many developers, it is out of reach. Current solutions to these challenges have their shortcomings and tradeoff.

Ray aims to address these shortcomings. As a general-purpose distributed computing framework, it makes programming a cluster of machines as easy as programming a laptop, thereby enabling many more developers and practitioners to take advantage of the advances in cloud computing and scale their machine learning workloads to solve harder problems, without needing to be experts in distributed systems. Besides a core general-purpose distributed-compute system, Ray encompasses a collection of state-of-the-art native libraries targeting scalable machine learning. These include libraries for hyperparameter tuning, distributed training, reinforcement learning, model serving, and last-mile ML data pre-processing and ingestion for model training.

This talk will introduce Ray’s overview; survey its ecosystem of both native and integrated ML libraries; and discuss key applications and developments in the Ray ecosystem, drawing upon lessons from discussions with practitioners over the years of developing Ray with the community—and at Anyscale. In particular, we will demonstrate how you can easily scale three common ML workloads, from your laptop to the cluster, with Ray’s native libraries: training, hyperparameter tuning and optimization (HPO), and large-scale batch inference.

Using the popular XGBoost for classification, we will show how you can scale model training, hyperparameter tuning, and inference—from a laptop or single node to a Ray cluster, with tangible performance difference when using Ray.

The takeaways from this talk are :

Why distributed computing has become the norm and necessity, not an exception
Learn Ray’s architecture, core concepts, and programming primitives
Understand Ray’s ecosystem of scalable ML libraries
Easily extend or transition your laptop to a Ray cluster
Scale three ML workloads using Ray’s native libraries:
Training on a single node vs. Ray cluster, using XGBoost with/without Ray
Tuning HPO using XGBoost with Ray and Ray Tune
Inferencing at scale, using XGBoost with/without Ray

Speaker Bio: Jules Damji

Jules S. Damji is a lead developer advocate at Anyscale Inc, an MLflow contributor, and co-author of Learning Spark, 2nd Edition. He is a hands-on developer with over 25 years of experience and has worked at leading companies, such as Sun Microsystems, Netscape, @Home, Opsware/LoudCloud, VeriSign, ProQuest, Hortonworks, and Databricks, building large-scale distributed systems. He holds a B.Sc and M.Sc in computer science (from Oregon State University and Cal State, Chico respectively), and an MA in political advocacy and communication (from Johns Hopkins University).

Code of Conduct

https://baypiggies.net/pages/code_of_conduct.html

Interactions online have less nuance than in-person interactions. Please be Open, Considerate and Respectful. Also, please refrain from discussing topics unrelated to the Python community or the technical content of the meeting.

Call for Talks

We are looking for speakers for 2022. Due to the COVID-19 situation, our meetings are currently online-only. We assume this will be the case for the at least the first quarter, but will reevaluate the situation each month.

We are looking for technical talks of interest to Python developers, either about the language and core libraries itself, popular libraries/platforms using Python (for example, Pandas andTensorFlow in Data Science, Flask and Django in web applications, Ansible in DevOPs), or other experiences using Python. See the list of past meetings at https://baypiggies.net/category/meetings.html to get a sense of topics. As our participants are using their personal time for these meetings, we request that talks are inclusive and not overly commercial, political, etc.

Talks can range in length from 5 minutes to 45 minutes (see below). Most of our online meetings last about 90 minutes and include a short talk and a longer talk, but this can vary from month to month.

You can apply for an online talk here.

This is a great opportunity to evangelize a project you love or to get practice with public speaking. We hope to hear from you!