Data Science Night

This month, we will have two exciting talks on data science / engineering topics.

Talks

Talk 1: Green circles for success: hands-off ETL using Airflow with unit tests

Speaker: Sam Zeitlin

Abstract

My team uses Airflow to run daily and hourly jobs that parse logs and transfer data into AWS Redshift for easier access. I’ll talk about why Airflow is better than cron jobs: it’s Python, and it keeps track of which tasks succeeded and which failed, so you don’t have to restart from scratch if anything goes wrong.

I’ll also talk about what can go wrong with Airflow jobs, and how I came up with reusable templates for regression tests to support creating new jobs and upgrading from an older version to a newer one.
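
As a rough illustration of the "no restart from scratch" point in the abstract, here is a minimal pure-Python sketch of per-task state tracking. It is not Airflow's API and not the speaker's code: `run_pipeline`, the task names, and the in-memory `state` dictionary are all hypothetical, chosen only to show the idea that a rerun can skip work already marked as successful.

```python
from typing import Callable, Dict, List, Tuple

Task = Tuple[str, Callable[[], None]]


def run_pipeline(tasks: List[Task], state: Dict[str, str]) -> Dict[str, str]:
    """Run tasks in order, skipping any already marked 'success' in state.

    Hypothetical helper for illustration only; a real scheduler such as
    Airflow persists task state in a database rather than a dict.
    """
    for name, func in tasks:
        if state.get(name) == "success":
            continue  # finished in an earlier run, so do not repeat it
        try:
            func()
            state[name] = "success"
        except Exception:
            state[name] = "failed"
            break  # later tasks wait until this one succeeds
    return state


# Demo: the second task fails on its first attempt only.
calls: List[str] = []
attempts = {"load": 0}


def parse_logs() -> None:
    calls.append("parse_logs")


def load_to_warehouse() -> None:
    attempts["load"] += 1
    if attempts["load"] == 1:
        raise RuntimeError("simulated transient failure")
    calls.append("load_to_warehouse")


tasks = [("parse_logs", parse_logs), ("load_to_warehouse", load_to_warehouse)]
state: Dict[str, str] = {}
run_pipeline(tasks, state)  # first run: parse succeeds, load fails
run_pipeline(tasks, state)  # rerun: parse is skipped, only load is retried
```

With a plain cron script, the rerun would typically repeat the parsing step as well; here the recorded state lets the second run pick up at the failed task.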

Author Bio

I’m a former research scientist and self-taught Pythonista, and my current job title is Product Hacker at Oath, Inc. My team does prototyping for our DevOps, Engineering, and Product teams. What that usually means is a mix of data engineering (we often build our own data pipelines), data science, and product management.

Talk 2: Practical Packaging for Machine Learning Solutions

Speaker: Steven Cutting

Abstract

In this talk we will cover how to use Python's packaging tools and best practices to make sharing our machine learning solutions easier. We will cover the basic concepts and tools for Python packaging. Then we will introduce a few suggested schemes for packaging a machine learning project, followed by some tips and best practices. We will wrap up with a few additional concerns related to creating quality projects and packages.

Steve has made the slides from the talk and an associated blog post available on GitHub here.

Author Bio

I have worked as a consultant for the past 3 years. In some projects, I created machine learning solutions that needed to be incorporated into production applications that either I or someone else had written. As a result, I have developed experience packaging machine learning projects to make them easier to share and deploy.

Location

We will be in a different building from last month, just down the street:

LinkedIn, Unify Meeting Room, 950 W. Maude Ave, Sunnyvale.

Meeting Details

Meeting Schedule:

  • 7:00 pm Food and Announcements
  • 7:15 pm Talks start
  • 8:30 pm Networking
  • 9:00 pm Event ends