Simple Data Engineering in Python 3.5+ with Bonobo

Romain Dorgueil (@rdorgueil)

Developer, sysadmin, technical team builder, founder of two companies, advisor.

Currently helping start-ups achieve more with less through our acceleration programs in Paris, and in charge of our product development activities.

Sometimes, I play go and make music, but not at the same time.

Abstract

Tags: Python Business Data-Engineering ETL Simple Bonobo

Simple is better than complex, and that's True for data pipelines, too.

Bonobo is a Python 3.5+ tool used to write and monitor data pipelines. It’s plain, simple, modern, and atomic Python.
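To give a feel for what "monitoring" a pipeline can mean, here is a minimal, stdlib-only sketch that counts how many items enter and leave each node. It is not Bonobo's API or its monitoring implementation: the `monitored` decorator, the `stats` counter and the "return `None` to drop an item" convention are all hypothetical choices made for this illustration.

```python
from collections import Counter
from functools import wraps

# Hypothetical stats registry: keys look like "<node>.in" / "<node>.out".
stats = Counter()


def monitored(fn):
    """Wrap a node so that each call updates its input/output counters."""
    @wraps(fn)
    def wrapper(item):
        stats[fn.__name__ + ".in"] += 1
        result = fn(item)
        # Convention of this sketch only: returning None drops the item.
        if result is not None:
            stats[fn.__name__ + ".out"] += 1
        return result
    return wrapper


@monitored
def keep_even(n):
    return n if n % 2 == 0 else None


@monitored
def square(n):
    return n * n


# Chain the nodes lazily, like a pipe, skipping items that were dropped.
evens = (x for x in map(keep_even, range(10)) if x is not None)
squares = list(map(square, evens))
print(squares)  # [0, 4, 16, 36, 64]
print(dict(stats))
```

The nodes stay regular functions; the monitoring concern lives entirely in the wrapper, so it can be added or removed without touching the transformation logic.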

This talk is a practical encounter, from zero to a complete data pipeline.

Spoiler: no «big data» here.

Description

Simple is better than complex, right? That’s true for data pipelines too.

For the last 5 years, I have hacked together extract-transform-load (ETL) processes in various positions (ETL is just a fancy term for «a bunch of things that take data from somewhere and put it elsewhere, possibly transformed along the way»).
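Taken literally, that definition fits in a few lines of plain Python. The sketch below is only an illustration using the standard library, not Bonobo code: the CSV text stands in for "somewhere", a list stands in for "elsewhere", and the `extract`/`transform`/`load` helpers are hypothetical names chosen to mirror the three letters of ETL.

```python
import csv
import io

# Hypothetical source data standing in for "somewhere" (a file, an API, a database...).
SOURCE = """name,city
alice,paris
bob,lyon
"""


def extract(text):
    """Extract: read rows from a CSV source, one dict per row."""
    yield from csv.DictReader(io.StringIO(text))


def transform(row):
    """Transform: normalize the fields of a single row."""
    return {"name": row["name"].title(), "city": row["city"].upper()}


def load(rows, target):
    """Load: put the transformed rows "elsewhere" (here, an in-memory list)."""
    for row in rows:
        target.append(row)
    return target


warehouse = load((transform(row) for row in extract(SOURCE)), [])
print(warehouse)
# [{'name': 'Alice', 'city': 'PARIS'}, {'name': 'Bob', 'city': 'LYON'}]
```

Each step handles one concern, and rows flow through lazily, one at a time, which is the shape real ETL jobs keep even when the sources and targets become databases or web services.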

I did it as a founder, as a consultant and as a technical co-founder, for big corporations and small side projects alike.

In each case, I felt frustrated with the available tools, and in the most serious cases I had to hack things together myself to get the job done. Bonobo is the repackaging of my past experience for Python 3.5+, and grasping the basics should not take longer than this presentation.

Outline (subject to small changes, for the greater good):

  • INTRO: the ETL market, why a new tool, what it is, what it is not.
  • Basics and concepts.
  • Simple example.
  • Complete data pipeline example, using SQL, RDF and a small Django frontend.
  • OUTRO: a glimpse at the future.
  • Q&A

Bonobo is the glue you need to tie regular functions together into a transformation graph (think Unix pipes). Execution strategies are abstracted away, so you can focus on the actual operations. As a result, you can engineer simple, testable systems, using the same good software development practices as you use in .
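To make the "Unix pipes" and "abstracted execution strategies" ideas concrete, here is a minimal, stdlib-only sketch. It is not Bonobo's implementation or API: `run_sequential` and `run_threaded` are hypothetical runners that execute the same chain of plain functions in two different ways. In Bonobo itself, as we understand its documentation, such functions are instead passed to `bonobo.Graph(...)` and executed with `bonobo.run(graph)`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable, Iterator, List


def extract() -> Iterator[str]:
    # Source node: yields items one at a time, like the first command of a shell pipe.
    yield "foo"
    yield "bar"
    yield "baz"


def transform(item: str) -> str:
    # Plain function: one item in, one item out, no framework involved.
    return item.upper()


def exclaim(item: str) -> str:
    return item + "!"


def _apply(item: Any, steps: Iterable[Callable[[Any], Any]]) -> Any:
    # Push a single item through every step, in order (the "pipe").
    for step in steps:
        item = step(item)
    return item


def run_sequential(source: Callable[[], Iterable[Any]],
                   *steps: Callable[[Any], Any]) -> List[Any]:
    """Hypothetical strategy 1: process items one after another in the current thread."""
    return [_apply(item, steps) for item in source()]


def run_threaded(source: Callable[[], Iterable[Any]],
                 *steps: Callable[[Any], Any],
                 workers: int = 4) -> List[Any]:
    """Hypothetical strategy 2: same chain, items processed concurrently by a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in input order, so both strategies agree.
        return list(pool.map(lambda item: _apply(item, steps), source()))


print(run_sequential(extract, transform, exclaim))  # ['FOO!', 'BAR!', 'BAZ!']
print(run_threaded(extract, transform, exclaim))    # ['FOO!', 'BAR!', 'BAZ!']
```

The node functions never change when the runner does, which is the point of separating operations from execution. And because `transform` is an ordinary function, it can be unit-tested on its own (`assert transform("foo") == "FOO"`), without any pipeline machinery.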