Uber Open Source: Catching Up with Fritz Obermeyer and Noah Goodman from the Pyro Team

Over the past several years, artificial intelligence (AI) has become an integral component of many enterprise tech stacks, facilitating faster, more efficient solutions for everything from self-driving vehicles to automated messaging platforms. On the AI spectrum, deep probabilistic programming, a discipline that combines deep neural networks with probabilistic models, has emerged as a valuable means of reasoning about large-scale datasets under uncertain conditions.

Given the novelty and complexity of AI, open source is fundamental to Uber’s footprint in this arena because it enables knowledge sharing and collaboration on an unprecedented scale. Open sourced in November 2017, Pyro was developed by Uber AI on top of PyTorch as a deep probabilistic programming language that facilitates more accessible exploration of AI and Bayesian models.

Joining Horovod as an incubator project of the Linux Foundation’s Deep Learning Foundation, Pyro accelerates research and applications of AI techniques by making AI tools more flexible, open, and easy to use.

To better understand the potential of this framework, we spoke with Fritz Obermeyer and Noah Goodman, Pyro project co-leads and research scientists with Uber AI, about the inspiration behind the project and how it plays into our commitment to growing open source at Uber:

How did you get interested in engineering and AI? What drew you to Uber and Uber AI?  

Uber AI’s Fritz Obermeyer.

Fritz: My background is in mathematics, and from there, I started digging into probability and statistics. Actually, a lot of the Pyro team comes from a mathematics or physics background. After pursuing my PhD in Pure and Applied Logic, I went into industry as a probabilistic machine learning and AI researcher. Eventually, I came across this Uber position as a senior research engineer, which was really perfect to balance my two backgrounds in probabilistic machine learning and programming languages. The Pyro project is super fun to work on.

Noah: My story is much weirder and more meandering than Fritz’s. I actually started as a mathematician in graduate school, but that was way too esoteric for me. After a few false starts, I ended up as a psychologist. My Stanford appointment [as a professor of Psychology and Computer Science] is at the intersection of psychology and more mathematical stuff, which is mostly artificial intelligence, so I ended up doing that.

I came to Uber by way of Geometric Intelligence, an AI startup that Uber eventually acquired to build Uber AI. Coming to Uber gave us the opportunity to really grow out the engineering, especially starting probabilistic programming in a way that we hadn’t been able to do when we were a small startup.

What makes working with Uber AI unique compared to your previous research experiences, whether in academia or industry?

Fritz: I have to say that the uniqueness of Uber and Uber AI is two-fold. One big difference for me was that my previous experience was in academia and then at small companies, mostly startups. The resources that Uber can offer to get engineering done are much greater, making it possible to build really good software that you can use for lots of different things, as opposed to the academic style where you make one little piece of software that lets you get a single project done and then move on to the next one. Of course, also being Uber, there’s a feeling of fast-paced movement and the real world lurking just over your shoulder, which makes things interesting.

Uber AI’s Noah Goodman.

Noah: What makes Uber AI unique for me is also two things. One is the concentration of really great people in probabilistic machine learning. In the past, I’ve worked with maybe one or two people who are experts in probabilistic machine learning, but we have such a large group at Uber that we can address really complex problems and do things that we couldn’t do elsewhere. The other thing is the Uber team’s ability to create great open source software.

Fritz: I’ve worked on a lot of closed source machine learning systems, and it’s really special that we got to open source Pyro and work with academic collaborators to make the framework better.

When did you start using open source technology for your research?

Noah: I’ve always used open source software. I used Linux back when I was a grad student in the Iron Age, when I still had to build my own kernel every time I changed anything. When I first started doing AI stuff, scientists mostly didn’t use open source software for AI. We started out using MATLAB, and I remember distinctly the moment when I realized we didn’t have to do that anymore, that we could use open source software that you could change and see how it works. It was awesome; it was a huge breath of fresh air.

What inspired your team to create and open source Pyro?

Noah: The inspiration for Pyro was realizing that there was an unmet need in the machine learning open source community for probabilistic programming. When we decided to build Pyro in 2017, probabilistic programs had been around for a while and the probabilistic programming languages had been improving gradually, but most of them were not written in Python, which was the language that almost all machine learning practitioners were using at that point.

Since this community was using Python, there were these other machine learning libraries and tools that had been open sourced, like PyTorch. We figured that if we could bring those tools to bear on probabilistic programming problems, we’d be able to build a bunch of new, interesting models. So, we said, “Okay, well, nobody else is going to do this, so why don’t we step in and integrate these tools together into a new deep probabilistic programming language?” Then off we went.

Fritz: One of the main reasons we decided to open source Pyro was so that we could share and transfer code between different language libraries, enabling us to push the boundaries of what was possible with probabilistic programming. Making Pyro open source allowed us to more flexibly shift those boundaries. We’ve moved a lot of our code from Pyro to upstream dependencies, allowing us to keep Pyro lean and focused.

What does Pyro do and how does the language add value to the AI community?

Fritz: I’ve been working with probabilistic modeling frameworks for a long time. It’s often taken months to write each new model and then debug it and fix the numerics to make sure it works correctly. Pyro makes it easy to write these models and explore the space of them really quickly, and to do so with components that have already been well-tested.

Since Pyro is built on a deep learning framework, I can use all of my more familiar Bayesian modeling techniques but then stick in neural networks wherever I need them. That makes it really powerful to work on models with many more parameters or on data sets that are much larger.

It’s a really exciting moment in machine learning because a bunch of different paradigms are converging and we’re seeing how to put them together without having to worry about bugs. Probabilities, programs, and deep learning are the three ingredients in Pyro. There’s more and more research from the academic community on how to put these things together, but the combination is really complicated and super-prone to being implemented wrong.

Uber’s Pyro engineering team spends a significant amount of time writing tests, discussing testing strategies, and really thoroughly testing things in a lot of different ways. If every person is trying to put these things together on their own in a new piece of software, almost every piece of software is going to be full of bugs. Doing it in Pyro means that the Pyro system itself has been really extensively tested, and so when you throw your model together, it’s likely to be far less buggy than it would be with bespoke AI software.

Which companies and academic organizations currently use Pyro?

Noah: Lots of universities use Pyro for their AI work. To start, MIT, Harvard, Stanford, and Northeastern. The Broad Institute also uses it, and there are a lot of smaller research groups. We expect that a lot of probabilistic programming research coming out in 2019 will implement their models in Pyro.

Fritz: It’s tough to track Pyro usage because we find that many papers don’t even cite Pyro because they don’t cite Python. They think of Pyro as such a fundamental technology that they can just build on it without citing us. We only find out sometimes months afterwards when we look at their code and see, “Oh, they used Pyro to implement their models.” It’s been really difficult to measure usage that way, though it’s great to know that people can use it without thinking about it.

Noah: One of the challenges of open sourcing software in general is that its very openness means that you don’t always know where it goes. Sometimes, you’ll release a piece of software and it’ll be reflected back to you a year or two later as something totally new and different. It’s exciting but also frustrating when it comes to getting credit for the work that was done initially.

How does Pyro differentiate itself from other probabilistic programming solutions?

Fritz: When it was released, Pyro was the first open source, universal deep probabilistic programming language, meaning that it could both possess dynamic model structure and incorporate deep neural networks. Another unique feature of Pyro is our effect handling library. It’s a design choice that has allowed us to implement a lot of tricky inference algorithms with very little effort needed to get them correct.

Noah: For the technical aficionados, there’s a bunch of things that distinguish Pyro in terms of the platform that it’s integrated into, a combination of PyTorch and Python. The universality of these languages in terms of the kinds of models that can be expressed, as well as a bunch of nitty-gritty features, makes Pyro perhaps easier to use and extend than other languages.

I think another differentiating factor is that there are not very many other probabilistic programming languages that are as well-documented as Pyro. The ones that are tend to be much older and a little bit stale, including some of my previous projects. Pyro has really great tutorials that I think account for why a lot of people have tried to use it. If it was just a powerful language without any of those tutorials, I don’t think it would have gotten the same amount of interest.

How do we use Pyro at Uber?

Fritz: We currently use Pyro for sensor fusion when building mapping solutions for our self-driving vehicles, some time series forecasting, and ad spend allocation models. We also leverage Pyro for research into other areas like fault attribution in data centers and anomaly detection.

How will Pyro’s donation to the LF Deep Learning Foundation add value to Pyro’s open source community?

Fritz: Contributing Pyro to the Deep Learning Foundation will make it easier for academic contributors to give back to a software project that is clearly open and not corporate-owned.

Noah: There’ll be fewer hurdles even for corporate partners to get the okay to go ahead and contribute back. I think having these open source projects moved into neutral, community-governed ownership is a really important step for making all of the stakeholders comfortable with their contributions.

What has been most gratifying about open sourcing Pyro?

Fritz: Uber AI only contributes a small fraction of the code. Even the design processes are really open. We have a large number of people working on Pyro outside of the company. Pyro has been more popular than I expected. Initially, I was surprised by the quality and significance of contributions by external contributors to Pyro, both in their precise code review and the ambition of the features that they’ve implemented on their own.

Noah: Now that Pyro is open and growing and moving to the community, I’m excited to see what people find to do with it that I never thought of. The coolest thing about open source is that people find new uses for software that you write. I think Pyro is particularly fertile for people repurposing it, coming up with things that we never thought of.

What conditions make for a successful open source project?

Fritz: I think it’s important to have your company’s support for your open source project as testing is really difficult. I’ve open sourced a lot of things on my own in the past and just haven’t had the resources to thoroughly test and document the project. I think one thing that’s really nice about Pyro is that we have the resources at Uber to put a lot of effort into testing and making sure things are correct and well-documented. At the same time, we can be open and accept bug fixes, new features, document changes, and just feature requests from the community or even big contributions of modules. That combination I think is really important for open source AI software.

Noah: At a big-picture level, AI is having a really profound impact on humanity. The more that AI belongs to humanity as a whole, the more those impacts are going to be fair and positive. One of the things that we can do as AI researchers is try to make sure that the new techniques and implementations we come up with belong to humanity as a whole and not to a single entity that is going to benefit uniquely.

How can companies empower their engineers to take part in open source?

Fritz: Given the nature of maintaining an open source project, I would say that the more successful projects often emerge from companies that can provide the support necessary to nurture and grow the project’s community.

Noah: In my opinion, most top technology companies are embracing open source. In some sense, even our motivation for building Pyro was the fact that Facebook had open sourced PyTorch, which is a library we used for making Pyro. This showcases the beneficial cycle that happens when one company decides to open source something. One of the really nice things about working at Uber is the ability to open source our work in an environment that supports its future development.

 

The Pyro team is requesting additional contributions, documentation, and tutorials. Help us grow the software by contributing to its repo.