Open source projects from Tumblr
Projects we'd love to support
We store a lot of instrumentation data about machines, software, networking gear, etc., in OpenTSDB. We want better tools for navigating this data: a platform on which people can easily create awesome dashboards, and tools for jumping off the dashboards and into the sea of collected data to produce ad hoc views.
All the data we collect is really useful, but doing long-run analysis consumes a lot of data. We want really awesome, accurate, useful, easy downsampling in OpenTSDB — maybe dropping down to one-hour samples and one-day samples. The end goal is to make OpenTSDB as useful for long-run ad hoc analysis as it is for short-run analysis.
This project will be mostly working on the OpenTSDB Java code with one of the project's core developers.
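As a rough illustration of what dropping down to one-hour and one-day samples could mean, here is a minimal sketch in Python. It is not OpenTSDB code and does not use its API: the function name `downsample`, the `(timestamp, value)` tuple layout, and the choice of averaging as the aggregation are all assumptions made for the example.

```python
from collections import defaultdict

HOUR = 3600        # seconds in one hour
DAY = 24 * HOUR    # seconds in one day


def downsample(points, interval_seconds):
    """Average (timestamp, value) points into fixed-width time buckets.

    Hypothetical helper, not part of OpenTSDB. Timestamps are Unix
    seconds. Returns a list of (bucket_start, mean_value) pairs
    sorted by bucket start.
    """
    buckets = defaultdict(list)
    for timestamp, value in points:
        # Align each point to the start of the bucket it falls in.
        bucket_start = timestamp - (timestamp % interval_seconds)
        buckets[bucket_start].append(value)
    return [
        (start, sum(values) / len(values))
        for start, values in sorted(buckets.items())
    ]


# Four raw samples spread over two hours.
raw = [(0, 1.0), (1800, 3.0), (3600, 5.0), (7199, 7.0)]
hourly = downsample(raw, HOUR)  # one averaged point per hour
daily = downsample(raw, DAY)    # one averaged point per day
```

Averaging is only one option; keeping the minimum, maximum, or sum per bucket would preserve different properties of the raw data for long-run analysis.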
Projects destined for open sourcing
Selected presentations from conferences or events
At Tumblr, we have adopted Scala as our backend language of choice and found it to be an excellent fit due to the expressiveness of the language and the runtime performance on the JVM. Although Scala is often cited as being an excellent target platform for highly concurrent applications, in practice it can be easy to shoot yourself in the foot. In this talk I will review code found in the wild here at Tumblr and discuss the implications in a server context.
Blake Matheny. Feb 8, 2013
Huge, rapidly-growing MySQL architectures can be quite challenging to maintain. In order to stay sane while juggling hundreds of database servers and tens of billions of rows, sophisticated automation becomes a necessity. To solve this problem at Tumblr, we created Jetpants, a multi-purpose toolchain for managing giant MySQL topologies. This talk from Percona Live NYC 2012 outlines how we implemented fast, resilient automation for many complex operational tasks.
Evan Elias. Oct 2, 2012
System administration is both an art and a science. This talk from Velocity 2012 covers a set of principles and techniques for getting work done efficiently and effectively. The talk also covers a recent Tumblr automation project as an example of these principles and techniques applied.
Joshua Hoffman. June 27, 2012
In this talk I'll discuss the architectural underpinnings of Tumblr's distributed systems infrastructure. I'll focus primarily on Motherboy, an eventually consistent inbox-style storage system. Additionally, I'll discuss some of our findings in evaluating various concurrency models, as well as how our choice of Scala as our back-end language played into those decisions.
Blake Matheny. June 19, 2012
In this talk I’ll go into detail about Tumblr’s experience developing Motherboy, an eventually consistent inbox-style storage system built around HBase. The SLA, write concurrency, data volume, and failure modes for this application created a number of challenges in developing a solution.
Bennett Andrews. May 22, 2012
Panel for the NY Scala meetup.
Blake Matheny. Feb 13, 2012
Sharding a huge, very active MySQL dataset can mean running into more gotchas than obvious solutions. At this session, we'll outline how, when, and why we sharded Tumblr's primary data set. We'll also cover the application logic and automation tools that we built to initially shard, split, and move billions of rows of MySQL data for millions of blogs, and the mistakes we made along the way.
Evan Elias. Nov 8, 2011
NY Scala meetup presentation.
Blake Matheny. Sep 27, 2011