Last monday, I went to DotScale 2015 in the heart of Paris (living half a hour of Paris, it was a rather easy trip for me). DotScale touts itself as "The European Tech Conference on Scalability," and focuses mostly on distributed systems and scalability issues. Started three years ago following the success of the DotRB conference, this year DotScale attracted around 750 attendees, most of them from France.
Thanks to my unending and unwarranted optimism regarding extra- and intra-urban transportation in Paris, a topic so vast I could dedicate a entire book just to rant on it, I managed to arrive at the beautiful Théâtre de Paris around 10am, in time for the start of talks but just after breakfast. Greeted by a small group of students serving as staff, I quickly moved through the small booth area set up in the hall to enter the theatre serving as main room, where I met up with a group of colleagues who had saved a seat for me.
Locksmith, Ops Scaling and the Borg
The first talk was from Matt Bostock, Web operations engineer for gov.uk. The talk started with a discussion of his team's mission, and of the issues they faced coordinating upgrades and reboots in their cluster of servers. Bostock spoke about how his team began using locksmith, an etcd-backed reboot manager from CoreOS, to manage these issues in gov.uk's Ubuntu-based infrastructure.
Bostock outlined the safeguards and features that his team added to its integration script, such as rules to avoid reboots in case of an ongoing outage or policies around maintaining a lock for a group of machines rather than for a complete cluster. He quite interestingly finished by saying that rebooting blindly is not without trouble due to cases in which new kernels sometimes fail to boot.
It might be worthwhile to make locksmith available in EPEL, as I imagine that members of the enterprise Linux community might have similar requirements and strategies for dealing with reboots. However, I expect that I would look to Ansible, or any other generic orchestration system such as MCollective or Saltstack to deal with these issues.
Then the second talk was given by David Mytton, founder of Server Density. The title was "Scaling Ops," and was mostly a summary of on-call and intervention best practices, such as, "make sure there is clear documentation," "give enough time off after outages in the night," "make sure post-mortems are blameless," and so on. While it didn't expose anything new or revolutionary, I recommend watching the video of this talk once it becomes available on the Web, as it provides a good overview of the best practices in this domain in a single place.
Next up with Paul Dix, creator of InfluxDB, a time series database. Not surprisingly, his talk was centered around the time series topic, with discussion of how time series data are everywhere thanks to the Internet of Things, how most SQL engines are a bad fit for this sort of data, and what you need to shard this data efficiently on a cluster. While quite specialized and technical, the topic was well covered, with a good explaination of how the file structure of InfluxDB speeds row deletion and lessens the need rebalancing structures used to index its data. The Q&A after the talk mostly focused on the project's upcoming API rewrite and on how they want to make InfluxDB scale to millions of time series.
The last presenter before the break was John Wilkes, co-author of the Borg and Omega papers which outline the systems used by Google to orchestrate the company's container infrastructure. Wilkes' talk primarily recapped the material presented in these papers, covering the schema of the Borg architecture, and the way in which each container bundles its own metrics systems on top of a webserver. Wilkes spoke of the Site Reliability Engineer (SRE) culture at Google, where SREs maintain that it's OK to fail, and where they conduct exercises every year with big simulated outages to discover new weakness in the system.
The rest of the talk covered efficiency issues and means of packing interactive (or production) jobs with batch (dubbed non prod) jobs on the same infrastructure, how they measure usage, and how Borg, the scheduler, is just one small piece of Google's infrastructure. Unsurprisingly, Wilkes also spoke of Kubernetes, and of how various concepts from Borg map to the new project. The Q&A session was quite interesting, with Wilkes answering a question about when Google plans to use Kubernetes in production by stating that "we plan to make Borg use the same API as Kubernetes." Wilkes also reminded attendees that Borg was nothing new, and that Google just stitched together concepts that have been around for a long time.
After a call to sign the hacker pledge, the lunch break finally came and my years of taking the public transport came to the rescue, permitting me to smoothly and swiftly escape the crowded conference to quiet my rumbling stomach and return with plenty of time to visit the small number of stands conference and gossip with members of the Paris IT community.
Following lunch, conference returned to presentation mode with a series of lightning talks. The talks were not quite as short as usual, but they went quickly nonetheless. The first presenter, Mario Loriedo from Zenika, spoke about his efforts integrating Docker in Eclipse, giving a pointer to a website that was set up for this purpose.
Loriedo was followed by Samir Bessalah, who asked people to stop doing things from scratch, reminded us that the CAP theorem is called a theorem for a reason, and urged people to attend local meetup of Papers We Love to better stand on the shoulders of giants.
Then, Matthieu Nanterre spoke of Cassandra, and its new leader election system based on Paxos. He was followed by Dan Brown who spoke of the Conflict-free replicated data type, defined by Wikipedia as a type of specially-designed data structure used to achieve strong eventual consistency (SEC) and monotonicity (absence of rollbacks). He explained the conclusion of the CAP theorem, the trade-off between consistency and availability and how both latency and lack of consistency are bad for a system, and how you have to make a choice. He finished with a brief discussion of the different ways you can solve conflicts, either as last write win, or using semantic resolution.
Finally, Chris Sinjakli spoke of the topic of zero downtime postgresql schema migration, the need for using fast operations to modify table since Postgres locks the whole table to change it, and how to tweak lock_timeout and handle slow queries to avoid downtime in a production setup.
CAP, Docker, and Data
Then we started again the regular set of talks, this time with Neha Narula, who focused mostly on the topic of CAP theorem (a theme echoed often during the day), of consistency models, and of what exactly people mean by ACID in case of a database. During the QA, she reminded us that while we cannot avoid the limitation of the CAP theorem (ie, we have to pick 2 out of "consistency, availability and resilience to network partition"), there is way to get good performance and strong consistency for most of the time.
As usual for any vaguely tech-related conference held during the past two years, the next talk was on Docker, given by Ben Firshman, creator of Fig and product manager at Docker inc. He started by explaining the idea of data-center as a computer, how other products like IBM Bluemix and SmartDC use the Docker API, and on how using Docker permits developers to run tests remotely, using pipe and GNU Parallel to start anything as if it was a local process. Firshman finished by demonstrating Docker's python API and discussing how it permit developers to build more complicated architecture.
After another break, Simon Riggs spoke on PostgreSQL. The talk was rather general, giving some insights on deployments of the database and on how most SQL databases he encounters at customers sites are rather small ( 90% < 10G, 98% < 100G ). Riggs then showed off some new features of PostgreSQL, presenting the project as a multimodel database, and covering the work done to support different paradigms and to support load distribution. He also reminded attendees of the "yesql" movement, a clever pun on nosql meant to remind us that there is nothing wrong with SQL as such, and that it's mostly been the underlying architecture that was the cause by the nosql movement.
Simon was followed by Kyle Kingsbury to speak of Jepsen, a tool used to find concurrency issues in distributed systems, the equivalent of a fuzzer for such software. Kingsbury first explained the underlying architecture of Jepsen, which generates random writes from a few coordinated clients and tracks the order of writes until something breaks. He spoke of the bugs he found in Redis, Mongo and Riak in 2013, of how Kafka, Nuodb and Cassandra were also examined and found to be broken later the same year, and of how Zookeeper didn't have these issues.
Kingsbury outlined how, in 2014, he started to look at RabbitMQ, Etcd and Elastic Search and how vendors reacted to his work. Then he went on looking at Mongodb and Elastic Search in details, explaining their issues in case of network partitions, but reminding us that these projects have improved. Then he looked at Aeropiske, reviewing claims they make on their website and how some of them were not implemented at all, and dispelling them so much that I felt bad for the vendor. He finished the talk saying that people should carefully review the documentation before choosing a product, and told that despite taking around 1 month to carefully assess a database, anybody can do it since he published everything on github .
Following the Jepsen talk was one on the infrastructure of CloudFlare, given by John Graham-Cumming. He explained that CloudFlare deals with an enormous amount of logs (400T per day, compressed), so they do not store them due to resources, privacy and uselessness issues. He then explained how they leverage the lua module of Nginx and Apache Kafka to do stream processing and efficient computation of the streams, detecting attacks and handling traffic loads among others with streaming algorithm like HyperLogLog. For more information, check out Cloudflare's blog post outlining how their systems tie everything together.
The next speaker was Salvatore Sanfilippo, Pivotal employee and creator of Redis. Unexpectedly, Sanfilippo didn't speak about Redis, but rather presented Disque, a message queue written after Sanfilippo noticed that many were using Redis for this unintended use case. He mostly talked about the different operations supported by Disque, and about the restricted set of semantics supported ("at least once", "at most once"), and how he decided to drop the guarantee of ordering to simplify the design. Disque is also replicated, either synchronously or asynchronously, depending on the needs of the user.
Microservices at Netflix
Finally, Jeremy Edberg came on the stage for the last talk of the day to speak of the infrastructure at Netflix, about what you need for microservices and about whether you need them. Starting with the requirement of using an elastic infrastructure rather than your own DC, he went on speaking of making sure you use CI, monitoring, and manage network, which form a big part of a scalable infrastructure. Weighing in on the tradional discussion of "buy vs build", he recommended to not write your own load balancer, but instead to use HAProxy. According to him, the issue of service discovery in a microservice architecture is not a solved issue yet, so Netflix had to write their own system called Eureka which Edberg characterized as too complicated. He advised against using DNS for service discovery, citing caching latency and other properties of DNS as issues. He then went on the various tools of the chaos monkey suite, such as chaos gorilla, which can kill a complete DC, or chaos kong, which can redirect traffic from one region to another, adding latency at random to make sure everything is working correctly. He finished by reminding attendees that they need a solid data platform for microservices to work well at scale in a resilient way.
The day was concluded by a announce of a new conference from the same team, dedicated to security for developers, called DotSecurity, the Friday before DotScale 2016.