by Will Benton, Red Hat Senior Software Engineer

Last year’s Tour de France winner, Chris Froome, is a freak of nature. His endurance, performance and recovery are unmatched. However, given recent events in the sport, some have questioned his consistent success over the past couple of years. To keep the skeptics at bay, Froome’s team did something unprecedented: they released two years of Froome’s bio and performance data. The analysis not only cleared Froome of doping allegations, but also established that “exceptional aerobic potential” and “excellent recovery”, among other things, were factors behind his success.

Performance data analysis is not new to sport, and certainly not new to cycling. The US women’s cycling team that won silver at the 2012 London Olympics famously used light box therapy to compensate for the lack of early morning sun exposure, based on their analysis of circadian rhythms and weather. Their analysts started out with spreadsheets and soon realized they would need a purpose-built analytics platform to crunch the numbers and provide the insights and visualizations needed to make rapid decisions about training and race day preparation.

Image from Veloviewer

As part of their training programs, cyclists often plan interval workouts, where periods of hard, steady effort alternate with rest periods. By targeting work intervals of different durations, athletes can train different physiological systems, but rolling terrain and traffic can make it difficult to plan routes with steady efforts of a given duration. Our own Will Benton recently presented at the Spark Summit where he outlined how Apache Spark helped him apply Big Data techniques to this problem by letting him rapidly query a vast amount of personal exercise telemetry data to reveal the best places to do different kinds of interval workouts.
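To make the idea of searching telemetry for steady efforts concrete, here is a minimal sketch in plain Python. It is not Will’s Spark implementation; it only illustrates one plausible way to flag stretches of a ride where power output stays nearly constant for a target duration. The `Sample` and `Window` types, the `steady_windows` function, the one-sample-per-second assumption, and the use of the coefficient of variation as the steadiness measure are all assumptions made for illustration.

```python
from math import sqrt
from typing import List, NamedTuple


class Sample(NamedTuple):
    """One telemetry reading (hypothetical layout, assumed 1 sample per second)."""
    t: int        # seconds since the start of the ride
    lat: float
    lon: float
    watts: float


class Window(NamedTuple):
    """A candidate interval: where it starts and how steady the effort was."""
    start: Sample
    mean_watts: float
    cv: float     # coefficient of variation = standard deviation / mean


def steady_windows(samples: List[Sample], duration: int,
                   max_cv: float = 0.1) -> List[Window]:
    """Return every run of `duration` consecutive samples whose power
    varies by no more than `max_cv` relative to its mean."""
    found: List[Window] = []
    if duration <= 0 or len(samples) < duration:
        return found

    # Running sums let each window be scored in constant time.
    total = sum(s.watts for s in samples[:duration])
    total_sq = sum(s.watts ** 2 for s in samples[:duration])

    for i in range(len(samples) - duration + 1):
        if i > 0:
            # Slide the window: drop the oldest sample, add the newest.
            old = samples[i - 1].watts
            new = samples[i + duration - 1].watts
            total += new - old
            total_sq += new * new - old * old

        mean = total / duration
        variance = max(total_sq / duration - mean * mean, 0.0)
        if mean > 0:
            cv = sqrt(variance) / mean
            if cv <= max_cv:
                found.append(Window(samples[i], mean, cv))
    return found
```

The starting coordinates of each returned window could then be grouped by location to see which roads repeatedly support, say, five-minute steady efforts. A distributed engine such as Spark becomes useful when the same scan has to run over years of rides or over many riders.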

The next stop for Will is to figure out a way to apply this approach to make sense of data from an entire community of cyclists. As with most Big Data use cases, Will’s work incorporates analysis of multiple and disparate data streams including geographic and time-series data; it is also interdisciplinary, incorporating computational geometry, visualization, and domain knowledge of exercise physiology. Check out Will’s blog where you can find a video of his analysis using Spark and a number of other posts about applying analytic techniques to fitness data, realized with open-source software.

Big Data has found its way into all major sports as coaches look to wring out every ounce of advantage a team can derive from analyzing its own data and that of its competitors. This article by Roger Craig, ex-49er running back, lists the growing number of use cases in football.

Read our blog post about Big Data at the recently concluded FIFA World Cup, and how a machine data analytics solution built on Splunk and Red Hat technology can help you mine insights quickly and cost-effectively. If you’d like to hear from a Red Hat customer who successfully deployed Splunk analytics software on Red Hat Storage, please join us for a free webinar on August 6th at 11am PT.