Combining RunKeeper GPS Logs with Weather Data
Health and sports tracking apps have made it possible to keep detailed logs of when, where and how you performed many kinds of exercise activities. This data allows such apps to keep all sorts of statistics for you to feel proud of. In my case, I use Runkeeper to keep track of my running activities, and I can pat myself on the back for my total number of kilometers over the years, or be disappointed when I don’t break my record for 3-5k runs. But can we learn anything useful from this data?
It would of course be amazing to have access to all of Runkeeper’s data troves, but I’ll make do with my own dataset. This comes with some caveats. The dataset is small (on the order of 100 data points) and covers a period in which I was steadily trying to improve my running, so performance varies a lot. Any conclusions drawn from such a dataset should be taken with a grain of salt; in particular, small effects may simply be buried in that large variance. Still, it turned out to be an interesting little project to see what this data can tell me about myself.
If you’re a Runkeeper user, you can download your own data and see whether you can distill any lessons from it. Or at least you can make cute pictures, like the overlay of the GPS data of all my runs in the image above. The data comes as a set of GPS Exchange Format (.gpx) files, one per run, each containing the GPS track of the activity and accompanying metadata. These XML files are not pleasant to read by hand, but luckily parsers are available to deal with them. I went for Python and used the gpxpy GPX parser, which makes it very simple to extract some basic statistics about a given activity, such as start and end time, duration and average velocity.
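To give an idea of what such a per-run summary involves, here is a minimal sketch that computes start and end time, duration, distance and average speed from the text of one .gpx file. To keep it dependency-free it swaps gpxpy for the standard library’s xml.etree.ElementTree plus a haversine distance, so its numbers may differ slightly from gpxpy’s. The function names are hypothetical, and it assumes track points carry a <time> element (points without one are skipped).

```python
import math
import xml.etree.ElementTree as ET
from datetime import datetime

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def _local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}trkpt' -> 'trkpt'."""
    return tag.rsplit("}", 1)[-1]


def _parse_time(text):
    """Parse an ISO 8601 GPX timestamp; a trailing 'Z' means UTC."""
    return datetime.fromisoformat(text.strip().replace("Z", "+00:00"))


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def summarize_gpx(gpx_text):
    """Hypothetical helper: basic statistics for one activity given as GPX text (str or bytes)."""
    root = ET.fromstring(gpx_text)
    points = []  # (lat, lon, time) for every timed track point, in file order
    for el in root.iter():
        if _local_name(el.tag) != "trkpt":
            continue
        time_el = next((c for c in el if _local_name(c.tag) == "time"), None)
        if time_el is None:
            continue  # skip points without a timestamp
        points.append((float(el.get("lat")), float(el.get("lon")), _parse_time(time_el.text)))
    if len(points) < 2:
        raise ValueError("need at least two timed track points")

    # Sum straight-line distances between consecutive points
    # (this also bridges any gap between track segments)
    distance_m = sum(haversine_m(p[0], p[1], q[0], q[1]) for p, q in zip(points, points[1:]))
    start, end = points[0][2], points[-1][2]
    duration_s = (end - start).total_seconds()
    return {
        "start": start,
        "end": end,
        "duration_s": duration_s,
        "distance_m": distance_m,
        "avg_speed_kmh": 3.6 * distance_m / duration_s if duration_s > 0 else 0.0,
    }
```

Usage would look like `summarize_gpx(open("run.gpx", "rb").read())` for a hypothetical file name; running it over every exported file gives one row per run to aggregate.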
All of these numbers are usually visible in your app of choice, but by extracting them ourselves we can start asking our own aggregate questions. For example, at what time of day do I usually run, and does this have any effect on my performance?
To examine this, I generated the plots above. On the left is a histogram counting how many of my runs fell into each hour of the day. I mostly run on weekends, so I can go during the day, and I clearly prefer the hours around midday, although there are a few early-evening runs as well. However, it doesn’t seem to matter when I run: there is no visible correlation between time of day and the average velocity of my runs.
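The two views described above can be sketched as follows: a count of runs per hour of day for the histogram, and the mean speed of the runs in each hour to look for a time-of-day effect. This is a minimal illustration, not the code behind the plots; it assumes each run has been reduced to a (start_time, average_speed) pair with start times already converted to local time (GPX timestamps are normally UTC). Averaging per hour is one design choice; unlike a single correlation coefficient, it does not assume a linear trend over the day.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, List, Tuple


def runs_per_hour(start_times: Iterable[datetime]) -> List[int]:
    """Number of activities starting in each hour of the day (index 0-23)."""
    counts = [0] * 24
    for t in start_times:
        counts[t.hour] += 1
    return counts


def mean_speed_per_hour(runs: Iterable[Tuple[datetime, float]]) -> Dict[int, float]:
    """Average speed of the runs that started in each hour; hours without runs are omitted."""
    speeds = defaultdict(list)
    for start, speed in runs:
        speeds[start.hour].append(speed)
    return {hour: sum(v) / len(v) for hour, v in sorted(speeds.items())}
```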
So what might have an effect? My hypothesis was that the biggest determinant of fitness (among the things we can measure here) is the regularity of your activities. I have periods of high and low running activity, mostly depending on how busy I am with other things, but also due to the seasons, getting bored with running, and so on. So I expect my fitness to drop when gaps appear in my routine. As a measure of this, let’s calculate for each run how long ago the previous one was, and see how velocity correlates with that.
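The "time since the last run" measure can be computed by sorting the runs chronologically and differencing consecutive start times. A minimal sketch, assuming each run is a (start_time, average_speed) tuple; the function name is hypothetical. The earliest run has no predecessor, so it drops out of the result.

```python
from datetime import datetime
from typing import List, Tuple


def gap_vs_speed(runs: List[Tuple[datetime, float]]) -> List[Tuple[float, float]]:
    """Pair each run's average speed with the days elapsed since the previous run.

    `runs` may be in any order; the earliest run is left out because
    there is no earlier run to measure a gap from.
    """
    ordered = sorted(runs, key=lambda r: r[0])
    return [
        ((cur[0] - prev[0]).total_seconds() / 86400, cur[1])  # gap in days, speed
        for prev, cur in zip(ordered, ordered[1:])
    ]
```

The resulting (gap, speed) pairs are exactly what goes into a scatter plot of velocity against time since the previous run.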
Here we do see some correlation, though it looks more like a step function than a linear relation. I visually divided the plot into four quadrants (time since the last run less than or more than one week, and average velocity above or below 12 km/h), and one of them is strikingly empty. A short time between runs is no guarantee of high velocity, since the bottom-left quadrant is well populated; but more than a week between runs does seem to make it impossible for me to hit peak velocities! That makes for an interesting conclusion: at least in my data, running at least once a week seems to be necessary (though not sufficient) to maintain my fitness level.
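The quadrant reading of the scatter plot can be made explicit by counting points on each side of the two thresholds. A minimal sketch, assuming (gap_days, speed_kmh) pairs; the thresholds default to one week and 12 km/h, and putting points that fall exactly on a threshold into the "short" and "fast" bins is an arbitrary choice. The observation above corresponds to `long_fast` being (nearly) empty.

```python
from typing import Dict, Iterable, Tuple


def quadrant_counts(
    points: Iterable[Tuple[float, float]],
    gap_threshold_days: float = 7.0,
    speed_threshold_kmh: float = 12.0,
) -> Dict[str, int]:
    """Count (gap_days, speed_kmh) points in each quadrant of the scatter plot.

    'short'/'long' refers to the gap since the previous run,
    'fast'/'slow' to the average speed relative to the threshold.
    """
    counts = {"short_fast": 0, "short_slow": 0, "long_fast": 0, "long_slow": 0}
    for gap, speed in points:
        gap_key = "short" if gap <= gap_threshold_days else "long"
        speed_key = "fast" if speed >= speed_threshold_kmh else "slow"
        counts[f"{gap_key}_{speed_key}"] += 1
    return counts
```

With only about 100 runs, an empty quadrant is suggestive rather than conclusive, which is why the conclusion is phrased cautiously.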
Running and the Weather
I was also wondering how the weather affected my runs. Luckily, very detailed weather data is readily available where I live, courtesy of the KNMI, our national meteorological institute. They provide historical weather data at hourly resolution on temperature, precipitation, wind and more. All it took was matching the timestamps of my runs to those of the weather data to compile a complete activity-weather dataset.
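The timestamp matching can be sketched as a lookup keyed on the hour in which each run started. This assumes the hourly weather table has already been loaded into a dict keyed by the start of each UTC hour, and that each run is a dict with a timezone-aware "start" entry; both layouts and the function names are hypothetical. How the KNMI files label their hours (and in which time zone) should be checked against their own documentation before building that dict.

```python
from datetime import datetime, timezone
from typing import Dict, List


def hour_start(t: datetime) -> datetime:
    """Truncate a timezone-aware timestamp to the start of its UTC hour."""
    return t.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)


def attach_weather(runs: List[dict], weather: Dict[datetime, dict]) -> List[dict]:
    """Merge each run with the weather record for the hour in which it started.

    `weather` maps UTC hour-start datetimes to dicts of observations
    (hypothetical layout). Runs whose hour is missing get no extra fields.
    """
    merged = []
    for run in runs:
        record = weather.get(hour_start(run["start"]), {})
        merged.append({**run, **record})
    return merged
```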
Rain could potentially be a big factor. If you get really soaked, the added weight can be a nuisance. Unfortunately for the dataset, I do my best to avoid the rain altogether, as you can see in the next diagram.
The data indeed shows that I almost never go out when it’s rainy, and even the activities marked with Rain are only those that took place in an hour in which any rain at all was measured; it doesn’t mean I actually ran while it was raining.
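The Rain marking described above (any precipitation measured in an hour that overlaps the run) can be sketched like this. It assumes hourly precipitation has been loaded into a dict mapping UTC hour-start datetimes to millimetres; the names are hypothetical, and a run counts as touching an hour even if it only overlaps it by a moment.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def hours_covered(start: datetime, end: datetime) -> List[datetime]:
    """UTC hour-start timestamps of every clock hour that [start, end] touches."""
    hour = start.astimezone(timezone.utc).replace(minute=0, second=0, microsecond=0)
    end_utc = end.astimezone(timezone.utc)
    hours = []
    while hour <= end_utc:
        hours.append(hour)
        hour += timedelta(hours=1)
    return hours


def rain_flag(start: datetime, end: datetime, precip_mm: Dict[datetime, float]) -> bool:
    """True if any precipitation was recorded in an hour overlapping the run.

    This does not mean it was actually raining during the run itself.
    """
    return any(precip_mm.get(h, 0.0) > 0 for h in hours_covered(start, end))
```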
So unfortunately there’s not much point in looking for an effect here: I simply haven’t run in the rain often enough to measure one. For some other quantities, though, the data is not so one-sided. Here are performance plots for temperature, wind speed and humidity.
It’s not so easy to come up with a sensible hypothesis here. Temperature has competing effects: when it’s colder, it’s easier to lose heat, but when it’s warmer, your muscles are happier to be kicked into gear. A headwind of course slows you down, but if you usually run in a big loop like I do, you have the wind in your face for part of the run and at your back for another part. High humidity might reduce your ability to shed heat, but is the effect strong enough to notice at my level?
The plots are pretty clear: they show no noticeable correlation whatsoever. The effects of the weather seem to be more psychological (“It’s raining? I’ll skip this one…”) than physical, at least at my amateur level. Especially given the earlier conclusion that keeping up the routine matters so much, I’ll try to keep this data in mind to push myself out the door when it’s looking cold and gray outside!
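Whether a weak trend in such a scatter plot is more than noise can be checked a bit more formally with a permutation test. The sketch below is one generic way to do that, not necessarily how the plots were judged: `xs` could be the hourly temperature for each run and `ys` its average speed. Shuffling `ys` destroys any real association, so the p-value is the fraction of shuffles whose |r| is at least as large as the observed one. The function names and defaults are illustrative assumptions.

```python
import math
import random
from typing import Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def permutation_test(
    xs: Sequence[float], ys: Sequence[float], n_perm: int = 9999, seed: int = 0
) -> Tuple[float, float]:
    """Observed Pearson r and a two-sided permutation p-value."""
    rng = random.Random(seed)
    r_obs = pearson_r(xs, ys)
    shuffled = list(ys)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(xs, shuffled)) >= abs(r_obs):
            extreme += 1
    # The +1 terms count the observed arrangement as one of the permutations
    return r_obs, (extreme + 1) / (n_perm + 1)
```

A large p-value for each weather variable would back up the visual impression; when testing several variables at once, keep in mind that one of them may look "significant" by chance.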