Interesting Statistical Anomaly – Triathlon Race Timing

I was analyzing the data from the 2010 Peachtree City Sprint Triathlon (well, really I was just processing the ugly text I can rip off the web into something I can mangle with Excel) and I found some interesting things.

First off, all props to Event Tech for getting the results posted so quickly[1], although it would be nice if I could pull the data down and actually use it with a bit less manual processing. Basically I have to import this…

[Image: Event Tech website]

…into Excel and do a bunch of manual manipulation to end up with this…

[Image: the same results reworked in Excel]

…which, for one, allows me to sort by time and do various other things.

However, today I was doing something I hadn’t done before: I summed up the swim/bike/run/T1/T2 times and compared that sum with the total time that Event Tech had calculated. Interestingly, it was generally off: off by one, two, or three seconds.

Hmmm… fascinating.

Even more interesting is that it was rarely off by zero seconds.

I quickly realized that the total time was always equal to or less than the sum of the individual parts. That implied the total time was your chip time from the begin timing mat to the end timing mat, and that my summation was introducing rounding errors of some sort into the equation. After all, if you finish the swim in 10:00.4, your time on the sheet will say “10:00”, but that 0.4 seconds still hangs on there and will contribute to your final time.

However, that doesn’t work. If you assume, as I did, that three splits are introducing rounding errors (three because the maximum error was three seconds), then each split should be equally likely to round down as to round up. That would mean there should have been instances where the summed split times were less than the total chip time. No such instances existed. Something was going on.
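To see why that assumption fails, here’s a quick sketch of the nearest-second model in Python (I actually did this in Excel; the split ranges here are invented). If splits rounded to the nearest second, some summed splits would come in below the chip total:

```python
import random

random.seed(42)

trials = 10_000
negatives = 0
for _ in range(trials):
    # Three hypothetical true split times, with fractional seconds.
    splits = [random.uniform(300, 1200) for _ in range(3)]
    # Round each split to the NEAREST second, as I had assumed,
    # and compare the sum against the rounded chip total.
    diff = sum(round(s) for s in splits) - round(sum(splits))
    if diff < 0:
        negatives += 1

print(negatives > 0)  # True: some sums do land below the total
```

Since the race data contained no such instances, the splits can’t be rounding to the nearest second.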

The next guess was that all splits were rounding up (three splits total). That would account for the sum of the splits being all greater than the total chip time. However, I ran a Monte Carlo simulation and got these results:

Bin – Frequency
0 – 21
1 – 438
2 – 464
3 – 21
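Here’s roughly what that simulation looks like in Python (mine was in Excel; the split ranges are invented, and I’m assuming each split rounds up while the chip total rounds to the nearest second):

```python
import math
import random

random.seed(1)

ATHLETES = 944  # matches the field size in the table above
bins = [0, 0, 0, 0]
for _ in range(ATHLETES):
    # Three hypothetical true times with fractional seconds;
    # T1/T2 could be folded in the same way.
    splits = [random.uniform(300, 1200) for _ in range(3)]
    # Each printed split rounds UP to the next whole second...
    printed_sum = sum(math.ceil(s) for s in splits)
    # ...while the chip total rounds to the nearest second.
    printed_total = round(sum(splits))
    bins[printed_sum - printed_total] += 1

print(bins)  # the 1- and 2-second bins dominate; 0 and 3 are rare
```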

That is the expected distribution if the rounding errors are random. However, if you analyze the actual results from the race, you get:

Bin – Frequency
0 – 11
1 – 266
2 – 496
3 – 159

This is markedly different from a random result. Something is biasing those numbers. The mean of the difference between the rounded times and the summed times for the Monte Carlo simulation is (as expected) approximately 1.5 but the mean of the race results is 1.8.
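The two means fall straight out of the frequency tables above:

```python
# Frequency bins from the tables above: {seconds of error: athletes}.
sim  = {0: 21, 1: 438, 2: 464, 3: 21}   # Monte Carlo simulation
race = {0: 11, 1: 266, 2: 496, 3: 159}  # actual race results

def mean(bins):
    """Weighted mean of the error, in seconds."""
    return sum(k * n for k, n in bins.items()) / sum(bins.values())

print(round(mean(sim), 2))   # 1.51
print(round(mean(race), 2))  # 1.86
```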

At this point I stopped. I could go on theorizing about why the numbers are off in the specific manner they are, but really it’s not that important.

What does this all mean? Absolutely nothing! As I mentioned above, your race results are your chip time from start to finish; it’s only when I started summing up the broken-out numbers that I noticed anything wonky and decided to geek out on this. I’m confident that my race time is accurate, and even if it were not, it would only be off by one, two, or three seconds. If those seconds make or break me, I should have trained harder!

I will probably send this link to Event Tech and ask if they have any insight; they probably do. It’s their software after all.


[1]: 2010 Tri PTC Results, although the white-on-black background thing should really go, guys. It’s tough on the eyeballs.

Comments

4 responses to “Interesting Statistical Anomaly – Triathlon Race Timing”

  1. Alan Jones

    You are assuming that the times are rounded down or rounded to the nearest whole number. That is not done. Times are always rounded up. The reasoning is that we cannot give a person a time unless they have actually run that time. Therefore, a 3:45.1 will always be printed out as 3:46 unless we print out the fractional part.

    I hope this helps.

    Note. I am the guy who wrote the software that Event Tech uses.

    Alan

  2. Alan Jones

    Note: Bill and I have been having some conversations about this problem. It turns out to be an interesting probability problem. I couldn’t solve it analytically, so I used Bill’s approach — Monte Carlo. I think Bill’s problem is that he didn’t take into account that the splits are the difference between cumulative times. If the times at the end of each transition are t1, t2, t3, t4, t5, then the splits are, for example, ceil(t2 – t1), where ceil() is the ceiling function, which gives the next highest integer.

    This is what I get. The first column is the result of my computation and the second is what Bill got from the race results:

    Bin – Simulated – Race
    0 – 10 – 11
    1 – 198 – 266
    2 – 498 – 496
    3 – 208 – 159
    4 – 8 – 0

    It’s interesting that I found a few that were off by 4 seconds whereas Bill found none. Well, I could still have an error. (The C++ random number generator generates a floating-point number between 0 and 1, whereas in the race scoring software, times are stored as units of 0.01 seconds, that is, 100 per second.) Maybe there would be a difference if I worked with integers.
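    In Python rather than the C++ I used, the computation I describe looks something like this (the leg durations are invented). With five splits taken as ceilings of differences of cumulative times, and the total as the ceiling of the final time, the discrepancy can reach 4 seconds:

```python
import math
import random

random.seed(7)

bins = [0] * 5
for _ in range(10_000):
    # Five hypothetical true leg durations: swim, T1, bike, T2, run.
    legs = [random.uniform(60, 1200) for _ in range(5)]
    # Cumulative times at the end of each leg (t1..t5).
    cum = []
    t = 0.0
    for leg in legs:
        t += leg
        cum.append(t)
    # Each printed split is the ceiling of a difference of cumulative times.
    prev = 0.0
    printed_sum = 0
    for t in cum:
        printed_sum += math.ceil(t - prev)
        prev = t
    # The printed total is the ceiling of the final cumulative time.
    printed_total = math.ceil(cum[-1])
    bins[printed_sum - printed_total] += 1

print(bins)  # a small 4-second bin appears, as in the table above
```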

  3. Bill Ruhsam

    Alan:

    Thanks for taking the time to look at this. I’ll throw the ideas we’ve discussed over email into my simulation and see what results I derive.

  4. […] I seem to make it a habit of examining race timings. […]
