Spending my days elbow-deep in transit data means that I see some strange outliers. They can reflect surprising things that humans do, or they can indicate a problem with an algorithm or hardware. Part of my job is figuring out what they mean and what to do about them.
Crazy clickers
We pay attention to how users interact with the Rider app. We recently released Inform, a new feature of our Traveler product that allows transit agencies to alert riders to service changes and interruptions. We hate notification overload as much as you do, so we wanted to make sure riders only get alerts for the routes and stops they really care about. One way we did this was by looking at what a user clicks on in the app and developing an algorithm to decide which routes they should receive alerts about. In the process, I discovered a few craaaaazy clickers: users who appear to have tapped a stop every few seconds for more than 30 minutes. We get it, you care about that one!
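For the curious, here is a minimal sketch of how a clicking marathon like that could be flagged. It is an illustration, not the algorithm behind Inform: the function name `find_tap_bursts`, the 10-second gap, and the 30-minute minimum are all assumptions made up for this example.

```python
from datetime import datetime, timedelta

def find_tap_bursts(tap_times,
                    max_gap=timedelta(seconds=10),      # assumed threshold
                    min_duration=timedelta(minutes=30)):  # assumed threshold
    """Return (start, end, tap_count) for every run of taps in which
    consecutive taps are at most max_gap apart and the whole run
    lasts at least min_duration."""
    times = sorted(tap_times)
    bursts = []
    if not times:
        return bursts

    start = prev = times[0]
    count = 1
    for t in times[1:]:
        if t - prev <= max_gap:
            count += 1                      # still inside the same run
        else:
            if prev - start >= min_duration:
                bursts.append((start, prev, count))
            start, count = t, 1             # begin a new run
        prev = t

    # Close out the final run.
    if prev - start >= min_duration:
        bursts.append((start, prev, count))
    return bursts
```

Runs like these could then be collapsed into a single signal of interest, so one very enthusiastic session does not outweigh everything else the rider tapped.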
Splitting people in half
Part of what we do at TransLoc is making sense of all the data about buses and riders to make it easier for riders to use transit, and easier for transit agencies to serve the needs of their riders. Combining different types of data can paint a picture of how the system is working and how people are using it. We put together vehicle and rider movement patterns, and use an algorithm to figure out when a user is riding a bus. This information helps transit agencies understand how riders use their services and make improvements. While I was analyzing this data to learn which routes riders use together, I noticed that, occasionally, we detect one rider riding two buses at once! This happens sometimes when two buses are following each other for quite a while. We’re working on improving our algorithm to fix this!
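A quick way to spot these "split" riders is to look for detected rides that overlap in time on different buses. The sketch below assumes each detected ride for one rider is a `(bus_id, start, end)` tuple; that layout and the function name `find_overlapping_rides` are hypothetical, chosen only to show the idea.

```python
def find_overlapping_rides(rides):
    """Given one rider's detected rides as (bus_id, start, end) tuples,
    return the (bus_a, bus_b) pairs whose time intervals overlap on
    different buses. start and end can be any comparable values
    (datetimes, Unix timestamps, ...)."""
    ordered = sorted(rides, key=lambda ride: ride[1])  # sort by start time
    conflicts = []
    for i, (bus_a, start_a, end_a) in enumerate(ordered):
        for bus_b, start_b, end_b in ordered[i + 1:]:
            if start_b >= end_a:
                break  # sorted by start, so no later ride can overlap either
            if bus_a != bus_b:
                conflicts.append((bus_a, bus_b))
    return conflicts
```

Each flagged pair is a candidate for a closer look, for example keeping only the bus whose track stays closer to the rider's.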
Whoa, Super Rider, hold your horses!
I was looking at patterns of where riders in our local area go most often, to help inform future route extensions and stop changes. I was surprised when I plotted these locations on a map and discovered they were all over the world! At first I thought I had mishandled the data, but then I realized our local riders really do trot the globe (although most of their destinations were near home, as expected). Unfortunately, we don’t expect bus routes going from North Carolina to Italy any time soon.
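One simple way to keep the globe-trotting out of a local analysis is to separate destinations by their great-circle distance from a home point. This is only a sketch: the haversine formula is standard, but the function names, the 100 km radius, and the sample coordinates are assumptions for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def split_destinations(home, destinations, radius_km=100.0):
    """Split (lat, lon) destinations into (local, distant) lists based on
    their distance from home. radius_km is an assumed cutoff."""
    local, distant = [], []
    for lat, lon in destinations:
        if haversine_km(home[0], home[1], lat, lon) <= radius_km:
            local.append((lat, lon))
        else:
            distant.append((lat, lon))
    return local, distant

# Illustrative coordinates only: a point in North Carolina as "home",
# one nearby destination, and one in Italy.
home = (35.99, -78.90)
local, distant = split_destinations(home, [(35.91, -79.05), (41.90, 12.50)])
```

The distant points are real trips, not errors, so it makes sense to set them aside for this question instead of deleting them.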
Overall, our data is pretty thorough and clean, but we do run into outliers like these time and again. I have a lot of fun figuring out these anomalies and how we can refine our algorithms to account for them. Our goal is to find the most accurate numbers we can so that our agencies have the most comprehensive data available to inform their decision-making. Keep clicking like crazy, people. We got this.