Datalines: An interview with Elaine McVey

“Thinking about how we collect data, how we use data, and how it affects people is part of the obligation of being in data science.” – Elaine McVey

Picture from Elaine McVey

This is the first interview in a series called Datalines that explores data science careers in the social sector. The idea for the series came from conversations with classmates about the social footprint of algorithms in communities near and far. I reached out to NCSU alumna Elaine McVey to learn about her work as Data Science Lead for TransLoc, a transit technology company based in RTP. Elaine also serves on the board of Insightus, an organization that makes the power of data science available to nonprofits. In our interview, she shares insights about transit simulation, measuring access to early voting, and participating in the R community.

Sophia: The mission of TransLoc is “to make public transit the first choice for all.” As Data Science Lead, how do you approach this goal, and what tools do you tend to use in your work?

Elaine: TransLoc has a number of products that help transit agencies in different ways, but our main focus at the moment is something called microtransit, which is a really new, up-and-coming approach for transit agencies to think about. Microtransit, as we define it, is demand-driven transit, owned by transit agencies, that optimizes vehicle efficiency and the rider experience. People can request rides based on where they need to go and when they need to get there. Then, the microtransit system assigns those rides to vehicles. This gives transit agencies more options. For example, microtransit can help solve what’s known as the “first mile, last mile” problem. People can only walk so far to get to a bus stop. If you can use other modes of transit to bring people to existing, high quality, fixed-route services, then you can really expand the service area.

Right now, transit agencies are very interested in microtransit. They recognize how it will help them serve their communities, and in some cases, make their operations more efficient. The challenge is that it’s very different than what they’re used to, and it relies heavily on algorithms and technology. We help transit agencies understand how to respond to dynamic demand and anticipate the number of vehicles they’ll need. Specifically, we do this through simulation. We look at how that service would work with a variety of numbers of ride requests and numbers of vehicles, basically an experimental design. And that lets them look at performance in terms of wait time, how much ride pooling is happening, and how productive their vehicles are.
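The experimental design Elaine describes, crossing several levels of ride requests with several fleet sizes, can be illustrated with a toy sketch. This is not TransLoc's simulator: it assumes requests arrive uniformly at random over one hour, every trip takes a fixed 10 minutes, each vehicle carries one ride at a time (no pooling), and geography is ignored. The function names and parameters are hypothetical.

```python
import heapq
import itertools
import random


def simulate_mean_wait(n_requests, n_vehicles, trip_minutes=10.0,
                       horizon_minutes=60.0, seed=0):
    """Mean rider wait (minutes) under the toy assumptions described above."""
    if n_requests == 0:
        return 0.0
    rng = random.Random(seed)
    # Assumed demand model: uniform random request times within the horizon.
    request_times = sorted(rng.uniform(0.0, horizon_minutes)
                           for _ in range(n_requests))
    # Min-heap holding the time at which each vehicle next becomes free.
    free_at = [0.0] * n_vehicles
    heapq.heapify(free_at)
    total_wait = 0.0
    for requested in request_times:
        vehicle_free = heapq.heappop(free_at)   # earliest available vehicle
        start = max(requested, vehicle_free)    # rider waits if it is busy
        total_wait += start - requested
        heapq.heappush(free_at, start + trip_minutes)
    return total_wait / n_requests


def run_grid(request_levels, vehicle_levels):
    """Run every (requests, vehicles) combination: a full factorial design."""
    return {
        (r, v): simulate_mean_wait(r, v)
        for r, v in itertools.product(request_levels, vehicle_levels)
    }


if __name__ == "__main__":
    for (requests, vehicles), wait in sorted(run_grid([20, 60], [2, 6]).items()):
        print(f"{requests} requests, {vehicles} vehicles: mean wait {wait:.1f} min")
```

A real study would also track ride pooling and vehicle productivity, the other two measures mentioned above. This sketch reports wait time only.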

Sophia: I love your description of the “first mile, last mile” problem. That’s why I drive to Raleigh instead of taking the bus from Durham! What you’re describing sounds like a way to bring the technologies behind Uber and Lyft into the domain of public transportation. Is that accurate?

Elaine: That’s a pretty good way to understand it. Because it’s public transit, we have to focus on helping get vehicles off the road — so, not giving every individual their own ride in their own vehicle. The idea is to make it as efficient as possible and also provide the most benefit to the city by reducing traffic and getting people around effectively. It’s up to the transit agency to decide what makes sense for their community. From the rider standpoint, it’s very similar to Uber or Lyft. People are accustomed to that. We want to be able to do something similar with public transit, particularly for people who can’t afford to take Uber and Lyft everywhere they need to go.

Sophia: How does that difference in mission affect the actual data that you’re collecting?

Elaine: In terms of data, we’re looking at who a transit agency needs to serve. It’s less about normal customer orientation from a marketing standpoint and more of a public service orientation. We make money when agencies pay us for the software, so we’re trying to help serve their mission rather than looking for the most profitable riders. It’s really important in a public transit agency to make sure they’re serving everyone fairly. So when we look at things like census data, we’re looking at everyone who could be served rather than people who are willing to pay for the service.

Sophia: You developed an R package for transit data specifically. How did you first recognize a need for a transit-specific package, and how is it being used?

Elaine: So this gets at another aspect of what TransLoc does, which is not directly related to microtransit, but tangentially related. One form that transit data takes is a specification called GTFS. It’s designed primarily for fixed-route, traditional transit systems. It shows where the stops are, what times buses go on different routes, all those kinds of details. That specification is what an agency needs to submit to Google for their transit information to show up in Google Maps, which of course transit agencies want because it helps them communicate to the public. The problem is that creating those feeds and maintaining them is very cumbersome. TransLoc has a free product called Architect that makes it easier for agencies to manage. Architect was developed by our software and design engineers, but the data science team helped research the state of GTFS feeds to inform the product design. To make this research easier, we wrote an R package (gtfsr) that allows R users to pull in GTFS data from an API, put it on a map with routes and stops, and generally interact with the data more easily. We were initially motivated to build the package for our own use, but also wanted to contribute it to the open source community. We haven’t tracked a lot about how people are using it, but we have heard from researchers who have used it. Our hope is that people will build cool things on top of it!
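For readers who have not seen a GTFS feed: it is a zip archive of comma-separated text files, and stops.txt lists each stop with the columns stop_id, stop_name, stop_lat, and stop_lon. The sketch below is in Python rather than R and does not use or imitate gtfsr. It reads the stops from a tiny made-up feed built in memory, and the helper names and sample stops are hypothetical.

```python
import csv
import io
import zipfile


def make_sample_feed():
    """Build a tiny illustrative GTFS-style zip in memory (made-up stops)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as feed:
        feed.writestr(
            "stops.txt",
            "stop_id,stop_name,stop_lat,stop_lon\n"
            "S1,Main St,35.99,-78.90\n"
            "S2,Station,35.78,-78.64\n",
        )
    return buffer.getvalue()


def read_stops(feed_bytes):
    """Return the stops in a GTFS zip as a list of dictionaries."""
    with zipfile.ZipFile(io.BytesIO(feed_bytes)) as feed:
        with feed.open("stops.txt") as raw:
            # utf-8-sig tolerates the byte-order mark some feeds include.
            text = io.TextIOWrapper(raw, encoding="utf-8-sig", newline="")
            return [
                {
                    "stop_id": row["stop_id"],
                    "stop_name": row["stop_name"],
                    "lat": float(row["stop_lat"]),
                    "lon": float(row["stop_lon"]),
                }
                for row in csv.DictReader(text)
            ]


if __name__ == "__main__":
    for stop in read_stops(make_sample_feed()):
        print(stop)
```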

Sophia: Maybe this is a good segue into your involvement in the R community and the R Ladies Meetup here in RTP. Why did you choose to become involved in this forum?

Elaine: At the 2016 useR! Conference, there was a lot of publicity around women being involved in the community, including through R Ladies meetup groups. There wasn’t one in this area at the time, so I posted something on Twitter – which is a great way to get involved in the R community – about wanting to start one. Another woman I’d met at the conference connected me to Mine Çetinkaya-Rundel, who is on the faculty at Duke. We worked together to organize the RTP group. The group meets once a month in different locations, which helps get different parts of the community involved. We’re really trying to help women, especially women who are just getting started in R, share knowledge. R Ladies gives women a friendly place to get experience presenting and networking.

Sophia: Let’s talk for a moment about a separate project that you do on a volunteer basis. You’re on the board of Insightus, an organization that makes the power of data science available to nonprofits. What are some of the ways that nonprofits could use data to increase their impact? What projects that you’ve worked on are you most proud of?

Elaine: Some very worthwhile organizations don’t get the benefits of having the kinds of data science and software teams that are important in industry just because they don’t have the money to fund them. There’s a whole variety of ways that they can make use of that capability when it’s available to them. For example, they make a public records request related to their work and they get a huge number of documents and need to be able to sort through those. That’s a place where algorithms and analytics and software can help a lot. Another example is doing analysis that makes the case for work they’re trying to do using statistical expertise.

One project I was involved with for Insightus that we’re really proud of was to look at early voting in North Carolina. Early voting sites are set by county, and can be used by anyone in the county. About half of voters make use of early voting in presidential election years. It’s particularly popular with minority voters, and anyone who is lower income or has an inflexible work schedule. Every election, the early voting sites can be changed by the counties. So what we looked at was, as those changes are made, how does people’s access to voting change? We used distance from the nearest polling place as a proxy for access. We had latitude and longitude for every voter’s location and calculated the distance to the closest early voting site under different plans. We broke it down many ways, but specifically by race, which is what’s protected by law, to look at whether the changes being made had a disproportional effect on voters of different races. There’s now a software tool available to look prospectively at plans that are being proposed and how they will affect people by various demographics in every county.
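The access measure described above, distance from each voter to the closest early voting site, compared across plans and summarized by demographic group, can be sketched as follows. This is not the Insightus tool. It assumes straight-line (great-circle) distance computed with the haversine formula, since the interview does not say which distance measure was used, and the function names and record layout are hypothetical.

```python
import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_site_km(voter, sites):
    """Distance from one voter to the closest site in a plan."""
    return min(haversine_km(voter["lat"], voter["lon"], lat, lon)
               for lat, lon in sites)


def mean_distance_by_group(voters, sites):
    """Average nearest-site distance for each demographic group."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for voter in voters:
        totals[voter["group"]] += nearest_site_km(voter, sites)
        counts[voter["group"]] += 1
    return {group: totals[group] / counts[group] for group in totals}


if __name__ == "__main__":
    # Made-up voters and plans, for illustration only.
    voters = [
        {"lat": 35.00, "lon": -78.00, "group": "A"},
        {"lat": 35.20, "lon": -78.00, "group": "B"},
    ]
    current_plan = [(35.00, -78.00), (35.20, -78.00)]
    proposed_plan = [(35.00, -78.00)]  # one site removed
    print(mean_distance_by_group(voters, current_plan))
    print(mean_distance_by_group(voters, proposed_plan))
```

Comparing the two dictionaries shows which groups would travel farther under the proposed plan, which is the kind of disproportionate effect the analysis looked for.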

Sophia: Something that has come up a lot in conversations at the IAA is the idea of “data for good.” Is that a phrase that you hear often? What is your reaction to it?

Elaine: Data science powers a lot of things that affect people’s lives. I think keeping in mind the need to make sure those tools are also applied to causes that have clear social benefit, particularly that benefit people who belong to groups not well represented in for-profit companies, is really important. That is part of the purpose of Insightus, to make sure those tools are available across the board. Thinking about how we collect data, how we use data, and how it affects people is part of the obligation of being in data science. Think about how the data you’re using was collected and whether it’s representative of the population that it should represent. This is also a really important question in transit, because transit has to make sure that it’s serving all the populations in the community, and transit agencies have an obligation to report and demonstrate that any changes they make don’t have a racially disparate effect, much like voting.

Columnist: Sophia Bessias