Datalines: An Interview with Maia Majumder


Picture from Maia Majumder

Maia Majumder is a computational epidemiology research fellow at HealthMap and a PhD candidate at MIT. In our interview, she discussed her work tracking disease outbreaks, teaching through visualization, and applying a public health approach to gerrymandering. This is the third post in the Datalines series, which explores data science careers in the social sector.

Sophia: You characterize your work for HealthMap as “systems epidemiology.” Would you talk briefly about what that means? What is a public health approach to data science?

Maia: Systems epidemiology is a philosophy for approaching outbreaks that are happening around the world. A lot of the work that I do deals with outbreaks of emerging infectious disease and vaccine-preventable disease. Systems epidemiology focuses on how different diseases interact with each other, with the environment, infrastructure, and even with disease surveillance. That’s where the data science comes in.

Sophia: Traditional epidemiology relies largely on primary data collection, but you’ve used alternative data sources to track outbreaks in real time. What are those data sources and how do you use them in your work?

Maia: At HealthMap, we work with data from Google search trends, local news media feeds, Twitter, and so on. When we look at a person’s Twitter feed, we can see how they feel about things like vaccination, for example. We can see what kinds of topics they’re interested in. That lends a bit more insight than what we would get from their health record alone. These novel datasets have a lot of power to capture someone’s personality and their sentiment toward particular health behaviors.

We’re often getting so much data from these streams that traditional statistical methods can’t be used. Instead, we use other methods that have been developed in recent years to help parse through that data in a way that’s responsible, not just looking for correlations, but also for mechanistic explanations. Novel data streams allow us to see what’s happening in areas where traditional disease surveillance is weak. As internet penetration around the world improves, we are getting better signal in more places.

Sophia: What kinds of specific projects are you working on at HealthMap?

Maia: One of the things that we’d like to do moving forward is to study the ways that negative sentiment about vaccines spreads in social networks. There’s a theory that there are super-spreaders of belief systems and ideologies just as there are super-spreaders of disease. If we can target those folks before they start spreading these ideologies to other people, we can curb vaccine hesitancy. One of the things that we’d love to see is whether this hypothesis can be tested in a way that’s rigorous. That’s something that we’re working on right now.

Assessing the way people feel about things is an interesting research area. Sentiment can help us gauge how likely we are to see an uptake of any sort of intervention. Here in the US, this might be useful in the context of a water main break. How many people start boiling their water as prescribed? If the data show low uptake of boiling drinking water, we might expect more water-borne diseases showing up in hospitals. The vaccine hesitancy project is the first step in opening up this really rich source of information.

Sophia: You work quite a bit with text data, specifically to understand awareness and risk perception. What tools do you use to quantify these ideas? In what context?

Maia: Awareness is something that we can examine easily using Google search trends to gauge interest in certain topics. We can start to get an idea of how people feel about something just based on the phrases that they’re using. It’s as simple as looking at how many people are searching for the phrase “vaccines cause autism” versus “where can I get my flu vaccine.” These two statements, if you’re very careful about how you phrase them, can yield a lot of information at varying levels of granularity (national, state, and local) about how people feel about vaccines.

Most data scientists have used natural language processing of Twitter data to assess sentiment. That’s very valuable, and there are lots of really interesting dictionaries that help us figure out whether someone’s sentiment about something is positive or negative, but we also know those dictionaries aren’t 100% accurate. It’s algorithmic. By turning to Google search trends and curating a list of search topics to plug in, we have a little more human control over what we’re looking for. And that can be really rewarding, too. I think that’s one of the strengths of the HealthMap team in general. Most of what we do is machine operated, but we have a lot of human interaction. We catch things that a machine alone may not.
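As a rough illustration of the phrase comparison Maia describes, here is a minimal Python sketch. It is not HealthMap's method. The region names, the scores, and the `hesitancy_share` helper are all hypothetical, and the sketch assumes both phrases were scored on a common relative-interest scale so they can be compared directly.

```python
from typing import Dict

# Hypothetical, made-up relative search-interest scores (0-100) per region.
# These are NOT real search-trend values.
HESITANT_PHRASE: Dict[str, float] = {"State A": 40, "State B": 10, "State C": 25}
SEEKING_PHRASE: Dict[str, float] = {"State A": 20, "State B": 50, "State C": 25}


def hesitancy_share(
    hesitant: Dict[str, float], seeking: Dict[str, float]
) -> Dict[str, float]:
    """Return, per region, the share of combined interest that goes to
    the vaccine-hesitant phrase (0.0 = none, 1.0 = all)."""
    shares: Dict[str, float] = {}
    # Only compare regions that have a score for both phrases.
    for region in hesitant.keys() & seeking.keys():
        total = hesitant[region] + seeking[region]
        if total > 0:  # skip regions with no measurable interest
            shares[region] = hesitant[region] / total
    return shares


if __name__ == "__main__":
    for region, share in sorted(hesitancy_share(HESITANT_PHRASE, SEEKING_PHRASE).items()):
        print(f"{region}: {share:.2f}")
```

The same function could be run on national, state, or local scores, which mirrors the levels of granularity mentioned above. The human-curated part is the choice of phrases, not the arithmetic.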

Sophia: You’ve written about your work for publications like NPR, FiveThirtyEight, and Wired. What have you learned about communicating with the general public, especially on topics like vaccine hesitancy where public health messaging has been a challenge?

Maia: I see a lot of value in doing editorials and op-eds, but I always do this kind of work using data and only data. Part of that is my own training; I develop opinions based on the story the data tell. In general, people feel more trust if there are numbers that are easy to understand attached to opinions. I try to communicate the work that we do at HealthMap in a way that makes sense for a general audience. That does take some finagling. I’ve had excellent editors who have helped me figure out the right balance between technical detail and just getting the message across. In my experience, three things make this kind of work accessible to a larger number of people: clear numbers, strong narrative, and visualizations that capture everything I’m trying to say in a picture.

One-on-one interactions are also very powerful, especially when it comes to vaccine hesitancy. As great as the CDC and state health departments are at putting out really scientifically accurate information on vaccines, these are faceless organizations, which makes it harder for parents to trust them. Because of that, it’s the public health practitioner’s duty to set some time aside to talk to parents about the concerns that they have and the things they have heard, and to be respectful of the fact that they want to do what’s best for their kid and that, often, they’re scared. They’ve heard conflicting information and just want an authoritative source to give them the details, the hard facts, in a way that’s detailed, one-on-one, and non-confrontational. This is part of why I’m very active on Twitter (@maiamajumder), and I also do a lot of Q&As with parents. I’ll get direct messages from parents and talk them through the questions that they have about vaccinating their child.

Sophia: You mentioned the importance of visualization in data communication. Could you describe a favorite visualization you’ve created and what made it successful?

Maia: When I first got started with HealthMap, I worked on a visualization of Middle East Respiratory Syndrome (MERS) in South Korea. This was a collaboration with Science News magazine where I worked with their visualization team to curate this massive data set of contacts from the South Korean MERS outbreak. We were able to deduce who infected whom and create a network visual showing the trajectory of infections. We created an interactive version as well as the PDF that appeared in their print magazine. Most of the feedback I’ve gotten on that is that it shows really clearly how super-spreading works. The South Korean MERS outbreak was really motivated by two or three people who ended up infecting many, many others, about two hundred in total. A good visualization can show phenomena that are really complicated to explain mathematically, like super-spreading. This one was really fun!
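To make the "who infected whom" idea concrete, here is a minimal Python sketch of how super-spreaders could be read off a transmission network. It is an illustration only, not the analysis behind the Science News graphic. The case labels, the edge list, and the threshold are hypothetical and are not the South Korean MERS contact data.

```python
from collections import Counter
from typing import List, Tuple

# Hypothetical (infector, infectee) pairs, invented for illustration.
EDGES: List[Tuple[str, str]] = [
    ("P1", "P2"), ("P1", "P3"), ("P1", "P4"),
    ("P2", "P5"),
    ("P4", "P6"), ("P4", "P7"), ("P4", "P8"),
]


def secondary_cases(edges: List[Tuple[str, str]]) -> Counter:
    """Count how many people each infector passed the disease to
    (the out-degree of each node in the transmission network)."""
    return Counter(infector for infector, _ in edges)


def super_spreaders(edges: List[Tuple[str, str]], threshold: int) -> List[str]:
    """Return infectors with at least `threshold` secondary cases.
    The threshold is an arbitrary choice made for this sketch."""
    counts = secondary_cases(edges)
    return sorted(person for person, n in counts.items() if n >= threshold)


if __name__ == "__main__":
    print(secondary_cases(EDGES).most_common())
    print(super_spreaders(EDGES, threshold=3))
```

In a network drawing, these high out-degree cases are the hubs with many arrows leaving them, which is the pattern a reader picks up at a glance.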

Sophia: You’ve applied some of the same methods you use regularly in epidemiology to the problem of gerrymandering. Would you talk a little bit about your awareness research in that context?

Maia: I’ve applied some of the same approaches to gerrymandering that I’ve used to study public health issues such as infectious diseases, violence, and hate crimes. In fact, my work in the areas of gun violence and hate crimes is what landed me in this space of gerrymandering. Gun violence and hate crimes undoubtedly affect public health, but they’re also very politicized, and that’s the case for gerrymandering as well. I would argue that gerrymandering is a public health problem because it dictates who gets to make decisions for a state’s healthcare, infrastructure, and environment. That’s where the systems epidemiology approach comes in.

I wish that there were more science-minded people who were interested in gerrymandering. Fortunately, I think that’s something that is slowly changing. Mathematicians are trying to come up with better ways to quantify whether gerrymandering is happening somewhere or not. Data scientists and social scientists also have a lot to contribute. By doing this work myself, I hope to get others involved.

A lot of the work I’ve done so far is to simply assess how much awareness of gerrymandering varies across the United States. What I’ve found is that there are definitely areas that are more aware and that are more interested in gerrymandering than others. One of the things I was interested in was whether awareness of gerrymandering correlates with presence of gerrymandering, and I actually found that places that were more aware of gerrymandering tended to be less gerrymandered.

It seems that folks who aren’t affected by it at all are actually more interested in it. What that indicates to me is that we haven’t done a good enough job of educating people about what gerrymandering is, why it affects them, and why it’s important to call your representatives and demand fair redistricting. This is the first step in a bigger project in which I plan to look at the correlates of awareness of gerrymandering versus limited interest or no awareness. Specifically, I’m looking at things like education and socioeconomic status.
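The relationship between awareness and gerrymandering described above can be checked with a simple correlation. The Python sketch below shows one way to do that. It is not Maia's analysis. The `pearson` helper is written out for illustration, and both lists are invented numbers standing in for a per-state awareness score and a per-state gerrymandering score.

```python
from math import sqrt
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two samples of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations (numerator of the coefficient).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


# Hypothetical per-state values, invented for illustration only.
AWARENESS = [10, 20, 30, 40, 50]          # e.g. relative search interest
GERRYMANDERING = [0.9, 0.7, 0.6, 0.3, 0.2]  # e.g. some unfairness score

if __name__ == "__main__":
    r = pearson(AWARENESS, GERRYMANDERING)
    print(f"r = {r:.2f}")  # a negative r means more awareness, less gerrymandering
```

A negative coefficient would match the finding that more-aware places tended to be less gerrymandered, though a correlation alone says nothing about which way any influence runs.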

Sophia: Thank you, Maia. One last question: I read in your bio that you moonlight as a jazz vocalist. Would you be willing to share a recent musical discovery?

Maia: Music is my getaway from so many different things. I’ve been doing a lot of recording in my home studio recently just to blow off some steam. My husband used to run his own podcast, and he’s an excellent sound engineer, so this is something that we do together. We work on different tracks at home. It’s a great alternative when it’s hard to get out and do a show. I think that everybody needs something. Hobbies are really important, and I wish we talked about them more!