How a Data Scientist Looks at Contagion

USC Marshall researcher Kimon Drakopoulos takes a deep dive into how contagion spreads, whether through computer networks or through the public.


April 17, 2020

Kimon Drakopoulos is an assistant professor of data sciences and operations at the USC Marshall School of Business, and an expert in the spread of contagion through systems.

The dynamics of the virus, which are determined by the underlying social or travel network, the incubation period, and the susceptibility and healing rates, are the driving force behind this pandemic.
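To make these ingredients concrete, here is a minimal sketch of a stochastic SEIR process (susceptible, exposed, infectious, recovered) running on a contact network. It is an illustration under simplifying assumptions, not the model used in Drakopoulos's research: time moves in discrete steps, the network is a plain adjacency dictionary, and the hypothetical parameters `beta` (per-contact infection probability), `sigma` (per-step chance of leaving incubation) and `gamma` (per-step recovery probability) stand in for susceptibility, incubation period and healing rate.

```python
import random


def simulate_seir(adjacency, seed_nodes, beta, sigma, gamma, steps, rng_seed=0):
    """Discrete-time stochastic SEIR process on a contact network (illustrative sketch).

    adjacency: dict mapping each person to a list of their contacts (assumed symmetric).
    beta:  hypothetical per-step probability that an infectious person infects a
           susceptible contact.
    sigma: hypothetical per-step probability that an exposed person becomes infectious
           (the mean incubation period is roughly 1 / sigma steps).
    gamma: hypothetical per-step probability that an infectious person recovers.
    Returns a list with one (S, E, I, R) count tuple per step.
    """
    rng = random.Random(rng_seed)
    state = {node: "S" for node in adjacency}
    for node in seed_nodes:
        state[node] = "I"

    history = []
    for _ in range(steps):
        # Compute the next state from the current one so updates are simultaneous.
        new_state = dict(state)
        for node, status in state.items():
            if status == "I":
                # Infectious people expose each susceptible contact with probability beta.
                for contact in adjacency[node]:
                    if state[contact] == "S" and rng.random() < beta:
                        new_state[contact] = "E"
                if rng.random() < gamma:
                    new_state[node] = "R"
            elif status == "E" and rng.random() < sigma:
                # Incubation ends and the person becomes infectious.
                new_state[node] = "I"
        state = new_state
        history.append(tuple(sum(1 for s in state.values() if s == c) for c in "SEIR"))
    return history


if __name__ == "__main__":
    # Hypothetical 50-person ring where everyone knows their two nearest neighbours on each side.
    ring = {i: [(i + d) % 50 for d in (-2, -1, 1, 2)] for i in range(50)}
    for step, counts in enumerate(simulate_seir(ring, [0], 0.3, 0.5, 0.2, steps=10)):
        print(step, counts)
```

Cutting edges in `adjacency` (a crude stand-in for quarantine) while holding the rates fixed is one way to see how much the network structure alone shapes an outbreak.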

Understanding these dynamics enables the design of effective mitigation techniques. In general, Asian nations, perhaps because of their experience with previous epidemics such as SARS and MERS, employed these techniques with superior results. The United States, however, found itself playing catch-up because of the lack of a coordinated public health response, a shortage of tests, and the absence of a federal strategy.

How can nations do better next time? We asked him to answer a few questions based on his research:

The worldwide response to the coronavirus has been social lockdown. Is this the best way to control contagion?

Social lockdown is definitely an effective way to control contagion, but it relies on too much brute force. As we are all experiencing, total lockdown has severe implications for the economy and for the mental health and well-being of individuals. Once an epidemic reaches a critical mass, social lockdown is necessary to protect the rest of the population. During the early stages, however (mid-January in the United States), targeted quarantining and testing based on the available data about the epidemic's progress would have been more effective and could have prevented the major lockdown we are now experiencing.

"We need to create a central surveillance system that aggregates information about the symptoms of confirmed cases. Data on the propagation of the epidemic in early stages can be a defining factor between deaths in the low thousands—and millions of deaths."—Kimon Drakopoulos, assistant professor of data sciences and operations

How could we more effectively have used targeted quarantining, according to your research?

In the example of the coronavirus, targeted quarantining, testing and monitoring would include:

  • Random testing in the Seattle, Washington, area, rather than only tracing the contacts of positive cases. Data on the propagation of the virus had already shown that carriers can be asymptomatic for a week; hence, by tracing contacts and testing only “high likelihood” individuals, we are at least a week behind the epidemic. Knowing and tracing where the epidemic is can lead to an order-of-magnitude improvement in how effectively we can fight it (see research here). Admittedly, tests were not available. But a central database of hospitals and clinics reporting the frequency and intensity of “suspicious” symptoms would have allowed a more efficient allocation of the available resources.
  • Prioritizing the quantity of tests over their accuracy. My research here (with my colleague Ramandeep Randhawa, professor of data sciences and operations at USC Marshall) suggests that high accuracy need not be a priority in the early stages of an epidemic.
  • Early targeted quarantining of crucial flight routes (see research here). Some cities and states are central to the country's transportation patterns, and if monitoring resources are limited, data science can provide metrics for deciding which routes to monitor and prioritize.
  • Using social network information and verified case data to update each individual’s probability of infection, and sending personalized, real-time warnings and advice.
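One standard way to turn "which paths to monitor" into a number is edge betweenness centrality: for each link, the number of shortest routes between all pairs of cities that pass through it. The sketch below computes it with Brandes's algorithm on a small, unweighted, hypothetical network. Edge betweenness is an assumed stand-in here; the metrics in the cited research may differ, and a real travel network would also need passenger-volume weights.

```python
from collections import deque


def edge_betweenness(adjacency):
    """Brandes-style edge betweenness for an unweighted, undirected graph.

    adjacency: dict mapping each city to a list of directly connected cities.
    Returns a dict mapping frozenset({u, v}) to the number of shortest paths
    between city pairs that use that route (split fractionally when paths tie).
    """
    scores = {frozenset((u, v)): 0.0 for u in adjacency for v in adjacency[u]}
    for source in adjacency:
        # Breadth-first search from source, counting shortest paths to each city.
        order, preds = [], {v: [] for v in adjacency}
        n_paths = dict.fromkeys(adjacency, 0)
        dist = dict.fromkeys(adjacency, -1)
        n_paths[source], dist[source] = 1, 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adjacency[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    n_paths[w] += n_paths[v]
                    preds[w].append(v)
        # Walk back from the farthest cities, pushing each city's share onto its routes.
        dependency = dict.fromkeys(adjacency, 0.0)
        while order:
            w = order.pop()
            for v in preds[w]:
                share = n_paths[v] / n_paths[w] * (1.0 + dependency[w])
                scores[frozenset((v, w))] += share
                dependency[v] += share
    # Every unordered pair of cities was counted once from each end.
    return {edge: value / 2 for edge, value in scores.items()}


if __name__ == "__main__":
    # Hypothetical network: two triangles of cities joined by a single route C-D.
    routes = {
        "A": ["B", "C"], "B": ["A", "C"], "C": ["A", "B", "D"],
        "D": ["C", "E", "F"], "E": ["D", "F"], "F": ["D", "E"],
    }
    ranked = sorted(edge_betweenness(routes).items(), key=lambda kv: -kv[1])
    for edge, score in ranked[:3]:
        print(sorted(edge), score)
```

In this toy network the bridging route between the two clusters carries every cross-cluster trip, so it ranks first; that is the intuition behind monitoring a few central routes when resources are limited.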


How did the Chinese and other Asian responses differ from the Western response?

Data already shows that their response was more effective. The reasons differ from country to country, depending on the sophistication of each response.

China acted fast and was decisive in isolating Wuhan from the rest of the country. Even a week of delay in taking such action can lead to an exponential increase in cases and therefore deaths.
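Why a single week matters so much is simple arithmetic once growth is exponential. Assuming a constant doubling time (the 3-day value below is hypothetical, not an estimate for COVID-19), case counts multiply by 2 raised to the delay divided by the doubling time:

```python
def growth_from_delay(delay_days: float, doubling_time_days: float) -> float:
    """Factor by which case counts multiply during a delay, assuming pure exponential growth."""
    if doubling_time_days <= 0:
        raise ValueError("doubling time must be positive")
    return 2 ** (delay_days / doubling_time_days)


if __name__ == "__main__":
    # Hypothetical doubling time of 3 days: a one-week delay multiplies cases about fivefold.
    factor = growth_from_delay(delay_days=7, doubling_time_days=3)
    print(f"{factor:.2f}x more cases")  # prints "5.04x more cases"
```

Real epidemics do not grow at a perfectly constant rate, so this is only a rough guide to the cost of waiting.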

South Korea has been tracing the epidemic using credit card purchase records and smartphone location data, which has allowed highly effective intervention and finely targeted quarantining (as described above). In Western countries, however, there is clearly an ethical discussion to be had regarding privacy and personal rights.

Random testing: I believe this is the most effective measure, one whose impact can be counted in thousands of deaths, as I explained above. With symptom-based testing, we are at least one week behind the epidemic. Random testing, which Germany and South Korea both employed with comparatively positive outcomes, relies on a ready supply of tests and of health care workers able to administer them. The United States unfortunately continues to lag behind in this effort.
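Part of what makes random testing valuable is that a random sample gives an unbiased estimate of how widespread infection is, asymptomatic carriers included, whereas symptom-based testing only sees people who already feel ill. A minimal sketch, assuming perfectly accurate tests and a truly random sample, estimates prevalence with a Wilson score interval:

```python
import math


def estimate_prevalence(positives: int, sample_size: int, z: float = 1.96):
    """Point estimate and Wilson score interval (about 95% for z = 1.96) for prevalence.

    Assumes a simple random sample and error-free tests; imperfect sensitivity or
    specificity would require a further correction not shown here.
    """
    if sample_size <= 0:
        raise ValueError("sample_size must be positive")
    if not 0 <= positives <= sample_size:
        raise ValueError("positives must be between 0 and sample_size")
    p_hat = positives / sample_size
    denom = 1 + z**2 / sample_size
    centre = (p_hat + z**2 / (2 * sample_size)) / denom
    half_width = z * math.sqrt(
        p_hat * (1 - p_hat) / sample_size + z**2 / (4 * sample_size**2)
    ) / denom
    return p_hat, max(0.0, centre - half_width), min(1.0, centre + half_width)


if __name__ == "__main__":
    # Hypothetical survey: 20 positives among 1,000 randomly chosen residents.
    p, low, high = estimate_prevalence(20, 1000)
    print(f"prevalence {p:.1%}, 95% interval {low:.1%} to {high:.1%}")
```

The Wilson interval is used instead of the simpler normal approximation because it stays sensible when prevalence is close to zero, which is exactly the early-epidemic situation.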

Finally, creating and maintaining a central database of cases, as well as per-citizen estimates of infection risk based on geography, social contacts, family cases, and so on, is key. Singapore recently released an app to do just this.
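A per-citizen risk estimate can be as simple as combining independent sources of exposure. The sketch below uses a noisy-OR rule: a person avoids infection only if they avoid both the background community risk and every confirmed contact. The exposure categories and transmission probabilities are hypothetical placeholders, and this is not a description of how Singapore's app or any deployed system works.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical per-exposure transmission probabilities, for illustration only.
TRANSMISSION_PROB = {"household": 0.15, "workplace": 0.05, "transit": 0.01}


@dataclass
class Exposure:
    kind: str        # one of the illustrative categories in TRANSMISSION_PROB
    confirmed: bool  # whether the contact is a verified case


def infection_risk(baseline: float, exposures: list[Exposure]) -> float:
    """Noisy-OR estimate of one person's probability of infection.

    baseline: hypothetical background risk from community spread in their area.
    Each confirmed contact is treated as an independent chance of transmission;
    unconfirmed contacts are ignored in this simple sketch.
    """
    escape = 1.0 - baseline
    for exposure in exposures:
        if exposure.confirmed:
            escape *= 1.0 - TRANSMISSION_PROB[exposure.kind]
    return 1.0 - escape


if __name__ == "__main__":
    contacts = [Exposure("household", True), Exposure("transit", True), Exposure("workplace", False)]
    print(f"estimated risk: {infection_risk(0.01, contacts):.3f}")
```

Each new verified case would update the exposures of that person's contacts, and a risk above some chosen threshold could trigger the personalized warnings described earlier.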


As a data scientist, where does your research come into play with a public health pandemic such as this one? 

I have attacked the problem from three different angles.

  • What is the optimal way of using information about the state of the epidemic to allocate limited resources (for curing, hospitalizing, and quarantining) over time to minimize the duration of the epidemic?
  • What are the crucial travel paths in the U.S., so that we can heavily monitor them and even quarantine them if necessary to minimize contagion?
  • Is it better to produce more testing kits or to make them more accurate? We argue that producing more kits is far more important, especially in the early stages of an epidemic.
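A back-of-the-envelope comparison shows why quantity can beat accuracy. Assuming tests go to a random sample, and using hypothetical prevalence, sensitivity and specificity values (not figures from the cited research, whose argument rests on epidemic dynamics rather than this static snapshot), the expected number of infected people found scales with the number of tests times sensitivity:

```python
def expected_outcomes(num_tests: int, prevalence: float, sensitivity: float, specificity: float):
    """Expected true and false positives when tests are given to a random sample."""
    true_pos = num_tests * prevalence * sensitivity
    false_pos = num_tests * (1 - prevalence) * (1 - specificity)
    return true_pos, false_pos


if __name__ == "__main__":
    # Hypothetical scenario: 2% prevalence, a scarce accurate test vs. a plentiful cheaper one.
    scenarios = {
        "1,000 accurate tests": expected_outcomes(1000, 0.02, 0.99, 0.99),
        "5,000 cheaper tests": expected_outcomes(5000, 0.02, 0.80, 0.98),
    }
    for name, (tp, fp) in scenarios.items():
        print(f"{name}: {tp:.1f} infected found, {fp:.1f} false alarms")
```

With these made-up numbers, five times as many tests at 80% sensitivity find about four times as many infected people (80 versus about 20), at the cost of more false alarms (about 98 versus about 10); whether that trade is worth it depends on what a false alarm costs.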


What's your advice to public health authorities in advance of the next pandemic?

As explained above, there are two main areas for improvement:

  • We need to create a central surveillance system that aggregates information about the symptoms of confirmed cases. Data on the propagation of the epidemic in early stages can be a defining factor between deaths in the low thousands—and millions of deaths.
  • Instead of “bang-bang” solutions (full lockdown vs. no social constraints), we should develop the capability to exploit data and technology and get ahead of the epidemic early on.