Twitter and its iconic blue bird symbol are well known to social media users. The platform was primarily designed for people to talk to friends and to get a glimpse into celebrities’ lives. However, as it has grown in popularity, Twitter has gained a reputation for its powerful capability to transmit information in real time. GlobalData is exploring how Twitter may be used in epidemiology.
Novel research studies have looked into the possibilities of using Twitter to track disease outbreaks or to measure prevalence. While these epidemiological uses of Twitter are not being widely implemented yet, the findings have been positive and may revolutionise how we study diseases in the future.
Filtering out the real flu from the reactions
It is important for public health agencies to know where infectious disease outbreaks occur as soon as possible, but current national sentinel surveillance networks do not fully cover their countries, and expanding them is costly. Twitter data mining may be a way to obtain up-to-date disease outbreak information and to validate outbreaks.
Flu surveillance is one of the earliest areas of study on using Twitter to track disease outbreaks. Several studies in the UK and US have found statistically strong correlations between flu-related tweets and diagnosed influenza-like-illness indicators. However, one difficulty in obtaining accurate flu surveillance data from Twitter is separating “reactionary” tweets prompted by media storms from useful tweets about getting sick. Therefore, finding indirect ways to measure the disease frequency may be useful in certain situations.
A 2011 study by Collier and colleagues looked at counts of tweets about protective behaviours, such as increased sanitation, avoiding activities that are high risk for flu transmission, and wearing masks, and found a strong correlation with flu incidence.
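As a rough illustration of the kind of comparison these flu studies describe, the sketch below computes a Pearson correlation between weekly counts of flu-related tweets and a weekly influenza-like-illness rate. The figures and variable names are hypothetical and chosen only for illustration; this is not the method of any particular study mentioned here.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 points)")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Hypothetical data: flu-related tweets per week and the
# influenza-like-illness (ILI) rate reported for the same weeks.
weekly_flu_tweets = [120, 150, 310, 480, 450, 260, 140]
weekly_ili_rate = [1.1, 1.4, 2.6, 4.0, 3.7, 2.2, 1.2]

r = pearson(weekly_flu_tweets, weekly_ili_rate)
print(f"Pearson r between tweet counts and ILI rate: {r:.3f}")
```

A coefficient close to 1 would suggest that tweet volume rises and falls with reported illness, although it cannot by itself distinguish tweets about being sick from reactions to media coverage.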
Correlations with cholera
The Haitian cholera outbreak after the January 2010 earthquake provided another opportunity to study how Twitter may be used in an outbreak situation. Within half a year of the devastating earthquake, cholera plagued Haiti and resulted in 380,000 reported cases and 5,800 reported deaths. The specific Haitian cholera strain spread to the Dominican Republic and the US within a month, and then to four other countries, including as far as Spain in Europe. Haiti’s poor healthcare, disaster aid, and water and sewage infrastructure made the disease difficult to contain and delayed the gathering of information on the extent of the outbreak.
A study published by Chunara and colleagues in the American Journal of Tropical Medicine and Hygiene found that Twitter postings about the disease within the first 100 days of the 2010 Haitian cholera outbreak correlated significantly with official case data, and were available up to two weeks earlier. The speed of the Haitian cholera strain’s transmission and its large geographical spread indicate what we need to anticipate in future highly infectious outbreaks.
The reach of the Haitian cholera outbreak was greater than any single country’s surveillance system can map, but Twitter’s global reach could potentially lend a hand in mapping out hot spots and tracking disease spread.
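One simple way to examine whether a Twitter signal runs ahead of official reporting is to shift the tweet series against the case series and see which shift gives the strongest correlation. The sketch below does this with made-up numbers in which official cases trail tweet volume by two reporting periods; the data, function names, and the choice of lagged Pearson correlation are assumptions for illustration, not the analysis used in the cholera study.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def lagged_correlation(signal, reference, lag):
    """Correlate signal at period t with reference at period t + lag."""
    if len(signal) != len(reference):
        raise ValueError("series must have the same length")
    if lag < 0 or len(signal) - lag < 2:
        raise ValueError("lag must leave at least two overlapping points")
    return pearson(signal[: len(signal) - lag], reference[lag:])


def best_lead(signal, reference, max_lag):
    """Return (lag, r) for the lag with the highest correlation."""
    results = [(lag, lagged_correlation(signal, reference, lag))
               for lag in range(max_lag + 1)]
    return max(results, key=lambda pair: pair[1])


# Hypothetical series, one value per reporting period.
# The official counts are built to trail the tweet counts by two periods.
cholera_tweets = [5, 9, 20, 42, 60, 51, 33, 18, 10, 6, 4, 3]
official_cases = [20, 30, 50, 90, 200, 420, 600, 510, 330, 180, 100, 60]

best_lag, best_r = best_lead(cholera_tweets, official_cases, max_lag=4)
print(f"Tweets lead official cases by {best_lag} periods (r = {best_r:.3f})")
```

With real data the peak would be far less clean, and a lead found this way would still need checking against how and when official figures are released.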
Twitter’s tricky terminology
Twitter’s usage in monitoring chronic diseases has also been studied. One difficulty in using Twitter to study diseases is identifying disease terminology accurately within Twitter’s colloquial lexicon. A study conducted by Weeg and colleagues, published in the Journal of Medical Internet Research in 2015, found mild correlations between Twitter mentions and the prevalence of certain diseases, but the correlations are not straightforward and vary widely depending on the disease; certain diseases are better tracked via Twitter than others.
For example, measuring the disease frequency of “heart attacks” can be difficult as the term is often used in ways not related to the actual medical condition, such as when a person tweets “I can’t believe I saw my ex, it almost gave me a heart attack.” Similarly, the term “stroke” referred to the actual medical condition only 22% of the time.
Diabetes, however, is a disease that can be tracked more easily via Twitter, as the researchers found that 98% of tweets containing the word “diabetes” referred to the medical condition. For diseases that have high terminology ambiguity, higher correlation with prevalence can be obtained with human-aided adjustments. Difficulties with terminology ambiguity mean the process is going to be difficult to automate through search algorithms.
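To make the idea of a human-aided adjustment concrete, the sketch below scales raw mention counts by the share of tweets that a human reviewer judged to be about the actual condition. The 22% and 98% shares are the figures quoted above for “stroke” and “diabetes”; the raw counts, the labelled sample, and the function names are hypothetical, and the adjustment used in the study may well differ.

```python
def relevance_fraction(labels):
    """Share of hand-labelled tweets that refer to the medical condition.

    `labels` holds one boolean per sampled tweet: True if a human
    reviewer judged the tweet to be about the actual condition.
    """
    if not labels:
        raise ValueError("need at least one labelled tweet")
    return sum(1 for is_medical in labels if is_medical) / len(labels)


def adjusted_mentions(raw_mentions, fraction):
    """Estimate how many raw mentions are about the condition itself."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    return raw_mentions * fraction


# Relevance shares quoted in the article for two terms.
relevance = {"stroke": 0.22, "diabetes": 0.98}

# Hypothetical raw counts of tweets containing each term.
raw_counts = {"stroke": 1000, "diabetes": 1000}

adjusted = {term: adjusted_mentions(raw_counts[term], relevance[term])
            for term in raw_counts}
for term, estimate in adjusted.items():
    print(f"{term}: {raw_counts[term]} raw mentions, "
          f"about {estimate:.0f} estimated medical mentions")

# Estimating a share from a small hand-labelled sample (hypothetical labels).
sample_labels = [True, False, False, True, False]
sample_share = relevance_fraction(sample_labels)
```

The same raw volume yields very different estimates for the two terms, which is why ambiguous terms need a labelling step before their counts can be compared with prevalence figures.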
While the most traditional uses of Twitter in healthcare are disseminating healthcare information, spreading disease awareness, and marketing, several recent epidemiology studies have found new uses for Twitter. However, we are still far from being able to use Twitter effectively in disease trend research, and most past studies have used English-language tweets only. Twitter had 326 million monthly active users as of Q3 2018, and roughly 79% are based outside the US, making it an important data source in terms of both size and geographical spread. More research is needed to refine the methodology and to see whether it is applicable on a global scale in non-English-speaking regions.