Open Access Database Created to Accelerate Research into Novel Coronavirus

In the wake of the unprecedented coronavirus pandemic, we have witnessed communities and nations come together in unexpected ways, based on the knowledge that collaboration is vital – and uplifting – in times of unease. We have seen rival pharmaceutical companies band together to find a vaccine for the virus, China send its own doctors and supplies to Italy, and researchers of the closely related SARS and MERS viruses shift their attention to the novel coronavirus.

Now, medical, publishing, technological and governmental bodies have come together to produce the COVID-19 Open Research Dataset – an online database of over 29,000 scholarly articles related to the coronavirus and its family. The dataset, which went live on Semantic Scholar on Monday, was created at the request of the White House’s Office of Science and Technology Policy. It is the product of a collaboration between a host of organisations: Microsoft lent its literature curation tools to facilitate searching for relevant literature; the National Library of Medicine provided access to the literature; the Chan Zuckerberg Initiative provided access to preprints; the Allen Institute for AI made the content machine-readable so that AI algorithms can efficiently sift through the multitude of articles; and Georgetown University’s Centre for Security and Emerging Technology coordinated the entire initiative.

Crucially, the database has been made an open access archive. That is, the wealth of information is not hidden behind a large paywall; it is freely accessible to anyone with an internet connection and a few hundred megabytes of free storage space on their computer. The aim is to allow the massive amount of research that has already been conducted into the family of viruses to reach as many people as possible, particularly within the technology community. The White House has issued a call to action for AI researchers to develop data and text mining algorithms that can sift through the masses of data at a much faster rate than humans can.

Alongside the launch of the database, a set of 10 priority questions was posted to Kaggle – an online community for data scientists and AI researchers. These questions included ‘What do we know about virus genetics, origin and evolution?’ and ‘What do we know about COVID-19 risk factors?’. The crux of the initiative lies in the first four words of these questions: what do we know, already, about this virus? We may never have met this coronavirus before, but we have met its cousins. This puts us in an advantageous position from the get-go; we are not facing this virus unarmed – there exists decades’ worth of information that we must leverage in order to combat it.

The decision to make this database open access is a no-brainer. We know that allowing research to reach as many communities as possible accelerates scientific progress. Why, then, is almost half of all scientific research in the UK still being hoarded by publishing groups behind a paywall? It is an uncomfortable thought that we prioritise speedy research above revenue only in exceptional circumstances, such as the pandemic we are currently facing. Why is it that the 770,000 deaths caused by HIV and AIDS each year do not call for the accelerated research that mandatory open access could make possible? It is unsettling that we are willing to let people die by delaying our progress towards a cure, all in the name of profit and publishing prestige.

In 2015, a group of senior health researchers published an open letter in the New York Times stating that the Ebola epidemic could have been prevented if research had not been kept behind paywalls: an article published in a journal that required a paid subscription had warned of the emerging outbreak as early as 1982. As university students in the UK, we take for granted our access to the abundance of information available to us at the tap of a finger. In developing nations, universities and institutions do not have £4 million a year to throw at journal subscriptions, as top UK universities do. Scientific journals have formulated the ideal business model for themselves – they charge authors to publish work in their journals, do not pay the people who peer-review the articles, and then charge people to read those articles. To paint a picture of how much money some of the most prestigious publishers are making, Elsevier, a leading publisher of science, technology and medicine journals, trumped even Apple’s profit margins in 2017.

Common access to scientific research should be normalised. It empowers people to make their own judgments on issues that directly affect them. This is global public health – and the weight of public health transcends profits. The amount of misinformation that has spread about the coronavirus exemplifies the need to make peer-reviewed articles freely accessible, so that anyone can read evidence-based research on an issue that is almost certainly going to affect them. Furthermore, as the UK government funds almost 30% of the research that occurs across the nation, it makes no sense that taxpayers are not granted access to the information that they have funded.

The premise of open access is simple: extend the reach of each piece of research as far as possible so that, eventually, knowledge will land in the lap of someone who knows what to do with it. This is how scientific progress is accelerated, and it shouldn’t be leveraged exclusively in times of global emergency. When the scientific research process is fully appreciated, there can be no case against the need for open access. Dr Mike Taylor, a scientist at the University of Bristol, states it succinctly: “If you are a scientist, your job is to bring new knowledge into the world. And if you bring new knowledge into the world, it’s immoral to hide it.” There is so much waste in producing a piece of research and keeping it contained within one’s own community. Nature Medicine recently published an article stating that the collective international experiences of the related SARS and MERS outbreaks are the best tool in our arsenal to contain and combat coronavirus. Let’s hope that the major scientific, medical, pharmaceutical and technological advancements that are bound to come about as a result of the COVID-19 Open Research Dataset and beyond will remind the scientific community that the highest quality and most efficient research is fundamentally reliant upon collaboration and global knowledge exchange. There is no place for paywalls in scientific research.

Image source: NARVIKK