The Shot Heard Round The World
The University of California system recently canceled its subscription to Elsevier’s extensive database of scientific and medical research literature. This news is the latest and most significant in a string of similar releases from other academic institutions that can no longer afford subscriptions to paywall publishers or support closed science policies that sequester knowledge and only grant access to those who can pay.
The University of California’s cancellation of its Elsevier contract rang a bell that has amplified an ongoing conversation across academic institutions as well as the private sector. Canceling subscriptions to paywalled scientific content is an accelerating trend that marks the beginning of a transition from private “ownership” of scholarly content to services that deliver content on demand, when and where researchers need it. This post will examine the current state of scientific literature, why cancellations of paid subscriptions are accelerating, how researchers will replace lost access to scientific content when subscriptions are canceled, and what new approaches are required to ensure the research community is even more productive in the future.
Where Are We Today?
Worldwide life sciences research and development (R&D) spending is expected to exceed $180B in 2019. The primary means of communicating results from all this R&D is through publishing scientific articles in specialty journals. Today, researchers find scientific literature through keyword search strategies and read one article at a time. Scientific literature exists as an unleveraged asset that hasn’t evolved and delivers no more value today than it did four decades ago. We need new tools that unlock the massive latent value embedded in the scholarly record. We need to create systems that enable scientific literature to be consumed in fundamentally new ways that speed hypothesis generation, answer questions, reduce wasteful replication of previously conducted research and accelerate scientific workflows.
Innovation targeting more effective ways to use scientific literature is lacking. Scientific literature is a large and valuable dataset that should be ingested by specialized software that preserves the format of the articles on the front-end but transforms the contents on the back-end so that entirely new literature-based workflows are enabled. The emergence of open access scientific literature that can be computed on, combined with modern software solutions designed for literature-based workflows, creates an opportunity to build a new type of scientific content consumption platform that will significantly reduce the costs of accessing literature while simultaneously bringing new productivity tools to researchers.
Why Now?
A combination of several factors is accelerating the trend toward canceling paywall subscriptions. Subscription prices for paywall content have been rising for decades and, for many subscribers, have reached a tipping point where the value gained no longer justifies the cost. It’s not just price, but price relative to the value gained and price compared to alternatives. The paywall publishers aren’t exactly known as innovators working hard to bring valuable new solutions to their customers. Instead, the business model seems to be one of raising prices consistently while the services delivered to customers remain mostly unchanged. Some improvements have emerged at the margins, but new user experiences that make literature-based workflows substantially more productive remain absent.
What Now?
Fortunately, when subscriptions are cancelled and researchers lose access to paywall content, open access scientific literature can serve as an alternative. A recent study published in PeerJ documents the rapid growth, diversity, and depth of open access scientific literature. Impressively, approximately half of all recently published scientific literature (since 2015) is now open access. In other words, the total annual output of published scientific literature is now split equally between the paywalled publishers and the open access publishers. As a result, open access scientific literature has emerged as a viable and free alternative to paywalled literature for the scientific community.
With subscription prices continuing to rise, the value of subscription services flattening, and viable alternatives emerging, it’s clear that more and more researchers will turn to open access scientific literature as their primary literature resource. There are significant challenges, however, that are likely to frustrate researchers and limit the usefulness of this potentially valuable resource. Simply launching a search engine like Google Scholar or PubMed and running a few keyword searches will not enable researchers to work more efficiently or extract the maximum value from open access scientific literature. A significant limitation to unlocking the full value of open access literature is the inability to consume it at scale and in a way that advances research agendas. Countless disparate repositories make relevant content hard to find, and the absence of software that makes all that content computable limits what can be done once content is found. New solutions to challenges like these are needed.
The Cost of Status Quo
Researchers require access to varied scientific content such as scientific papers, patents, protocols, regulatory documents, study reports, and written conversations – to name a few. The cost of today’s substandard solutions is staggering. A recent article in The Guardian showed that it requires, on average, 15 clicks to access a single article and the effort often involves multiple logins to different repositories, dead links, and countless redirects. All that effort is required to access a PDF, which is essentially a picture of a scientific article with little additional value beyond portability.
Benjamin Kaube, the author of The Guardian article, states: “The scale of this problem is huge: 10 million researchers around the world access 2.5bn journal articles online each year. The time wasted trying to access them is a tax on human progress and on the development and dissemination of new scholarly knowledge. By estimating the average amount of time wasted by researchers trying to gain access to a single article, I’ve calculated that research output equivalent to around 11,500 academics is lost each year.” Importantly, this isn’t just an academic researcher problem. The private life sciences sector suffers the same loss of productivity in its R&D departments.
What New Tools Are Needed?
Modern software development approaches can improve how researchers access scientific literature so that mountains of lost productivity can be reclaimed. However, this is just the beginning. Once aggregated and easily findable, open access scientific literature should be computable in new and useful ways. Scientific content should exist in its native format but bend to the will of a researcher’s desired use case. For example, if a researcher needs to quickly mine one or more papers for all mentioned gene symbols (without reading them) and rapidly launch an analytic to assemble a gene/gene interaction map, it should only require a few clicks because the articles and analytics have already been made interoperable.
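To make the gene symbol example concrete, here is a minimal Python sketch of the idea, not a description of how any particular platform implements it. It assumes a tiny hand-written lexicon (GENE_LEXICON is hypothetical; a real system would load an authoritative gene nomenclature and use proper biomedical NLP), splits text into sentences naively, and treats two genes appearing in the same sentence as a candidate interaction. The sample text is invented for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical lexicon for illustration only; a real system would load
# an authoritative gene nomenclature instead of a hard-coded set.
GENE_LEXICON = {"TP53", "MDM2", "BRCA1", "EGFR"}

TOKEN = re.compile(r"[A-Za-z0-9-]+")


def gene_symbols(sentence):
    """Return the set of lexicon gene symbols mentioned in one sentence."""
    return {tok for tok in TOKEN.findall(sentence) if tok in GENE_LEXICON}


def cooccurrence_map(text):
    """Count how often each pair of gene symbols shares a sentence."""
    pairs = Counter()
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        genes = sorted(gene_symbols(sentence))
        pairs.update(combinations(genes, 2))
    return pairs


# Invented sample text standing in for the body of a paper.
sample = (
    "MDM2 negatively regulates TP53. "
    "BRCA1 and TP53 were both mutated. "
    "EGFR signaling was unchanged."
)
interactions = cooccurrence_map(sample)
for (gene_a, gene_b), count in sorted(interactions.items()):
    print(gene_a, gene_b, count)
```

Sentence-level co-occurrence is a deliberately crude stand-in for relationship extraction; the point is only that once article text is machine-readable, an interaction map becomes a short computation rather than a reading task.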
Or, if a researcher wants to perform a complex multi-faceted search across different document types such as scientific publications, patents, and biological protocols, it should be easily accomplished from one search interface without having to log into multiple repositories because the different datasets have already been normalized and made interoperable. Moreover, if desired, search results should be easily shared with colleagues, saved for future reuse and scheduled to automatically launch at intervals set by the researcher to stay on top of recent publications in her field. One integrated platform should execute all these activities. These are only a few examples of what’s possible. Once scientific content is made computable, the use cases and ability to enable more productive literature-based workflows are nearly endless.
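The saved, rerunnable search described above can be sketched in a few lines of Python. This is an illustrative model under stated assumptions, not the design of any real product: Record and SavedQuery are hypothetical names, the corpus is an in-memory list that stands in for sources already normalized into one schema, and matching is a plain case-insensitive substring test on the title. The titles and dates are invented.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Record:
    """One normalized document, whatever repository it came from."""
    doc_type: str      # e.g. "publication", "patent", "protocol"
    title: str
    published: date


@dataclass
class SavedQuery:
    """A keyword search restricted to chosen document types."""
    keyword: str
    doc_types: frozenset
    last_run: date = date.min

    def run(self, corpus, today):
        """Return matches published since the previous run, then record the run date."""
        hits = [
            r for r in corpus
            if r.doc_type in self.doc_types
            and self.keyword.lower() in r.title.lower()
            and r.published > self.last_run
        ]
        self.last_run = today
        return hits


# Invented records standing in for an aggregated, normalized corpus.
corpus = [
    Record("publication", "CRISPR screening in yeast", date(2019, 1, 10)),
    Record("patent", "CRISPR delivery vector", date(2019, 2, 1)),
    Record("protocol", "Western blot basics", date(2019, 2, 5)),
]

query = SavedQuery("crispr", frozenset({"publication", "patent"}))
first = query.run(corpus, today=date(2019, 2, 15))

# A new document arrives; rerunning the saved query surfaces only the new hit.
corpus.append(Record("patent", "Improved CRISPR guide design", date(2019, 3, 1)))
second = query.run(corpus, today=date(2019, 3, 15))
```

Because the query object remembers when it last ran, a scheduler could call run at whatever interval the researcher sets and forward only the new results, and the same object could be serialized and shared with colleagues.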
Scientific Content Consumption Platforms
We believe a fundamentally new type of scientific content consumption platform is needed to usher in a more affordable and more productive future for the research community. The volume of existing scientific literature is too large, the pace of new literature publication is too fast, and the workflows that are informed by published research are too important to throttle researchers with generic search engines and citation managers as their solutions.
To deliver new and more productive literature-based workflows, a single platform must emerge that makes it easy to log in and:
– Find content quickly
– Determine relevancy
– Perform faceted navigation
– Save complex queries
– Integrate literature and downstream analytics
– Create libraries
– Add new content
– Collaborate on literature-based tasks
– Share papers and share extracted text or figures
– Annotate text and figures
– Export and share annotations
– Text mine biomedical entities and identify associations
– Create and export citations
In other words, a scientific content consumption platform is a single sign-on end-to-end solution that does the heavy lifting of aggregating large (and diverse) scientific literature sources and makes that content computable and usable to get work done.
The Catalytic Platform
We are a team of life scientists and software engineers who believe the brightest minds in science should have access to powerful productivity tools. The Catalytic Platform is a new kind of R&D cloud built specifically for how life scientists work. At the time of this blog post, we have ingested, integrated and made interoperable approximately 35 million open access scientific papers, patents, drug labels, and biological protocols. We continue to identify and ingest valuable open access scientific content wherever we can find it. Our talented engineering team has been building proprietary bioNLP search capabilities, text mining solutions and multistep analytic workflows designed to enable researchers to ask important questions of the scientific literature rather than struggle with old tools designed to only find, store and read stuff. The Catalytic Platform is available to the scientific community now and ready to serve those who want a solution to replace or add to paywall subscriptions. We architected the Catalytic Platform so that it can serve individual researchers, labs, life sciences companies or entire academic institutions.
This is Part One of a series that examines the changing landscape of scientific literature, the failure of current tools to meet researchers’ needs, and a new type of scientific content consumption platform enabled by the open access movement and new software architectures. In Part Two, we will dive into the specifics of the Catalytic Platform and how it solves long-standing challenges holding back research productivity.