Knowledge synthesis of 100 million biomedical documents augments the deep expression profiling of coronavirus receptors

March 26, 2021

May 26, 2020

Abstract: The COVID-19 pandemic demands assimilation of all biomedical knowledge to decode mechanisms of pathogenesis. Despite the recent renaissance in neural networks, a platform for the real-time synthesis of the exponentially growing biomedical literature and deep omics insights is unavailable. Here, we present the nferX platform for dynamic inference from over 45 quadrillion possible conceptual associations in unstructured text, triangulated with insights from single-cell RNA sequencing, bulk RNA-seq, and proteomics across diverse tissue types. A hypothesis-free profiling of ACE2 suggests tongue keratinocytes, olfactory epithelial cells, airway club cells, and respiratory ciliated cells as potential reservoirs of the SARS-CoV-2 receptor. We identify the gut as the putative hotspot of COVID-19, where the coronavirus receptors (ACE2, DPP4, ANPEP) share a maturation-correlated transcriptional signature in small intestine enterocytes. A holistic data science platform that triangulates insights from structured and unstructured data holds potential for accelerating the generation of impactful biological insights and hypotheses.

AJ Venkatakrishnan1, Arjun Puranik1, Akash Anand2, David Zemmour1, Xiang Yao3, Xiaoying Wu3, Ramakrishna Chilaka2, Dariusz K. Murakowski1, Kristopher Standish3, Bharathwaj Raghunathan4, Tyler Wagner1, Enrique Garcia-Rivera1, Hugo Solomon1, Abhinav Garg2, Rakesh Barve2, Anuli Anyanwu-Ofili3, Najat Khan3, Venky Soundararajan1*
1 nference, Cambridge, MA 02142, USA.
2 nference Labs, Bengaluru, KA 560017, India.
3 Janssen Research & Development LLC, USA.
4 nference, Toronto, ON M5V 2G9, Canada.
Correspondence: Venky Soundararajan
© 2020, Venkatakrishnan et al.
This article is distributed under the terms of the Creative Commons Attribution License permitting unrestricted use and redistribution provided that the original author and source are credited.