Shibiao Wan, Ph.D.

Assistant Professor
Assistant Director for Bioinformatics and Systems Biology Core
Department of Genetics, Cell Biology and Anatomy
985805 Nebraska Medical Center
Omaha, NE 68198-5805

Postdoc, University of Pennsylvania, 2019
Postdoc, Princeton University, 2017
PhD, The Hong Kong Polytechnic University, 2014
BEng, Wuhan University, 2010

Academic Appointments:
Assistant Director, Bioinformatics and Systems Biology Core Facility, UNMC

Our laboratory focuses on bioinformatics, machine learning and computational biology, especially single-cell analysis, multi-omics analysis, spatial transcriptomics and machine learning for processing large-scale biological data. Bioinformatics and machine learning are well suited to unraveling the mechanisms of molecular biological systems, which usually involve enormous amounts of heterogeneous data. Besides collaborating with scientists in cancer biology, metabolism, immunology, pathology and developmental biology, our laboratory mainly develops methods based on artificial intelligence, machine learning and data science to tackle essential biomedical problems in genomics, transcriptomics, epigenetics, proteomics, metabolomics and interactomics.

Single cell analysis
As one of the most essential and far-reaching technologies of recent decades, single-cell sequencing and its related technologies have been selected as “Method of the Year” by Nature Methods three times: single-cell sequencing for 2013, single-cell multimodal omics for 2019 and spatial transcriptomics for 2020. By profiling at the individual-cell level, single-cell sequencing enables researchers to characterize novel cell types and interrogate intra-population heterogeneity. Despite the rich information that single-cell data reveal, many challenges in single-cell analysis remain to be addressed, including but not limited to clustering, batch effect correction, multi-modal data integration, trajectory inference and RNA velocity. Our laboratory has contributed to clustering: I proposed SHARP, the first computational model capable of processing 10 million cells quickly and accurately for single-cell data analysis. The remaining challenges are a major focus of our lab.

Multi-omics analysis
The integration of multi-omics data, including genomics, transcriptomics, epigenetics, proteomics, metabolomics and interactomics, in both bulk and single-cell settings can help pinpoint biomarkers of disease and physiology and decipher the mechanisms of association among genotypes, phenotypes and envirotypes. Conventional models commonly analyze each type of omics data independently and then leverage potential (yet perhaps weak) connections among them. These models rely heavily on manual intervention, and different analysts may reach different interpretations. To overcome these problems, our laboratory will develop computational models that learn automatically from heterogeneous multi-omics data. Candidate machine learning approaches include multi-kernel learning, co-learning, multimodal representation and joint representation. The first two categories of models simultaneously learn the informative features from the multi-omics data, whereas the latter two automate the representation of multi-omics data in the same feature space. For single-cell multi-omics data, we can also combine our existing machine learning models to deal with high-dimensionality and big-data problems.

Spatial transcriptomics
In spatial transcriptomics, elucidating single-cell heterogeneity while retaining spatial information is crucial for understanding key aspects of cell development and differentiation, cell-cell interaction, tumorigenesis and cancer progression. Spatial transcriptomics has also been used to determine the subcellular localization of mRNA molecules, which is closely related to my previous research on protein subcellular localization. More importantly, by coupling spatial transcriptomics with single-cell sequencing, we anticipate simultaneous multimodal spatial profiling of the transcriptome, genome and proteome. Because spatial transcriptomics generates both sequencing and imaging data, we believe computational models can play a significant role in the field. Our laboratory leverages machine learning and bioinformatics methods to tackle these problems.

Machine learning for intelligent healthcare and precision medicine
With the explosion of heterogeneous and big data in biology and medicine, machine learning has become an essential tool for cancer research, disease diagnostics, therapeutic development and drug discovery. In addition to omics data, our laboratory is also interested in analyzing other basic, translational and clinical data in various formats (e.g., sequencing, electronic health records and images) with artificial intelligence, machine learning and data science methods to facilitate intelligent healthcare and precision medicine. By establishing extensive collaborations with clinicians, doctors, radiologists and pathologists, we develop and/or leverage computational models to help unravel the mechanisms of pathogenesis and tumorigenesis and to support disease diagnosis and prognosis, treatment design and drug discovery.



Selected Publications:

  1. S. Wan, J. Kim and K. J. Won, “SHARP: Hyper-Fast and Accurate Processing of Single-Cell RNA-seq Data via Ensemble Random Projection”, Genome Research, 2020, vol. 30, pp. 205-213.
  2. J. Wang and S. Wan*, “Editorial: Single Cell Meets Metabolism and Cancer Biology”, Frontiers in Oncology, 2023, vol. 13, article 1125186.
  3. T. Sakamoto, K. Batmanov, S. Wan, Y. Guo, L. Lai, R. B. Vega and D. P. Kelly, “The Nuclear Receptor ERR Cooperates with the Cardiogenic Factor GATA4 to Orchestrate Transcriptional Control of Cardiomyocyte Differentiation”, Nature Communications, 2022, vol. 13, no. 1991, pp. 1-20.
  4. S. Wan* and J. Wang*, “A Sequence Obfuscation Method for Protecting Personal Genomic Privacy”, Frontiers in Genetics, 2022, vol. 13, article 876686.
  5. R. Wang, X. Zheng, J. Wang, S. Wan, M. H. Wong, K. S. Leung and L. Cheng, “Improving Bulk RNA-seq Classification by Transferring Gene Signature from Single Cells in Acute Myeloid Leukemia”, Briefings in Bioinformatics, 2022, vol. 23, no. 2, bbac002.
  6. S. Singh, W. Quarni, M. Goralski, S. Wan, H. Jin, L. A. Van de Velde, J. Fang, R. Sing, Y. Fan, M. Johnson, W. Akers, P. Murray, P. G. Thomas, D. Nijhawan, A. M. Davidoff and J. Yang, “Targeting the Spliceosome through RBM39 Degradation Results in Exceptional Responses in High-Risk Neuroblastoma Models”, Science Advances, 2021, vol. 7, no. 47, eabj5405.
  7. T. Sakamoto, T. Matsuura, S. Wan, D. Ryba, J. Kim, K. J. Won, L. Lai, C. Petucci, N. Petrenko, K. Musunuru, R. Vega and D. Kelly, “A Critical Role for Estrogen-Related Receptor Signaling in Cardiac Maturation”, Circulation Research, 2020, vol. 126, pp. 1685-1702.
  8. S. Wan* and M. W. Mak*, “Predicting Subcellular Localization of Multi-Location Proteins by Improving Support Vector Machines with Adaptive-Decision Schemes”, International Journal of Machine Learning and Cybernetics, 2018, vol. 9, pp. 399–411.
  9. S. Wan*, M. W. Mak*, and S. Y. Kung, “FUEL-mLoc: Feature-Unified Prediction and Explanation of Multi-Localization of Cellular Proteins in Multiple Organisms”, Bioinformatics, 2017, vol. 33, no. 5, pp. 749–750.
  10. S. Wan*, M. W. Mak*, and S. Y. Kung, “Transductive Learning for Multi-Label Protein Subchloroplast Localization Prediction”, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017, vol. 14, pp. 212–224.
  11. S. Wan*, M. W. Mak*, and S. Y. Kung, “Gram-LocEN: Interpretable Prediction of Multi-Location Gram-Positive and Gram-Negative Bacterial Protein Subcellular Localization”, Chemometrics and Intelligent Laboratory Systems, 2017, vol. 162, pp. 1–9.
  12. S. Wan*, M. W. Mak*, and S. Y. Kung, “Ensemble Linear Neighbourhood Propagation for Predicting Subchloroplast Localization of Multi-Location Proteins”, Journal of Proteome Research, 2016, vol. 15, pp. 4755–4762.
  13. S. Wan*, M. W. Mak*, and S. Y. Kung, “Mem-mEN: Predicting Multi-Functional Types of Membrane Proteins by Interpretable Elastic Nets”, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2016, vol. 13, pp. 706–718.
  14. S. Wan*, M. W. Mak, and S. Y. Kung, “mLASSO-Hum: A LASSO-Based Interpretable Human-Protein Subcellular Localization Predictor”, Journal of Theoretical Biology, 2015, vol. 382, pp. 223–234.
  15. S. Wan, M. W. Mak, and S. Y. Kung, “mGOASVM: Multi-Label Protein Subcellular Localization Based on Gene Ontology and Support Vector Machines”, BMC Bioinformatics, 2012, 13:290.