Postdoc, University of Pennsylvania, 2019
Postdoc, Princeton University, 2017
PhD, The Hong Kong Polytechnic University, 2014
BEng, Wuhan University, 2010
Assistant Director, Bioinformatics and Systems Biology Core Facility, UNMC
Our laboratory focuses on bioinformatics, machine learning and computational biology, especially single-cell analysis, multi-omics analysis, spatial transcriptomics and machine learning for processing large-scale biological data. Bioinformatics and machine learning are well suited to unraveling the mechanisms of molecular biological systems, which usually involve enormous amounts of heterogeneous data. Besides collaborating with scientists in cancer biology, metabolism, immunology, pathology and developmental biology, our laboratory mainly develops methods based on artificial intelligence, machine learning and/or data science to tackle essential biomedical problems in genomics, transcriptomics, epigenetics, proteomics, metabolomics and interactomics.
Single cell analysis
Single-cell technologies are among the most essential and far-reaching technologies of recent decades, and Nature Methods has selected them as “Method of the Year” three times: single-cell sequencing for 2013, single-cell multimodal omics for 2019 and spatial transcriptomics for 2020. By profiling at the individual-cell level, single-cell sequencing enables researchers to characterize novel cell types and interrogate intra-population heterogeneity. Given the rich information that single-cell data reveal, many challenges in single-cell analysis remain to be addressed, including but not limited to clustering, batch effect correction, multi-modal data integration, trajectory inference and RNA velocity. Our laboratory has contributed to clustering, where I proposed the first computational model (i.e., SHARP) capable of processing 10 million cells quickly and accurately for single-cell data analysis. The remaining challenges are a major focus of our lab.
Multi-omics analysis
The integration of multi-omics data, including genomics, transcriptomics, epigenetics, proteomics, metabolomics and interactomics, in both bulk and single-cell settings can help pinpoint biomarkers of disease and physiology and decipher the mechanisms that link genotypes, phenotypes and envirotypes. Conventional models commonly analyze each type of omics data independently and then leverage potential (yet perhaps weak) connections among them. These models rely heavily on manual intervention, and different analysts may arrive at different interpretations. To overcome these problems, our laboratory will develop computational models that learn automatically from heterogeneous multi-omics data. Candidate machine learning approaches include multi-kernel learning, co-learning, multimodal representation and joint representation. The first two categories of models simultaneously learn informative features from the multi-omics data, whereas the latter two automate the representation of multi-omics data in the same feature space. For single-cell multi-omics data, we can also combine our existing machine learning models to deal with high-dimensionality and big-data problems.
Spatial transcriptomics
In spatial transcriptomics, elucidating single-cell heterogeneity while retaining spatial information is crucial for understanding key aspects of cell development and differentiation, cell-cell interaction, tumorigenesis and cancer progression. Spatial transcriptomics has also been used to determine the subcellular localization of mRNA molecules, which is closely related to my previous research on protein subcellular localization. More importantly, with the coupling of spatial transcriptomics and single-cell sequencing, we anticipate simultaneous multimodal spatial profiling of the transcriptome, genome and proteome. Given that spatial transcriptomics generates both sequencing and imaging data, we believe computational models can play significant roles in this field. Our laboratory leverages machine learning and bioinformatics methods to tackle these problems.
Machine learning for intelligent healthcare and precision medicine
With the explosion of heterogeneous and big data in biology and medicine, machine learning has become an essential tool for cancer research, disease diagnostics, therapeutic development and drug discovery. In addition to omics data, our laboratory is also interested in tackling other basic, translational and clinical data in various formats (e.g., sequencing, electronic health records, images, etc.) using artificial intelligence, machine learning and data science methods to facilitate intelligent healthcare and precision medicine. By establishing extensive collaborations with clinicians, doctors, radiologists and pathologists, we develop and/or leverage computational models to help unravel the mechanisms of pathogenesis and tumorigenesis, and to support disease diagnosis and prognosis, treatment design and drug discovery.
- S. Wan and M. W. Mak, “Machine Learning for Protein Subcellular Localization Prediction”, ISBN 978-1-5015-0150-0, published by De Gruyter, Germany, 2015.
- S. Wan, Y. Fan, C. Jiang and S. Li, “Bioinformatics and Machine Learning for Cancer Biology”, published by MDPI, ISBN 978-3-0365-4814-2, Switzerland, 2022. (Edited Book)
- S. Wan, J. Kim and K. J. Won, “SHARP: Hyper-Fast and Accurate Processing of Single-Cell RNA-seq Data via Ensemble Random Projection”, Genome Research, 2020, vol. 30, pp. 205-213.
- S. Wan*, C. Jiang, S. Li and Y. Fan, “Special Issue on Bioinformatics and Machine Learning for Cancer Biology”, Biology, 2022, vol. 11, no. 3, 361.
- S. Wan* and J. Wang*, “A Sequence Obfuscation Method for Protecting Personal Genomic Privacy”, Frontiers in Genetics, 2022, vol. 13, article 876686.
- T. Sakamoto, K. Batmanov, S. Wan, Y. Guo, L. Lai, R. B. Vega and D. P. Kelly, “The Nuclear Receptor ERR Cooperates with the Cardiogenic Factor GATA4 to Orchestrate Transcriptional Control of Cardiomyocyte Differentiation”, Nature Communications, 2022, vol. 13, no. 1991, pp. 1-20.
- R. Wang, X. Zheng, J. Wang, S. Wan, M. H. Wong, K. S. Leung and L. Cheng, “Improving Bulk RNA-seq Classification by Transferring Gene Signature from Single Cells in Acute Myeloid Leukemia”, Briefings in Bioinformatics, 2022, vol. 23, no. 2, bbac002.
- V. Honnell, J. Norrie, A. Patel, C. Ramirez, J. Zhang, K. Lai, S. Wan and M. A. Dyer, "Identification of a Modular Super-Enhancer in Murine Retinal Development", Nature Communications, 2022, vol. 13, no. 253, pp. 1-13.
- W. Qi, W. Rosikiewicz, Z. Yin, B. Xu, H. Jiang, S. Wan, Y. Fan, G. Wu and L. Wang, “Genomic Profiling Identifies Genes and Pathways Dysregulated by HEY1-NCOA2 Fusion and Shed a Light on Mesenchymal Chondrosarcoma Tumorigenesis”, Journal of Pathology, 2022, vol. 257, no. 5, pp. 579-592.
- P. C. Chen, X. Han, T. Shaw, H. Sun, M. Niu, Z. Wang, Y. Jiao, B. Teubner, D. Eddins, L. Beloate, B. Bai, J. Mertz, Y. Li, Y. Fu, J. H. Cho, X. Wang, Z. Wu, S. Poudel, Z. F. Yuan, A. Mancieri, J. Low, H. M. Lee, M. Patton, L. Earls, E. Stewart, P. Vogel, S. Wan, G. Serrano, T. Beach, M. Dyer, R. Smeyne, T. Moldoveanu, T. Chen, G. Wu, S. Zakharenko, G. Yu and J. Peng, “Alzheimer’s Disease-Associated U1 snRNP Splicing Dysfunction Causes Neuronal Hyperexcitability and Cognitive Impairment”, Nature Aging, 2022, vol. 2, pp. 923-940.
- C. Jiang, S. Wan, P. Hu, Y. Li and S. Li, “Editorial: Transcriptional Regulation in Metabolism and Immunology”, Frontiers in Genetics, 2022, vol. 13, article 845697.
- S. Singh, W. Quarni, M. Goralski, S. Wan, H. Jin, L. A. Van de Velde, J. Fang, R. Sing, Y. Fan, M. Johnson, W. Akers, P. Murray, P. G. Thomas, D. Nijhawan, A. M. Davidoff and J. Yang, “Targeting the Spliceosome through RBM39 Degradation Results in Exceptional Responses in High-Risk Neuroblastoma Models”, Science Advances, 2021, vol. 7, no. 47, eabj5405.
- A. Lavado, R. Gangwar, J. Pare, S. Wan, Y. Fan and X. Cao, “YAP/TAZ Maintain the Proliferative Capacity and Structural Organization of Radial Glial Cells during Brain Development”, Developmental Biology, 2021, vol. 480, pp. 39-49.
- T. Sakamoto, T. Matsuura, S. Wan, D. Ryba, J. Kim, K. J. Won, L. Lai, C. Petucci, N. Petrenko, K. Musunuru, R. Vega and D. Kelly, “A Critical Role for Estrogen-Related Receptor Signaling in Cardiac Maturation”, Circulation Research, 2020, vol. 126, pp. 1685-1702.
- B. Ahn, S. Wan, N. Jaiswal, R. Vega, D. E. Ayer, P. M. Titchenell, X. Han, K. J. Won, and D. P. Kelly, “MondoA Coordinately Drives Muscle Lipid Accumulation and Insulin Resistance”, JCI Insight, 2019, 4(15): e129119.
- S. Wan* and M. W. Mak*, “Predicting Subcellular Localization of Multi-Location Proteins by Improving Support Vector Machines with Adaptive-Decision Schemes”, International Journal of Machine Learning and Cybernetics, 2018, vol. 9, pp. 399–411.
- S. Wan*, M. W. Mak*, and S. Y. Kung, "FUEL-mLoc: Feature-Unified Prediction and Explanation of Multi-Localization of Cellular Proteins in Multiple Organisms", Bioinformatics, 2017, vol. 33, no. 5, pp. 749–750.
- S. Wan*, M. W. Mak*, and S. Y. Kung, “Transductive Learning for Multi-Label Protein Subchloroplast Localization Prediction”, IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2017, vol. 14, pp. 212–224.
- S. Wan*, M. W. Mak*, and S. Y. Kung, "Gram-LocEN: Interpretable Prediction of Multi-Location Gram-Positive and Gram-Negative Bacterial Protein Subcellular Localization", Chemometrics and Intelligent Laboratory Systems, 2017, vol. 162, pp. 1–9.
- J. Q. Wang, C. C. Zhang, S. Wan and G. Peng, "Is Congenital Amusia a Connectome Disorder?: A Diffusion MRI Study Combining Tract- and Network-Based Analysis", Frontiers in Human Neuroscience, 2017, vol. 11, article 473. doi: 10.3389/fnhum.2017.00473.
- S. Wan*, M. W. Mak*, and S. Y. Kung, "Ensemble Linear Neighbourhood Propagation for Predicting Subchloroplast Localization of Multi-Location Proteins", Journal of Proteome Research, 2016, vol. 15, pp. 4755–4762.
- S. Wan*, M. W. Mak*, and S. Y. Kung, “Sparse Regressions for Predicting and Interpreting Subcellular Localization of Multi-Label Proteins”, BMC Bioinformatics, 2016, 17:97.
- S. Wan*, M. W. Mak*, and S. Y. Kung, "Mem-mEN: Predicting Multi-Functional Types of Membrane Proteins by Interpretable Elastic Nets", IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2016, vol. 13, pp. 706–718.
- S. Wan*, M. W. Mak*, and S. Y. Kung, "Benchmark Data for Identifying Multi-Functional Types of Membrane Proteins", Data in Brief, 2016, vol. 8, pp. 105–107.
- S. Wan*, M. W. Mak*, and S. Y. Kung, "Mem-ADSVM: A Two-Layer Multi-Label Predictor for Identifying Multi-Functional Types of Membrane Proteins", Journal of Theoretical Biology, 2016, vol. 398, pp. 32–42.
- S. Wan*, M. W. Mak, and S. Y. Kung, "mLASSO-Hum: A LASSO-Based Interpretable Human-Protein Subcellular Localization Predictor", Journal of Theoretical Biology, 2015, vol. 382, pp. 223–234.
- S. Wan, M. W. Mak, and S. Y. Kung, “mPLR-Loc: An Adaptive-Decision Multi-Label Classifier Based on Penalized Logistic Regression for Protein Subcellular Localization Prediction”, Analytical Biochemistry, 2015, vol. 473, pp. 14–27.
- S. Wan, M. W. Mak, and S. Y. Kung, "HybridGO-Loc: Mining Hybrid Features on Gene Ontology for Predicting Subcellular Localization of Multi-Location Proteins", PLoS ONE, 2014, 9(3): e89545.
- S. Wan, M. W. Mak, and S. Y. Kung, “R3P-Loc: A Compact Multi-Label Predictor Using Ridge Regression and Random Projection for Protein Subcellular Localization”, Journal of Theoretical Biology, 2014, vol. 360, pp. 34–45.
- S. Wan, M. W. Mak, and S. Y. Kung, "Semantic Similarity over Gene Ontology for Multi-Label Protein Subcellular Localization", Engineering, 2013, vol. 5, pp. 68-72.
- S. Wan, M. W. Mak, and S. Y. Kung, "GOASVM: A Subcellular Location Predictor by Incorporating Term-Frequency Gene Ontology into the General Form of Chou’s Pseudo-Amino Acid Composition", Journal of Theoretical Biology, 2013, vol. 323, pp. 40–48.
- S. Wan, M. W. Mak, and S. Y. Kung, "mGOASVM: Multi-Label Protein Subcellular Localization Based on Gene Ontology and Support Vector Machines", BMC Bioinformatics, 2012, 13:290.