Wan Lab

Our laboratory focuses on artificial intelligence (AI), machine learning (ML), bioinformatics, and computational biology, especially in cancer research, intelligent healthcare, precision medicine, single cell analysis, multi-omics analysis, and spatial transcriptomics. Bioinformatics and machine learning are well suited to unraveling the mechanisms of molecular biological systems, which usually involve enormous amounts of heterogeneous data. Besides collaborating with scientists in cancer biology, metabolism, immunology, pathology and developmental biology, our laboratory mainly develops artificial intelligence, machine learning and/or data science based methods to tackle essential biomedical problems (e.g., cancer, neurological disorders, chronic diseases, and clinical informatics) by leveraging multi-omics data, including genomics, transcriptomics, epigenetics, proteomics, metabolomics, and interactomes, as well as medical imaging data and electronic health records (EHR) data.

Artificial intelligence and machine learning for cancer research
With the recent significant progress in large language models (LLMs) and foundation models, artificial intelligence (AI) and machine learning (ML) are revolutionizing cancer research in early detection, diagnosis, prognosis, treatment design, and personalized patient care. Our lab has developed a series of AI/ML and bioinformatics based approaches and tools to address essential topics for multiple types of pediatric and adult cancer (e.g., leukemia, medulloblastoma, breast cancer, lung cancer, prostate cancer, and pancreatic cancer), including reducing health disparities, cancer subtype identification, cancer molecular characterization, cancer diagnosis, prognosis and treatment response prediction. Our unique contribution in these areas is a set of tailor-designed machine learning approaches (including deep learning and large language models) that leverage multi-modal data to achieve prediction performance superior to that of other methods. We are seeking to bring our models into translational and clinical applications in biomedicine and cancer research.

Single cell analysis
Single-cell technologies are among the most essential and far-reaching technologies of recent decades, and they have been selected as “Method of the Year” by Nature Methods three times: single-cell sequencing for 2013, single-cell multimodal omics for 2019 and spatial transcriptomics for 2020. By profiling at the individual-cell level, single-cell sequencing enables researchers to characterize novel cell types and interrogate intra-population heterogeneity. Despite the rich information single cell data reveal, many challenges in single cell analysis remain to be addressed, including clustering, batch effect correction, multi-modal data integration, trajectory inference, and RNA velocity. Our laboratory has made significant contributions to the topic of clustering, where I proposed the first computational model (i.e., SHARP) that is capable of processing 10 million cells quickly and accurately for single-cell data analysis. We are also actively working on other cutting-edge topics in single cell analysis, including single cell multi-omics integration, batch correction, rare cell detection, cell type annotation, pseudotime analysis and RNA velocity.

Multi-omics analysis
The integration of multi-omics data, including genomics, transcriptomics, epigenetics, proteomics, metabolomics and interactomics in both bulk and single cell data, can contribute to pinpointing biomarkers of disease and physiology, and to deciphering mechanisms of associations among genotypes, phenotypes and envirotypes. Conventional models commonly analyze each type of omics data independently first and then leverage potential (yet perhaps weak) connections among them. These models rely heavily on manual intervention, and different analysts might arrive at different interpretations. To overcome these problems, our laboratory will develop computational models that learn automatically from heterogeneous multi-omics data. Candidate machine learning algorithms include multi-kernel learning, co-learning, multimodal representation, and joint representation. The first two categories of models simultaneously learn the informative features from the multi-omics data, whereas the latter two categories automate the representation of multi-omics data in the same feature space. For single-cell multi-omics data, we can also combine our existing machine learning models to deal with high-dimensionality and big-data problems.
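As a rough illustration of the multi-kernel idea mentioned above, the following minimal sketch builds one kernel per omics layer and combines them with data-driven weights. It is not the laboratory's method: the weighting rule (centered kernel-target alignment) is a simple heuristic chosen for brevity, the two "omics" matrices are synthetic, and all function and variable names (rbf_kernel, alignment, combine_kernels, expression, methylation) are hypothetical.

```python
import numpy as np

def rbf_kernel(X, gamma=None):
    """Gaussian (RBF) kernel matrix between the rows (samples) of X."""
    sq = np.sum(X ** 2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * (X @ X.T), 0.0)
    if gamma is None:
        gamma = 1.0 / X.shape[1]  # assumed default: 1 / number of features
    return np.exp(-gamma * d2)

def center_kernel(K):
    """Double-center a kernel matrix (remove row and column means)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n
    return H @ K @ H

def alignment(K, y):
    """Centered alignment between kernel K and labels y in {-1, +1}."""
    Kc = center_kernel(K)
    Yc = center_kernel(np.outer(y, y))
    return np.sum(Kc * Yc) / (np.linalg.norm(Kc) * np.linalg.norm(Yc))

def combine_kernels(kernels, y):
    """Weight each kernel by its (non-negative) alignment with the labels."""
    scores = np.array([max(alignment(K, y), 0.0) for K in kernels])
    if scores.sum() == 0.0:
        weights = np.full(len(kernels), 1.0 / len(kernels))
    else:
        weights = scores / scores.sum()
    combined = sum(w * K for w, K in zip(weights, kernels))
    return combined, weights

# Synthetic example: 60 samples, two classes, two omics layers.
rng = np.random.default_rng(0)
n = 60
y = np.repeat([-1.0, 1.0], n // 2)
# Hypothetical informative layer: class label shifts every feature.
expression = rng.normal(size=(n, 20)) + y[:, None]
# Hypothetical uninformative layer: pure noise.
methylation = rng.normal(size=(n, 15))

kernels = [rbf_kernel(expression), rbf_kernel(methylation)]
K, weights = combine_kernels(kernels, y)
print("kernel weights (expression, methylation):", weights)
```

The combined matrix K can be passed to any kernel method that accepts a precomputed kernel. In this toy setup the informative layer should receive most of the weight, which is the behavior that multi-kernel approaches aim for when one data type carries more signal than another.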

Spatial transcriptomics
In spatial transcriptomics, elucidating single-cell heterogeneity while retaining spatial information is crucial for understanding key aspects of cell development and differentiation, cell-cell interaction, tumorigenesis, and cancer progression. Spatial transcriptomics has also been used to determine the subcellular localization of mRNA molecules, which is closely related to my previous research on protein subcellular localization. More importantly, with the coupling of spatial transcriptomics and single cell sequencing, we anticipate simultaneous multimodal spatial profiling of the transcriptome, genome, and proteome. Given that spatial transcriptomics generates both sequencing and imaging data, we believe computational models can play significant roles in this field. Our laboratory leverages machine learning and bioinformatics methods to tackle these problems.

Machine learning for intelligent healthcare and precision medicine
With the explosion of heterogeneous and big data in biology and medicine, machine learning has become an essential tool for cancer research, disease diagnostics, therapeutic development, and drug discovery. In addition to omics data, our laboratory is also interested in tackling other basic, translational and clinical data in various formats (e.g., sequencing, electronic health records, and images) with artificial intelligence, machine learning and data science methods to facilitate intelligent healthcare and precision medicine (e.g., for neurological disorders and chronic diseases). By establishing extensive collaborations with clinicians, radiologists and pathologists, we develop and/or leverage computational models to help unravel the mechanisms of pathogenesis and tumorigenesis and to support disease diagnosis and prognosis, treatment design and drug discovery.

Full list of publications.

Principal Investigator

Shibiao Wan, PhD

Assistant Professor, Department of Genetics, Cell Biology, and Anatomy
Co-Director, Bioinformatics and Systems Biology PhD Program

Send Email