University of Nebraska Medical Center

Past Projects

Prediction of domain-domain interactions from protein-protein interactions

The vast majority of proteins must interact with other proteins to perform their intended functions. Proteins are made of functional modules known as domains, which form the interface of an interaction through highly specific recognition events. Thus, knowledge of domain-domain interactions (DDIs) is essential for understanding the nature and significance of protein-protein interactions (PPIs). Currently, the number of experimentally known DDIs is very small, which warrants the development of computational inference methods for predicting functionally significant DDIs.

We created a comprehensive, non-redundant dataset of 209,165 experimentally derived PPIs by combining datasets from five major interaction databases. We introduced an integrated scoring system that combines five orthogonal scoring features covering the probabilistic, evolutionary, evidence-based, spatial and functional properties of interacting domains, so that the interaction propensity of two domains is assessed along several dimensions. This method outperforms similar existing methods both in the accuracy of prediction and in the coverage of domain interaction space. We predicted a set of 52,492 high-confidence DDIs and used them for a cross-species comparison of DDI conservation in eight model species: human, mouse, Drosophila, C. elegans, yeast, Plasmodium, E. coli and Arabidopsis. Our results show that only 23% of these DDIs are conserved in at least two species and only 3.8% in at least four species, indicating rather low conservation across species. Pair-wise analysis of DDI conservation revealed a 'sliding conservation' pattern between evolutionarily neighboring species. Our methodology and the high-confidence DDI predictions generated in this study can help to better understand the functional significance of PPIs at the modular level, and can thus inform further experimental investigations in systems biology research.
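The conservation percentages quoted above are, in essence, counts of how many species each predicted DDI appears in. The following is a minimal sketch of that bookkeeping, not the pipeline used in the study: the function name `conservation_fractions`, the input layout and the toy domain pairs are all assumptions made for illustration.

```python
from collections import Counter


def conservation_fractions(ddis_by_species, thresholds=(2, 4)):
    """Return {k: fraction of distinct DDIs observed in at least k species}.

    ddis_by_species maps a species name to an iterable of (domain_a, domain_b)
    pairs. This layout is a hypothetical one chosen for the sketch.
    """
    species_count = Counter()
    for ddis in ddis_by_species.values():
        # Treat (A, B) and (B, A) as the same interaction, and count each
        # interaction at most once per species.
        for pair in {frozenset(p) for p in ddis}:
            species_count[pair] += 1

    total = len(species_count)
    if total == 0:
        return {k: 0.0 for k in thresholds}
    return {
        k: sum(1 for c in species_count.values() if c >= k) / total
        for k in thresholds
    }


# Toy input (illustrative only, not data from the study).
toy = {
    "human": [("SH3", "Pkinase"), ("SH2", "Pkinase"), ("PDZ", "PDZ")],
    "mouse": [("Pkinase", "SH3"), ("SH2", "Pkinase")],
    "yeast": [("SH3", "Pkinase")],
    "E. coli": [("HATPase_c", "Response_reg")],
}
fractions = conservation_fractions(toy, thresholds=(2, 3))
print(fractions)
```

Storing each pair as a `frozenset` makes the comparison order-independent, which matters when the same interaction is reported with its domains swapped in different species.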

Published articles on this project:

  • Guda C, King BR, Pal LR, Guda P. A top-down approach to infer and compare domain-domain interactions across eight model organisms. PLoS ONE (2009) 4:e5096 [Pubmed]

Tracing the evolutionary origin of functional modules in the human proteome

The functional repertoire of the human proteome is an incremental collection of functions accomplished by protein domains that evolved along the Homo sapiens lineage. Therefore, knowledge of the origin of these functionalities provides a better understanding of domain and protein evolution in humans. This study reports a unique approach for understanding the evolution of the human proteome by tracing the origin of its constituent domains hierarchically along the Homo sapiens lineage. The uniqueness of this method lies in the subtractive searching of functional and conserved domains in the human proteome, which results in higher efficiency in detecting their origins. From these analyses, the nature of protein evolution and trends in domain evolution can be observed in the context of the entire human proteome. The method also helps delineate the degree of divergence of functional families that occurred during the course of evolution.

This approach to tracing the evolutionary origin of functional domains in the human proteome facilitates a better understanding of their functional versatility and provides insights into the functionality of hypothetical proteins present in the human proteome. This work elucidates the origin of functional and conserved domains in human proteins, their distribution along the Homo sapiens lineage, the occurrence frequency of different domain combinations and the proteome-wide patterns of their distribution, providing insights into the evolutionary solution to the increased complexity of the human proteome.
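The idea of subtractive searching along a lineage can be sketched as follows. This is only an illustration under stated assumptions: the direction of the walk (most ancient level first), the function name `trace_domain_origins`, the taxonomic levels chosen and the domain identifiers are all hypothetical and are not taken from the published method.

```python
def trace_domain_origins(human_domains, lineage_levels):
    """Assign each human domain to the first lineage level where it is found.

    lineage_levels is an ordered list of (level_name, domains_found_at_level),
    assumed here to run from the most ancient level towards human. Domains
    assigned at one level are removed from the pool before the next level is
    searched, which is the "subtractive" step.
    """
    remaining = set(human_domains)
    origins = {}
    for level_name, domains_at_level in lineage_levels:
        found = remaining & set(domains_at_level)
        for domain in found:
            origins[domain] = level_name
        remaining -= found  # later levels search a smaller pool
    for domain in remaining:
        origins[domain] = None  # no origin detected at any listed level
    return origins


# Toy input (hypothetical domain identifiers and occurrences).
levels = [
    ("cellular organisms", {"D1", "D2"}),
    ("Eukaryota", {"D1", "D2", "D3"}),
    ("Metazoa", {"D1", "D3", "D4"}),
    ("Mammalia", {"D4", "D5"}),
]
origins = trace_domain_origins({"D1", "D2", "D3", "D4", "D5", "D6"}, levels)
print(origins)
```

Because assigned domains leave the pool, each successive search handles fewer queries, which is one plausible reading of the efficiency gain described above.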

Published articles on this project:

  • Pal LR, Guda C. Tracing the origin of functional and conserved domains in the human proteome: implications for protein evolution at the modular level. BMC Evolutionary Biology (2006) 6:91 [Pubmed]

Motif recognition in voltage-gated ion channel proteins

Voltage-gated ion channels (VGCs) mediate selective diffusion of ions across cell membranes to enable many vital cellular processes. Three-dimensional structural data are largely lacking for VGC proteins because these mostly hydrophobic transmembrane proteins are difficult to crystallize. Therefore, to better understand their function, conserved patterns need to be identified using sequence analysis methods. VGC proteins assemble as functional tetramers from four monomer subunits in K+ ion channels or from four repeats of a single polypeptide in the Ca2+ and Na+ channel sub-families. For Ca2+ and Na+ channel proteins, we generated profiles for each repeat and created profile-to-profile alignments for all repeats using a phylogenetic guide tree built from the consensus sequences of the repeats. In this study, we identified several new conserved patterns specific to each transmembrane segment (TMS) of the voltage-sensing and the pore-forming modules in each sub-family. For Ca2+ and Na+ channels, the functional theme of pattern conservation is similar in almost all segments, but it differs from that of the K+ channel proteins, except in the S4 segment of the voltage-sensing module. For each sub-family, we also identified residues that are at least 50% conserved in each TMS, along with their biological significance and disease associations in humans.
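Finding residues conserved at 50% or more amounts to a per-column frequency count over an alignment. The sketch below shows one way to do this; the function name `conserved_positions`, the choice to count gaps against conservation and the toy alignment are assumptions for illustration, not details from the published analysis.

```python
from collections import Counter


def conserved_positions(aligned_seqs, min_fraction=0.5, gap="-"):
    """Return [(column, residue, fraction)] for sufficiently conserved columns.

    The fraction is computed over all sequences, so gaps count against
    conservation (an assumption made for this sketch).
    """
    if not aligned_seqs:
        return []
    length = len(aligned_seqs[0])
    if any(len(seq) != length for seq in aligned_seqs):
        raise ValueError("all aligned sequences must have the same length")

    n_seqs = len(aligned_seqs)
    hits = []
    for col in range(length):
        residues = Counter(seq[col] for seq in aligned_seqs if seq[col] != gap)
        if not residues:
            continue  # column contains only gaps
        residue, count = residues.most_common(1)[0]
        fraction = count / n_seqs
        if fraction >= min_fraction:
            hits.append((col, residue, fraction))
    return hits


# Toy alignment (made-up sequences, not real channel segments).
toy_alignment = ["RLARK", "RVGRK", "RIS-E", "KMTRD"]
hits = conserved_positions(toy_alignment)
print(hits)
```

Columns are reported with zero-based indices; mapping them back to residue numbers in a specific channel protein would need the ungapped position of that sequence.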

Published articles on this project:

  • Guda P, Bourne PE, Guda C. Conserved motifs in voltage-sensing and pore-forming modules of voltage-gated ion channel proteins. Biochem. Biophys. Res. Commun. (2007) 352:292-298

Alignment of multiple protein structures using Monte Carlo optimization

A global and comprehensive study of protein structures is possible only by comparing multiple structures and investigating their folding similarities and evolutionary relationships. With the availability of vast amounts of structural information, accurate and fully automated structural alignment algorithms are needed for a better understanding of sequence-structure-function relationships in proteins. Here, we present a new algorithm for the alignment of multiple protein structures using a Monte Carlo optimization method. The algorithm uses pair-wise structural alignments as a starting point. Four different types of moves were designed to generate random changes in the alignment. A distance-based score is calculated for each trial move, and moves are accepted or rejected based on the improvement in the alignment score until the alignment has converged. Initial tests on 66 protein structural families show promising results: the score increases by 69% on average. The increase in score is accompanied by a 12% increase in the number of residue positions incorporated into the alignment. Two specific families, protein kinases and aspartic proteinases, were tested and compared against curated alignments from HOMSTRAD and manual alignments. The algorithm improved the overall number of aligned residues while preserving key catalytic residues.
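The accept-or-reject loop described above can be sketched generically. The code below is a simplified illustration, not the published algorithm: it accepts a trial move only when the score improves and stops after a fixed number of consecutive rejections, and the two toy moves and the toy score merely stand in for the four alignment moves and the distance-based score, which are not detailed here. All names (`monte_carlo_optimize`, `patience`, `shift_down`, `shift_up`, `toy_score`) are hypothetical.

```python
import random


def monte_carlo_optimize(state, score_fn, moves, max_trials=10000,
                         patience=500, seed=0):
    """Randomly perturb a state, keeping only changes that raise the score.

    Stops when `patience` consecutive trial moves fail to improve the score
    (treated here as convergence) or after `max_trials` trials.
    """
    rng = random.Random(seed)
    best_score = score_fn(state)
    stale = 0
    for _ in range(max_trials):
        move = rng.choice(moves)          # pick a move type at random
        trial = move(state, rng)          # moves return a new state
        trial_score = score_fn(trial)
        if trial_score > best_score:      # accept only improvements
            state, best_score = trial, trial_score
            stale = 0
        else:
            stale += 1
            if stale >= patience:
                break
    return state, best_score


# Toy problem: bring per-structure offsets into agreement.
def shift_down(offsets, rng):
    i = rng.randrange(len(offsets))
    return offsets[:i] + [offsets[i] - 1] + offsets[i + 1:]


def shift_up(offsets, rng):
    i = rng.randrange(len(offsets))
    return offsets[:i] + [offsets[i] + 1] + offsets[i + 1:]


def toy_score(offsets):
    # Higher (closer to zero) when the offsets agree with one another.
    return -sum(abs(a - b) for a in offsets for b in offsets)


start = [3, -2, 0, 5]
final, final_score = monte_carlo_optimize(start, toy_score, [shift_down, shift_up])
print(final, final_score)
```

A strictly greedy acceptance rule is the simplest choice; a Metropolis-style rule that occasionally accepts worse moves would be a natural variation when the score landscape has local optima.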

Published articles on this project:

  • Guda C, Pal LR, Shindyalov IN. DMAPS: A Database of Multiple Alignments for Protein Structures. Nucleic Acids Research (2006) 34:D273-276 [Pubmed]
  • Guda C, Lu S, Scheeff E, Bourne PE, Shindyalov IN. CE-MC: A Multiple Protein Structure Alignment Server. Nucleic Acids Research (2004) 32:W100-W103 [Pubmed]
  • Guda C, Scheeff ED, Bourne PE, Shindyalov IN. A new algorithm for the alignment of multiple protein structures using Monte Carlo optimization. Proceedings of the Pacific Symposium on Biocomputing (2001), pp. 275-286 [Pubmed]

Comparative genome analysis

We developed computational methods for the comparative analysis of complete chloroplast genomes of solanaceous crop species and grass species. Specifically, we analyzed the intergenic spacer regions of these genomes in an all-against-all fashion to identify their similarities and differences.
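An all-against-all comparison simply visits every pair of genomes and compares the spacer regions they share. The sketch below illustrates the pairing logic only: the similarity measure (`difflib.SequenceMatcher.ratio` from the Python standard library) is a stand-in, not the method used in the published analyses, and the function name, genome labels and sequences are made up.

```python
from difflib import SequenceMatcher
from itertools import combinations


def all_against_all(spacers_by_genome):
    """Compare shared spacer regions for every pair of genomes.

    spacers_by_genome maps genome name -> {spacer name -> sequence}.
    Returns {(genome_1, genome_2): {spacer name: similarity in [0, 1]}}.
    """
    results = {}
    for g1, g2 in combinations(sorted(spacers_by_genome), 2):
        first, second = spacers_by_genome[g1], spacers_by_genome[g2]
        shared = first.keys() & second.keys()
        results[(g1, g2)] = {
            name: SequenceMatcher(None, first[name], second[name],
                                  autojunk=False).ratio()
            for name in sorted(shared)
        }
    return results


# Toy input (sequences are invented for illustration).
toy_spacers = {
    "genome_A": {"trnH-psbA": "ATTTGACCATTA", "rbcL-accD": "GGCATTA"},
    "genome_B": {"trnH-psbA": "ATTTGACCATTA", "rbcL-accD": "GGCTTTA"},
    "genome_C": {"trnH-psbA": "ATTAGACCTTTA"},
}
comparisons = all_against_all(toy_spacers)
for pair, scores in comparisons.items():
    print(pair, scores)
```

For real spacer regions, an alignment-based percent identity would be more appropriate than a generic string-similarity ratio.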

Published articles on this project:

  • Saski C, Lee SB, Fjellheim S, Guda C, Jansen RK, Tomkins J, Rognli OA, Daniell H, Clarke JL. Complete chloroplast genome sequences of Hordeum vulgare, Sorghum bicolor and Agrostis stolonifera, and comparative analyses with other grass genomes. Theoretical and Applied Genetics (2007) 115:571-590 [Pubmed]
  • Daniell H, Lee SB, Grevich J, Saski C, Guda C, Tomkins J, Jansen RK. Complete chloroplast genome sequence of Solanum tuberosum, Lycopersicon esculentum and comparative analyses with other Solanaceous genomes. Theoretical and Applied Genetics (2006) 112:1503-1518 [Pubmed]

Phylogenetic analysis of proteins based on domain structure

The Rho-family small GTPases are important regulators of multiple cellular activities, most notably reorganization of the actin cytoskeleton. Dbl-homology (DH)-domain-containing proteins are the classical guanine nucleotide exchange factors (GEFs) responsible for activation of Rho GTPases. However, members of a newly discovered family can also act as Rho-GEFs. These CZH proteins include CDM (Ced-5, Dock180 and Myoblast city) proteins, which activate Rac, and zizimin proteins, which activate Cdc42. The family contains 11 mammalian proteins and has members in many other eukaryotes. The GEF activity is carried out by a novel, DH-unrelated domain named the DOCKER, CZH2 or DHR2 domain. CZH proteins have been implicated in cell migration, phagocytosis of apoptotic cells, T-cell activation and neurite outgrowth, and probably arose relatively early in eukaryotic evolution.

Published articles on this project:

  • Meller N, Merlot S, Guda C. CZH proteins: a new family of Rho-GEFs. Journal of Cell Science (2005) 118:4937-4946 [Pubmed]
  • Meller N, Westbrook JM, Shannon JD, Guda C, Schwartz MA. Function of the N-terminus of zizimin1: autoinhibition and membrane targeting. Biochemical Journal (2008) 409:525-533 [Pubmed]