※ Computational resources of protein N-phosphorylation
Introduction:
With the recent development of tools for studying histidine, arginine and lysine phosphorylation, N-phosphorylation has begun to emerge as an important new research area. A recent report suggests that N-phosphorylation plays a key role in cell signaling, and that these residues may be phosphorylated as frequently as serine, threonine and tyrosine (Leijten, et al., 2022). Identifying N-phosphorylated proteins is fundamental to understanding the molecular mechanisms of N-phosphorylation. Besides experimental approaches, computational prediction of potential candidates has also attracted great attention for its convenience and speed. In this review, we present a comprehensive but brief summary of computational resources on protein N-phosphorylation, including N-phosphorylation databases, predictors of N-phosphorylation sites and other tools.
We apologize for not including computational studies that provide no databases or tools, since such studies are not easy for experimentalists to use directly. We are grateful for user feedback. Please contact Dr. Yu Xue or Weizhi Zhang to add, remove or update one or more of the web links below.
Index:
<1> N-Phosphorylation databases
<2> Prediction of N-phosphorylation sites
<3> Structure databases
<4> Gene expression databases
<5> Cancer genomic databases
<6> Implemented tools
<7> Miscellaneous tools
<8> Detection of potential phosphorylation sites from mass spectrometry data
<1> N-Phosphorylation databases:
1. HisPhosSite: Contains 13,505 phosphorylated proteins with 16,154 pHis sites derived from 1,374 species (Zhao, et al., 2021).
2. PubMed: Comprises more than 30 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.
<2> Prediction of N-phosphorylation sites:
1. pHisPred: Predicts histidine phosphorylation sites in eukaryotic and prokaryotic proteins (Zhao, et al., 2022).
2. PROSPECT: A web server for predicting protein histidine phosphorylation sites based on a hybrid method (Chen, et al., 2020).
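Sequence-based site predictors generally score a fixed-length peptide window centred on each candidate residue. The sketch below illustrates only that generic preprocessing step for histidine; it is not the pipeline of pHisPred or PROSPECT. The function name `histidine_windows`, the flank length and the padding character `X` are assumptions chosen for illustration.

```python
def histidine_windows(sequence, flank=7, pad="X"):
    """Return (position, window) pairs for every histidine in `sequence`.

    Positions are 1-based. Each window has length 2 * flank + 1 and is
    centred on the histidine; termini are padded with `pad` (an assumed
    placeholder character, not a convention taken from any listed tool).
    """
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for i, residue in enumerate(seq):
        if residue == "H":
            # Residue i of `seq` sits at index i + flank in `padded`,
            # so the centred window starts at index i.
            windows.append((i + 1, padded[i:i + 2 * flank + 1]))
    return windows


if __name__ == "__main__":
    # Hypothetical toy sequence, used only to show the call.
    for position, window in histidine_windows("MKHLLAHGT", flank=3):
        print(position, window)
```

Each window could then be encoded (for example, one-hot or by physicochemical properties) and passed to whatever classifier a given predictor uses.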
<3> Structure databases:
1. UniProt: For each protein entry, the "Amino acid modifications" part of the "Sequence annotation (Features)" section collects post-translational modification information (UniProt Consortium, et al., 2021).
2. PDB: A leading resource of structural data on biological macromolecules (Berman, et al., 2000).
3. AlphaFold2: An artificial intelligence (AI) system developed by DeepMind that can predict three-dimensional (3D) structures of proteins from amino acid sequences with atomic-level accuracy (Jumper, et al., 2021).
<4> Gene expression databases:
1. HPA: A Swedish-based program initiated in 2003 with the aim of mapping all human proteins in cells, tissues, and organs by integrating various omics technologies, including antibody-based imaging, mass spectrometry-based proteomics, transcriptomics, and systems biology (Uhlén, et al., 2015).
2. DICE: A database that contains reference transcriptomic and epigenomic maps in human immune cell types from healthy subjects, functional single nucleotide polymorphisms (SNPs) that affect gene expression (eQTLs and chromatin QTLs) in immune cells, and regulatory mechanisms and novel target genes implicated in the development of human disease (Schmiedel, et al., 2018).
<5> Cancer genomic databases:
1. TCGA: A landmark cancer genomics program that molecularly characterized over 20,000 primary cancer and matched normal samples spanning 33 cancer types (Liu, et al., 2022).
2. ICGC: A collaborative effort to characterize genomic abnormalities in 50 different cancer types (Zhang, et al., 2019).
3. COSMIC: The world's largest and most comprehensive resource for exploring the impact of somatic mutations in human cancer (Tate, et al., 2022).
<6> Implemented tools:
1. ECharts: An open-source JavaScript visualization library.
2. IUPred: A web server that takes a single amino acid sequence as input and calculates the pairwise energy profile along the sequence (Dosztányi, et al., 2021).
3. 3Dmol.js: A modern, object-oriented JavaScript library for visualizing molecular data.
4. NetSurfP: A tool for predicting solvent accessibility, secondary structure, structural disorder and backbone dihedral angles for each residue of an amino acid sequence (Marcatili, et al., 2022).
<7> Miscellaneous tools:
1. iLearnPlus: A comprehensive and automated machine-learning platform for nucleic acid and protein sequence analysis, prediction and visualization (Chen, et al., 2021).
2. DOG 2.0: Prepares publication-quality figures of protein domain structures. The scale of a protein domain and the position of a functional motif/site will be precisely calculated (Ren, et al., 2009).
3. HemI 1.0: An easy-to-use tool that visualizes either gene or protein expression data as heatmaps. The heatmaps can be recolored, rescaled or rotated in a customized manner. In addition, HemI provides multiple clustering strategies for analyzing the data. Publication-quality figures can be exported directly (Deng, et al., 2014).
4. PhosSNP 1.0: A genome-wide analysis of genetic polymorphisms that influence protein phosphorylation in H. sapiens. An estimated ~69.76% of non-synonymous SNPs (nsSNPs) are potential phosphorylation-related SNPs (phosSNPs): 64,035 in 17,614 proteins (Ren, et al., 2010).
5. dbPPT 1.0: A comprehensive resource of plant protein phosphorylation that contains 82,175 phosphorylation sites in 31,012 proteins from 20 plant organisms. The phosphorylation sites in dbPPT were manually curated from the literature, and datasets from other public databases were also integrated (Cheng, et al., 2014).
6. EPSD: A comprehensive data resource updated from two databases, dbPPT and dbPAF, which contained 82,175 p-sites from 20 plants and 483,001 p-sites from 7 animals and fungi, respectively (Lin, et al., 2020).
<8> Detection of potential phosphorylation sites from mass spectrometry data:
1. PhosphoScore: A phosphorylation assignment program that is compatible with all levels of tandem mass spectrometry spectra (MSn) generated through the Bioworks/Sequest platform. The program utilizes a "cost function" which takes into account both the match quality and normalized intensity of observed spectral peaks compared to a theoretical spectrum. PhosphoScore was written in Java (Ruttenberg, et al., 2008).
2. Ascore: Measures the probability of correct phosphorylation site localization based on the presence and intensity of site-determining ions in MS/MS spectra (Beausoleil, et al., 2006).
3. Colander: A probability-based support vector machine algorithm for automatically screening CID spectra of phosphopeptides prior to database search (Lu, et al., 2008).
4. DeBunker: An SVM-based program that automatically validates phosphopeptide identifications from tandem mass spectra (Lu, et al., 2007).
5. APIVASE 2.2: Developed for phosphopeptide validation by combining the information obtained from MS2 spectra and their corresponding neutral loss MS3 spectra (Jiang, et al., 2008).
6. InsPecT: Provides a scoring function developed for tandem mass spectra of phosphorylated peptides from ion-trap instruments, without the need for manual validation (Payne, et al., 2008).
7. ArMone: A stand-alone application with a friendly graphical user interface, developed for analyzing tandem mass spectra of phosphorylated peptides (Jiang, et al., 2010).
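
To make the idea of probability-based site localization in the Ascore entry above more concrete, the following is a simplified sketch of a binomial score built from site-determining ions. It is not the published Ascore implementation. It assumes that each of `n_site_ions` site-determining ions matches a peak by chance with probability `p` (for instance, the number of peaks retained per 100 m/z window divided by 100), and it reports the score difference between the two best candidate sites. All function and parameter names are hypothetical.

```python
import math


def binomial_tail(n_trials, n_matched, p):
    """P(X >= n_matched) for X ~ Binomial(n_trials, p)."""
    return sum(
        math.comb(n_trials, k) * p ** k * (1 - p) ** (n_trials - k)
        for k in range(n_matched, n_trials + 1)
    )


def localization_score(n_site_ions, n_matched, p):
    """-10 * log10 of the chance of matching at least n_matched ions."""
    if not 0 < p < 1:
        raise ValueError("p must lie strictly between 0 and 1")
    if not 0 <= n_matched <= n_site_ions:
        raise ValueError("n_matched must be between 0 and n_site_ions")
    return -10.0 * math.log10(binomial_tail(n_site_ions, n_matched, p))


def ascore_like(n_site_ions, matched_best, matched_second, p):
    """Score difference between the best and second-best candidate site."""
    return (localization_score(n_site_ions, matched_best, p)
            - localization_score(n_site_ions, matched_second, p))


if __name__ == "__main__":
    # Hypothetical counts: 8 site-determining ions, 6 matched for the best
    # candidate site and 2 for the runner-up, with p = 0.05.
    print(round(ascore_like(8, 6, 2, 0.05), 1))
```

A larger difference means that the observed site-determining ions favour one candidate site more strongly over the other; any acceptance threshold would have to be taken from the original publication rather than from this sketch.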
