
Several computational tools for predicting protein ubiquitylation and SUMOylation sites have been proposed to study their regulatory roles in gene location, gene expression, and genome replication. However, existing methods generally rely on feature engineering and ignore the natural similarity between the two types of protein post-translational modification. This study presents the first all-in-one deep network that simultaneously predicts protein ubiquitylation and SUMOylation sites, as well as their crosstalk sites, from protein sequences. Our deep learning architecture integrates several meta classifiers that apply deep neural networks to protein sequence information and physico-chemical properties, and it is trained in multi-label classification mode to identify ubiquitylation, SUMOylation, and crosstalk sites simultaneously. On tenfold cross-validation, our method achieved AUCs of 0.838, 0.888, and 0.862 on ubiquitylation, SUMOylation, and crosstalk sites, respectively. The corresponding APs reached 0.683, 0.804, and 0.552, which further supports the effectiveness of the method. The proposed architecture classified ubiquitylated and SUMOylated lysine residues along with their crosstalk sites, and outperformed other well-known ubiquitylation and SUMOylation site prediction tools.

Ubiquitin is a small protein composed of 76 amino acids in eukaryotes. Through the catalytic action of an activating enzyme (E1), a conjugating enzyme (E2), and a ligase (E3), ubiquitin can be covalently attached to the lysine residues of target proteins. As a major member of the ubiquitin-like protein family, small ubiquitin-related modifier (SUMO) proteins have 3D structures and biological modification processes similar to those of ubiquitin. Both are highly conserved in evolution and are related to diverse cellular activities including gene location, gene expression, and genome replication. However, numerous potential ubiquitylation and SUMOylation sites remain to be discovered from protein sequences. Since most ubiquitinated and SUMOylated proteins are short-lived proteins with poor stability, experimental approaches to identifying protein ubiquitylation and SUMOylation sites can be costly and time-consuming. Therefore, it is worthwhile to study computational approaches.

At present, several sequence-based approaches have been proposed to predict protein ubiquitylation and SUMOylation sites separately. UbiSite was developed using an efficient radial basis function (RBF) network to identify protein ubiquitylation sites. UbiProber extracts a set of features, including physico-chemical properties (PCP) and amino acid composition (AAC), for ubiquitylation site prediction. UbPred is a random-forest-based predictor in which 586 sequence attributes were detected from the input features. GPS-sumo employed a group-based prediction system (GPS) with a similarity clustering strategy to identify SUMOylation sites. Another tool uses a scoring system based on a position frequency matrix. Then, pSumo-cd applied a covariance discriminant algorithm in combination with a pseudo amino acid composition model. A recent work, HseSUMO, employed only four half-sphere exposure-based features to predict SUMOylation sites. In addition to the individual prediction of ubiquitylation or SUMOylation sites, mUSP was proposed to predict their crosstalk. It treated these three types as three independent binary problems. However, these traditional machine learning methods employ feature engineering, which may lead to incomplete representations and biased results.

Deep learning, as a cutting-edge representation learning technique, enables the production of high-level semantic features without handcrafted design, and it has been widely applied to several PTM problems with large datasets. MusiteDeep is a deep learning predictor, based on convolutional neural networks, that predicts and visualizes protein post-translational modification sites. MUscADEL is a computational model based on the long short-term memory (LSTM) recurrent neural network. A protein ubiquitylation site prediction tool, deepUbi, was implemented in Matlab. Because Matlab is closed commercial software, the availability of deepUbi is limited.
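To make the multi-label formulation more concrete, the following minimal sketch shows one way lysine-centered sequence windows could be encoded with two labels per site, so that a crosstalk site is simply a site where both labels are 1. It is an illustration only, not the architecture or preprocessing used in this work: the window half-width, the padding symbol `X`, the function names, and the toy annotations are all assumptions introduced here.

```python
from typing import List, Set, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
PAD = "X"  # hypothetical padding symbol for windows that run past the sequence ends


def lysine_windows(sequence: str, half_width: int = 10) -> List[Tuple[int, str]]:
    """Return (1-based position, window) for every lysine (K) in the sequence."""
    padded = PAD * half_width + sequence + PAD * half_width
    windows = []
    for i, residue in enumerate(sequence):
        if residue == "K":
            # In `padded`, residue i sits at index i + half_width, so the
            # window centered on it is padded[i : i + 2 * half_width + 1].
            windows.append((i + 1, padded[i : i + 2 * half_width + 1]))
    return windows


def one_hot(window: str) -> List[List[int]]:
    """Encode each residue as a 20-dim indicator vector; padding maps to all zeros."""
    return [[int(residue == aa) for aa in AMINO_ACIDS] for residue in window]


def multi_label(position: int, ubi_sites: Set[int], sumo_sites: Set[int]) -> Tuple[int, int]:
    """Two labels per site: (ubiquitylated, SUMOylated). Crosstalk means both are 1."""
    return (int(position in ubi_sites), int(position in sumo_sites))


# Toy example with made-up annotations (assumed for illustration, not real data).
toy_sequence = "MKTAYIAKQRK"
toy_ubi = {2, 8}
toy_sumo = {8, 11}

examples = [
    (pos, one_hot(win), multi_label(pos, toy_ubi, toy_sumo))
    for pos, win in lysine_windows(toy_sequence, half_width=3)
]
labels = {pos: label for pos, _, label in examples}
```

In this toy example, position 8 receives the label pair (1, 1), which is how a crosstalk site would appear under this encoding. A network with two sigmoid outputs could then be trained on such label pairs, instead of training three separate binary classifiers.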
