Effect of Benchmark Datasets on Protein Structure Prediction: A Concept Study

Knowing protein structures is essential for understanding the roles of proteins involved in vital functions, for drug design, and for many other purposes. Protein structure prediction is a bioinformatics subfield that offers a faster alternative to experimental structure determination, which takes a long time in the laboratory. Methods developed in this field are generally evaluated on benchmark datasets, and the size of these datasets directly affects algorithm runtime. This study analyzes how the choice of benchmark dataset is reflected in the results. Two benchmark datasets, CB513 and EVASet, and two protein structure prediction methods, JPred and Porter, were used. The study is intended as a source of inspiration for future work on benchmark datasets that are comprehensive in terms of protein properties yet contain as little data as possible.
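The central point above is that the same predictor can score differently depending on the benchmark it is evaluated on, while larger benchmarks cost more runtime. The small sketch below illustrates this. It assumes the common 3-state (H/E/C) secondary-structure encoding and pooled per-residue Q3 accuracy as the metric; the abstract does not state which metrics the study used. The function names `q3` and `residue_count`, the set names and the toy sequences are hypothetical and are not data from CB513, EVASet, JPred or Porter.

```python
"""Hypothetical sketch: score one predictor's output on two benchmark sets.

Assumption: each benchmark maps a protein ID to a pair of equal-length
strings over the 3-state alphabet {H, E, C}: (observed, predicted).
All names and sequences here are illustrative placeholders.
"""
from typing import Dict, Tuple

# protein ID -> (observed structure string, predicted structure string)
Benchmark = Dict[str, Tuple[str, str]]


def q3(benchmark: Benchmark) -> float:
    """Per-residue 3-state accuracy (Q3), pooled over all proteins in the set."""
    correct = 0
    total = 0
    for pid, (observed, predicted) in benchmark.items():
        if len(observed) != len(predicted):
            raise ValueError(f"length mismatch for {pid}")
        # Count residues whose predicted state matches the observed state.
        correct += sum(o == p for o, p in zip(observed, predicted))
        total += len(observed)
    if total == 0:
        raise ValueError("empty benchmark")
    return correct / total


def residue_count(benchmark: Benchmark) -> int:
    """Total residues in the set: a rough proxy for prediction runtime."""
    return sum(len(observed) for observed, _ in benchmark.values())


# Toy benchmarks (hypothetical): the same predictor evaluated on two sets.
set_a: Benchmark = {
    "p1": ("HHHHCCEEEC", "HHHCCCEEEC"),  # 9 of 10 residues correct
    "p2": ("CCHHHHHHCC", "CCHHHHHHCC"),  # 10 of 10 residues correct
}
set_b: Benchmark = {
    "p3": ("EEEECCCEEE", "CCEECCCHHE"),  # 6 of 10 residues correct
}

for name, bench in (("set_a", set_a), ("set_b", set_b)):
    print(f"{name}: Q3 = {q3(bench):.2f} over {residue_count(bench)} residues")
# set_a: Q3 = 0.95 over 20 residues
# set_b: Q3 = 0.60 over 10 residues
```

Pooling residues across proteins (rather than averaging per-protein scores) is one common convention; a per-protein average would weight short chains more heavily and can shift the ranking between benchmarks.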

Keywords:

Protein structure prediction, Benchmark dataset, Concept

