Abstract: Bacteriophages are increasingly considered as an alternative treatment for a variety of bacterial infections in cases where classical antibiotics have become ineffective. However, characterizing the host specificity of phages remains a labor- and time-intensive process. To alleviate this burden, we have developed a new machine-learning-based pipeline that predicts bacteriophage hosts from annotated receptor-binding protein (RBP) sequence data. We focus on predicting bacterial hosts from the ESKAPE group, Escherichia coli, Salmonella enterica and Clostridium difficile, and we compare the performance of our predictive model with that of the widely used Basic Local Alignment Search Tool (BLAST). Our best-performing predictive model reaches Precision-Recall Area Under the Curve (PR-AUC) scores between 73.6% and 93.8% for different levels of sequence similarity in the collected data. The model performs comparably to BLASTp when sequence similarity in the data is high and starts outperforming BLASTp when sequence similarity drops below 75%. Our machine learning methods can therefore be especially useful in settings in which sequence similarity to other known sequences is low. Predicting the hosts of novel metagenomic RBP sequences could extend our toolbox to tune the host spectrum of phages or phage tail-like bacteriocins by swapping RBPs.
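The abstract reports model quality as PR-AUC but does not say how that area is computed. As a rough illustration only, the sketch below computes average precision, one common step-wise estimate of the area under the precision-recall curve, for a single host class treated one-vs-rest. The function name `average_precision` and the one-vs-rest framing are assumptions made for this example and are not taken from the paper.

```python
from typing import Sequence


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Step-wise area under the precision-recall curve (average precision).

    labels: 1 if the RBP belongs to a phage infecting the host of interest, else 0.
    scores: model confidence for that host; higher means more confident.
    Hypothetical helper for illustration; not the authors' implementation.
    """
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    # Rank predictions from most to least confident.
    # Tied scores keep their input order; they are not grouped.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    true_pos = 0
    area = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            true_pos += 1
            # Each positive raises recall by 1/n_pos; weight that step
            # by the precision reached at this rank.
            area += (true_pos / rank) / n_pos
    return area
```

For instance, with the made-up inputs `average_precision([1, 0, 1, 0], [0.9, 0.8, 0.7, 0.1])`, the two positives sit at ranks 1 and 3, so the result is (1/1 + 2/3) / 2 = 5/6. A ranking that places every positive first gives 1.0.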
Authors: Boeckaerts, D; Stock, M; Criel, B; Gerstmans, H; De Baets, B; Briers, Y
Affiliations: Univ Ghent; Katholieke Univ Leuven
Journal: SCIENTIFIC REPORTS
Journal impact factor: 3.998
Publication year: 2021
Issue: 1
Title: Predicting bacteriophage hosts from receptor-binding protein sequences