
Final-Year Theses
Institution: Université de M’Sila - Mohamed Boudiaf
Affiliation: Institut d’Informatique
Author: YOUCEF, Gheraibia
Thesis supervisor: Moussaoui Abdelouahab (Maître de conférences)
Field: Computer Science
Degree: Magister
Title: Méthode hybride pour la prédiction des structures secondaires des protéines (Hybrid method for predicting protein secondary structures)
Keywords: secondary structure prediction, proteins, amino acids, genetic algorithm, naive Bayes classifier, KNN

Abstract

Predicting the secondary structure of a protein is an important step toward determining its three-dimensional structure and its function. This work describes a new method for protein secondary structure prediction based on data mining and machine learning techniques. Many methods have been developed to predict the secondary structure of a protein from its amino acid sequence; these methods reach an overall accuracy of up to 80%. Our aim in this work is to combine several methods in order to maximize prediction accuracy.

The work is divided into three parts. First, we predict the secondary structure of each amino acid individually with a naive Bayes classifier; this step is based on the preferences of amino acids for the different secondary structures. Second, we use an evolutionary algorithm to improve this prediction, based on the physicochemical properties of protein regions. Finally, we developed a fragment bank containing the protein fragments most frequently found in the Protein Data Bank (PDB), whose secondary structure is known from experimental methods; this step is based on sequence homology (sequence alignment), applied to a restricted dataset.

With our method we improved the best known prediction accuracy by 4.5%, reaching an accuracy of 85.89% on the Protein Data Bank (PDB).

Defense date: 02/07/2012
Pages: 101
Illustration:
Bound:
Format: 30 cm
Notes: one paper copy + one CD-ROM
Status: Defended
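The first stage described in the abstract (per-residue prediction with a naive Bayes classifier driven by amino acid preferences) can be illustrated with a minimal sketch. This is not the thesis implementation: the three-state alphabet H/E/C, the Laplace smoothing, the function names `train`, `predict` and `q3`, and the toy training sequences are all assumptions made for illustration only.

```python
import math
from collections import Counter, defaultdict

STATES = "HEC"                        # helix, strand, coil (assumed 3-state alphabet)
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def train(pairs, alpha=1.0):
    """Estimate log P(state) and log P(residue | state) by smoothed counting.

    pairs: iterable of (sequence, labels) strings of equal length.
    alpha: Laplace smoothing constant (an assumption, not from the record).
    """
    state_counts = Counter()
    emit_counts = defaultdict(Counter)
    for seq, labels in pairs:
        for aa, state in zip(seq, labels):
            state_counts[state] += 1
            emit_counts[state][aa] += 1

    total = sum(state_counts.values())
    log_prior = {
        s: math.log((state_counts[s] + alpha) / (total + alpha * len(STATES)))
        for s in STATES
    }
    log_like = {
        s: {
            aa: math.log(
                (emit_counts[s][aa] + alpha)
                / (state_counts[s] + alpha * len(AMINO_ACIDS))
            )
            for aa in AMINO_ACIDS
        }
        for s in STATES
    }
    return log_prior, log_like


def predict(seq, model):
    """Label each residue independently with its most probable state."""
    log_prior, log_like = model
    return "".join(
        # Unknown residue codes contribute 0.0, so only the prior decides.
        max(STATES, key=lambda s: log_prior[s] + log_like[s].get(aa, 0.0))
        for aa in seq
    )


def q3(predicted, observed):
    """Fraction of residues whose predicted state matches the observed one."""
    return sum(p == o for p, o in zip(predicted, observed)) / len(observed)


# Made-up toy data, only to make the sketch runnable.
TOY_TRAINING = [
    ("AAELLKK", "HHHHHHH"),
    ("VIVTYV", "EEEEEE"),
    ("GPGSNG", "CCCCCC"),
]

model = train(TOY_TRAINING)
prediction = predict("AELVIG", model)
print(prediction, q3(prediction, "HHHHEC"))
```

In this sketch each residue is classified on its own, which mirrors the abstract's description of the first stage; the later stages (evolutionary refinement and fragment lookup) would operate on the resulting label string and are not shown.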

