Final-Year Theses
Institution
Université de Béjaia - Abderrahmane Mira
Affiliation
Department of Computer Science
Author
IKKEN, Sonia
Thesis supervisor
KECHADI, Mohand Tahar (Professor)
Field of study
Computer Science
Degree
Doctorate
Title
Cloud storage management for high performance datamining systems: Implementation and validation
Keywords
Cloud Computing; High Performance Data Mining; File Systems; Operating Systems; Computing Paradigm; Distributed Systems
Abstract
Most data mining systems developed so far target platforms such as grids, clusters, and distributed clusters. These platforms assume that processors are the scarce resource and must therefore be shared. Their main computing paradigm is that, when processors become available, new data is gathered and the computations can start again. In a distributed computing platform, the data is usually distributed among the processors. The computations are performed using a message passing or cloud services library, the results are gathered, and the process is repeated by moving new data to the processors. With this computing paradigm in mind, high performance data mining applications have been developed and implemented to take advantage of powerful but shared pools of processors. However, in data mining a good proportion of the response time is spent transferring the data to or near the processors. This project aims to improve the performance of data mining applications by reducing the cost of data transfer, since the data can be very large and network latency is extremely high relative to processor speed. As a result, processors may sit idle for very long periods, leading to a significant waste of time, energy, and even money for the users. In this project, we propose to study this problem by:
• Reviewing the storage management services offered on the cloud.
• Reviewing mechanisms of data transfer and computation, such as long-term persistent storage, replication, distributed indexed files, etc.
More precisely, the goal is to study cloud file systems and their organization, and to propose new methods for storing, managing, and processing (for example, by stream processing) the data contained in these file systems. These new storage mechanisms will be based on the data-driven behavior and patterns of the applications and on the target computing platform, which is cloud computing.
Status
Verified