Thesis Abstracts, Department of Astronomy and Space Sciences


Analysis of Genome Sequence Data Using Machine Learning Methods



Date: 03.01.2022
Over the past century, progress in biology and genetics has given rise to a new discipline called “Bioinformatics” and to a better understanding of species diversity and of the causes of diseases and their cures. Genome-wide studies, which aim to understand the genome in all of its aspects, have undoubtedly played a major role in this progress. Moreover, as each new sequencing system has reduced sequencing costs, “personalized medicine”, a core field of genomic research, has become much more applicable. In this context, machine learning and genome analysis based on statistical methods have gained an important role.
Lentivectors derived from various types of viruses are used for gene transfer in gene therapy studies. In this study, a pattern search tool has been developed to find the symmetric/palindromic behavior observed in the integration regions of lentivirus vectors derived from HIV (Human Immunodeficiency Virus). By applying the tool to test sets with different sequence features and parameters (such as window width and the gap between windows), optimal parameters specific to the problem have been determined. The significance of the results has been tested using statistical tests such as the z-test and the Mann-Whitney-Wilcoxon rank-sum test.
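The role of the window width and gap parameters can be illustrated with a minimal sketch. It is not the tool developed in the thesis: it swaps in a simple match-fraction score between one window and the reverse complement of the other, and the names `palindrome_score`, `scan`, `z_score`, `width` and `gap` are hypothetical. The standardization step is likewise only an assumed, simplified stand-in for the significance tests mentioned above.

```python
import statistics

# Watson-Crick complements; only A, C, G, T are handled in this sketch.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))


def palindrome_score(seq, start, width, gap):
    """Fraction of positions where the left window equals the reverse
    complement of the right window. Both windows are `width` bases long
    and are separated by `gap` bases (hypothetical parameter names)."""
    left = seq[start:start + width]
    right = seq[start + width + gap:start + 2 * width + gap]
    if len(left) < width or len(right) < width:
        raise ValueError("windows run past the end of the sequence")
    target = reverse_complement(right)
    return sum(a == b for a, b in zip(left, target)) / width


def scan(seq, width, gap):
    """Score every start position at which both windows fit."""
    span = 2 * width + gap
    return [palindrome_score(seq, s, width, gap)
            for s in range(len(seq) - span + 1)]


def z_score(observed, null_scores):
    """Standardize an observed score against scores from control sequences
    (an assumed, simplified form of a z-test)."""
    mean = statistics.mean(null_scores)
    sd = statistics.stdev(null_scores)
    return (observed - mean) / sd
```

For instance, `scan("GAATTC", width=3, gap=0)` scores the single position at which two 3-base windows fit; repeating the scan over a grid of `width` and `gap` values is one way such parameters could be compared.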
In the second part of the study, Canonical Correlation Analysis, the method on which the developed pattern search tool is based, has been used to detect genomic regions with different “Linkage Equilibrium” values in case and control groups. In this way, the distribution of candidate mutations causing Behçet’s disease has been analyzed. The results showed that this methodology can be used to detect disease-related and cross-correlated mutations.
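As a rough illustration of how Canonical Correlation Analysis can compare two genomic windows in cases and in controls, the sketch below computes canonical correlations through a QR decomposition followed by an SVD, which is one standard numerical route; the abstract does not state which formulation was used. The genotype matrices (samples by SNPs, for example coded 0/1/2), the function names, and the use of the leading correlation as the summary value are all assumptions.

```python
import numpy as np


def canonical_correlations(X, Y):
    """Canonical correlations between two blocks of variables.

    X and Y are (samples x variables) arrays with the same number of rows.
    Assumes both centered blocks have full column rank."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Orthonormal bases for the column spaces of the centered blocks.
    Qx, _ = np.linalg.qr(Xc)
    Qy, _ = np.linalg.qr(Yc)
    # Singular values of Qx^T Qy are the canonical correlations.
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0)


def linkage_difference(case_a, case_b, control_a, control_b):
    """Difference in the leading canonical correlation between two SNP
    windows (a and b), computed separately in cases and in controls."""
    r_case = canonical_correlations(case_a, case_b)[0]
    r_control = canonical_correlations(control_a, control_b)[0]
    return r_case - r_control
```

A value of `linkage_difference` far from zero would flag a pair of windows whose association differs between the two groups; how such a difference is thresholded or tested is left open here.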
In the last part of the study, the relation between the genetic diversity and the geographical locations of races has been studied. For this purpose, the dataset compiled within the Human Genome Diversity Project has been used, and with the help of Principal Component Analysis, a correlation (called the geo-genomic correlation) between the pairwise genetic distances and geographical distances of races has been found. Furthermore, it is shown that far fewer Single Nucleotide Polymorphisms (SNPs) than the full set are sufficient to establish this correlation.
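A geo-genomic correlation of this kind could be computed along the following lines. This is a sketch under stated assumptions rather than the procedure of the thesis: genetic distance is taken as the Euclidean distance between group-level Principal Component scores, geographical distance as the great-circle (haversine) distance, and the two are compared with a Pearson correlation over all pairs. The input layout (one row of allele frequencies per group) and the `snp_idx` argument for restricting the analysis to a subset of SNPs are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def pca_scores(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * S[:n_components]


def pairwise_euclidean(points):
    """Matrix of Euclidean distances between all pairs of rows."""
    diff = points[:, None, :] - points[None, :, :]
    return np.linalg.norm(diff, axis=-1)


def pairwise_haversine_km(lat_deg, lon_deg):
    """Matrix of great-circle distances (km) between all pairs of locations."""
    lat = np.radians(np.asarray(lat_deg, dtype=float))
    lon = np.radians(np.asarray(lon_deg, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def geo_genomic_correlation(freqs, lat_deg, lon_deg,
                            n_components=2, snp_idx=None):
    """Pearson correlation between pairwise genetic and geographic distances.

    freqs: (groups x SNPs) allele-frequency matrix (assumed layout).
    snp_idx: optional list of SNP columns to keep (hypothetical argument)."""
    freqs = np.asarray(freqs, dtype=float)
    if snp_idx is not None:
        freqs = freqs[:, snp_idx]
    genetic = pairwise_euclidean(pca_scores(freqs, n_components))
    geographic = pairwise_haversine_km(lat_deg, lon_deg)
    upper = np.triu_indices(len(freqs), k=1)  # each pair counted once
    return np.corrcoef(genetic[upper], geographic[upper])[0, 1]
```

Calling the function once with all SNPs and once with a small `snp_idx` subset gives a direct way to check whether a reduced SNP set preserves the correlation.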

  

WAHEED Sajjad

Thesis Title : Design of a Talent Management and Career Planning System (Yetenek Yönetimi ve Kariyer Planlama Sistemi Tasarımı)

Advisor : Prof. Dr. Ahmet SERTBAŞ

Department : Computer Engineering

Program : -

Graduation Year : 2013

Thesis Defense Jury : Prof. Dr. Ahmet SERTBAŞ

Prof. Dr. Selim AKYOKUŞ

Assoc. Prof. Dr. Halil ZAİM

Prof. Dr. Sıbkat KAÇTIOĞLU

Asst. Prof. Dr. M. Ali AYDIN



