Reconstructing the Evolutionary History of Complex Human Gene Clusters Y. Zhang, G. Song, T. Vinar



Yüklə 445 b.
tarix06.09.2018
ölçüsü445 b.
#78585


Reconstructing the Evolutionary History of Complex Human Gene Clusters

  • Y. Zhang, G. Song, T. Vinar,

  • E. D. Green, A. Siepel, W. Miller

  • RECOMB 2008


Outline

  • Motivation

  • Problem definition

  • Simple algorithm for simple model

  • SIS algorithm for complex model

  • Results on human genome

  • Evaluation of SIS algorithm



Gene cluster

  • Group of related genes

  • Probably formed by duplications:

    • followed by functional diversification
    • with deletions, cause human genetic diseases


“History” of a gene cluster?



Dot-plots of self-alignments



Data Preparation

  • Atomic segments:

    • Self-alignment (forward and reverse-complement): pairs A ≈ B
    • Transitive closure property:
    • A ≈ B, B ≈ C A ≈ C
    • Maximize each alignment
  • Collapsible segments:



Computational Problem

  • Input: sequence of atomic (non-collapsible) segments

  • Output: (most probable) sequence of events (or the number of events) such that if we unwind these events in the input sequence, we obtain a sequence containing only a single atomic segment



Simple Model

  • Assumptions:

    • Only duplications (possibly with reversal and tandem duplications)
    • No duplication inside the originating region
    • The most important:


Simple Model (2)

  • Given assumptions 1 e 2, each event is

  • of “identical” (reversed) regions of consecutive atomic segments ( copied to )

  • Problem statement



Simple Model (3)

  • Candidate definition

  • Is this definition reasonable?



Simple Model (4)

  • Algorithm

  • Analysys



Simple Model: limitations

  • Assumptions:

    • “large scale deletions are likely to occur”
    • “atomic boundary reuses violating assumption are not uncommon”
  • Same number of events, but multiple way of reconstructing the history

    • NB: not solved with SIS, but you can assign probability..


Stochastic Model

  • Event :

    • Duplication (possibly with reversal)
    • Deletion (with restrictions)
  • History:

  • Target distribution of histories:

    • is the number of reused atomic boundaries and


Algorithm for Stochastic Model

  • Sample histories from the target distribution and compute the mean value (of the function we are interested in):

    • sample from
    • given , estimate
    • e.g., gives the number of events
  • Problem: how to sample from the target distribution?



Sequential Importance Sampling (SIS)

  • Goal: compute

  • Target distribution:

  • Trial distribution:

  • Sample’s weight:

  • Output:



SIS for gene cluster history

  • Duplication



SIS for gene cluster history (2)

  • Deletion

    • only without atomic boundary reuse
    • only if “the atomic segment pair flanking a deletion site appears elsewhere” in the seq.


Application to Human Gene Clusters

  • human genome assembly hg18 self alignment: 457 duplicated regions

    • alignments: at least 500 bp, >= 70% identity
    • segments separated by no more than 500 Kbp
    • only long and non-trivial regions considered
  • 165 biomedically interesting clusters (~111 Mbp)

  • 5 divergence thresholds: GA, OWM, NWM, LG, DOG

  • 825 combinations of gene cluster and divergence threshold



Application to Human Gene Clusters (2)



Application to Human Gene Clusters (3)



Application to Human Gene Clusters (4)

  • “… help prioritize the selection of notably interesting gene clusters for more detailed comparative genomics studies”

  • “… to compare cluster dynamics in certain lineages to observed phenotypic differences among primates”

  • “… such sequence data should reveal differences among primate species of possible relevance for selecting species for further biomedical studies”



Evaluation of the SIS Algorithm

  • Estimate parameters from the human genome for the events (e.g., duplications):

    • 39% of duplication “reversed”, deletions=2% of duplications
  • Starting from 500 Kbp sequence, generate 10 genome clusters for N= 10,20,…,100 events



Evaluation of the SIS Algorithm (2)



Discussion

  • Main Limitations:

    • details of data preparation
    • choice of the parameters/distributions
    • evaluation of the SIS algorithm
  • Future Directions:

    • include other types of events
    • understand the stochastic model
    • how to evaluate a model?
    • how to evaluate an algorithm?


Yüklə 445 b.

Dostları ilə paylaş:




Verilənlər bazası müəlliflik hüququ ilə müdafiə olunur ©muhaz.org 2024
rəhbərliyinə müraciət

gir | qeydiyyatdan keç
    Ana səhifə


yükləyin