Integration of multiple types of genomic data can produce high-quality predictive models and shed new light on the molecular mechanisms at play (Cancer Genome Atlas Research Network, 2011). This cannot be achieved by simply pooling the data; the data sets must be explicitly linked to one another. Such integration can take place at several levels.
The first level of integration of genomic data is identifier mapping. For example, the oligonucleotide probe set detecting a particular transcript on a gene expression microarray must be linked to the name of the corresponding gene. Similarly, identifiers for CpG sites on a DNA methylation microarray or for probes on a copy number variation (CNV) microarray must be linked to the names of the corresponding genes. Although tools and databases exist for this purpose, the task is not trivial, because the mapping between molecular entities and their identifiers is rarely one-to-one and unambiguous.
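The ambiguity in identifier mapping can be illustrated with a minimal sketch. The probe names, gene names, and annotation table below are hypothetical, and the policy chosen here (average the probes that map to exactly one gene, set aside ambiguous and unmapped probes for review) is only one of several reasonable options, not a recommended standard.

```python
from collections import defaultdict

# Hypothetical annotation table: a probe may map to zero, one, or several genes.
PROBE_TO_GENES = {
    "probe_1": ["GENE_A"],
    "probe_2": ["GENE_A"],            # two probes for the same gene
    "probe_3": ["GENE_B", "GENE_C"],  # ambiguous: one probe, two genes
    "probe_4": [],                    # annotated, but no gene assigned
}


def summarize_by_gene(probe_values, probe_to_genes):
    """Collapse probe-level values to gene-level values.

    Returns (gene_values, ambiguous_probes, unmapped_probes). Only probes
    that map to exactly one gene contribute; their values are averaged.
    """
    per_gene = defaultdict(list)
    ambiguous, unmapped = [], []
    for probe, value in probe_values.items():
        genes = probe_to_genes.get(probe, [])
        if not genes:
            unmapped.append(probe)    # missing from the table or no gene
        elif len(genes) > 1:
            ambiguous.append(probe)   # kept aside rather than guessed
        else:
            per_gene[genes[0]].append(value)
    gene_values = {g: sum(v) / len(v) for g, v in per_gene.items()}
    return gene_values, sorted(ambiguous), sorted(unmapped)


probe_values = {
    "probe_1": 2.0,
    "probe_2": 4.0,
    "probe_3": 1.5,
    "probe_4": 0.7,
    "probe_5": 3.0,  # not present in the annotation table at all
}
gene_values, ambiguous, unmapped = summarize_by_gene(probe_values, PROBE_TO_GENES)
```

Reporting the ambiguous and unmapped probes separately, instead of silently dropping them, makes the information lost at the mapping step visible to later analyses.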
At a higher level, molecular pathways provide a powerful unifying framework for genomic data integration. Disturbances across a set of genes that are hard to interpret when each gene is considered individually can become meaningful once the genes are mapped to biological pathways.
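One common way to make pathway-level sense of a list of disturbed genes is an over-representation test based on the hypergeometric distribution. The text does not prescribe this method; it is used here only as an illustration. The gene names, pathway names, and helper functions are hypothetical, and a real analysis would also need multiple-testing correction.

```python
from math import comb


def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K marked, n drawn)."""
    total = comb(N, n)
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


def pathway_enrichment(hits, pathways, background):
    """Return {pathway: (overlap, p_value)} for a set of disturbed genes."""
    background = set(background)
    hits = set(hits) & background          # ignore genes outside the background
    results = {}
    for name, members in pathways.items():
        members = set(members) & background
        overlap = len(hits & members)
        p_value = hypergeom_sf(overlap, len(background), len(members), len(hits))
        results[name] = (overlap, p_value)
    return results


# Toy data: 20 background genes, two pathways of 5 genes each.
background = [f"g{i}" for i in range(20)]
pathways = {
    "pathway_1": ["g0", "g1", "g2", "g3", "g4"],
    "pathway_2": ["g10", "g11", "g12", "g13", "g14"],
}
hits = ["g0", "g1", "g2", "g3"]  # genes disturbed in the experiment

enrichment = pathway_enrichment(hits, pathways, background)
```

In this toy case all four disturbed genes fall in `pathway_1`, so its p-value is small, while `pathway_2` shows no overlap and a p-value of 1.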
Finally, integration at the level of the biological functions themselves can bring insight and clarity, for example through the use of ontologies such as the Gene Ontology.
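Because an ontology is a hierarchy of terms, a gene annotated to a specific term is implicitly annotated to all of that term's ancestors, which lets data be compared at whatever level of generality is useful. The sketch below illustrates this propagation on a toy hierarchy; the term identifiers, gene names, and function names are hypothetical and are not real Gene Ontology entries.

```python
# Hypothetical toy ontology: each term lists its direct parents.
PARENTS = {
    "term:root": [],
    "term:metabolism": ["term:root"],
    "term:lipid_metabolism": ["term:metabolism"],
    "term:signaling": ["term:root"],
}


def ancestors(term, parents):
    """Return all terms reachable from `term` by following parent links."""
    seen = set()
    stack = [term]
    while stack:
        for parent in parents.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def propagate(annotations, parents):
    """Extend each gene's direct annotations with all ancestor terms."""
    return {
        gene: set(terms) | {a for t in terms for a in ancestors(t, parents)}
        for gene, terms in annotations.items()
    }


direct = {
    "GENE_A": {"term:lipid_metabolism"},
    "GENE_B": {"term:signaling"},
}
full = propagate(direct, PARENTS)
```

After propagation, both genes share the root term, and `GENE_A` can be grouped with any other gene annotated anywhere under `term:metabolism`.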