1 DOCUMENT INFO 2
1.1 Author 2
1.2 Documents history 2
1.3 Document data 2
1.4 Distribution list 3
2 Definitions and Abbreviations 5
3 Introduction 6
3.1 Breast cancer modelling and going beyond the state-of-the-art 7
4 SUMMARY 10
5 Data description 11
5.1 Available data from TOP clinical trial 11
5.1.1 Clinical Data 11
5.1.2 Radiology Imaging Data 12
5.1.3 Genomic Data 12
5.1.3.1 Gene Expression Data 12
5.1.3.2 Affymetrix SNP and CNV data 12
5.1.3.3 Illumina Methylation Data 12
5.2 Expected data from other clinical trials 12
5.2.1 Radiology Imaging Data 12
5.2.2 Digital Pathology Images 13
5.2.3 High-throughput Sequencing Data 13
6 Clinical Scenarios 14
6.1 Predictive Modelling Methodologies 14
6.1.1 Feature Extraction from Images 14
6.1.2 Feature Selection 15
6.1.3 Integrating Heterogeneous Data 16
6.1.3.1 Integration of Genomic Data 16
6.1.3.2 Machine Learning Methods for Integration 16
6.1.4 Kernel-Based Classification and MKL 19
6.1.5 Decision Trees and Ensembles of Trees 20
6.1.6 Evaluating the performance of the classifier 21
6.1.7 Estimating the generalization error 23
6.1.8 Feature Selection in Kernel Space 23
6.2 Scenario A - Retrospective use of data 24
6.3 Scenario B - Retrospective use of data 27
6.4 Scenario C - Retrospective use of data 28
7 Conclusion 30
8 Appendix 31
8.1 Scenario D - Retrospective use of clinical data 31
8.2 Scenario E - Retrospective use of clinical data 33
8.3 Scenario F - Retrospective use of imaging data 35
9 REFERENCES 37