Content Motivation About WinMiner



Date: 02.11.2017



Content



Motivation: Data

  • STULONG data: a 20-year longitudinal study of risk factors related to atherosclerosis in a population of middle-aged men

  • Tables ENTRY and CONTROL:

    • 1216 patients described by:
      • Identification and social characteristics
      • Behavior
      • Health events
      • Physical and biochemical examinations
    • From 1 to 21 control examinations per patient
  • → A sequence of controls for each patient



Motivation: Medical issues

  • identified risk factors

  • no treatment available

  • need to consider a global risk instead of concentrating prevention efforts on individual risk factors

  • risky behaviors dramatically increase the emergence of cardiovascular disease, but no one knows when

  • → Relations between risk factors and clinical manifestations of atherosclerosis?

  • → Time intervals over which these relations are valid?



Motivation: WinMiner

  • WinMiner: a single optimised method for finding sequential patterns in data together with their optimal time intervals, under user-defined constraints

  • WinMiner suggests possible temporal dependencies among occurrences of event types to experts

  • WinMiner outputs "small" collections of sequential patterns



About WinMiner

  • Mining context



About WinMiner

  • Selecting patterns

  • support: how many times does an episode or episode rule occur within an event sequence?

  • e.g. episode A → B, episode rule A → B ⇒ C

  • confidence: what is the probability that the RHS of an episode rule occurs, given that its LHS has already occurred?

  • e.g. A → B ⇒ C

  • patterns are selected using:

    • a minimum support threshold
    • a minimum confidence threshold
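
The two measures above can be illustrated with a minimal Python sketch. It is a simplification and not WinMiner's actual counting procedure: the function names `count_occurrences` and `confidence`, the `(time, event_type)` representation, and the rule of counting one occurrence per start event are all assumptions made for illustration.

```python
def count_occurrences(sequence, episode, window):
    """Count start events from which `episode` completes within `window`.

    sequence: list of (time, event_type) pairs sorted by time.
    episode:  tuple of event types that must occur in this order.
    Simplified counting (an assumption, not WinMiner's definition).
    """
    count = 0
    for i, (start, etype) in enumerate(sequence):
        if etype != episode[0]:
            continue
        pos, last = 1, start  # next episode position to match, last matched time
        for time, other in sequence[i + 1:]:
            if pos == len(episode) or time - start > window:
                break
            if other == episode[pos] and time > last:
                pos, last = pos + 1, time
        if pos == len(episode):
            count += 1
    return count


def confidence(sequence, lhs, rhs, window):
    """Estimate P(RHS follows | LHS occurred) as a ratio of occurrence counts."""
    lhs_count = count_occurrences(sequence, lhs, window)
    if lhs_count == 0:
        return 0.0
    return count_occurrences(sequence, lhs + rhs, window) / lhs_count


# Toy event sequence (hypothetical data, times in arbitrary units).
events = [(1, "A"), (2, "B"), (3, "C"), (10, "A"), (11, "B"),
          (20, "A"), (22, "B"), (23, "C")]
support_lhs = count_occurrences(events, ("A", "B"), window=5)
support_rule = count_occurrences(events, ("A", "B", "C"), window=5)
rule_confidence = confidence(events, ("A", "B"), ("C",), window=5)
```

A rule would then be kept only if `support_rule` and `rule_confidence` both reach the user-supplied minimum thresholds.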


About WinMiner

  • Selecting the optimal window span



About WinMiner

  • WinMiner:

    • checks all possible episode rules that satisfy the frequency and confidence thresholds
    • outputs only the FLM-rules, along with their respective optimal window sizes
    • uses a maximal gap constraint
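
As a rough illustration of how an optimal window size might be chosen, the sketch below assumes that "FLM" refers to the first local maximum of confidence as the window grows. The function name `first_local_maximum`, the `dec_rate` parameter (the relative drop required to confirm a maximum), and the input layout are hypothetical. The slides do not give the exact criterion, so this is one plausible reading, not WinMiner's definition.

```python
def first_local_maximum(stats, min_support, min_confidence, dec_rate=0.1):
    """Return the window size of the first confirmed local confidence maximum.

    stats: list of (window, support, confidence), sorted by increasing window.
    A candidate must meet both thresholds; it is confirmed once a later
    window shows confidence at least `dec_rate` (relative) lower.
    Returns None if no maximum is confirmed.
    """
    best = None  # (window, confidence) of the current candidate
    for window, support, conf in stats:
        if best is None:
            if support >= min_support and conf >= min_confidence:
                best = (window, conf)
        elif conf > best[1] and support >= min_support:
            best = (window, conf)  # confidence is still rising
        elif conf <= (1 - dec_rate) * best[1]:
            return best[0]  # sufficient drop: the candidate is a local maximum
    return None


# Hypothetical per-window statistics for one rule (window in months).
rule_stats = [(10, 5, 0.4), (20, 12, 0.7), (30, 20, 0.8),
              (40, 25, 0.6), (50, 30, 0.85)]
optimal_window = first_local_maximum(rule_stats, min_support=10,
                                     min_confidence=0.6)
```

Stopping at the first confirmed maximum, rather than the global one, keeps the reported window as short as possible. Here the later, higher confidence at window 50 is deliberately ignored.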


DM effort: Aims

  • Give the medical expert:

    • a means to follow both the evolution of risk factors and:
      • (1) the impact of medical intervention
      • (2) modifications in patients’ behavior
    • in addition:
      • significant time periods of observation
      • frequency
      • probability


DM effort: Data preprocessing

  • Mainly focused on table CONTROL (1226 patients/10572 examinations)

  • Join operations to bring in information from table ENTRY

  • Categorization of some factors

  • Choice of relevant factors according to:

    • Medical expertise
    • Mining approach
  • → Table Contr_Mod_2
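
The join and categorization steps could look roughly like the following sketch. All column names (`patient_id`, `height_cm`, `weight_kg`, `education`, `month`) are hypothetical, and the slides do not say which cut-offs were used. The BMI boundaries of 25 and 30 are the usual WHO overweight and obesity thresholds, chosen here only as an example.

```python
def bmi_category(bmi):
    """Map a BMI value to a coarse category (WHO cut-offs, assumed here)."""
    if bmi < 25:
        return "normal_or_under"
    if bmi < 30:
        return "overweight"
    return "obese"


def join_and_categorize(entry_rows, control_rows):
    """Attach ENTRY attributes to each CONTROL row and categorize BMI.

    Both inputs are lists of dicts; column names are illustrative only.
    CONTROL rows without a matching ENTRY row are skipped.
    """
    entry_by_id = {row["patient_id"]: row for row in entry_rows}
    joined = []
    for control in control_rows:
        entry = entry_by_id.get(control["patient_id"])
        if entry is None:
            continue
        height_m = entry["height_cm"] / 100
        bmi = control["weight_kg"] / height_m ** 2
        joined.append({
            "patient_id": control["patient_id"],
            "month": control["month"],
            "education": entry["education"],
            "bmi_cat": bmi_category(bmi),
        })
    return joined


# Hypothetical rows standing in for tables ENTRY and CONTROL.
entry = [{"patient_id": 1, "height_cm": 180, "education": "secondary"}]
control = [
    {"patient_id": 1, "month": 0, "weight_kg": 70},
    {"patient_id": 1, "month": 12, "weight_kg": 90},
    {"patient_id": 2, "month": 0, "weight_kg": 80},  # no ENTRY row: dropped
]
contr_mod = join_and_categorize(entry, control)
```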



DM effort: Data preprocessing

  • Important factors (according to medical experts):

    • cholesterol
    • hypertension
    • smoking
    • physical activity
    • age
    • diabetes
    • alcohol consumption
    • BMI
    • family history
    • level of education


DM effort: Data preprocessing

  • Contr_mod_2  large event sequence

  • For each patient: a subsequence containing all his control examinations

  • Coding guarantees that events corresponding to two different patients cannot be associated in the same episode rule

  • Large event sequence: concatenation of all the subsequences constructed for the patients.
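
The slides do not say how the coding keeps patients apart. One possible approach, sketched below purely as an assumption, is to shift each patient's timestamps so that consecutive subsequences are separated by more than the largest window considered. The function name `build_event_sequence` and the `max_window` parameter are hypothetical.

```python
def build_event_sequence(patients, max_window):
    """Concatenate per-patient subsequences into one large event sequence.

    patients:   dict mapping patient id to a list of (month, event_type)
                pairs sorted by month.
    max_window: largest window span the miner will consider.
    Each subsequence starts more than `max_window` after the previous one
    ends, so no window can span events of two different patients.
    """
    sequence = []
    offset = 0
    for patient_id in sorted(patients):
        events = patients[patient_id]
        if not events:
            continue
        first_month = events[0][0]
        for month, event_type in events:
            sequence.append((offset + month - first_month, event_type))
        offset = sequence[-1][0] + max_window + 1
    return sequence


# Hypothetical control examinations for two patients (time in months).
patients = {
    1: [(0, "chol_high"), (12, "chol_normal")],
    2: [(3, "smoker"), (15, "non_smoker")],
}
large_sequence = build_event_sequence(patients, max_window=40)
```

With `max_window=40`, the last event of patient 1 and the first event of patient 2 end up 41 months apart, so they can never fall within the same window.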



DM effort: Results

  • Examples:

    • "If the patient has no hypercholesterolemia, and if he sometimes follows his diet, then the patient has no hypercholesterolemia with a probability of 0.8 within 40 months. This rule is supported by 201 examples in the event sequence."
    • "If a patient eats less fat and carbohydrates and claudication is observed some time later, then this claudication does not disappear, with a probability of 0.8, over 30 months. This rule is supported by 21 examples."


DM effort: Results

  • Well known phenomena:

    • an indication that both the preprocessing and the mining of the data were carried out correctly
  • Added value: suggestions concerning their temporal aspects

  • To be expected:

    • with new data and the new risk factors identified in the last decade, the discovery of new phenomena along with their optimal window sizes


Conclusion

  • With STULONG data: searching for temporal dependencies between atherosclerosis risk factors and clinical manifestations of atherosclerosis that have an optimal interval/window size

  • Offers the medical expert a way to make the impact of a risk factor explicit and to refine its role relative to other factors within a time interval

  • Only a few episode rules are obtained, which allows experts to analyse the outputs manually

  • Could be applied to other medical data sets to help in finding unknown phenomena

  • → New perspectives both for data miners and physicians


