Name of the course: Data Mining
Number of ECTS credits:
Contact :
Name and Given names : Jean-François Boulicaut
Phone : +33 (0)4 72 43 89 05
email : +33 (0)4 72 43 87 13
Other professor(s) if any : Christophe Rigotti
Exam: Prepared lecture on a couple of influential data mining papers.
Course Content:
Data Mining has been identified as one of the ten emerging technologies of the 21st century (MIT Technology Review, 2001). This discipline aims at discovering knowledge from large amounts of data, and it has developed at the intersection of several disciplines related to data processing, e.g., machine learning, database management, visualization, and statistics.
The first part provides an overview of the active research field of data mining and knowledge discovery in databases (KDD). Classical techniques (clustering, supervised classification, pattern discovery) will be considered. Examples of real-life data mining applications include, among others, basket data analysis, WWW usage data analysis, and knowledge discovery in the life sciences (e.g., molecular biology).
A second part will be dedicated to constraint-based data mining and the emerging framework of inductive querying. After an introduction to this appealing formal framework, we will discuss some recent research topics related to condensed representations of frequent patterns and constraint-based mining of sequential patterns.
C1 KDD: motivations and terminology (Boulicaut)
C2 Data (Rigotti)
C3 Data exploration (Rigotti)
C4 Clustering (Rigotti)
C5 Classification (Rigotti)
C6 Association analysis (Boulicaut)
C7 Towards a theory of data mining (Boulicaut)
C8 Condensed representations for frequent patterns (Boulicaut)
C10 A research agenda (Boulicaut)
The course is based on the excellent book by Pang-Ning Tan, Michael Steinbach and Vipin Kumar, “Introduction to Data Mining”, published in 2006 by Addison-Wesley (slides have been provided by the authors).
It will be possible to apply the techniques to benchmark data using the Weka software platform (free software).
Targeted Skills:
Students understand the popular data mining techniques (e.g., K-Means and hierarchical clustering, decision tree building, association rule mining). They also understand some recent conceptual issues and data mining principles related to inductive querying and constraint-based mining, which provide a conceptual framework for analysing current research directions in data mining.