Data Mining (95-791)

Data mining, the intelligent analysis of information stored in data sets, has attracted substantial interest among practitioners in a variety of fields and industries. Nowadays, almost every organization collects data that can be analyzed to support better decisions, improve policies, discover computer network intrusion patterns, design new drugs, detect credit fraud, make accurate medical diagnoses, predict imminent occurrences of important events, and monitor and evaluate the reliability of complex systems to preempt failures.

This course is designed to give students a strong foundation in the methodologies, technologies, and algorithms employed in the data mining field. The emphasis is on understanding how to apply a wide range of modern machine learning techniques to specific data analysis scenarios rather than on mastering the theoretical underpinnings of individual techniques. The course covers supervised and unsupervised learning techniques, including classification, clustering, prediction, and forecasting, and their implementations in well-known machine learning algorithms. Students will also gain experience with cutting-edge visualization and data mining tools. Well-known data mining process methodologies will also be discussed, along with practical case studies.

Learning Objectives: 

• Use R to run many of the commonly used data mining methods 
• Understand the advantages and disadvantages of various methods 
• Compare the utility of different methods 
• Reliably perform model/feature selection 
• Use resampling-based approaches to assess model performance and reliability 
• Perform analyses of real-world data

  • Units