MATH6210 Data Mining

Offered By Department of Maths
Academic Career Graduate Coursework
Course Subject Mathematics
Offered in First Semester, 2010 and Second Semester, 2010
Unit Value 6 units
Course Description The main focus of the course will be supervised learning, primarily for classification.  The emphasis will be on practical applications of the methodologies that are described, with the R system used for the computations.

Attention will be given to:

  1. Generalizability and predictive accuracy, in the practical contexts in which methods are applied.
  2. Low-dimensional visual representation of results, as an aid to diagnosis and insight.
  3. Interpretability of model parameters, including potential for misinterpretation.

There will be very limited attention to regression methods with a continuous outcome variable.  Relevant statistical theory will mostly be assumed and described rather than derived mathematically.  There will be somewhat more attention to the mathematical derivation and description of algorithms.

Topics to be covered include:

  • Basic statistical ideas - populations, distributions, samples and random samples
  • Classification models and methods - including: linear discriminant analysis; trees; random forests; neural nets; boosting and bagging approaches; support vector machines.
  • Linear regression approaches to classification, compared with linear discriminant analysis.
  • The training/test approach to assessing accuracy, and cross-validation.
  • Strategies in the (common) situation where the source and target populations differ, typically in time but also in other respects.
  • Unsupervised models - k-means, association rules, hierarchical clustering, model-based clustering.
  • Low-dimensional views of classification results - distance methods and ordination.
  • Strategies for working with large data sets.
  • Practical approaches to classification with real life data sets, using different methods to gain different insights into presentation.
  • Privacy and security.
  • Use of the R system for handling the calculations.

Note: Graduate students attend joint classes with undergraduates but will be assessed separately.

Learning Outcomes

On satisfying the requirements of this course, students will have the knowledge and skills to:

1. Explain the fundamental issues involved in the use of the training/test methodology, cross-validation and the bootstrap to provide accuracy assessments.
2. Understand and explain the ideas of source and target samples, and their relevance to the practical application of classification and other data mining techniques.
3. Demonstrate accurate and efficient use of classification and related data mining techniques, using the R system for the computations.
4. Demonstrate capacity for mathematical reasoning through analyzing, proving and explaining concepts from the theory that underpins classification and related data mining methods.
5. Apply problem-solving using classification and related data mining techniques to diverse situations in business, biology, engineering and other sciences.

Indicative Assessment

Assessment will be based on:

  • 3 Assignments (60%; LO 1-5)
  • Presentation (40%; LO 1-5)
Course Classification(s) Advanced and Specialist

  • Advanced courses are designed for students having reached 'first degree' level of assumed knowledge, which provide a deep understanding of contemporary issues; or 'second degree' and higher levels of knowledge; or for transition to research training programs.
  • Specialist courses are designed for students having reached 'first degree' level of assumed knowledge, which provide for the acquisition of specialist skills; or 'second degree' and higher level of knowledge; or for transition to research training programs; or knowledge associated with professional accreditation.
Areas of Interest Mathematics
Eligibility Bachelor degree; with third year Mathematics.
Requisite Statement Third year Mathematics is required. 
Consent Required Please contact MATHSadmin@maths.anu.edu.au for consent to enrol in this course.
Programs Master of Mathematical Sciences

The information published on the Study at ANU 2010 website applies to the 2010 academic year only. All information provided on this website replaces the information contained in the Study at ANU 2009 website.

Updated: 13 Nov 2015 / Responsible Officer: The Registrar / Page Contact: Student Business Solutions