MATH3346 Data Mining Honours

Later Year Course

Offered By Department of Mathematics
Academic Career Undergraduate
Course Subject Mathematics
Offered in MATH3346 will not be offered in 2011
Unit Value 6 units
Course Description

The main focus of the course will be supervised learning, primarily for classification. The emphasis will be on practical applications of the methodologies that are described, with the R system used for the computations. Attention will be given to:

1) Generalisability and predictive accuracy, in the practical contexts in which methods are applied.

2) Low-dimensional visual representation of results, as an aid to diagnosis and insight.

3) Interpretability of model parameters, including potential for misinterpretation.

There will be very limited attention to regression methods with a continuous outcome variable. Relevant statistical theory will mostly be assumed and described rather than derived mathematically. There will be somewhat more attention to the mathematical derivation and description of algorithms.

Topics to be covered include:

  • Basic statistical ideas - populations, distributions, samples and random samples
  • Classification models and methods - including: linear discriminant analysis; trees; random forests; neural nets; boosting and bagging approaches; support vector machines.
  • Linear regression approaches to classification, compared with linear discriminant analysis.
  • The training/test approach to assessing accuracy, and cross-validation.
  • Strategies in the (common) situation where source and target population differ, typically in time but in other respects also.
  • Unsupervised models - k-means, association rules, hierarchical clustering, model-based clustering.
  • Low-dimensional views of classification results - distance methods and ordination.
  • Strategies for working with large data sets.
  • Practical approaches to classification with real life data sets, using different methods to gain different insights into presentation.
  • Privacy and security.
  • Use of the R system for handling the calculations.

Note: This course is an HPC, available to students with outstanding results in mathematical and/or computing later year courses. Students will be required to give an in-depth presentation of a current research topic, as well as demonstrate the use of advanced data mining techniques on data sets from numerous application areas.

Learning Outcomes

On satisfying the requirements of this course, students will have the knowledge and skills to:

1. Explain the fundamental issues involved in the use of the training/test methodology, cross-validation and the bootstrap to provide accuracy assessments.
2. Understand and explain ideas of source and target sample, and their relevance to the practical application of classification and other data mining techniques.
3. Demonstrate accurate and efficient use of classification and related data mining techniques, using the R system for the computations.
4. Demonstrate capacity for mathematical reasoning through analyzing, proving and explaining concepts from the theory that underpins classification and related data mining methods.
5. Apply problem-solving using classification and related data mining techniques to diverse situations in business, biology, engineering and other sciences.
Indicative Assessment

Assessment will be based on:

  • 3 Assignments (60%; LO 1-5)
  • Presentation (30%; LO 1-5)
  • Commentary on other Presentations (10%; LO 1-5)
Areas of Interest Mathematics
Requisite Statement Students require outstanding results in mathematical and/or computing later year courses to enrol in this course.
Consent Required Please contact MATHSadmin@maths.anu.edu.au for consent to enrol in this course.
Science Group C
Academic Contact admin.teaching.msi@anu.edu.au

The information published on the Study at ANU 2011 website applies to the 2011 academic year only. All information provided on this website replaces the information contained in the Study at ANU 2010 website.

Updated:   13 Nov 2015 / Responsible Officer:   The Registrar / Page Contact:   Student Business Solutions