COMP6490 Document Analysis
| Offered By | Research School of Computer Science |
|---|---|
| Academic Career | Graduate Coursework |
| Course Subject | Computer Science |
| Offered in | Second Semester, 2012 and Second Semester, 2013 |
| Unit Value | 6 units |
| Course Description |
Processing of semi-structured documents such as internet pages, RSS feeds and their accompanying news items, and PDF brochures is considered from the perspective of interpreting the content. This course considers the "document" and its various genres as a fundamental object for business, government and community. To this end, the course covers four broad areas: (A) information retrieval, (B) natural language processing, (C) machine learning for documents, and (D) relevant tools for the Web. Basic tasks covered include content collection and extraction, formal and informal natural language processing, information extraction, information retrieval, classification and analysis. Fundamental probabilistic techniques for performing these tasks and some common software systems will be covered, though no area will be covered in any depth. |
| Learning Outcomes |
Upon successful completion of the course, the student will have an understanding of the role documents play in business and community, and the various digital resources available for document analysis. Moreover, the student will have the background theory and practical knowledge necessary to plan and execute a basic document analysis project. The student will be able to:
|
| Indicative Assessment |
Two written assignments with programming option (40%), written final exam (60%). |
| Workload |
Thirty one-hour lectures and six two-hour tutorial/laboratory sessions
| Course Classification(s) | Advanced. Advanced courses are designed for students having reached 'first degree' level of assumed knowledge, which provide a deep understanding of contemporary issues; or 'second degree' and higher levels of knowledge; or for transition to research training programs. |
| Requisite Statement |
None |
| Recommended Courses |
Programming ability in C, C++ or Java, and basic mathematical and statistical knowledge at an undergraduate level
| Prescribed Texts |
The following reference books will be used.
|
| Academic Contact | peter.christen@anu.edu.au |
The information published on the Study at ANU 2012 website applies to the 2012 academic year only. All information provided on this website replaces the information contained in the Study at ANU 2011 website.




