Note: This unit version is currently being edited and is subject to change!

COMP5046: Natural Language Processing (2019 - Semester 1)

Unit: COMP5046: Natural Language Processing (6 CP)
Mode: Normal-Day
On Offer: Yes
Level: Postgraduate
Faculty/School: School of Computer Science
Unit Coordinator/s: Han, Caren
Session options: Semester 1
Campus: Camperdown/Darlington
Pre-Requisites: None.
Brief Handbook Description: This unit introduces computational linguistics and the statistical techniques and algorithms used to automatically process natural languages (such as English or Chinese). It will review the core statistics and information theory, and the basic linguistics, required to understand statistical natural language processing (NLP).

Statistical NLP is used in a wide range of applications, including information retrieval and extraction, question answering, machine translation, and the classification and clustering of documents. This unit will explore the key challenges that natural language poses for computational modelling, and the state-of-the-art approaches to the key NLP sub-tasks, including tokenisation, morphological analysis, word sense representation, part-of-speech tagging, named entity recognition and other information extraction, text categorisation and syntactic parsing.

Students will implement many of these sub-tasks in labs and assignments. The unit will also investigate the annotation process that is central to creating training data for statistical NLP systems. Students will annotate data as part of completing a real-world NLP task.
Assumed Knowledge: Knowledge of an OO programming language
Lecturer/s: Han, Caren
Tutor/s: Xiang Dai

All email should be directed to sit.comp5046@sydney.edu.au rather than personal staff addresses.
Timetable: COMP5046 Timetable
Time Commitment:
#  Activity Name      Hours per Week  Sessions per Week  Weeks per Semester
1  Lecture            2.00            1                  12
2  Laboratory         1.00            1                  12
3  Independent Study  6.00                               14
T&L Activities: Tutorial: practical software development exercises and in-class discussions.

Independent Study: study of texts and completion of assignments.

Practical work will be demonstrated in the Python programming language, and students are encouraged to use relevant libraries such as the Natural Language Toolkit (NLTK).

Attributes listed here represent the key course goals (see Course Map tab) designated for this unit. The list below describes how these attributes are developed through practice in the unit. See Learning Outcomes and Assessment tabs for details of how these attributes are assessed.

Attribute Developed: Engineering/IT Specialisation (Level 4)
Attribute Development Method: COMP5046 introduces specialised knowledge and skills in statistical natural language processing, including techniques from machine learning and data mining that apply specifically to text. It introduces dynamic programming algorithms and their related data structures that are required for processing language efficiently and accurately. COMP5046 also introduces the standard methodology for developing and evaluating NLP systems.

For explanation of attributes and levels see Engineering & IT Graduate Outcomes Table 2018.

Learning outcomes are the key abilities and knowledge that will be assessed in this unit. They are listed according to the course goal supported by each. See the Assessment tab for details of how each outcome is assessed.

Maths/Science Methods and Tools (Level 4)
1. Apply basic statistical methods and information theory principles to modelling language.
2. Apply basic linguistic knowledge to identifying the structure of language.
Engineering/IT Specialisation (Level 4)
3. Design and apply an annotation scheme.
4. Annotate text using linguistic annotation schemes (e.g. part-of-speech tags and phrase structure).
5. Analyse the quality of an annotation scheme and data annotated to that scheme.
6. Develop machine learning and statistical methods for solving natural language tasks.
7. Evaluate the performance of natural language processing systems.
Assessment Methods:
#  Name                          Group  Weight  Due Week          Outcomes
1  Stage 1: annotation task      No     10.00   Week 5            1, 2, 3, 5
2  Stage 2: categorisation task  No     20.00   Week 9            1, 6, 7
3  Sequence tagging task         No     20.00   STUVAC (Week 14)  1, 2, 6, 7
4  Final Exam                    No     50.00   Exam Period       1, 2, 3, 4, 5, 6, 7
Assessment Description: Three individual assignments take place through the teaching period, as well as a final written exam.

Penalties for lateness: 10% of the available marks per day late; maximum 7 days late (after that: 0).
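
As a rough illustration of the lateness rule above, the following Python sketch computes a penalised mark. It reads "10% of the available marks per day late" as a deduction from the mark awarded, which is one interpretation rather than an official formula. The function name `apply_late_penalty` is hypothetical, and because the outline does not say how part-days are counted, the sketch simply takes a whole number of days as input.

```python
def apply_late_penalty(raw_mark: float, available_marks: float, days_late: int) -> float:
    """Hypothetical helper: mark remaining after the lateness penalty.

    Assumption: the penalty is deducted from the awarded mark and the
    result cannot fall below zero.
    """
    if days_late <= 0:
        return raw_mark  # submitted on time, no penalty
    if days_late > 7:
        return 0.0  # more than 7 days late: mark of 0
    penalty = available_marks * days_late / 10  # 10% of available marks per day
    return max(0.0, raw_mark - penalty)


# A 20-mark assignment awarded 15 and submitted 2 days late loses 4 marks.
print(apply_late_penalty(15.0, 20.0, 2))  # prints 11.0
```

Under this reading, work submitted seven days late loses 70% of the available marks, and anything later receives zero.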
Grading:
Standards Based Assessment: Final grades in this unit are awarded at levels of HD for High Distinction, DI (previously D) for Distinction, CR for Credit, PS (previously P) for Pass and FA (previously F) for Fail, as defined by the University of Sydney Assessment Policy. Details of the Assessment Policy are available on the Policies website at http://sydney.edu.au/policies. Standards for grades in individual assessment tasks and the summative method for obtaining a final mark in the unit will be set out in a marking guide supplied by the unit coordinator.
Minimum Pass Requirement: It is a policy of the School of Computer Science that, in order to pass this unit, a student must achieve at least 40% in the written examination. For subjects without a final exam, the 40% minimum requirement applies to the corresponding major assessment component specified by the lecturer. A student must also achieve an overall final mark of 50 or more. Any student not meeting these requirements may be given a final mark of no more than 45, regardless of their average.
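
The interaction between the assessment weights and the minimum pass requirement can be sketched in Python as follows. This is an illustration only: the weights come from the Assessment Methods table, but the function name `final_mark`, the dictionary keys, and the choice to apply the cap of 45 automatically (the policy says a capped mark "may" be given) are assumptions. The marking guide supplied by the unit coordinator is the authority on how final marks are actually combined.

```python
# Weights (out of 100) taken from the Assessment Methods table.
# The keys are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "annotation": 10,
    "categorisation": 20,
    "sequence_tagging": 20,
    "exam": 50,
}


def final_mark(percent_scores: dict) -> float:
    """Combine per-task percentages into a final mark out of 100.

    Assumption: a student who misses either minimum (40% in the exam,
    50 overall) has the final mark capped at 45.
    """
    total = sum(WEIGHTS[name] * percent_scores[name] / 100 for name in WEIGHTS)
    meets_exam_minimum = percent_scores["exam"] >= 40
    meets_overall_minimum = total >= 50
    if meets_exam_minimum and meets_overall_minimum:
        return total
    return min(total, 45.0)


# Strong assignments but 35% in the exam: the weighted total is 58.5,
# which this sketch caps at 45.
scores = {"annotation": 90, "categorisation": 80, "sequence_tagging": 80, "exam": 35}
print(final_mark(scores))  # prints 45.0
```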
Policies & Procedures: IMPORTANT: School policy relating to Academic Dishonesty and Plagiarism.

In assessing a piece of submitted work, the School of Computer Science may reproduce it entirely, may provide a copy to another member of faculty and/or to an external plagiarism-checking service or in-house computer program, and may also maintain a copy of the assignment for future checking purposes and/or allow an external service to do so.

Other policies

See the policies page of the faculty website at http://sydney.edu.au/engineering/student-policies/ for information regarding university policies and local provisions and procedures within the Faculty of Engineering and Information Technologies.
Recommended Reference/s: Note: References are provided for guidance purposes only. Students are advised to consult these books in the university library. Purchase is not required.
Online Course Content: Via Canvas
Note on Resources: Students may be interested in the 3rd edition-in-draft of Jurafsky and Martin's "Speech and Language Processing" (https://web.stanford.edu/~jurafsky/slp3/), which covers some recent developments and bridges some gaps relative to Manning and Schütze. Because some materials are not yet available in SLP3, we rely on Manning and Schütze to provide some of the foundational materials.

Note that the "Weeks" referred to in this Schedule are those of the official university semester calendar: https://web.timetable.usyd.edu.au/calendar.jsp

Week 1
  Lecture: Statistical Natural Language Processing
  Lab: Counting & collocations
Week 2
  Lecture: Regular Expressions and Language Models
  Lab: Classification scheme design
  Assignment 1 released
Week 3
  Lecture: Lexical Semantics and Vectors of Meaning
  Lab: Regular Expressions and Language Models
Week 4
  Lecture: Linguistic Fundamentals
  Lab: Evaluation and Experimental Design
Week 5
  Lecture: Text classification (Naive Bayes, Perceptron, MaxEnt, ...)
  Lab: Text classification and feature engineering
  Assignment 2 released
  Assessment Due: Stage 1: annotation task
Week 6
  Lecture: Part of Speech Tagging
  Lab: Part of Speech Tagging
Week 7
  Lecture: Information extraction I: Named entity recognition
  Lab: Lexico-semantic processing
Week 8
  Lecture: Information extraction II: relation extraction and coreference
  Lab: Relation extraction
Week 9
  Lecture: Applied Natural Language Processing
  Lab: BioNLP and domain adaptation
  Assignment 3 released
  Assessment Due: Stage 2: categorisation task
Week 10
  Lecture: Parsing I: Syntactic ambiguity and CFGs
  Lab: Latent feature spaces
Week 11
  Lecture: Parsing II: Probabilistic Parsing
  Lab: Parsing
Week 12
  Lecture: Dependency Parsing and Semantic Role Labelling
  Lab: Relation extraction with dependency paths
Week 13
  Lecture: Exam Review
  Lab: Relation extraction with syntactic features
STUVAC (Week 14)
  Assessment Due: Sequence tagging task
Exam Period
  Assessment Due: Final Exam

Course Relations

The following is a list of courses which have added this Unit to their structure.

Course: Year(s) Offered
Bachelor of Advanced Computing/Bachelor of Commerce: 2018, 2019
Bachelor of Advanced Computing/Bachelor of Science: 2018, 2019
Bachelor of Advanced Computing/Bachelor of Science (Health): 2018, 2019
Bachelor of Advanced Computing/Bachelor of Science (Medical Science): 2018, 2019
Bachelor of Advanced Computing (Computational Data Science): 2018, 2019
Bachelor of Advanced Computing (Computer Science Major): 2018, 2019
Bachelor of Advanced Computing (Information Systems Major): 2018, 2019
Bachelor of Advanced Computing (Software Development): 2018, 2019
Bachelor of Computer Science and Technology (Honours): 2015, 2016, 2017
Bachelor of Computer Science and Technology (Honours) 2014: 2013, 2014
Bachelor of Information Technology: 2015, 2016, 2017
Bachelor of Information Technology/Bachelor of Arts: 2015, 2016, 2017
Bachelor of Information Technology/Bachelor of Commerce: 2015, 2016, 2017
Bachelor of Information Technology/Bachelor of Medical Science: 2015, 2016, 2017
Bachelor of Information Technology/Bachelor of Science: 2015, 2016, 2017
Bachelor of Information Technology (Computer Science) 2014 and earlier: 2009, 2010, 2011, 2012, 2013, 2014
Information Technology (Computer Science)/Arts: 2012, 2013, 2014
Information Technology (Computer Science) / Commerce: 2012, 2013, 2014
Information Technology (Computer Science) / Medical Science: 2012, 2013, 2014
Information Technology (Computer Science) / Science: 2012, 2013, 2014
Information Technology (Computer Science) / Law: 2012, 2013, 2014
Bachelor of Information Technology (Information Systems) 2014 and earlier: 2010, 2011, 2012, 2013, 2014
Information Technology (Information Systems)/Arts: 2012, 2013, 2014
Information Technology (Information Systems) / Commerce: 2012, 2013, 2014
Information Technology (Information Systems) / Medical Science: 2012, 2013, 2014
Information Technology (Information Systems) / Science: 2012, 2013, 2014
Information Technology (Information Systems) / Law: 2012, 2013, 2014
Bachelor of Information Technology/Bachelor of Laws: 2015, 2016, 2017
Graduate Certificate in Information Technology: 2015, 2016, 2017, 2018, 2019
Graduate Certificate in Information Technology Management: 2015, 2016, 2017, 2018, 2019
Graduate Diploma in Computing: 2015, 2016, 2017, 2018, 2019
Graduate Diploma in Health Technology Innovation: 2015, 2016, 2017, 2018, 2019
Graduate Diploma in Information Technology: 2015, 2016, 2017, 2018, 2019
Graduate Diploma in Information Technology Management: 2015, 2016, 2017, 2018, 2019
Graduate Certificate in Information Technology (till 2014): 2012, 2013, 2014
Graduate Diploma in Information Technology (till 2014): 2012, 2013, 2014
Master of Data Science: 2016, 2017, 2018, 2019
Master of Health Technology Innovation: 2015, 2016, 2017, 2018, 2019
Master of Information Technology: 2015, 2016, 2017, 2018, 2019
Master of Information Technology Management: 2015, 2016, 2017, 2018, 2019
Master of IT/Master of IT Management: 2015, 2016, 2017, 2018, 2019
Master of Information Technology (till 2014): 2014

Course Goals

This unit contributes to the achievement of the following course goals:

Attribute                                  Practiced  Assessed
Design (Level 4)                           No         0%
Maths/Science Methods and Tools (Level 4)  No         18%
Engineering/IT Specialisation (Level 4)    Yes        82%

These goals are selected from Engineering & IT Graduate Outcomes Table 2018 which defines overall goals for courses where this unit is primarily offered. See Engineering & IT Graduate Outcomes Table 2018 for details of the attributes and levels to be developed in the course as a whole. Percentage figures alongside each course goal provide a rough indication of their relative weighting in assessment for this unit. Note that not all goals are necessarily part of assessment. Some may be more about practice activity. See Learning outcomes for details of what is assessed in relation to each goal and Assessment for details of how the outcome is assessed. See Attributes for details of practice provided for each goal.