Note: This unit version is currently under review and is subject to change!

COMP5310: Principles of Data Science (2019 - Semester 2)


Unit: COMP5310: Principles of Data Science (6 CP)
Mode: Normal-Day
On Offer: Yes
Level: Postgraduate
Faculty/School: School of Computer Science
Unit Coordinator/s: Anaissi, Ali
Session options: Semester 1, Semester 2
Site(s) for this Unit: https://canvas.sydney.edu.au/courses/17649
Campus: Camperdown/Darlington
Pre-Requisites: None.
Prohibitions: INFO3406.
Brief Handbook Description: The focus of this unit is on understanding and applying relevant concepts, techniques, algorithms, and tools for the analysis, management and visualisation of data, with the goal of enabling discovery of information and knowledge to guide effective decision making and to gain new insights from large data sets.

To this end, this unit of study provides a broad introduction to data management, analysis, modelling and visualisation using the Python programming language. Topics include: development of custom software using the general-purpose Python scripting language; data collection, cleaning, pre-processing, and storage using various databases; exploratory data analysis to understand and profile complex data sets; mining unlabelled data to identify relationships, patterns, and trends; machine learning from labelled data to predict future outcomes; and communicating findings to varied audiences, including through effective data visualisations.

Core data science content will be taught in normal lecture + tutorial delivery mode. Python programming will be taught through an online learning platform in addition to the weekly face-to-face lecture/tutorials. The unit of study will include hands-on exercises covering the range of data science skills above.
Assumed Knowledge: None.
Lecturer/s: Anaissi, Ali
Tutor/s: tbc.
Timetable: COMP5310 Timetable
Time Commitment:
# | Activity Name | Hours per Week | Sessions per Week | Weeks per Semester
1 | Lecture | 2.00 | 1 | 13
2 | Laboratory | 1.00 | 1 | 13
3 | Independent Study | 9.00 | 1 | 13
T&L Activities: Lecture: The 3-hour combined lecture/tutorial sessions each week are taught in a blended mode, covering concepts, practical work and discussion (and, in several weeks, assessment tasks).

Attributes listed here represent the key course goals (see Course Map tab) designated for this unit. The list below describes how these attributes are developed through practice in the unit. See Learning Outcomes and Assessment tabs for details of how these attributes are assessed.

Attribute Development Method | Attribute Developed
Students learn and practice the design of a data processing pipeline. | Design (Level 4)
Students are given scenario(s) that require them to use various algorithms and tools to create a pipeline to process a set of complex data. Students have to articulate and substantiate their choice of computational methods and tools used in the process, given the technical, social and application constraints of the setting. | Engineering/IT Specialisation (Level 4)
Students will learn a variety of data analysis and processing tools and methods, such as a scripting language and the Map/Reduce paradigm, in order to pre-process, analyse and visualise data from heterogeneous sources. | Maths/Science Methods and Tools (Level 4)
Students have to explore the constraints implied by the implicit and explicit requirements of a given practical assessment, using the literature and tool documentation. | Information Seeking (Level 3)
Students practice their written and oral communication skills through the assessments. They need to articulate clearly the aims and issues of the problems, the social and technical constraints, and the reasons behind their decisions. They should be able to discuss and draw insights from the results of their analytical work. | Communication (Level 3)

For explanation of attributes and levels see Engineering & IT Graduate Outcomes Table 2018.

Learning outcomes are the key abilities and knowledge that will be assessed in this unit. They are listed according to the course goal supported by each. See the Assessment tab for details of how each outcome is assessed.

Maths/Science Methods and Tools (Level 4)
1. Student can select statistical techniques appropriate for evaluation of a predictive model that is based on data analysis, and can justify their choice
2. Student can select statistical techniques appropriate for summarization and analysis of a data set, and can justify their choice
3. Student can apply concepts and terms from social science to describe and analyse the role of a data analysis task in its organizational context
Engineering/IT Specialisation (Level 4)
4. Student understands the role of data science in decision-making
5. Student understands the technical issues that are present in the stages of a data analysis task and the properties of different technologies and tools that can be used to deal with the issues
6. Student can process large data sets using appropriate technologies
Design (Level 4)
7. Student can carry out (in guided stages) the whole design and implementation cycle for creating a pipeline to analyse a large heterogeneous dataset
Information Seeking (Level 3)
8. Student can find out details of how to use a method or tool in the data analytic process.
Communication (Level 3)
9. Student can communicate the results produced by an analysis pipeline, in oral and written form, including meaningful diagrams
10. Student can communicate the process used to analyse a large data set, and justify the methods used.
Assessment Methods:
# | Name | Group | Weight | Due Week | Outcomes
1 | Participation | No | 10.00 | Multiple Weeks | 1, 2, 5, 6, 7, 8, 9, 10
2 | Project Stage 1: Obtain data, clean it and load | Yes | 10.00 | Week 6 | 6, 7, 8
3 | Project Stage 2: Summarize and analyse the data | Yes | 20.00 | Week 12 | 1, 2, 6, 7, 8
4 | Project Stage 3: Oral Presentation | Yes | 5.00 | Week 12 | 6, 8, 10
5 | Written Exam | No | 55.00 | Exam Period | 3, 4, 5, 9, 10
Assessment Description: Participation: Complete and submit lab exercises [10 marks; individual work].

Project Stage 1: Obtain data, clean it, load and summarise [10 marks; due week 6]

Project Stage 2: Analyse the data, develop and test a predictive model [20 marks; due week 12]

Project Stage 3: Presentation of results [5 marks; due week 12]

Penalties for lateness:

10% of the awarded marks per day late; maximum 7 days late (after that: 0)
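
The late penalty above can be sketched in Python, the language used in this unit. This is only an illustration under stated assumptions: the function name `late_adjusted_mark` is hypothetical, lateness is counted in whole days, and the rule is read as deducting 10% of the awarded mark for each day late. The unit coordinator's marking guide takes precedence.

```python
def late_adjusted_mark(awarded_mark: float, days_late: int) -> float:
    """Return the mark after the late penalty (assumes whole days late)."""
    if days_late <= 0:
        return float(awarded_mark)  # on time: no deduction
    if days_late > 7:
        return 0.0                  # more than 7 days late: mark of 0
    # Deduct 10% of the awarded mark for each day late.
    return awarded_mark * (10 - days_late) / 10
```

Under this reading, a submission awarded 16 marks and handed in 2 days late would keep 12.8 marks.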

There may be statistically defensible moderation when combining the marks from each component to ensure consistency of marking between markers, and alignment of final grades with unit outcomes.
Grading:
Grade Type | Description
Standards Based Assessment | Final grades in this unit are awarded at levels of HD for High Distinction, DI (previously D) for Distinction, CR for Credit, PS (previously P) for Pass and FA (previously F) for Fail as defined by University of Sydney Assessment Policy. Details of the Assessment Policy are available on the Policies website at http://sydney.edu.au/policies . Standards for grades in individual assessment tasks and the summative method for obtaining a final mark in the unit will be set out in a marking guide supplied by the unit coordinator.
Minimum Pass Requirement | It is a policy of the School of Computer Science that, in order to pass this unit, a student must achieve at least 40% in the written examination. For subjects without a final exam, the 40% minimum requirement applies to the corresponding major assessment component specified by the lecturer. A student must also achieve an overall final mark of 50 or more. Any student not meeting these requirements may be given a final mark of no more than 45, regardless of their average.
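
As a rough illustration of how the assessment weights and the minimum pass requirement combine, the following Python sketch may help. The dictionary keys and the function name `final_mark` are hypothetical, component scores are assumed to be percentages (0 to 100), the cap of 45 is assumed to be applied whenever a requirement is not met (the policy says only that it may be), and moderation and late penalties are ignored.

```python
from typing import Mapping

# Assessment weights from the table above (percent of the final mark).
WEIGHTS = {
    "participation": 10,
    "project_stage_1": 10,
    "project_stage_2": 20,
    "project_stage_3": 5,
    "written_exam": 55,
}


def final_mark(scores: Mapping[str, float]) -> float:
    """Combine component percentages into a final mark out of 100."""
    weighted = sum(WEIGHTS[name] * scores[name] / 100 for name in WEIGHTS)
    meets_requirements = scores["written_exam"] >= 40 and weighted >= 50
    if not meets_requirements:
        # Assumption: the cap is always applied when a requirement is missed.
        return min(weighted, 45.0)
    return weighted
```

For example, a student scoring 80% on every component except 30% on the written exam has a weighted average of 52.5, but this sketch would return 45 because the exam threshold is not met.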
Policies & Procedures: IMPORTANT: School policy relating to Academic Dishonesty and Plagiarism.

In assessing a piece of submitted work, the School of Computer Science may reproduce it entirely, may provide a copy to another member of faculty, and/or to an external plagiarism checking service or in-house computer program and may also maintain a copy of the assignment for future checking purposes and/or allow an external service to do so.

Other policies

See the policies page of the faculty website at http://sydney.edu.au/engineering/student-policies/ for information regarding university policies and local provisions and procedures within the Faculty of Engineering and Information Technologies.
Recommended Reference/s: Note: References are provided for guidance purposes only. Students are advised to consult these books in the university library. Purchase is not required.
Online Course Content: This subject will use Python as the programming language throughout the course. An online tutorial, "Principles of Data Science: Introduction to Python", is made available through the Grok learning platform. For the best learning outcome, students should start working on this Python tutorial before the semester starts.

https://canvas.sydney.edu.au/courses/17649
Note on Resources: Lecture notes, tutorial notes and links to online questions will be provided in the eLearning system

Note that the "Weeks" referred to in this Schedule are those of the official university semester calendar https://web.timetable.usyd.edu.au/calendar.jsp

Week Description
Week 1 Introduction to Data Science and Big Data
Week 2 Data Exploration with Spreadsheets
Week 3 Data Exploration with Python
Week 4 Cleaning and Storing Data
Week 5 Querying and Summarising Data
Week 6 Hypothesis Testing and Evaluation
Assessment Due: Project Stage 1: Obtain data, clean it and load
Week 7 Data Mining - Association Rules and Dimensionality Reduction
Week 8 Data Mining - Clustering
Week 9 Machine Learning - Regression
Week 10 Machine Learning - Classification
Week 11 Unstructured Data
Week 12 Product Thinking and Ethics: information, actionable knowledge from data, and the link to effective decision making.
Assessment Due: Project Stage 2: Summarize and analyse the data
Assessment Due: Project Stage 3: Oral Presentation
Week 13 UoS Review
Exam Period Assessment Due: Written Exam

Course Relations

The following is a list of courses which have added this Unit to their structure.

Course | Year(s) Offered
Graduate Certificate in Data Science | 2016, 2017, 2018, 2019, 2020
Master of Data Science | 2016, 2017, 2018, 2019, 2020
Graduate Diploma in Health Technology Innovation | 2016, 2017, 2018, 2019, 2020
Master of Health Technology Innovation | 2015, 2016, 2017, 2018, 2019, 2020

Course Goals

This unit contributes to the achievement of the following course goals:

Attribute | Practiced | Assessed
Maths/Science Methods and Tools (Level 4) | Yes | 21%
Engineering/IT Specialisation (Level 4) | Yes | 32.5%
Design (Level 4) | Yes | 11%
Information Seeking (Level 3) | Yes | 8.5%
Communication (Level 3) | Yes | 27%

These goals are selected from Engineering & IT Graduate Outcomes Table 2018 which defines overall goals for courses where this unit is primarily offered. See Engineering & IT Graduate Outcomes Table 2018 for details of the attributes and levels to be developed in the course as a whole. Percentage figures alongside each course goal provide a rough indication of their relative weighting in assessment for this unit. Note that not all goals are necessarily part of assessment. Some may be more about practice activity. See Learning outcomes for details of what is assessed in relation to each goal and Assessment for details of how the outcome is assessed. See Attributes for details of practice provided for each goal.