Note: This unit version is currently being edited and is subject to change!

INFO3406: Introduction to Data Analytics (2017 - Semester 2)

Unit: INFO3406: Introduction to Data Analytics (6 CP)
Mode: Normal-Day
On Offer: Yes
Level: Senior
Faculty/School: School of Information Technologies
Unit Coordinator/s: Dr Takatsuka, Masahiro
Diaz Cifuentes, Claudio Esteban
Session options: Semester 2
Campus: Camperdown/Darlington
Pre-Requisites: (MATH1005 OR MATH1905) AND (INFO2120 OR INFO2820).
Brief Handbook Description: Big Data refers to datasets that are massive, heterogeneous, and dynamic, and that are beyond the reach of current approaches for the capture, storage, management, and analysis of data. The focus of this unit is on understanding and applying relevant concepts, techniques, algorithms, and tools for the analysis, management, and visualization of big data, with the goal of keeping abreast of the continual increase in the volume and complexity of data sets and enabling discovery of information and knowledge to guide effective decision making.
Assumed Knowledge: Basic statistics and database management.
Lecturer/s: Diaz Cifuentes, Claudio Esteban
Timetable: INFO3406 Timetable
Time Commitment:
#   Activity Name   Hours per Week   Sessions per Week   Weeks per Semester
1   Lecture         2.00             1                   13
2   Laboratory      1.00             1                   13

Attributes listed here represent the key course goals (see Course Map tab) designated for this unit. The list below describes how these attributes are developed through practice in the unit. See Learning Outcomes and Assessment tabs for details of how these attributes are assessed.

Attribute Development Method Attribute Developed
Students learn and practice the design of a pipeline of processes to analyse a huge set of complex data. Design (Level 3)
Students are given scenario(s) that require them to use various components and tools to create a pipeline to process a set of complex data. Students have to articulate and substantiate their choice of computational methods and tools in light of the technical, social and application constraints of the given setting. Engineering/IT Specialisation (Level 3)
Students are required to identify appropriate tools and methods to pre-process heterogeneous data coming from different channels in the practical assessment. In their assignment, students are provided with different tools, but they have to make a rational choice of tools to analyse and clean up the data. Maths/Science Methods and Tools (Level 3)
Students are required to perform requirements analysis through the practical assessment. They have to identify implicit & explicit requirements in a given project brief. Students should also explore the implied constraints through literature after synthesising the given requirements. Information Seeking (Level 3)
Students practice their written and oral communication skills through the assessments. They need to articulate clearly the aim and issues of the problems, the social and technical constraints, and the reasons behind their decisions. They should be able to discuss and draw insights from the results of their analytical work. Communication (Level 3)

For an explanation of attributes and levels, see the Engineering & IT Graduate Outcomes Table.

Learning outcomes are the key abilities and knowledge that will be assessed in this unit. They are listed according to the course goal supported by each. See the Assessment tab for details of how each outcome is assessed.

Engineering/IT Specialisation (Level 3)
1. Student understands the role of data analysis in decision-making
2. Student understands the technical issues that are present in the stages of a data analysis task and the properties of different technologies and tools that can be used to deal with the issues
3. Student can process large data sets using appropriate technologies
Maths/Science Methods and Tools (Level 3)
4. Student can select statistical techniques appropriate for summarization and analysis of a data set, and can justify their choice
5. Student can select statistical techniques appropriate for evaluation of a predictive model that is based on data analysis, and can justify their choice
6. Student can apply concepts and terms from social science to describe and analyse the role of a data analysis task in its organizational context
Information Seeking (Level 3)
7. Student can identify explicit and implicit requirements for carrying out a data analysis task to meet stakeholder purposes
8. Student can find out details of how to use a method or tool in the data analytic process.
Design (Level 3)
9. Student can carry out (in guided stages) the whole design and implementation cycle for creating a pipeline to analyse a large heterogeneous dataset
Communication (Level 3)
10. Student can communicate the results produced by an analysis pipeline, in oral and written form, including meaningful diagrams
11. Student can communicate the process used to analyse a large data set, and justify the methods used.
Assessment Methods:
#   Name               Group   Weight   Due Week         Outcomes
1   Class Activities   No      10.00    Multiple Weeks   1, 2, 3, 4, 5, 6, 7, 8, 9
2   Project Stage 1    No      10.00    Week 4           3, 8, 9
3   Project Stage 2    No      10.00    Week 8           3, 4, 8, 9
4   Project Stage 3    Yes     10.00    Week 11          3, 5, 8, 9
5   Project Stage 4    Yes     10.00    Week 13          6, 8, 10, 11
6   Written exam       No      50.00    Exam Period      1, 2, 4, 5, 6, 7
Assessment Description: Project Stage 1: Obtain data, clean it and load it [10 marks individual work; due week 4]

Project Stage 2: Summarize and analyse the data [10 marks individual work; due week 8]

Project Stage 3: Develop and test a predictive model [10 marks group work; due week 11]

Project Stage 4: Presentation of results [10 marks group work; due week 13]
Grading:
Grade Type Description
Standards Based Assessment Final grades in this unit are awarded at levels of HD for High Distinction, DI (previously D) for Distinction, CR for Credit, PS (previously P) for Pass and FA (previously F) for Fail as defined by University of Sydney Assessment Policy. Details of the Assessment Policy are available on the Policies website at http://sydney.edu.au/policies . Standards for grades in individual assessment tasks and the summative method for obtaining a final mark in the unit will be set out in a marking guide supplied by the unit coordinator.
Minimum Pass Requirement It is a policy of the School of Information Technologies that in order to pass this unit, a student must achieve at least 40% in the written examination. For subjects without a final exam, the 40% minimum requirement applies to the corresponding major assessment component specified by the lecturer. A student must also achieve an overall final mark of 50 or more. Any student not meeting these requirements may be given a final mark of no more than 45, regardless of their average.
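As an informal illustration only, the interaction between the assessment weights and the minimum pass requirement can be sketched in Python. The weights come from the Assessment Methods table; the component names, the function name `final_mark`, and the decision to always apply the cap of 45 are assumptions made for this sketch (the policy says only that the capped mark "may" be given). The marking guide supplied by the unit coordinator is the authority on how final marks are actually calculated.

```python
# Weights (marks out of 100) taken from the Assessment Methods table.
# The dictionary keys are hypothetical names chosen for this sketch.
WEIGHTS = {
    "class_activities": 10.0,
    "project_stage_1": 10.0,
    "project_stage_2": 10.0,
    "project_stage_3": 10.0,
    "project_stage_4": 10.0,
    "written_exam": 50.0,
}

EXAM_MINIMUM = 40.0  # minimum percentage required in the written exam
PASS_MARK = 50.0     # minimum overall final mark required to pass
CAPPED_MARK = 45.0   # highest mark available when a requirement is missed


def final_mark(scores):
    """Return an indicative final mark.

    `scores` maps each component name in WEIGHTS to a percentage (0-100).
    """
    # Weighted sum of all components, giving a mark out of 100.
    raw = sum(WEIGHTS[name] * scores[name] / 100.0 for name in WEIGHTS)
    meets_exam_minimum = scores["written_exam"] >= EXAM_MINIMUM
    if meets_exam_minimum and raw >= PASS_MARK:
        return raw
    # Assumption: the cap is always applied here, although the policy
    # wording leaves this to the School's discretion.
    return min(raw, CAPPED_MARK)
```

Under these assumptions, a student who scores 100% on every other component but 30% in the written exam has a weighted mark of 65, which the sketch reduces to 45 because the exam minimum is not met.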
Policies & Procedures: IMPORTANT: School policy relating to Academic Dishonesty and Plagiarism.

In assessing a piece of submitted work, the School of IT may reproduce it entirely, may provide a copy to another member of faculty, and/or to an external plagiarism checking service or in-house computer program and may also maintain a copy of the assignment for future checking purposes and/or allow an external service to do so.

Other policies

See the policies page of the faculty website at http://sydney.edu.au/engineering/student-policies/ for information regarding university policies and local provisions and procedures within the Faculty of Engineering and Information Technologies.

Note that the "Weeks" referred to in this Schedule are those of the official university semester calendar https://web.timetable.usyd.edu.au/calendar.jsp

Week Description
Week 1 Overview of the Course: CRISP-DM methodology. Business and Big Data: what it is. The four V’s (volume, variety, velocity, and veracity).
Week 2 Data Understanding: Data types, quality, pre-processing, similarity.
Week 3 Data Analysis: Storing data, exploratory data analysis, and visualization.
Week 4 Data Preparation: Select, clean, construct, integrate, format. Outliers, missing values, sampling.
Assessment Due: Project Stage 1
Week 5 Modeling: Introduction and Naive Bayes
Week 6 Modeling: Decision Tree and Scoring
Week 7 Guest Lecture
Week 8 Modeling: Regressions
Assessment Due: Project Stage 2
Week 9 Modeling: Clustering
Week 10 Modeling: Pattern Discovery
Week 11 Evaluation and Deployment
Assessment Due: Project Stage 3
Week 12 Visualization: techniques, visual sense-making, etc. (a variety of tools for the labs, including dashboards).
Week 13 Information, actionable knowledge from data, and link to effective decision making.
Assessment Due: Project Stage 4
Exam Period Assessment Due: Written exam

Course Relations

The following is a list of courses which have added this Unit to their structure.

Course Year(s) Offered
Bachelor of Computer Science and Technology 2015, 2016, 2017
Bachelor of Computer Science and Technology (Advanced) 2015, 2016, 2017
Bachelor of Computer Science and Technology (Computer Science) 2014 and earlier 2014
Bachelor of Computer Science and Technology (Computer Science)(Advanced) 2014 and earlier 2014
Bachelor of Computer Science and Technology (Information Systems) 2014 and earlier 2014
Bachelor of Computer Science and Technology (Information Systems)(Advanced) 2014 and earlier 2014
Bachelor of Computer Science & Tech. Mid-Year 2016, 2017
Aeronautical Engineering / Science 2014
Aeronautical Engineering (Space) / Science 2014
Biomedical Engineering / Science 2014
Electrical Engineering / Science 2014
Electrical Engineering (Computer) / Science 2014
Electrical Engineering (Power) / Science 2014
Electrical Engineering (Telecommunications) / Science 2014
Aeronautical / Science 2015, 2016, 2017
Aeronautical (Space) / Science 2015
Biomedical Mid-Year 2016
Biomedical 2016
Biomedical / Science 2015, 2016, 2017
Electrical / Science 2015
Electrical (Computer) / Science 2015
Electrical (Power) / Science 2015
Electrical (Telecommunications) / Science 2015
Mechanical / Science 2015, 2016, 2017
Mechanical (Space) / Science 2015
Mechatronic / Science 2015, 2016, 2017
Mechatronic (Space) / Science 2015
Mechanical Engineering / Science 2014
Mechanical Engineering (Space) / Science 2014
Mechatronic Engineering / Science 2014
Mechatronic Engineering (Space) / Science 2014
Bachelor of Information Technology 2015, 2016, 2017
Information Technology / Arts 2015, 2016, 2017
Information Technology / Commerce 2015, 2016, 2017
Information Technology / Medical Science 2015, 2016, 2017
Information Technology / Science 2015, 2016, 2017
Bachelor of Information Technology (Computer Science) 2014 and earlier 2014
Bachelor of Information Technology (Information Systems) 2014 and earlier 2014
Information Technology / Law 2015, 2016, 2017

Course Goals

This unit contributes to the achievement of the following course goals:

Attribute                                    Practiced   Assessed
Engineering/IT Specialisation (Level 3)      Yes         29.3%
Maths/Science Methods and Tools (Level 3)    Yes         32.3%
Information Seeking (Level 3)                Yes         18.2%
Design (Level 3)                             Yes         15.2%
Communication (Level 3)                      Yes         5%

These goals are selected from the Engineering & IT Graduate Outcomes Table, which defines overall goals for courses where this unit is primarily offered. See the Engineering & IT Graduate Outcomes Table for details of the attributes and levels to be developed in the course as a whole. Percentage figures alongside each course goal provide a rough indication of their relative weighting in assessment for this unit. Note that not all goals are necessarily part of assessment; some may relate mainly to practice activities. See Learning Outcomes for details of what is assessed in relation to each goal and Assessment for details of how each outcome is assessed. See Attributes for details of the practice provided for each goal.