Note: This unit version is currently being edited and is subject to change!

DATA2901: Big Data and Data Diversity (Advanced) (2019 - Semester 1)

Unit: DATA2901: Big Data and Data Diversity (Advanced) (6 CP)
Mode: Normal-Day
On Offer: Yes
Level: Junior
Faculty/School: School of Computer Science
Unit Coordinator/s: A/Prof Roehm, Uwe
Session options: Semester 1
Campus: Camperdown/Darlington
Pre-Requisites: DATA1002 OR DATA1902 OR INFO1110 OR INFO1903 OR INFO1103. Students need Distinction or better in one of the prerequisite units.
Prohibitions: DATA2001.
Brief Handbook Description: This course focuses on methods and techniques to efficiently explore and analyse large data collections. Where are hot spots of pedestrian accidents across a city? What are the most popular travel locations according to user postings on a travel website? The ability to combine and analyse data from various sources and from databases is essential for informed decision making in both research and industry.

Students will learn how to ingest, combine and summarise data from a variety of data models typically encountered in data science projects, such as relational, semi-structured, time series, geospatial, image and text data. As well as reinforcing their programming skills through experience with relevant Python libraries, this course introduces students to the concept of declarative data processing with SQL and to analysing data in relational databases. Students will be given data sets from, e.g., social media, transport, health and social sciences, and will be taught basic exploratory data analysis and mining techniques in the context of small use cases. The course will also give students an understanding of the challenges involved in analysing large data volumes, including the idea of partitioning and distributing data and computation across multiple computers for processing 'Big Data'.

This unit is an alternative to DATA2001, providing coverage of some additional, more sophisticated topics, suited for students with high academic achievement.
Assumed Knowledge: None.
Lecturer/s: A/Prof Roehm, Uwe
Tutor/s: Nazim Choudhury, Tim-Patrick Sass
Timetable: DATA2901 Timetable
Time Commitment:
#  Activity Name            Hours per Week  Sessions per Week  Weeks per Semester
1  Lecture                  3.00            1                  13
2  Laboratory               2.00            1                  13
3  Project Work - own time  3.00            -                  13
4  Independent Study        3.00            -                  13
T&L Activities: A variety of learning situations will be employed during the unit of study, including lectures, on-line demos, tutorials, directed computer laboratory exercises, self-learning SQL exercises, and assessed data science assignments. To benefit fully from this unit it is necessary to participate fully in all aspects of the unit of study.

Learning outcomes are the key abilities and knowledge that will be assessed in this unit. They are listed according to the course goal supported by each. See the Assessment tab for details of how each outcome is assessed.

(8) Professional Effectiveness and Ethical Conduct (Level 1)
1. Awareness of privacy issues when working with data.
(2) Engineering/ IT Specialisation (Level 3)
2. Ability to use appropriate Python libraries to automate data science activities on diverse kinds of data.
3. Ability to understand and produce declarative queries to extract appropriate information from data sets, including competence in use of SQL.
4. Knowledge of the main challenges in analysing 'Big Data': Data Volume, Variety, Velocity, Veracity.
5. Experience with handling datasets of diverse kinds of data, including relational, semi-structured, time series, geo-location, image and text data, and experience in combining data of different types.
(1) Maths/ Science Methods and Tools (Level 2)
6. Ability to ingest, combine and summarise data from a variety of data models.
Unassigned Outcomes
7. Understanding of the impact of data volume on data processing, and awareness of approaches to address this such as indexing, compression, data partitioning, and distributed processing frameworks (Hadoop).
8. Knowledge of, and ability to work with, several sophisticated topics related to data scale and diversity.
Assessment Methods:
#  Name               Group  Weight  Due Week        Outcomes
1  SQL Tutorials      No     0.00    Multiple Weeks  3
2  SQL Quiz           No     20.00   Week 7          3, 5, 8
3  Assignment         Yes    20.00   Week 12         2, 3, 5, 6, 8
4  Final Examination  No     60.00   Exam Period     1, 2, 3, 4, 7, 8
Assessment Description: SQL: Students work through weekly online tutorials introducing increasingly sophisticated usage of SQL. Solutions are provided for each week, and the topics are assessed in an SQL quiz.

Final Exam: Understanding of all of this unit's material is assessed in a written examination.
Assessment Feedback: SQL tutorials provide simple feedback and allow multiple attempts, and example solutions are available after the submission deadline has passed.

Tutorial exercises include solutions after one week.
Grading:
Grade Type Description
Standards Based Assessment Final grades in this unit are awarded at levels of HD for High Distinction, DI (previously D) for Distinction, CR for Credit, PS (previously P) for Pass and FA (previously F) for Fail as defined by University of Sydney Assessment Policy. Details of the Assessment Policy are available on the Policies website at http://sydney.edu.au/policies. Standards for grades in individual assessment tasks and the summative method for obtaining a final mark in the unit will be set out in a marking guide supplied by the unit coordinator.
Minimum Pass Requirement It is a policy of the School of Computer Science that in order to pass this unit, a student must achieve at least 40% in the written examination. For subjects without a final exam, the 40% minimum requirement applies to the corresponding major assessment component specified by the lecturer. A student must also achieve an overall final mark of 50 or more. Any student not meeting these requirements may be given a final mark of no more than 45, regardless of their average.
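Worked illustration: the minimum pass requirement can be read as a small calculation over the assessment weights listed under Assessment Methods. The Python sketch below is illustrative only and is not the School's marking procedure. The function name `final_mark`, the component keys, and the decision to apply the cap of 45 whenever either requirement is unmet are assumptions made for this example; the policy itself only says the cap "may" be applied.

```python
# Assessment weights in percent, taken from the Assessment Methods table.
# The SQL Tutorials carry a weight of 0.00 and are therefore omitted.
WEIGHTS = {"sql_quiz": 20, "assignment": 20, "final_exam": 60}

EXAM_HURDLE = 40   # minimum percentage required in the written examination
PASS_MARK = 50     # minimum overall final mark
CAPPED_MARK = 45   # ceiling that may apply when a requirement is not met


def final_mark(marks):
    """Return (mark, passed) for component marks given as percentages (0-100).

    Assumption: the cap is applied whenever either requirement is unmet.
    The policy only says the cap "may" be applied, so this is one reading.
    """
    # Weighted average of the component marks.
    raw = sum(WEIGHTS[name] * marks[name] for name in WEIGHTS) / 100
    passed = marks["final_exam"] >= EXAM_HURDLE and raw >= PASS_MARK
    mark = raw if passed else min(raw, CAPPED_MARK)
    return mark, passed


if __name__ == "__main__":
    # The weighted average is above 50, but the exam mark misses the hurdle.
    print(final_mark({"sql_quiz": 80, "assignment": 80, "final_exam": 35}))
```

Under this reading, a student with strong quiz and assignment marks still fails the unit if the examination mark is below 40%.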
Policies & Procedures: IMPORTANT: School policy relating to Academic Dishonesty and Plagiarism.

In assessing a piece of submitted work, the School of Computer Science may reproduce it entirely, may provide a copy to another member of faculty, and/or to an external plagiarism checking service or in-house computer program and may also maintain a copy of the assignment for future checking purposes and/or allow an external service to do so.

Other policies

See the policies page of the faculty website at http://sydney.edu.au/engineering/student-policies/ for information regarding university policies and local provisions and procedures within the Faculty of Engineering and Information Technologies.
Online Course Content: The SQL teaching will include lectures, and labwork where students work on the GrokLearning platform by following a sequence that integrates expository material with frequent exercises (formative and summative) which are automatically graded.

Note that the "Weeks" referred to in this Schedule are those of the official university semester calendar: https://web.timetable.usyd.edu.au/calendar.jsp

Week Description
Week 1 Intro/Motivation; What is Big Data? Challenges for Data Analytics.
Week 2 Data Analysis with Python
Advanced topic: system extensions for diverse data types, versus use of multiple systems each specialized for one data type
Week 3 Accessing data in relational databases; introduction to SQL
Advanced topic: user-defined types and user-defined functions
Week 4 Declarative data analysis with SQL
Advanced topic: recursive SQL
Week 5 Scalable Data Analytics: The role of indexes and data partitioning
Advanced topic: evaluation of recursive SQL
Week 6 Exploring health data: Analysing time series data
Advanced topic: stream processing systems
Week 7 Advanced topic: spatial data UDTs
Assessment Due: SQL Quiz
Week 8 Web Content / Social Media Analytics: reading and interpreting data from the web
Advanced topic: graph UDTs
Week 9 NoSQL: Processing semi-structured data (potentially combining with geo-location data)
Advanced topic: introduction to Spark
Week 10 Text data processing: feature extraction and analysis
Advanced topic: more on Spark
Week 11 Image data processing: feature extraction and analysis
Advanced topic: evaluation of Spark
Week 12 Challenges in analysing Big Data: The What and Why of Hadoop
Data Privacy / Anonymising Data
Advanced topic: differential privacy
Assessment Due: Assignment
Week 13 Revision
Exam Period Assessment Due: Final Examination

Course Relations

The following is a list of courses which have added this Unit to their structure.

Course Year(s) Offered
Bachelor of Advanced Computing/Bachelor of Commerce 2018, 2019, 2020
Bachelor of Advanced Computing/Bachelor of Science 2018, 2019, 2020
Bachelor of Advanced Computing/Bachelor of Science (Health) 2018, 2019, 2020
Bachelor of Advanced Computing/Bachelor of Science (Medical Science) 2018, 2019, 2020
Bachelor of Advanced Computing (Computational Data Science) 2018, 2019, 2020
Bachelor of Advanced Computing (Computer Science Major) 2018, 2019, 2020
Bachelor of Advanced Computing (Information Systems Major) 2018, 2019, 2020
Bachelor of Advanced Computing (Software Development) 2018, 2019, 2020
Software Mid-Year 2018, 2019, 2020
Software 2017, 2018, 2019, 2020
Bachelor of Information Technology 2017
Bachelor of Information Technology/Bachelor of Arts 2017
Bachelor of Information Technology/Bachelor of Commerce 2017
Bachelor of Information Technology/Bachelor of Medical Science 2017
Bachelor of Information Technology/Bachelor of Science 2017
Bachelor of Information Technology/Bachelor of Laws 2017

Course Goals

This unit contributes to the achievement of the following course goals:

Attribute                                                      Practiced  Assessed
(8) Professional Effectiveness and Ethical Conduct (Level 1)  No         6%
(2) Engineering/ IT Specialisation (Level 3)                   No         53%
(1) Maths/ Science Methods and Tools (Level 2)                 No         2%

These goals are selected from Engineering & IT Graduate Outcomes Table 2018 which defines overall goals for courses where this unit is primarily offered. See Engineering & IT Graduate Outcomes Table 2018 for details of the attributes and levels to be developed in the course as a whole. Percentage figures alongside each course goal provide a rough indication of their relative weighting in assessment for this unit. Note that not all goals are necessarily part of assessment. Some may be more about practice activity. See Learning outcomes for details of what is assessed in relation to each goal and Assessment for details of how the outcome is assessed. See Attributes for details of practice provided for each goal.