Note: This unit version is currently being edited and is subject to change!

DATA2001: Data Science: Big Data and Data Diversity (2019 - Semester 1)


Unit: DATA2001: Data Science: Big Data and Data Diversity (6 CP)
Mode: Normal-Day
On Offer: Yes
Level: Intermediate
Faculty/School: School of Computer Science
Unit Coordinator/s: A/Prof Roehm, Uwe
Session options: Semester 1
Campus: Camperdown/Darlington
Pre-Requisites: DATA1002 OR DATA1902 OR INFO1110 OR INFO1910 OR INFO1903 OR INFO1103.
Prohibitions: DATA2901.
Brief Handbook Description: This course focuses on methods and techniques to efficiently explore and analyse large data collections. Where are hot spots of pedestrian accidents across a city? What are the most popular travel locations according to user postings on a travel website? The ability to combine and analyse data from various sources and from databases is essential for informed decision making in both research and industry.

Students will learn how to ingest, combine and summarise data from a variety of data models typically encountered in data science projects, such as relational, semi-structured, time series, geospatial, image and text data. As well as reinforcing their programming skills through experience with relevant Python libraries, this course will introduce students to the concept of declarative data processing with SQL and to analysing data in relational databases. Students will be given data sets from, e.g., social media, transport, health and the social sciences, and will be taught basic exploratory data analysis and mining techniques in the context of small use cases. The course will further give students an understanding of the challenges involved in analysing large data volumes, such as the need to partition and distribute data and computation among multiple computers when processing 'Big Data'.
Assumed Knowledge: None.
Lecturer/s: A/Prof Roehm, Uwe
Tutor/s: Harshana Randeni (TA), Irene Mao, Tanya Singh, Joe Nguyen, William Zhang, Parvin Radhakrishnan
Timetable: DATA2001 Timetable
Time Commitment:
#  Activity Name              Hours per Week  Sessions per Week  Weeks per Semester
1  Lecture                    2.00            1                  13
2  Laboratory                 2.00            1                  13
3  Project Work - own time    3.00            -                  13
4  Independent Study          3.00            -                  13
T&L Activities: A variety of learning situations will be employed during the unit of study, including lectures, online demos, tutorials, directed computer laboratory exercises, self-learning SQL exercises, and assessed data science assignments. To benefit from this unit, students need to participate fully in all aspects of it.

Learning outcomes are the key abilities and knowledge that will be assessed in this unit. They are listed according to the course goal supported by each. See the Assessment tab for details of how each outcome is assessed.

(8) Professional Effectiveness and Ethical Conduct (Level 1)
1. Awareness of privacy issues when working with data.
(2) Engineering/ IT Specialisation (Level 3)
2. Knowledge of the main challenges analysing 'Big Data': Data Volume, Variety, Velocity, Veracity.
3. Experience in handling datasets of diverse kinds of data, including relational, semi-structured, time series, geo-location, image and text data, and in combining data of different types.
4. Understanding of the impact of data volume on data processing, and awareness of approaches to address this such as indexing, compression, data partitioning, and distributed processing frameworks (Hadoop).
(1) Maths/ Science Methods and Tools (Level 2)
5. Ability to use appropriate Python libraries to automate data science activities on diverse kinds of data.
6. Ability to ingest, combine and summarise data from a variety of data models.
7. Ability to understand and produce declarative queries to extract appropriate information from data sets, including competence in use of SQL.
Assessment Methods:
#  Name               Group  Weight (%)  Due             Outcomes
1  SQL Tutorials      No     0           Multiple weeks  7
2  SQL Quiz           No     20          Week 7          3, 7
3  Assignment         Yes    20          Week 12         3, 5, 6, 7
4  Final Examination  No     60          Exam Period     1, 2, 4, 5, 7
Assessment Description: SQL: Students work through weekly online tutorials introducing increasingly sophisticated usage of SQL. Solutions are provided for each week, and the topics are assessed in an SQL quiz.

Final Exam: Understanding of all of this unit's material is assessed in a written examination.
Assessment Feedback: SQL tutorials provide simple feedback and allow multiple attempts, and example solutions are available after the submission deadline has passed.

Tutorial exercises include solutions after one week.
Grading:
Grade Type: Standards Based Assessment
Final grades in this unit are awarded at levels of HD for High Distinction, DI (previously D) for Distinction, CR for Credit, PS (previously P) for Pass and FA (previously F) for Fail, as defined by the University of Sydney Assessment Policy. Details of the Assessment Policy are available on the Policies website at http://sydney.edu.au/policies. Standards for grades in individual assessment tasks and the summative method for obtaining a final mark in the unit will be set out in a marking guide supplied by the unit coordinator.
Minimum Pass Requirement: It is a policy of the School of Computer Science that, in order to pass this unit, a student must achieve at least 40% in the written examination. For subjects without a final exam, the 40% minimum requirement applies to the corresponding major assessment component specified by the lecturer. A student must also achieve an overall final mark of 50 or more. Any student not meeting these requirements may be given a final mark of no more than 45, regardless of their average.
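The minimum pass requirement interacts with the component weights listed under Assessment Methods. The Python sketch below is a hypothetical illustration only, not the School's official procedure. It assumes the final mark is the weighted sum of the component percentages (SQL Quiz 20%, Assignment 20%, Final Examination 60%; the actual summative method is defined in the coordinator's marking guide) and that the 45-mark cap is always applied, whereas the policy only says a student *may* be capped. The names `WEIGHTS` and `final_mark` are invented for this example.

```python
# Hypothetical sketch of the minimum pass requirement.
# ASSUMPTION: the final mark is a weighted sum of component percentages;
# the real method is set out in the unit coordinator's marking guide.
WEIGHTS = {"sql_quiz": 20, "assignment": 20, "final_exam": 60}  # percent

EXAM_THRESHOLD = 40   # minimum percentage required in the written exam
PASS_MARK = 50        # minimum overall final mark required to pass
CAPPED_MARK = 45      # highest mark given if a requirement is not met


def final_mark(scores: dict[str, float]) -> float:
    """Combine component scores (each 0-100) into a final mark and
    apply the minimum pass requirement."""
    raw = sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100
    meets_exam = scores["final_exam"] >= EXAM_THRESHOLD
    meets_overall = raw >= PASS_MARK
    if meets_exam and meets_overall:
        return raw
    # ASSUMPTION: the policy says "may be given"; this sketch always caps.
    return min(raw, CAPPED_MARK)


# Strong coursework but an exam score below 40%: a raw mark of 55 is capped.
print(final_mark({"sql_quiz": 80, "assignment": 90, "final_exam": 35}))  # 45.0
# Both requirements met: the weighted mark stands.
print(final_mark({"sql_quiz": 70, "assignment": 80, "final_exam": 60}))  # 66.0
```

The SQL Tutorials carry a weight of 0 and are therefore left out of `WEIGHTS`. Using integer percentages and dividing once at the end keeps the arithmetic exact for whole-number scores.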
Policies & Procedures: IMPORTANT: School policy relating to Academic Dishonesty and Plagiarism.

In assessing a piece of submitted work, the School of Computer Science may reproduce it entirely, may provide a copy to another member of faculty, and/or to an external plagiarism checking service or in-house computer program and may also maintain a copy of the assignment for future checking purposes and/or allow an external service to do so.

Other policies

See the policies page of the faculty website at http://sydney.edu.au/engineering/student-policies/ for information regarding university policies and local provisions and procedures within the Faculty of Engineering and Information Technologies.
Online Course Content: SQL teaching will include lectures and lab work in which students work on the GrokLearning platform, following a sequence that integrates expository material with frequent, automatically graded exercises (formative and summative).

Note that the "Weeks" referred to in this Schedule are those of the official University semester calendar at https://web.timetable.usyd.edu.au/calendar.jsp

Week Description
Week 1 Intro/Motivation; What is Big Data? Challenges for Data Analytics.
Week 2 Data Analysis with Python
Week 3 Accessing data in relational databases; introduction to SQL
Week 4 Declarative data analysis with SQL
Week 5 Scalable Data Analytics: The role of indexes and data partitioning
Week 6 Exploring health data: Analysing time series data
Week 7 Assessment Due: SQL Quiz
Week 8 Web Content / Social Media Analytics: reading and interpreting data from the web
Week 9 NoSQL: Processing semi-structured data (potentially combined with geo-location data)
Week 10 Text data processing: feature extraction and analysis
Week 11 Image data processing: feature extraction and analysis
Week 12 Challenges in analysing Big Data: The What and Why of Hadoop
        Data Privacy / Anonymising Data
        Assessment Due: Assignment
Week 13 Revision
Exam Period Assessment Due: Final Examination

Course Relations

The following is a list of courses which have added this Unit to their structure.

Course Year(s) Offered
Bachelor of Advanced Computing (Computational Data Science) 2018, 2019, 2020
Bachelor of Advanced Computing/Bachelor of Commerce 2018, 2019, 2020
Bachelor of Advanced Computing/Bachelor of Science 2018, 2019, 2020
Bachelor of Advanced Computing/Bachelor of Science (Health) 2018, 2019, 2020
Bachelor of Advanced Computing/Bachelor of Science (Medical Science) 2018, 2019, 2020
Bachelor of Advanced Computing (Computer Science Major) 2018, 2019, 2020
Bachelor of Advanced Computing (Information Systems Major) 2018, 2019, 2020
Bachelor of Advanced Computing (Software Development) 2018, 2019, 2020
Biomedical Mid-Year 2016, 2017, 2018, 2019, 2020
Biomedical 2016, 2017, 2018, 2019, 2020
Software Mid-Year 2019, 2020
Software 2018, 2019, 2020
Bachelor of Project Management (Built Environment) 2018
Bachelor of Project Management (Civil Engineering Science) 2018
Bachelor of Project Management (Software) 2018
Bachelor of Project Management (Built Environment) Mid-Year 2018
Bachelor of Project Management (Civil Engineering Science) Mid-Year 2018
Bachelor of Project Management (Software) Mid-Year 2018

Course Goals

This unit contributes to the achievement of the following course goals:

Attribute Practiced Assessed
(8) Professional Effectiveness and Ethical Conduct (Level 1) No 6%
(2) Engineering/ IT Specialisation (Level 3) No 50%
(1) Maths/ Science Methods and Tools (Level 2) No 44%

These goals are selected from the Engineering & IT Graduate Outcomes Table 2018, which defines overall goals for courses where this unit is primarily offered. See the Engineering & IT Graduate Outcomes Table 2018 for details of the attributes and levels to be developed in the course as a whole. Percentage figures alongside each course goal provide a rough indication of their relative weighting in assessment for this unit. Note that not all goals are necessarily part of assessment; some may relate more to practice activities. See Learning outcomes for details of what is assessed in relation to each goal, Assessment for details of how each outcome is assessed, and Attributes for details of the practice provided for each goal.