Note: This unit version is currently under review and is subject to change!

DATA1902: Informatics: Data and Computation (Advanced) (2019 - Semester 2)


Unit: DATA1902: Informatics: Data and Computation (Advanced) (6 CP)
Mode: Normal-Day
On Offer: Yes
Level: Junior
Faculty/School: School of Computer Science
Unit Coordinator/s: Prof Fekete, Alan
Session options: Semester 2
Campus: Camperdown/Darlington
Pre-Requisites: None.
Prohibitions: INFO1903 OR DATA1002.
Brief Handbook Description: This unit covers computation and data handling, integrating sophisticated use of existing productivity software, e.g. spreadsheets, with the development of custom software using the general-purpose Python language. It will focus on skills directly applicable to data-driven decision-making. Students will see examples from many domains, and be able to write code to automate the common processes of data science, such as data ingestion, format conversion, cleaning, summarization, creation and application of a predictive model.
Assumed Knowledge: None.
Lecturer/s: Prof Fekete, Alan
Timetable: DATA1902 Timetable
Time Commitment:
#  Activity Name            Hours per Week  Sessions per Week  Weeks per Semester
1  Lecture                  3.00            3                  13
2  Laboratory               3.00            1                  13
3  Project Work - own time  3.00                               13
4  Independent Study        2.00                               13
T&L Activities: Note that each laboratory session will include an online multiple-choice quiz, and also time devoted to covering an advanced topic (not part of the material in DATA1002). Thus attendance at labs is crucial.

Learning outcomes are the key abilities and knowledge that will be assessed in this unit. They are listed according to the course goal supported by each. See the Assessment tab for details of how each outcome is assessed.

Unassigned Outcomes
1. Ability to automate a computational process, when given a clear account of the algorithm to be applied. This will be done by writing Python programs with core techniques of procedural programming.
2. Knowledge of Python syntax and semantics, to trace and understand idiomatic code typical of data science activities, including features such as user-defined functions, exception-raising and handling.
3. Experience with automation of the computational processes needed for examples of the various activities in the data science pipeline: data ingestion and cleaning, data format conversion, data summarization, visual and tabular presentation of the results from summarization, creation of a predictive model of a given form, application of a predictive model to new data, evaluation of a predictive model (and also, automation of a pipeline that scripts use of existing tools for these activities). The examples students have seen will cover a diversity of application domains.
4. Experience with both spreadsheets, and programs in Python, for automatically performing computational processes of data science, and awareness of the similarities and differences between tools.
5. Understanding of the main issues for data management in connection with data science activities, including the value of data, the importance of metadata (that describes the format and meaning of data, constraints on the data, origins of the data, restrictions on use of the data, etc.), and issues when sharing data across time and across users (e.g. value of a manager role, access control, persistence, recovery).
6. Understanding of how data sets are represented in computer files, in particular, the many-to-many relationship between the physical representation and the logical representation; advantages and disadvantages of different representations.
7. Understanding of principles of charting and information presentation, and ability to produce good charts using both Python libraries and spreadsheets; also capability to evaluate charts for effectiveness in communication.
8. Ability to use and understand some more sophisticated tools for computation or data-handling.
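
As an illustration of the pipeline activities named in outcome 3 (ingestion, cleaning, summarization), here is a minimal sketch using only the Python standard library. The dataset, the column names `city` and `temp_c`, and the function names are hypothetical and chosen only for this example; it is not taken from the unit's teaching material.

```python
import csv
import io
from statistics import mean

# Hypothetical raw data: two rows have missing or malformed readings.
RAW = """city,temp_c
Sydney,22.5
Sydney,
Perth,30.0
Perth,n/a
Sydney,24.5
"""


def ingest(text):
    """Ingestion: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows):
    """Cleaning: keep only rows whose temperature converts to a number."""
    cleaned = []
    for row in rows:
        try:
            value = float(row["temp_c"])
        except (TypeError, ValueError):
            continue  # skip rows with missing or non-numeric readings
        cleaned.append({"city": row["city"].strip(), "temp_c": value})
    return cleaned


def summarise(rows):
    """Summarization: mean temperature per city."""
    groups = {}
    for row in rows:
        groups.setdefault(row["city"], []).append(row["temp_c"])
    return {city: mean(values) for city, values in groups.items()}


summary = summarise(clean(ingest(RAW)))
print(summary)
```

The `try`/`except` in `clean` also shows the kind of exception handling mentioned in outcome 2.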
Assessment Methods:
#  Name                          Group  Weight  Due Week        Outcomes
1  Weekly Python tasks*          No      5.00   Multiple Weeks  1, 2, 3, 7
2  Weekly quizzes*               No     10.00   Multiple Weeks  2, 3, 4, 5, 6, 7, 8
3  Practice Python coding test*  No      0.00   Week 7          1, 2
4  Python coding test*           No     10.00   Week 10         1, 2
5  Project stage 1               Yes     5.00   Week 9          1, 3, 4, 5
6  Project stage 2               Yes    20.00   Week 12         1, 3, 7, 8
7  Exam                          No     50.00   Exam Period     1, 2, 3, 4, 5, 6, 7, 8
Assessment Description: Weekly Python tasks: the material in the GrokLearning platform includes tasks where the student must write a Python program to produce precisely described output. The program will be graded automatically, by being run against several input datasets (only one of these datasets is visible to the student before submission), and the output will be compared to what is required. *In case of special consideration, reweighting will be applied, taking the grade from those tasks not covered by the consideration. However, students should still complete the tasks even if the due date has passed.

Weekly quizzes: held during the student's scheduled tutorial session each week. Each quiz consists of multiple-choice questions related to the lecture content from the previous week and to the extra Advanced content from the previous week's lab session; the quizzes are done through the Canvas system. Each quiz is worth 1 point, and the total mark is the sum of these, capped at 10. *In case of special consideration, extension or alternative assessment is not possible; instead, reweighting will be done, replacing an affected quiz with the average score on non-affected quizzes.
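
The quiz marking rule can be sketched in Python as follows. This is an illustration only, not the official calculation: the function name `quiz_total`, the use of list positions to identify affected quizzes, and the assumption that each quiz score is a number between 0 and 1 are all choices made for this example.

```python
def quiz_total(scores, affected=(), cap=10.0):
    """Sum per-quiz scores (each assumed to lie between 0 and 1), capped at `cap`.

    `affected` holds the positions of quizzes covered by special
    consideration; each is replaced by the average of the other quizzes.
    """
    affected = set(affected)
    unaffected = [s for i, s in enumerate(scores) if i not in affected]
    if not unaffected:
        raise ValueError("at least one quiz must be unaffected")
    average = sum(unaffected) / len(unaffected)
    adjusted = [average if i in affected else s for i, s in enumerate(scores)]
    return min(sum(adjusted), cap)


# Twelve perfect quizzes (a hypothetical count) still give at most 10 marks.
print(quiz_total([1.0] * 12))
# The second quiz was missed under special consideration.
print(quiz_total([1.0, 0.0, 1.0, 1.0], affected=[1]))
```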

Practice Python coding test: held during scheduled tutorial sessions. Each student will be required to produce Python code that calculates precisely described output from data in a file. This carries no weight in final grade, but is intended to accustom students to the setting in preparation for the later coding test. *In case of special consideration, no action is needed.

Python coding test: held during scheduled tutorial sessions. Each student will be required to produce Python code that calculates precisely described output from data in a file. *In case of special consideration, alternative assessment will be arranged.

Project Stage 1: This is the first part of a group project (the students in a group should all be attending the same scheduled tutorial session). This stage involves finding data from a domain of interest for the students; data cleaning and importing to a tool, and doing a very simple analysis from some of the data. A report is required that describes the dataset, how it was obtained, and how it was processed by the tool. If this stage is missed or badly done, the group can be given a clean data set, for a domain chosen by the instructor, to use in the rest of the project. It is crucial that each group manages its internal working effectively, and they need mechanisms to detect problems and report them to the coordinator early.

Project Stage 2: the group will use computational tools to analyse the data and offer interactive visualisation, build a useful predictive model of some kind, and report on both what was done and what was found. This stage cannot be reweighted in response to special consideration; an extension or alternative assessment is needed. It is crucial that each group manages its internal working effectively, and they need mechanisms to detect problems and report them to the coordinator early.

Exam: a written exam, covering conceptual content, skills, and experiences.

Except for tasks where late work is not accepted at all, as noted above, late submission of a progressive assessment (up to 10 days late) will attract a penalty of 5 percent of the available mark, for each calendar day after the due date. Work that is not submitted within 10 calendar days will receive a mark of zero.
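
The late penalty arithmetic can be sketched as below. This is an illustration, not the official calculation: the function name `late_mark` is hypothetical, `days_late` is assumed to be a whole number of calendar days, and the sketch assumes a penalised mark cannot drop below zero (the policy text does not say how part-days or negative results are treated).

```python
def late_mark(raw_mark, available_mark, days_late,
              percent_per_day=5, max_days_late=10):
    """Apply the late penalty to a mark for one assessment.

    The penalty is `percent_per_day` percent of the available mark per
    calendar day late; work more than `max_days_late` days late gets zero.
    """
    if days_late <= 0:
        return float(raw_mark)
    if days_late > max_days_late:
        return 0.0
    penalty = available_mark * percent_per_day * days_late / 100
    return max(raw_mark - penalty, 0.0)  # assumption: never below zero


# A 20-mark task scored 16 and submitted 3 days late loses 3 marks.
print(late_mark(16, 20, 3))
```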
Grading:
Grade Type Description
Standards Based Assessment Final grades in this unit are awarded at levels of HD for High Distinction, DI (previously D) for Distinction, CR for Credit, PS (previously P) for Pass and FA (previously F) for Fail as defined by University of Sydney Assessment Policy. Details of the Assessment Policy are available on the Policies website at http://sydney.edu.au/policies . Standards for grades in individual assessment tasks and the summative method for obtaining a final mark in the unit will be set out in a marking guide supplied by the unit coordinator.
Minimum Pass Requirement It is a policy of the School of Computer Science that in order to pass this unit, a student must achieve at least 40% in the written examination. For subjects without a final exam, the 40% minimum requirement applies to the corresponding major assessment component specified by the lecturer. A student must also achieve an overall final mark of 50 or more. Any student not meeting these requirements may be given a maximum final mark of no more than 45 regardless of their average.
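
One reading of the minimum pass requirement is sketched below. The policy says a capped mark "may be given", so the cap is discretionary; this sketch assumes it is always applied. The function name `final_mark` and its parameters are hypothetical.

```python
def final_mark(weighted_mark, exam_percent,
               exam_minimum=40.0, pass_mark=50.0, cap=45.0):
    """Return the final mark under the minimum pass requirement.

    `weighted_mark` is the overall mark out of 100 and `exam_percent`
    is the percentage achieved in the written examination.
    """
    meets_requirements = (exam_percent >= exam_minimum
                          and weighted_mark >= pass_mark)
    if meets_requirements:
        return weighted_mark
    # Assumption: the cap is always applied when a requirement is missed.
    return min(weighted_mark, cap)


# Strong coursework but only 35% in the exam: the mark is capped at 45.
print(final_mark(62, 35))
```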
Policies & Procedures: IMPORTANT: School policy relating to Academic Dishonesty and Plagiarism.

In assessing a piece of submitted work, the School of IT may reproduce it entirely, may provide a copy to another member of faculty, and/or to an external plagiarism checking service or in-house computer program and may also maintain a copy of the assignment for future checking purposes and/or allow an external service to do so.

Other policies

See the policies page of the faculty website at http://sydney.edu.au/engineering/student-policies/ for information regarding university policies and local provisions and procedures within the Faculty of Engineering and Information Technologies.
Recommended Reference/s: Note: References are provided for guidance purposes only. Students are advised to consult these books in the university library. Purchase is not required.
Online Course Content: The unit's Canvas site will contain copies of lecture slides (and lecture recordings if the technology works as it ought to), tutorial instructions, assessment instructions, and a discussion forum. The Python teaching will be done in two independent ways: through lectures, and through labwork where students work on the GrokLearning platform by following a sequence that integrates expository material with frequent exercises which are automatically graded.

Note that the "Weeks" referred to in this Schedule are those of the official university semester calendar https://web.timetable.usyd.edu.au/calendar.jsp

Week Description
Week 1 Introduction and administrivia; data science lifecycle and pipeline; how to learn to program. [Advanced lab: Unix tools]
Week 2 Data science with spreadsheets; Python as a calculator, variables and expressions; assignment, simplified notional machine model for Python. [Advanced lab: Unix tools]
Week 3 More spreadsheet techniques; Decisions and conditionals; Strings, text files and loops. [Advanced lab: regular expressions]
Week 4 Pivot tables and lookup in spreadsheets; Lists and tuples; Dictionaries [Advanced lab: AWK]
Week 5 Communication and charts; data management, metadata and data quality; Writing a function in Python. [Advanced lab: combining tools]
Week 6 Storage and number formats; Charts in spreadsheets; Prepare for practice coding test. [Advanced lab: comparing tools]
Week 7 Data persistence and recovery; Intro to Pandas and Dataframes; More Pandas capabilities. [Advanced lab: comparing tools]
Assessment Due: Practice Python coding test*
Week 8 Optimisation and simulation; Plotting with Python; Scope and notional machine for Python functions. [Advanced lab: Interactive visualisation]
Week 9 [PUBLIC HOLIDAY]; Predicting a category (classification) and evaluating a classifier; scikit-learn Python library [Advanced lab: Interactive visualisation]
Assessment Due: Project stage 1
Week 10 Sharing data; Predicting a numeric value (regression) and evaluating a regression; Exception-handling in Python. [Advanced lab: Interactive visualisation]
Assessment Due: Python coding test*
Week 11 Data management policies; Clustering; Introduction to classes and objects in Python. [Advanced lab: Interactive visualisation]
Week 12 Notebooks, workflow, provenance; Recommendation; Software quality issues. [Advanced lab: Interactive visualisation]
Assessment Due: Project stage 2
Week 13 Review of semester; further study of data science or programming; preview of exam
Exam Period Assessment Due: Exam

Course Relations

The following is a list of courses which have added this Unit to their structure.

Course Year(s) Offered
Software Engineering (mid-year) 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025
Software / Project Management 2019+ 2023, 2024, 2025
Software Engineering 2017, 2018, 2019, 2020, 2021, 2022, 2023, 2024, 2025
Software / Arts 2023+ 2023, 2024, 2025
Software / Commerce 2023+ 2023, 2024, 2025
Software / Science 2023, 2024, 2025
Software / Science - Mid Year 2023, 2024, 2025
Software / Law 2023+ 2023, 2024, 2025

Course Goals

This unit contributes to the achievement of the following course goals:

Attribute Practiced Assessed
Unit has not been assigned any attributes yet.

These goals are selected from Engineering & IT Graduate Outcomes Table 2018 which defines overall goals for courses where this unit is primarily offered. See Engineering & IT Graduate Outcomes Table 2018 for details of the attributes and levels to be developed in the course as a whole. Percentage figures alongside each course goal provide a rough indication of their relative weighting in assessment for this unit. Note that not all goals are necessarily part of assessment. Some may be more about practice activity. See Learning outcomes for details of what is assessed in relation to each goal and Assessment for details of how the outcome is assessed. See Attributes for details of practice provided for each goal.