Module Import 04IN2043 - Data Science

Status: Published
Workload 6 ECTS = 180 hrs
Credits, Weight 6 ECTS, (n.s.)
Language of Instruction English
Semester (n.s.)
Duration 1 Sem.
M/E Elective
Courses
Course No. | Type | Name | MA/EL | Workload | Credits | Contact Hours | Self-study | Group Size
04IN2043-1 | Lecture | Data Science | MA | 3 ECTS = 90 hrs | 3 ECTS | 2 hrs/week = 30 hrs | 60 hrs | 60
04IN2043-2 | Seminar/Exercise | Data Science | MA | 3 ECTS = 90 hrs | 3 ECTS | 2 hrs/week = 30 hrs | 60 hrs | 30
Learning Outcomes

Students gain a good understanding of the tasks and challenges in data analysis. They should understand the statistical foundations of data analysis and be able to apply them in big data settings. To this end, they should become familiar with the theoretical foundations of data engineering, large-scale data analysis, and data-analysis platforms.

Content

(not specified)

04IN2043-1 - Data Science
Data Science describes a set of methods for handling data-intensive problems. The topic connects several disciplines such as physics, biology, social sciences and economics. It uses elaborate computer science paradigms and needs a background in statistics.
More specifically, the lecture will cover the following topics:
  1. Data science: history and background, change of paradigm from statistics to programming
  2. Problem scenarios will mostly deal with open data, such as data found on the Web and open statistical data
  3. Background in statistics
    Details of computing statistics and determining the quality of a probabilistic model. In particular, we will look at distributions commonly used for modeling:
    • Uniform distribution
    • Normal distribution
    • Exponential distribution
    • Power-law distribution
    • Poisson distribution
    • Log-normal distribution
    And we will look at quality measures such as:
    • Student's t-test (valid only for normal distributions)
    • Chi-square test
    • ANOVA
    • Kullback-Leibler and Jensen-Shannon divergence
    • Kolmogorov-Smirnov test
  4. Hypothesis driven research
    • Hypothesis testing
    • Statistical fallacies
    • Applications
  5. Programming paradigms
    • Relational and NoSQL Database Management Systems
    • Parallel task processing: GridGain
    • MapReduce (Hadoop/Spark)
    • Graph paradigms (GraphLab, Neo4j, RDF databases)
  6. Visualization
  7. Simple machine learning on large scale data
  8. Example application domain: text
    • n-grams
    • p-grams
    • generalized n-grams (gappy n-grams)
  9. Privacy
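
Two of the topics listed above, generalized (gappy) n-grams and the Jensen-Shannon divergence, can be illustrated together in a short Python sketch. It is not taken from the course material. It assumes that a "gappy" n-gram keeps the order of its tokens but may skip up to a fixed number of tokens in between (often called a skip-gram); the function names and the `max_gap` parameter are hypothetical and chosen only for this illustration.

```python
from collections import Counter
from itertools import combinations
from math import log2


def ngrams(tokens, n):
    """Contiguous n-grams: every window of n adjacent tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def gappy_ngrams(tokens, n, max_gap):
    """Ordered n-grams that may skip up to max_gap tokens in total (assumed definition)."""
    grams = []
    span = n + max_gap  # widest window a single gappy n-gram may cover
    for start in range(len(tokens) - n + 1):
        window = tokens[start:start + span]
        # Fix the first token, then choose the remaining n-1 positions in the window.
        for rest in combinations(range(1, len(window)), n - 1):
            grams.append((window[0],) + tuple(window[j] for j in rest))
    return grams


def distribution(items):
    """Relative frequencies of the items, as a dict that sums to 1."""
    counts = Counter(items)
    total = sum(counts.values())
    return {item: count / total for item, count in counts.items()}


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits; q must cover the support of p."""
    return sum(px * log2(px / q[x]) for x, px in p.items() if px > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: symmetric, and between 0 and 1 for base-2 logs."""
    support = set(p) | set(q)
    m = {x: 0.5 * (p.get(x, 0.0) + q.get(x, 0.0)) for x in support}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


if __name__ == "__main__":
    text_a = "open data on the web is open to everyone".split()
    text_b = "open statistical data is published on the web".split()
    p = distribution(gappy_ngrams(text_a, 2, 1))
    q = distribution(gappy_ngrams(text_b, 2, 1))
    print(js_divergence(p, q))
```

Unlike the Kullback-Leibler divergence, the Jensen-Shannon divergence stays finite when one text contains n-grams the other lacks, because both distributions are compared against their mixture.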
Teaching Methods

(not specified)

Prerequisites

This module requires basic understanding of algorithmics and programming as well as basic knowledge in linear algebra and statistics.

Examination Methods

oral or written exam

Credit Requirements

(not specified)

References

(not specified)

04IN2043-1 - Data Science
  1. Anand Rajaraman, Jeffrey Ullman, Jure Leskovec, Mining of Massive Datasets, Cambridge University Press (free download)
  2. Jeffrey Stanton, Introduction to Data Science
Use of this Module
  1. unmodified as Elective  -    BSc Computer Science 2017  -    Mandatory elective courses Computer Science  -    Data Science
  2. unmodified as Elective  -    BSc Computational Visualistics 2017  -    Mandatory elective courses Computer Science  -    Data Science
  3. unmodified as Elective  -    MSc Computer Science 2017  -    Mandatory elective courses in mathematics and theoretical computer science  -    Data Science
  4. unmodified as Elective  -    MSc Computer Science 2017  -    Mandatory elective courses Computer Science  -    Data Science
  5. unmodified as Elective  -    MSc Computer Science 2017  -    Major subject computer science  -    Data and Knowledge Engineering  -    Data Science
  6. unmodified as Elective  -    MSc Computational Visualistics 2017  -    Mandatory elective courses Computer Science  -    Data Science
  7. unmodified as Elective  -    MSc Computational Visualistics 2017  -    Mandatory elective courses in Computational Visualistics or computer science  -    Data Science
  8. unmodified as Elective  -    MSc Computational Visualistics 2017  -    Mandatory elective courses in theoretical computer science and mathematics  -    Data Science
  9. unmodified as Elective  -    MSc Computational Visualistics 2017  -    Mandatory elective courses in theoretical computer science and mathematics or natural and social sciences  -    Data Science
  10. unmodified as Elective  -    MSc Web Science 2017  -    Mandatory elective courses Computer Science  -    Data Science
Responsible / Organizational Unit
Staab, Steffen / Institute for Computer Science
Additional Information

(not specified)

04IN2043-1 - Data Science

Octave, R

Last change
Apr 24, 2018 by Frey, Johannes
Last Change Module
Jan 17, 2014 by Frey, Johannes