Status:
Published

Workload
6 ECTS = 180 hrs

Credits, Weight
6 ECTS, (n.s.)

Language of Instruction
English

Semester
(n.s.)

Duration
1 Sem.

M/E
Elective

Courses

Course No. | Type | Name | MA/EL | Workload | Credits | Contact Hours | Self-study | Group Size |
---|---|---|---|---|---|---|---|---|
04IN2043-1 | Lecture | Data Science | MA | 3 ECTS = 90 hrs | 3 ECTS | 2 hrs/week = 30 hrs | 60 hrs | 60 |
04IN2043-2 | Seminar/Exercise | Data Science | MA | 3 ECTS = 90 hrs | 3 ECTS | 2 hrs/week = 30 hrs | 60 hrs | 30 |

Learning Outcomes

Students gain a good understanding of the tasks and challenges in data analysis. They should understand the statistical foundations of data analysis and be able to apply them in big data settings. To this end, students should become familiar with the theoretical foundations of data engineering, large-scale data analysis, and data-analysis platforms.

Content

(not specified)

- 04IN2043-1 - Data Science
- Data Science describes a set of methods for handling data-intensive problems. The topic connects several disciplines such as physics, biology, social sciences and economics. It uses elaborate computer science paradigms and needs a background in statistics. More specifically, the lecture will cover the following topics:
- Data science: history and background, change of paradigm from statistics to programming
- Problem scenarios will mostly deal with open data, such as found on the Web and open statistical data
- Background in statistics

Details of computing statistics and determining the quality of a probabilistic model. In particular, we will look at distributions commonly used for modeling:
- Uniform distribution
- Normal distribution
- Exponential distribution
- Power law distribution
- Poisson distribution
- Log-normal distribution

- Student's t-test (valid only for normal distributions)
- Chi-square test
- ANOVA
- Kullback-Leibler and Jensen-Shannon divergences
- Kolmogorov-Smirnov test
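As an illustration of the two divergence measures listed above, the following minimal sketch computes them for discrete distributions in plain Python. The function names and the example distributions `p` and `q` are assumptions made for illustration only; the course itself names Octave and R as tools, and this is not taken from the course material.

```python
import math


def kl_divergence(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for discrete distributions.

    Terms with p_i == 0 contribute nothing. q_i must be positive wherever
    p_i is positive, otherwise the divergence is undefined (infinite).
    """
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence: average KL divergence to the mixture m."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


# Hypothetical example distributions over three outcomes.
p = [0.5, 0.5, 0.0]
q = [0.25, 0.25, 0.5]

print(kl_divergence(p, q))  # asymmetric: D(q || p) is undefined here
print(js_divergence(p, q))  # symmetric and bounded by 1 bit
```

Unlike the Kullback-Leibler divergence, the Jensen-Shannon divergence stays finite when one distribution assigns zero probability to an outcome, because both inputs are compared against their mixture.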

- Hypothesis-driven research
- Hypothesis testing
- Statistical fallacies
- Applications

- Programming paradigms
- Relational and NoSQL Database Management Systems
- Parallel task processing: GridGain
- MapReduce (Hadoop/Spark)
- Graph paradigms (GraphLab, Neo4j, RDF databases)
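To make the MapReduce paradigm named above concrete, here is a minimal single-process sketch of a word count with explicit map, shuffle and reduce steps. It uses neither Hadoop nor Spark; the function names and the sample documents are assumptions for illustration, and a real framework would distribute each phase across machines.

```python
from collections import defaultdict


def map_phase(document):
    # Emit a (word, 1) pair for every whitespace-separated word.
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    # Group all emitted values by key, as the framework would do between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    # Combine all values of one key into a single count.
    return key, sum(values)


def word_count(documents):
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


# Hypothetical sample input.
counts = word_count(["big data", "Big ideas", "data data"])
print(counts)
```

Because each map call looks at one document and each reduce call at one key, both phases can run in parallel without shared state, which is the property the large-scale platforms exploit.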

- Visualization
- Simple machine learning on large-scale data
- Example application domain: text
- n-grams
- p-grams
- generalized n-grams (gappy n-grams)
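The n-gram variants listed above can be sketched in a few lines. The code below assumes one common reading of "gappy n-gram", namely n tokens in their original order with at most `max_gap` skipped tokens in total; the course may use a different definition, and all names here are hypothetical.

```python
from itertools import combinations


def ngrams(tokens, n):
    """All contiguous n-grams of a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def gappy_ngrams(tokens, n, max_gap):
    """Ordered n-grams that may skip up to max_gap tokens in total (assumed definition)."""
    result = []
    for start in range(len(tokens)):
        # Positions that may follow `start` while skipping at most max_gap tokens.
        window = range(start + 1, min(start + n + max_gap, len(tokens)))
        for rest in combinations(window, n - 1):
            result.append((tokens[start],) + tuple(tokens[i] for i in rest))
    return result


# Hypothetical example sentence, already tokenized.
tokens = ["a", "b", "c", "d"]
print(ngrams(tokens, 2))
print(gappy_ngrams(tokens, 2, 1))
```

With `max_gap=0` the gappy variant reduces to ordinary contiguous n-grams, which is a convenient sanity check for the generalization.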

- Privacy

Teaching Methods

(not specified)

Prerequisites

This module requires a basic understanding of algorithms and programming, as well as basic knowledge of linear algebra and statistics.

Examination Methods

Oral or written exam

Credit Requirements

(not specified)

References

(not specified)

- 04IN2043-1 - Data Science
- Anand Rajaraman, Jeffrey Ullman, Jure Leskovec, Mining of Massive Datasets, Cambridge University Press (free download)
- Jeffrey Stanton, Introduction to Data Science

Use of this Module

- unmodified as Elective - BSc Computer Science 2017 - Mandatory elective courses Computer Science - Data Science
- unmodified as Elective - BSc Computational Visualistics 2017 - Mandatory elective courses Computer Science - Data Science
- unmodified as Elective - MSc Computer Science 2017 - Mandatory elective courses in mathematics and theoretical computer science - Data Science
- unmodified as Elective - MSc Computer Science 2017 - Mandatory elective courses Computer Science - Data Science
- unmodified as Elective - MSc Computer Science 2017 - Major subject computer science - Data and Knowledge Engineering - Data Science
- unmodified as Elective - MSc Computational Visualistics 2017 - Mandatory elective courses Computer Science - Data Science
- unmodified as Elective - MSc Computational Visualistics 2017 - Mandatory elective courses in Computational Visualistics or computer science - Data Science
- unmodified as Elective - MSc Computational Visualistics 2017 - Mandatory elective courses in theoretical computer science and mathematics - Data Science
- unmodified as Elective - MSc Computational Visualistics 2017 - Mandatory elective courses in theoretical computer science and mathematics or natural and social sciences - Data Science
- unmodified as Elective - MSc Web Science 2017 - Mandatory elective courses Computer Science - Data Science

Responsible / Organizational Unit

Staab, Steffen / Institute for Computer Science

Additional Information

(not specified)

- 04IN2043-1 - Data Science
Octave, R

Last change

Apr 24, 2018 by Frey, Johannes

Last Change Module

Jan 17, 2014 by Frey, Johannes

MoMa - Module Manual - Version 1.14 - © 2018 Faculty 4: Computer Science - MoMa Team