ECS 265: Distributed Database Systems (4)
Lecture: 3 hours
Discussion: 1 hour
Prerequisite: Course 165A
Grading: Letter; projects (40%), final (30%), homework (30%)
Description:
Lecture topics cover advanced database concepts, including Data Integration, Query Rewriting, and Distributed Query Processing and Optimization. Project topics can also include XML querying; data stream processing; data-intensive, parallel computing (e.g., MapReduce); and data provenance management.
Goals:
This course focuses on the principles and practice of advanced database system applications. Students will learn the foundations and principles of these applications through lectures, homework, and reading assignments, and then complement this knowledge with a hands-on project.
The lecture topics are roughly 1/3 foundations of databases, 1/3 engineering aspects, and 1/3 applications. Foundations and engineering aspects are also covered through homework and reading assignments; applications are covered mainly through the class projects. The project work emphasizes practically relevant, innovative technologies, e.g., MapReduce-style processing and novel cloud computing services.
Revised Course Description:
- Introduction
  - Course overview
  - Refresher: Relational Model: SQL, Relational Algebra
  - Foundations: Queries as mappings, Datalog
- Foundations of Data Integration & Exchange
  - View-based data integration: Global-As-View, Local-As-View
  - Query containment, limited access patterns (cf. web services)
  - Query-answering & rewriting over distributed sources
- Distributed Query Processing and Optimization
  - Query decomposition and data localization
  - Centralized vs. distributed query optimization
  - Join ordering
- Applications & Project Topics, e.g.:
  - MapReduce, Hadoop, HadoopDB, Cloud computing, ...
  - Querying and transforming semistructured data (XML), e.g.:
    - Introduction to XPath, XQuery; stream-based XML querying; data provenance
  - Declarative Networking
Textbook: No special textbook is required. Instead, a collection of papers addressing specific topics will be distributed in class.
Computer Usage:
Students work individually on homework and in teams on the class project, using open-source tools on CS department machines or on remote machines provided to participants in this class.
Engineering Design Statement:
The projects involve the design, implementation, and verification of database applications in a distributed database environment, as well as the analysis and verification of query processing algorithms. To the extent possible, the systems and tools used for these projects resemble those found in industry, including the standard database query language SQL and systems such as MySQL, PostgreSQL, Oracle, and DB2. Projects are graded on design, performance, and correctness, including documentation. Examination questions are based on the models and techniques discussed in lecture and used in the projects.
Instructor: Bertram Ludäscher
Prepared by: B. Ludäscher (Sept. 2009)
THIS COURSE DOES NOT DUPLICATE ANY EXISTING COURSE