ECS 265: Distributed Database Systems  (4)

Lecture: 3 hours

Discussion: 1 hour

Prerequisite: Course 165A

Grading: Letter; projects (40%), final (30%), homework (30%) 

Lecture topics cover advanced database concepts, including data integration, query rewriting, and distributed query processing and optimization. Project topics can also include XML querying, data stream processing, data-intensive parallel computing (e.g., MapReduce), and data provenance management.

This course focuses on the principles and practice of advanced database system applications. Students learn the foundations and principles of these applications through lectures, homework, and reading assignments, and then complement this knowledge with a hands-on practical project.

The lecture topics are roughly 1/3 foundations of databases, 1/3 engineering aspects, and 1/3 applications. Foundations and engineering aspects are also covered through homework and reading assignments; applications are mainly covered through the class projects. The project work emphasizes practically relevant, innovative technologies, e.g., MapReduce-style processing and novel cloud computing services.

Revised Course Description:

  1. Introduction
    • Course overview
    • Refresher: Relational Model: SQL, Relational Algebra
    • Foundations: Queries as mappings, Datalog
  2. Foundations of Data Integration & Exchange
    • View-based data integration: Global-As-View, Local-As-View
    • Query containment, limited access patterns (cf. web services)
    • Query-answering & rewriting over distributed sources
  3. Distributed Query Processing and Optimization
    • Query decomposition and data localization
    • Centralized vs. distributed query optimization
    • Join ordering
  4. Applications & Project Topics, e.g.:
    • MapReduce, Hadoop, HadoopDB, Cloud computing, ... 
    • Querying and transforming semistructured data (XML), e.g.:
      • Introduction to XPath, XQuery; stream-based XML querying; data provenance
    • Declarative Networking

No special textbook is required. Instead, a collection of papers addressing specific topics will be distributed in class.

Computer Usage:
Students work individually on homework and jointly on the class project, using open-source tools on CS or remote machines provided to participants of this class.

Engineering Design Statement:
The projects involve the design, implementation, and verification of database applications in a distributed database environment, as well as the analysis and verification of query processing algorithms. To the extent possible, the systems and tools used for these projects resemble those found in industry, including the standard database query language SQL and systems such as MySQL, PostgreSQL, Oracle, and DB2. Projects are graded on design, performance, and correctness, including documentation. Examination questions are based on models and techniques discussed in the lectures and used in the projects.

Instructor: Bertram Ludäscher

Prepared by: B. Ludäscher  (Sept. 2009)