WAIM 2013 / MSRA Summer School
COURSE INTRODUCTION
Course 1: A Tutorial on Probabilistic Databases
Lecturer: Prof. Dan Suciu, University of Washington
A major challenge in modern data management is how to cope with uncertainty in the data, such as in data extracted from text, in physical or RFID data, or in fuzzy data integration. In a probabilistic database, uncertainty is modeled using probabilities, and data management techniques are extended to cope with probabilistic data. This tutorial will discuss the main challenge in probabilistic databases, which is query evaluation. Each answer to a SQL query has a degree of certainty, defined as the probability that the answer is present. This problem is equivalent to computing the probability of a Boolean formula, or to the model counting problem, which has been extensively studied in the AI and model checking literature, and is known to be intractable in general (#P-complete). The approach taken in probabilistic databases, however, is entirely novel, since here we can separate the query from the data. By a careful static analysis of the SQL query we can identify many cases in which the probabilistic inference problem is in PTIME, an approach that led to the discovery of entirely new classes of tractable Boolean formulas. Even when the query is #P-hard, we can approximate the query answer by evaluating a dissociated version of the query, which can be done in PTIME. In all cases, the SQL query is rewritten into a (more complex) SQL query that manipulates probabilities directly and can be computed in a standard relational database system.
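To make the query evaluation problem concrete, the following minimal Python sketch computes the probability of one Boolean query in two ways: by enumerating all possible worlds (exponential in the number of tuples) and by a closed-form expression of the kind a safe plan would compute. The relations `R` and `S`, their probabilities, and the query `q :- R(x), S(x, y)` are assumptions chosen for illustration, and tuples are assumed to be independent; this is not code from the tutorial.

```python
from itertools import product

# Hypothetical tuple-independent relations: each tuple maps to its marginal probability.
R = {("a",): 0.5, ("b",): 0.8}
S = {("a", 1): 0.4, ("a", 2): 0.5, ("b", 1): 0.9}


def query_holds(r_world, s_world):
    """Boolean query q :- R(x), S(x, y): true if some S tuple joins with an R tuple."""
    return any((x,) in r_world for (x, _y) in s_world)


def prob_possible_worlds(R, S):
    """Sum the probabilities of all possible worlds in which the query is true."""
    tuples = [("R", t, p) for t, p in R.items()] + [("S", t, p) for t, p in S.items()]
    total = 0.0
    for choice in product([False, True], repeat=len(tuples)):
        weight = 1.0
        r_world, s_world = set(), set()
        for present, (rel, t, p) in zip(choice, tuples):
            # Independent tuples: multiply p if present, 1 - p if absent.
            weight *= p if present else 1.0 - p
            if present:
                (r_world if rel == "R" else s_world).add(t)
        if query_holds(r_world, s_world):
            total += weight
    return total


def prob_extensional(R, S):
    """Closed form for this query: 1 - prod_x (1 - p_R(x) * (1 - prod_y (1 - p_S(x, y))))."""
    no_witness = 1.0
    for (x,), p_r in R.items():
        no_s_for_x = 1.0
        for (sx, _y), p_s in S.items():
            if sx == x:
                no_s_for_x *= 1.0 - p_s
        no_witness *= 1.0 - p_r * (1.0 - no_s_for_x)
    return 1.0 - no_witness


print(prob_possible_worlds(R, S), prob_extensional(R, S))
```

Both functions agree on this input (about 0.818), but the second touches each tuple a bounded number of times, whereas the first enumerates 2^n worlds for n tuples.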
The tutorial has four parts:
1. Motivation and Basic Definitions
-- sample applications
-- tuple/attribute level uncertainty
-- the possible worlds semantics
-- the query evaluation problem and its complexity (#P-hard)
2. Extensional Query Plans and Safe Plans
-- join, group-by (projection), selection, union, summation
-- safe and unsafe plans
-- converting safe plans back into SQL; demonstration in postgres
-- the dissociation theorem for approximate query evaluation
3. Extensional Query Evaluation
-- Conjunctive queries without self-joins; hierarchical queries
-- General queries and the inclusion/exclusion formula
-- the Möbius function in a lattice
-- query shattering and ranking
-- the dichotomy theorem
4. Intensional Query Evaluation (advanced)
-- Lineage
-- the DPLL class of algorithms for model counting
-- approaches to model counting: read-once formulas, OBDDs, FBDDs, d-DNNFs
-- query compilation and the characterization theorems
-- open problems
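As a rough illustration of the intensional approach in part 4, the sketch below computes the probability of a lineage formula by Shannon expansion on one variable at a time, the basic step behind DPLL-style model counting. The lineage, the variable names, and the probabilities are hypothetical; the formula is assumed to be a monotone DNF over independent Boolean variables, and the sketch omits the component decomposition and other optimizations that practical model counters use.

```python
from functools import lru_cache


def lineage_probability(dnf, prob):
    """Probability that a monotone DNF is true, given independent variable probabilities.

    dnf  : iterable of clauses, each an iterable of variable names (a conjunction)
    prob : dict mapping each variable name to its probability of being true
    """

    @lru_cache(maxsize=None)
    def count(clauses):
        if not clauses:
            return 0.0  # no clause left: the formula is false
        if frozenset() in clauses:
            return 1.0  # an empty conjunction is satisfied: the formula is true
        x = min(min(c) for c in clauses)  # deterministic choice of branching variable
        # Branch x = True: x disappears from every clause that mentions it.
        when_true = frozenset(c - {x} for c in clauses)
        # Branch x = False: every clause that mentions x becomes false and is dropped.
        when_false = frozenset(c for c in clauses if x not in c)
        return prob[x] * count(when_true) + (1.0 - prob[x]) * count(when_false)

    return count(frozenset(frozenset(c) for c in dnf))


# Hypothetical lineage: (r_a AND s_a1) OR (r_a AND s_a2) OR (r_b AND s_b1)
lineage = [{"r_a", "s_a1"}, {"r_a", "s_a2"}, {"r_b", "s_b1"}]
prob = {"r_a": 0.5, "s_a1": 0.4, "s_a2": 0.5, "r_b": 0.8, "s_b1": 0.9}
print(lineage_probability(lineage, prob))
```

Memoizing on the residual clause set means that identical subformulas reached along different branches are evaluated only once.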
The tutorial assumes basic familiarity with probability theory and with simple SQL queries. No background is needed in database internals, graphical models, or model counting. Most of the material covered in the tutorial is also available in: Suciu, Olteanu, Ré, Koch: Probabilistic Databases. Synthesis Lectures on Data Management, Morgan & Claypool Publishers, 2011,
http://www.morganclaypool.com/doi/abs/10.2200/S00362ED1V01Y201105DTM016
Prof. Dan Suciu, Computer Science & Engineering, University of Washington, Box 352350, suciu@cs.washington.edu
Course 2: SQL, NoSQL, NewSQL and Other Interesting Ways to Process Big Data
Lecturer: Prof. Michael Franklin, UC Berkeley
In this four-hour mini course, Prof. Franklin will cover various techniques for analyzing Big Data at scale. He will give a bit of background and history about massively parallel query processing in database systems (a topic he first started working on over 25 years ago) and then cover more recent massively parallel data processing infrastructure such as MapReduce and Hadoop. The goal here will be to compare and contrast these approaches while avoiding sounding too much like a grumpy old database guy. Then he will describe the data analytics stack being built in the Berkeley AMPLab (called BDAS, the Berkeley Data Analytics Stack), including the popular Spark and Shark systems as well as more recent efforts to extend these to support advanced techniques such as stream processing, graph processing, and machine learning. The overall goal is to give a broad survey of this incredibly active area of research, with some ideas and pointers for areas of research opportunity. He apologizes in advance that this will not be a comprehensive survey: there's just way too much going on!
Prof. Michael Franklin, Professor and Director of AMPLab, Dept of Computer Science, The University of California at Berkeley, USA
Course 3: Usability in Database Systems
Lecturer: H V Jagadish
While usability has long been recognized as an important virtue for a database system, there has been a recent strong push to improve database systems in this regard.
In this tutorial, I will present an overview of recent work on database usability, organized according to a recently developed framework for classifying this research, based on both the data life-cycle and the steps of user interaction.
H V Jagadish, Bernard A Galler Collegiate Professor of Elec. Engg. and Computer Science, University of Michigan, jag at eecs . umich . edu
Professor Jagadish is well-known for his broad-ranging research on information management, and has approximately 200 major papers and 37 patents. He is a fellow of the ACM ("The First Society in Computing") and serves on the board of the Computing Research Association, and is the Founding Editor-in-Chief of the Proceedings of the VLDB Endowment (since 2008).
Download Summer School Application Form & Summer School Accommodation Form