Schedule at a glance

  • June 15

  • Morning


    Cultural Centre
    8:00 - 9:00    Registration

    HG02
    9:00 - 12:00   Tutorial 1 (Yufei Tao)

    Tutorial 1: Yufei Tao, Professor, The Chinese University of Hong Kong

    Title: Skylines: Forgetting Heuristics

    Abstract: The skyline operator, since its introduction to the database community, has been thoroughly studied. However, the database community is starting to be bored by a plethora of existing heuristic algorithms claimed to be effective on “real data”, but lacking rigorous theoretical performance guarantees. In this lecture, we will focus on state-of-the-art algorithms solving this problem with excellent bounds on I/O efficiency. As a side product, the lecture will also unveil many of the “magics” behind provably efficient I/O algorithms, such that students may even be able to start designing such algorithms in their own research right away.
     
     
  • Lunch (Lord Stow’s Bakery, UMAC Library, 12:00 - 13:30)
  • Afternoon


    HG02
    13:30 - 14:30   Keynote (Wen-Syan Li, SAP)
    14:30 - 15:30   Talk 1 (Nan Zhang)
    16:00 - 17:00   Talk 2 (Jian Li)
    17:00 - 18:00   Talk 3 (Gao Cong)

    Keynote: Wen-Syan Li (WAIM 2014 Keynote Speaker, Summer School Instructor)
    Vice President, SAP

    Title: The New Role of DBMS in Enterprise Application Development

    Abstract: Historically, the DBMS was widely adopted in the development of traditional enterprise applications. Yet, to maximize cross-platform support and compatibility, most application logic was designed in the middle layer with a general data access adapter, so use of the DBMS was limited to CRUD access. Today, the role of the DBMS has shifted. In the era of Big Data, more and more calculations and analyses must be performed by the DBMS for extreme performance. Hence, the DBMS has been transformed from a pure data repository into a new computation platform. Meanwhile, to avoid mass data movement, the traditional three-tier architecture has been simplified into a two-tier one via a new programming model.
    To accommodate these trends, SAP HANA, a real-time data platform with revolutionary in-memory and columnar table characteristics, is introduced. This breakthrough data and application platform brings the calculation logic close to the data, combines parallel computing with an algorithm optimization framework, enhances both existing and new applications with extreme performance, and makes previously impossible business scenarios possible.
    We will also discuss several design topics for the data platform during this transition, e.g., the comparison between general-purpose and specialized platforms, how to design online/nearline/offline storage, and how to achieve balance in the single/dual stack architecture.
    In this lecture, we will have an in-depth discussion of the new role of the DBMS in enterprise application development. We will also provide a case study of SAP HANA for extreme applications. At the end, several fancy innovation project demos will be shown.
     
    Talk 1: Nan Zhang, Professor, George Washington University

    Title: Exploration and Privacy Preservation Over Deep Web Databases

    Abstract: A large number of online databases are hidden in the “deep web” and only accessible through restrictive search or browsing web interfaces. We consider third-party data analytics over these hidden databases, specifically, how to break through the barrier set by the restrictive web interfaces to truly gauge the value of deep web data through crawling, sampling, and aggregate estimation. We also explain how recent advances in such data analytics techniques pose significant privacy threats to certain sensitive aggregate information over hidden databases. The protection of sensitive aggregates stands in sharp contrast to the traditional privacy problem, where individual tuples must be protected while ensuring access to aggregate information. We propose privacy-preserving techniques to suppress the inference of aggregate information from hidden databases.
     
    Talk 2: Jian Li, Assistant Professor, Tsinghua University

    Title: Handling Uncertainty in Data Management

    Abstract: Almost all important decision problems are inevitably subject to some level of uncertainty, whether about data measurements, parameters, or predictions describing future evolution. The significance of handling uncertainty is further amplified by the large volume of uncertain data automatically generated by modern data gathering or integration systems. Various types of problems of decision making under uncertainty have been subject to extensive research in computer science, economics and social science. In this talk, I will survey some recent research efforts on dealing with uncertain data in the database and algorithms communities.
     
    Talk 3: Gao Cong, Assistant Professor, Nanyang Technological University (NTU)

    Title: Indexing and Querying Geo-textual data

    Abstract: Massive amounts of data that are geo-tagged and associated with text information are being generated at an unprecedented scale. First, an increasing volume of user-generated content on the Web is being associated with geo-locations. Examples of such content include geo-tagged micro-blogs (e.g., Twitter), photos with both tags and geo-locations on social photo sharing websites (e.g., Flickr), and check-in information on places in location-based social networks (e.g., FourSquare). Second, points of interest are increasingly associated with text in local search services and online yellow pages. This development calls for techniques that enable the indexing of data that contains both text descriptions and geo-locations, in order to support the efficient processing of spatial-keyword queries that take a geo-location or a region and a set of keywords as arguments and return relevant content that matches the arguments. A number of geo-textual indices have been developed, and different types of spatial-keyword queries have been proposed to meet the various needs of users. The talk covers recent results on spatial-keyword querying based on geo-textual indices.
     
     
  • June 16

  • Morning


    Cultural Centre
    8:00 - 9:00    Registration

    HG02
    9:00 - 12:00   Tutorial 2 (Jiawei Han)

    Tutorial 2: Jiawei Han, Abel Bliss Professor, Univ. of Illinois at Urbana-Champaign

    Title: Construction, Exploration and Mining of Semi-Structured, Heterogeneous Information Networks

    Abstract: People and informational objects are interconnected, forming gigantic, integrated information networks. By structuring these data objects into multiple types, such networks become semi-structured heterogeneous information networks. Most real-world applications that handle big data, including interconnected social media and social networks, medical information systems, online e-commerce systems, or database systems, can be structured into typed, semi-structured, heterogeneous information networks. For example, in a medical care network, objects of multiple types, such as patients, doctors, diseases, and medication, and links such as visits, diagnoses, and treatments are intertwined, providing rich information and forming heterogeneous information networks. Effective construction, exploration and analysis of large-scale heterogeneous information networks pose an interesting but critical challenge.

    In this talk, we first present a set of data mining scenarios in heterogeneous social and information networks and show that mining typed, heterogeneous networks is a new and promising research frontier in data mining research. Departing from many existing network models that view data as homogeneous graphs or networks, the semi-structured heterogeneous information network model leverages the rich semantics of typed nodes and links in a network and can uncover surprisingly rich knowledge from interconnected data. This heterogeneous network modeling will lead to the discovery of a set of new principles and methodologies for mining and exploring interconnected data, such as rank-based clustering and classification, meta path-based similarity search, and meta path-based link/relationship prediction. Then we discuss our recent progress on construction of quality semi-structured heterogeneous information networks from unstructured data. We will also point out some promising research directions in this domain.
     
     
  • Lunch (Lord Stow’s Bakery, UMAC Library, 12:30 - 14:00)
  • Afternoon


    HG02
    14:00 - 17:00   Tutorial 3 (Amr El Abbadi)

    Tutorial 3: Amr El Abbadi, Professor, University of California at Santa Barbara

    Title: The Distributed and Database Foundations of Cloud-based Data Management

    Abstract: Over the past few decades, database and distributed systems researchers have made significant advances in the development of protocols and techniques to provide data management solutions that carefully balance three major requirements when dealing with critical data: high availability, fault-tolerance, and data consistency. However, over the past few years the data requirements, in terms of data availability and system scalability for Internet scale enterprises that provide services and cater to millions of users, have been unprecedented. Cloud computing has emerged as an extremely successful paradigm for deploying Internet and Web-based applications. Scalability, elasticity, pay-per-use pricing, and autonomic control of large-scale operations are the major reasons for the successful widespread adoption of cloud infrastructures. In this seminar, we will first discuss some of the critical distributed systems and database protocols that are essential for understanding current large scale data management. We analyze the design choices that allowed modern NoSQL data management systems (key-value stores) to achieve orders of magnitude higher levels of scalability compared to traditional databases, and lay the foundations for the integration of consistent transactional semantics for data management in the Cloud.
     
     

June 17-18: Attend WAIM Conference