Tutorials

Tutorial 1       Friday, July 7th 13:30 – 15:30, Conference Room 4 (4号会议室)

Title: Urban Computing: Enable Intelligent Cities with Big Data and AI Technology



Speaker: Yu Zheng - Microsoft Research

Abstract: Urban computing is the process of acquiring, integrating, and analyzing big and heterogeneous data generated by diverse sources in cities to tackle urban challenges, e.g., air pollution, energy consumption, and traffic congestion. Urban computing connects unobtrusive and ubiquitous sensing technologies, advanced data management and AI models, and novel visualization methods to create win-win-win solutions that improve the urban environment, human quality of life, and city operation systems. Urban computing is an interdisciplinary field where computer science meets urban planning, transportation, economics, the environment, sociology, energy, and other domains in the context of urban spaces. In this tutorial, I will give an overview of the framework of urban computing, discussing its key challenges and methodologies from a computer science perspective. This tutorial will also present a diversity of urban computing applications, ranging from big data-driven environmental protection to transportation, and from urban planning to the urban economy. The research has not only been published at prestigious conferences but also been deployed in the real world. More details can be found at http://research.microsoft.com/en-us/projects/urbancomputing/default.aspx
Bio: Dr. Yu Zheng is a senior research manager in the Urban Computing Group at Microsoft Research. He is also a Chair Professor at Shanghai Jiao Tong University and an Adjunct Professor at the Hong Kong University of Science and Technology and the Hong Kong Polytechnic University. Zheng currently serves as the Editor-in-Chief of ACM Transactions on Intelligent Systems and Technology and as the founding Secretary of the SIGKDD China Chapter. He has served as a chair for over 10 prestigious international conferences, e.g., as the program co-chair of ICDE 2014 (Industrial Track) and CIKM 2017 (Industrial Track). He frequently publishes refereed papers as a lead author at prestigious conferences and in journals such as KDD, VLDB, UbiComp, and IEEE TKDE. Those papers have been cited over 12,000 times in the past five years (Google Scholar H-Index: 53). His book, titled "Computing with Spatial Trajectories," has been used as a textbook in universities worldwide and was honored as one of the Top 10 Most Popular Computer Science Books by Chinese authors at Springer. Zheng has received 3 technology transfer awards from Microsoft and has 24 granted or filed patents. His technology has been transferred to Microsoft products such as Bing Maps and CityNext. One of his projects, entitled Urban Air, has been deployed with the Chinese Ministry of Environmental Protection, predicting air quality for over 300 Chinese cities based on big data. He also leads a China pilot project on an urban big data platform, which has been deployed in Guiyang City. In 2013, he was named one of the Top Innovators under 35 by MIT Technology Review (TR35) and featured by Time Magazine for his research on urban computing. In 2014, he was named one of the Top 40 Business Elites under 40 in China by Fortune Magazine because of the business impact of the urban computing he has been advocating since 2008. In 2016, he was named an ACM Distinguished Scientist. More details at https://www.microsoft.com/en-us/research/people/yuzheng/.


Tutorial 2       Saturday, July 8th 13:30 – 15:30, Conference Room 4 (4号会议室)

Title: Crowdsourced Data Management: Overview and Challenges



Speaker: Guoliang Li - Tsinghua University

Abstract: Many important data management and analytics tasks cannot be completely addressed by automated processes. Crowdsourcing is an effective way to harness human cognitive abilities to process these computer-hard tasks, such as entity resolution, sentiment analysis, and image recognition. Crowdsourced data management has recently been studied extensively in both research and industry. In this tutorial, we will survey and synthesize a wide spectrum of existing studies on crowdsourced data management. We first give an overview of crowdsourcing and then summarize the fundamental techniques that must be considered in crowdsourced data management: quality control, cost control, and latency control. Next, we review crowdsourced operators, including selection, collection, join, top-k, sort, categorize, aggregation, skyline, planning, schema matching, mining, and spatial crowdsourcing. We also discuss crowdsourcing optimization techniques and systems. Finally, we outline the emerging challenges.
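Quality control is often illustrated by assigning each task to several workers and aggregating their redundant answers. The sketch below shows majority voting, one simple aggregation strategy; it is an illustration under stated assumptions, not a description of the specific techniques surveyed in the tutorial. The function name `majority_vote` and the first-seen tie-breaking rule are assumptions.

```python
from collections import Counter


def majority_vote(answers):
    """Aggregate redundant crowd answers per task by majority vote.

    answers: dict mapping a task id to the list of labels returned by workers.
    Ties go to the label seen first (Counter.most_common orders equal counts
    by first occurrence). Tasks with no answers yet are skipped.
    """
    result = {}
    for task, labels in answers.items():
        if not labels:
            continue  # nothing collected for this task yet
        label, _count = Counter(labels).most_common(1)[0]
        result[task] = label
    return result


if __name__ == "__main__":
    # Hypothetical entity-resolution tasks: "do these two records match?"
    print(majority_vote({"r1~r2": ["yes", "no", "yes"], "r3~r4": ["no"]}))
```

More elaborate quality-control schemes weight each worker by estimated accuracy instead of counting every vote equally; plain majority voting treats all workers as equally reliable.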
Bio: Guoliang Li is an Associate Professor in the Department of Computer Science, Tsinghua University, Beijing, China. His research interests include big spatio-temporal data analytics, crowd computing, and large-scale data cleaning and integration. He has published more than 80 papers in premier conferences and journals, such as SIGMOD, VLDB, ICDE, SIGKDD, SIGIR, TODS, VLDB Journal, and TKDE. He has served as a PC co-chair of WAIM 2014, WebDB 2014, and NDBC 2016, and regularly serves as a PC member of many premier conferences, such as SIGMOD, VLDB, KDD, ICDE, WWW, IJCAI, and AAAI. His papers have been cited more than 4,000 times. He has received the IEEE TCDE Early Career Award, the Young ChangJiang Scholar award, the NSFC Excellent Young Scholars Award, and the CCF Young Scientist Award.

Homepage: http://dbgroup.cs.tsinghua.edu.cn/ligl



Tutorial 3       Sunday, July 9th 10:30 – 12:00, Juying Ballroom (聚英厅)

Title: Meta Paths and Meta Structures: Analysing Large Heterogeneous Information Networks



Speaker: Reynold Cheng - University of Hong Kong

Abstract: A heterogeneous information network (HIN) is a graph model in which objects and edges are annotated with types. Large and complex databases, such as YAGO and DBLP, can be modeled as HINs. A fundamental problem in HINs is computing the closeness, or relevance, between two HIN objects. Relevance measures such as PCRW, PathSim, and HeteSim can be used in various applications, including information retrieval, entity resolution, and product recommendation. These measures are based on meta-paths, each of which is essentially a sequence of node classes and edge types between two nodes in an HIN. In this tutorial, we will give a detailed review of meta-paths and of how they are used to define relevance. In a large and complex HIN, finding meta-paths manually can be complex, expensive, and error-prone. Hence, we will explore systematic methods for finding meta-paths. In particular, we will study a solution based on the Query-by-Example (QBE) paradigm, which allows us to discover meta-paths effectively and efficiently.
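PathSim, one of the relevance measures named above, scores two objects x and y under a symmetric meta-path as PathSim(x, y) = 2 * |paths(x, y)| / (|paths(x, x)| + |paths(y, y)|), where |paths(u, v)| is the number of meta-path instances from u to v. The following minimal sketch counts instances by walking one relation of the meta-path at a time. The toy author-paper data and the helper names `invert`, `path_counts`, and `pathsim` are hypothetical, and the meta-path is assumed to be symmetric (here Author-Paper-Author).

```python
from collections import Counter


def invert(rel):
    """Reverse a relation given as {node: [neighbours]}."""
    inv = {}
    for src, dsts in rel.items():
        for dst in dsts:
            inv.setdefault(dst, []).append(src)
    return inv


def path_counts(start, relations):
    """Count meta-path instances from `start` to every reachable end node.

    relations: list of dicts, one per edge type along the meta-path, each
    mapping a node to the nodes it links to via that edge type.
    """
    frontier = Counter({start: 1})
    for rel in relations:
        nxt = Counter()
        for node, count in frontier.items():
            for nb in rel.get(node, ()):
                nxt[nb] += count  # every path to `node` extends to `nb`
        frontier = nxt
    return frontier


def pathsim(x, y, relations):
    """PathSim of x and y under a symmetric meta-path (assumption)."""
    from_x = path_counts(x, relations)
    xy, xx = from_x[y], from_x[x]
    yy = path_counts(y, relations)[y]
    return 2 * xy / (xx + yy) if xx + yy else 0.0


# Hypothetical toy data: which papers each author wrote.
writes = {"alice": ["p1", "p2", "p3"], "bob": ["p2", "p3"], "carol": ["p4"]}
# Meta-path Author-Paper-Author: follow "writes", then its inverse.
APA = [writes, invert(writes)]

if __name__ == "__main__":
    print(pathsim("alice", "bob", APA))    # 2*2 / (3+2)
    print(pathsim("alice", "carol", APA))  # no shared papers
```

Normalizing by each object's self-path count is what lets PathSim favour peers with comparable visibility rather than simply the most prolific neighbours.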

We further generalize the notion of a meta-path to “meta structures,” each of which is a directed acyclic graph of object types with edge types connecting them. A meta structure, which is more expressive than a meta-path, can describe complex relationships between two HIN objects (e.g., two papers in DBLP that share the same authors and topics). We develop three relevance measures based on meta structures. Due to the computational complexity of these measures, we also study an algorithm, with supporting data structures, for evaluating them. Finally, we will examine solutions for performing query recommendation based on meta-paths. We will also discuss future research directions in HINs.
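To see why a meta structure is more expressive than a meta-path, consider the DBLP example above: a DAG in which paper p links to an author A and a topic T, both of which link to paper q. The sketch below counts instances of that structure, assuming an instance is a choice of one author and one topic that are both shared by the two papers, so the count is |shared authors| * |shared topics|. It illustrates instance counting only and is not one of the three relevance measures developed in the tutorial; the function name and data are hypothetical.

```python
def meta_structure_count(paper_authors, paper_topics, p, q):
    """Count instances of the meta structure P -> (A, T) -> P between p and q.

    Each instance picks an author a and a topic t that are linked to both
    papers, so the count is the product of the two intersection sizes.
    """
    shared_authors = set(paper_authors.get(p, ())) & set(paper_authors.get(q, ()))
    shared_topics = set(paper_topics.get(p, ())) & set(paper_topics.get(q, ()))
    return len(shared_authors) * len(shared_topics)


# Hypothetical toy data.
paper_authors = {"p1": ["alice", "bob"], "p2": ["alice", "bob"], "p3": ["alice"]}
paper_topics = {"p1": ["graphs"], "p2": ["graphs", "db"], "p3": ["db"]}

if __name__ == "__main__":
    print(meta_structure_count(paper_authors, paper_topics, "p1", "p2"))
    print(meta_structure_count(paper_authors, paper_topics, "p1", "p3"))
```

Here p1 and p3 share an author, so the meta-path Paper-Author-Paper would relate them, but because they share no topic the meta structure count is 0.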

Bio: Dr. Reynold Cheng is an Associate Professor in the Department of Computer Science at the University of Hong Kong. He was an Assistant Professor at HKU from 2008 to 2011. He received his BEng (Computer Engineering) in 1998 and MPhil (Computer Science and Information Systems) in 2000 from the Department of Computer Science at the University of Hong Kong. He then obtained his MSc and PhD from the Department of Computer Science at Purdue University in 2003 and 2005, respectively. Dr. Cheng was an Assistant Professor in the Department of Computing of the Hong Kong Polytechnic University from 2005 to 2008. He was a visiting scientist at the Institute of Parallel and Distributed Systems of the University of Stuttgart during the summer of 2006.

Dr. Cheng received the HKU Outstanding Young Researcher Award for 2011-12. He was the recipient of the 2010 Research Output Prize in the Department of Computer Science of HKU. He also received the U21 Fellowship in 2011. He received the Performance Reward from the Hong Kong Polytechnic University in 2006 and 2007. He is the Chair of the Department Research Postgraduate Committee, and was the Vice Chairperson of the ACM (Hong Kong Chapter) in 2013. He is a member of the IEEE, the ACM, the Special Interest Group on Management of Data (ACM SIGMOD), and the UPE (Upsilon Pi Epsilon Honor Society). He is an editorial board member of TKDE, DAPD, and IS, and was a guest editor for TKDE, DAPD, and Geoinformatica. He is an area chair of ICDE 2017, and has served as a senior PC member for DASFAA 2015, PC co-chair of APWeb 2015, area chair for CIKM 2014, area chair for the Encyclopedia of Database Systems, program co-chair of SSTD 2013, and a workshop co-chair of ICDE 2014. He received an Outstanding Service Award at CIKM 2009. He has served as a PC member and reviewer for top conferences (e.g., SIGMOD, VLDB, ICDE, EDBT, KDD, ICDM, and CIKM) and journals (e.g., TODS, TKDE, VLDBJ, IS, and TMC).



Tutorial 4       Sunday, July 9th 13:30 – 15:30, Conference Room 4 (4号会议室)

Title: Reliability and Influence Maximization Queries over Uncertain Graphs



Speaker: Arijit Khan - Nanyang Technological University

Abstract: Large-scale, highly interconnected networks pervade both our society and the natural world around us. Uncertainty, on the other hand, is inherent in the underlying data for a variety of reasons, such as noisy measurements, lack of precise information needs, inference and prediction models, or explicit manipulation, e.g., for privacy purposes. Therefore, uncertain, or probabilistic, graphs are increasingly used to represent noisy linked data in many emerging application scenarios, and they have recently become a hot topic in the database research community. Many classical graph queries, such as reachability and shortest-path queries, become #P-complete, and hence more expensive, on uncertain graphs; at the same time, more complex queries, including information diffusion and influence maximization, are emerging over uncertain networks. In this tutorial, we discuss the sources of uncertain graphs and their applications, uncertainty modeling, and the complexities and algorithmic advances in uncertain graph processing in the context of both classical (reliability) and emerging (influence maximization) graph queries. We emphasize the current challenges and highlight some future research directions.
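Because computing the exact s-t reliability (the probability that t is reachable from s) is #P-complete, a common baseline is Monte Carlo sampling over possible worlds: sample each edge independently according to its probability, test reachability in the sampled graph, and average. The sketch below assumes directed edges that exist independently with the given probabilities; the function names and parameters are illustrative, and this naive estimator is not one of the advanced algorithms the tutorial discusses.

```python
import random
from collections import deque


def reachable(world_edges, s, t):
    """Breadth-first search: is t reachable from s in a deterministic graph?"""
    adj = {}
    for u, v in world_edges:
        adj.setdefault(u, []).append(v)
    seen = {s}
    queue = deque([s])
    while queue:
        u = queue.popleft()
        if u == t:
            return True
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return False


def estimate_reliability(edges, s, t, samples=10_000, seed=None):
    """Monte Carlo estimate of P(t reachable from s) in an uncertain graph.

    edges: list of (u, v, p) directed edges, each present independently
    with probability p (possible-world semantics, an assumption here).
    """
    if samples <= 0:
        raise ValueError("samples must be positive")
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        # Draw one possible world by keeping each edge with probability p.
        world = [(u, v) for u, v, p in edges if rng.random() < p]
        if reachable(world, s, t):
            hits += 1
    return hits / samples


if __name__ == "__main__":
    # Toy triangle: a->c directly, or via b. Exact value 1 - 0.5 * (1 - 0.25) = 0.625.
    toy = [("a", "b", 0.5), ("b", "c", 0.5), ("a", "c", 0.5)]
    print(estimate_reliability(toy, "a", "c", samples=20_000, seed=7))
```

The estimate's standard error shrinks only as 1/sqrt(samples), which is one reason the more refined sampling and indexing techniques for uncertain graphs matter in practice.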
Bio: Arijit Khan is an Assistant Professor in the School of Computer Science and Engineering at Nanyang Technological University, Singapore. His research interests span big data, big graphs, and graph systems. He earned his PhD from the Department of Computer Science, University of California, Santa Barbara, USA, and did a post-doc in the Systems group at ETH Zurich, Switzerland. Arijit received the prestigious IBM PhD Fellowship in 2012-13. He has published several papers in premier database and data-mining conferences and journals, including SIGMOD, VLDB, TKDE, ICDE, SDM, EDBT, and CIKM. Arijit co-presented tutorials on emerging graph queries, big-graph systems, uncertain graphs, and graph summarization at ICDE 2012, VLDB 2014, VLDB 2015, and VLDB 2017, and has served on the program committees of KDD, SIGMOD, VLDB, ICDE, ICDM, EDBT, WWW, and CIKM. Arijit served as the co-chair of the Big-O(Q) workshop co-located with VLDB 2015, and contributed a chapter on big-graph querying and mining to the Springer Handbook of Big Data Technologies.