About VLDB Summer School
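The VLDB (Very Large Data Bases) China Database School was established with sponsorship from the VLDB Endowment and operates under the Technical Committee on Databases of the China Computer Federation. It aims to give teachers, scholars, and senior graduate students in China who work on, or aspire to work on, database theory and technology an opportunity to learn and exchange ideas. Each year, on a flexible schedule, the School invites internationally renowned scholars in databases and related fields to lecture in China. The goal is to promote a comprehensive and timely understanding in China of the international frontier of the database discipline and, on that basis, to develop distinctive, application-oriented data management technologies and systems.
The 2016 summer school is hosted by the School of Computer Science, Northwestern Polytechnical University. The lecturers are database experts with outstanding international academic reputations: Kyuseok Shim (Seoul National University), Feifei Li (University of Utah), Lei Chen (Hong Kong University of Science and Technology). We warmly welcome young faculty and senior graduate students from across the country who work on data management theory and technology to apply.
Dean: Prof. Aoying Zhou (East China Normal University)
Academic Committee Chairs: Prof. Zhanhuai Li (Northwestern Polytechnical University), Prof. Aoying Zhou (East China Normal University)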
Kyuseok Shim, Feifei Li, Lei Chen
Theme: Big Data Management and Analysis Techniques
1. Title: Parallel Algorithms using MapReduce for Big Data Analysis
Speaker: Kyuseok Shim, professor in the Department of Electrical and Computer Engineering, Seoul National University, Korea
Abstract: A growing number of applications must handle big data, yet analyzing big data remains very challenging. For such applications, the MapReduce framework has recently attracted a lot of attention. MapReduce is a programming model that allows easy development of scalable parallel applications to process big data on large clusters of commodity machines. Google's MapReduce or its open-source equivalent Hadoop is a powerful tool for building such applications. In this tutorial, I will first introduce the MapReduce framework based on the Hadoop system, which is available to everyone for running distributed computing algorithms with MapReduce. I will next discuss how to design efficient MapReduce algorithms and present the state of the art in MapReduce algorithms for big data analysis. Since Spark was recently developed to overcome the shortcomings of MapReduce, which is not optimized for iterative algorithms and interactive data analysis, I will also present an outline of Spark as well as the differences between MapReduce and Spark. The intended audience of this tutorial is professionals who plan to develop efficient MapReduce algorithms and researchers who should be aware of the state of the art in MapReduce algorithms available today for big data analysis.
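As a rough illustration of the programming model described in the abstract above, the following is a minimal single-machine sketch in Python of the map, shuffle, and reduce steps for word counting. It is not taken from the tutorial and does not use Hadoop; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(pairs):
    """Shuffle: group all emitted values by key, as a framework would do."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine all values that share a key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run the three phases in sequence on an in-memory list of documents."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data on big clusters", "data analysis"])
```

In a real cluster the map and reduce calls would run in parallel on different machines and the shuffle would move data over the network; here everything runs in one process to keep the structure visible.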
Bio: Kyuseok Shim is currently a professor in the Department of Electrical and Computer Engineering at Seoul National University, Korea. Before that, he was an assistant professor in the computer science department at KAIST and a member of technical staff for the Serendip Data Mining Project at Bell Laboratories. He was also a member of the Quest Data Mining Project at the IBM Almaden Research Center and visited Microsoft Research at Redmond several times as a visiting scientist. In 2013, Kyuseok was named an ACM Fellow for his contributions to scalable data mining and query processing research.
Kyuseok works in the area of databases, focusing on data mining, search engines, recommendation systems, MapReduce algorithms, privacy preservation, query processing, and query optimization. His work has appeared in a number of professional conferences and journals, including ACM, VLDB, and IEEE publications. He has served as a Program Committee member for the SIGKDD, SIGMOD, ICDE, ICDM, ICDT, EDBT, PAKDD, VLDB, and WWW conferences, and as a Program Committee Co-Chair for PAKDD 2003, WWW 2014, ICDE 2015, and APWeb 2016. Kyuseok was previously on the editorial boards of the VLDB Journal and IEEE TKDE, and is currently a member of the VLDB Endowment Board of Trustees. He received the BS degree in electrical engineering from Seoul National University in 1986, and the MS and PhD degrees in computer science from the University of Maryland, College Park, in 1988 and 1993, respectively.
2. Title: Towards Interactive, Online, and Secure Big (Spatial) Data Analytics
Speaker: Feifei Li, associate professor at the School of Computing, University of Utah, USA
Abstract: Large spatial data has become ubiquitous. As a result, it is critical to provide fast, scalable, and high-throughput spatial queries and analytics for numerous applications in location-based services (LBS). Traditional spatial databases and spatial analytics systems are disk-based and optimized for IO efficiency. But increasingly, data are stored and processed in memory to achieve low latency, and CPU time becomes the new bottleneck. We will present the Simba (Spatial In-Memory Big data Analytics) system, which offers scalable and efficient in-memory spatial query processing and analytics for big spatial data. Simba is based on Spark and runs over a cluster of commodity machines. In particular, Simba extends the Spark SQL engine to support rich spatial queries and analytics through both SQL and the DataFrame API. It introduces the concept and construction of indexes over RDDs in order to work with big spatial data and complex spatial operations. Lastly, Simba implements an effective query optimizer, which leverages its indexes and novel spatial-aware optimizations to achieve both low latency and high throughput. Extensive experiments over large data sets demonstrate Simba's superior performance compared with other spatial analytics systems.
Through its SQL and DataFrame API, Simba provides interactive analytics over big spatial data. For cases where the data grows too big or the computation becomes too expensive, we will discuss how to achieve interactive (or at least more interactive) analytics through online sampling, online aggregation, and online analytics.
We will survey related work for systems that process big data and techniques for interactive and online queries and analytics, where the objective is to provide continuous online approximations to enable interactive query processing, rather than letting the user wait for final, exact answers without knowing how long a query will take to execute in full.
Finally, we will also survey and discuss secure query processing techniques to build systems that are privacy-aware and support query execution over encrypted data.
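The online aggregation idea mentioned in this abstract, returning a continuously refined approximation instead of making the user wait for the exact answer, can be sketched as follows. This is a minimal single-machine illustration in Python, not the technique used in Simba or in any surveyed system; the function name `online_mean`, its parameters, and the sample data are hypothetical. It assumes the rows can be visited in random order and uses a normal approximation for the confidence interval.

```python
import math
import random


def online_mean(values, batch_size=1000, z=1.96, seed=0):
    """Yield (rows_seen, estimate, half_width) after each batch of a random scan.

    The half width is an approximate 95% confidence interval (z = 1.96)
    based on the normal approximation, ignoring finite-population effects.
    """
    rng = random.Random(seed)
    order = list(values)
    rng.shuffle(order)  # random order makes every prefix a uniform sample
    n, mean, m2 = 0, 0.0, 0.0
    for i, x in enumerate(order, start=1):
        # Welford's update for the running mean and sum of squared deviations
        n += 1
        delta = x - mean
        mean += delta / n
        m2 += delta * (x - mean)
        if i % batch_size == 0 or i == len(order):
            variance = m2 / (n - 1) if n > 1 else 0.0
            yield n, mean, z * math.sqrt(variance / n)


data = [float(i % 100) for i in range(10000)]
estimates = list(online_mean(data))
```

Each yielded tuple could be shown to the user immediately, so the estimate and its shrinking error bound are visible long before the full scan completes.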
Bio: Feifei Li is currently an associate professor at the School of Computing, University of Utah. His research focuses on improving the scalability, efficiency, and effectiveness of database and big data management systems. He also works on various data security problems in these systems. He received an NSF CAREER Award in 2011, two HP IRP awards in 2011 and 2012 respectively, a Google App Engine award in 2013, the IEEE ICDE best paper award in 2004, the IEEE ICDE 10+ Years Most Influential Paper Award in 2014, a Google Faculty award in 2015, and the SIGMOD Best Demonstration Award at SIGMOD 2015. He has served as the demo PC chair for VLDB 2014, the general co-chair for SIGMOD 2014, a PC area chair for both ICDE 2014 and SIGMOD 2015, and an associate editor for IEEE TKDE.
3. Title: Spatial Crowdsourcing: Opportunities and Challenges
Speaker: Lei Chen, professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology
Abstract: Crowdsourcing is a new computing paradigm in which humans are actively enlisted to participate in the computing process, especially for tasks that are intrinsically easier for humans than for computers. Not surprisingly, with the development of the mobile Internet, the power of crowdsourcing is now expanding to the physical world, where each user is treated as a mobile computing unit that can be activated and guided for certain tasks. This practice is generally termed spatial crowdsourcing, featuring task dispatching and dynamic pricing as its core technical niches. It therefore serves as the fundamental prototype for a cluster of industrial applications such as citizen sensing (Waze), P2P ride-sharing (Uber), real-time O2O services (Instacart, Postmates), and so on.
In this talk, I will first briefly review the history of crowdsourcing and discuss the key issues related to crowdsourcing. Then, I will introduce the theoretical and practical development of our spatial crowdsourcing project, G-mission. Finally, I will highlight some interesting future work on G-mission.
Bio: Lei Chen received the BS degree in computer science and engineering from Tianjin University, Tianjin, China, in 1994, the MA degree from the Asian Institute of Technology, Bangkok, Thailand, in 1997, and the PhD degree in computer science from the University of Waterloo, Canada, in 2005. He is currently a full professor in the Department of Computer Science and Engineering, Hong Kong University of Science and Technology. His research interests include crowdsourcing over social media, social media analysis, probabilistic and uncertain databases, and privacy-preserved data publishing. The system developed by his team won the Excellent Demonstration Award at VLDB 2014. He received the SIGMOD Test-of-Time Award in 2015. He has served as a PC track chair for SIGMOD 2014, VLDB 2014, ICDE 2012, CIKM 2012, and SIGMM 2011, and as a PC member for SIGMOD, VLDB, ICDE, SIGMM, and WWW. Currently, he serves as Editor-in-Chief of the VLDB Journal and as an associate editor-in-chief of IEEE Transactions on Knowledge and Data Engineering. He is a member of the VLDB Endowment.
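1. This year the school plans to admit 100 participants. Admission is handled by the VLDB China Database School and will be decided according to the application requirements and the principle of balance across institutions and regions. Admission notices will be sent out on July 10, 2016.
Mailing address: Hailong Liu, Room 324, School of Computer Science, Chang'an Campus, Northwestern Polytechnical University, Xi'an, Shaanxi Province. Postal code: 710129
(When mailing, please mark: VLDB 2016 Summer School)
7. Online registration deadline: June 30, 2016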