DYL (Distinguished Young Lecturer) Series
The DYL (Distinguished Young Lecturer) Series is a new feature introduced to WAIM2013. The purposes of the DYL series are three-fold:
- to promote and involve active young researchers who have made significant progress in establishing themselves with highly visible and influential achievements in the web data management community, by having them share their experiences and lessons with the general audience;
- to set a model example of good technical presentation for the student audience;
- to provide a venue for the young researchers to interact and mingle with the students, thereby attracting good students for possible subsequent collaborative research.
Depending on its reception, it is hoped that the series will continue at subsequent WAIM conferences.
Xin Luna Dong
Senior Research Scientist
Google Palo Alto, California
USA
Abstract:
The Web has been changing our lives enormously and people rely more and more on the Web to fulfill their information needs. Compared with traditional media, information on the Web can be published fast, but with fewer guarantees on quality and credibility. Indeed, Web sources are of different qualities, sometimes providing conflicting, out-of-date and incomplete data. The sources can also easily copy, reformat and modify data from other sources, propagating erroneous data.
In this talk we present a recent study of the truthfulness of Deep Web data in two domains where we believe data quality is important to people's lives: Stock and Flight. We then describe how we can resolve conflicts from different sources by leveraging the accuracy of the sources and the copying relationships between them. We describe our SOLOMON system, which can effectively detect copying between data sources, leverage the results in truth discovery, and provide a user-friendly interface to help users understand the results.
Bio:
Xin Luna Dong is a senior research scientist at Google. Prior to that, she worked at AT&T Labs-Research for five years. She received her Ph.D. from the University of Washington, a Master's degree from Peking University, and a Bachelor's degree from Nankai University. Her research interests include data integration, data quality, knowledge discovery, and Web search. She co-chaired the CIKM'13 demo track, the SIGMOD/PODS PhD Symposium'12-13, the SIGMOD New Researcher Symposium'12-13, QDB'12, and WebDB'10; served as a track chair or senior reviewer at ICDE'13 and CIKM'11; and has also served on many database program committees.
Hong Cheng
Assistant Professor
Department of Systems Engineering and Engineering Management
Chinese University of Hong Kong
China
Abstract:
Different from a large body of research on social networks that has focused almost exclusively on positive relationships, we study signed social networks with both positive and negative links. Specifically, we focus on how to reliably and effectively predict the signs of links in a newly formed signed social network (called a target network). Since usually only a very small amount of edge sign information is available in such newly formed networks, this small quantity is not adequate to train a good classifier. To address this challenge, we need assistance from an existing, mature signed network (called a source network) which has abundant edge sign information. We adopt the transfer learning approach to leverage the edge sign information from the source network, which may have a different yet related joint distribution of the edge instances and their class labels.
As there is no predefined feature vector for the edge instances in a signed network, we construct generalizable features that can transfer the topological knowledge from the source network to the target. With the extracted features, we adopt an AdaBoost-like transfer learning algorithm with instance weighting to utilize more useful training instances in the source network for model learning. The effectiveness of this method is confirmed by testing on three real large signed social networks.
Bio:
Hong Cheng is an Assistant Professor in the Department of Systems Engineering and Engineering Management at the Chinese University of Hong Kong. She received her Ph.D. degree from the University of Illinois at Urbana-Champaign in 2008. Her research interests include data mining, database systems, and machine learning. She received research paper awards at ICDE'07, SIGKDD'06 and SIGKDD'05, and the certificate of recognition for the 2009 SIGKDD Doctoral Dissertation Award. She is a recipient of the 2010 Vice-Chancellor's Exemplary Teaching Award at the Chinese University of Hong Kong.
Bingsheng He
Assistant Professor
School of Computer Engineering
Nanyang Technological University
Singapore
Abstract:
Big data have posed various research challenges in data management systems. In this talk, I will present our recent research efforts in making big data management systems faster and greener. In particular, two studies will be presented: one optimizes the performance of large graph processing in the cloud, and the other leverages renewable energy in database systems, aiming at zero-emission databases. Finally, I will outline our research agenda in architecting future big data management systems.
Bio:
Dr. Bingsheng He is currently an Assistant Professor in the Division of Networks and Distributed Systems, School of Computer Engineering, Nanyang Technological University. Before that, he held a research position in the System Research group of Microsoft Research Asia (2008-2010), where his major research was building high performance cloud computing systems for Microsoft. He received his Bachelor's degree from Shanghai Jiao Tong University (1999-2003) and his Ph.D. degree from Hong Kong University of Science & Technology (2003-2008). His current research interests include cloud computing, database systems and high performance computing (with a focus on GPGPU). His papers have been published in prestigious international journals and proceedings such as ACM TODS, IEEE TKDE, ACM SIGMOD, VLDB/PVLDB, ACM/IEEE SuperComputing, PACT, ACM SoCC, and CIDR. He has been awarded the IBM Ph.D. Fellowship (2007-2008) and the NVIDIA Academic Partnership (2010-2011).
Guoliang Li
Associate Professor
Department of Computer Science
Tsinghua University
China
Abstract:
Tabular data on the Web has become a rich source of structured data that is useful for ordinary users to explore. Due to its potential, tables on the Web have recently attracted a number of studies with the goals of understanding the semantics of those Web tables and providing effective search and exploration mechanisms over them. Table understanding aims to identify, recognize and interpret tabular structures to enable a variety of tasks such as data extraction, data integration, and information retrieval. In this paper, we propose a human-machine framework for understanding large-scale tables on the Web. We utilize crowds to improve the quality of Web table understanding, and we discuss how to maximize quality in our human-machine framework.
Bio:
Guoliang Li is an Associate Professor in the Department of Computer Science, Tsinghua University, Beijing, China. He received his PhD degree in computer science from Tsinghua University in 2009, and his Bachelor's degree in Computer Science from Harbin Institute of Technology in 2004. His research interests include data cleaning and integration, spatial databases and crowdsourcing. He has published more than 50 papers in premier conferences and journals, such as SIGMOD, VLDB, ICDE, TODS, VLDB Journal, and TKDE. He is a PC co-chair of the 15th International Conference on Web-Age Information Management (WAIM 2014). He has served on the program committees of many premier conferences, such as VLDB, KDD, ICDE, and IJCAI. His papers have been cited more than 1000 times, with two of them receiving more than 100 citations each. He received the Beijing Excellent Doctoral Dissertation Award, the Nomination Award of National Excellent Doctoral Dissertation, and the SCOPUS National Youth Science Star Award. Homepage: http://dbgroup.cs.tsinghua.edu.cn/ligl
Lei Yu
Associate Professor
Department of Computer Science
Binghamton University
USA
Abstract:
High-dimensional data has become increasingly common in various real-world applications such as microarray analysis, biomedical imaging, and text mining. Feature selection has been an active field of research in the past two decades. Many feature selection algorithms have been developed and proven successful in reducing the dimensionality of the feature space and improving the predictive accuracy of classification. However, in many knowledge discovery endeavors, the primary focus is often not to build an accurate model to predict classes of future samples, but to discover characteristic markers or regulatory factors from numerous features to illuminate the observed phenomena. Due to high dimensionality and limited samples in the data collection, the results from conventional feature selection methods often vary significantly with training data variations. Such instability of feature selection hinders domain experts from deciding on candidate features for subsequent validation. Moreover, conventional feature selection methods produce flat subsets of selected features, and disregard intrinsic group structures among features which could be critically valuable to domain experts for result interpretation and validation. This talk reviews challenges, recent progress, and applications in stable feature selection and feature group selection.
Bio:
Lei Yu is currently an Associate Professor in the Department of Computer Science, Binghamton University. He received his Ph.D. in Computer Science from the Department of Computer Science and Engineering, Arizona State University in 2005, and his B.Eng. from the Department of Computer Science and Engineering, Dalian University of Technology in 1999. His research interests are in the areas of data mining, machine learning, and bioinformatics.