Industrial Talks

Saturday, July 8th, 16:00 – 18:00, Conference Room 3

Industrial Talk 1

Title: Usage of Crowdsourcing and Offline Resources in AI Data Acquisition

Speaker: Lanying Cheng - CTO of Datatang.

Abstract: Learning from data is one of the major methods in today's artificial intelligence. The representativeness and quality of data have a large influence on (deep) learning results. Crowdsourcing is a fast and low-cost way to obtain data, but quality control is difficult, which makes it hard to use for data collection and especially for data annotation. By integrating task design management, annotator group management, and quality control theory, and by combining them with offline resources, we have gained practical experience in AI data collection and annotation using combined resources and a crowdsourcing platform. The presentation will introduce the decision-making and quality management methods we use and a practical system, as well as the use of crowdsourcing and outsourcing to fulfill data acquisition and annotation tasks of different levels of complexity. We will also introduce some AI data sets from Datatang.

Bio: Lanying has been dedicated to the research and development of multiple applications in artificial intelligence. Before joining Datatang, she worked with Nuance China and its overseas branches. In her earlier research, she focused on speech recognition and published several papers in this field; in the most recent decade, she has concentrated on various applications of AI, such as smartphones, autonomous driving, and automated call-center services. At Datatang, Lanying devotes herself to the research and product development of high-level data services.

Industrial Talk 2

Title: Big Data at Didi Chuxing

Speaker: Xuewen Chen - Senior Director of DiDi, Didi Chuxing

Abstract: Didi Chuxing is the world’s leading mobile transportation platform, offering a full range of mobile tech-based mobility options to over 400 million users across more than 400 Chinese cities. Every day, DiDi's platform generates more than 70 TB of data, processes more than 20 billion routing requests, and produces over 14 billion location points. This talk describes how AI technologies are applied to analyze this large-scale transportation data and improve the travel experience for millions of people in China.

Bio: Dr. Chen is a Full Professor at Wayne State University. He received his PhD from Carnegie Mellon University in 2001 and has won the NSF CAREER Award. He has served as chair of the computer science department at Wayne State University and as a conference chair for several international conferences, such as CIKM 2012 and IEEE ICMLA 2014. Dr. Chen also serves as an Associate Editor for several international journals, including BMC Systems Biology and IEEE Access. His research interests include big data, machine learning, and data mining, with applications to video/image/text analysis and health informatics.

Industrial Talk 3

Title: Apache HAWQ: The Native Parallel SQL-on-Hadoop Engine

Speaker: Lei Chang - Creator of Apache HAWQ, and Founder of Oushu Inc.

Abstract: Apache HAWQ is a native SQL-on-Hadoop engine. Its novel design combines the performance of an MPP database with the scalability of Hadoop. HAWQ provides users with a complete, standards-compliant SQL interface and the tools to confidently and successfully interact with petabyte-scale data sets. HAWQ has been used extensively by hundreds of enterprise users, including GE, NYSE, Jindong, China Mobile, and others. In this talk, Dr. Lei Chang will review the current status of the HAWQ architecture and its core components, including storage, query processing, interconnect, and transaction management. He will also discuss research issues and future roadmap items.

Bio: Dr. Lei Chang is the creator of Apache HAWQ and the founder of Oushu Inc. Before founding Oushu Inc., he was an engineering director at Pivotal (EMC), where he led the R&D of HAWQ. Before Pivotal, he was a senior research scientist at EMC. His main research areas include parallel databases, data analytics, and cloud computing. He has published widely on data management and data mining in refereed journals and conferences, and holds dozens of US patents. He obtained his PhD degree in databases from Peking University.

Industrial Talk 4

Title: Big Electronic Sports Data at Max+

Speaker: Ning Xu – CEO of Max+.

Abstract: As China’s largest data platform for general electronic sports, Max+ hosts nearly one billion users in games like Dota 2, CS:GO, League of Legends, and Overwatch, along with ten billion matches. Since the early stages of Max+, we have been committed to empowering electronic sports with advanced data analysis techniques. With carefully designed data cleaning and analysis algorithms, we can identify room for improvement for both ordinary users and professional players.

In addition, we provide vivid visualizations and a recommendation engine to improve their gaming experience. We also host the second largest online community about electronic sports, in which tens of thousands of posts are generated every day. Max+ also collaborates with live broadcasting and global match partners.

This talk will cover these big data practices at Max+. Specifically, we focus on the big data platform at Max+, data use cases in electronic sports, and our recent work on match data visualization.

Bio: In 2014, Dr. Ning Xu created the first electronic sports data platform in China. In 2015, he created Max+, which is backed by several renowned investors. Max+ is now the largest data platform for general electronic sports in China. Dr. Xu graduated from Peking University. His research focuses on databases and parallel systems. He has published in SIGMOD, VLDB, and TKDE.

Industrial Talk 5

Title: Detecting and Inferring Query Intent based on Query Log Mining

Speaker: Qi Ye – Expert Researcher of Sogou.

Abstract: Query intent mining is a critical problem in various real-world search applications. The past few years have seen dramatic advances in this field. In this presentation, we will first provide an overview of state-of-the-art query intent detection methods that have shown good performance. After that, we will address the problem of automatically detecting query intents in search engines through query log mining, representing the intents of queries by coarse-grained semantic categories and by fine-grained clusters. We show a practical system for identifying and inferring millions of query intents in daily sponsored search with high precision and acceptable coverage. We also propose a general method that uses the detected fine-grained clusters to improve real-world coarse-grained semantic query inference. Finally, we show a high-precision label propagation method that improves query intent inference using the structure of the click-through bipartite graph. All results indicate that query log mining allows us to achieve good performance on different real-world tasks.

Bio: Qi Ye received his PhD degree in computer science from Beijing University of Posts and Telecommunications in 2011. He has been dedicated to research on sponsored search since then. Today he serves as an expert researcher in the ADRS (ADvertisement Research for Sponsored search) group at Sogou Inc. His research interests include query intent detection, short text classification, short text clustering, short text relevance, graph mining, and their applications in sponsored search. Most of his research results have been used in deployed online systems, and some of them have been published.