1. Forum Name
ACM SIGMOD China Chapter Academic Forum
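2. Forum Organization
Chairs: Jianzhong Li (李建中, Harbin Institute of Technology),
Hongzhi Wang (王宏志, Harbin Institute of Technology),
Guoliang Li (李国良, Tsinghua University)
Contact: Guoliang Li (李国良, Tsinghua University)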
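3. Forum Overview
In partnership with the CCF Technical Committee on Databases, ACM SIGMOD China will hold an annual academic event on the NDBC platform. The event invites leading database researchers from China and abroad to present their latest research and discuss emerging directions, and it gives participating junior faculty and students an opportunity to learn and exchange ideas.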
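4. Forum Schedule
Time: October 13, 2018, 13:30 – 18:00
Venue: Room 308, New University Student Activity Center, Dalian Maritime University
ACM SIGMOD China Outstanding Doctoral Dissertation and Rising Star Forum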
1) Title: Recent Progress on Entity Resolution
Abstract:
Entity resolution identifies all records in a database that refer to the same entity, and hundreds of papers have been published in this area. In this talk, I will present our ICDE 2018 work on unsupervised entity resolution. We propose an unsupervised graph-theoretic fusion framework with two components, namely ITER and CliqueRank. Specifically, ITER constructs a weighted bipartite graph between terms and record-record pairs and iteratively propagates node salience until convergence. Subsequently, CliqueRank constructs a record graph to estimate the likelihood that two records reside in the same clique. The likelihood derived by CliqueRank is fed back to ITER to rectify the edge weights until a joint optimum is reached. The experimental evaluation covered 14 competitors, and the results show that, without any labeled data or crowd assistance, our unsupervised framework is comparable or even superior to state-of-the-art methods on three benchmark datasets. Another noticeable trend in this area is the application of deep learning to entity resolution. I will also discuss two related works from other research groups, published at VLDB 2018 and SIGMOD 2018.
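Speaker bio:
Dr. Dongxiang Zhang (张东祥) received his Ph.D. from the National University of Singapore and is currently a professor under the university-level “Hundred Talents” program at the University of Electronic Science and Technology of China. His research covers frontier topics such as spatio-temporal big data analytics, smart cities, and intelligent education. He is a member of the CCF Technical Committee on Databases, and in 2016 he was selected for the young talent program of the Sichuan Province “Thousand Talents Plan”. He has published more than 50 high-quality papers, 40 of them in CCF rank-A venues, including 18 as first or corresponding author. His work has received more than 1,200 citations on Google Scholar, his most-cited first-author paper has more than 300 citations, and his h-index is 17. Six of his national invention patent applications have been accepted for examination. He has served on the program committees of major database conferences (including ICDE 2012 and 2018, APWeb 2015-2016, WAIM 2014-2016, SSTD 2015, and NDBC 2016-2018) and is a long-standing reviewer for CCF rank-A journals such as IEEE Transactions on Knowledge and Data Engineering, ACM Transactions on Information Systems, and the VLDB Journal. His paper "Effective Multi-Modal Retrieval based on Stacked Auto-Encoders" was a best paper candidate at VLDB 2014, and his paper "CANDS: Continuous Optimal Navigation via Distributed Stream Processing" was selected as a 2015 candidate outstanding paper in databases and data mining by CCF Shanghai.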
2) Title: Efficient Frequent Subgraph Mining on Uncertain Graphs
Abstract:
Uncertainty is intrinsic to a wide spectrum of real-life applications, and it inevitably applies to graph data. Representative uncertain graphs are seen in bioinformatics, social networks, etc. This paper motivates the problem of frequent subgraph mining on single uncertain graphs and investigates two different semantics, probabilistic and expected, in terms of support definitions. First, we present an enumeration-evaluation algorithm to solve the problem under probabilistic semantics. By showing that support computation under probabilistic semantics is #P-complete, we develop an approximation algorithm with an accuracy guarantee for efficient problem-solving. To enhance the solution, we devise computation sharing techniques to achieve better mining performance. Afterwards, the algorithm is extended in a similar fashion to handle the problem under expected semantics, where checkpoint-based pruning and validation techniques are integrated. Experimental results on real-life datasets confirm the practical usability of the mining algorithms.
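As a rough illustration of what an expected-value measure on an uncertain graph looks like, the sketch below computes the expected number of triangles in a small uncertain graph, both exactly and by sampling possible worlds. It is not the algorithm from the talk. The graph, the independent-edge assumption, and the use of a plain occurrence count in place of the paper's support definition are all assumptions made for this example.

```python
import itertools
import random

# Hypothetical uncertain graph: each edge exists independently with the given
# probability. Keys are vertex pairs written in sorted order.
EDGE_PROB = {
    ("a", "b"): 0.9,
    ("a", "c"): 0.5,
    ("b", "c"): 0.8,
    ("b", "d"): 0.6,
    ("c", "d"): 0.7,
}


def triangles(edge_prob):
    """Return every triangle as a list of its three edge keys."""
    vertices = sorted({v for edge in edge_prob for v in edge})
    found = []
    for u, v, w in itertools.combinations(vertices, 3):
        edges = [(u, v), (v, w), (u, w)]
        if all(e in edge_prob for e in edges):
            found.append(edges)
    return found


def expected_triangle_count(edge_prob):
    """Exact expectation: sum over triangles of the product of edge probabilities."""
    total = 0.0
    for tri in triangles(edge_prob):
        p = 1.0
        for e in tri:
            p *= edge_prob[e]
        total += p
    return total


def estimate_triangle_count(edge_prob, n_samples, rng):
    """Monte Carlo estimate: average the triangle count over sampled possible worlds."""
    tris = triangles(edge_prob)
    total = 0
    for _ in range(n_samples):
        # One possible world: keep each edge with its own probability.
        world = {e for e, p in edge_prob.items() if rng.random() < p}
        total += sum(all(e in world for e in tri) for tri in tris)
    return total / n_samples
```

The exact value comes from linearity of expectation and is cheap for a plain occurrence count. Sampling is shown only as the generic fallback when an exact computation is intractable, as the abstract reports for support under probabilistic semantics.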
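Speaker bio:
Xiang Zhao (赵翔) is an associate professor at the Key Laboratory of Information Systems Engineering, College of Systems Engineering, National University of Defense Technology of the Chinese People's Liberation Army, where he is deputy director of the Fundamental and Frontier Technology Research Office. He is also an associate research fellow at the Collaborative Innovation Center of Geospatial Information Technology of China. He received his Ph.D. from the University of New South Wales, Australia, in August 2013. His main research areas include large-scale graph data management, data-driven knowledge base construction and use, and intelligent analysis of intelligence information. He has published in top international conferences and journals such as the VLDB Journal, IEEE TKDE, VLDB, and ICDE. He serves as an associate editor of IEEE Access, as a program committee member of international conferences such as IEEE ICDE and APWeb, and as an invited reviewer for major international journals such as the VLDB Journal and IEEE TKDE. He has led General Program and Young Scientists Fund projects of the National Natural Science Foundation of China, national defense and military advance research projects and domain funds, and national and Hunan provincial education science planning projects. He is a member of the CCF Technical Committee on Databases and a corresponding member of the CCF Big Data Expert Committee.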
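3) Title: Data Cleaning Techniques: Current State of Research and Outlook
Abstract:
In the information age, data is a resource. Only reliable, error-free data can accurately reflect the real world and effectively support organizational decision-making. Yet dirty data is ubiquitous in practice, and incorrect or inconsistent data can severely distort the results of data analysis and in turn harm the decisions built on them. Data cleaning, the process of detecting and correcting dirty data, is the foundation of data analysis and management. The arrival of the big data era, however, poses new challenges for data cleaning techniques. This talk classifies and summarizes existing data cleaning techniques and discusses key directions for future research. Topics include the classification of dirty data and techniques for detecting and eliminating data noise.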
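Speaker bio:
Shuang Hao (郝爽) is a lecturer at the School of Computer and Information Technology, Beijing Jiaotong University. Hao received a doctorate in engineering in July 2018 from the Database Group of the Department of Computer Science and Technology at Tsinghua University and joined the School of Computer and Information Technology at Beijing Jiaotong University the same month. Hao's research focuses on data cleaning and data integration, with multiple papers published in top international database conferences and journals such as VLDBJ, ICDE, and TKDE. Honors include a best paper nomination at ICDE 2018, the first prize of the Tsinghua University Outstanding Doctoral Dissertation Award, and the Sohu R&D Scholarship. Hao also serves as a reviewer for the journals VLDBJ and TKDE.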
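4) Title: A Distributed Gradient Boosting Decision Tree Algorithm for High-Dimensional Features and Multi-Class Classification
Abstract:
Thanks to their high accuracy and interpretability, gradient boosting decision tree (GBDT) algorithms are widely used for classification, regression, ranking, and other tasks. With the explosive growth of data volume, distributed GBDT has become an active research topic. Although a number of distributed GBDT implementations exist, most of them adopt a data-parallel strategy and perform poorly on high-dimensional features and multi-class tasks.
Our work first compares data-parallel and feature-parallel strategies under a rigorous cost model and shows theoretically that feature parallelism is better suited to high-dimensional and multi-class scenarios. Guided by this analysis, we propose FP-GBDT, a feature-parallel distributed GBDT algorithm. FP-GBDT includes an efficient distributed dataset transposition algorithm that converts a dataset originally partitioned by rows into a column-partitioned representation. For building gradient histograms and splitting tree nodes, FP-GBDT applies a series of optimizations that reduce computation and communication costs. Extensive experiments on multiple datasets verify the effectiveness of FP-GBDT in high-dimensional and multi-class scenarios, where it achieves up to a 6x speedup over data-parallel approaches.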
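To make the row-to-column repartitioning idea concrete, here is a minimal single-process sketch. It is not FP-GBDT's transposition algorithm. The round-robin feature assignment, the function names, and the assumption that feature values are already discretized into integer bin ids are all illustrative choices.

```python
def column_owner(feature, n_workers):
    """Assign features to workers round-robin (an illustrative choice)."""
    return feature % n_workers


def transpose_partitions(row_shards, n_workers):
    """Turn row-partitioned shards into column-partitioned ones.

    row_shards[w] is the list of rows held by worker w, and every row is a
    list of feature values. The result holds one dict per worker mapping
    feature index -> column, where a column lists that feature's values for
    all rows in global row order.
    """
    col_shards = [dict() for _ in range(n_workers)]
    for shard in row_shards:  # visit workers in global row order
        for row in shard:
            for feature, value in enumerate(row):
                owner = column_owner(feature, n_workers)
                col_shards[owner].setdefault(feature, []).append(value)
    return col_shards


def gradient_histogram(column, gradients, n_bins):
    """Sum per-row gradients into bins for one feature (values are bin ids)."""
    hist = [0.0] * n_bins
    for bin_id, grad in zip(column, gradients):
        hist[bin_id] += grad
    return hist


# Two workers, each holding two rows with three pre-binned features.
ROW_SHARDS = [
    [[0, 1, 2], [1, 0, 2]],
    [[2, 2, 0], [0, 1, 1]],
]
GRADIENTS = [0.5, -1.0, 0.25, 2.0]  # one gradient per row, in global row order

COL_SHARDS = transpose_partitions(ROW_SHARDS, n_workers=2)
HIST_FEATURE_1 = gradient_histogram(COL_SHARDS[1][1], GRADIENTS, n_bins=3)
```

In this sketch each worker ends up holding complete columns, so it can build the full histogram for each of its features locally once it has the per-row gradients. A data-parallel layout would instead have to merge partial histograms across workers.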
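Speaker bio:
Jiawei Jiang (江佳伟), Ph.D., graduated from the Department of Computer Science at Peking University in 2018 and is a senior researcher in Tencent's Technology and Engineering Group. Jiang's research interests include machine learning, distributed systems, and optimization algorithms. Jiang has published multiple papers in top journals and conferences such as SIGMOD, TOIS, and ICDE, has received honors including the Peking University Outstanding Doctoral Dissertation Award, and is a core contributor to Angel, the open-source machine learning system jointly developed by Peking University and Tencent.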
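ACM SIGMOD China Academia vs. Industry: Competition and Cooperation
Invited guest Prof. Zhiyong Peng (彭智勇): Bio
Zhiyong Peng is a professor and doctoral supervisor at Wuhan University, a member of the State Council's Discipline Evaluation Group for Software Engineering, a CCF Fellow, vice chair of the CCF Technical Committee on Databases, and a member of the CCF Big Data Expert Committee. He received a B.S. from Wuhan University in 1985, an M.Eng. from the National University of Defense Technology in 1988, and a doctorate in engineering from Kyoto University, Japan, in 1995. From 1995 to 1997 he was a researcher at the Kyoto Advanced Technology Research Institute (京都高度技术研究所) in Japan, and from 1997 to 2000 he was a researcher at Hewlett-Packard's research laboratories in the United States. He proposed a new database model, the object deputy model, which was published at the top international database conference IEEE ICDE and in the leading journal IEEE TKDE and has been recognized by the academic community. He analyzed the source code of the open-source database PostgreSQL and published the monograph 《PostgreSQL数据库内核分析》 (Analysis of the PostgreSQL Database Kernel), which was well received by industry. He also developed TOTEM, an object deputy database management system with independent intellectual property rights. His current research covers object deputy databases, big data management systems, manufacturing big data, science and technology big data, education big data, trusted cloud data, and geographic data watermarking.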
Invited guest Prof. Ihab F. Ilyas Kaldas: Bio
Ihab Ilyas is a professor in the Cheriton School of Computer Science at the University of Waterloo, where his main research focuses on the areas of big data and database systems, with special interest in data quality and integration, managing uncertain data, rank-aware query processing, and information extraction. Ihab is also a co-founder of Tamr, a startup focusing on large-scale data integration and cleaning. He is a recipient of the Ontario Early Researcher Award (2009), a Cheriton Faculty Fellowship (2013), an NSERC Discovery Accelerator Award (2014), and a Google Faculty Award (2014), and he is an ACM Distinguished Scientist. Ihab is an elected member of the VLDB Endowment board of trustees, elected SIGMOD vice chair, and an associate editor of the ACM Transactions on Database Systems (TODS). He holds a PhD in computer science from Purdue University, West Lafayette.
https://cs.uwaterloo.ca/~ilyas/
Invited guest Prof. Chen Li: Bio
Chen Li is a professor in the Department of Computer Science at UC Irvine. He received his Ph.D. degree in Computer Science from Stanford University, and his M.S. and B.S. in Computer Science from Tsinghua University, China. His research interests are in the field of data management, including data-intensive computing, query processing and optimization, visualization, and text analytics. His current focus is building open source systems for data management and analytics. He was a recipient of an NSF CAREER Award, several test-of-time publication awards, and many grants and industry gifts. He was once a part-time Visiting Research Scientist at Google. He founded a company to commercialize university research.
Invited guest Prof. Haruo Yokota: Bio
Haruo Yokota received his B.E., M.E., and Dr.Eng. degrees from Tokyo Institute of Technology in 1980, 1982, and 1991, respectively. He joined Fujitsu Ltd. in 1982, and was a researcher at ICOT for the 5th Generation Computer Project from 1982 to 1986, and at Fujitsu Laboratories Ltd. from 1986 to 1992. From 1992 to 1998, he was an Associate Professor at Japan Advanced Institute of Science and Technology (JAIST). He moved to Tokyo Institute of Technology in 1998, and has been a Full Professor at the Department of Computer Science since 2001. He is currently the Dean of the School of Computing at Tokyo Institute of Technology. His research interests include the general research areas of data engineering, information storage systems, and dependable computing. He was a vice president of DBSJ, a chair of ACM SIGMOD Japan Chapter, a trustee board member of IPSJ, the Editor-in-Chief of Journal of Information Processing, and an associate editor of the VLDB Journal. He is currently a board member of DBSJ, a fellow of IEICE and IPSJ, a senior member of IEEE, and a member of IFIP-WG10.4, JSAI, ACM, and ACM-SIGMOD.
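Invited guest Prof. Feifei Li (李飞飞): Bio
Dr. Feifei Li is a vice president of Alibaba and chief database scientist of DAMO Academy, where he leads the DAMO Academy Database Lab as well as the database business unit and the storage technology business unit under the platform technology group. Before joining Alibaba, he was a tenured full professor of computer science at the University of Utah in the United States. His research interests are database systems, the theory, system design, and development of big data management, and the security of cloud data management. He has received an NSF CAREER Award, an HP Innovation Research Program Award, a Google Faculty Award, and a Visa Faculty Research Award. His research has received the IEEE ICDE 2004 best paper award, the IEEE ICDE 2014 10-year most influential paper award, the ACM SIGMOD 2015 best system demonstration award, the ACM SIGMOD 2016 best paper award, and an ACM SIGMOD 2017 research highlight award. His research has been widely funded by the NSF and by other agencies and companies, and he has led research projects totaling more than 10 million US dollars. He was demonstration program chair of VLDB 2014 and SIGMOD 2018, general chair of SIGMOD 2014, technical area program chair of ICDE 2014, SIGMOD 2015, and SIGMOD 2019, and Ph.D. forum chair of VLDB 2019 and ICDE 2019. He is an editorial board member of IEEE TKDE, ACM TODS, and Springer DAPD, and he also serves on the selection committee of the annual SIGMOD Jim Gray Doctoral Dissertation Award.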