Title: A Wakeup Call: Databases in an Untrusted Universe
Amr El Abbadi
Department of Computer Science,
University of California, Santa Barbara
Once upon a time, databases were structured, one size fit all, and they resided on machines that were trustworthy; even when they failed, they simply crashed. This era has come and gone, as eloquently stated by Mike Stonebraker. We now have key-value stores, graph databases, text databases, and a myriad of unstructured data repositories. However, we, as a database community, still cling to our 20th-century belief that databases always reside on trustworthy, honest servers. This notion has been challenged and abandoned by many other Computer Science communities, most notably the security and the distributed systems communities. The rise of the cloud computing paradigm as well as the rapidly growing popularity of blockchains demand a rethinking of our naive, comfortable beliefs in an ideal benign infrastructure. In the cloud, clients store their sensitive data in remote servers owned and operated by cloud providers. The Security and Crypto Communities have made significant inroads to protect both data and access privacy from malicious untrusted storage providers using encryption and oblivious data stores. The Distributed Systems and the Systems Communities have developed consensus protocols to ensure the fault-tolerant maintenance of data residing on untrusted, malicious infrastructure. However, these solutions face significant scalability and performance challenges when incorporated in large-scale data repositories. Novel database designs need to directly address the natural tension between performance, fault-tolerance and trustworthiness. This is a perfect setting for the database community to lead and guide. In this talk, I will discuss the state of the art in terms of data management in malicious, untrusted settings, its limitations and potential approaches to mitigate these shortcomings.
As examples, I will use cloud and distributed databases that reside on untrustworthy malicious infrastructure and discuss specific approaches for standard database problems like commitment and replication. I will also explore blockchains, which can be viewed as asset management databases in untrusted infrastructures.
Amr El Abbadi is a Professor of Computer Science at the University of California, Santa Barbara. He received his B. Eng. from Alexandria University, Egypt, and his Ph.D. from Cornell University. His research interests are in the fields of fault-tolerant distributed systems and databases, focusing recently on Cloud data management and blockchain-based systems. Prof. El Abbadi is an ACM Fellow, AAAS Fellow, and IEEE Fellow. He was Chair of the Computer Science Department at UCSB from 2007 to 2011. He has served as a journal editor for several database journals, including The VLDB Journal, IEEE Transactions on Computers and The Computer Journal. He has been Program Chair for multiple database and distributed systems conferences. He currently serves on the executive committee of the IEEE Technical Committee on Data Engineering (TCDE) and was a board member of the VLDB Endowment from 2002 to 2008. In 2007, Prof. El Abbadi received the UCSB Senate Outstanding Mentorship Award for his excellence in mentoring graduate students. In 2013, his student, Sudipto Das, received the SIGMOD Jim Gray Doctoral Dissertation Award. Prof. El Abbadi is also a co-recipient of the Test of Time Award at EDBT/ICDT 2015. He has published over 300 articles in databases and distributed systems and has supervised over 35 PhD students.
Title: In-NVM DBMS – Is There A Case?
Kian-Lee Tan
Department of Computer Science
National University of Singapore (NUS)
Today’s database management systems are essentially based on a two-layered storage architecture: (a) data are stored on cheap (and high capacity but slow) persistent storage like solid state drives (NAND flash) or magnetic disks; and (b) data are loaded and processed in volatile (and fast but expensive) DRAM. More recently, the emergence of byte-addressable non-volatile memory (NVM) technologies, such as Intel/Micron’s 3D XPoint memory and phase change memory (PCM), has prompted researchers to investigate how best to exploit this technology for database systems. On one hand, NVM can be used as a form of persistent cache for disks so that “hot” data can be stored on NVM, while “cold” data remain on disks (leading to a 3-tier storage architecture). On the other hand, it is also possible to have just a single-level storage architecture by replacing DRAM with NVM; given that NVM is non-volatile, the persistent storage tier can also be removed. This talk focuses on the latter, and examines the opportunities and challenges in building an in-NVM database management system.
Kian-Lee Tan is a Professor of Computer Science at the School of Computing, National University of Singapore (NUS). He received his Ph.D. in computer science in 1994 from NUS. His current research interests include query processing and optimization in multiprocessor and distributed systems, database performance, data science, and database security. Kian-Lee has published over 300 research articles in international journals and conference proceedings, and co-authored several books/monographs. Kian-Lee was a recipient of the NUS Outstanding University Researchers Award in 1998, and the NUS Graduate School (NGS) Excellent Mentor Award in 2011. He was a co-recipient of Singapore's President Science Award in 2011. He is also a 2013 IEEE Technical Achievement Award recipient. Kian-Lee is a member of the VLDB Endowment Board (2012-2017) and PVLDB Advisory Committee (2014-2017). He is an associate editor of the ACM Transactions on Database Systems (TODS) and the WWW Journal. He has also served on the editorial boards of the Very Large Data Base (VLDB) Journal (associate editor: 2007-2009; editor-in-chief: 2009-2015) and the IEEE Transactions on Knowledge and Data Engineering (2009-2013). Kian-Lee was the Technical Program Committee co-chair for the 27th International Conference on Data Engineering (ICDE 2011), the 36th International Conference on Very Large Data Bases (VLDB 2010), the 11th International Conference on Database Systems for Advanced Applications (DASFAA 2006) and the 3rd International Conference on Mobile Data Management (MDM 2002). He has also served as a member of the Steering Committee of DASFAA (2005-2010). Kian-Lee is a member of ACM and a senior member of the IEEE.
Title: No Data Left Behind – Exploiting Unstructured Data Using Database Systems
Wolfgang Lehner
Institute of System Architecture
Technische Universität Dresden (TU Dresden)
In our data-driven culture, more and more data sources of semi-structured or unstructured nature are getting incorporated into decision workflows. However, relational database systems are still the “lingua franca” for data storage, query processing, and large-scale analytics in almost every organization, and they will probably remain so for decades to come. Tapping into the value of unstructured data in the realm of database systems remains a challenging task. In this talk, I will present our journey of building database-centric systems that are able to exploit external knowledge during query processing, with an emphasis on Web tables and spreadsheets as well as textual documents. I will introduce the problem of table extraction and layout identification, give an idea of how to solve it, and present our initiative on building a corpus consisting of more than 125M Web tables. The extracted tables can be leveraged using relational augmentation techniques integrated into a database system by introducing a novel database engine operator dealing with top-k results. For textual data, I will report on recent developments in the field of language models such as word embeddings and outline how these can be used to enrich database query capabilities and enable inductive reasoning on text values stored in database tables.
Wolfgang Lehner is a full professor and head of the Database Technology Group as well as director of the Institute of System Architecture at TU Dresden, Germany. His research focuses on database system architectures, specifically looking at crosscutting aspects from data engineering algorithms and data structures down to hardware-related aspects, mostly in main-memory centric settings. He is heading a Research Training Group on large-scale adaptive system software design and acts as a principal investigator in Germany’s national “Competence Center for Scalable Data Services and Solutions” (ScaDS). Wolfgang also maintains a close research relationship with the international SAP HANA development team. He serves the community on many program committees, is the Managing Editor of “Proceedings of the VLDB Endowment” (PVLDB), and serves on the grants committee of collaborative research centers within the German Research Foundation (DFG). He is an appointed member of the Academy of Europe.