ACM SIGMOD Record

journal article

LitStream Collection

Making Learned Query Optimization Practical

2022 ACM SIGMOD Record

Query optimization has been a challenging problem ever since the relational data model was proposed. The role of the query optimizer in a database system is to compute an execution plan for a (relational) query expression, composed of physical operators whose implementations correspond to the operations of the (relational) algebra. There are many degrees of freedom in selecting a physical plan, in particular because the laws of associativity, commutativity, and distributivity among the operators of the (relational) algebra make the order of operations a choice that must be taken into consideration. In addition, there are many alternative access paths to a dataset and a multitude of physical implementations for operations such as relational joins (e.g., merge join, nested-loop join, hash join). Thus, the space that must be searched to find the best (or even a sufficiently good) execution plan is huge.
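To give a sense of how quickly just the join-ordering part of this search space grows, the sketch below counts join trees for n relations using the standard formulas: n! left-deep trees and n! * Catalan(n - 1) = (2n - 2)! / (n - 1)! bushy trees. It assumes cross products are allowed and ignores access-path and physical-operator choices, each of which multiplies the count further; the function names are illustrative.

```python
from math import factorial


def left_deep_trees(n: int) -> int:
    """Left-deep join trees over n relations (cross products allowed): n!."""
    return factorial(n)


def bushy_trees(n: int) -> int:
    """Bushy join trees over n relations (cross products allowed):
    n! * Catalan(n - 1), which simplifies to (2n - 2)! / (n - 1)!."""
    return factorial(2 * n - 2) // factorial(n - 1)


for n in (2, 5, 10):
    print(n, left_deep_trees(n), bushy_trees(n))
# 2 2 2
# 5 120 1680
# 10 3628800 17643225600
```

Already at ten relations there are more than 17 billion bushy join trees, before any operator or access-path choice is made.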

Bao

Marcus, Ryan; Negi, Parimarjan; Mao, Hongzi; Tatbul, Nesime; Alizadeh, Mohammad; Kraska, Tim

2022 ACM SIGMOD Record

doi: 10.1145/3542700.3542703

Recent efforts applying machine learning techniques to query optimization have shown few practical gains due to substantive training overhead, inability to adapt to changes, and poor tail performance. Motivated by these difficulties, we introduce Bao (the Bandit optimizer). Bao takes advantage of the wisdom built into existing query optimizers by providing per-query optimization hints. Bao combines modern tree convolutional neural networks with Thompson sampling, a well-studied reinforcement learning algorithm. As a result, Bao automatically learns from its mistakes and adapts to changes in query workloads, data, and schema. Experimentally, we demonstrate that Bao can quickly learn strategies that improve end-to-end query execution performance, including tail latency, for several workloads containing long-running queries. In cloud environments, we show that Bao can offer both reduced costs and better performance compared with a commercial system.
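Bao's actual model is a tree convolutional network over query plans, so the following is only a heavily simplified, context-free sketch of the Thompson-sampling idea the abstract names: each arm is a hint set, the observed signal is latency (lower is better), and sampling from the posterior is approximated by bootstrap resampling of each arm's past latencies. The hint-set names, the `BootstrapThompsonSampler` class, and the simulated latencies are all hypothetical.

```python
import random
from statistics import mean


class BootstrapThompsonSampler:
    """Hypothetical context-free bandit over hint sets.

    Each arm is a hint set; lower observed latency is better. Posterior
    sampling is approximated by one bootstrap resample per arm per decision.
    """

    def __init__(self, arms, seed=0):
        self.history = {arm: [] for arm in arms}
        self.rng = random.Random(seed)

    def choose(self):
        # Try every arm once before relying on sampled estimates.
        for arm, obs in self.history.items():
            if not obs:
                return arm
        best_arm, best_estimate = None, float("inf")
        for arm, obs in self.history.items():
            resample = [self.rng.choice(obs) for _ in obs]  # bootstrap draw
            estimate = mean(resample)
            if estimate < best_estimate:
                best_arm, best_estimate = arm, estimate
        return best_arm

    def record(self, arm, latency):
        self.history[arm].append(latency)


def simulate(rounds=200, seed=1):
    # Hypothetical hint sets with made-up mean latencies (seconds).
    true_mean = {"default": 10.0, "disable_nestloop": 4.0, "disable_hashjoin": 12.0}
    noise = random.Random(seed)
    bandit = BootstrapThompsonSampler(true_mean)
    counts = {arm: 0 for arm in true_mean}
    for _ in range(rounds):
        arm = bandit.choose()
        counts[arm] += 1
        bandit.record(arm, true_mean[arm] + noise.uniform(-1.0, 1.0))
    return counts


print(simulate())
```

One caveat of this simplification: with only one observation per arm, a bootstrap draw is just that observation, so the sampler explores very little after the first round.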

Technical perspective: DFI: The Data Flow Interface for High-Speed Networks

Alonso, Gustavo

2022 ACM SIGMOD Record

doi: 10.1145/3542700.3542704

Optimizing data movement has always been one of the key ways to get a data processing system to perform efficiently. Appearing under different disguises as computers evolved over the years, the issue is today as relevant as ever. With the advent of the cloud, data movement has become the bottleneck to address in any data processing system. In the cloud, compute and storage are typically disaggregated, with a network in between. In addition, cloud systems are scale-out, i.e., performance is obtained by parallelizing across machines, which also involves network communication. And while it is possible to use machines with large amounts of memory, the pricing models and the virtualized nature of the cloud tend to favor clusters of smaller computing nodes. Nowadays, the problem of optimizing data movement has become the problem of using the network as efficiently as possible.

DFI: The Data Flow Interface for High-Speed Networks

Thostrup, Lasse; Skrzypczak, Jan; Jasny, Matthias; Ziegler, Tobias; Binnig, Carsten

2022 ACM SIGMOD Record

doi: 10.1145/3542700.3542705

In this paper, we propose the Data Flow Interface (DFI) as a way to make it easier for data processing systems to exploit high-speed networks without the need to deal with the complexity of RDMA. By lifting the level of abstraction, DFI factors out much of the complexity of network communication and makes it easier for developers to declaratively express how data should be efficiently routed to accomplish a given distributed data processing task. As we show in our experiments, DFI is able to support a wide variety of data-centric applications with high performance at a low complexity for the applications.
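DFI itself targets RDMA-capable networks; the sketch below is only an in-process analogy of the declarative routing idea, in which the application states how tuples are partitioned (key column, number of targets) and the flow takes care of batching and delivery. The `ShuffleFlow` class and its methods are hypothetical, not DFI's API, and no networking is involved.

```python
import zlib
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

Row = Tuple[Hashable, ...]


class ShuffleFlow:
    """Hypothetical in-process stand-in for a declaratively specified shuffle.

    The caller only declares the routing (key column, number of targets);
    buffering and delivery are handled by the flow.
    """

    def __init__(self, num_targets: int, key_index: int = 0, batch_size: int = 4):
        self.num_targets = num_targets
        self.key_index = key_index
        self.batch_size = batch_size
        self._buffers: Dict[int, List[Row]] = defaultdict(list)
        self.delivered: Dict[int, List[Row]] = defaultdict(list)

    def _route(self, row: Row) -> int:
        # Stable hash partitioning (Python's built-in hash() is salted per process).
        key_bytes = repr(row[self.key_index]).encode()
        return zlib.crc32(key_bytes) % self.num_targets

    def push(self, row: Row) -> None:
        target = self._route(row)
        buf = self._buffers[target]
        buf.append(row)
        if len(buf) >= self.batch_size:  # send in batches to amortize per-message cost
            self._flush(target)

    def _flush(self, target: int) -> None:
        self.delivered[target].extend(self._buffers[target])
        self._buffers[target].clear()

    def close(self) -> None:
        # Deliver any partially filled batches.
        for target in list(self._buffers):
            self._flush(target)


flow = ShuffleFlow(num_targets=3)
for i in range(20):
    flow.push((i % 5, f"payload-{i}"))
flow.close()
```

The point of the analogy is the division of labor: the application code above never mentions buffers, messages, or targets beyond the partitioning it declared.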

Technical Perspective

Kemper, Alfons

2022 ACM SIGMOD Record

doi: 10.1145/3542700.3542706

With the emergence of (geographically) distributed data management in cloud infrastructures, key-value systems were promoted as so-called NoSQL systems. To achieve maximum availability and performance, these KV stores sacrificed the "holy grail" of database consistency and relied on relaxed consistency models, such as eventual consistency.
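As one concrete illustration of the relaxed consistency mentioned above, the sketch below shows two replicas that accept writes independently, may return different values for a while, and converge after exchanging state under a last-writer-wins (LWW) rule. LWW with timestamp-plus-replica-name tie-breaking is just one common convergence policy chosen for illustration; the `Replica` class is hypothetical and not drawn from any specific system.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple

# key -> (timestamp, writer replica, value)
Entry = Tuple[int, str, str]


@dataclass
class Replica:
    name: str
    store: Dict[str, Entry] = field(default_factory=dict)

    def put(self, key: str, value: str, ts: int) -> None:
        # Local writes are accepted immediately, without coordinating with other replicas.
        self._apply(key, (ts, self.name, value))

    def get(self, key: str) -> Optional[str]:
        entry = self.store.get(key)
        return None if entry is None else entry[2]

    def _apply(self, key: str, entry: Entry) -> None:
        current = self.store.get(key)
        # Last-writer-wins: the higher timestamp wins; the replica name breaks ties.
        if current is None or entry[:2] > current[:2]:
            self.store[key] = entry

    def merge_from(self, other: "Replica") -> None:
        # Anti-entropy step: pull the other replica's state and apply LWW per key.
        for key, entry in other.store.items():
            self._apply(key, entry)


a, b = Replica("a"), Replica("b")
a.put("x", "1", ts=1)
b.put("x", "2", ts=2)
stale_a, stale_b = a.get("x"), b.get("x")  # "1" and "2": replicas temporarily disagree
a.merge_from(b)
b.merge_from(a)
```

After the two merges both replicas return "2", which is the "eventually" in eventual consistency: no read was blocked, but reads before synchronization could be stale.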

Technical Perspective of TURL

Papotti, Paolo

2022 ACM SIGMOD Record

doi: 10.1145/3542700.3542708

Several efforts aim at representing tabular data with neural models for supporting target applications at the intersection of natural language processing (NLP) and databases (DB) [1-3]. The goal is to extend recent neural architectures, which achieve state-of-the-art results in NLP applications, to structured data. Language models (LMs) are usually pre-trained with unsupervised tasks on a large text corpus. The output LM is then fine-tuned on a variety of downstream tasks with a small set of specific examples. This process has many advantages, because the LM contains information about textual structure and content that the target application can use without manually defined features.

TURL

Deng, Xiang; Sun, Huan; Lees, Alyssa; Wu, You; Yu, Cong

2022 ACM SIGMOD Record

doi: 10.1145/3542700.3542709

Relational tables on the Web store a vast amount of knowledge. Owing to the wealth of such tables, there has been tremendous progress on a variety of tasks in the area of table understanding. However, existing work generally relies on heavily-engineered task-specific features and model architectures. In this paper, we present TURL, a novel framework that introduces the pre-training/fine-tuning paradigm to relational Web tables. During pre-training, our framework learns deep contextualized representations on relational tables in a self-supervised manner. Its universal model design with pre-trained representations can be applied to a wide range of tasks with minimal task-specific fine-tuning.

Technical Perspective - No PANE, No Gain

Hogan, Aidan

2022 ACM SIGMOD Record

doi: 10.1145/3542700.3542710

The machine learning community has traditionally been proactive in developing techniques for diverse types of data, such as text, audio, images, videos, time series, and, of course, matrices, tensors, etc. "But what about graphs?" some of us graph enthusiasts may have asked ourselves, dejectedly, before transforming our beautiful graph into a brutalistic table of numbers that bore little resemblance to its parent or to the phenomena it represented, but could at least be shovelled into the machine learning frameworks of the time. Thankfully those days are coming to an end.

No PANE, No Gain

Yang, Renchi; Shi, Jieming; Xiao, Xiaokui; Yang, Yin; Bhowmick, Sourav S.; Liu, Juncheng

2022 ACM SIGMOD Record

doi: 10.1145/3542700.3542711

Given a graph G where each node is associated with a set of attributes, attributed network embedding (ANE) maps each node v ∈ G to a compact vector X_v, which can be used in downstream machine learning tasks in a variety of applications. Existing ANE solutions do not scale to massive graphs due to prohibitive computation costs or generation of low-quality embeddings. This paper proposes PANE, an effective and scalable approach to ANE computation for massive graphs in a single server that achieves state-of-the-art result quality on multiple benchmark datasets for two common prediction tasks: link prediction and node classification. Under the hood, PANE takes inspiration from well-established data management techniques to scale up ANE in a single server. Specifically, it exploits a carefully formulated problem based on a novel random walk model, a highly efficient solver, and non-trivial parallelization by utilizing modern multi-core CPUs. Extensive experiments demonstrate that PANE consistently outperforms all existing methods in terms of result quality, while being orders of magnitude faster.
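The abstract only names "a novel random walk model", so the following is a simplified sketch in that spirit, not PANE's exact formulation: it computes a truncated random-walk-with-restart affinity F between nodes and attributes, F = sum over l of alpha * (1 - alpha)^l * P^l R, where P is the row-normalized adjacency matrix and R the row-normalized attribute matrix. It leaves out the abstract's efficient solver and multi-core parallelization entirely, uses dense pure-Python lists rather than anything suitable for massive graphs, and all function names are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def row_normalize(m: Matrix) -> Matrix:
    # Each nonzero row becomes a probability distribution; empty rows stay zero.
    out = []
    for row in m:
        s = sum(row)
        out.append([x / s if s else 0.0 for x in row])
    return out


def matmul(a: Matrix, b: Matrix) -> Matrix:
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def rwr_attribute_affinity(adj: Matrix, attrs: Matrix, alpha: float = 0.5, steps: int = 10) -> Matrix:
    """Truncated series F = sum_{l=0}^{steps} alpha * (1 - alpha)**l * P^l @ R.

    F[v][r] approximates the probability that a random walk from node v,
    stopping at each step with probability alpha, ends at some node u and
    then picks attribute r from u's normalized attribute row.
    """
    P = row_normalize(adj)    # random-walk transition matrix
    R = row_normalize(attrs)  # node -> attribute distribution
    term = R                  # P^0 @ R
    weight = alpha
    F = [[weight * x for x in row] for row in R]
    for _ in range(steps):
        term = matmul(P, term)   # P^l @ R
        weight *= 1 - alpha      # alpha * (1 - alpha)^l
        F = [[f + weight * t for f, t in zip(frow, trow)] for frow, trow in zip(F, term)]
    return F


# Path graph 0 - 1 - 2; node 0 has attribute 0, node 1 has both, node 2 has attribute 1.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
attrs = [[1, 0], [1, 1], [0, 1]]
F = rwr_attribute_affinity(adj, attrs)
```

Each row of F sums to 1 - (1 - alpha)^(steps + 1) here, because the series is truncated; node 0 ends up closest to attribute 0 and node 2 to attribute 1, as the graph structure suggests. Turning such affinities into compact vectors X_v is the part this sketch does not attempt.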