Pontificia Universidad Católica de Chile Pontificia Universidad Católica de Chile
(2021)

When is approximate counting for conjunctive queries tractable?

Revista : Proceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing
Páginas : 1015–1027
Tipo de publicación : ISI Ir a publicación

Abstract

Conjunctive queries are one of the most common class of queries used in database systems, and the best studied in the literature. A seminal result of Grohe, Schwentick, and Segoufin (STOC 2001) demonstrates that for every class G of graphs, the evaluation of all conjunctive queries whose underlying graph is in G is tractable if, and only if, G has bounded treewidth. In this work, we extend this characterization to the counting problem for conjunctive queries. Specifically, for every class C of conjunctive queries with bounded treewidth, we introduce the first fully polynomial-time randomized approximation scheme (FPRAS) for counting answers to a query in C, and the first polynomial-time algorithm for sampling answers uniformly from a query in C. As a corollary, it follows that for every class G of graphs, the counting problem for conjunctive queries whose underlying graph is in G admits an FPRAS if, and only if, G has bounded treewidth (unless BPP is different from P). In fact, our FPRAS is more general, and also applies to conjunctive queries with bounded hypertree width, as well as unions of such queries.

The key ingredient in our proof is the resolution of a fundamental counting problem from automata theory. Specifically, we demonstrate the first FPRAS and polynomial time sampler for the set of trees of size n accepted by a tree automaton, which improves the prior quasi-polynomial time randomized approximation scheme (QPRAS) and sampling algorithm of Gore, Jerrum, Kannan, Sweedyk, and Mahaney ’97. We demonstrate how this algorithm can be used to obtain an FPRAS for many open problems, such as counting solutions to constraint satisfaction problems (CSP) with bounded hypertree width, counting the number of error threads in programs with nested call subroutines, and counting valid assignments to structured DNNF circuits.