1 Introduction
Communication complexity is a basic, useful model, introduced by Yao [Yao:1979], which quantifies the total number of bits of communication and rounds of communication required between two or more players to compute a function, where each player holds only part the function’s input. A detailed description of the model can be found, for example, in the book by Kushilevitz and Nisam [KushilevitzNisan1997].
Communication complexity is widely studied and has found application in many areas, including problems such as equality, membership, greaterthan, and predecessor (see the recent book by Rao and Yehudayoff [RaoYehudayoff18]). For the approximate string matching problem, the paper by Starikovskaya [Starikovskaya17] studies its deterministic oneway communication complexity, with application to streaming algorithms, and provides the first sublinearspace algorithm. Apart from these results, little work seems to have been done in general for the communication complexity of string problems [Starikovskaya18].
In this paper, we study the fundamental longest common prefix problem, denoted Lcp, where Alice and Bob each hold a string, and and want to determine the length of the longest common prefix of and , that is, the maximum , such that (where indicates the empty prefix). This problem is also called the greater than problem, since if we view both and as integers, the position immediately after their longest common prefix determines which is larger and smaller. The complexity is measured using the number of rounds required and the total amount of bits exchanged in the communication. An optimal randomized protocol for this problem uses communication and rounds [Smirnov1988, Nisan1993] where is the length of the strings. Other tradeoffs between communication and rounds are also possible [senaVenkateshb2008]. Buhrman et al. [Buhrman] describe how to compute Lcp in rounds and communication.
We show that if and are compressible we can significantly reduce the number of needed rounds while simultaneously matching the bound on the number of bits of communication. With the classic and widely used LempelZiv 77 (LZ77) compression scheme [Ziv1977] we obtain the following bound.
theoremlcp The Lcp problem has a randomized publiccoin round protocol with communication complexity, where is the length of the longest common prefix of and and is the number of phrases in the LZ77 parse of this prefix.
Compared to the optimal uncompressed bound we reduce the number of rounds from to (where typically is much smaller than ). At the same time we achieve communication complexity and thus match or improve the uncompressed bound. Note that the number of rounds is both compressed and output sensitive and the communication is output sensitive.
As far as we know, this is the first result studying the communication complexity problems in LZ77 compressed strings. A previous result by BarYossef et al. [BarYossefJKK04] gives some impossibility results on compressing the text for (approximate) string matching in the sketching model, where a sketching algorithm can be seen as a publiccoin oneway communication complexity protocol. Here we exploit the fact that the common prefixes have the same parsing into phrases up to a certain point, and that the “mismatching” phrase has a back pointer to the portion of the text represented by the previous phrases: Alice and Bob can thus identify the mismatching symbol inside that phrase without further communication (see the “techniques” paragraph).
We extend the result stated in Theorem 1 so as to compute longest common prefixes when Bob holds a set of strings , and the goal is to compute the maximal longest common prefix between and any of the strings . This problem, denoted , naturally captures the distributed scenario, where clients need to search for query strings in a text data base stored at a server. To efficiently handle many queries we want to reduce both communication and rounds for each search. If we again view the strings as integers this is the predecessor problem. We generalize Theorem 1 to this scenario.
theoremlcpk The Lcp problem has a randomized publiccoin round communication protocol with communication complexity, where is the maximal common prefix between and any one of , and is the number of phrases in the LZ77 parse of this prefix. Compared to Theorem 1 we obtain the same number of rounds and only increase the total communication by an additive term. As the total communication increases by at most a factor .
The mentioned results hold only for LZ77 parses without selfreferences (see Sec. LABEL:sub:LZ). We also show how to handle selfreferential LZ77 parses and obtain the following bounds, where we add either extra rounds or extra communication.
theoremlcpself The Lcp problem has an randomized publiccoin protocol with

rounds and communication complexity,

rounds and communication complexity
where is the length of the longest common prefix of and , and is the number of phrases in the selfreferential LZ77 parse of this prefix. theoremlcpkself The Lcp problem has a randomized publiccoin protocol with

rounds and communication complexity,

rounds and communication complexity
where is the length of the maximal common prefix between and any one of , and is the number of phrases in the selfreferential LZ77 parse of this prefix.
Turning again to LZ77 parses without selfreferences we also show the following tradeoffs between rounds and communication.
theoremmultisearch For any constant the Lcp problem has a randomized publiccoin protocol with

rounds and total communication where is the number of phrases in the LZ77 parse of

rounds and total communication where is the number of phrases in the LZ77 parse of the longest common prefix between and
Using the standard transformation technique by Newman [Newman1991] all of the above results can be converted into privatecoin results for bounded length strings: If the sum of the lengths of the strings is , then, Newman’s construction adds an term in communication complexity, and only gives rise to additional round.
1.0.1 Techniques.
Our results rely on the following key idea. First, we want to perform a binary search over the LZ77parses of the strings, to find the first phrase where Alice and Bob disagree. Then, the longest common prefix must end somewhere in the next phrase (see Figure LABEL:fig:common). So Alice needs only to send the offset and length of her next phrase, and Bob can determine the longest common prefix with his string or strings (as proven in Lemma LABEL:lem:lz).
Comments
There are no comments yet.