Summary

Balance lookup filters are a Private Information Retrieval (PIR) method employed by non-full node wallet clients to obscure which Bitcoin address's balance information the client is querying a neighbor for. To date, most discussion in this area has revolved around Address Bloom filters.

"A Bloom filter is a space-efficient probabilistic data structure, conceived by Burton Howard Bloom in 1970, that is used to test whether an element is a member of a set. False positive matches are possible, but false negatives are not, thus a Bloom filter has a 100% recall rate." (Wikipedia) For additional background on the way that general bloom filters work, please see the Wikipedia article.

Since Bloom filters are a general PIR method, they could be used by a variety of non-full node wallet clients to query from a variety of information sources, but most discussion of bloom filter-based lookups has revolved specifically around SPV nodes in the Bitcoin P2P network.

Types

Address Bloom Filters

In this scheme, the SPV client creates a Bloom filter that matches some or all of the wallet's addresses. The client will send this filter to full nodes in the Bitcoin P2P network. These nodes, instead of sending all Bitcoin transaction data to the SPV client, will only send the data that matches the Bloom filter.

The SPV client can also update the filter if the wallet generates new addresses for which it needs to look up balance information for but does not bother creating an entirely new and separate Bloom filter for.

The Bitcoin P2P messages related to this communication are detailed in BIP 37, written by Mike Hearn and Matt Corallo. This style of bloom filtering is currently implemented in BitcoinJ.

Privacy Effectiveness of Address Bloom Filters

The privacy/efficiency trade-off is weaker than originally anticipated; ultimately, they offer the same privacy/efficiency trade-off that prefix filters offer.

In On the Privacy Provisions of Bloom Filters in Lightweight Bitcoin Clients, the authors wrote:

We show analytically and empirically that the reliance on Bloom filters within existing SPV clients leaks considerable information about the addresses of Bitcoin users. Our results show that an SPV client who uses a modest number of Bitcoin addresses (e.g., < 20) risks revealing almost all of his addresses. We also show that this information leakage is further exacerbated when users restart their SPV clients and/or when the adversary has access to more than one Bloom filter pertaining to the same SPV client.

Additionally, it is known that the practice of updating existing bloom filters has very low privacy, since the number of addresses that match the difference of the pre-update and post-update filter is low.

Prefix Filtering

Prefix filters are simpler but have similar privacy properties to address Bloom filters. Instead of sending a bloom filter, a client simply queries all of the results for all addresses that start with a specific prefix, such as '1abcd…'. Except for the initial bits of an address that define its type (e.g. P2SH vs. P2PKH), addresses are randomly distributed, and so the privacy set of a prefix filter should be relatively uniformly distributed for all prefix filters of that length; the size of the transaction history of particular prefix filter may be unusually long if addresses that match that prefix have been reused; this is an important practical consideration since address reuse rates in the blockchain have often been around 70%.

Prefix filtering could be generalized to match a substring of an arbitrary length and offset within an address.

It has not been implemented in any Bitcoin wallet client known yet.

Block Bloom Filters

In this scheme, Bitcoin full nodes produce bloom filters for each block added to their chain. SPV nodes subscribe to these filters and receive one per block; since bloom filters are compact, this is a small amount of data per block to receive. SPV nodes can then query their own list of addresses in their wallets to see if any match the bloom filter; if so, they can request the full contents of the transaction history in that block from the full node. The privacy set created depends on the number of unique addresses in that block, which can range from 1 (e.g. due to SPV mining) or several thousand given a 1 MB block.

SPV nodes may be able to detect attempts to falsify the contents of a block bloom filter by also downloading block header information and sanity checking between the two.

This proposal was first posted to the Bitcoin Development mailing list by Adam Back.

Links