Vebjorn Ljosa and Ambuj K. Singh:
Top-k Spatial Joins of Probabilistic Objects,”
in Proceedings of the 24th International Conference on Data Engineering (ICDE), p. 566–575, April, 2008, doi:10.1109/ICDE.2008.4497465.


Probabilistic data have recently become popular in applications such as scientific and geospatial databases. For images and other spatial datasets, probabilistic values can capture the uncertainty in extent and class of the objects in the images. Relating one such dataset to another by spatial joins is an important operation for data management systems.

We propose probabilistic spatial join (PSJ) queries, which rank the results according to a score that incorporates both the uncertainties associated with the objects and the distances between them. We present the first algorithms for two kinds of PSJ queries: Threshold PSJ queries, which return all pairs that score above a given threshold, and top-k PSJ queries, which return the k top-scoring pairs.

For threshold PSJ queries, we propose a plane sweep algorithm that, because it exploits the special structure of the problem, runs in O(n log n + k)) time, where n is the number of points and k is the number of results. We extend the algorithms to multidimensional data and to top-k PSJ queries. To further speed up top-k PSJ queries, we develop a scheduling technique that estimates the scores at the level of blocks, then hands the blocks to the plane sweep algorithm. By finding high-scoring pairs early, the scheduling allows a large portion of the datasets to be pruned. Experiments demonstrate speed-ups of two orders of magnitude.

Paper (PDF) | Slides (PDF) | Talk (MP3) | Publisher's page | Data | Software