Nearest neighbor search in high dimensional spaces is an interesting
and important problem which is relevant for a wide variety of novel
database applications. As recent results show, however, the problem
is a very difficult one, not only with regard to performance
but also with regard to quality. In this paper, we discuss the
quality issue and identify a new generalized notion of nearest
neighbor search as the relevant problem in high dimensional space. In
contrast to previous approaches, our new notion of nearest neighbor
search does not treat all dimensions equally but uses a quality
criterion to select relevant dimensions (projections) with respect to
the given query. As an example of a useful quality criterion, we
rate how well the data is clustered around the query point within the
selected projection. We then propose an efficient and effective
algorithm to solve the generalized nearest neighbor problem. Our
experiments based on a number of real and synthetic data sets show
that our new approach provides new insights into the nature of
nearest neighbor search on high dimensional data.
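
To make the generalized notion concrete, the following is a minimal sketch
and not the algorithm proposed in the paper. It assumes axis-parallel
projections only, and it uses a deliberately simple stand-in for the quality
criterion: the fraction of points whose coordinate in a dimension lies within
a fixed radius of the query's coordinate. The function names and the
parameters `k` (number of dimensions to keep) and `radius` are hypothetical
and chosen only for illustration.

```python
import math


def projection_quality(points, query, dim, radius):
    """Fraction of points lying within `radius` of the query in dimension `dim`.

    This is an assumed, simplified quality criterion: a higher value means
    the data is more tightly clustered around the query in that dimension.
    """
    close = sum(1 for p in points if abs(p[dim] - query[dim]) <= radius)
    return close / len(points)


def select_dimensions(points, query, k, radius):
    """Return the indices of the k dimensions with the highest quality."""
    ranked = sorted(
        range(len(query)),
        key=lambda d: projection_quality(points, query, d, radius),
        reverse=True,
    )
    return sorted(ranked[:k])


def projected_nearest_neighbor(points, query, k, radius):
    """Find the nearest neighbor using only the query-specific dimensions.

    Returns the index of the nearest point and the selected dimensions.
    """
    dims = select_dimensions(points, query, k, radius)

    def dist(p):
        # Euclidean distance restricted to the selected projection.
        return math.sqrt(sum((p[d] - query[d]) ** 2 for d in dims))

    best = min(range(len(points)), key=lambda i: dist(points[i]))
    return best, dims


# Toy data: dimensions 0 and 1 are clustered around the query,
# dimension 2 is spread out and acts as noise.
points = [
    (0.50, 0.52, 0.05),
    (0.52, 0.48, 0.95),
    (0.47, 0.50, 0.10),
    (0.90, 0.10, 0.50),
]
query = (0.5, 0.5, 0.5)

nearest_index, selected_dims = projected_nearest_neighbor(
    points, query, k=2, radius=0.05
)
```

In this toy data set the noisy third dimension dominates the full-space
distance, so the neighbor found in the selected projection can differ from
the one found when all dimensions are treated equally.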

