Relevance feedback is regarded as one of the most powerful techniques for improving the results of content-based image retrieval systems (CBIRSs). Much has been written about different ways to implement relevance feedback [19, 24, 26, 29, 32, 35]. Most systems calculate the relevance feedback from one query step to the next. As a result, only the preceding step, rather than the entire query session, is used to calculate the next results. In FourEyes, across-session learning is proposed [17], and in [33] it is proposed to learn over several temporal scales. Some systems, such as the image browsers PicHunter [6] and TrackingViper [22], take several steps of user interaction into account to find a target image in a database. In [13], an approach based on support vector machines is proposed that learns query concepts over several interaction steps without using seed images.
In [12], a system is presented that groups images into clusters and changes these clusters when they are marked with contradicting relevances by another user. For this tool to be effective, all images of the database should have been selected by several users and should have been marked by at least one of these users. In [11], old user judgments are used to propose new images to users based on the items they have already marked. This collaborative filtering is applied to art images in a museum. A web demonstration is accessible at http://abyss.eurecom.fr:1111/AWM/login.html. Amazon (http://www.amazon.com/) also employs this collaborative filtering technique to propose books to potential customers.
In [14], a method to store correlations between images is proposed that promises good results when all images in the database are marked at least once. Large image databases will require extremely large storage capacities to implement this technique. The data to train the system is gained from automatically created usage log files, which somewhat limits its expressive power.
The use of log files to discover knowledge is also very common in many other research areas in connection with the Internet. Log files are used to adapt web pages or web-accessible systems to the users' needs [36]. In [4], the behavior of users within web pages is analyzed to improve the page layout. Experimental results in [18] have shown strong improvements in CBIRS performance when using feature weights that are calculated with the help of usage log files. This study uses images marked together in the same relevance feedback step for the calculation of a new feature weight. Therefore, it seems logical to take a more formal approach to exploit the usage log files of a CBIRS.
The stated problem has many similarities to the market basket analysis often described in the data mining literature. Supermarkets have large files of items purchased together by a customer during the same shopping trip. One would like to know which combinations of these items occur significantly often and how association rules can be derived from these data efficiently. In [1, 2, 9, 10, 28], efficient algorithms are described to solve the task. An exhaustive evaluation of all combinations is infeasible, as there are thousands of items and several hundred of them can be purchased together during the same shopping trip. Thus, we need algorithms that efficiently filter the data and follow only promising groupings of items. A short introduction to association rules is given in Section 3.1, without going into details.
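To make the market basket analogy concrete, the following minimal Python sketch counts item pairs over a handful of invented transactions and derives pairwise rules with their support and confidence. It is an illustration only and not the method of this paper or of the algorithms in [1, 2, 9, 10, 28]: the function name `pair_rules`, the threshold `min_support`, and the toy baskets are all assumptions introduced here. The pruning step (discarding infrequent single items before counting pairs) follows the general idea of filtering the data before following promising groupings.

```python
from collections import Counter
from itertools import combinations


def pair_rules(transactions, min_support=0.5):
    """Return (antecedent, consequent, support, confidence) for frequent pairs.

    Hypothetical helper for illustration; restricted to rules between two
    single items to keep the sketch short.
    """
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]

    # Count single items and keep only those that are frequent on their own.
    item_counts = Counter(item for basket in baskets for item in basket)
    frequent_items = {i for i, c in item_counts.items() if c / n >= min_support}

    # Count pairs built only from frequent items (the pruning step).
    pair_counts = Counter()
    for basket in baskets:
        kept = sorted(basket & frequent_items)
        pair_counts.update(combinations(kept, 2))

    rules = []
    for (a, b), count in pair_counts.items():
        support = count / n  # fraction of baskets containing both items
        if support < min_support:
            continue
        # confidence(a -> b) = count(a and b) / count(a)
        rules.append((a, b, support, count / item_counts[a]))
        rules.append((b, a, support, count / item_counts[b]))
    return rules


# Invented toy data: each list is one shopping trip.
transactions = [
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["butter", "milk"],
    ["bread", "butter"],
]
rules = pair_rules(transactions, min_support=0.5)
for antecedent, consequent, support, confidence in sorted(rules):
    print(f"{antecedent} -> {consequent}: "
          f"support={support:.2f}, confidence={confidence:.2f}")
```

In the image retrieval setting, a transaction would correspond to the set of images marked together in one relevance feedback step rather than to a shopping trip.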
Section 2 describes the Viper CBIRS we used to validate our approach. It will become clear that the specific architecture of the Viper CBIRS has been a major factor for the success of this study. Section 3 introduces association rules and compares the market basket analysis problem from traditional data mining with our problem of images that are marked together. The section also explains how the data is reduced to make it usable for our purpose. Section 4 shows how the actual feature weighting is calculated and how the learned information is integrated into the image feature weighting scheme. In Section 5, we include the calculated weights in our system and compare the results of the system before and after the use of the additional probabilistic feature weighting. The last section critically discusses the experimental outcome and gives ideas for future work.

