A concept lattice is an effective tool for discovering concepts from data: its Hasse diagram represents the relationships among concepts vividly and concisely. Concept lattices have been widely used in information retrieval, digital libraries, software engineering, and knowledge discovery. Rough set theory, a relatively new mathematical tool for dealing with vagueness and uncertainty, has found applications in many areas, such as AI, KDD, pattern recognition and classification, and fault diagnosis. This paper studies several data mining methods based on concept lattices and rough set theory. The main topics are:
1) We present an incremental association rule mining algorithm based on Godin's lattice construction algorithm. The algorithm scans the database only once and generates only maximal itemsets. Experiments show that it is more efficient than Apriori. We also propose an integrated classification and association rule mining algorithm based on the concept lattice. The algorithm generates association and classification rules with a specified right-hand side from the lattice. We present several heuristic rules that speed up and simplify the rule generation process. Experiments show that the algorithm outperforms C4.5 on 8 of 10 data sets.
2) We propose a new concept approximation method on the concept lattice. Borrowing the idea of rough set theory and using the unique properties of the concept lattice, the upper and lower approximations of any object set or attribute set can be found by exploiting the meet- (union-) irreducible elements of the lattice, and the approximations can be computed on the fly. We show that our approach is more natural and effective than the existing approach. Building on this work, we also present a novel approach that computes lattice nodes directly from the data without generating the whole concept lattice: the approximations are found directly from the meet- (union-) irreducible elements, and once the approximate nodes are found, their neighbor nodes can be explored. This avoids the computational and memory burden of lattice generation and makes the concept lattice approach applicable to large databases.
3) Reduct finding, especially optimal reduct finding, is similar to the feature selection problem and is a crucial task when rough sets are applied to data mining. We propose a heuristic algorithm for finding an optimal reduct, based on the frequencies with which attributes appear in the discernibility matrix. Our method is not guaranteed to find an optimal reduct, but experiments show that in most situations it does, and it is fast. Constructing a discernibility matrix is a very expensive operation on very large databases, so we also present a sampling method for finding approximate reducts in very large databases. Experiments show that the method finds good approximate reducts in a short time.
4) In some practical cases, an information system has multivalued attributes, and classic rough set theory has difficulty dealing with such cases. We present a multivalued rough set model that deals with information systems with multivalued attributes. We generalize the equivalence relation to a general relation and propose a new definition of approximation that approximates a rough concept more closely. We prove that our definition is better than the similarity-based rough set model in that the rough entropy based on our method is monotonic. We also show how to use our model to extract rules from data.
5) A prototype data mining system is proposed. The system includes the main data mining algorithms and procedures. It is designed to meet the following criteria: quick response, scalability, a friendly user interface, extensibility, and the ability to select models and parameters automatically.
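As an illustration of the frequency-based heuristic mentioned in topic 3, the following is a minimal sketch, not the algorithm of this work. It assumes a small decision table given as a list of attribute dictionaries plus a list of decision values; the function names `discernibility_entries` and `greedy_reduct` and the toy data are hypothetical. The sketch repeatedly selects the attribute that occurs most often in the discernibility entries not yet covered. Like the method described above, it is a heuristic: the selected attributes distinguish every pair of objects with different decisions, but they are not guaranteed to form a minimal reduct.

```python
from collections import Counter
from itertools import combinations


def discernibility_entries(rows, decisions):
    """For each pair of objects with different decisions, collect the set
    of condition attributes on which the two objects differ."""
    entries = []
    for i, j in combinations(range(len(rows)), 2):
        if decisions[i] != decisions[j]:
            diff = frozenset(a for a in rows[i] if rows[i][a] != rows[j][a])
            if diff:  # skip inconsistent pairs that no attribute can separate
                entries.append(diff)
    return entries


def greedy_reduct(rows, decisions):
    """Repeatedly pick the attribute occurring in the most uncovered entries."""
    remaining = discernibility_entries(rows, decisions)
    chosen = []
    while remaining:
        counts = Counter(a for entry in remaining for a in entry)
        # Highest frequency first; ties are broken alphabetically
        # so that the result is deterministic.
        best = min(counts, key=lambda a: (-counts[a], a))
        chosen.append(best)
        remaining = [e for e in remaining if best not in e]
    return chosen


# Hypothetical toy decision table (illustration only).
rows = [
    {"outlook": "sunny", "windy": "no", "humid": "high"},
    {"outlook": "sunny", "windy": "yes", "humid": "high"},
    {"outlook": "rain", "windy": "no", "humid": "high"},
    {"outlook": "rain", "windy": "yes", "humid": "low"},
]
decisions = ["no", "no", "yes", "yes"]

print(greedy_reduct(rows, decisions))
```

In this toy table, `outlook` appears in every discernibility entry, so the sketch selects it alone.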
Keywords: data mining, rough set, concept lattice, approximation, rule, reduct
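For readers unfamiliar with the approximation keyword, the classical rough set lower and upper approximations that topics 2 and 4 build on can be sketched as follows. This shows only the standard equivalence-class definition, not the lattice-based or multivalued methods of this work; the function name `approximations` and the toy table are hypothetical.

```python
from collections import defaultdict


def approximations(rows, attributes, target):
    """Return the (lower, upper) approximations of `target`, a set of row
    indices, under the indiscernibility relation induced by `attributes`."""
    # Group objects that share the same values on the chosen attributes.
    classes = defaultdict(set)
    for idx, row in enumerate(rows):
        classes[tuple(row[a] for a in attributes)].add(idx)

    lower, upper = set(), set()
    for block in classes.values():
        if block <= target:   # class lies entirely inside the target
            lower |= block
        if block & target:    # class overlaps the target
            upper |= block
    return lower, upper


# Hypothetical toy information system (illustration only).
table = [
    {"color": "red", "size": "big"},
    {"color": "red", "size": "big"},
    {"color": "blue", "size": "big"},
    {"color": "blue", "size": "small"},
]

lower, upper = approximations(table, ["color", "size"], {0, 2})
print(sorted(lower), sorted(upper))
```

Objects 0 and 1 are indiscernible here, so object 0 belongs only to the upper approximation, while object 2 forms its own class and belongs to both.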
Data Mining Research Institute

