A two-armed bandit collective for hierarchical examplar based mining of frequent itemsets with applications to intrusion detection
Chapter, Peer reviewed
MetadataShow full item record
Original versionHaugland, V., Kjølleberg, M., Larsen, S. v.-E., & Granmo, O.-C. (2014). A two-armed bandit collective for hierarchical examplar based mining of frequent itemsets with applications to intrusion detection Transactions on Computational Collective Intelligence XIV (Vol. 8615, pp. 1-19): Springer. 10.1007/978-3-662-44509-9_1
In this paper we address the above problem by posing frequent item-set mining as a collection of interrelated two-armed bandit problems. We seek to find itemsets that frequently appear as subsets in a stream of itemsets, with the frequency being constrained to support granularity requirements. Starting from a randomly or manually selected examplar itemset, a collective of Tsetlin automata based two-armed bandit players - one automaton for each item in the examplar - learns which items should be included in the mined frequent itemset. A novel reinforcement scheme allows the bandit players to learn this in a decentralized and online manner by observing one itemset at a time. By invoking the latter procedure recursively, a progressively more fine granular summary of the itemset stream is produced, represented as a hierarchy of frequent item-sets. The proposed scheme is extensively evaluated using both artificial data as well as data from a real-world network intrusion detection application. The results are conclusive, demonstrating an excellent ability to find frequent itemsets. Also, computational complexity grows merely linearly with the cardinality of the examplar itemset. Finally, the hierarchical collections of frequent itemsets produced for network intrusion detection are compact, yet accurately describe the different types of network traffic present Over the last decades, frequent itemset mining has become a major area of research, with applications including indexing and similarity search, as well as mining of data streams, web, and software bugs. Although several efficient techniques for generating frequent itemsets with a minimum frequency have been proposed, the number of item-sets produced is in many cases too large for effective usage in real-life applications. Indeed, the problem of deriving frequent itemsets that are both compact and of high quality, remains to a large degree open.
Published version of a chapter in the book: Transactions on Computational Collective Intelligence XIV. Also available from the publisher at: http://dx.doi.org/10.1007/978-3-662-44509-9_1