Excess Entropy Based Outlier Detection In Categorical Data Set

Many outlier detection methods, based on classification, clustering, frequent patterns and statistics, have been proposed to meet the need to extract meaningful information by removing unwanted data. Among them, information-theoretic approaches offer a different perspective, although their computations rest on statistical estimates. Outlier detection in unsupervised categorical data sets is more challenging because there is no inherent distance measure between objects. We propose a novel information-theoretic framework for outlier detection in unsupervised data based on excess entropy; it uses measures such as entropy and dual correlation. Based on this model, we propose the EEB-SP outlier detection algorithm, which requires no user-defined parameter other than the input data set. We also adopt a formal definition of outliers based on weighted entropy. The algorithm detects outliers in large-scale unsupervised data sets more effectively than existing methods.
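To make the measures named above concrete, the sketch below computes the empirical Shannon entropy of categorical attributes and the dual total correlation of a data set. It assumes that the "dual correlation" in the abstract refers to the dual total correlation, which information theory also calls excess entropy, defined as $D = H(X_1,\dots,X_m) - \sum_i H(X_i \mid X_{-i})$. The function `leave_one_out_scores` is a hypothetical illustration of entropy-based outlier scoring (how much the summed attribute entropy drops when one object is removed). It is not the EEB-SP algorithm and does not implement the weighted-entropy outlier definition, neither of which is specified here.

```python
from collections import Counter
from math import log2
from typing import Hashable, List, Sequence, Tuple

Row = Tuple[str, ...]


def entropy(values: Sequence[Hashable]) -> float:
    """Empirical Shannon entropy (in bits) of a sequence of categorical symbols."""
    n = len(values)
    if n == 0:
        return 0.0
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def joint_entropy(rows: Sequence[Row], cols: Sequence[int]) -> float:
    """Entropy of the joint distribution of the selected attribute columns."""
    return entropy([tuple(r[j] for j in cols) for r in rows])


def dual_total_correlation(rows: Sequence[Row]) -> float:
    """Dual total correlation (excess entropy) of all attributes.

    Uses H(X_i | X_{-i}) = H(X_1..X_m) - H(X_{-i}), where X_{-i}
    denotes every attribute except X_i.
    """
    m = len(rows[0])
    all_cols = list(range(m))
    h_all = joint_entropy(rows, all_cols)
    conditional_sum = sum(
        h_all - joint_entropy(rows, [j for j in all_cols if j != i])
        for i in all_cols
    )
    return h_all - conditional_sum


def leave_one_out_scores(rows: List[Row]) -> List[float]:
    """Hypothetical outlier score (not EEB-SP): drop in the summed
    per-attribute entropy when row k is removed. Larger means more outlying."""
    m = len(rows[0])
    base = sum(entropy([r[j] for r in rows]) for j in range(m))
    scores = []
    for k in range(len(rows)):
        rest = rows[:k] + rows[k + 1:]
        scores.append(base - sum(entropy([r[j] for r in rest]) for j in range(m)))
    return scores


if __name__ == "__main__":
    # Toy categorical data: four identical objects and one unusual object.
    data = [("a", "x")] * 4 + [("b", "y")]
    print("dual total correlation:", dual_total_correlation(data))
    scores = leave_one_out_scores(data)
    print("most outlying row:", max(range(len(scores)), key=scores.__getitem__))
```

Removing the unusual object makes both attribute columns constant, so the summed entropy falls the most for that row, while removing a typical object slightly increases it. A full method would replace this greedy score with the paper's own outlier definition.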