Analysis of Modified Inverse Document Frequency Variants for Word Sense Disambiguation
Word Sense Disambiguation (WSD) is a method for finding the correct sense of an ambiguous word among its existing
senses by calculating the similarity between ambiguous words in an Information Retrieval system. Among the many WSD
approaches, K-Nearest Neighbour (KNN) is used because it is extremely simple and effective in text classification.
KNN relies on the cosine similarity method, in which the term frequency and inverse document frequency (TF-IDF)
scheme is used to calculate the weight of each word. A challenge is that the original TF-IDF scheme eliminates
related senses in the WSD process even when a related sense exists. This paper therefore proposes three modified
TF-MIDF methods that address this no-relevant problem by modifying the IDF equation, and analyses the modified IDF
(MIDF) methods to ascertain which of them can improve the performance of the WSD method.
Keywords— WSD, KNN Classifier, Cosine Similarity, Modified Inverse Document Frequency (MIDF), WordNet.