Design and Develop Semantic Textual Document Clustering Model
SK Ahammad Fahad, Wael Mohamed Shaher Yafooz

The utilization of textual documents is spontaneously increasing over the internet, email, web pages, reports, journals, articles and they stored in the electronic database format. It is challenging to find and access these documents without proper classification mechanisms. To overcome such difficulties we proposed a semantic document clustering model and develop this model. The document pre-processing steps, semantic information from WordNet help us to be bioavailable the semantic relation from raw text. By reminding the limitation of traditional clustering algorithms on the natural language, we consider semantic clustering by COBWEB conceptual clustering. Clustering quality and high accuracy were one of the most important aims of our research, and we chose F-Measure evaluation for ensuring the purity of clustering. However, there still exist many challenges, like the word, high spatial property, extracting core linguistics from texts, and assignment adequate description for the generated clusters. By the help of Word Net database, we eliminate those issues. In this research paper, there have a proposed framework and describe our development evaluation with evaluation.

Full Text: PDF     DOI: 10.15640/jcsit.v5n2a4