Abstract:
The World Wide Web (WWW) has become essential in nearly every part of the world. IT and many other industries depend on it, and it holds a large amount of data. The number of websites keeps growing, each containing large amounts of data that are in high demand. Web pages are a useful source for retrieving required data from the Internet, but a page sometimes also contains irrelevant data. This article aims to retrieve the relevant data by segmenting web pages and then removing noise from the segmented pages using the K-means clustering algorithm.

Keywords: Vision-Based Web Page Content Structure Analysis, Correlation Clustering, Clustering, K-Means Algorithm