A Review on Noise Removal from Web Pages for Web Content Mining

Data Mining

Yogita Patel

Abstract:

Today, human life has become dependent on the Internet. Almost anything on a web page can be used to discover useful knowledge or information. However, a web page typically contains a large amount of information that is not part of its main content, e.g. banner ads, navigation bars, copyright and privacy notices, and advertisements unrelated to the main content. Such noisy data degrades the performance of web content mining. The main objective of this research area is to remove such irrelevant information from web pages. The purpose of this paper is to review and discuss the research work that has been done in this area and to identify the issues in this area.

