File-oriented, unstructured data is collected and transformed before being loaded into the data warehouse. Two or more records that are identified separately may actually represent the same real-world entity; detecting and preventing such duplicates improves data quality. The proposed technique forms smart tokens from the most representative attributes. Sorting on those tokens brings identical records into a close neighborhood, where duplicate records are identified and removed from the data. Clean, consistent, and non-duplicated data is then loaded into the warehouse. The technique is a milestone for data cleaning: with the explosive growth in recorded data, data managers need more accurate data for effective decision making.
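The token-and-sort idea described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the token rule (lowercase, drop punctuation, sort the words, keep up to three leading characters of each), the function names `smart_token`, `record_key`, and `remove_duplicates`, and the sample attributes `name` and `city` are all assumptions made for illustration. Only exact key matches are treated as duplicates here.

```python
import re


def smart_token(value):
    """Hypothetical token rule (an assumption, not taken from the source):
    lowercase the value, drop punctuation, sort the words, and keep up to
    three leading characters of each word."""
    words = re.findall(r"[a-z0-9]+", value.lower())
    return "".join(word[:3] for word in sorted(words))


def record_key(record, attributes):
    """Join the tokens of the chosen representative attributes into one sort key."""
    return "|".join(smart_token(str(record.get(attr, ""))) for attr in attributes)


def remove_duplicates(records, attributes):
    """Sort records by token key so matching records become neighbors,
    then keep only the first record of each run of equal keys."""
    keyed = sorted(
        ((record_key(rec, attributes), rec) for rec in records),
        key=lambda pair: pair[0],  # sort on the key only; the sort is stable
    )
    cleaned = []
    previous_key = None
    for key, rec in keyed:
        if key != previous_key:  # a new key starts a new group of records
            cleaned.append(rec)
        previous_key = key
    return cleaned


if __name__ == "__main__":
    raw = [
        {"name": "John A. Smith", "city": "New York"},
        {"name": "Smith, John A", "city": "new york"},
        {"name": "Jane Doe", "city": "Boston"},
    ]
    for row in remove_duplicates(raw, ["name", "city"]):
        print(row)
```

In this sketch the two "John Smith" records reduce to the same key and only the first is kept. A fuller version might compare neighboring keys by similarity within a small window rather than by strict equality; that extension is likewise an assumption and is not described in the source.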