De-duplication is the task of identifying one or more records in a repository that represent the same object or entity. The problem is that the same data may be represented differently in each database ...