SimCad is a standalone clone detection tool that evolved from the research outlined here. It is based on simhash, a similarity-preserving hashing technique that has proven effective in building a near-duplicate detection system for a multi-billion-page repository. Simhash has been used successfully in several research areas, such as text retrieval and web mining. SimCad is the first attempt to apply this hashing technique specifically to software code clone detection. It provides a command-line interface that takes the path to a source folder and the corresponding language as its two mandatory inputs. The detection result is written as an XML file to a location specified by the user, or to a default location preconfigured in the tool. The detection configuration can be adjusted through a number of optional settings, each of which has a default value.
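
To illustrate the general simhash idea mentioned above, here is a minimal Java sketch. It is not SimCad's actual implementation: the 64-bit fingerprint width, the whitespace tokenization, the FNV-1a token hash, and all names (SimhashSketch, hash64, simhash, hammingDistance) are assumptions made for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of simhash; not taken from SimCad or SimLib.
class SimhashSketch {

    // 64-bit FNV-1a hash of one token (assumed choice; any well-mixed 64-bit hash would do).
    static long hash64(String token) {
        long h = 0xcbf29ce484222325L;
        for (byte b : token.getBytes(StandardCharsets.UTF_8)) {
            h ^= (b & 0xff);
            h *= 0x100000001b3L;
        }
        return h;
    }

    // Each token votes +1 or -1 on every bit position, according to its hash.
    // A fingerprint bit is set when the tally for that position is positive.
    static long simhash(List<String> tokens) {
        int[] votes = new int[64];
        for (String token : tokens) {
            long h = hash64(token);
            for (int i = 0; i < 64; i++) {
                votes[i] += ((h >>> i) & 1L) == 1L ? 1 : -1;
            }
        }
        long fingerprint = 0L;
        for (int i = 0; i < 64; i++) {
            if (votes[i] > 0) {
                fingerprint |= (1L << i);
            }
        }
        return fingerprint;
    }

    // Number of bit positions in which two fingerprints differ.
    static int hammingDistance(long a, long b) {
        return Long.bitCount(a ^ b);
    }

    public static void main(String[] args) {
        // Two code fragments that differ in a single identifier.
        List<String> first = Arrays.asList("int sum = a + b ;".split("\\s+"));
        List<String> second = Arrays.asList("int total = a + b ;".split("\\s+"));
        int distance = hammingDistance(simhash(first), simhash(second));
        System.out.println("Hamming distance: " + distance);
    }
}
```

Fragments that share most of their tokens tend to produce fingerprints that differ in only a few bits, so a detector built this way could treat a small Hamming distance as a sign of a likely clone. The threshold to use is a tuning decision and is not specified here.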

Tool Download Link:

Setup Instruction: SimCad-2.2 Readme


SimLib is a clone detection API written in Java that provides the same clone detection functionality as SimCad. Unlike SimCad, however, it offers a broad range of configuration options for identifying clones in data with diverse characteristics. Most importantly, the library can be integrated into other clone research tools, so that a tool obtains the clone detection result for a subject system at runtime, directly from memory. It is built on a highly modular architecture that makes it easy to extend and to design a fully customized clone detection system, including instant clone search, for detecting clones in a variety of structured or unstructured data.

SimLib API with Dependencies:

SimCad Scripts:

SimLib External Config file: simcad.cfg.xml

SimLib Hibernate Config file: hibernate.cfg.xml

SimLib Integration: SimLib Integration.pdf


SimEclipse is an Eclipse plugin based on SimLib that provides clone detection, visualization, and management support in the Eclipse IDE.

SimEclipse feature set:

- Clone Detection
-- Detect clones in a complete project
-- Just-in-time/on-demand clone detection for a particular code fragment/file/folder

- Clone Visualization
-- Identify/highlight cloned code in the editor
-- Filter clones [based on type/source region]
-- Side-by-side/pairwise comparison of clone fragments

- Clone Marker
-- Mark the presence of clone fragments in a file in the editor, with references to sibling fragments outside the file

- Clone Tracking
-- Identification and notification of clones newly introduced by development activity

- Source History [must have previous versions of source]
-- Display code history on demand for any code fragment [function/block]
-- Step-by-step (version-by-version) trace back from the current version to older ones
-- Code history display is: a) Focused (on one or more fragments [function/block] rather than the whole file), b) Atomic (shows the complete fragment [function/block], not just the changed area, to give the whole picture), and c) End-to-end (shows the change path across multiple versions together)
-- Unified diff based presentation to trace changes back in the source across multiple versions
-- Visualization leads to a Clone History Map when multiple branches of a selected code fragment's history are explored towards older versions

- Clone Genealogy [must have previous versions of source]
-- Generate and display the genealogy of the whole project
-- Change tracking of each clone fragment by unified diff [against the fragment from the previous version]
-- Genealogy on demand for a particular clone group [selected from a clone detection result]

Plugin Download Link: SimEclipse

Setup Instruction: SimEclipse Readme

User Manual: SimEclipse User Manual