CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code




A code clone is a code portion in source files that is identical or similar to another. Clones are introduced for various reasons, such as reusing code by ‘copy-and-paste’ [4]. Clones make the source files very hard to modify consistently. For example, assume that a software system has several clone subsystems created by duplication with slight modification. When a fault is found in one subsystem, the engineer has to carefully modify all the other subsystems [7]. Various clone detection tools have been proposed and implemented [1] [2] [4].

In this paper, we have devised a clone detection algorithm and implemented a tool named CCFinder (Code clone finder). The underlying concepts for designing the tool were as follows.
(1) The tool should be industrial strength, and be applicable to a million-line size system within affordable computation time and memory usage.
(2) A clone detection system should be able to select clones, or to report only the information that helps the user examine them, since a large number of clones is expected to be found in large software systems.
(3) Renaming variables or editing pasted code after copy-and-paste produces a pair of slightly different code portions. These code portions have to be detected effectively.
(4) The language-dependent parts of the tool should be limited to a small size, and the tool has to be easily adaptable to many other languages.
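Concepts (3) and (4) can be illustrated with a minimal sketch in Python. This is not the CCFinder algorithm itself, only an illustration of the general token-based idea under stated assumptions: identifiers and numbers are replaced by placeholders so that renamed copies produce the same token sequence, and the language-dependent part is confined to a keyword set and a token pattern. The names `KEYWORDS`, `TOKEN_RE`, `normalize`, and `shared_windows`, the C-like snippets, and the window length of 8 are all hypothetical choices made for this example.

```python
import re

# Language-dependent part (assumed, for a small C-like language):
# a keyword set and a token pattern. Everything else is language-neutral.
KEYWORDS = {"if", "else", "for", "while", "return", "int", "float"}
TOKEN_RE = re.compile(r"[A-Za-z_]\w*|\d+|\S")


def normalize(source):
    """Tokenize source, replacing identifiers and numbers with placeholders."""
    tokens = []
    for tok in TOKEN_RE.findall(source):
        if tok in KEYWORDS:
            tokens.append(tok)       # keywords are kept as they are
        elif tok[0].isalpha() or tok[0] == "_":
            tokens.append("$id")     # any identifier, whatever its name
        elif tok[0].isdigit():
            tokens.append("$num")    # any numeric literal
        else:
            tokens.append(tok)       # punctuation and operators
    return tokens


def shared_windows(a, b, n=8):
    """Return the length-n normalized token windows that occur in both sources."""
    def windows(tokens):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}
    return windows(normalize(a)) & windows(normalize(b))


original = "int total = 0; for (i = 0; i < n; i++) total = total + a[i];"
renamed = "int sum = 0; for (k = 0; k < len; k++) sum = sum + values[k];"
unrelated = "return x;"

# The renamed copy normalizes to the same token sequence as the original.
print(normalize(original) == normalize(renamed))
# Number of token windows the two snippets have in common.
print(len(shared_windows(original, renamed)))
# Number of windows shared with an unrelated snippet.
print(len(shared_windows(original, unrelated)))
```

Comparing fixed-length windows through a set intersection is chosen here only for brevity. A tool aimed at million-line systems, as in concept (1), would need a more scalable matching structure, and adapting the sketch to another language would mean changing only `KEYWORDS` and `TOKEN_RE`.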

