Code cloning is serious and ubiquitous. Are you affected?
We found that the GitHub corpus we analyzed contains a mere 85 million unique files. In other words, 70% of the code on GitHub consists of clones of previously created files.
On this website you can find out how to access the code clone mapping (through a web service or by direct access to a database), how to download the clone mapping, and how to access the source code used to create it.
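To make the notion of a "clone share" concrete, here is a minimal sketch that counts exact file-level duplicates by content hash. It is an illustration only, not the pipeline used to build the mapping: the function name `clone_share`, the choice of SHA-256, and the toy inputs are all assumptions, and it treats every copy beyond the first occurrence of a file as a clone.

```python
import hashlib
from collections import Counter


def clone_share(file_contents):
    """Return (unique_count, clone_fraction) for an iterable of byte strings.

    Hypothetical helper for illustration: two files count as clones of each
    other only if their bytes are identical.
    """
    # Count how many files share each content hash.
    hashes = Counter(hashlib.sha256(c).hexdigest() for c in file_contents)
    total = sum(hashes.values())
    unique = len(hashes)
    if total == 0:
        return 0, 0.0
    # Every file beyond the first copy of a given content is a clone.
    return unique, (total - unique) / total


# Toy corpus: four files, two distinct contents.
files = [b"print('a')\n", b"print('a')\n", b"print('b')\n", b"print('a')\n"]
unique, share = clone_share(files)
print(unique, share)
```

Exact hashing misses copies that differ only in whitespace or comments, so a byte-level count like this is a lower bound on any looser definition of cloning.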
DéjàVu Web App
We provide a web service for retrieving clone information and for easy analysis of source code, projects, and datasets.
This service is a work in progress and depends on community feedback. We are happy to implement functionality you need.
Access to the Code Clone Mapping
You can directly download the data for each language individually:
If you want access to the dumps through a different process, we will do our best to suit your needs (come visit us and bring a hard drive!). Contact us, we like to talk.
Software used to create the Clone Mapping
The software used to create this mapping can be found on GitHub here and here.
We also created an artifact in the form of a VirtualBox virtual machine, which provides quick access to the pipeline through a guided tutorial, and can be found here (8.7 GB). The password is p.
This website supports a research project about code cloning on GitHub, accepted for publication at OOPSLA'17 (Distinguished Award at OOPSLA).
Read in “The Morning Paper”: https://t.co/VG1lWDVt8D
Read the paper: https://t.co/4GCauHzvmG