Co-Change Dispersion Analysis Based on Architecture

This research investigates the impact of architecture on bugs: are co-changes that span multiple architectural modules more likely to introduce bugs than co-changes confined to a single module?


The method we designed and implemented for our empirical study has three components. The first, the Co-changes Extractor, searches source code repositories and retrieves the groups of files that were changed together. The second, the Defect Extractor, parses the projects' commit logs and identifies the software changes that introduced defects into the system. The third, the Architecture Explorer, uses several reverse-engineering approaches to obtain Surrogate Views that approximate the system's architecture.
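As an illustration of what the Co-changes Extractor produces, here is a minimal sketch that groups files by commit. It is not the tool used in the study. It assumes a Git repository and a log produced with a custom `@@` marker; the function name `parse_co_changes` and the marker are hypothetical.

```python
from typing import Dict, List


def parse_co_changes(log_text: str, marker: str = "@@") -> Dict[str, List[str]]:
    """Group changed files by commit from `git log --name-only` output.

    Assumed input (an assumption, not the study's actual format):
        git log --name-only --pretty=format:"@@%H|%s"
    Each commit starts with a marker line and is followed by its file paths.
    """
    groups: Dict[str, List[str]] = {}
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate commits
        if line.startswith(marker):
            # Header line: keep only the commit hash before the first "|".
            current = line[len(marker):].split("|", 1)[0]
            groups[current] = []
        elif current is not None:
            groups[current].append(line)  # a file path changed in this commit
    # A co-change needs at least two files changed together.
    return {commit: files for commit, files in groups.items() if len(files) >= 2}
```

Commits that touch a single file are dropped here because they cannot contribute a co-change; whether the study applied the same filter is not stated.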


The results show that co-changes that cross architectural module boundaries are more strongly correlated with defects than co-changes within modules. This suggests that bug predictors should also take the system's software architecture into account to improve accuracy.


Surrogate views:

We used six open-source projects for our empirical study: Hive, OpenJPA, HBase, PDFBox, Camel and Solr. Each project has multiple subfolders, one for each version used in the study. Each version folder contains the following files:

  1. An ".odem" file generated by the Class Dependency Analyzer tool.
  2. A ".txt" file containing the dependency information, generated from the ".odem" file and used as the input to the Bunch tool.
  3. A ".bunch" file generated by Bunch, which lists the clusters in the system.
  4. Two ".rsf" files, the input and output for ACDC.
  5. The ArchDRH view.
  6. A ".prn" file that shows the LDA view.
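To show how a surrogate view can be combined with co-change groups, the sketch below reads a clustering in RSF form and counts, for each file, its intra-module and cross-module co-changed pairs. It assumes the ACDC output uses lines of the form `contain <cluster> <entity>` and that entity names match the file names in the co-change groups. The pairwise counting rule is one plausible reading of the metrics, not necessarily the definition used in the study. Both function names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, Iterable, List, Tuple


def load_rsf_clusters(rsf_text: str) -> Dict[str, str]:
    """Map each entity to its cluster from 'contain <cluster> <entity>' lines."""
    file_to_module: Dict[str, str] = {}
    for line in rsf_text.splitlines():
        parts = line.split()
        if len(parts) == 3 and parts[0] == "contain":
            _, module, entity = parts
            file_to_module[entity] = module
    return file_to_module


def count_co_changes(
    commits: Iterable[List[str]], file_to_module: Dict[str, str]
) -> Tuple[Dict[str, int], Dict[str, int]]:
    """Count intra-module and cross-module co-changed pairs per file."""
    intra: Dict[str, int] = defaultdict(int)
    cross: Dict[str, int] = defaultdict(int)
    for files in commits:
        # Ignore files that the architecture view does not cover.
        known = sorted({f for f in files if f in file_to_module})
        for a, b in combinations(known, 2):
            same = file_to_module[a] == file_to_module[b]
            target = intra if same else cross
            target[a] += 1
            target[b] += 1
    return dict(intra), dict(cross)
```

The same counting can be repeated with the Bunch, ArchDRH, or LDA view by swapping in a different file-to-module mapping.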


We used R for statistical analysis. Each row in the input file contains the name of a file in the project and the corresponding metrics for that file (Intra-module co-change, Cross-module co-change, Number-of-co-changed-files, number-of-defects and LOC). The results of the regression analysis and Spearman correlation are located in each folder as well.
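For readers who want to check a correlation without R, the following is a minimal sketch of Spearman's rank correlation, computed as the Pearson correlation of average ranks. It is not the script used in the study, where the analysis was done in R (the R equivalent is `cor(x, y, method = "spearman")`). The function names are illustrative.

```python
from math import sqrt
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sd_x * sd_y)


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))
```

A typical call would pass one metric column, such as Cross-module co-change, as `x` and number-of-defects as `y`, with one entry per file.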