journal article
LitStream Collection
Briand, Lionel; Wüst, Jürgen; Lounis, Hakim
doi: 10.1023/A:1009815306478; pmid: N/A
This paper aims at empirically exploring the relationships between most of the existing design coupling, cohesion, and inheritance measures for object-oriented (OO) systems, and the fault-proneness of OO system classes. The underlying goal of this study is to better understand the relationship between existing design measurement in OO systems and the quality of the software developed. In addition, we aim at assessing whether such relationships, once modeled, can be used to effectively drive and focus inspections or testing. The study described here is a replication of an analogous study conducted in a university environment with systems developed by students. In order to draw more general conclusions and to (dis)confirm the results obtained there, we replicated the study using data collected on an industrial system developed by professionals. Results show that many of our findings are consistent across systems, despite the very disparate nature of the systems under study. Some of the strong dimensions captured by the measures in each data set are visible in both the university and the industrial case studies. For example, the frequency of method invocations appears to be the main driving factor of fault-proneness in all systems. However, there are also differences across studies, which illustrate the fact that, although many principles and techniques can be reused, quality does not follow universal laws and quality models must be developed locally, wherever needed.
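To make the idea of using a fault-proneness model "to drive and focus inspections" concrete, here is a minimal sketch. It assumes some model has already assigned each class a predicted probability of being faulty; the abstract does not specify the model, and the class names, probabilities, and fault counts below are hypothetical. Classes are ranked by predicted probability, and we measure what fraction of the faults later found would have been covered by inspecting only the top-ranked classes.

```python
from dataclasses import dataclass


@dataclass
class ClassRecord:
    name: str          # class name (hypothetical)
    fault_prob: float  # predicted probability of being faulty, from some fitted model (assumed)
    faults: int        # faults later found in the class (hypothetical data)


def inspection_order(classes):
    """Rank classes from most to least likely to be faulty."""
    return sorted(classes, key=lambda c: c.fault_prob, reverse=True)


def faults_captured(classes, budget):
    """Fraction of all known faults located in the top `budget` ranked classes."""
    total = sum(c.faults for c in classes)
    if total == 0:
        return 0.0
    top = inspection_order(classes)[:budget]
    return sum(c.faults for c in top) / total


if __name__ == "__main__":
    classes = [
        ClassRecord("A", 0.9, 3),
        ClassRecord("B", 0.2, 0),
        ClassRecord("C", 0.7, 1),
        ClassRecord("D", 0.1, 1),
    ]
    print([c.name for c in inspection_order(classes)])  # ['A', 'C', 'B', 'D']
    print(faults_captured(classes, 2))                  # 0.8
```

With an inspection budget of two classes, this hypothetical ranking covers four of the five faults. The design choice is deliberately simple: the value of a quality model for inspection planning shows up in how much of the fault population is concentrated at the top of its ranking.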
Khoshgoftaar, Taghi; Allen, Edward
doi: 10.1023/A:1009803004576; pmid: N/A
Predicting which modules are likely to have faults during operations is important to software developers, so that software enhancement efforts can be focused on those modules that need improvement the most. Modeling software quality with classification trees is attractive because they readily model nonmonotonic relationships. In this paper, we apply the TREEDISC algorithm, which is a refinement of the CHAID algorithm, to build classification-tree models. CHAID-based algorithms differ from other classification-tree algorithms in their reliance on chi-squared tests when building the tree. Classification-tree models are vulnerable to overfitting, where the model reflects the structure of the training data set too closely. Even though a model appears to be accurate on training data, if overfitted, it may be much less accurate when applied to a current data set. To account for the severe consequences of misclassifying fault-prone modules, our measure of overfitting is based on expected costs of misclassification, rather than the total number of misclassifications. We conducted a case study of a very large telecommunications system. A two-way analysis of variance with repetitions found that TREEDISC's significance level was highly related to overfitting, and can be used to control it. Moreover, the minimum number of modules in a leaf also influenced the degree of overfitting.
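The two ingredients named above, a chi-squared test for deciding whether to split and an expected cost of misclassification, can be sketched as follows. This is an illustrative sketch under stated assumptions, not TREEDISC or CHAID: real CHAID also merges predictor categories and adjusts p-values for multiple comparisons, which is omitted here. The split rule `accept_split` (significance level plus a minimum number of modules per branch) is hypothetical, as is the cost convention assumed here (Type I: a not-fault-prone module predicted fault-prone; Type II: a fault-prone module predicted not fault-prone). Comparing expected cost on test data against training data is one plausible way to quantify overfitting; the abstract does not give the paper's exact definition.

```python
import math


def chi_squared_2x2(table):
    """Pearson chi-squared statistic and p-value (1 degree of freedom) for a 2x2 table.

    Rows are the two branches of a candidate split; columns are counts of
    not-fault-prone and fault-prone modules in each branch.
    """
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(rows)
    if min(rows) == 0 or min(cols) == 0:
        raise ValueError("every row and column total must be positive")
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (table[i][j] - expected) ** 2 / expected
    # Survival function of the chi-squared distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


def accept_split(table, alpha, min_leaf):
    """Hypothetical rule: split only if significant at `alpha` and each branch has >= min_leaf modules."""
    if min(sum(r) for r in table) < min_leaf:
        return False
    _, p_value = chi_squared_2x2(table)
    return p_value < alpha


def expected_cost(n_type1, n_type2, n_total, cost1, cost2):
    """Expected cost of misclassification per module (assumed cost convention).

    n_type1: not-fault-prone modules predicted fault-prone (Type I errors)
    n_type2: fault-prone modules predicted not fault-prone (Type II errors)
    """
    return (cost1 * n_type1 + cost2 * n_type2) / n_total


if __name__ == "__main__":
    split = [[40, 10], [10, 40]]
    print(chi_squared_2x2(split)[0])                     # 36.0
    print(accept_split(split, alpha=0.05, min_leaf=30))  # True
    print(accept_split(split, alpha=0.05, min_leaf=60))  # False
    train = expected_cost(5, 2, 100, cost1=1, cost2=10)
    test = expected_cost(20, 5, 100, cost1=1, cost2=10)
    print(test - train)  # cost gap of about 0.45 between test and training data
```

Both knobs in `accept_split` act in the direction the abstract describes: a stricter significance level or a larger minimum leaf size rejects more candidate splits, producing a smaller tree that is less able to fit noise in the training data. Weighting Type II errors more heavily (here, ten times) reflects the abstract's point that misclassifying fault-prone modules is the more severe mistake.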