Coherence of comments and method implementations: a dataset and an empirical investigation

Publisher: Springer Journals
Copyright: © 2016 by Springer Science+Business Media New York
Subject: Computer Science; Software Engineering/Programming and Operating Systems; Programming Languages, Compilers, Interpreters; Data Structures, Cryptology and Information Theory; Operating Systems
ISSN: 0963-9314
eISSN: 1573-1367
DOI: 10.1007/s11219-016-9347-1

Abstract

In this paper, we present the results of a manual assessment of the coherence between the comments and the implementations of 3636 methods in three open source software applications implemented in Java (for one of these applications, we considered two subsequent versions). The results of this assessment have been collected in a dataset that we made publicly available on the Web. The dataset was created following a protocol that is detailed in this paper. We present that protocol to let researchers evaluate the quality of our dataset and to ease its possible future extensions. Another contribution of this paper is a preliminary investigation of the effectiveness of a Vector Space Model (VSM) with the tf-idf weighting scheme in discriminating between coherent and non-coherent methods. We observed that lexical similarity alone is not sufficient for this distinction, while encouraging results were obtained by applying a Support Vector Machine (SVM) classifier to the whole vector space.
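
To make the lexical-similarity idea in the abstract concrete, the following is a minimal sketch of comparing a comment with a method body in a tf-idf vector space using cosine similarity. It is not the authors' implementation: the tokenizer (camelCase splitting, no stemming or stop-word removal), the smoothed idf formula, and the sample comments and Java snippet are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase alphabetic terms, splitting camelCase identifiers (assumed scheme)."""
    text = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", text)]


def tfidf_vectors(documents):
    """Return one sparse tf-idf vector (dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))
    # Smoothed idf; one common variant, assumed here rather than taken from the paper.
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1.0 for t, df in doc_freq.items()}
    return [
        {t: freq * idf[t] for t, freq in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical data: one Java method body and two candidate comments.
method_body = (
    "double total = 0; "
    "for (Item item : cart) total += item.getPrice(); "
    "return total;"
)
matching_comment = "Returns the total price of all items in the cart."
unrelated_comment = "Closes the database connection."

body_vec, match_vec, unrelated_vec = tfidf_vectors(
    [method_body, matching_comment, unrelated_comment]
)

score_coherent = cosine(match_vec, body_vec)
score_mismatched = cosine(unrelated_vec, body_vec)

print("matching comment vs. body: ", round(score_coherent, 3))
print("unrelated comment vs. body:", round(score_mismatched, 3))
```

In this toy case the matching comment shares a few terms with the body and the unrelated one shares none, so the scores separate cleanly. Real comments that are outdated or misleading often reuse the method's vocabulary, which is consistent with the abstract's observation that similarity alone is not sufficient and that a classifier trained on the whole vector space gave more encouraging results.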

Journal

Software Quality Journal (Springer Journals)

Published: Nov 7, 2016
