
Mining GitHub for research and education: challenges and opportunities

Publisher: Emerald Publishing
Copyright: © Emerald Publishing Limited
ISSN: 1744-0084
DOI: 10.1108/ijwis-03-2020-0016

Abstract

Purpose: This study aims to highlight the challenges and opportunities of using GitHub as a data source in both research and programming education.

Design/methodology/approach: This study provides a general overview of the challenges and opportunities encountered when conducting empirical research with GitHub as a data source. These challenges and opportunities are framed using the input–process–output model of open-source software.

Findings: GitHub data accessed through the application programming interface (API) can have several limitations, which can be overcome by Web scraping and by using external data repositories such as GHArchive and GHTorrent. GitHub also has several idiosyncrasies that researchers need to be aware of to use the data effectively, and these idiosyncrasies can themselves represent opportunities for research. The challenges and opportunities are summarized for the licenses, community, development process and product of free/libre and open-source software communities hosted on GitHub.

Originality/value: This study summarizes GitHub-related challenges and opportunities that researchers can leverage to improve their empirical research. This summary can also be a valuable resource for instructors who plan to use GitHub as a data source in data-focused programming courses.
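The findings name GHArchive as one way around the limits of the GitHub API. As an illustration only, the sketch below assumes GH Archive's published layout: hourly gzip-compressed files (named like `2015-01-01-15.json.gz`) containing one JSON-encoded event per line, each with a `type` field such as `PushEvent` or `WatchEvent`. The function name `count_event_types` is hypothetical, and the demo builds a small in-memory sample rather than downloading a real archive.

```python
"""Count GitHub event types in one GH Archive hourly dump (illustrative sketch).

Assumption: the input is gzip-compressed, with one JSON event per line and
each event carrying a "type" field, as in GH Archive's hourly files.
"""
import gzip
import io
import json
from collections import Counter
from typing import BinaryIO


def count_event_types(fileobj: BinaryIO) -> Counter:
    """Return a Counter mapping event type -> number of events.

    `fileobj` is any binary file-like object holding gzip data, for example
    open("2015-01-01-15.json.gz", "rb") on a previously downloaded file.
    """
    counts: Counter = Counter()
    # gzip.open accepts an existing file object; "rt" decodes to text lines.
    with gzip.open(fileobj, mode="rt", encoding="utf-8") as lines:
        for line in lines:
            line = line.strip()
            if not line:
                continue  # skip blank lines defensively
            event = json.loads(line)
            # Events without a "type" field are tallied under "UNKNOWN".
            counts[event.get("type", "UNKNOWN")] += 1
    return counts


if __name__ == "__main__":
    # Tiny hypothetical sample standing in for a real hourly archive.
    sample = [
        {"type": "PushEvent", "repo": {"name": "octo/demo"}},
        {"type": "WatchEvent", "repo": {"name": "octo/demo"}},
        {"type": "PushEvent", "repo": {"name": "octo/other"}},
    ]
    raw = "\n".join(json.dumps(e) for e in sample).encode("utf-8")
    buf = io.BytesIO(gzip.compress(raw))
    print(count_event_types(buf).most_common())
```

Streaming the file line by line avoids loading a whole archive into memory; to cover a longer period, one would apply the function to each hourly file and add the resulting `Counter` objects together.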

Journal

International Journal of Web Information Systems, Emerald Publishing

Published: Oct 8, 2020

Keywords: Communities on the Web; Web mining; Data mining; Data sources; Open source; Applications of Web mining and searching; Web-based education; GitHub; Web scraping; Cloud platform