3.1 RQ1: Types of unethical behavior
We crawled GitHub issues and obtained 1235 issues/PRs from 842 projects submitted by stakeholders. After reading the stakeholders’ discussions in the GitHub issues/PRs and manually filtering out invalid issues (e.g., issues that mentioned “ethic” but only involved updating terms and conditions in a document [4]), we obtained 316 issues with 23 keywords (e.g., “copy”, “plagiarism”), which are listed in the supplementary material. We then identified themes in these keywords by referring to the six principles and their corresponding guidelines. For example, keywords such as “copy” and “plagiarism” belong to the same ethical guideline (“To respect copyright”), which covers “(S1) No attribution to the author in code”, “(S2) Soft forking”, and “(S3) Plagiarism”. All three relate to giving proper credit to the authors, but we separate them into different types because they involve different degrees of copying (copying an entire repository in “Soft forking” versus copying texts in “Plagiarism”). Subsequently, we obtained 15 types of unethical behavior under 11 ethical guidelines. After generating the initial themes, both authors met to discuss the 39 cases (12%) with divergent themes and reach a consensus.
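The keyword-to-guideline grouping described above was done manually by the authors. Purely as an illustration, a first-pass automated grouping might look like the sketch below. Only the two keywords named in the text are included; the issue records, the `group_by_guideline` helper, and the substring matching rule are assumptions made for this example and are not taken from the study.

```python
from collections import defaultdict
from typing import Dict, List

# Only the two keywords quoted in the text; the remaining 21 are in the
# supplementary material and are deliberately not guessed here.
KEYWORD_TO_GUIDELINE = {
    "copy": "To respect copyright",
    "plagiarism": "To respect copyright",
}


def group_by_guideline(issues: List[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group issue ids by every guideline whose keyword appears in the text.

    Hypothetical helper: matching is a plain case-insensitive substring
    test, so "copy" also matches words such as "copyright".
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for issue in issues:
        text = issue["text"].lower()
        matched = {g for k, g in KEYWORD_TO_GUIDELINE.items() if k in text}
        for guideline in matched:
            grouped[guideline].append(issue["id"])
    return dict(grouped)


if __name__ == "__main__":
    sample = [
        {"id": "issue-1", "text": "Unauthorised copy of the repo"},
        {"id": "issue-2", "text": "Please update the README"},
    ]
    print(group_by_guideline(sample))
```

Such a pass could only propose candidate themes; the manual reading and consensus discussion described above would still be needed to assign the final types.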
Figure 2 shows the 15 types of unethical behavior in our study. Boxes on the left (e.g., “Attribution”) describe the ethical principle behind each type, whereas the grey headings of the boxes on the right (e.g., “To respect copyright”) give the 11 ethical guidelines, and the box contents present the related types of unethical behavior. Six of the 15 types have not been previously studied (i.e., S2, S6, S8, S9, S11, S12).
We explain the 11 ethical guidelines and the corresponding types of unethical behavior below:
(1) To respect copyright. There are three types of unethical behavior related to copyright, described below:
S1: No attribution to the author in code. This issue occurs when stakeholders fail to give proper credit after copying a piece of code [42]. An example for S1 is:
(S1) “it is unethical not to credit or at the least, point out that these features are inspired by...” [5]
S2: Soft forking. This issue occurs if the copied item is a repository and the copied repository has not been forked. Although GitHub encourages forking for social coding, a copied repository should acknowledge the original repository by creating an official fork [72]. An example discussion for S2 is:
(S2) “Unauthorised copy of... unethical... You must delete this repo and fork from the original...” [6]
S3: Plagiarism. Plagiarism occurs if stakeholders copy texts (non-source code) or the entire product, regardless of whether credit is given [35]. An example discussion for S3 is:
(S3) “Interactive book should be free of plagiarism. By replicating the content used by...unethical.” [7]
In this example, the repository of an interactive book is considered unethical because the book uses texts copied from several websites.
(2) To help individuals make informed consent decisions easier via licensing. There are three types of unethical behavior related to licensing, described below:
S4: License incompatibility. This issue occurs if the repository includes source code or text files carrying license types that differ from the project’s license; stakeholders must ensure license compatibility within the repository. An example for S4 is:
(S4) “To continue distributing when we know they have incompatible licenses is unethical.” [8].
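To make the S4 condition concrete, the sketch below flags files whose declared license differs from the project's license. It assumes that files declare their license with an `SPDX-License-Identifier:` comment, which is a common convention but not something the study relies on; the `find_license_mismatches` helper and the sample inputs are hypothetical. A differing license is only a signal for review, since different licenses can still be compatible.

```python
import re
from typing import Dict

# Captures a single SPDX short identifier; compound expressions such as
# "MIT OR Apache-2.0" are not parsed in this sketch (only "MIT" is read).
SPDX_PATTERN = re.compile(r"SPDX-License-Identifier:\s*([A-Za-z0-9.+-]+)")


def find_license_mismatches(files: Dict[str, str], project_license: str) -> Dict[str, str]:
    """Return {path: declared_license} for files that differ from the project.

    `files` maps a path to its text content. Files without an SPDX tag are
    skipped because nothing can be concluded about them.
    """
    mismatches: Dict[str, str] = {}
    for path, text in files.items():
        match = SPDX_PATTERN.search(text)
        if match and match.group(1) != project_license:
            mismatches[path] = match.group(1)
    return mismatches


if __name__ == "__main__":
    repo = {
        "core.py": "# SPDX-License-Identifier: MIT\n",
        "vendor/lib.py": "# SPDX-License-Identifier: GPL-3.0-only\n",
        "notes.txt": "no license tag here\n",
    }
    print(find_license_mismatches(repo, "MIT"))
```

Deciding whether a flagged pair of licenses is actually incompatible requires legal reading of both licenses and is outside what this check can do.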
S5: No license provided in public repository. This issue occurs if a public repository does not have any license and stakeholders request one. Licenses state the official permissions to use a repository, and project owners should provide them if the OSS is public, for greater transparency. An example comment for S5 is:
(S5) “The repository is public which implies an intent of being open-source but no license is specified making review of the code an issue...People get...at the end of the day, but they are funding this stuff instead of the... developers. That’s unethical but legal.” [9].
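A simple way to notice the S5 situation is to check whether the top level of a repository contains a file with a conventional license name. The sketch below is a minimal illustration under that assumption; the set of file names and the `has_license_file` helper are choices made for this example, not a rule from the study, and variants such as `LICENSE-MIT` are not recognized.

```python
from typing import Iterable

# Conventional base names for license files (an assumption of this sketch).
LICENSE_FILE_STEMS = {"license", "licence", "copying", "unlicense"}


def has_license_file(filenames: Iterable[str]) -> bool:
    """Return True if any top-level file name looks like a license file."""
    for name in filenames:
        # Compare the part before the first dot, case-insensitively,
        # so "LICENSE", "LICENSE.md" and "license.txt" all match.
        stem = name.lower().split(".", 1)[0]
        if stem in LICENSE_FILE_STEMS:
            return True
    return False


if __name__ == "__main__":
    print(has_license_file(["README.md", "LICENSE.txt"]))
    print(has_license_file(["README.md", "setup.py"]))
```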
S6: Uninformed license change. Due to transparency concerns, OSS developers should inform the stakeholders about the license change (via CHANGELOG or PR) prior to changing the license. S6 occurs if the contributors fail to do so. An example for S6 is:
(S6) “Normally license change are announced in some form of PR or announcement or discussion and none of that has taken place...I find this silent change unethical.” [10]
(3) To avoid license violation. Stakeholders must obey the OSS license agreement and avoid integrating prohibited licenses that cause violations in license dependency chains [61].
S7: Depending on proprietary software. This issue occurs if the OSS project relies on closed-source software because OSS projects should be fully open-sourced. An example comment for S7 is:
(S7) “Since ... is fully open source software, I believe depending on closed source software is unethical” [11].
(4) To respect expectations between people through goodwill. Trust is an ethical principle that refers to respecting expectations between people through goodwill. The following type of unethical behavior may lead to broken trust among stakeholders in OSS projects:
S8: Self-promotion. This issue occurs when a stakeholder advertises his or her repository by suggesting that it be incorporated into another repository without mentioning that he or she is a contributor or owner of the artifact. The goal of the stakeholder is to attract attention to a less well-known repository to increase its popularity. An example comment for S8 is:
(S8) “I strongly advise against migrating to nanocolors...Seeing him leverage his notability and following to promote and increase the adoption of nanocolors ..., which he just released a few days ago, is unethical...failing to disclose that you are promoting your own package here is a bad” [1]
A contributor of ESLint later suggested that the user disclose the fact that he was promoting his own library.
(5) To be responsible for the project maintenance. Project owners, especially those who offer paid services, should actively maintain their projects. If project owners intend to discontinue technical support, they should inform users before asking them to pay for the service.
S9: Unmaintained project with paid service. This issue occurs if the project repository is not actively maintained while it offers a paid service. It is unethical because the project owner is responsible for providing support to paying users who report bugs and for fixing those bugs within a reasonable time. An example for S9 is:
(S9) “I just bought the pro version, and now I’m having this same problem...definitely unethical.” [12]
In this example, a user who had paid for the open-source app reported a failure when using themes (a feature that is only available to paying users), but the app is no longer maintained.
(6) To avoid fraudulent activities. As a code of conduct in OSS, stakeholders should be aware of malicious activities.
S10: Vulnerable code/API. This issue occurs when stakeholders or a project are involved in malicious activities (e.g., contributing malicious code/API or leaving an unfixed vulnerability in the code). An example comment for S10 is:
(S10) “Given that iText 2.1.7 has...unfixed security vulnerability, ...continuing to release it is unethical. In my opinion, iText 2.1.7 should be replaced by OpenPDF.” [13]
In this example, the user suggested replacing iText, which has an unfixed vulnerability, with another library (OpenPDF) in which the vulnerability has been fixed. Another example for S10 is the “hypocrite commits” incident mentioned in Section 1.
(7) To be responsible for naming. Stakeholders are responsible for all software artifacts that they own, including the selected names.
S11: Naming confusion. This issue occurs when stakeholders neglect their duty to give unique names to their artifacts (e.g., packages, variables, and libraries). Project owners should check that a name is unique before using it. An example for S11 is:
(S11) “There is already a package ‘click’ for creating command-line interfaces. I am using coreapi package which ... import click package:... your library does not have a style component and python throws an error...this kind of behavior for a company... unethical” [14].
In the above example, a user complained that the developers of the click-integration-django library selected the same package name as the Click package, causing an error when using the package due to naming conflicts.
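The mechanism behind this kind of error can be reproduced in isolation. In the sketch below, two unrelated modules share one import name, and Python loads whichever appears first on `sys.path`, so code expecting the other module's `style` function finds it missing. The module name `demo_shared_name`, the directory names, and the `import_with_collision` helper are hypothetical stand-ins; the sketch does not install or inspect the actual packages from the example.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_with_collision(module_name: str = "demo_shared_name") -> str:
    """Create two same-named modules and report which one Python imports."""
    with tempfile.TemporaryDirectory() as tmp:
        first = Path(tmp) / "first_pkg"
        second = Path(tmp) / "second_pkg"
        first.mkdir()
        second.mkdir()
        # The first module lacks the `style` function the caller expects.
        (first / f"{module_name}.py").write_text("ORIGIN = 'first'\n")
        (second / f"{module_name}.py").write_text(
            "ORIGIN = 'second'\n\ndef style(text):\n    return text.upper()\n"
        )
        # Put both directories on the search path, `first` ahead of `second`.
        sys.path[:0] = [str(first), str(second)]
        try:
            importlib.invalidate_caches()
            module = importlib.import_module(module_name)
            # Only the first match is loaded; the second is unreachable
            # under this name.
            return f"{module.ORIGIN}:{hasattr(module, 'style')}"
        finally:
            # Undo the path and module-cache changes made for the demo.
            sys.path.remove(str(first))
            sys.path.remove(str(second))
            sys.modules.pop(module_name, None)


if __name__ == "__main__":
    print(import_with_collision())
```

Because the import system resolves a name to a single module, users cannot work around such a clash without renaming one of the artifacts, which is why the choice of name falls to the project owner.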
(8) To be responsible for explaining public actions. Owners of OSS projects should explain each decision made for supporting users.
S12: Closing issue/PR without explanation. This occurs when an issue/PR is closed without any explanation. It is unethical because all stakeholders are expected to receive reasonable explanations for informational fairness [47]. An example for S12 is:
(S12) “It’s a bit unfair to just close something without explaining why?...I don’t understand why this (despite several closed issues all saying the same thing) isn’t being implemented” [15].
(9) To avoid offensive language. Stakeholders should encourage a respectful environment in OSS projects by avoiding offensive language because words with offensive language might represent unethical behavior [49]. A prior study stated that hate speech (offensive words) might not be a criminal offense but can still be harmful [70].
S13: Offensive language. This occurs if the stakeholders or part of the project use offensive language. An example for S13 is:
(S13) “Rename the Scroll of Genocide to something else...It was never a good or ethical name...It is not “merely” systemic and deliberate mass-murder...but state-enacted systemic destruction, neglect and suppression of entire schools of culture, science, literature, truth, of everything that makes us human” [16].
In this example, the stakeholder thinks that using the word “Genocide” to name a scroll in the open-source game is unethical because the word promotes the intentional destruction of human beings.
(10) To allow individuals to choose which tasks to perform. Based on the “Autonomy” ethical principle, stakeholders of OSS should have the freedom to choose the tasks to perform.
S14: No opt-in or no option allowed. This occurs if the system does not provide users with options such as withdrawing from using the product (for example, when no option is available for uninstalling a third-party library). We focus on issues with “no option” or “no opt-in” because opt-in mechanisms provide stronger protections than opt-out [44]. An example comment for S14 is:
(S14) “There should be an option if someone wants to completely remove ... from the system...I think it’s unethical to not provide an easy way for a program to be uninstalled” [17].
(11) To protect the right of an individual of personal information. The privacy of stakeholders of OSS should be protected.
S15: Privacy Violation. This occurs in OSS projects under two common scenarios: (1) if the software still collects data even though the user has opted out via consent, and (2) if personal data leaks exist regardless of the options (opt-in/out). An example for S15 is:
(S15) “Form submitted even if opt-in checkbox is unchecked...Signing people up when they haven’t opted in is a major enough bug that it renders the plugin useless (or at least unethical)” [18]
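The bug described in this comment amounts to subscribing a user without checking the consent field. A minimal sketch of the intended behavior is shown below, assuming a form submission arrives as a dictionary; the field name `newsletter_opt_in`, the accepted values, and both helper functions are hypothetical and do not come from the plugin discussed in the issue.

```python
from typing import Dict, List

# Values treated as an explicit "yes" (an assumption of this sketch).
TRUTHY_VALUES = {"on", "true", "1", "yes"}


def should_subscribe(form: Dict[str, str]) -> bool:
    """Return True only when the opt-in checkbox was explicitly ticked."""
    # Browsers omit unchecked checkboxes from a submission, so a missing
    # key must be read as "no consent" rather than as a default "yes".
    return form.get("newsletter_opt_in", "").strip().lower() in TRUTHY_VALUES


def process_submission(form: Dict[str, str], subscribers: List[str]) -> None:
    """Record the email address only if consent was given."""
    if should_subscribe(form):
        subscribers.append(form["email"])


if __name__ == "__main__":
    subscribers: List[str] = []
    process_submission({"email": "a@example.org"}, subscribers)
    process_submission({"email": "b@example.org", "newsletter_opt_in": "on"}, subscribers)
    print(subscribers)
```

Treating absence of the field as refusal keeps the default on the side of not collecting data, which matches the opt-in expectation expressed in the comment.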
Table 1 presents the number of issues we found for each type of unethical behavior. The “Type” and “Issues (#)” columns represent the types of unethical behavior and the number of issues we found in GitHub, respectively. Overall, our study identifies 15 types of unethical behavior, of which the most common are related to copyright (S1, S2, and S3) and licensing (S4, S5, S6, and S7). Because our study shows that illegal copying of code (S1), copying an entire repository (S2), and copying texts (S3) are common in OSS projects, we hope to raise awareness among stakeholders of OSS projects that such behavior is considered unethical.
Authors:
(1) Hsu Myat Win, Southern University of Science and Technology, China;
(2) Haibo Wang, Southern University of Science and Technology, China;
(3) Shin Hwei Tan, a corresponding author from Southern University of Science and Technology, China.