Table of Links
- Abstract and Introduction
- Related Work
- Experiments
- Discussion
- Limitations and Future Work
- Conclusion, Acknowledgments and Disclosure of Funding, and References
A Models assessed
Monthly downloads are taken as of 23 September 2024. Download statistics are not provided for the models fine-tuned for the purposes of this paper.
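For reference, download counts of this kind can be retrieved programmatically. The snippet below is a minimal sketch using the `huggingface_hub` client; it is not part of the paper's released code, and the repository ID shown is a placeholder rather than one of the models assessed.

```python
# Illustrative sketch: fetching monthly download counts from the Hugging Face Hub.
# The repository ID below is a hypothetical example, not a model from the paper.
from huggingface_hub import model_info

repo_id = "gpt2"  # placeholder repository ID
info = model_info(repo_id)

# ModelInfo.downloads reports downloads over roughly the trailing 30 days.
print(f"{repo_id}: {info.downloads} monthly downloads")
```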
B Data & Code
The code used to conduct toxicity evaluations and fine-tune the models in this paper can be found at https://github.com/WillHawkins3/finetuningtoxicity.
The data used to fine-tune models was created by Databricks and can be accessed via Hugging Face at: https://huggingface.co/datasets/databricks/databricks-dolly-15k
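For convenience, here is a minimal sketch of loading this dataset with the Hugging Face `datasets` library; the snippet is illustrative and not taken from the paper's repository.

```python
# Load the databricks-dolly-15k instruction-tuning dataset referenced above.
from datasets import load_dataset

dolly = load_dataset("databricks/databricks-dolly-15k", split="train")

# Each record contains instruction, context, response, and category fields.
print(len(dolly))
print(dolly[0]["instruction"])
```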
NeurIPS Paper Checklist
1. Claims
Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope?
Answer: [Yes]
Justification: We claim that the toxicity rates of open language models can be influenced by fine-tuning, and show this via three experiments that demonstrate different impacts.
Guidelines:
• The answer NA means that the abstract and introduction do not include the claims made in the paper.
• The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations. A No or NA answer to this question will not be perceived well by the reviewers.
• The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings.
• It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper.
2. Limitations
Question: Does the paper discuss the limitations of the work performed by the authors?
Answer: [Yes]
Justification: See the "Limitations and Future Work" section, which describes the limitations of the project.
3. Theory Assumptions and Proofs
Question: For each theoretical result, does the paper provide the full set of assumptions and a complete (and correct) proof?
Answer: [NA]
Justification: No theoretical results provided.
4. Experimental Result Reproducibility
Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper (regardless of whether the code and data are provided or not)?
Answer: [Yes]
Justification: A description of the experiments is provided in the "Experimental Set-Up" section, and code is shared via the GitHub repository.
5. Open access to data and code
Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?
Answer: [Yes]
Justification: Code is stored at https://github.com/WillHawkins3/finetuningtoxicity
6. Experimental Setting/Details
Question: Does the paper specify all the training and test details (e.g., data splits, hyperparameters, how they were chosen, type of optimizer, etc.) necessary to understand the results?
Answer: [Yes]
Justification: Information about fine-tuning parameters and evaluation is provided in the "Experimental Set-Up" section.
7. Experiment Statistical Significance
Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?
Answer: [Yes]
Justification: We report Bayesian Estimation rather than conducting statistical significance tests, and provide a justification for this within the "Experimental Set-Up" section.
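To illustrate the general idea (this is a sketch, not the paper's exact analysis), one can place Beta posteriors over each model's toxicity rate and examine the posterior distribution of their difference, rather than computing a p-value; all counts below are placeholder numbers.

```python
# Illustrative sketch of Bayesian estimation for comparing two toxicity rates.
# All counts are placeholders; this is not the paper's exact procedure.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical counts: toxic completions out of total prompts, per model.
toxic_base, n_base = 120, 2000  # base model (placeholder)
toxic_ft, n_ft = 60, 2000       # fine-tuned model (placeholder)

# Beta(1, 1) prior + binomial likelihood -> Beta posterior over each rate.
post_base = stats.beta(1 + toxic_base, 1 + n_base - toxic_base)
post_ft = stats.beta(1 + toxic_ft, 1 + n_ft - toxic_ft)

# Monte Carlo estimate of the posterior distribution of the rate difference.
samples_base = post_base.rvs(100_000, random_state=rng)
samples_ft = post_ft.rvs(100_000, random_state=rng)
diff = samples_base - samples_ft

print(f"P(base rate > fine-tuned rate) = {(diff > 0).mean():.3f}")
print(f"95% credible interval for the difference: "
      f"({np.quantile(diff, 0.025):.4f}, {np.quantile(diff, 0.975):.4f})")
```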
8. Experiments Compute Resources
Question: For each experiment, does the paper provide sufficient information on the computer resources (type of compute workers, memory, time of execution) needed to reproduce the experiments?
Answer: [Yes]
Justification: We provide information about the compute resources used for experiments within the "Experimental Set-Up" section.
9. Code Of Ethics
Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics https://neurips.cc/public/EthicsGuidelines?
Answer: [Yes]
Justification: This work does not involve human subjects or participants, and complies with data requirements. We hope that this work will have a positive societal impact through a stronger understanding of the impacts of fine-tuning on model safety.
10. Broader Impacts
Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?
Answer: [Yes]
Justification: We discuss the impact of our findings on the open-model community, noting that users should not rely on toxicity results for non-fine-tuned models when determining the performance of a fine-tuned variant.
11. Safeguards
Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse (e.g., pretrained language models, image generators, or scraped datasets)?
Answer: [NA]
Justification: We do not believe such risks exist for this paper.
12. Licenses for existing assets
Question: Are the creators or original owners of assets (e.g., code, data, models), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?
Answer: [Yes]
Justification: Data sources and models are cited throughout the paper.
13. New Assets
Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?
Answer: [NA]
Justification: No new assets released.
14. Crowdsourcing and Research with Human Subjects
Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation (if any)?
Answer: [NA]
Justification: Paper does not involve crowdsourcing nor research with human subjects.
15. Institutional Review Board (IRB) Approvals or Equivalent for Research with Human Subjects
Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board (IRB) approvals (or an equivalent approval/review based on the requirements of your country or institution) were obtained?
Answer: [NA]
Justification: Paper does not involve crowdsourcing nor research with human subjects.
Authors:
(1) Will Hawkins, Oxford Internet Institute, University of Oxford;
(2) Brent Mittelstadt, Oxford Internet Institute, University of Oxford;
(3) Chris Russell, Oxford Internet Institute, University of Oxford.
This paper is available on arXiv under a CC 4.0 license.