DOE vs. GitHub (amended complaint) Court Filing (Redacted), June 8, 2023, is part of HackerNoon’s Legal PDF Series. This is part 23 of 38.

VII. FACTUAL ALLEGATIONS

129. Codex and Copilot have no way to determine whether license text or other Copyright Management Information (“CMI”)[21] is part of the code it appears immediately before or after. Unless instructed otherwise, they will assume that CMI that usually appears just before a given block of code is an important part of that code or otherwise necessary for it to function.

130. It is a common practice to provide the applicable license text at the top of every source file in the codebase. The purpose of this practice is to prevent the code from being divorced from its license. This may occur via “vendoring,” a method of creating a derivative work by including source files from a copyrighted project directly into another project without following the terms of the license or providing attribution or a copyright notice. Copilot circumvents this protective measure to mask the degree of vendoring it engages in.

131. Early iterations of Copilot reproduced license text. For example, in a blog post, GitHub noted “In one instance, GitHub Copilot suggested starting an empty file with something it had even seen more than a whopping 700,000 different times during training–that was the GNU General Public License.”[22] Copilot no longer suggests licenses in this way because it has been altered not to. As GitHub explains: “GitHub Copilot has changed to require a minimum file content. So some of the suggestions flagged here would not have been shown by the current version.”

132. In July 2021, near Copilot’s launch, it would sometimes produce license text, attribution, and copyright notices. This CMI was not always accurate. Copilot no longer reproduces these types of CMI, incorrect or otherwise, on a regular basis. It has been altered not to.

133. In July 2022, in response to public criticism of Copilot’s mishandling of Licensed Materials, GitHub introduced a user-settable Copilot filter called “Suggestions matching public code.” If set to “block,” this filter claims to prevent Copilot from suggesting verbatim excerpts of “about 150 characters” that come from Licensed Materials. But even assuming the filter works as advertised, because it only checks for verbatim excerpts, it does nothing to impede the Outputs from Copilot that are modifications of Licensed Materials. Thus, as a device for respecting the rights of Plaintiffs and the Class, it is essentially worthless.
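The limitation described in paragraph 133 can be illustrated with a minimal sketch. It assumes a filter that blocks a suggestion only when some 150-character slice of it appears verbatim in an indexed body of source text. The names build_index, is_blocked, LICENSED, and MODIFIED are hypothetical, and nothing here describes how GitHub's filter is actually implemented.

```python
WINDOW = 150  # assumed threshold, taken from the "about 150 characters" figure in paragraph 133


def build_index(corpus, window=WINDOW):
    """Collect every `window`-character slice of each indexed source text."""
    index = set()
    for text in corpus:
        for i in range(len(text) - window + 1):
            index.add(text[i:i + window])
    return index


def is_blocked(suggestion, index, window=WINDOW):
    """Return True if any `window`-character slice of the suggestion is indexed.

    Suggestions shorter than `window` are never blocked in this sketch.
    """
    return any(
        suggestion[i:i + window] in index
        for i in range(len(suggestion) - window + 1)
    )


# Hypothetical stand-in for a licensed source file.
LICENSED = (
    "def weighted_sum(values, weights):\n"
    "    total = 0.0\n"
    "    for value, weight in zip(values, weights):\n"
    "        total += value * weight\n"
    "    if total < 0.0:\n"
    "        total = 0.0\n"
    "    return total / max(len(values), 1)\n"
)

# Same logic with one variable renamed: a purely cosmetic change.
MODIFIED = LICENSED.replace("total", "acc")

index = build_index([LICENSED])
print(is_blocked(LICENSED, index))  # exact copy: matched by the filter
print(is_blocked(MODIFIED, index))  # renamed copy: no 150-character slice matches
```

In this sketch the renamed variable occurs often enough that every 150-character slice of the modified code contains it, so an exact-match check finds nothing to block even though the code behaves identically.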

134. In GitHub’s hands, the propensity for small cosmetic variations in Copilot’s Output is a feature, not a bug. These small cosmetic variations mean that GitHub can deliver to Copilot customers unlimited modified copies of Licensed Materials without ever triggering Copilot’s verbatim-code filter. AI models like Copilot often have a setting called temperature that specifically controls the propensity for variation in their output. On information and belief, GitHub has optimized the temperature setting of Copilot to produce small cosmetic variations of the Licensed Materials as often as possible, so that GitHub can deliver code to Copilot users that works the same way as verbatim code, while claiming that Copilot only produces verbatim code 1% of the time. Copilot is an ingenious method of software piracy.
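For readers unfamiliar with the temperature setting mentioned in paragraph 134, the following is a minimal sketch of the standard technique: the model's raw scores are divided by the temperature before being converted to probabilities. The scores and the function name softmax_with_temperature are illustrative assumptions. The sketch says nothing about how Copilot is actually configured, which the complaint alleges only on information and belief.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities, scaled by a sampling temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical scores for three candidate next tokens.
scores = [2.0, 1.0, 0.5]

low = softmax_with_temperature(scores, 0.2)   # concentrates on the top candidate
high = softmax_with_temperature(scores, 2.0)  # spreads probability across candidates

print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
```

A lower temperature makes the most likely continuation dominate, while a higher temperature gives less likely continuations more weight, which is the source of variation in sampled output.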

135. In December 2022, GitHub launched Copilot for Business. The initial terms of service included one notable extra provision compared to ordinary Copilot: a “Defense of Third Party Claims” that read:

GitHub will defend you against any claim by an unaffiliated third party that your use of GitHub Copilot misappropriated a trade secret or directly infringes a patent, copyright, trademark, or other intellectual property right of a third party, up to the greater of $500,000.00 USD or the total amount paid to GitHub for the use of GitHub Copilot during the 12 months preceding the claim. GitHub’s defense obligations do not apply if (i) the claim is based on Code that differs from a Suggestion provided by GitHub Copilot, (ii) you fail to follow reasonable software development review practices designed to prevent the intentional or inadvertent use of Code in a way that may violate the intellectual property or other rights of a third party, or (iii) you have not enabled all filtering features available in GitHub Copilot.
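The cap and the three exclusions in the quoted provision can be read as simple arithmetic and logic. The sketch below is a simplified reading of the quoted text only. The function and parameter names are hypothetical, and the dollar figures in the usage lines are made up for illustration.

```python
FLOOR_USD = 500_000.00  # the fixed amount named in the quoted provision


def defense_cap(paid_last_12_months):
    """Cap on the defense: the greater of $500,000 or fees paid in the prior 12 months."""
    return max(FLOOR_USD, paid_last_12_months)


def defense_applies(code_matches_suggestion, followed_review_practices, all_filters_enabled):
    """The obligation falls away if any of exclusions (i), (ii), or (iii) is triggered."""
    return code_matches_suggestion and followed_review_practices and all_filters_enabled


print(defense_cap(120_000.00))  # fees below the floor: the floor governs
print(defense_cap(750_000.00))  # fees above the floor: the fees govern
print(defense_applies(True, True, False))  # filters not all enabled: exclusion (iii)
```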

136. If Copilot had been designed to reproduce the attribution, license terms, and copyright notices of the Licensed Materials, this kind of contractual reassurance wouldn’t be necessary. With this provision (since removed), GitHub acknowledged that Copilot disrupts, possibly with legal consequences, the relationship between authors and users of open-source software.

About HackerNoon Legal PDF Series: We bring you the most important technical and insightful public domain court case filings.

This court case, 4:22-cv-06823-JST, retrieved on August 26, 2023, from Storage Courtlistener, is part of the public domain. The court-created documents are works of the federal government and, under copyright law, are automatically placed in the public domain and may be shared without legal restriction.