I’ve generally been against giving AI works copyright, but this article presented what I felt were compelling arguments for why I might be wrong. What do you think?

  • Even_Adder@lemmy.dbzer0.com
    9 months ago

    You should know that the statistical models don’t contain copies of their training data. During training, the data is used just to give a bump to the numbers in the model. This is all in service of getting LLMs to generate cohesive text that is original and doesn’t occur in their training sets. It’s also very hard if not impossible to get them to quote back copyrighted source material to you verbatim. If they’re going with the copying angle, this is going to be an uphill battle for them.

    • FlowVoid@midwest.social
      9 months ago

      I know the model doesn’t contain a copy of the training data, but it doesn’t matter.

      If the copyrighted data is downloaded at any point during training, that’s an IP violation, even if it is deleted immediately after being processed by the model.

      As an analogy, if you illegally download a Disney movie, watch it, write a movie review, and then delete the file … then you still violated copyright. The movie review doesn’t contain the Disney movie and your computer no longer has a copy of the Disney movie. But at one point it did, and that’s all that matters.

        • FlowVoid@midwest.social
          9 months ago

          No, it doesn’t.

          It defends web scraping (downloading copyrighted works) as legal if necessary for fair use. But fair use is not a foregone conclusion.

          In fact, there was a recent case in which a company was sued for scraping images and text from Facebook users. Its goal was to analyze them and create a database of advertising trackers, in competition with Facebook. The case settled, but not before the judge noted that the web scraping was not fair use and was very likely infringing IP.

            • FlowVoid@midwest.social
              9 months ago

              Yes, it absolutely hinges on fair use. That’s why the very first page of the lawsuit alleges:

              “Defendants’ LLMs endanger fiction writers’ ability to make a living, in that the LLMs allow anyone to generate—automatically and freely (or very cheaply)—texts that they would otherwise pay writers to create”

              If the court agrees with that claim, it will basically kill the fair use defense.

              • Even_Adder@lemmy.dbzer0.com
                9 months ago

                First of all, fair use is not as simple or clear-cut as you make it out to be, and it can’t be applied uniformly to all cases. It’s flexible and context-dependent, resting on careful analysis of four factors: the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market. No one factor is more important than the others, and it is possible to have a fair use defense even if you do not meet all the criteria of fair use.

                Generative models create new and original works based on their weights, such as poems, stories, code, essays, songs, images, video, celebrity parodies, and more. These works may have their own artistic merit and value, and may be considered transformative uses that add new expression or meaning to the original works. Letting people generate text that they would otherwise pay writers to create is likely fair use, as long as that text neither reproduces the original nor makes it redundant. Cheaply producing non-infringing text doesn’t seem like something the courts would agree to stop just because someone wants to get paid instead.

                I think you’re being too narrow and rigid with your interpretation of fair use, and I don’t think you understand the doctrine that well.

                • FlowVoid@midwest.social
                  9 months ago

                  “the purpose and character of the use, the nature of the copyrighted work, the amount and substantiality of the portion used, and the effect of the use upon the potential market.”

                  Yes, and I named three of those factors:

                  “the key questions are often whether the use of the work (a) is commercial, or (b) may substitute for the original work. Furthermore, the amount of the work copied is also considered.”

                  And while you don’t need to meet all the criteria, the odds are pretty long when you fail three of the four (commercial nature, copying complete work rather than a portion, and negative effect on the market for the original).

                  Think of it this way: if it were legal to download books in order to train an AI, then it would also be legal to download books in order to train a human student. After all, why would a human have fewer rights than an AI?

                  Do you really think courts are going to decide that it’s ok to download books from The Pirate Bay or Z-Library, provided they are being read by the next generation of writers?

                  • Even_Adder@lemmy.dbzer0.com
                    9 months ago

                    I haven’t seen anyone who has been able to reproduce complete works from an LLM. OpenAI also actively stops people from even trying to reproduce anything that resembles copyrighted material, which signals that its commercial purpose isn’t to substitute for the plaintiff’s works. Filing suit doesn’t make their claims true, so you should hold off on hasty judgments.