As AI-generated content fills the Internet, it’s corrupting the training data for models to come. What happens when AI eats itself?

  • admiralteal@kbin.social · 11 months ago

    But malicious actors don’t want their generated data to be recognizable to LLMs. They want it to impersonate real people in order to promote advertising or misinformation goals.

    Which means that even if they started flagging LLM-generated content as LLM-generated, that would just mean only the most malicious and vile LLM content will be out there training models in the future.

    I don’t see any solution to this on the horizon. Pandora is out of the box.

    • blivet@kbin.social · 11 months ago

      If the quality of AI-generated content degrades to the point where it’s useless, that is also fine with me.

    • Machinist3359@kbin.social · 11 months ago

      To flip it, this means that only AI which responsibly manages its initial data set will be successful. You can’t simply scrape and pray; you need some level of vetting of the input.

      More labor intensive? Sure, but AI companies aren’t entitled to the quick and easy solutions they started with…

      • admiralteal@kbin.social · 11 months ago

        That doesn’t follow.

        It means the AI companies that don’t behave responsibly will have a huge advantage over the ones that do.