Archived link

Opinionated article by Alexander Hanff, a computer scientist and privacy technologist who helped develop Europe’s GDPR (General Data Protection Regulation) and ePrivacy rules.

We cannot allow Big Tech to continue to ignore our fundamental human rights. Had such an approach been taken 25 years ago in relation to privacy and data protection, arguably we would not have the situation we have today, where some platforms routinely ignore their legal obligations to the detriment of society.

Legislators did not understand the impact of weak laws or weak enforcement 25 years ago, but we have enough hindsight now to ensure we don’t make the same mistakes moving forward. The time to regulate unlawful AI training is now, and we must learn from past mistakes to ensure we provide effective deterrents and consequences for such ubiquitous law-breaking in the future.

  • Bronzebeard@lemm.ee · 17 days ago

    That’s stupid. The damage is still done to the owner of the data that was used illegally. Make them destroy it.

    But when you levy minuscule fines that are less than they stand to make from it, it’s just a cost of doing business. Fines can work if they are proportionate to the value derived.

    • GenderNeutralBro@lemmy.sdf.org · 17 days ago

      I guess the idea is that the models themselves are not infringing copyright, but the training process DID. Some of the big players have admitted to using pirated material in training data. The rest obviously did even if they haven’t admitted it.

      While language models have the capacity to produce infringing output, I don’t think the models themselves are infringing (though there are probably exceptions). I mean, gzip can reproduce infringing material too with the correct input. If producing infringing work requires both the algorithm AND specific, intentional user input, then I don’t think you should put the blame solely on the algorithm.
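
      To make the gzip comparison concrete, here is a minimal sketch (using Python’s standard gzip module; the “copyrighted” text is just a stand-in): the decompressor reproduces its input byte for byte, but whether the output infringes depends entirely on what was fed in, not on the algorithm.

      ```python
      # gzip faithfully reproduces whatever it was given -- including copyrighted
      # material -- yet nobody would call the gzip algorithm itself infringing.
      import gzip

      copyrighted_text = b"Imagine this is a paragraph from a copyrighted novel."  # stand-in input

      compressed = gzip.compress(copyrighted_text)
      restored = gzip.decompress(compressed)

      assert restored == copyrighted_text  # exact reproduction, driven entirely by the supplied input
      ```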

      Either way, I don’t think existing legal frameworks are suitable to answer these questions, so I think it’s more important to think about what the law should be rather than what it currently is.

      I remember stories about the RIAA suing individuals for many thousands of dollars per mp3 they downloaded. If you applied that logic to OpenAI — maximum fine for every individual work used — it’d instantly bankrupt them. Honestly, I’d love to see it. But I don’t think any copyright holder has the balls to try that against someone who can afford lawyers. They’re just bullies.
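
      For a sense of scale, a back-of-the-envelope sketch (the $150,000 figure is the US statutory maximum per willfully infringed work; the count of works is an assumed illustration, not a documented number):

      ```python
      # RIAA-style per-work statutory damages applied to an AI training set.
      max_damages_per_work = 150_000        # USD, statutory maximum for willful infringement
      assumed_infringed_works = 1_000_000   # hypothetical count of copyrighted works in a training set

      total = max_damages_per_work * assumed_infringed_works
      print(f"${total:,}")                  # $150,000,000,000 at this assumed scale
      ```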

      • P03 Locke@lemmy.dbzer0.com · 16 days ago (edited)

        I guess the idea is that the models themselves are not infringing copyright, but the training process DID.

        I’m still not understanding the logic. Here is a copyrighted picture. I can search for it, download it, view it, see it with my own eyeballs. My browser has already downloaded the image in order to display it to me. I can take that image and edit it in a photo editor. I can do whatever I want with it on my own computer, as long as I don’t publish it elsewhere on the internet. All of that is legal. None of it infringes copyright.

        Hell, it could be argued that if I transform the image to a significant degree, I can still publish it under Fair Use. But that still gets into a gray area for each use case.

        What is not a gray area is what AI training does. They download the image and use it in training, which is like me looking at a picture in a browser. The image isn’t republished, stored in the published model, or represented in any way that could be reconstructed back to the source image in any reasonable form. It just changes a bunch of weights in an LLM. It’s mathematically impossible for a 4GB model to somehow store the many terabytes of images on the internet.
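
        Rough arithmetic behind that claim (the image count is an assumed, illustrative figure for a web-scale training set, not a measured one):

        ```python
        # How much model capacity could each training image possibly occupy?
        model_size_bytes = 4 * 1024**3            # a 4GB model file
        assumed_training_images = 5_000_000_000   # hypothetical count of images in the training set

        bytes_per_image = model_size_bytes / assumed_training_images
        print(f"{bytes_per_image:.2f} bytes of model capacity per training image")  # ~0.86 bytes, far too little to store any image
        ```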

        Where is the copyright infringement?

        I remember stories about the RIAA suing individuals for many thousands of dollars per mp3 they downloaded. If you applied that logic to OpenAI — maximum fine for every individual work used — it’d instantly bankrupt them. Honestly, I’d love to see it. But I don’t think any copyright holder has the balls to try that against someone who can afford lawyers. They’re just bullies.

        You want to use the same bullshit tactics and unreasonable math that the RIAA used in their court cases?

        • GenderNeutralBro@lemmy.sdf.org · 13 days ago

          I agree that the models themselves are clearly transformative. That doesn’t mean it’s legal for Meta to pirate everything on earth to use for training. THAT’S where the infringement is. And they admitted they used pirated material: https://www.techspot.com/news/101507-meta-admits-using-pirated-books-train-ai-but.html

          You want to use the same bullshit tactics and unreasonable math that the RIAA used in their court cases?

          I would enjoy seeing megacorps held to at least the same standards as individuals. I would prefer for those standards to be reasonable across the board, but that’s not really on the table here.

    • haverholm@kbin.earth · 17 days ago

      Yeah, the only threat to Big Tech is that they might sink a lot of money into training material they’d have to give away later. But releasing the material into the Public Domain is not exactly an improvement for the people whose data and work have been used without consent or payment.

      “Congratulations, your rights are still being violated, but now the data is free to use for everyone”.

      • Teils13@lemmy.eco.br · 13 days ago

        They would actually still benefit from public-domain’ing LLMs, because they themselves also get to use the data produced by others. Everyone takes some losses but also gets some gains under this idea, which is much better than the current model.

        • haverholm@kbin.earth · 13 days ago

          That’s like saying victims of deepfake porn benefit because they get to watch themselves having sex. Nope, not buying it.

          • Teils13@lemmy.eco.br · 10 days ago (edited)

            Well, the better analogy would be that these victims would be able to make deepfake porn of their enemies too, or any other generated video that could compromise them as well, instead of the status quo where the victim can’t generate anything while the criminal can mass-produce deepfake porn. Not really a happy world, but a better model, which was the point.