• IphtashuFitz@lemmy.world
    2 months ago

    Asking for a friend…

    What would it take to create a domain that just acts as a proxy to Reddit but serves up its own robots.txt that allows all bots?

    • Wilzax@lemmy.world
      2 months ago

      Probably a LOT of proxy IPs to act as different “users”, so you can get past the rate limits I expect they would use to enforce such a deal.

  • Boozilla@lemmy.world
    2 months ago

    I’m sure they’ve convinced the board and the shareholders that this is some kind of big win. But I don’t think it’s going to be impressive for very long.

    There’s only so much value an AI can learn from Reddit bullshit like “1. break off all contact 2. hit the gym 3. profit” and “the narwhal bacons at midnight” and endless boring pun threads.

    • Pechente@feddit.org
      2 months ago

      Short-term profit is all they care about, until this platform crashes down completely.

    • Even_Adder@lemmy.dbzer0.com
      2 months ago

      It sounds a lot like this quote from Andrej Karpathy:

      Turns out that LLMs learn a lot better and faster from educational content as well. This is partly because the average Common Crawl article (internet pages) is not of very high value and distracts the training, packing in too much irrelevant information. The average webpage on the internet is so random and terrible it’s not even clear how prior LLMs learn anything at all.

      • vxx@lemmy.world
        2 months ago

        So it will end in a downward spiral because it starts learning from AI articles, from which articles are being written, from which the AI learns, from which articles are being written …

    • SpaceCowboy@lemmy.ca
      2 months ago

      It’s happening with Amazon now.

      Wheels are in motion on anti-monopoly, but it’s a major societal shift and that takes time. Time and not electing billionaires to public office. A democracy isn’t going to build up momentum to do anything about the unchecked power of billionaires if around half the population is voting for billionaires.

  • mozz@mbin.grits.dev
    2 months ago

    “Hey, so it’s us, the guys who left all those comments. Yeah, so we decided that since we wrote them, and the American system says that means we hold the copyright, we don’t really want you selling them without (a) securing our permission first, and (b) giving us a cut of the action. We’re thinking maybe a 30% royalty. It’s not exorbitant; it probably won’t work out to much more than a few cents per user. But it’s more about the principle, you know?”

    “Anyway, what do you think?”

    WHAT DO I THINK

    I THINK IT’S ALL MINE

    DO YOU HEAR ME

    MINE

    NOW PAY ME FOR THE USE OF MY API YOU FILTHY PEASANT

    PAY ME NOW

    IT’S ALL MINE, PAY ME

    600K A YEAR IS NOT ENOUGH

    PAY ME MORE PAY ME PAY ME PAY ME

    • RmDebArc_5@sh.itjust.works
      2 months ago

      I hate to break it to you, but when you accepted the ToS you gave away everything, including your soul. Check out ToS;DR, look up Reddit, and click on “You waive your moral rights”.

  • Vaeril@sh.itjust.works
    2 months ago

    Why do I still see Reddit results on DDG? Is that just old stuff, and new stuff won’t be indexed?

    • tb_@lemmy.world
      2 months ago

      404 Media notes that Bing, DuckDuckGo, Mojeek, and Qwant are all affected, with results either showing nothing recent or not showing the full site result. Kagi, a paid search engine, is apparently still showing data, but only because it buys part of its search index from Google, which continues to have access to Reddit data through the aforementioned deal.