
  • I’d honestly go one step further and say that the problem can’t be fully solved, period.

    Voice cloning has a limited set of use cases: commercial (voice acting), malicious (impersonation), accessibility (TTS readers), and entertainment (porn, non-commercial voice acting, etc.).

    Out of all of these, only commercial use can really be regulated away, since corporations tend to be risk-averse. Accessibility use is mostly a non-issue, since it usually doesn’t matter whose voice is used as long as it’s clear and understandable. Then there’s entertainment. This one is both the most visible and arguably the least likely to disappear. Long story short, convincingly realistic voice cloning is easy - there are cutting-edge projects for it on GitHub, written by a single person, trained on a single PC, and capable of running locally on average hardware. People are going to keep using it, just like they used Photoshop to swap faces and manual audio editing software to mimic voices in the past. We’re probably better off just accepting that this usage is here to stay.

    And lastly, malicious usage - in courts, in scam calls, in defamation campaigns, etc. There’s a strong incentive for malicious actors to develop and improve these technologies. We should absolutely try to find ways to limit their usage, but it will be an eternal cat-and-mouse game. Our best bet is to minimize how much we trust voice recordings as a society and, for legal purposes, to develop some kind of cryptographic signature that would confirm whether or not a recording was taken with a certified device - these are bound to be tampered with, especially in high-profile cases, but should hopefully limit the damage somewhat. A rough sketch of that signing idea is below.
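    To be clear, this is just an illustration of the concept, not a real scheme: a minimal Python sketch using the cryptography library, assuming each recording device holds an Ed25519 private key (the key handling and names here are hypothetical). The device signs the raw audio bytes at capture time; anyone with the device’s public key can later check whether the bytes were altered. A real system would also need secure key storage, certificate chains for devices, and timestamps.

    ```python
    from cryptography.hazmat.primitives.asymmetric.ed25519 import (
        Ed25519PrivateKey,
    )
    from cryptography.exceptions import InvalidSignature

    # Hypothetical device key: in practice this would live in a secure
    # element on the recording device, not in application memory.
    device_key = Ed25519PrivateKey.generate()
    device_pub = device_key.public_key()

    def sign_recording(audio: bytes) -> bytes:
        """Sign the raw audio bytes at capture time."""
        return device_key.sign(audio)

    def verify_recording(audio: bytes, signature: bytes) -> bool:
        """Check that the audio is byte-for-byte what the device signed."""
        try:
            device_pub.verify(signature, audio)
            return True
        except InvalidSignature:
            return False

    audio = b"...raw PCM samples from the microphone..."
    sig = sign_recording(audio)
    print(verify_recording(audio, sig))                # True
    print(verify_recording(audio + b"tampered", sig))  # False
    ```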



  • GPT-3 is pretty bad at it compared to the alternatives (although it’s hard to compete with calculators in that field), but if it were just parroting its training dataset it would be far worse. From the study I’ve linked in my other comment (https://arxiv.org/pdf/2005.14165.pdf):

    On addition and subtraction, GPT-3 displays strong proficiency when the number of digits is small, achieving 100% accuracy on 2 digit addition, 98.9% at 2 digit subtraction, 80.2% at 3 digit addition, and 94.2% at 3-digit subtraction. Performance decreases as the number of digits increases, but GPT-3 still achieves 25-26% accuracy on four digit operations and 9-10% accuracy on five digit operations, suggesting at least some capacity to generalize to larger numbers of digits.

    To spot-check whether the model is simply memorizing specific arithmetic problems, we took the 3-digit arithmetic problems in our test set and searched for them in our training data in both the forms "<NUM1> + <NUM2> =" and "<NUM1> plus <NUM2>". Out of 2,000 addition problems we found only 17 matches (0.8%) and out of 2,000 subtraction problems we found only 2 matches (0.1%), suggesting that only a trivial fraction of the correct answers could have been memorized. In addition, inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table.
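    For intuition, the spot-check they describe amounts to something like the following (a toy reconstruction, not the paper’s actual code - the corpus string here is a stand-in for the real training data):

    ```python
    import random

    # Stand-in corpus; the paper searched the actual training data.
    corpus = "the sum is 523 + 118 = 641 and also 77 plus 12 makes 89"

    random.seed(0)
    problems = [(random.randint(100, 999), random.randint(100, 999))
                for _ in range(2000)]

    # Count test problems that appear verbatim in either template form.
    matches = sum(
        1 for a, b in problems
        if f"{a} + {b} =" in corpus or f"{a} plus {b}" in corpus
    )
    print(f"{matches} / {len(problems)} matches ({matches / len(problems):.1%})")
    ```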



  • That’s not entirely true.

    LLMs are trained to predict the next word given context, yes. But in order to do that, they develop an internal model that minimizes error across a wide range of contexts - and an emergent feature of this process is that the model DOES do more than purely compress the training data.
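    To make “predict the next word” concrete, here’s a toy version of the training objective in PyTorch. Everything here is illustrative: a real LLM uses a transformer over real text, but the loss has the same shape - cross-entropy between the predicted next-token distribution and the token that actually follows.

    ```python
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    vocab_size, dim = 100, 32

    # Toy "model": an embedding plus a linear head, standing in for a transformer.
    embed = nn.Embedding(vocab_size, dim)
    head = nn.Linear(dim, vocab_size)
    opt = torch.optim.Adam([*embed.parameters(), *head.parameters()], lr=1e-3)

    tokens = torch.randint(0, vocab_size, (8, 64))  # stand-in for tokenized text

    for step in range(100):
        ctx, target = tokens[:, :-1], tokens[:, 1:]  # shift by one position
        logits = head(embed(ctx))                    # scores for the next token
        loss = F.cross_entropy(logits.reshape(-1, vocab_size), target.reshape(-1))
        opt.zero_grad()
        loss.backward()
        opt.step()
    ```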

    For example, GPT-3 can solve addition and subtraction problems that didn’t appear in its training dataset. This suggests that the model learned how to perform addition and subtraction, likely because that was easier or more efficient than separately storing every example from the training data.

    This is just one easy-to-measure example, but it’s enough to suggest that LLMs can extrapolate from their training data and do more than just stitch relevant parts of the dataset together. A rough sketch of how you’d measure this is below.
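    Here’s roughly what such a measurement looks like, assuming an ask_model function as a hypothetical stand-in for whatever LLM interface you have (here it just answers correctly so the script runs end to end); the rest generates fresh problems and scores accuracy per digit count:

    ```python
    import random

    def ask_model(prompt: str) -> str:
        # Hypothetical stand-in: route this to a real LLM API or local model.
        a, b = prompt.removeprefix("What is ").removesuffix("?").split(" plus ")
        return str(int(a) + int(b))

    random.seed(0)
    for digits in (2, 3, 4, 5):
        lo, hi = 10 ** (digits - 1), 10 ** digits - 1
        trials, correct = 200, 0
        for _ in range(trials):
            a, b = random.randint(lo, hi), random.randint(lo, hi)
            answer = ask_model(f"What is {a} plus {b}?")
            correct += answer.strip() == str(a + b)
        print(f"{digits}-digit addition: {correct / trials:.1%} correct")
    ```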