@ton @Erik @tindall Copilot has been known to generate existing snippets verbatim, so it has the original code encoded in its knowledge base. Facial recognition AI training (which I do not condone, FYI) presumably does not store the raw images after training, at least if it's not running afoul of copyright law. Either that, or it does and the regulators don't give a shit, like many suspect will be the case with GitHub.
@AgreeableLandscape @Erik @tindall AFAIR the IBM facial recognition training set based on Flickr did not contain images; it contained values describing each image (facial measurements and color tones) and metadata pointing to the image. If code is getting regurgitated verbatim, the trained model might indeed be a copyright breach, and I suppose specific output might be too, depending on clip size and originality of the snippet.
@AgreeableLandscape @Erik @tindall another question I haven't seen much discussion on: if the ML model is neither in breach nor quoting entire chunks, then any output fails the originality requirement for copyright and is public domain. IANAL, but that's bound to have consequences for the originality and copyright of anyone using Copilot heavily?