Recent reports allege that NVIDIA has been taking videos from the internet, such as movie and game trailers, to train its AI products.
This information was published by 404 Media (via Game Developer).
NVIDIA Collects Game Clips from the Internet to Train AI?
“New from 404 Media: we got a massive leak from inside Nvidia (emails, Slack chats, documents) which show how it created a yet-to-be-released AI model. The leak shows that Nvidia scraped YouTube en masse, had clearance from highest levels of the company” https://t.co/iLaGPgIDSD (Joseph Cox, @josephfcox, August 5, 2024)
404 Media reports that NVIDIA is allegedly harvesting videos from the internet, such as movie and game trailers, for its AI products. As a result, clients using those products could be at risk of inadvertent copyright infringement.
Like other AI product makers, NVIDIA needs large amounts of data to train its text, video, and audio generators. According to the report, it gathers that data with “data scraping” techniques, without getting permission from the people who created the content.
404 Media highlights how big tech companies are testing the limits of copyright law when it comes to generative AI, and how other industries such as entertainment and gaming could be affected.
NVIDIA employees have expressed concerns about the company’s behavior. Despite these concerns, NVIDIA told 404 Media that its data scraping directives “are fully consistent with the letter and spirit of copyright law. (…) Fair use protects the ability to use a work for transformative purposes, such as model training.”
An employee who spoke to 404 Media claims that he and others were told to capture full-length videos that could help train NVIDIA’s AI products, specifically gameplay footage that NVIDIA engineers highly covet. Getting the footage involved collaborating with NVIDIA GeForce NOW.
In a Slack conversation, senior research scientist Jim Fan noted NVIDIA GeForce NOW’s streaming capabilities for capturing and storing video. According to Fan, all of that “high-quality gameplay video” is “very useful” data to capture.
“We will work closely with (GeForce NOW) and related engineering teams to prepare for live game data capture, improve the pipeline, and process it for training,” Fan wrote.
NVIDIA employees who raised concerns, however, were told by project managers that the data harvesting was an “executive decision” and not something they needed to worry about. “Open legal issues,” such as violating YouTube’s Terms of Service, would apparently be resolved in the future.
404 Media also included excerpts from internal documents and Slack channels from several AI researchers that show NVIDIA’s active efforts to avoid bad press. NVIDIA’s vice president of research, Ming-Yu Liu, emphasized that there would be no “negative sentiment” if the company did not publish research on data scraping.
“What we’re doing here is not going to get published,” Liu wrote. He and other employees also created their own YouTube “data scraper” and an API account to help with the process.