Apple, Nvidia, Salesforce and a few other big tech companies have been accused of training their AI models on YouTube videos from famous creators. As per a report by Wired, the tech giants used subtitle files that a non-profit had downloaded from over 170,000 videos by popular creators, including MrBeast, Marques Brownlee (MKBHD), PewDiePie, John Oliver and Jimmy Kimmel, without their consent. For those who don’t know, subtitle files are effectively transcripts of the video content. While many may see this as a violation of privacy and of YouTube’s rules, it also raises a major concern about potential copyright violation.
How Apple, Nvidia got the data
The report says an investigation by Proof News found that several tech giants used subtitles from thousands of YouTube videos to train AI. YouTube has a policy that bars anyone from harvesting material from its platform without permission. However, the big tech players reportedly sourced the data from EleutherAI, a platform that claims to help small developers and academics train AI models. It appears that the data extracted by EleutherAI has also been used by companies such as Apple and Nvidia.
A research paper by EleutherAI reveals that its dataset, called the Pile, is open and accessible to anyone with enough computing power and storage space to use it. The research paper and posts from big tech companies also show that these firms, valued at hundreds of billions or even trillions of dollars, used the Pile to train AI. Documents also shed light on Apple using EleutherAI’s Pile to train its high-profile model OpenELM, which debuted in April.
Is Apple responsible for the violation?
It is worth noting that YouTube’s terms and conditions were not broken by Apple but by EleutherAI, which sourced the data from the Google-owned video streaming platform and distributed it to numerous developers via the Pile. This is not the first instance of data being sourced illegally to train AI systems. AI chatbots can often be seen reproducing entire passages of existing text when asked about niche topics.