OpenAI’s Legal Woes Driven by Unclear Mesh of Web-Scraping Laws

by 玛丽

OpenAI Inc. faces a barrage of lawsuits that will test the legality of web-scraping practices used by the artificial intelligence industry to soak up enormous volumes of data across the internet to train popular programs like ChatGPT and DALL-E.

A wide-ranging class action filed last week against OpenAI and investor Microsoft Corp. claimed the company scraped the personal data of hundreds of millions of internet users in violation of a swath of privacy, intellectual property, and anti-hacking laws.

But the legality of using bots to grab information from public websites isn’t entirely clear. Many of the applicable laws were written well before the widespread use of the internet or the development of generative AI, and which laws take priority hasn’t been resolved.

Courts have so far generally allowed the practice. The Computer Fraud and Abuse Act, a 1986 law that prohibits hacking, has been ineffective in blocking scraping. Suing over terms-of-service violations provides few remedies, and scrapers haven’t yet faced privacy suits from those whose data was swept up, attorneys say.

“Setting aside the AI piece of this, web-scraping law was developing in a very different direction, in that it was getting easier and easier to scrape data,” said Gregory Leighton, a privacy attorney at Polsinelli LLP. But now, renewed scrutiny of how AI companies obtain their training data could be shifting those views.

“I think it’s all potentially being turned on its head,” Leighton said.

Decades-Old Tech Law

The most prominent data scraping cases have dealt with claims brought under the 1986 CFAA, which was enacted before the advent of the World Wide Web. The law provides criminal and civil penalties for accessing a computer without authorization or in excess of authorization.

How the statute applies to the modern web roiled courts until the US Supreme Court embraced a narrow reading of the CFAA in 2021. The court in Van Buren v. US concluded that a police officer who misused his authorized access to an online database wasn’t in violation of the law.

The precedent allowed the US Court of Appeals for the Ninth Circuit in 2022 to rule against Microsoft unit LinkedIn Corp. in a long-running scraping dispute with hiQ Labs Inc., a now-defunct analytics firm that was taking data from public LinkedIn profiles.

Those rulings established that the CFAA generally isn’t the right legal avenue for scraping cases, unless a bot is accessing entirely unauthorized data, attorneys say.

Megan Iorio, senior counsel at the Electronic Privacy Information Center, said the issue with most web scraping cases is that they usually involve only two parties: the website owner and the scraper.

“The judge in the district court and the Ninth Circuit had a difficult time acknowledging the harm that resulted from the scraping, and that’s in part because the users whose information was scraped were not before the court,” Iorio said of the LinkedIn case, where she was part of an amicus brief highlighting the violations of users’ privacy.

Privacy Questions

The recent lawsuit against OpenAI was brought by 16 internet users who claim their data was stolen. That will help courts recognize that website owners aren’t the only parties affected by scraping, Iorio said.

Even then, the claims are limited without a clear federal privacy statute. States are attempting to fill the gap, with 10 having passed comprehensive consumer privacy laws in recent years.

The complaint alleged violations of Illinois’ Biometric Information Privacy Act, which Iorio said is the plaintiffs’ best argument. But those claims would be limited to Illinois residents.

“The ideal situation would be to have a new federal law that gave people the right to just sue companies that scrape their personal information,” Iorio said.

While there are still outstanding questions about the privacy implications of scraping public websites, the class action included claims under the California Invasion of Privacy Act, a wiretapping statute that protects against the recording of private consumer communications.

The complaint alleged OpenAI takes private data from other products and apps that have begun to integrate ChatGPT. That includes image and location data from Snapchat, music preferences on Spotify, financial information from Stripe, and conversations on Slack and Microsoft Teams, the complaint said.

“It’s not only the major AI providers that have the burden of thinking through this, but it’s the companies that are integrating the AI products and services into their system,” said Caitlin Fennessy, vice president of the International Association of Privacy Professionals.

Where the liability falls for privacy violations when an app integrates an AI model “is still an ongoing debate,” she said.

Clashing Statutes

Although LinkedIn’s CFAA claims were unsuccessful, the company was able to advance claims that hiQ Labs violated the website’s terms of service by scraping data.

Attorneys say breach-of-contract claims now appear to be the most viable option for websites looking to prevent scraping. But that route still comes with limitations.

In a recent data-scraping battle between Google LLC and the song lyrics database Genius, the Second Circuit determined that Genius’ breach-of-contract claims against Google were preempted by the federal Copyright Act.

The US Supreme Court declined to review the case last month.

Genius alleged that Google had scraped lyrics off its website and put them at the top of search results, diverting web traffic from Genius’ website and causing millions in lost ad revenue.

The case highlights what attorneys say is an unresolved tension between copyright and contract law.

Google argued that Genius’ claim under its terms of service, which bar scraping, was actually a copyright claim in disguise, which the Copyright Act’s preemption clause prohibits. But because Genius doesn’t own the copyrights to the lyrics, it was left without a legal option to stop Google.

Some think tanks urged the Supreme Court to take the case, arguing it implicated a range of internet business models built around websites that collect and present content that they don’t own.

Leighton said he wasn’t surprised by that outcome: “The Copyright Act is going to knock out everything else more often than not for any datasets that are copyrighted or copyrightable.”

The confusing mesh of laws makes it difficult to predict how the OpenAI lawsuit and future data-scraping cases will play out.

“It will be interesting to see whether this causes new legal standards to emerge out of legislatures,” Leighton said, “or whether we’ll ultimately decide that old, existing stuff like copyright law works here.”

© 2023 Copyright bilkuj.com