ChatGPT is making up fake Guardian articles. Here’s how we’re responding

Chris Moran

Last month one of our journalists received an interesting email. A researcher had come across mention of a Guardian article, written by the journalist on a specific subject a few years earlier. But the piece was proving elusive on our website and in search. Had the headline perhaps been changed since it was published? Had it been removed intentionally from the website because of a problem we’d identified? Or had we been forced to take it down by the subject of the piece through legal means?

The reporter couldn’t remember writing the specific piece, but the headline certainly sounded like something they would have written. It was a subject they were identified with and had a record of covering. Worried that there may have been some mistake at our end, they asked colleagues to go back through our systems to track it down. Despite the detailed records we keep of all our content, and especially around deletions or legal issues, they could find no trace of its existence.

Why? Because it had never been written.

Luckily the researcher had told us that they had carried out their research using ChatGPT. In response to being asked about articles on this subject, the AI had simply made some up. Its fluency, and the vast training data it is built on, meant that the existence of the invented piece even seemed believable to the person who absolutely hadn’t written it.

Huge amounts have been written about generative AI’s tendency to manufacture facts and events. But this specific wrinkle – the invention of sources – is particularly troubling for trusted news organisations and journalists whose inclusion adds legitimacy and weight to a persuasively written fantasy. And for readers and the wider information ecosystem, it opens up whole new questions about whether citations can be trusted in any way, and could well feed conspiracy theories about the mysterious removal of articles on sensitive issues that never existed in the first place.

If this seems like an edge case, it’s important to note that ChatGPT, from a cold start in November 2022, registered 100 million monthly users in January 2023. TikTok, unquestionably a digital phenomenon, took nine months to hit the same level. Since that point we’ve seen Microsoft implement the same technology in Bing, putting pressure on Google to follow suit with Bard.

Both companies are now building these systems into Google Workspace and Microsoft 365, which between them have a share of more than 90% of the market. A recent study of 1,000 students in the US found that 89% have used ChatGPT to help with a homework assignment. The technology, with all its faults, has been normalised at incredible speed, and is now at the heart of systems that act as the key point of discovery and creativity for a significant portion of the world.

Two days ago our archives team was contacted by a student asking about another missing article from a named journalist. There was again no trace of the article in our systems. The source? ChatGPT.

It’s easy to get sucked into the detail on generative AI, because it is inherently opaque. The ideas and implications, already explored by academics across multiple disciplines, are hugely complex, the technology is developing rapidly, and companies with huge existing market shares are integrating it as fast as they can to gain competitive advantages, disrupt each other and above all satisfy shareholders.

But the question for responsible news organisations is simple, and urgent: what can this technology do right now, and how can it benefit responsible reporting at a time when the wider information ecosystem is already under pressure from misinformation, polarisation and bad actors?

This is the question we are currently grappling with at the Guardian. And it’s why we haven’t yet announced a new format or product built on generative AI. Instead, we’ve created a working group and small engineering team to focus on learning about the technology, considering the public policy and IP questions around it, listening to academics and practitioners, talking to other organisations, consulting and training our staff, and exploring safely and responsibly how the technology performs when applied to journalistic use.

In doing this we have found that, along with asking how we can use generative AI, we are reflecting more and more on what journalism is for, and what makes it valuable. We are excited by the potential, but our first task must be to understand it, evaluate it and decode its potential impact on the wider world.

In the next few weeks we’ll be publishing a clear and concise explanation of how we plan to employ generative AI. In the simplest terms, we will continue to hold ourselves to the highest journalistic standards and remain accountable to our readers and the world for the journalism we publish. While so much has changed in the last six months, in this crucial respect, nothing has changed at all.

via the Guardian

April 6, 2023 at 04:03PM