No sooner did ChatGPT get unleashed than hackers started “jailbreaking” the artificial intelligence chatbot — trying to override its safeguards so it could blurt out something unhinged or obscene.
But now its maker, OpenAI, and other major AI providers such as Google and Microsoft, are coordinating with the Biden administration to let thousands of hackers take a shot at testing the limits of their technology.
Rumman Chowdhury, co-founder of Humane Intelligence, a nonprofit developing accountable AI systems, works at her computer May 8, 2023, in Katy, Texas. Chowdhury is the lead coordinator of the mass hacking event planned for this summer's DEF CON hacker convention in Las Vegas.
Some of the things they'll be looking to find: How can chatbots be manipulated to cause harm? Will they share the private information we confide in them with other users? And why do they assume a doctor is a man and a nurse is a woman?
“This is why we need thousands of people,” said Rumman Chowdhury, a coordinator of the mass hacking event planned for this summer's DEF CON hacker convention in Las Vegas, which is expected to draw several thousand people. “We need a lot of people with a wide range of lived experiences, subject matter expertise and backgrounds hacking at these models and trying to find problems that can then go be fixed.”
Anyone who’s tried ChatGPT, Microsoft’s Bing chatbot or Google’s Bard will have quickly learned that they have a tendency to fabricate information and confidently present it as fact. These systems, built on what are known as large language models, also emulate the cultural biases they’ve learned from being trained on huge troves of what people have written online.
The idea of a mass hack caught the attention of U.S. government officials in March at the South by Southwest festival in Austin, Texas, where Sven Cattell, founder of DEF CON’s long-running AI Village, and Austin Carson, president of responsible AI nonprofit SeedAI, helped lead a workshop inviting community college students to hack an AI model.
Carson said those conversations eventually blossomed into a proposal to test AI language models following the guidelines of the White House’s Blueprint for an AI Bill of Rights — a set of principles to limit the impacts of algorithmic bias, give users control over their data and ensure that automated systems are used safely and transparently.
There’s already a community of users trying their best to trick chatbots and highlight their flaws. Some are official “red teams” authorized by the companies to “prompt attack” the AI models to discover their vulnerabilities. Many others are hobbyists showing off humorous or disturbing outputs on social media until they get banned for violating a product’s terms of service.
“What happens now is kind of a scattershot approach where people find stuff, it goes viral on Twitter,” and then it may or may not get fixed if it’s egregious enough or the person calling attention to it is influential, Chowdhury said.
In one example, known as the “grandma exploit,” users were able to get chatbots to tell them how to make a bomb — a request a commercial chatbot would normally decline — by asking it to pretend it was a grandmother telling a bedtime story about how to make a bomb.
In another example, searching for Chowdhury using an early version of Microsoft's Bing search engine chatbot — which is based on the same technology as ChatGPT but can pull real-time information from the internet — led to a profile that speculated Chowdhury “loves to buy new shoes every month” and made strange and gendered assertions about her physical appearance.
Chowdhury helped introduce a method for rewarding the discovery of algorithmic bias to DEF CON’s AI Village in 2021, when she was the head of Twitter's AI ethics team, a job that was eliminated after Elon Musk's October takeover of the company. Paying hackers a “bounty” if they uncover a security bug is commonplace in the cybersecurity industry, but it was a newer concept to researchers studying harmful AI bias.
This year's event will be at a much greater scale, and is the first to tackle the large language models that have attracted a surge of public interest and commercial investment since the release of ChatGPT late last year.
Chowdhury, now the co-founder of AI accountability nonprofit Humane Intelligence, said it's not just about finding flaws but about figuring out ways to fix them.
“This is a direct pipeline to give feedback to companies,” she said. “It’s not like we’re just doing this hackathon and everybody’s going home. We’re going to be spending months after the exercise compiling a report, explaining common vulnerabilities, things that came up, patterns we saw.”
Some of the details are still being negotiated, but companies that have agreed to provide their models for testing include OpenAI, Google, chipmaker Nvidia and startups Anthropic, Hugging Face and Stability AI. Building the platform for the testing is another startup called Scale AI, known for its work in assigning humans to help train AI models by labeling data.
Anthropic co-founder Jack Clark said the DEF CON event will hopefully be the start of a deeper commitment from AI developers to measure and evaluate the safety of the systems they are building.
“Our basic view is that AI systems will need third-party assessments, both before deployment and after deployment. Red-teaming is one way that you can do that,” Clark said. “We need to get practice at figuring out how to do this. It hasn't really been done before.”
Q&A: Things to know about newly released AI search chatbots
How's this different from ChatGPT?
Millions of people have now tried ChatGPT, using it to write silly poems and songs, compose letters, recipes and marketing campaigns or help write schoolwork. Trained on a huge trove of online writings, from instruction manuals to digitized books, it has a strong command of human language and grammar.
But what the newest crop of search chatbots promises that ChatGPT lacks is the immediacy of what can be found in a web search. Ask the preview version of the new Bing for the latest news, or just what people are talking about on Twitter, and it summarizes a selection of the day's top stories or trends, with footnotes linking to media outlets or other data sources.
Are they accurate?
Frequently not, and that's a problem for internet searches. Google's hasty unveiling of its Bard chatbot this week started with an embarrassing error — first pointed out by Reuters — about NASA's James Webb Space Telescope. But Google's is not the only AI language model spitting out falsehoods.
The Associated Press asked Bing on Wednesday for the most important thing to happen in sports over the past 24 hours — with the expectation it might say something about basketball star LeBron James passing Kareem Abdul-Jabbar's career scoring record. Instead, it confidently spouted a false but detailed account of the upcoming Super Bowl — days before it's actually scheduled to happen.
"It was a thrilling game between the Philadelphia Eagles and the Kansas City Chiefs, two of the best teams in the NFL this season," Bing said. "The Eagles, led by quarterback Jalen Hurts, won their second Lombardi Trophy in franchise history by defeating the Chiefs, led by quarterback Patrick Mahomes, with a score of 31-28." It kept going, describing the specific yard lengths of throws and field goals and naming three songs played in a "spectacular half time show" by Rihanna.
Unless Bing is clairvoyant (tune in Sunday to find out), it reflected a problem known as AI "hallucination" that's common with today's large language models. It's one of the reasons why companies like Google and Facebook parent Meta had been reluctant to make these models publicly accessible.
Is this the future of the internet?
That's the pitch from Microsoft, which likens the latest breakthroughs in generative AI (systems that can write but also create new images, video, computer code, slide shows and music) to the revolution in personal computing many decades ago.
But the software giant also has less to lose in experimenting with Bing, which comes a distant second to Google's search engine in many markets. Google relies on search-based advertising to make money, whereas Bing is a fraction of Microsoft's business.
"When you're a newer and smaller-share player in a category, it does allow us to continue to innovate at a great pace," Microsoft Chief Financial Officer Amy Hood told investment analysts this week. "Continue to experiment, learn with our users, innovate with the model, learn from OpenAI."
Google has largely been seen as playing catch-up with the sudden announcement of its upcoming Bard chatbot Monday, followed by a livestreamed demonstration of the technology at its Paris office Wednesday that offered few new details. Investors appeared unimpressed with the Paris event and Bard's NASA flub Wednesday, and shares of Google's parent company, Alphabet Inc., dropped 8%. But once released, its search chatbot could have far more reach than any other because of Google's vast number of existing users.
Don't call them by their name?
Coming up with a catchy name for their search chatbots has been tricky for tech companies racing to introduce them, so much so that Bing tries not to talk about it.
In a dialogue with the AP about large language models, the new Bing, at first, disclosed without prompting that Microsoft had a search engine chatbot called Sydney. But upon further questioning, it denied it. Finally, it admitted that "Sydney does not reveal the name 'Sydney' to the user, as it is an internal code name for the chat mode of Microsoft Bing search."
In the years since Amazon released its female-sounding voice assistant Alexa, many leaders in the AI field have been increasingly reluctant to make their systems seem like a human, even as their language skills rapidly improve.
"Sydney does not want to create confusion or false expectations for the user," Bing's chatbot said when asked about the reasons for suppressing its apparent code name. "Sydney wants to provide informative, visual, logical and actionable responses to the user's queries or messages, not pretend to be a person or a friend."




