(Fortune China / 財富中文網) Translator: 中慧言-王芳
Spend enough time with ChatGPT and other artificial intelligence chatbots and it doesn’t take long for them to spout falsehoods.
Described as hallucination, confabulation or just plain making things up, it’s now a problem for every business, organization and high school student trying to get a generative AI system to compose documents and get work done. Some are using it on tasks with the potential for high-stakes consequences, from psychotherapy to researching and writing legal briefs.
“I don’t think that there’s any model today that doesn’t suffer from some hallucination,” said Daniela Amodei, co-founder and president of Anthropic, maker of the chatbot Claude 2.
“They’re really just sort of designed to predict the next word,” Amodei said. “And so there will be some rate at which the model does that inaccurately.”
Anthropic, ChatGPT-maker OpenAI and other major developers of AI systems known as large language models say they’re working to make them more truthful.
How long that will take — and whether they will ever be good enough to, say, safely dole out medical advice — remains to be seen.
“This isn’t fixable,” said Emily Bender, a linguistics professor and director of the University of Washington’s Computational Linguistics Laboratory. “It’s inherent in the mismatch between the technology and the proposed use cases.”
A lot is riding on the reliability of generative AI technology. The McKinsey Global Institute projects it will add the equivalent of $2.6 trillion to $4.4 trillion to the global economy. Chatbots are only one part of that frenzy, which also includes technology that can generate new images, video, music and computer code. Nearly all of the tools include some language component.
Google is already pitching a news-writing AI product to news organizations, for which accuracy is paramount. The Associated Press is also exploring use of the technology as part of a partnership with OpenAI, which is paying to use part of AP’s text archive to improve its AI systems.
In partnership with India’s hotel management institutes, computer scientist Ganesh Bagler has been working for years to get AI systems, including a ChatGPT precursor, to invent recipes for South Asian cuisines, such as novel versions of rice-based biryani. A single “hallucinated” ingredient could be the difference between a tasty and inedible meal.
When Sam Altman, the CEO of OpenAI, visited India in June, the professor at the Indraprastha Institute of Information Technology Delhi had some pointed questions.
“I guess hallucinations in ChatGPT are still acceptable, but when a recipe comes out hallucinating, it becomes a serious problem,” Bagler said, standing up in a crowded campus auditorium to address Altman on the New Delhi stop of the U.S. tech executive’s world tour.
“What’s your take on it?” Bagler eventually asked.
Altman expressed optimism, if not an outright commitment.
“I think we will get the hallucination problem to a much, much better place,” Altman said. “I think it will take us a year and a half, two years. Something like that. But at that point we won’t still talk about these. There’s a balance between creativity and perfect accuracy, and the model will need to learn when you want one or the other.”
But for some experts who have studied the technology, such as University of Washington linguist Bender, those improvements won’t be enough.
Bender describes a language model as a system for “modeling the likelihood of different strings of word forms,” given some written data it’s been trained upon.
It’s how spell checkers are able to detect when you’ve typed the wrong word. It also helps power automatic translation and transcription services, “smoothing the output to look more like typical text in the target language,” Bender said. Many people rely on a version of this technology whenever they use the “autocomplete” feature when composing text messages or emails.
The latest crop of chatbots such as ChatGPT, Claude 2 or Google’s Bard try to take that to the next level, by generating entire new passages of text, but Bender said they’re still just repeatedly selecting the most plausible next word in a string.
When used to generate text, language models “are designed to make things up. That’s all they do,” Bender said. They are good at mimicking forms of writing, such as legal contracts, television scripts or sonnets.
“But since they only ever make things up, when the text they have extruded happens to be interpretable as something we deem correct, that is by chance,” Bender said. “Even if they can be tuned to be right more of the time, they will still have failure modes — and likely the failures will be in the cases where it’s harder for a person reading the text to notice, because they are more obscure.”
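Bender's description of a system that keeps "selecting the most plausible next word in a string" can be illustrated with a deliberately tiny sketch. The corpus, the function names and the bigram-counting approach below are all assumptions made for illustration; commercial chatbots use large neural networks over far more data, and they often sample among likely words rather than always taking the top one. The sketch only shows the general idea: the program tracks which word tends to follow which, and nothing in it checks whether the resulting sentence is true.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus, invented for this illustration only.
CORPUS = (
    "the chef adds rice to the pot . "
    "the chef adds saffron to the rice . "
    "the chef adds rice to the bowl ."
)


def train_bigrams(text):
    """Count how often each word is followed by each other word."""
    counts = defaultdict(Counter)
    words = text.split()
    for prev_word, next_word in zip(words, words[1:]):
        counts[prev_word][next_word] += 1
    return counts


def generate(counts, start, max_words=8):
    """Greedily append the most frequent follower, one word at a time."""
    output = [start]
    while len(output) < max_words:
        followers = counts.get(output[-1])
        if not followers:
            break  # no known continuation for this word
        # most_common(1) returns a one-item list of (word, count)
        next_word = followers.most_common(1)[0][0]
        output.append(next_word)
        if next_word == ".":
            break  # treat a period as the end of the sentence
    return " ".join(output)


model = train_bigrams(CORPUS)
sentence = generate(model, "the")
print(sentence)
print("appears in corpus:", sentence in CORPUS)
```

With this corpus the sketch produces "the chef adds rice to the chef adds", a word sequence that reads smoothly for a moment yet never appears in the training text and describes nothing that happened. Every individual step picked the most common continuation; the error comes from the fact that plausibility of the next word is the only thing being modeled.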
Those errors are not a huge problem for the marketing firms that have been turning to Jasper AI for help writing pitches, said the company’s president, Shane Orlick.
“Hallucinations are actually an added bonus,” Orlick said. “We have customers all the time that tell us how it came up with ideas — how Jasper created takes on stories or angles that they would have never thought of themselves.”
The Texas-based startup works with partners like OpenAI, Anthropic, Google or Facebook parent Meta to offer its customers a smorgasbord of AI language models tailored to their needs. For someone concerned about accuracy, it might offer up Anthropic’s model, while someone concerned with the security of their proprietary source data might get a different model, Orlick said.
Orlick said he knows hallucinations won’t be easily fixed. He’s counting on companies like Google, which he says must have a “really high standard of factual content” for its search engine, to put a lot of energy and resources into solutions.
“I think they have to fix this problem,” Orlick said. “They’ve got to address this. So I don’t know if it’s ever going to be perfect, but it’ll probably just continue to get better and better over time.”
Techno-optimists, including Microsoft co-founder Bill Gates, have been forecasting a rosy outlook.
“I’m optimistic that, over time, AI models can be taught to distinguish fact from fiction,” Gates said in a July blog post detailing his thoughts on AI’s societal risks.
He cited a 2022 paper from OpenAI as an example of “promising work on this front.”
But even Altman, as he markets the products for a variety of uses, doesn’t count on the models to be truthful when he’s looking for information for himself.
“I probably trust the answers that come out of ChatGPT the least of anybody on Earth,” Altman told the crowd at Bagler’s university, to laughter.