City News Bureau of Chicago, a now-defunct news outfit once legendary as a training ground for tough-as-nails, shoe-leather reporters, famously had as its unofficial motto: “If your mother says she loves you, check it out.” Thanks to the advent of ChatGPT, the new Bing Search, Bard, and a host of copycat search chatbots based on large language models, we are all going to have to start living by City News’ old shibboleth.
Researchers already knew that large language models were imperfect engines for search queries, or any fact-based request really, because of their tendency to make stuff up (a phenomenon A.I. researchers call “hallucination”). But the world’s largest technology companies have decided that the appeal of dialogue as a user interface—and the ability of these large language models to perform a vast array of natural language-based tasks, from translation to summarization, along with the potential to couple these models with access to other software tools that will enable them to perform tasks (whether it is running a search or booking you theater tickets)—trumps the potential downsides of inaccuracy and misinformation.
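To make the tool-coupling idea concrete, here is a minimal, hypothetical sketch of the pattern: the model returns a structured request, and ordinary code dispatches it to a real tool such as a search function or a ticket-booking service. The function names, the JSON shape, and call_llm itself are illustrative assumptions, not any particular vendor's API.

```python
import json

# Two stand-in tools; in practice these would call real services.
def web_search(query: str) -> str:
    return f"Top results for {query!r} ..."

def book_theater_tickets(show: str, seats: int) -> str:
    return f"Booked {seats} seat(s) for {show!r}."

TOOLS = {"web_search": web_search, "book_theater_tickets": book_theater_tickets}

def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for a real language-model call. Assume the model
    # has been instructed to reply either in plain text or as JSON like:
    #   {"tool": "web_search", "args": {"query": "flights to Athens"}}
    return json.dumps({"tool": "web_search", "args": {"query": prompt}})

def run(prompt: str) -> str:
    reply = call_llm(prompt)
    try:
        request = json.loads(reply)
    except json.JSONDecodeError:
        return reply  # the model answered directly; no tool was requested
    tool = TOOLS.get(request.get("tool"))
    if tool is None:
        return reply  # unknown tool name; fall back to the raw reply
    return tool(**request.get("args", {}))

print(run("cheap flights from London to Athens in May"))
```

The design point is that the model never executes anything itself; the surrounding code decides which tools exist and what arguments they accept.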
Except, of course, there can be real victims when these systems hallucinate—or even when they don’t, but merely pick up something that is factually wrong from their training data. Stack Overflow had to ban users from submitting answers to coding questions that were produced using ChatGPT after the site was flooded with code that looked plausible but was incorrect. The science fiction magazine Clarkesworld had to stop taking submissions because so many people were submitting stories crafted not by their own creative genius, but by ChatGPT. Now a German company called OpenCage—which offers an application programming interface that does geocoding, converting physical addresses into latitude and longitude coordinates that can be placed on a map—has said it has been dealing with a growing number of disappointed users who have signed up for its service because ChatGPT erroneously recommended its API as a way to look up the location of a mobile phone based solely on the number. ChatGPT even helpfully wrote Python code for users allowing them to call on OpenCage’s API for this purpose.
But, as OpenCage was forced to explain in a blog post, this is not a service it offers, nor one that is even feasible using the company’s technology. OpenCage says that ChatGPT seems to have developed this erroneous belief because it picked up on YouTube tutorials in which people also wrongly claimed OpenCage’s API could be used for reverse mobile phone geolocation. But whereas those erroneous YouTube tutorials only convinced a few people to sign up for OpenCage’s API, ChatGPT has driven people to OpenCage in droves. “The key difference is that humans have learned to be skeptical when getting advice from other humans, for example via a video coding tutorial,” OpenCage wrote. “It seems though that we haven’t yet fully internalized this when it comes to AI in general or ChatGPT specifically.” I guess we better start internalizing.
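For context, here is a minimal sketch of what OpenCage's API is actually for: forward geocoding, turning a street address into coordinates that can be placed on a map. The endpoint and parameter names below follow OpenCage's public documentation as best I can tell, but treat them as assumptions and check the current docs; the key point is that nothing in this interface accepts a phone number.

```python
from typing import Optional, Tuple

import requests

API_KEY = "your-opencage-api-key"  # placeholder; obtained by signing up with OpenCage

def geocode(address: str) -> Optional[Tuple[float, float]]:
    """Return (latitude, longitude) for a physical address, or None if no match."""
    resp = requests.get(
        "https://api.opencagedata.com/geocode/v1/json",
        params={"q": address, "key": API_KEY, "limit": 1},
        timeout=10,
    )
    resp.raise_for_status()
    results = resp.json().get("results", [])
    if not results:
        return None
    geometry = results[0]["geometry"]
    return geometry["lat"], geometry["lng"]

# Example: map an address to coordinates. There is no call here (or anywhere in
# the API) that accepts a phone number and returns the handset's location,
# which is the use ChatGPT was erroneously recommending.
print(geocode("Friedrichstr. 123, Berlin"))
```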
Meanwhile, after a slew of alarming publicity about the dark side of its new, OpenAI-powered Bing chat feature—where the chatbot calls itself Sydney, becomes petulant, and at times even downright hostile and menacing—Microsoft has decided to restrict the length of conversations users can have with Bing chat. But as I, and many others, have found, while this arbitrary restriction on the length of a dialogue apparently makes the new Bing chat safer to use, it also makes it a heck of a lot less useful.
For instance, I asked Bing chat about planning a trip to Greece. I was in the process of trying to get it to detail timings and flight options for an itinerary it had suggested when I suddenly hit the cutoff message: “Oops, I think we’ve reached the end of this conversation. Click ‘New topic,’ if you would!”
The length restriction is clearly a kluge that Microsoft has been forced to implement because it didn’t do rigorous enough testing of its new product in the first place. And there are huge outstanding questions about exactly what Prometheus, the name Microsoft has given to the model that powers the new Bing, really is, and what it is really capable of (no one is claiming the new Bing is sentient or self-aware, but there’s been some very bizarre emergent behavior documented with the new Bing, even beyond the Sydney personality, and Microsoft ought to be transparent about what it understands and doesn’t understand about this behavior, rather than simply pretending it doesn’t exist). Microsoft has been cagey in public about how it and OpenAI created this model. No one outside of Microsoft is exactly sure why it is so prone to taking on the petulant Sydney persona, especially when ChatGPT, based on a smaller, less capable large language model, seems so much better behaved—and again, Microsoft is saying very little about what it does know.
(Earlier research from OpenAI had found that it was often the case that smaller models, trained with better-quality data, produced results that human users much preferred, even though they were less capable than larger models when measured on a number of benchmark tests. That has led some to speculate that Prometheus is OpenAI’s GPT-4, a model believed to be many times more massive than any it has previously debuted. But if that is the case, there is still a real question about why Microsoft opted to use GPT-4 rather than a smaller, but better-behaved system to power the new Bing. And frankly, there is also a real question about why OpenAI might have encouraged Microsoft to use the more powerful model if it in fact realized it had more potential to behave in ways that users might find disturbing. The Microsoft folks may have, like many A.I. researchers before them, become blinded by stellar benchmark performance that can convey bragging rights among other A.I. developers, but which is a poor proxy for what real human users want.)
What is certain is that if Microsoft doesn’t fix this soon—and if someone else, such as Google, which is hard at work trying to hone its search chatbot for imminent release, or any of the others, including startups such as Perplexity and You.com, that have debuted their own chatbots, shows that their chatbot can hold long dialogues without turning into Damien—then Microsoft risks losing its first-mover advantage in the new search wars.
Also, let’s just take a moment to appreciate the irony that it’s Microsoft, a company that once prided itself, not without reason, on being among the most responsible of the big technology companies, which has now tossed us all back to the bad old “move fast and break things” days of the early social media era—with perhaps even worse consequences. (But I guess when your CEO is obsessed with making his arch-rival “dance” it is hard for the musicians in the band to argue that maybe they shouldn’t be striking up the tune just yet.) Beyond OpenCage, Clarkesworld, and Stack Overflow, people could get hurt from incorrect advice on medicines, from abusive Sydney-like behavior that drives someone to self-harm or suicide, or from reinforcement of hateful stereotypes and tropes.
I’ve said this before, but I’ll say it again: Given these potential harms, now is the time for governments to step in and lay down some clear regulation about how these systems need to be built and deployed. The idea of a risk-based approach, such as that broached in the original draft of the European Union’s proposed A.I. Act, is a potential starting point. But the definitions of risk and those risk assessments should not be left entirely up to the companies themselves. There need to be clear external standards and clear accountability if those standards aren’t met.