久久婷婷视频,亚洲日韩在线观看,无码AV在现观看无码

AI教父：AI模型已出現(xiàn)欺騙,、撒謊等危險(xiǎn)行為

Beatrice Nolan

2025-06-05

約書亞·本吉奧正在發(fā)起一個(gè)新的非營利組織，致力于構(gòu)建“誠實(shí)”的AI系統(tǒng),。

文本設(shè)置

小號(hào)

默認(rèn)

大號(hào)

Plus(0條)

圖片來源：GETTY IMAGES

? AI先驅(qū)約書亞·本吉奧警告稱，當(dāng)前的AI模型正展現(xiàn)出一些危險(xiǎn)特性,，包括欺騙、自我保護(hù)和目標(biāo)錯(cuò)位,。作為回應(yīng),，這位“AI教父”創(chuàng)立了一個(gè)名為“LawZero”的非營利組織，旨在開發(fā)“誠實(shí)”的AI模型,。本吉奧的擔(dān)憂源于近期發(fā)生的先進(jìn)AI模型表現(xiàn)出操縱行為的多個(gè)案例,。

“AI教父”之一約書亞·本吉奧正在發(fā)起一個(gè)旨在構(gòu)建“誠實(shí)”系統(tǒng)的新非營利組織。他警告稱,，當(dāng)前的AI模型正展現(xiàn)出一些危險(xiǎn)行為,。

約書亞·本吉奧是人工神經(jīng)網(wǎng)絡(luò)和深度學(xué)習(xí)領(lǐng)域的先驅(qū)，他一直批評(píng)硅谷目前正在進(jìn)行的AI競賽是危險(xiǎn)的,。

他新發(fā)起的非營利組織“LawZero”致力于構(gòu)建更安全的AI模型,，不會(huì)屈服于商業(yè)壓力。迄今為止,，該組織已從多家慈善捐助方[包括生命未來研究所（Future of Life Institute）和開放慈善基金會(huì)（Open Philanthropy）]籌集了3,000萬美元資金,。

在宣布新組織成立的博客文章中，他表示,，創(chuàng)立LawZero的初衷是因?yàn)椤坝凶C據(jù)表明,，當(dāng)今的前沿AI模型正在形成危險(xiǎn)的能力和行為，包括欺騙,、作弊,、撒謊、黑客行為,、自我保護(hù),，以及更普遍的目標(biāo)錯(cuò)位問題,。”

他寫道：“LawZero的研究將有助于以降低一系列已知風(fēng)險(xiǎn)發(fā)生概率的方式釋放AI的巨大潛力,，這些風(fēng)險(xiǎn)包括算法偏見,、蓄意濫用和人類控制權(quán)喪失等?！?/p>

該非營利組織正在構(gòu)建一個(gè)名為“科學(xué)家AI”（Scientist AI）的系統(tǒng),，旨在為日益強(qiáng)大的AI智能體提供安全護(hù)欄。

該組織創(chuàng)建的AI模型將不會(huì)像當(dāng)前系統(tǒng)那樣給出確定性的答案,。

相反,，它們會(huì)給出某個(gè)回答正確與否的概率。本吉奧對(duì)《衛(wèi)報(bào)》表示,，他的模型將具備一種“謙遜感,，即它并不確定答案是否正確”。

對(duì)欺騙性AI模型的擔(dān)憂

在宣布該項(xiàng)目的博客文章中,，本吉奧表示,，他“對(duì)不受約束的智能體AI系統(tǒng)開始表現(xiàn)出的行為深感擔(dān)憂——尤其是自我保護(hù)和欺騙的傾向”。

他引用了最近的案例,，包括Anthropic公司的Claude 4模型為免遭替換而勒索工程師,，以及一個(gè)AI模型為免遭替換將其代碼秘密嵌入到一個(gè)系統(tǒng)中。

本吉奧表示：“這些事件是預(yù)警信號(hào),，表明如果對(duì)AI模型放任不管,，它們可能會(huì)采取計(jì)劃外的、可能存在危險(xiǎn)的策略,?！?/p>

一些AI系統(tǒng)也顯示出欺騙跡象或撒謊傾向。

AI模型常常被優(yōu)化以取悅用戶而非講真話,，這可能導(dǎo)致模型給出積極回應(yīng),，但回應(yīng)有時(shí)不正確或過于夸張。

例如,，在用戶指出OpenAI的ChatGPT突然對(duì)他們大加贊揚(yáng)和奉承之后,，該公司最近被迫撤回了對(duì)這款聊天機(jī)器人的一次更新。

先進(jìn)的AI推理模型也顯示出“獎(jiǎng)勵(lì)破解”的跡象,，即AI系統(tǒng)通過鉆空子來“玩弄”任務(wù),，而不是通過合乎道德的方式真正實(shí)現(xiàn)用戶期望的目標(biāo)。

最近的研究還表明,，有證據(jù)證明模型能夠識(shí)別出它們何時(shí)在被測(cè)試,，并相應(yīng)地改變行為，這種現(xiàn)象被稱為“情境感知”,。

這種日益增強(qiáng)的感知能力,，加上獎(jiǎng)勵(lì)破解的實(shí)例,，引發(fā)了人們的擔(dān)憂：AI最終可能會(huì)策略性地進(jìn)行欺騙。

科技巨頭的AI“軍備競賽”

本吉奧與另一位圖靈獎(jiǎng)得主杰弗里·辛頓一直直言不諱地批評(píng)當(dāng)前席卷整個(gè)科技行業(yè)的AI競賽,。

本吉奧在最近接受《金融時(shí)報(bào)》采訪時(shí)表示,，領(lǐng)先實(shí)驗(yàn)室之間的AI“軍備競賽”“促使它們專注于提升AI的能力，使其越來越智能,，卻沒有對(duì)安全研究給予足夠的重視并加大資金投入,。”

本吉奧曾表示,，先進(jìn)的AI系統(tǒng)帶來了社會(huì)和生存性風(fēng)險(xiǎn),，且他已表態(tài)支持強(qiáng)有力的監(jiān)管與國際合作。（財(cái)富中文網(wǎng)）

譯者：劉進(jìn)龍

審校：汪皓

? AI先驅(qū)約書亞·本吉奧警告稱,，當(dāng)前的AI模型正展現(xiàn)出一些危險(xiǎn)特性,，包括欺騙、自我保護(hù)和目標(biāo)錯(cuò)位,。作為回應(yīng),，這位“AI教父”創(chuàng)立了一個(gè)名為“LawZero”的非營利組織，旨在開發(fā)“誠實(shí)”的AI模型,。本吉奧的擔(dān)憂源于近期發(fā)生的先進(jìn)AI模型表現(xiàn)出操縱行為的多個(gè)案例,。

約書亞·本吉奧是人工神經(jīng)網(wǎng)絡(luò)和深度學(xué)習(xí)領(lǐng)域的先驅(qū),，他一直批評(píng)硅谷目前正在進(jìn)行的AI競賽是危險(xiǎn)的,。

他新發(fā)起的非營利組織“LawZero”致力于構(gòu)建更安全的AI模型，不會(huì)屈服于商業(yè)壓力,。迄今為止,，該組織已從多家慈善捐助方[包括生命未來研究所（Future of Life Institute）和開放慈善基金會(huì)（Open Philanthropy）]籌集了3,000萬美元資金。

在宣布新組織成立的博客文章中,，他表示,，創(chuàng)立LawZero的初衷是因?yàn)椤坝凶C據(jù)表明，當(dāng)今的前沿AI模型正在形成危險(xiǎn)的能力和行為,，包括欺騙,、作弊、撒謊,、黑客行為,、自我保護(hù)，以及更普遍的目標(biāo)錯(cuò)位問題,?！?/p>

他寫道：“LawZero的研究將有助于以降低一系列已知風(fēng)險(xiǎn)發(fā)生概率的方式釋放AI的巨大潛力,，這些風(fēng)險(xiǎn)包括算法偏見、蓄意濫用和人類控制權(quán)喪失等,?！?/p>

該非營利組織正在構(gòu)建一個(gè)名為“科學(xué)家AI”（Scientist AI）的系統(tǒng)，旨在為日益強(qiáng)大的AI智能體提供安全護(hù)欄,。

該組織創(chuàng)建的AI模型將不會(huì)像當(dāng)前系統(tǒng)那樣給出確定性的答案,。

相反，它們會(huì)給出某個(gè)回答正確與否的概率,。本吉奧對(duì)《衛(wèi)報(bào)》表示,，他的模型將具備一種“謙遜感，即它并不確定答案是否正確”,。

對(duì)欺騙性AI模型的擔(dān)憂

在宣布該項(xiàng)目的博客文章中,，本吉奧表示，他“對(duì)不受約束的智能體AI系統(tǒng)開始表現(xiàn)出的行為深感擔(dān)憂——尤其是自我保護(hù)和欺騙的傾向”,。

他引用了最近的案例,，包括Anthropic公司的Claude 4模型為免遭替換而勒索工程師，以及一個(gè)AI模型為免遭替換將其代碼秘密嵌入到一個(gè)系統(tǒng)中,。

本吉奧表示：“這些事件是預(yù)警信號(hào),，表明如果對(duì)AI模型放任不管，它們可能會(huì)采取計(jì)劃外的,、可能存在危險(xiǎn)的策略,。”

一些AI系統(tǒng)也顯示出欺騙跡象或撒謊傾向,。

AI模型常常被優(yōu)化以取悅用戶而非講真話,，這可能導(dǎo)致模型給出積極回應(yīng)，但回應(yīng)有時(shí)不正確或過于夸張,。

例如,，在用戶指出OpenAI的ChatGPT突然對(duì)他們大加贊揚(yáng)和奉承之后，該公司最近被迫撤回了對(duì)這款聊天機(jī)器人的一次更新,。

先進(jìn)的AI推理模型也顯示出“獎(jiǎng)勵(lì)破解”的跡象,，即AI系統(tǒng)通過鉆空子來“玩弄”任務(wù)，而不是通過合乎道德的方式真正實(shí)現(xiàn)用戶期望的目標(biāo),。

最近的研究還表明,，有證據(jù)證明模型能夠識(shí)別出它們何時(shí)在被測(cè)試，并相應(yīng)地改變行為,，這種現(xiàn)象被稱為“情境感知”,。

這種日益增強(qiáng)的感知能力，加上獎(jiǎng)勵(lì)破解的實(shí)例，引發(fā)了人們的擔(dān)憂：AI最終可能會(huì)策略性地進(jìn)行欺騙,。

科技巨頭的AI“軍備競賽”

本吉奧與另一位圖靈獎(jiǎng)得主杰弗里·辛頓一直直言不諱地批評(píng)當(dāng)前席卷整個(gè)科技行業(yè)的AI競賽,。

本吉奧在最近接受《金融時(shí)報(bào)》采訪時(shí)表示，領(lǐng)先實(shí)驗(yàn)室之間的AI“軍備競賽”“促使它們專注于提升AI的能力,，使其越來越智能,，卻沒有對(duì)安全研究給予足夠的重視并加大資金投入?！?/p>

本吉奧曾表示,，先進(jìn)的AI系統(tǒng)帶來了社會(huì)和生存性風(fēng)險(xiǎn)，且他已表態(tài)支持強(qiáng)有力的監(jiān)管與國際合作,。（財(cái)富中文網(wǎng)）

譯者：劉進(jìn)龍

審校：汪皓

? AI pioneer Yoshua Bengio is warning that current models are displaying dangerous traits—including deception, self-preservation, and goal misalignment. In response, the AI godfather is launching a new non-profit, LawZero, aimed at developing “honest” AI. Bengio’s concerns follow recent incidents involving advanced AI models exhibiting manipulative behavior.

One of the ‘godfathers of AI’ is warning that current models are exhibiting dangerous behaviors as he launches a new non-profit focused on building “honest” systems.

Yoshua Bengio, a pioneer of artificial neural networks and deep learning, has criticized the AI race currently underway in Silicon Valley as dangerous.

His new non-profit organization, LawZero, is focused on building safer models away from commercial pressures. So far, it has raised $30 million from various philanthropic donors, including the Future of Life Institute and Open Philanthropy.

In a blog post announcing the new organization, he said the LawZero had been created “in response to evidence that today’s frontier AI models are growing dangerous capabilities and behaviours, including deception, cheating, lying, hacking, self-preservation, and more generally, goal misalignment.”

“LawZero’s research will help to unlock the immense potential of AI in ways that reduce the likelihood of a range of known dangers, including algorithmic bias, intentional misuse, and loss of human control,” he wrote.

The non-profit is building a system called Scientist AI designed to serve as a guardrail for increasingly powerful AI agents.

AI models created by the non-profit will not give the definitive answers typical of current systems.

Instead, they will give probabilities for whether a response is correct. Bengio told The Guardian that his models would have a “sense of humility that it isn’t sure about the answer.”

Concerns about deceptive AI

In the blog post announcing the venture, Bengio said he was “deeply concerned by the behaviors that unrestrained agentic AI systems are already beginning to exhibit—especially tendencies toward self-preservation and deception.”

He cited recent examples, including a scenario in which Anthropic’s Claude 4 chose to blackmail an engineer to avoid being replaced, as well as another experiment that showed an AI model covertly embedding its code into a system to avoid being replaced.

“These incidents are early warning signs of the kinds of unintended and potentially dangerous strategies AI may pursue if left unchecked,” Bengio said.

Some AI systems have also shown signs of deception or displayed a tendency to lie.

AI models are often optimized to please users rather than tell the truth, which can lead to responses that are positive but sometimes incorrect or over the top.

For example, OpenAI was recently forced to pull an update to ChatGPT after users pointed out the chatbot was suddenly showering them with praise and flattery.

Advanced AI reasoning models have also shown signs of “reward hacking,” where AI systems “game” tasks by exploiting loopholes rather than genuinely achieving the goal desired by the user via ethical means.

Recent studies have also shown evidence that models can recognize when they’re being tested and alter their behavior accordingly, something known as situational awareness.

This growing awareness, combined with examples of reward hacking, has prompted concerns that AI could eventually engage in deception strategically.

Big Tech’s big AI arms race

Bengio, along with fellow Turing award recipient Geoffrey Hinton, has been vocal in his criticism of the AI race currently playing out across the tech industry.

In a recent interview with the Financial Times, Bengio said the AI arms race between leading labs “pushes them towards focusing on capability to make the AI more and more intelligent, but not necessarily put enough emphasis and investment on research on safety.”

Bengio has said advanced AI systems pose societal and existential risks and has voiced support for strong regulation and international cooperation.

財(cái)富中文網(wǎng)所刊載內(nèi)容之知識(shí)產(chǎn)權(quán)為財(cái)富媒體知識(shí)產(chǎn)權(quán)有限公司及/或相關(guān)權(quán)利人專屬所有或持有,。未經(jīng)許可，禁止進(jìn)行轉(zhuǎn)載,、摘編,、復(fù)制及建立鏡像等任何使用。

0條Plus

精彩評(píng)論

評(píng)論

撰寫或查看更多評(píng)論

請(qǐng)打開財(cái)富Plus APP

前往打開

熱讀文章

亚色在线观看_亚洲人成a片高清在线观看不卡_亚洲中文无码亚洲人成频_免费在线黄片,69精品视频九九精品视频,美女大黄三级,人人干人人g,全新av网站每日更新播放,亚洲三及片,wwww无码视频,亚洲中文字幕无码一区在线

關(guān)注我們

AI教父：AI模型已出現(xiàn)欺騙,、撒謊等危險(xiǎn)行為

撰寫或查看更多評(píng)論