

The way we evaluate AI is seriously flawed

Fernanda Dobal
2025-05-10

When it comes to AI adoption, trust matters far more than technical prowess.


When it comes to AI adoption, trust matters far more than technical prowess. Image credit: Getty Images


Fernanda Dobal is director of product at Cleo, where she leads the company's AI and chatbot work.


Chinese translation: 中慧言-王芳


There’s a peculiar irony in how we evaluate artificial intelligence: We’ve created systems to mimic and enhance human capabilities, yet we measure their success using metrics that capture everything except what makes them truly valuable to humans.

The tech industry’s dashboards overflow with impressive numbers on AI: processing speeds, parameter counts, benchmark scores, user growth rates. Silicon Valley’s greatest minds tweak algorithms endlessly to nudge these metrics higher. But in this maze of measurements, we’ve lost sight of a fundamental truth: The most sophisticated AI in the world is worthless if it doesn’t meaningfully improve human lives.

Consider the story of early search engines. Before Google, companies competed fiercely on the sheer number of web pages indexed. Yet Google prevailed not because it had the biggest database, but because it understood something deeper about human behavior—that relevance and trustworthiness matter more than raw quantity.

AI that builds trust

Today’s AI landscape feels remarkably similar, with companies racing to build bigger models while potentially missing the more nuanced elements of human-centered design that actually drive adoption and impact.

The path to better AI evaluation begins with trust. Emerging research demonstrates that users engage more deeply and persistently with AI systems that clearly explain their reasoning, even when those systems occasionally falter. This makes intuitive sense—trust, whether in technology or humans, grows from transparency and reliability rather than pure performance metrics.

Yet trust is merely the foundation. The most effective AI systems forge genuine emotional connections with users by demonstrating true understanding of human psychology. The research reveals a compelling pattern: When AI systems adapt to users’ psychological needs rather than simply executing tasks, they become integral parts of people’s daily lives. This isn’t about programming superficial friendliness—it’s about creating systems that genuinely comprehend and respond to the human experience.

Trust matters more than technical prowess when it comes to AI adoption. A groundbreaking AI chatbot study of nearly 1,100 consumers found that people are willing to forgive service failures and maintain brand loyalty not based on how quickly an AI resolves their problem, but on whether they trust the system trying to help them.

AI that gets you

The researchers discovered three key elements that build this trust: First, the AI needs to demonstrate a genuine ability to understand and address the issue. Second, it needs to show benevolence—a sincere desire to help. Third, it must maintain integrity through consistent, honest interactions. When AI chatbots embodied these qualities, customers were significantly more likely to forgive service problems and less likely to complain to others about their experience.
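As a rough illustration of how those three dimensions might be operationalized (this sketch and its equal weighting are my own assumptions, not part of the study), each interaction could be rated separately on competence, benevolence, and integrity and combined into a single trust score:

```python
from dataclasses import dataclass


@dataclass
class TrustRating:
    """Ratings for one chatbot interaction, each on a 0-1 scale.

    The three dimensions follow the study's framing: competence (can the AI
    understand and address the issue), benevolence (does it show a sincere
    desire to help), and integrity (is it consistently honest). The weights
    below are illustrative, not empirically derived.
    """
    competence: float
    benevolence: float
    integrity: float

    def score(self, weights=(1 / 3, 1 / 3, 1 / 3)) -> float:
        wc, wb, wi = weights
        return wc * self.competence + wb * self.benevolence + wi * self.integrity


# Example: a bot that explains the fix well but never acknowledges the
# customer's frustration scores high on competence yet lower overall.
rating = TrustRating(competence=0.9, benevolence=0.4, integrity=0.8)
print(round(rating.score(), 2))  # 0.7
```

The point of the sketch is simply that a deficit on any one dimension drags down the whole score; no amount of raw competence compensates for a bot that reads as indifferent or inconsistent.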

How do you make an AI system trustworthy? The study found that simple things make a big difference: anthropomorphizing the AI, programming it to express empathy through its responses (“I understand how frustrating this must be”), and being transparent about data privacy. In one telling example, a customer dealing with a delayed delivery was more likely to remain loyal when a chatbot named Russell acknowledged their frustration and clearly explained both the problem and solution, compared to an unnamed bot that just stated facts.
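A toy sketch of that contrast (entirely my own illustration; a real system would generate such replies with a language model, not string templates) shows how little it takes to move from bare facts to an acknowledgment of the customer's situation:

```python
def bare_reply(order_id: str, delay_days: int) -> str:
    """States only the facts: no name, no empathy, no explanation."""
    return f"Order {order_id} is delayed by {delay_days} days."


def empathetic_reply(order_id: str, delay_days: int) -> str:
    """Named persona that acknowledges frustration, then explains cause and fix."""
    return (
        f"Hi, I'm Russell. I understand how frustrating a late delivery must be. "
        f"Order {order_id} was held up in transit and is running {delay_days} days "
        f"behind; it has been re-prioritized and is now on its way."
    )


print(bare_reply("A1234", 2))
print(empathetic_reply("A1234", 2))
```

Both replies carry the same facts; only the second supplies the empathy and explanation that, per the study, made customers more likely to stay loyal.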

This insight challenges the common assumption that AI just needs to be fast and accurate. In health care, financial services, and customer support, the most successful generative AI systems aren’t necessarily the most sophisticated—they’re the ones that build genuine rapport with users. They take time to explain their reasoning, acknowledge concerns, and demonstrate consistent value for the user’s needs.

And yet traditional metrics don’t always capture these crucial dimensions of performance. We need frameworks that evaluate AI systems not just on their technical proficiency, but on their ability to create psychological safety, build genuine rapport, and most importantly, help users achieve their goals.

New AI metrics

At Cleo, where we’re focused on improving financial health through an AI assistant, we’re exploring these new measurements. This might mean measuring factors like user trust and the depth and quality of user engagement, as well as looking at entire conversational journeys. It’s important for us to understand if Cleo, our AI financial assistant, can help a user with what they are trying to achieve with any given interaction.
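As a hypothetical sketch of what such measurement could look like (the field names and signals here are my assumptions, not Cleo's actual instrumentation), each conversation might be logged with both task outcomes and relational signals, then aggregated alongside the usual volume numbers:

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Conversation:
    goal_achieved: bool   # did the user accomplish what they came for?
    turns: int            # depth of engagement across the conversational journey
    trust_rating: float   # e.g. a post-chat survey answer on a 0-1 scale


def summarize(conversations: list[Conversation]) -> dict:
    """Aggregate human-impact metrics alongside raw conversation volume."""
    return {
        "conversations": len(conversations),
        "goal_completion_rate": mean(c.goal_achieved for c in conversations),
        "avg_depth": mean(c.turns for c in conversations),
        "avg_trust": mean(c.trust_rating for c in conversations),
    }


logs = [
    Conversation(goal_achieved=True, turns=6, trust_rating=0.8),
    Conversation(goal_achieved=False, turns=2, trust_rating=0.4),
    Conversation(goal_achieved=True, turns=4, trust_rating=0.9),
]
print(summarize(logs))  # goal_completion_rate ≈ 0.67, avg_trust = 0.7
```

A dashboard built on numbers like these would surface exactly what the raw engagement counters miss: whether users actually got what they came for, and whether they trusted the system along the way.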

A more nuanced evaluation framework doesn’t mean abandoning performance metrics—they remain vital indicators of commercial and technical success. But they need to be balanced with deeper measures of human impact. That’s not always easy. One of the challenges with these metrics is their subjectivity. That means reasonable humans can disagree on what good looks like. Still, they are worth pursuing.

As AI becomes more deeply woven into the fabric of daily life, the companies that understand this shift will be the ones that succeed. The metrics that got us here won’t be sufficient for where we’re going. It’s time to start measuring what truly matters: not just how well AI performs, but how well it helps humans thrive.

The opinions expressed in Fortune.com commentary pieces are solely the views of their authors and do not necessarily reflect the opinions and beliefs of Fortune.

All content published by Fortune China is the exclusive intellectual property of Fortune Media IP Limited and/or the relevant rights holders. Reproduction, excerpting, copying, or mirroring without permission is prohibited.