倫敦人工智能公司DeepMind開(kāi)發(fā)的一種算法能夠在最開(kāi)始并不知道游戲規(guī)則的情況下學(xué)會(huì)玩游戲,,并且能達(dá)到超越人類(lèi)的水平,。該公司公布了有關(guān)這種算法的最新細(xì)節(jié),并表示該成就向創(chuàng)建人工智能系統(tǒng)解決復(fù)雜的,、不確定的現(xiàn)實(shí)狀況邁出了一大步,。
DeepMind將這種算法命名為MuZero。它已經(jīng)學(xué)會(huì)了下圍棋和日本策略游戲?qū)⑵?,還有一系列雅達(dá)利(Atari)經(jīng)典視頻游戲,,均達(dá)到了超常水平。之前,,DeepMind創(chuàng)造的許多算法能夠掌握某一種游戲,,但一直沒(méi)有一種算法能夠同時(shí)學(xué)會(huì)棋類(lèi)游戲和視頻游戲。而且DeepMind之前開(kāi)發(fā)的掌握棋類(lèi)游戲的算法阿爾法元(AlphaZero),,首先要知道游戲的規(guī)則,,但MuZero并不需要知道規(guī)則。
阿爾法元是阿爾法狗(AlphaGo)的升級(jí)版,。2016年,,DeepMind推出的圍棋算法阿爾法狗在韓國(guó)舉辦的一場(chǎng)比賽中,擊敗了世界圍棋名將李世石,,一戰(zhàn)成名,。
DeepMind隸屬于谷歌(Google)母公司Alphabet。該公司在2019年公布了MuZero,,但本周三它在知名科學(xué)期刊《自然》(Nature)上發(fā)表了一篇同行評(píng)議論文,,公布了有關(guān)該算法的更多信息。
MuZero首先會(huì)創(chuàng)建一個(gè)模型,,模擬它所理解的游戲的運(yùn)行方式,,然后利用這個(gè)模型規(guī)劃在游戲中最有利的動(dòng)作。這種算法通過(guò)重復(fù)玩游戲,,學(xué)習(xí)完善模型和計(jì)劃的行動(dòng),。在雙人游戲中,MuZero通過(guò)與其之前的版本對(duì)戰(zhàn)不斷學(xué)習(xí),。
對(duì)于真實(shí)世界狀況更重要的是,,算法創(chuàng)建的游戲規(guī)則模型并不需要100%準(zhǔn)確,甚至不一定是完整的,。模型只需要能夠幫助MuZero在游戲中進(jìn)步即可,,之后它會(huì)逐步完善模型。
DeepMind計(jì)算機(jī)科學(xué)家,、MuZero開(kāi)發(fā)團(tuán)隊(duì)的負(fù)責(zé)人戴維?西爾沃告訴《財(cái)富》雜志:“我們只是告訴系統(tǒng),,去吧,,去創(chuàng)建你自己對(duì)于世界運(yùn)行方式的內(nèi)部構(gòu)想。你在使用它的時(shí)候,,只要這種內(nèi)部構(gòu)想能夠生成實(shí)際匹配現(xiàn)實(shí)的東西,,我們就能接受?!?/p>
在《自然》上發(fā)表的論文中,,DeepMind介紹了制定計(jì)劃對(duì)于這種算法的能力的重要性:MuZero制定計(jì)劃可用的時(shí)間越多,表現(xiàn)越好,。如果在下圍棋的時(shí)候,,MuZero有50秒鐘思考一步棋,它的能力會(huì)比只有十分之一秒的情況高出數(shù)倍,,相當(dāng)于一位強(qiáng)大的業(yè)余棋手和一位強(qiáng)大的專(zhuān)業(yè)棋手之間的區(qū)別。
在雅達(dá)利視頻游戲中也存在類(lèi)似的差異,,在這些游戲中,,快速反應(yīng)時(shí)間往往比戰(zhàn)略性思考更重要。在玩這些游戲的時(shí)候,,MuZero如果獲得更多時(shí)間,,可以推算出更多可能情景中會(huì)發(fā)生的結(jié)果。研究人員注意到,,該系統(tǒng)在《吃豆人小姐》(Ms. Pac-Man)游戲中表現(xiàn)很出色,,即使該系統(tǒng)的時(shí)間只能推算出6至7種可能的動(dòng)作,這些時(shí)間并不足以使系統(tǒng)形成對(duì)所有可能性的完整理解,。
DeepMind并沒(méi)有測(cè)試MuZero玩多人游戲的表現(xiàn),,例如撲克牌或橋牌等,在這類(lèi)游戲中隱藏信息很重要,。西爾沃表示,,他認(rèn)為MuZero或許也能學(xué)會(huì)玩這類(lèi)游戲,而且公司計(jì)劃進(jìn)一步探索,??▋?nèi)基梅隆大學(xué)(Carnegie Mellon University)和Facebook的人工智能研究人員之前創(chuàng)建的人工智能系統(tǒng),曾經(jīng)戰(zhàn)勝過(guò)撲克牌冠軍,。但橋牌部分依賴溝通,,因此依舊很有挑戰(zhàn)性。
西爾沃表示,,DeepMind正在考慮MuZero的許多現(xiàn)實(shí)應(yīng)用,。他表示,到目前為止,,最有前途的應(yīng)用是視頻壓縮,。目前視頻信號(hào)壓縮有許多不同的方法,,但沒(méi)有明確的規(guī)則能判斷對(duì)于不同視頻哪一種是最佳壓縮方法。他說(shuō)使用類(lèi)似于MuZero的算法所做的初步試驗(yàn)顯示,,與之前的最佳壓縮方法相比,,算法壓縮的視頻需要的帶寬能減少5%。西爾沃還表示,,MuZero可能有助于開(kāi)發(fā)功能更強(qiáng)大的機(jī)器人和數(shù)字助手,,并且可以擴(kuò)展DeepMind最近在預(yù)測(cè)蛋白質(zhì)結(jié)構(gòu)方面取得的突破。到目前為止,,這項(xiàng)研究并沒(méi)有使用公司在游戲研究方面開(kāi)發(fā)的先進(jìn)技術(shù),。
但有些機(jī)構(gòu)已經(jīng)將MuZero應(yīng)用于不同領(lǐng)域。上周,,美國(guó)空軍表示,,其使用DeepMind去年免費(fèi)發(fā)布的MuZero的相關(guān)信息開(kāi)發(fā)了一款人工智能系統(tǒng),該系統(tǒng)能夠自動(dòng)控制U-2偵察機(jī)的雷達(dá),。美國(guó)空軍在12月14日的一次訓(xùn)練任務(wù)中模擬了一次導(dǎo)彈襲擊,,并在一架U-2蛟龍夫人偵察機(jī)上測(cè)試了這款人工智能系統(tǒng)ARTUMu。由計(jì)算機(jī)科學(xué)家,、武器控制專(zhuān)家和人權(quán)活動(dòng)人士領(lǐng)導(dǎo)的“阻止殺手機(jī)器人”(Stop Killer Robots)運(yùn)動(dòng)表示,,美國(guó)空軍的研究朝著制造自動(dòng)化致命武器邁出了危險(xiǎn)的一步。
DeepMind告訴《財(cái)富》雜志稱(chēng),,對(duì)于美國(guó)空軍的研究,,公司沒(méi)有參與也毫不知情,直到上周才看到有關(guān)此次訓(xùn)練任務(wù)的媒體報(bào)道,。DeepMind之前承諾避免參與研究進(jìn)攻性武器能力,,或者能識(shí)別和跟蹤目標(biāo)并且會(huì)在沒(méi)有人類(lèi)最終決策的情況下部署武器攻擊目標(biāo)的人工智能。(財(cái)富中文網(wǎng))
翻譯:劉進(jìn)龍
審校:汪皓
倫敦人工智能公司DeepMind開(kāi)發(fā)的一種算法能夠在最開(kāi)始并不知道游戲規(guī)則的情況下學(xué)會(huì)玩游戲,,并且能達(dá)到超越人類(lèi)的水平,。該公司公布了有關(guān)這種算法的最新細(xì)節(jié),并表示該成就向創(chuàng)建人工智能系統(tǒng)解決復(fù)雜的,、不確定的現(xiàn)實(shí)狀況邁出了一大步,。
DeepMind將這種算法命名為MuZero。它已經(jīng)學(xué)會(huì)了下圍棋和日本策略游戲?qū)⑵?,還有一系列雅達(dá)利(Atari)經(jīng)典視頻游戲,,均達(dá)到了超常水平。之前,,DeepMind創(chuàng)造的許多算法能夠掌握某一種游戲,,但一直沒(méi)有一種算法能夠同時(shí)學(xué)會(huì)棋類(lèi)游戲和視頻游戲。而且DeepMind之前開(kāi)發(fā)的掌握棋類(lèi)游戲的算法阿爾法元(AlphaZero),首先要知道游戲的規(guī)則,,但MuZero并不需要知道規(guī)則,。
阿爾法元是阿爾法狗(AlphaGo)的升級(jí)版。2016年,,DeepMind推出的圍棋算法阿爾法狗在韓國(guó)舉辦的一場(chǎng)比賽中,,擊敗了世界圍棋名將李世石,一戰(zhàn)成名,。
DeepMind隸屬于谷歌(Google)母公司Alphabet,。該公司在2019年公布了MuZero,但本周三它在知名科學(xué)期刊《自然》(Nature)上發(fā)表了一篇同行評(píng)議論文,,公布了有關(guān)該算法的更多信息,。
MuZero首先會(huì)創(chuàng)建一個(gè)模型,模擬它所理解的游戲的運(yùn)行方式,,然后利用這個(gè)模型規(guī)劃在游戲中最有利的動(dòng)作,。這種算法通過(guò)重復(fù)玩游戲,學(xué)習(xí)完善模型和計(jì)劃的行動(dòng),。在雙人游戲中,,MuZero通過(guò)與其之前的版本對(duì)戰(zhàn)不斷學(xué)習(xí)。
對(duì)于真實(shí)世界狀況更重要的是,,算法創(chuàng)建的游戲規(guī)則模型并不需要100%準(zhǔn)確,甚至不一定是完整的,。模型只需要能夠幫助MuZero在游戲中進(jìn)步即可,,之后它會(huì)逐步完善模型。
DeepMind計(jì)算機(jī)科學(xué)家,、MuZero開(kāi)發(fā)團(tuán)隊(duì)的負(fù)責(zé)人戴維?西爾沃告訴《財(cái)富》雜志:“我們只是告訴系統(tǒng),,去吧,去創(chuàng)建你自己對(duì)于世界運(yùn)行方式的內(nèi)部構(gòu)想,。你在使用它的時(shí)候,,只要這種內(nèi)部構(gòu)想能夠生成實(shí)際匹配現(xiàn)實(shí)的東西,我們就能接受,?!?/p>
在《自然》上發(fā)表的論文中,DeepMind介紹了制定計(jì)劃對(duì)于這種算法的能力的重要性:MuZero制定計(jì)劃可用的時(shí)間越多,,表現(xiàn)越好,。如果在下圍棋的時(shí)候,MuZero有50秒鐘思考一步棋,,它的能力會(huì)比只有十分之一秒的情況高出數(shù)倍,,相當(dāng)于一位強(qiáng)大的業(yè)余棋手和一位強(qiáng)大的專(zhuān)業(yè)棋手之間的區(qū)別。
在雅達(dá)利視頻游戲中也存在類(lèi)似的差異,在這些游戲中,,快速反應(yīng)時(shí)間往往比戰(zhàn)略性思考更重要,。在玩這些游戲的時(shí)候,MuZero如果獲得更多時(shí)間,,可以推算出更多可能情景中會(huì)發(fā)生的結(jié)果,。研究人員注意到,該系統(tǒng)在《吃豆人小姐》(Ms. Pac-Man)游戲中表現(xiàn)很出色,,即使該系統(tǒng)的時(shí)間只能推算出6至7種可能的動(dòng)作,,這些時(shí)間并不足以使系統(tǒng)形成對(duì)所有可能性的完整理解。
DeepMind并沒(méi)有測(cè)試MuZero玩多人游戲的表現(xiàn),,例如撲克牌或橋牌等,,在這類(lèi)游戲中隱藏信息很重要。西爾沃表示,,他認(rèn)為MuZero或許也能學(xué)會(huì)玩這類(lèi)游戲,,而且公司計(jì)劃進(jìn)一步探索??▋?nèi)基梅隆大學(xué)(Carnegie Mellon University)和Facebook的人工智能研究人員之前創(chuàng)建的人工智能系統(tǒng),,曾經(jīng)戰(zhàn)勝過(guò)撲克牌冠軍。但橋牌部分依賴溝通,,因此依舊很有挑戰(zhàn)性,。
西爾沃表示,DeepMind正在考慮MuZero的許多現(xiàn)實(shí)應(yīng)用,。他表示,,到目前為止,最有前途的應(yīng)用是視頻壓縮,。目前視頻信號(hào)壓縮有許多不同的方法,,但沒(méi)有明確的規(guī)則能判斷對(duì)于不同視頻哪一種是最佳壓縮方法。他說(shuō)使用類(lèi)似于MuZero的算法所做的初步試驗(yàn)顯示,,與之前的最佳壓縮方法相比,,算法壓縮的視頻需要的帶寬能減少5%。西爾沃還表示,,MuZero可能有助于開(kāi)發(fā)功能更強(qiáng)大的機(jī)器人和數(shù)字助手,,并且可以擴(kuò)展DeepMind最近在預(yù)測(cè)蛋白質(zhì)結(jié)構(gòu)方面取得的突破。到目前為止,,這項(xiàng)研究并沒(méi)有使用公司在游戲研究方面開(kāi)發(fā)的先進(jìn)技術(shù),。
但有些機(jī)構(gòu)已經(jīng)將MuZero應(yīng)用于不同領(lǐng)域。上周,,美國(guó)空軍表示,,其使用DeepMind去年免費(fèi)發(fā)布的MuZero的相關(guān)信息開(kāi)發(fā)了一款人工智能系統(tǒng),,該系統(tǒng)能夠自動(dòng)控制U-2偵察機(jī)的雷達(dá)。美國(guó)空軍在12月14日的一次訓(xùn)練任務(wù)中模擬了一次導(dǎo)彈襲擊,,并在一架U-2蛟龍夫人偵察機(jī)上測(cè)試了這款人工智能系統(tǒng)ARTUMu,。由計(jì)算機(jī)科學(xué)家、武器控制專(zhuān)家和人權(quán)活動(dòng)人士領(lǐng)導(dǎo)的“阻止殺手機(jī)器人”(Stop Killer Robots)運(yùn)動(dòng)表示,,美國(guó)空軍的研究朝著制造自動(dòng)化致命武器邁出了危險(xiǎn)的一步,。
DeepMind告訴《財(cái)富》雜志稱(chēng),對(duì)于美國(guó)空軍的研究,,公司沒(méi)有參與也毫不知情,,直到上周才看到有關(guān)此次訓(xùn)練任務(wù)的媒體報(bào)道。DeepMind之前承諾避免參與研究進(jìn)攻性武器能力,,或者能識(shí)別和跟蹤目標(biāo)并且會(huì)在沒(méi)有人類(lèi)最終決策的情況下部署武器攻擊目標(biāo)的人工智能,。(財(cái)富中文網(wǎng))
翻譯:劉進(jìn)龍
審校:汪皓
London A.I. company DeepMind has published new details about an algorithm that can learn to play games at superhuman levels—even when it doesn’t start out knowing the rules of the game, an achievement that the company says is a big step toward creating A.I. systems that can deal with complicated and uncertain real-world situations.
The algorithm, which DeepMind calls MuZero, has learned to play chess, Go, and the Japanese strategy game Shogi, as well as a host of classic Atari video games at superhuman levels. Previously, DeepMind had created algorithms that could master each of these games, but not a single algorithm that could handle both the board games and the video games. Also, DeepMind’s previous algorithm for mastering the board games, AlphaZero, started out knowing the rules, while MuZero does not.
AlphaZero was itself a more general variant of AlphaGo, the Go-playing algorithm DeepMind famously demonstrated in 2016, defeating Lee Sedol, at the time the world’s top-ranked Go player, in a match in South Korea.
DeepMind, which is owned by Google parent Alphabet, first unveiled MuZero in 2019, but on Wednesday it published more information about the algorithm in a peer-reviewed paper in the prestigious scientific journal Nature.
MuZero works by constructing a model of how it thinks the game it is playing works and then using that model to plan the most beneficial actions in the game. It learns to improve both the model and its planned actions by playing the game over and over again. In the case of the two player games, MuZero learns by playing against previous versions of itself.
More important for real-world situations, the model that the algorithm creates of the rules of the game doesn’t have to be 100% accurate, or even complete. It just has to be useful enough that MuZero is able to make some progress in the game from which it can begin to improve.
“We are basically saying to the system, just go and make up your own internal fiction about how the world works,” David Silver, the DeepMind computer scientist who led the team that built MuZero, told Fortune. “As long as this internal fiction leads to something that actually matches reality when you come to use it, then we’re fine with it.”
In the Nature paper, DeepMind showed the importance of planning to the algorithm’s capability: The more time MuZero was given to plan, the better it performed. MuZero was many times more capable at Go—about the difference between a strong amateur and a strong professional player—when given 50 seconds to consider a move, compared with when it was given just one-tenth of a second.
This difference held even in the Atari games, where quick reaction times are often thought to matter more than strategic thinking. Here, more time allowed MuZero to game out what might happen in more possible scenarios. The researchers noted that the system achieved very good performance in a game like Ms. Pac-Man, even when it was only given enough time to explore six or seven possible moves, which was far too few to gain a complete understanding of all the possibilities.
While DeepMind has not tested MuZero on multiplayer games where hidden information plays an important role—such as poker or bridge—Silver said he suspects MuZero might be able to learn to play these games too, and that the company plans to explore this further. A.I. researchers from Carnegie Mellon University and Facebook have previously built A.I. systems capable of beating champion poker players. Bridge, which relies in part on communication, remains a challenge.
Silver said DeepMind is considering several real-world uses for MuZero. One of the most promising so far, Silver said, is video compression, where there are many different ways to compress a video signal, but no clear rules about which one is best for different kinds of video. He said that initial experiments with MuZero-like algorithms had shown it might be possible to achieve a 5% reduction in bandwidth over the best previous compression methods. Silver also said MuZero might be useful for building more capable robots and digital assistants as well as extending DeepMind’s recent breakthrough in predicting the structure of proteins, research that has so far not relied on the techniques the company pioneered in its games research.
Others, however, are already taking MuZero in very different directions. Last week, the U.S. Air Force revealed that it had used information about MuZero that DeepMind had made freely available to the public last year to help create an A.I. system that could autonomously control the radar of a U-2 spy plane. The Air Force tested the A.I. system, which it calls ARTUMu, on a U-2 Dragon Lady spy plane during a simulated missile strike in a training mission on Dec. 14. Stop Killer Robots, a campaign led by computer scientists, arms control experts and human rights activists, said the Air Force research was a dangerous step toward creating lethal autonomous weapons.
DeepMind told Fortune it had no role in the Air Force research and was unaware of it until seeing news reports about the training mission last week. DeepMind has previously pledged to avoid work on offensive weapons capabilities or A.I. that can identify and track targets and deploy weapons against them without a human making the final decision about striking those particular targets.