Chinese translation by 樸成奎 (Fortune China).
The next time you get on a Zoom call, you might want to ask the person you’re speaking with to push their finger into the side of their nose. Or maybe turn in complete profile to the camera for a minute.
Those are just some of the methods experts have recommended as ways to provide assurance that you are seeing a real image of the person you are speaking to and not an impersonation created with deepfake technology.
It sounds like a strange precaution, but we live in strange times.
In August, a top executive of the cryptocurrency exchange Binance said that fraudsters had used a sophisticated deepfake “hologram” of him to scam several cryptocurrency projects. Patrick Hillmann, Binance’s chief communications officer, says criminals had used the deepfake to impersonate him on Zoom calls. (Hillmann has not provided evidence to support his claim and some experts are skeptical a deepfake was used. Nonetheless, security researchers say that such incidents are now plausible.) In July, the FBI warned that people could use deepfakes in job interviews conducted over video conferencing software. A month earlier, several European mayors said they were initially fooled by a deepfake video call purporting to be with Ukrainian President Volodymyr Zelensky. Meanwhile, a startup called Metaphysic that develops deepfake software has made it to the finals of “America’s Got Talent,” by creating remarkably good deepfakes of Simon Cowell and the other celebrity judges, transforming other singers into the celebs in real-time, right before the audience’s eyes.
Deepfakes are extremely convincing fake images and videos created through the use of artificial intelligence. It once required a lot of images of someone, a lot of time, and a fair degree of both coding skill and special-effects know-how to create a believable deepfake. And even once created, the A.I. model couldn’t be run fast enough to produce a deepfake in real time on a live video transmission.
That’s no longer the case, as both the Binance story and Metaphysic’s “America’s Got Talent” act highlight. In fact, it’s becoming increasingly easy for people to use deepfake software to impersonate others in live video transmissions. Software allowing someone to do this is now readily available, for free, and requires relatively little technical skill to use. And as the Binance story also shows, this opens the possibility for all kinds of fraud, as well as political disinformation.
“I am surprised by how fast live deepfakes have come and how good they are,” Hany Farid, a computer scientist at the University of California at Berkeley who is an expert in video analysis and authentication, says. He says there are at least three different open source programs that allow people to create live deepfakes.
Farid is among those who worry that live deepfakes could supercharge fraud. “This is going to be like phishing scams on steroids,” he says.
The “pencil test” and other tricks to catch an A.I. impostor
Luckily, experts say there are still a number of techniques a person can use to give themselves a reasonable assurance that they are not communicating with a deepfake impersonation. One of the most reliable is simply to ask a person to turn so that the camera is capturing her in complete profile. Deepfakes struggle with profiles for a number of reasons. For most people, there aren’t enough profile images available to train a deepfake model to reliably reproduce the angle. And while there are ways to use computer software to estimate a profile view from a front-facing image, using this software adds complexity to the process of creating the deepfake.
Deepfake software also uses “anchor points” on a person’s face to properly position the deepfake “mask” on top of it. Turning 90 degrees eliminates half of the anchor points, which often results in the software warping, blurring, or distorting the profile image in strange ways that are very noticeable.
Yisroel Mirsky, a researcher who heads the Offensive AI Lab at Israel’s Ben-Gurion University, has experimented with a number of other methods for detecting live deepfakes that he has compared to the CAPTCHA system used by many websites to detect software bots (you know, the one that asks you to pick out all the images of traffic lights in a photo broken up into squares). His techniques include asking people on a video call to pick up a random object and move it across their face, to bounce an object, to lift up and fold their shirt, to stroke their hair, or to mask part of their face with their hand. In each case, either the deepfake will fail to depict the object being passed in front of the face or the method will cause serious distortion to the facial image. For audio deepfakes, Mirsky suggests asking the person to whistle, or to try to speak with an unusual accent, or to hum or sing a tune chosen at random.
“All of today’s existing deepfake technologies follow a very similar protocol,” Mirsky says. “They are trained on lots and lots of data and that data has to have a particular pattern you are teaching the model.” Most A.I. software is taught to just reliably mimic a person’s face seen from the front and can’t handle oblique angles or objects that occlude the face well.
Meanwhile, Farid has shown that another way to detect possible deepfakes is to use a simple software program that causes the other person’s computer screen to flicker in a certain pattern or causes it to project a light pattern onto the face of the person using the computer. Either the deepfake will fail to transfer the lighting effect to the impersonation or it will be too slow to do so. A similar detection might be possible just by asking someone to use another light source, such as a smartphone flashlight, to illuminate their face from a different angle, Farid says.
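To make the idea concrete, here is a minimal sketch of how such a lighting check could be scored. It is not Farid's program: the function names, the 0.6 threshold, and the assumption that some other tool has already measured the average brightness of the face in each video frame are all hypothetical. The sketch only compares the flicker pattern that was shown with the brightness that was observed.

```python
import random


def make_flicker_pattern(length, seed=None):
    """Return a random on/off pattern (1 = bright screen, 0 = dark screen)."""
    rng = random.Random(seed)
    return [rng.randint(0, 1) for _ in range(length)]


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def lighting_follows_pattern(pattern, face_brightness, threshold=0.6):
    """True if the measured face brightness tracks the flicker pattern.

    `face_brightness` is assumed to hold one average-brightness value per
    pattern step; the threshold is an illustrative guess, not a tuned value.
    """
    return pearson(pattern, face_brightness) >= threshold
```

A face that brightens whenever the screen does produces a correlation near 1, while an impersonation that ignores the light, or reacts several steps late, scores much lower.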
To realistically impersonate someone doing something unusual, Mirsky says, the A.I. software needs to have seen thousands of examples of people doing that thing. But collecting a data set like that is difficult. And even if you could train the A.I. to reliably impersonate someone doing one of these challenging tasks (like picking up a pencil and passing it in front of their face), the deepfake is still likely to fail if you ask the person to use a very different kind of object, like a mug. Attackers using deepfakes are also unlikely to have been able to train a deepfake to overcome multiple challenges, like both the pencil test and the profile test. Each different task, Mirsky says, increases the complexity of the training the A.I. requires. “You are limited in the aspects you want the deepfake software to perfect,” he says.
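Because the defense rests on the attacker not knowing which tasks will be requested, the challenges are best picked unpredictably. The short sketch below illustrates that point only; the challenge list paraphrases the examples in this article, and the function name and default count are assumptions made for illustration.

```python
import secrets

# Challenges paraphrased from the examples above; the list is illustrative.
CHALLENGES = [
    "Turn your head to show a full profile",
    "Pass a pencil in front of your face",
    "Pass a mug in front of your face",
    "Cover part of your face with your hand",
    "Stroke your hair",
    "Hum a tune that I name",
]


def pick_challenges(count=2):
    """Pick `count` distinct challenges using a cryptographically strong RNG."""
    pool = list(CHALLENGES)
    picked = []
    for _ in range(min(count, len(pool))):
        # Remove each chosen item so the same challenge is never asked twice.
        picked.append(pool.pop(secrets.randbelow(len(pool))))
    return picked
```

Asking for two or more unrelated tasks means a model would have to be trained for every combination a caller might choose.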
Deepfakes are getting better all the time
For now, few security experts are suggesting that people will need to use these CAPTCHA-like challenges for every Zoom meeting they take. But Mirsky and Farid both said that people might be wise to use them in high-stakes situations, such as a call between political leaders, or a meeting that might result in a high-value financial transaction. And both Farid and Mirsky urged people to be attuned to other possible red flags, such as audio calls from unfamiliar numbers or people behaving strangely or making unusual requests.
Farid says that for very important calls, people might use some kind of simple two-factor authentication, such as sending a text message to a mobile number you know to be the correct one for that person, asking if they are on a video call right now with you.
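One way to tighten that check, offered here as an assumption rather than something Farid describes, is to text a short one-time code to the known number and ask the person to read it back on the video call. The sketch below covers only generating and comparing the code; sending the message is left out, and the function names are hypothetical.

```python
import hmac
import secrets


def new_confirmation_code(digits: int = 6) -> str:
    """Generate a random numeric code, zero-padded to `digits` characters."""
    return f"{secrets.randbelow(10 ** digits):0{digits}d}"


def codes_match(expected: str, spoken: str) -> bool:
    """Compare the code that was sent with the one read back on the call.

    Surrounding whitespace in the typed-in answer is ignored, and the
    comparison runs in constant time.
    """
    return hmac.compare_digest(expected.encode(), spoken.strip().encode())
```

An impostor who controls only the video stream, and not the real person's phone, would have no way to learn the code.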
The researchers also emphasized that deepfakes are getting better all the time and that there is no guarantee that it won’t become much easier for them to evade any particular challenge—or even combinations of them—in the future.
That’s also why many researchers are trying to address the problem of live deepfakes from the opposite perspective—creating some sort of digital signature or watermark that would prove that a video call is authentic, rather than trying to uncover a deepfake.
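As a rough illustration of what "proving" a video stream could mean, the sketch below tags each frame with a keyed hash so that altered or reordered frames fail verification. It is a stand-in, not a real protocol: it assumes both parties already share a secret key, whereas a practical provenance scheme would more likely rely on public-key signatures, and the function names are hypothetical.

```python
import hashlib
import hmac


def sign_frame(key: bytes, seq: int, frame: bytes) -> bytes:
    """Return an HMAC-SHA256 tag over the frame and its sequence number."""
    message = seq.to_bytes(8, "big") + frame
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_frame(key: bytes, seq: int, frame: bytes, tag: bytes) -> bool:
    """True only if the tag matches this exact frame at this exact position."""
    return hmac.compare_digest(sign_frame(key, seq, frame), tag)
```

Including the sequence number in the tag means that replaying an old, genuine frame at a different point in the call is rejected as well.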
One group that might work on a protocol for verifying live video calls is the Coalition for Content Provenance and Authenticity (C2PA), a foundation dedicated to digital media authentication standards that’s backed by companies including Microsoft, Adobe, Sony, and Twitter. “I think the C2PA should pick this up because they have built a specification for recorded video and extending it for live video is a natural thing,” Farid says. But Farid admits that trying to authenticate data that is being streamed in real time is not an easy technological challenge. “I don’t see immediately how to do it, but it will be interesting to think about,” he says.
In the meantime, remind the guests on your next Zoom call to bring a pencil to the meeting.