Facebook has pioneered a number of artificial intelligence techniques to help it police content across its social networks, the company said Tuesday in a series of blog posts.
The details about the technology Facebook is using came on the same day the company released its latest quarterly update on its efforts to combat hate speech, child pornography, fake accounts, political misinformation, terrorist propaganda, and other violations of its community standards. The report showed the company has been combating a surge in hate speech and COVID-19-related misinformation since the start of the year.
Among the new A.I. systems Facebook highlighted on Tuesday are systems that better understand the meaning of language and the context in which it is used, as well as nascent systems that combine image and language processing in order to detect harmful memes.
As well as helping to combat misinformation related to COVID-19, Facebook has turned to new A.I. algorithms to police its new policy banning ads that seek to exploit the pandemic for profit by selling face masks, hand sanitizer, and other such items.
Facebook put warning labels on 50 million posts in April for possible misinformation around COVID-19, the company said in a blog post. It also said that since the beginning of March it has removed 2.5 million pieces of content that violated rules about selling personal protective equipment or coronavirus test kits.
Facebook said that thanks to the new techniques, 88.8% of the hate speech the social network took down in the past quarter was detected automatically before someone saw and flagged the offensive material for review by the company's human reviewers. This is up from about 80% in the previous quarter.
But the company said that the total amount of hate speech it's finding continues to rise—9.6 million pieces of content were removed in the first three months of 2020, 3.9 million more than in the previous three months.
Mike Schroepfer, Facebook's chief technology officer, said the increase was due to the company getting better at finding hateful content, not a surge in hate speech itself. "I think this is clearly attributable to technological advances," he said on a call with reporters ahead of the release of the report.
In particular, Facebook has built on advances in very large language-learning algorithms that have been developed only in the past three years. These models work by building a statistical picture of how the words in posted content relate to the other words that come before and after them. Facebook has developed a system called XLM-R, trained on two terabytes of data, or about the equivalent of all the words in half a million 300-page books. It learns the statistical map of all of those words across multiple languages at once. The idea is that conceptual commonalities between hate speech in any language will mean the statistical maps of hate speech will look similar across every language, even if the words themselves are completely different.
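XLM-R itself has been released for research, which makes the core idea easy to demonstrate outside Facebook. The sketch below is a minimal illustration, not Facebook's production pipeline: it assumes the open-source Hugging Face transformers library (plus torch and sentencepiece) and the public xlm-roberta-base checkpoint, embeds sentences by mean-pooling the model's hidden states, and compares them with cosine similarity, where a paraphrase in another language typically lands closer than an unrelated sentence.

```python
# Minimal sketch (not Facebook's production system): embed sentences with the
# publicly released XLM-R checkpoint and compare them across languages.
# Assumes: pip install torch transformers sentencepiece
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("xlm-roberta-base")
model = AutoModel.from_pretrained("xlm-roberta-base")
model.eval()

def embed(text: str) -> torch.Tensor:
    """Mean-pool the final hidden states into a single sentence vector."""
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state  # shape (1, seq_len, 768)
    return hidden.mean(dim=1).squeeze(0)            # shape (768,)

english = embed("I really enjoyed this film.")
spanish = embed("Disfruté mucho esta película.")    # same meaning in Spanish
unrelated = embed("The train departs at noon.")

cos = torch.nn.functional.cosine_similarity
print("cross-lingual paraphrase similarity:", cos(english, spanish, dim=0).item())
print("unrelated sentence similarity:      ", cos(english, unrelated, dim=0).item())
```

In a real moderation system, a classifier head fine-tuned on labeled examples would sit on top of these shared representations; the point of the shared multilingual map is that training data in one language can improve detection in others.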
Facebook is at pains to show it is making good on CEO Mark Zuckerberg's repeated promises that machine learning and A.I. will enable the company to combat the spread of hate speech, terrorist propaganda, and political misinformation across its platforms. Those problems have put Facebook in the crosshairs of regulators globally and turned many one-time fans against the company in the past four years.
"We are not naive," Schroepfer said. "A.I. is not the solution to every single problem and we believe that humans will be in the loop for the foreseeable future."
Much of the tech Facebook highlighted is designed to make the job of its human content moderators and associated fact-checking organizations easier and less repetitive.
That is especially important at a time when social distancing measures instituted by the company as well as by various countries have meant that the centers where many of its human content moderators work have had to close, and the reviewers, many of whom are contractors, have been sent home. In some cases, Schroepfer said, the company has found ways for these people to continue their work from home, although that has not been possible in all cases.
"We want people making the final decisions, especially when the situation is nuanced," Schroepfer said. "But we want to give people we work with every day power tools." For instance, he said, if a human reviewer decided that a whole class of images constituted misinformation, Facebook should be able to automatically apply that label to similar content across both Facebook and Facebook-owned Instagram without the human reviewers having to find and manually remove all of it.
One way people try to evade Facebook's content blacklists is by making small modifications to blocked content, such as altering some pixels in an image or applying a photo filter, and then uploading it again in the hope that it sneaks past Facebook's algorithms. To battle these tactics, the company has developed a new A.I. system, called SimSearchNet, trained to find pieces of nearly identical content.
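SimSearchNet itself is proprietary and unpublished, so the sketch below is only a stand-in for the general idea of near-duplicate matching, using the open-source Pillow and imagehash libraries: perceptual hashes change little under small pixel edits or filters, so a small Hamming distance between a new upload and a known blocked image suggests a disguised re-upload. The file names are hypothetical.

```python
# Illustrative stand-in for near-duplicate matching (NOT SimSearchNet, which
# is proprietary): perceptual hashes are stable under small pixel edits, so a
# small Hamming distance suggests a re-upload of previously blocked content.
# Assumes: pip install Pillow imagehash; the file paths are hypothetical.
from PIL import Image
import imagehash

THRESHOLD = 8  # max Hamming distance (out of 64 bits) to count as a match

def is_near_duplicate(candidate_path: str, blocked_path: str) -> bool:
    candidate = imagehash.phash(Image.open(candidate_path))
    blocked = imagehash.phash(Image.open(blocked_path))
    distance = candidate - blocked  # imagehash overloads '-' as Hamming distance
    return distance <= THRESHOLD

# Hypothetical usage: a filtered or lightly edited copy should still match.
print(is_near_duplicate("reupload_with_filter.jpg", "blocked_original.jpg"))
```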
Another computer vision system the company has deployed to enforce its new COVID-19 ad policy works by identifying the objects present in an image, not simply forming a statistical map of all of the pixels it contains. This way the algorithm should be able to determine that the image has a face mask in it, even if that face mask is rotated at a funny angle or shown against a background designed to make it harder for machine learning software to recognize it, Schroepfer said.
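Facebook has not published that ad-enforcement model either, but the object-centric approach it describes resembles off-the-shelf object detection. As a hedged illustration under that assumption, the sketch below runs a pretrained Faster R-CNN from torchvision, which predicts labeled boxes for whole objects rather than matching raw pixel statistics; its stock COCO categories do not include face masks, so a real mask detector would need fine-tuning on mask-labeled images. The input file name is hypothetical.

```python
# Hedged illustration of object-level detection (not Facebook's ad classifier):
# a pretrained Faster R-CNN predicts labeled boxes for whole objects, which is
# more robust to rotation and cluttered backgrounds than raw pixel statistics.
# Assumes: pip install torch torchvision Pillow; "ad_image.jpg" is hypothetical.
import torch
from PIL import Image
from torchvision.models.detection import fasterrcnn_resnet50_fpn
from torchvision.transforms.functional import to_tensor

model = fasterrcnn_resnet50_fpn(pretrained=True)
model.eval()

image = to_tensor(Image.open("ad_image.jpg").convert("RGB"))
with torch.no_grad():
    predictions = model([image])[0]  # dict of boxes, labels, scores

for label, score in zip(predictions["labels"], predictions["scores"]):
    if score > 0.8:  # keep only confident detections
        print(f"object class id {label.item()} with confidence {score:.2f}")
```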
Finally, the company said it was also working on "multimodal" machine learning systems—ones that can simultaneously analyze text and imagery, and in the future, possibly video and sound too—to combat the spread of hateful memes.
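Facebook did not detail the architecture, but a common baseline for such multimodal systems is late fusion: encode the image and the text separately, concatenate the two feature vectors, and classify the pair. The toy PyTorch module below is a sketch under that assumption, with made-up feature dimensions and untrained weights; it is not the company's model.

```python
# Toy late-fusion classifier (an assumed baseline, not Facebook's model):
# separate image and text features feed one classification head, so the
# decision can depend on the combination of picture and caption, which is
# often what makes a hateful meme hateful.
import torch
import torch.nn as nn

class LateFusionMemeClassifier(nn.Module):
    def __init__(self, image_dim=2048, text_dim=768, hidden=512):
        super().__init__()
        # In practice these inputs would come from a CNN/ViT and a language
        # model; here we assume precomputed feature vectors of these sizes.
        self.head = nn.Sequential(
            nn.Linear(image_dim + text_dim, hidden),
            nn.ReLU(),
            nn.Linear(hidden, 2),  # logits: [not hateful, hateful]
        )

    def forward(self, image_feats: torch.Tensor, text_feats: torch.Tensor):
        fused = torch.cat([image_feats, text_feats], dim=-1)
        return self.head(fused)

# Hypothetical usage with random stand-in features for a batch of 4 memes.
model = LateFusionMemeClassifier()
logits = model(torch.randn(4, 2048), torch.randn(4, 768))
print(logits.softmax(dim=-1))  # per-meme class probabilities
```

As the benchmark results below suggest, even much stronger versions of this idea currently fall well short of human reviewers.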
To that end, the company has created a new dataset of 10,000 memes that were determined to be part of hate speech campaigns, and it is making the set freely available for researchers to use to build A.I. systems capable of detecting such material. The company is also creating a competition with a $100,000 prize pool to find the best hateful-meme detection software, on the condition that researchers who enter commit to open-sourcing their algorithms.
As a benchmark, Facebook's A.I. researchers created several systems of their own and trained them on this dataset. But the company's results so far indicate how difficult the challenge is: Facebook's best hateful-meme detector, which was pretrained on a very large dataset of both text and images simultaneously, was only 63% accurate. Human reviewers, by contrast, were about 85% accurate and missed fewer than 20% of the memes they should have caught.