Back in January, I wrote a big story for Fortune about the ongoing revolution in natural language processing. These are A.I. systems that can manipulate and, to some degree, “understand” language.
Language processing is now entering a kind of golden age, in which once impossible tasks are increasingly within reach. These new systems are already starting to transform how businesses operate—and they stand poised to do so in a much bigger way in the coming years.
This summer has seen some startling examples of what these methods can accomplish. The most discussed breakthrough has been OpenAI's GPT-3, which can generate long passages of coherent prose from a human-written prompt of just a line or two. In many cases, what the system generates is indistinguishable from human-written text.
GPT-3 is, for the moment, still something of a party trick—it is difficult to control, for instance, whether what the system generates is factually accurate, or to filter out racist or misogynistic ideas that it might have picked up from its large training set (which included not only the complete works of Shakespeare, but such repositories of human virtue as Reddit). But some companies are starting to build real products around it: One is creating a system that will generate complete emails from just a few bullet points. And a legal technology firm is experimenting with GPT-3 to see if it can aid in litigation discovery and compliance.
Another San Francisco A.I. company, Primer, creates software that helps analyze documents. It counts a number of U.S. intelligence agencies among its customers. On August 18, it unveiled a website, Primer Labs, that showcases three NLP systems it has built in the past year and allows anyone to upload any text to play around with the tech.
I had interviewed John Bohannon, Primer’s Director of Science, back in December for that feature about NLP. Last week, I caught up with him again by Zoom. Bohannon told me things have only accelerated since we first talked.
He describes what is happening in NLP as “an industrial revolution,” where it is now becoming possible to string together multiple NLP tools—much the same way a mechanical engineer might combine boilers, flywheels, conveyor belts and presses—to create systems that can do real work in real businesses. And building these systems is getting easier and easier. “What used to take months,” he says, “now takes a week.”
Bohannon gave me early access to Primer Labs to let me experiment on texts of my own choosing.
The first tool: question-answering.
Upload any document and you can then ask questions in natural language to prompt the system to find an answer in the text. The system also suggests questions that you might want to ask.
·The software was fantastic at answering a series of questions about a simple news story on Joe Biden’s selection of Kamala Harris as his veep pick.
·However, when I uploaded a 2012 Securities and Exchange Commission filing from the pharmaceutical giant Merck that runs to 159 pages and about 100,000 words, its performance was hit-and-miss. When I asked it what Merck's sales were in 2011, it returned the correct answer: $48 billion. But when I asked it what the company’s operating profit was, I received a message that the software “was having trouble answering that particular question.” And when I asked it what the company’s revenue recognition policies were, I received the inaccurate but hilarious reply that “non-GAAP EPS is the company's revenue recognition policies.”
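The extractive question-answering interface described above can be illustrated with a deliberately naive baseline. Primer's actual models are proprietary, so the sketch below stands in for them with the classic pre-neural approach such systems improve on: return the document sentence that shares the most content words with the question.

```python
import re

# Toy stand-in for an extractive QA tool like Primer's. This is NOT how
# Primer's system works; it is a naive keyword-overlap baseline that just
# illustrates the "ask a question, get a span from the text" interface.

STOPWORDS = {"what", "were", "was", "is", "the", "a", "an", "of",
             "in", "to", "did", "does", "how", "who", "which", "s"}

def naive_qa(document, question):
    """Return the document sentence with the most question keywords."""
    q_words = {w for w in re.findall(r"\w+", question.lower())
               if w not in STOPWORDS}
    sentences = re.split(r"(?<=[.!?])\s+", document.strip())

    def score(sentence):
        # Count question keywords appearing in this sentence.
        return len(q_words & set(re.findall(r"\w+", sentence.lower())))

    return max(sentences, key=score)

doc = ("Merck reported worldwide sales of $48 billion for 2011. "
       "The company employs roughly 86,000 people. "
       "Its headquarters are in New Jersey.")
print(naive_qa(doc, "What were Merck's sales in 2011?"))
# -> Merck reported worldwide sales of $48 billion for 2011.
```

A baseline like this answers the easy factual questions but fails exactly where the real tool also struggled: anything requiring aggregation or inference across sentences, such as the operating-profit question.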
The next Primer tool: “named entity recognition.”
This is the task of identifying all the proper names in a document and figuring out which pronouns in the text refer to which people or which organizations. This task is relatively easy—if time-consuming—for humans, but it's historically stumped computers. It is a good example of a skill that is now within software’s grasp thanks to the NLP revolution. In benchmark tests Primer has published, its system has outperformed similar software created by Google and Facebook.
·I tried to stump Primer’s software by giving it a passage about the 19th-century French authors George Sand and Victor Hugo. I was hoping that the fact Sand is the male nom de plume of a female writer (her real name was Amantine Lucile Aurore Dupin) would confuse the system when it had to decide whether the pronoun “he” belonged to Sand or Hugo. But, to my surprise, the system performed flawlessly, understanding that every “he” in the passage referred to Hugo while “she” referred to Sand.
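The pronoun-resolution half of this task can be sketched with a toy rule-based resolver. Real coreference systems, presumably including Primer's, use learned models; this sketch just links each gendered pronoun to the most recently mentioned entity of matching gender, and the hypothetical `ENTITY_GENDER` table makes the George Sand trap explicit: what matters is the referent's gender, not what the name sounds like.

```python
# Hypothetical gender annotations that an upstream entity-linking step
# might supply. A system keying off the name "George" alone would get
# Sand wrong, which is exactly the trap described in the article.
ENTITY_GENDER = {"Victor Hugo": "male", "George Sand": "female"}

def resolve_pronouns(tokens):
    """Link gendered pronouns to the last-seen entity of matching gender."""
    last_seen = {}   # gender -> most recent entity mention
    links = []
    for tok in tokens:
        if tok in ENTITY_GENDER:
            last_seen[ENTITY_GENDER[tok]] = tok
        elif tok.lower() in ("he", "him", "his"):
            links.append((tok, last_seen.get("male", "?")))
        elif tok.lower() in ("she", "her", "hers"):
            links.append((tok, last_seen.get("female", "?")))
    return links

passage = ["George Sand", "met", "Victor Hugo", ";",
           "she", "admired", "his", "plays", "and",
           "he", "praised", "her", "novels", "."]
print(resolve_pronouns(passage))
# -> [('she', 'George Sand'), ('his', 'Victor Hugo'),
#     ('he', 'Victor Hugo'), ('her', 'George Sand')]
```

Rules like "most recent entity of matching gender" break down quickly on real prose, which is why this task historically stumped computers and why learned coreference models were a genuine advance.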
The final and perhaps most difficult task Primer Labs’ tools perform: summarization.
Accurately summarizing long documents is difficult for humans too. And gauging how useful a summary is can be highly subjective. But Primer came up with a clever way to automatically judge summary quality based on BERT, a very large language model that Google created and has made freely available. BERT is what is known as a “masked language model,” because its training consists of learning how to correctly guess what a hidden word in a text is. Primer's BLANC judges summaries by assessing how much better BERT performs in this fill-in-the-blank game after having accessed the summary. The better BERT does, the better the summary. Thanks to BLANC, Primer was able to train a summarization tool that can generate pretty fluent summaries.
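The BLANC idea, scoring a summary by how much it helps a model fill in masked words, can be shown in miniature. The real BLANC uses BERT's masked-word predictions; the toy "model" below simply guesses a masked document word correctly when that word also appears in the helper text it was shown, so the score reduces to the accuracy gain from seeing the summary.

```python
import re

# Conceptual sketch of BLANC-style summary scoring, NOT Primer's
# implementation. A summary that reuses the document's key vocabulary
# helps the "model" guess masked words, so it earns a higher score.

def toy_masked_accuracy(document, helper):
    """Fraction of document words a toy model guesses given helper text."""
    doc_words = re.findall(r"\w+", document.lower())
    helper_words = set(re.findall(r"\w+", helper.lower()))
    hits = sum(1 for w in doc_words if w in helper_words)
    return hits / len(doc_words)

def blanc_style_score(document, summary):
    # Gain in fill-in-the-blank accuracy from having the summary,
    # versus having no help at all.
    return (toy_masked_accuracy(document, summary)
            - toy_masked_accuracy(document, ""))

doc = "AstraZeneca leapt ahead of rival drugmakers in the race for a vaccine"
good = "AstraZeneca leads the vaccine race"
bad = "The weather was pleasant in London"
print(blanc_style_score(doc, good) > blanc_style_score(doc, bad))  # -> True
```

The appeal of this setup is that it needs no human-written reference summaries: any summary can be scored against any document, which is what makes it usable as a training signal.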
·I fed Primer’s summarization tool a feature story I wrote for Fortune’s August/September double-issue about how AstraZeneca has managed to leap ahead of its Big Pharma rivals in the quest for a COVID-19 vaccine. I was impressed at how well the software did in abstracting the lengthy article. It captured key points about AstraZeneca’s corporate turnaround as well as the importance of a COVID-19 vaccine.
·But the system is still far from perfect. Another part of the tool tries to reduce the text to just a handful of key bullet points instead of whole paragraphs. Here the results were strangely off-base: The software fixated on factual information from an anecdote at the beginning of the article that was not essential, and yet missed crucial points contained further down in the body of the piece.
·For a laugh, I fed the system T.S. Eliot’s “The Love Song of J. Alfred Prufrock.” Bohannon had warned me that the software would struggle to summarize more creative writing, particularly poetry, and the results were not pretty. Other than the fact that “the women come and go, speaking of Michelangelo,” the system wasn’t really sure what was happening. A lot of high school students could probably sympathize. But no English teacher would give Primer’s results high marks. (Interestingly, GPT-3 isn't half bad at writing poetry. But that doesn't mean it has any real understanding of what it's writing.)
Then again, poetry is probably not the most pressing business case for Primer’s products. Summarization is a huge potential market. In 1995, the average daily reading requirement of a U.S. intelligence analyst assigned to follow the events in one country was just 20,000 words (or about the equivalent of two New Yorker longreads). By 2016, the same analyst’s daily reading load was estimated at 200,000 words—more than the most capable speed reader could possibly skim in 24 hours. This phenomenon is affecting analysts in finance and law too, and is a huge issue for people in the sciences trying to keep up with the explosion in published research. (In fact, to help out during the pandemic, Primer has created a site that summarizes each day’s new research papers on COVID-19.)
So the NLP revolution has arrived not a moment too soon. Automated tools that help condense and summarize and extract information from written text are becoming more and more essential. Today’s NLP isn’t perfect—but it is getting good enough to make a difference.