Source: Fortune China (财富中文网). Chinese translation: Liu Jinlong; review: Wang Hao.
Researchers have made a major breakthrough using artificial intelligence that could revolutionize the hunt for new medicines.
The scientists have created A.I. software that uses a protein’s DNA sequence to predict its three-dimensional structure to within an atom’s width of accuracy.
The achievement, which solves a 50-year-old challenge in molecular biology, was accomplished by a team from DeepMind, the London-based artificial intelligence company that is part of Google parent Alphabet. Until now, DeepMind was best known for creating A.I. that could beat the best human players at the strategy game Go, a major milestone in computer science.
DeepMind achieved the protein shape breakthrough in a biennial competition for algorithms that can be used to predict protein structures. The competition asks participants to take a protein’s DNA sequence and then use it to determine the protein’s three-dimensional shape.
Across more than 100 proteins, DeepMind’s A.I. software, which it called AlphaFold 2, was able to predict the structure to within about an atom’s width of accuracy in two-thirds of cases and was highly accurate in most of the remaining one-third of cases, according to John Moult, a molecular biologist at the University of Maryland who is director of the competition, called the Critical Assessment of Structure Prediction, or CASP. It was far better than any other method in the competition, he said.
Demis Hassabis, DeepMind’s cofounder and chief executive officer, said the company wants “to make the maximal positive societal impact with these technologies.” But he said DeepMind had not yet determined how it would provide academic researchers with access to the protein structure prediction software or whether it would seek commercial collaborations with pharmaceutical and biotechnology firms. He said the company would announce “further details on how we’re going to be able to give access to the system in a scalable way” sometime next year.
“This computational work represents a stunning advance on the protein-folding problem,” Venki Ramakrishnan, a Nobel Prize–winning structural biologist who is also the outgoing president of the Royal Society, Britain’s most prestigious scientific body, said of AlphaFold 2.
Janet Thornton, an expert in protein structure and former director of the European Molecular Biology Laboratory’s European Bioinformatics Institute, said that DeepMind’s breakthrough opened up the way to mapping the entire “human proteome”—the set of all proteins found within the human body. Currently, only about a quarter of human proteins have been used as targets for medicines, she said. Now, many more proteins could be targeted, creating a huge opportunity to invent new medicines.
Thornton also said that DeepMind’s A.I. system would have profound implications for scientists who create synthetic proteins, and that such proteins could have big impacts too: everything from new genetically modified crop strains that are far more nutritious to new enzymes that could help clean up the environment by digesting plastics.
Proteins are the basic mechanisms of biological processes. They are formed from long chains of amino acids, coded for in DNA, but once manufactured by a cell, they fold themselves spontaneously into complex shapes that often resemble a tangle of cord, with ribbons and curlicue-like appendages. The exact structure of a protein is essential to its function. It is also critical for designing small molecules that might be able to bind with the protein and alter this function, which is how new medicines are created.
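Throughout, the article describes AlphaFold 2’s input as a protein’s “DNA sequence.” As the paragraph above notes, DNA codes for the amino acid chain: each three-base codon specifies one amino acid, and it is the resulting amino acid sequence that folds. The sketch below shows that decoding step, assuming the standard genetic code and a clean coding sequence (no introns, no ambiguous bases). The function `translate` and the example sequence are illustrative and are not part of AlphaFold or any DeepMind tool.

```python
# Standard genetic code (NCBI translation table 1), with codons listed in
# TCAG order: the first base varies slowest and the third base fastest.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map every one of the 64 codons to its one-letter amino acid code ("*" = stop).
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence into one-letter amino acid codes.

    Reads codons from the first base, stops at the first stop codon, and
    ignores trailing bases that do not form a full codon. Characters other
    than A, C, G, T raise KeyError.
    """
    dna = dna.upper()
    protein = []
    for pos in range(0, len(dna) - 2, 3):
        residue = CODON_TABLE[dna[pos:pos + 3]]
        if residue == "*":  # stop codon ends the protein chain
            break
        protein.append(residue)
    return "".join(protein)


print(translate("ATGGCTTGGAAATAA"))  # prints "MAWK"
```

A real pipeline would rely on a maintained library such as Biopython rather than a hand-written table, and would handle alternative genetic codes and ambiguous bases.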
Until now, the primary way to obtain a high-resolution model of a protein’s structure was through a method called X-ray crystallography. In this technique, a solution of proteins is turned into a crystal, itself a difficult and time-consuming process, and then this crystal is bombarded with X-rays, often from a large circular particle accelerator called a synchrotron. The diffraction pattern of the X-rays allows researchers to build up a picture of the internal structure of the protein. It takes about a year and costs about $120,000 to obtain the structure of a single protein through X-ray crystallography, according to an estimate from the University of Toronto.
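The paragraph above compresses the physics into one sentence: the diffraction pattern lets researchers build up a picture of the protein’s interior. The basic geometry behind each diffraction spot is Bragg’s law, nλ = 2d sin θ, which links the angle at which X-rays reflect strongly to the spacing d between repeating planes of atoms in the crystal. The sketch below only illustrates that relation; the function name is hypothetical, the 1.0 angstrom wavelength is an assumed value, and turning a real diffraction pattern into an atomic model also requires recovering phases and much more.

```python
import math


def bragg_spacing(wavelength_angstrom: float, theta_degrees: float, order: int = 1) -> float:
    """Lattice-plane spacing d from Bragg's law, n * lambda = 2 * d * sin(theta).

    theta is the glancing angle, i.e. half of the scattering angle 2-theta.
    """
    theta = math.radians(theta_degrees)
    return order * wavelength_angstrom / (2 * math.sin(theta))


# Hypothetical X-ray beam with an assumed wavelength of 1.0 angstrom.
for theta in (5, 15, 30):
    print(f"theta = {theta:>2} deg -> d = {bragg_spacing(1.0, theta):.2f} angstrom")
```

Larger angles correspond to finer spacings, which is why resolving atomic-scale detail depends on recording reflections at wide angles.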
More recently, two other experimental methods—nuclear magnetic resonance and cryogenic electron microscopy—have also been used. They can be faster and less expensive but tend to produce models that are less precise than X-ray crystallography.
It takes AlphaFold 2 “a matter of days” to calculate each protein structure using what John Jumper, the researcher who leads the protein-folding team at DeepMind, characterized as “modest” computing resources. Training the system required 128 specialized A.I. computing units on 16 chips created by Google, called tensor processing units, running continuously for “roughly a few weeks,” Jumper said. He noted that this is much less computing power than has been required for many other recent A.I. breakthroughs, including DeepMind’s previous work on Go.
In 1972, Nobel Prize–winning chemist Christian Anfinsen postulated that a protein’s amino acid sequence, which is coded for in DNA, should alone fully determine what final structure the protein takes. That supposition set off the decades-long quest to find a mathematical model that could do what Anfinsen was proposing. The problem was, however, that even though the laws of physics control how a protein folds, there are so many possible permutations that biologist Cyrus Levinthal famously estimated it would take longer than the age of the known universe to puzzle out a single protein’s structure through random trial and error.
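Levinthal’s point is combinatorial: a few possible conformations per amino acid, multiplied across a whole chain, give an astronomically large number of candidate shapes. The back-of-envelope sketch below assumes a 100-residue chain, 3 conformations per residue, and 10^13 conformations tried per second. These are illustrative figures in the spirit of the argument, not numbers taken from Levinthal or from this article, and the function name is hypothetical.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 13.8e9


def levinthal_search_years(residues: int, states_per_residue: int, samples_per_second: float) -> float:
    """Years needed to try every conformation one at a time (exhaustive search)."""
    conformations = states_per_residue ** residues  # exact integer count
    return conformations / samples_per_second / SECONDS_PER_YEAR


# Illustrative assumptions: 100 residues, 3 conformations each,
# and 10**13 conformations sampled per second.
years = levinthal_search_years(100, 3, 1e13)
print(f"{years:.1e} years, about {years / AGE_OF_UNIVERSE_YEARS:.0e} times the age of the universe")
```

With these assumptions the result is on the order of 10^27 years, far beyond the roughly 13.8 billion years since the Big Bang.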
But DeepMind’s AlphaFold 2 has now essentially done what Anfinsen suggested. AlphaFold 2 is “on par” with X-ray crystallography across more than two-thirds of the proteins in the CASP competition, Moult said. Now the hope is that researchers will be able to use AlphaFold 2, or at least the same method, to go directly from a protein’s DNA sequence, which has become relatively easy and inexpensive to obtain, to knowing its 3D shape, without having to use X-ray crystallography or other physical experiments at all.
Andrei Lupas, director of the department of protein evolution at the Max Planck Institute for Developmental Biology in Tübingen, Germany, who served as one of the assessors for this year’s CASP competition, called DeepMind’s results “astonishing.”
As part of CASP’s efforts to verify the capabilities of DeepMind’s system, Lupas used the predictions from AlphaFold 2 to see if it could solve the final portion of a protein’s structure that he had been unable to complete using X-ray crystallography for more than a decade. With the predictions generated by AlphaFold 2, Lupas said he was able to determine the shape of the final protein segment in just half an hour.
AlphaFold 2 has also already been used to accurately predict the structure of ORF3a, a protein found in SARS-CoV-2, the virus that causes COVID-19. Scientists might be able to use ORF3a as a target for future treatments.
Lupas said he thought the A.I. software would “change the game entirely” for those who work on proteins. Currently, DNA sequences are known for about 200 million proteins, and tens of millions more are being discovered every year. But 3D structures have been mapped for fewer than 200,000 of them.
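Lupas’s numbers make the gap easy to quantify. The sketch below combines them with the University of Toronto estimate quoted earlier (about a year and $120,000 per structure by X-ray crystallography). It is deliberately naive: it assumes every unsolved protein would need its own crystallography project, and it ignores that closely related proteins often share structures and that faster NMR and cryo-EM methods exist.

```python
# Approximate figures quoted in the article.
KNOWN_SEQUENCES = 200_000_000
SOLVED_STRUCTURES = 200_000         # "fewer than 200,000", so an upper bound
CRYSTALLOGRAPHY_COST_USD = 120_000  # University of Toronto estimate per protein
CRYSTALLOGRAPHY_YEARS = 1           # about a year per protein

coverage = SOLVED_STRUCTURES / KNOWN_SEQUENCES
unsolved = KNOWN_SEQUENCES - SOLVED_STRUCTURES

print(f"Share of known sequences with a mapped 3D structure: under {coverage:.1%}")
print(f"Proteins without a mapped structure: about {unsolved:,}")
print(f"Naive crystallography cost for the rest: about ${unsolved * CRYSTALLOGRAPHY_COST_USD:,}")
print(f"Naive crystallography effort: about {unsolved * CRYSTALLOGRAPHY_YEARS:,} project-years")
```

The result, roughly $24 trillion and about 200 million project-years, is not a real forecast; it only shows the scale of the gap that a method working from sequence alone could close.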
AlphaFold 2 was only trained to predict the structure of single proteins. But in nature, proteins are often present in complex arrangements with other proteins. Jumper said the next step was to develop an A.I. system that could predict complicated dynamics between proteins—such as how two proteins will bind to one another or the way that proteins in close proximity morph one another’s shapes.
DeepMind had entered and won the CASP competition two years ago. But at the time, using an A.I. system called AlphaFold that was configured differently, it was only able to achieve an average “global distance test total score” (GDT) of 58 on the hardest class of proteins. GDT is a measure that is approximately equivalent to the percentage of each protein that a system accurately maps.
Although this was about six points better than the next best team, it was not a result that was competitive with empirical methods like X-ray crystallography. This year, even on these hardest proteins, DeepMind achieved a median GDT of 87, which is close to being as good as crystallography and was about 26 points better than its nearest competitor.
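The article glosses the global distance test total score as roughly the percentage of each protein that a system accurately maps. In CASP’s GDT_TS, that percentage is measured at four distance cutoffs, counting residues whose predicted C-alpha atom lies within 1, 2, 4, and 8 angstroms of its experimentally determined position, and the four percentages are averaged. The sketch below is simplified: it assumes the predicted and experimental structures have already been superimposed, whereas the official measure searches over many superpositions. The function name `gdt_ts` and the toy coordinates are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]

GDT_TS_CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # distance cutoffs in angstroms


def gdt_ts(predicted: Sequence[Point], reference: Sequence[Point]) -> float:
    """Simplified GDT_TS score (0-100) for one fixed superposition.

    For each cutoff, take the percentage of residues whose predicted C-alpha
    position lies within that distance of the experimental position, then
    average the four percentages.
    """
    if len(predicted) != len(reference) or not reference:
        raise ValueError("need two equal-length, non-empty coordinate lists")
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    percents = [
        100.0 * sum(d <= cutoff for d in distances) / len(distances)
        for cutoff in GDT_TS_CUTOFFS
    ]
    return sum(percents) / len(percents)


# Toy 4-residue chain: each predicted atom is displaced along y by a set error.
reference = [(3.8 * i, 0.0, 0.0) for i in range(4)]
errors = (0.5, 1.5, 3.0, 10.0)
predicted = [(x, dy, z) for (x, _, z), dy in zip(reference, errors)]
print(gdt_ts(predicted, reference))  # prints 56.25
```

Here one residue is within 1 angstrom, two within 2, and three within both 4 and 8, giving (25 + 50 + 75 + 75) / 4 = 56.25. Because the four percentages are averaged and none can exceed 100, a score of 87 on a single target requires more than 80 percent of its residues to lie within 4 angstroms.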