Chinese translation: Fortune China (财富中文网), translated by Piao Chengkui.
Andrew Ng is among the pioneers of deep learning, the use of large neural networks in A.I. He’s also one of the most thoughtful A.I. experts on how real businesses are using the technology. His company, Landing AI, where Ng is founder and CEO, is building software that makes it easy for people, even those without coding skills, to build and maintain A.I. systems. This should allow almost any business to adopt A.I., especially computer vision applications. Landing AI’s customers include major manufacturing firms such as toolmaker Stanley Black & Decker, electronics manufacturer Foxconn, and automotive parts maker Denso.
Ng has become an evangelist for what he calls “data-centric A.I.” The basic premise is that state-of-the-art A.I. algorithms are increasingly ubiquitous thanks to open-source repositories and the publication of cutting-edge A.I. research. Companies that would struggle to hire PhDs from top computer science schools can nonetheless access the same software code that Google or NASA might use. The real differentiator between businesses that are successful at A.I. and those that aren’t, Ng argues, comes down to data: what data is used to train the algorithm, how it is gathered and processed, and how it is governed. Data-centric A.I., Ng tells me, is the practice of “smartsizing” data so that a successful A.I. system can be built using the least amount of data possible. He says that “the shift to data-centric A.I.” is the most important shift businesses need to make today to take full advantage of A.I., and he calls it as important as the shift to deep learning over the past decade.
Ng says that if data is carefully prepared, a company may need far less of it than it thinks. With the right data, he says, companies with just a few dozen or a few hundred examples can have A.I. systems that work as well as those built by consumer internet giants with billions of examples. He says one of the keys to extending the benefits of A.I. to companies beyond the online giants is to use techniques that let A.I. systems be trained effectively on much smaller datasets.
What’s the right data? Well, Ng has some tips, including making sure that data is what he calls “y consistent” (y being the standard symbol for the label a model is trained to predict). In essence, this means there should be a clear boundary between when something receives a particular classification label and when it doesn’t. (For example, take an A.I. designed to find defects in pills for a pharma company. Such a system will perform better with less training data if every scratch below a certain length is labelled “not defective” and every scratch longer than that threshold is labelled “defective,” than if there is no consistency in which scratch lengths are labelled defective.)
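The threshold rule in the pill example can be sketched in a few lines of Python. This is a minimal illustration, not Landing AI’s method: the 0.3 mm threshold, the `Example` record and the `find_inconsistent` helper are hypothetical, and the sketch assumes each image already comes with a measured scratch length. It applies one unambiguous labeling rule and flags existing labels that contradict it.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold: the article only says "a certain length",
# so 0.3 mm is an illustrative value.
SCRATCH_THRESHOLD_MM = 0.3


@dataclass
class Example:
    image_id: str
    scratch_length_mm: float
    label: str  # "defective" or "not defective"


def rule_label(scratch_length_mm: float, threshold_mm: float = SCRATCH_THRESHOLD_MM) -> str:
    """One unambiguous rule: scratches longer than the threshold are defective."""
    return "defective" if scratch_length_mm > threshold_mm else "not defective"


def find_inconsistent(examples: List[Example], threshold_mm: float = SCRATCH_THRESHOLD_MM) -> List[Example]:
    """Return the examples whose existing label contradicts the threshold rule."""
    return [ex for ex in examples if ex.label != rule_label(ex.scratch_length_mm, threshold_mm)]


if __name__ == "__main__":
    data = [
        Example("pill_001", 0.1, "not defective"),
        Example("pill_002", 0.5, "defective"),
        Example("pill_003", 0.2, "defective"),      # short scratch marked defective
        Example("pill_004", 0.8, "not defective"),  # long scratch marked fine
    ]
    for ex in find_inconsistent(data):
        # Reports pill_003 and pill_004 together with the label the rule assigns.
        print(ex.image_id, ex.label, "->", rule_label(ex.scratch_length_mm))
```

A scratch exactly at the threshold counts as “not defective” here; what matters for consistency is that the boundary is fixed and written down, whichever side it falls on.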
He says that one way to spot data inconsistencies is to assign the same images in a training set to multiple people to label. If their labels don’t agree, the person designing the system can decide on the correct label or discard that example from the training set. Ng also urges those curating data sets to clarify labeling instructions by tracking down ambiguous examples: tricky cases that are likely to lead to inconsistent labels. Any examples that are unclear or confusing should be eliminated from the data set altogether, he says. Finally, he says people should analyze the errors an A.I. system makes to figure out which subsets of examples tend to trip it up. Adding just a few examples to those key subsets improves performance faster than adding more examples where the software is already doing well. He also says that A.I. users should treat data curation, data improvement, and retraining the A.I. on updated data as an ongoing cycle, not something done only once.
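Two of the practices Ng describes, checking whether several labelers agree on the same image and measuring which subsets of examples the system gets wrong most often, can be sketched as follows. The function names, the dictionary layouts and the subset tags (such as “glare”) are hypothetical; the sketch assumes each prediction has already been tagged with the subset it belongs to.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def find_disagreements(annotations: Dict[str, List[str]]):
    """Split images into unanimously labelled and disputed ones.

    `annotations` maps an image id to the labels given by different people.
    Disputed images map to their label counts, so a designer can pick the
    correct label or drop the image. Images with fewer than two labels are
    left out of both groups, since agreement cannot be checked for them.
    """
    agreed: Dict[str, str] = {}
    disputed: Dict[str, Counter] = {}
    for image_id, labels in annotations.items():
        counts = Counter(labels)
        if len(counts) > 1:
            disputed[image_id] = counts
        elif len(labels) >= 2:
            agreed[image_id] = labels[0]
    return agreed, disputed


def error_rate_by_subset(results: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Compute the error rate per data subset, worst subset first.

    Each result is a (subset, predicted_label, actual_label) tuple.
    """
    totals: Dict[str, int] = defaultdict(int)
    errors: Dict[str, int] = defaultdict(int)
    for subset, predicted, actual in results:
        totals[subset] += 1
        if predicted != actual:
            errors[subset] += 1
    rates = {subset: errors[subset] / totals[subset] for subset in totals}
    return dict(sorted(rates.items(), key=lambda item: item[1], reverse=True))


if __name__ == "__main__":
    agreed, disputed = find_disagreements({
        "img_1": ["defective", "defective", "defective"],
        "img_2": ["defective", "not defective", "defective"],
    })
    print(agreed)    # {'img_1': 'defective'}
    print(disputed)  # img_2 needs a decision or removal

    results = [
        ("glare", "ok", "defect"),
        ("glare", "defect", "defect"),
        ("normal", "ok", "ok"),
        ("normal", "ok", "ok"),
    ]
    print(error_rate_by_subset(results))  # {'glare': 0.5, 'normal': 0.0}
```

Disputed images can then be relabelled by the system designer or dropped, and the subsets at the top of the error table are the candidates for a few targeted extra examples.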
The idea that building and training A.I. models is a continuous cycle, not a one-off project, also comes across in a recent report on A.I. adoption from consulting firm Accenture. It found that only 12% of the 1,200 companies it studied globally have advanced their A.I. maturity to the stage where they are seeing superior growth and business transformation. (Another 25% are somewhat advanced in their deployment of A.I., while the rest are, at most, still running pilot projects.) What sets that 12% apart? One factor Accenture identifies is that they have “industrialized” A.I. tools and processes and have built a strong core A.I. team. Other key factors are organizational: their top executives champion A.I. as a strategic priority; they invest heavily in A.I. talent; they design A.I. responsibly from the start; and they prioritize both long- and short-term A.I. projects.
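Assuming the reported percentages apply to the full sample of 1,200 companies (and allowing for rounding in the published figures), the breakdown works out roughly as follows:

```latex
\begin{aligned}
0.12 \times 1200 &= 144 && \text{(seeing superior growth and transformation)}\\
0.25 \times 1200 &= 300 && \text{(somewhat advanced)}\\
1200 - 144 - 300 &= 756 && (63\%,\ \text{pilot projects at most})
\end{aligned}
```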