這家德國初創(chuàng)AI公司專注于分析表格數(shù)據(jù)

Jeremy Kahn
2025-02-23

Prior Labs has raised 9 million euros to build breakthrough AI models that can handle the data found in tables and spreadsheets.

文本設(shè)置
小號
默認
大號
Plus(0條)

Image credit: Photo courtesy of Prior Labs

企業(yè)內(nèi)部的許多信息都是以行和列呈現(xiàn)的所謂的“表格數(shù)據(jù)”,。例如報告中的電子數(shù)據(jù)表,、數(shù)據(jù)庫條目與大量圖表等。

事實證明,,由于多個原因,,人工智能模型很難處理表格數(shù)據(jù)。表格中有時是文本,,有時是數(shù)字,,而且數(shù)字還有不同的計量單位,,可以說是令人困惑的大雜燴。此外,,表格中不同單元格之間的關(guān)系有時候并不明確,。要了解各單元格之間的相互影響,需要具備專業(yè)知識,。

多年來,,機器學(xué)習(xí)研究人員一直在努力解決表格數(shù)據(jù)的分析問題。現(xiàn)在,,一組研究人員聲稱他們找到了一個優(yōu)雅的解決方案:一個大型基礎(chǔ)模型,。這個模型類似于支持OpenAI的ChatGPT等產(chǎn)品的大語言模型,但專門使用表格數(shù)據(jù)進行訓(xùn)練,。這個預(yù)訓(xùn)練模型可以應(yīng)用于任何表格數(shù)據(jù)集,,只需幾個示例,就能準(zhǔn)確推斷各單元格數(shù)據(jù)之間的關(guān)系,,并且比以往任何機器學(xué)習(xí)方法都能更好地預(yù)測缺失數(shù)據(jù),。

弗蘭克·哈特和諾亞·霍爾曼是兩位來自德國的計算機科學(xué)家,他們幫助開創(chuàng)了這種技術(shù),,并最近在著名的科學(xué)期刊《自然》(Nature)上發(fā)表了一篇論文,。他們選擇與有金融從業(yè)經(jīng)驗的蘇拉吉·甘比爾合作,創(chuàng)辦了一家名為Prior Labs的初創(chuàng)公司,,致力于將該技術(shù)商業(yè)化,。

近期,總部位于德國弗萊堡的Prior Labs宣布已獲得900萬歐元(930萬美元)種子前融資,。這輪融資由總部位于倫敦的風(fēng)險投資公司Balderton Capital領(lǐng)投,參投方包括XTX Ventures,、SAP創(chuàng)始人漢斯·沃納-赫克托的赫克托基金(Hector Foundation),、Atlantic Labs和Galion.exe。Hugging Face聯(lián)合創(chuàng)始人兼首席科學(xué)家托馬斯·沃爾夫,、Snyk和Tessl的創(chuàng)始人蓋伊·伯德扎尼,,以及著名的DeepMind研究員艾德·格里芬斯泰特等知名天使投資人也參與了此次融資。

Balderton Capital合伙人詹姆斯·懷斯在解釋為什么決定投資Prior Labs的一份聲明中表示:“表格數(shù)據(jù)是科學(xué)和商業(yè)的支柱,,但顛覆了文本,、圖像和視頻領(lǐng)域的AI革命對表格數(shù)據(jù)的影響微乎其微——直到現(xiàn)在?!?

Prior Labs在《自然》雜志上發(fā)表的研究報告中使用的模型被稱為Tabular Prior-Fitted Network(簡稱 TabPFN),。但 TabPFN的訓(xùn)練僅使用了表格中的數(shù)值數(shù)據(jù),而不是文本數(shù)據(jù),。Prior Labs公司的AI研究員弗蘭克·哈特曾任職于弗萊堡大學(xué)(University of Freiburg)和圖賓根埃利斯研究所(Ellis Institute Tubingen),。他表示,,Prior Labs希望將這個模型變成多模態(tài),使它既能理解數(shù)字,,也能理解文本,。然后該模型將能夠理解列標(biāo)題并進行推理,用戶也可以像使用基于大語言模型的聊天機器人一樣,,用自然語言提示與AI系統(tǒng)互動,。

目前的大語言模型,即使是如OpenAI 的o3等更先進的推理模型,,雖然可以回答一些關(guān)于表格內(nèi)容的問題,,但它們無法根據(jù)對表格數(shù)據(jù)的分析做出準(zhǔn)確預(yù)測。哈特表示:“大語言模型在這方面表現(xiàn)得非常糟糕,。它們在這方面的效果遠不及預(yù)期,,且分析速度緩慢?!苯Y(jié)果,,大多數(shù)需要分析這類數(shù)據(jù)的人都使用了舊的統(tǒng)計方法,這些方法速度快,,但并不總是最準(zhǔn)確的,。

但Prior Labs的TabPFN能夠做出精準(zhǔn)預(yù)測,包括處理所謂的"時間序列"數(shù)據(jù)——這類預(yù)測基于復(fù)雜模式,,利用歷史數(shù)據(jù)推斷下一個最可能的數(shù)據(jù)點,。根據(jù)Prior Labs團隊1月發(fā)布在非同行評審研究平臺arxiv.org上的新論文顯示,TabPFN在時間序列預(yù)測方面的表現(xiàn)優(yōu)于現(xiàn)有模型:較同類最佳小型AI模型預(yù)測準(zhǔn)確率提升7.7%,,甚至超越比其大65倍的模型3%,。

時間序列預(yù)測在各行各業(yè)應(yīng)用廣泛,尤其是醫(yī)療和金融等領(lǐng)域,。哈特透露:“對沖基金對我們青睞有加,。”(事實上,,一家對沖基金已成為其首批客戶(因保密協(xié)議無法透露名稱),,另一家正在概念驗證階段的客戶是軟件巨頭SAP。)

Prior Labs以開源形式發(fā)布TabPFN模型,,唯一許可要求是使用者必須公開聲明模型來源,。哈特稱,該模型下載量已達約百萬次,。與多數(shù)開源AI公司類似,,Prior Labs計劃的盈利模式聚焦于針對客戶的用例定制模型,并為特定市場開發(fā)工具和應(yīng)用,。

Prior Labs并不是唯一致力于突破AI在表格數(shù)據(jù)方面限制的公司,。由麻省理工學(xué)院(MIT)數(shù)據(jù)科學(xué)家德瓦弗拉特·沙阿創(chuàng)立的Ikigai Labs和法國初創(chuàng)公司Neuralk AI等正嘗試將深度學(xué)習(xí)(包括生成式AI)應(yīng)用于表格數(shù)據(jù),,谷歌(Google)和微軟(Microsoft)的研究團隊也在攻克這一難題。谷歌云的表格數(shù)據(jù)解決方案部分基于AutoML技術(shù)(該技術(shù)使用機器學(xué)習(xí),,將創(chuàng)建有效AI模型所需的步驟自動化,,哈特曾是該領(lǐng)域的先驅(qū))。

哈特表示,,Prior將持續(xù)升級模型:重點開發(fā)關(guān)系型數(shù)據(jù)庫支持,、增強時間序列分析能力,構(gòu)建“因果發(fā)現(xiàn)”功能(識別表格數(shù)據(jù)間的因果關(guān)系),,并推出可通過聊天界面回答表格問題的交互功能,。他表示:“我們將在第一年實現(xiàn)這些目標(biāo)?!保ㄘ敻恢形木W(wǎng))

譯者:劉進龍

審校:汪皓

企業(yè)內(nèi)部的許多信息都是以行和列呈現(xiàn)的所謂的“表格數(shù)據(jù)”,。例如報告中的電子數(shù)據(jù)表、數(shù)據(jù)庫條目與大量圖表等,。

事實證明,,由于多個原因,人工智能模型很難處理表格數(shù)據(jù),。表格中有時是文本,,有時是數(shù)字,而且數(shù)字還有不同的計量單位,,可以說是令人困惑的大雜燴,。此外,表格中不同單元格之間的關(guān)系有時候并不明確,。要了解各單元格之間的相互影響,,需要具備專業(yè)知識。

多年來,,機器學(xué)習(xí)研究人員一直在努力解決表格數(shù)據(jù)的分析問題?,F(xiàn)在,一組研究人員聲稱他們找到了一個優(yōu)雅的解決方案:一個大型基礎(chǔ)模型,。這個模型類似于支持OpenAI的ChatGPT等產(chǎn)品的大語言模型,但專門使用表格數(shù)據(jù)進行訓(xùn)練,。這個預(yù)訓(xùn)練模型可以應(yīng)用于任何表格數(shù)據(jù)集,,只需幾個示例,就能準(zhǔn)確推斷各單元格數(shù)據(jù)之間的關(guān)系,,并且比以往任何機器學(xué)習(xí)方法都能更好地預(yù)測缺失數(shù)據(jù),。

弗蘭克·哈特和諾亞·霍爾曼是兩位來自德國的計算機科學(xué)家,他們幫助開創(chuàng)了這種技術(shù),,并最近在著名的科學(xué)期刊《自然》(Nature)上發(fā)表了一篇論文,。他們選擇與有金融從業(yè)經(jīng)驗的蘇拉吉·甘比爾合作,,創(chuàng)辦了一家名為Prior Labs的初創(chuàng)公司,致力于將該技術(shù)商業(yè)化,。

近期,,總部位于德國弗萊堡的Prior Labs宣布已獲得900萬歐元(930萬美元)種子前融資。這輪融資由總部位于倫敦的風(fēng)險投資公司Balderton Capital領(lǐng)投,,參投方包括XTX Ventures,、SAP創(chuàng)始人漢斯·沃納-赫克托的赫克托基金(Hector Foundation)、Atlantic Labs和Galion.exe,。Hugging Face聯(lián)合創(chuàng)始人兼首席科學(xué)家托馬斯·沃爾夫,、Snyk和Tessl的創(chuàng)始人蓋伊·伯德扎尼,以及著名的DeepMind研究員艾德·格里芬斯泰特等知名天使投資人也參與了此次融資,。

Balderton Capital合伙人詹姆斯·懷斯在解釋為什么決定投資Prior Labs的一份聲明中表示:“表格數(shù)據(jù)是科學(xué)和商業(yè)的支柱,,但顛覆了文本、圖像和視頻領(lǐng)域的AI革命對表格數(shù)據(jù)的影響微乎其微——直到現(xiàn)在,?!?

Prior Labs在《自然》雜志上發(fā)表的研究報告中使用的模型被稱為Tabular Prior-Fitted Network(簡稱 TabPFN)。但 TabPFN的訓(xùn)練僅使用了表格中的數(shù)值數(shù)據(jù),,而不是文本數(shù)據(jù),。Prior Labs公司的AI研究員弗蘭克·哈特曾任職于弗萊堡大學(xué)(University of Freiburg)和圖賓根埃利斯研究所(Ellis Institute Tubingen)。他表示,,Prior Labs希望將這個模型變成多模態(tài),,使它既能理解數(shù)字,也能理解文本,。然后該模型將能夠理解列標(biāo)題并進行推理,,用戶也可以像使用基于大語言模型的聊天機器人一樣,用自然語言提示與AI系統(tǒng)互動,。

目前的大語言模型,,即使是如OpenAI 的o3等更先進的推理模型,雖然可以回答一些關(guān)于表格內(nèi)容的問題,,但它們無法根據(jù)對表格數(shù)據(jù)的分析做出準(zhǔn)確預(yù)測,。哈特表示:“大語言模型在這方面表現(xiàn)得非常糟糕。它們在這方面的效果遠不及預(yù)期,,且分析速度緩慢,。”結(jié)果,,大多數(shù)需要分析這類數(shù)據(jù)的人都使用了舊的統(tǒng)計方法,,這些方法速度快,但并不總是最準(zhǔn)確的。

但Prior Labs的TabPFN能夠做出精準(zhǔn)預(yù)測,,包括處理所謂的"時間序列"數(shù)據(jù)——這類預(yù)測基于復(fù)雜模式,,利用歷史數(shù)據(jù)推斷下一個最可能的數(shù)據(jù)點。根據(jù)Prior Labs團隊1月發(fā)布在非同行評審研究平臺arxiv.org上的新論文顯示,,TabPFN在時間序列預(yù)測方面的表現(xiàn)優(yōu)于現(xiàn)有模型:較同類最佳小型AI模型預(yù)測準(zhǔn)確率提升7.7%,,甚至超越比其大65倍的模型3%。

時間序列預(yù)測在各行各業(yè)應(yīng)用廣泛,,尤其是醫(yī)療和金融等領(lǐng)域,。哈特透露:“對沖基金對我們青睞有加?!保ㄊ聦嵣?,一家對沖基金已成為其首批客戶(因保密協(xié)議無法透露名稱),另一家正在概念驗證階段的客戶是軟件巨頭SAP,。)

Prior Labs以開源形式發(fā)布TabPFN模型,,唯一許可要求是使用者必須公開聲明模型來源。哈特稱,,該模型下載量已達約百萬次,。與多數(shù)開源AI公司類似,Prior Labs計劃的盈利模式聚焦于針對客戶的用例定制模型,,并為特定市場開發(fā)工具和應(yīng)用,。

Prior Labs并不是唯一致力于突破AI在表格數(shù)據(jù)方面限制的公司。由麻省理工學(xué)院(MIT)數(shù)據(jù)科學(xué)家德瓦弗拉特·沙阿創(chuàng)立的Ikigai Labs和法國初創(chuàng)公司Neuralk AI等正嘗試將深度學(xué)習(xí)(包括生成式AI)應(yīng)用于表格數(shù)據(jù),,谷歌(Google)和微軟(Microsoft)的研究團隊也在攻克這一難題,。谷歌云的表格數(shù)據(jù)解決方案部分基于AutoML技術(shù)(該技術(shù)使用機器學(xué)習(xí),將創(chuàng)建有效AI模型所需的步驟自動化,,哈特曾是該領(lǐng)域的先驅(qū)),。

哈特表示,Prior將持續(xù)升級模型:重點開發(fā)關(guān)系型數(shù)據(jù)庫支持,、增強時間序列分析能力,,構(gòu)建“因果發(fā)現(xiàn)”功能(識別表格數(shù)據(jù)間的因果關(guān)系),并推出可通過聊天界面回答表格問題的交互功能,。他表示:“我們將在第一年實現(xiàn)這些目標(biāo),。”(財富中文網(wǎng))

譯者:劉進龍

審校:汪皓

A lot of information inside companies is what’s known as “tabular data,” or data that is presented in rows and columns. Think spreadsheets and database entries and lots of figures in reports.

Well, it turns out that artificial intelligence models have difficulty working with tabular data, for several reasons. It’s often a confusing jumble—sometimes text and sometimes numbers, as well as numbers in different units of measurement. What’s more, the relationship between different cells in the table is sometimes unclear. Knowing which cells influence which other cells in a table often requires domain expertise.

For years, machine learning researchers have been trying to crack this tabular data problem. Now, a group of researchers has found what they claim is an elegant solution: a large foundation model, similar to the large language models that underpin products like OpenAI’s ChatGPT, but specifically trained on tabular data. This pre-trained model can then be applied to any tabular data set and, with just a few examples, make accurate inferences about the relationships between data in various cells, and also predict missing data better than any prior machine learning method.
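To make that workflow concrete, here is a minimal sketch using the open-source `tabpfn` Python package that Prior Labs distributes, which exposes a scikit-learn-style fit/predict interface; exact class names and defaults may vary between releases.

```python
# A minimal sketch of the "pre-trained model, no task-specific training"
# workflow, using the open-source `tabpfn` package (pip install tabpfn).
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

from tabpfn import TabPFNClassifier

# Any modest tabular dataset works; here, a standard sklearn example.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# "fit" only conditions the pre-trained network on the provided rows;
# there is no gradient-based training loop for this particular dataset.
clf = TabPFNClassifier()
clf.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```

Predicting missing values falls out of the same interface: treat the column with gaps as the target, fit on the complete rows, and predict the rest.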

Frank Hutter and Noah Hollmann, two Germany-based computer scientists who helped pioneer this technique and recently published a paper on it in the prestigious scientific journal Nature, have teamed up with Sauraj Gambhir, who has experience in finance, on a startup called Prior Labs dedicated to commercializing this technology.

Today Prior Labs, which is based in Freiburg, Germany, announced it has raised 9 million euros ($9.3 million) in pre-seed funding. The round is led by London-based venture capital firm Balderton Capital along with XTX Ventures, SAP cofounder Hans-Werner Hector’s Hector Foundation, Atlantic Labs, and Galion.exe. A number of prominent angel investors, including Hugging Face cofounder and chief scientist Thomas Wolf, Guy Podjarny, who founded Snyk and Tessl, and Ed Grefenstette, a well-known DeepMind researcher, also participated in the funding.

“Tabular data is the backbone of science and business, yet the AI revolution transforming text, images and video has had only a marginal impact on tabular data–until now,” James Wise, a partner at Balderton Capital, said in a statement, explaining why the firm decided to invest in Prior Labs.

The model Prior Labs used for its Nature study is called a Tabular Prior-Fitted Network (TabPFN for short). But TabPFN is trained only on the numerical data in tables, not the text. Hutter, a well-known AI researcher formerly at the University of Freiburg and the ELLIS Institute Tübingen, said Prior Labs wants to take this model and make it multimodal, so that it can understand both numbers and text. The model would then be able to understand column headings and reason about them, and users would be able to interact with the AI system using natural language prompts, just as they do with an LLM-based chatbot.

Today’s LLMs, even the more advanced reasoning models such as OpenAI’s o3, can answer some questions about what a table says, but they can’t make accurate predictions based on an analysis of the data in the table. “LLMs are just horrible at that,” Hutter said. “It’s like, it’s nowhere close. It’s not only that, it’s also super slow.” As a result, most people who need to analyze this kind of data have relied on older statistical methods that are fast, but not always the most accurate.

But Prior Labs’ TabPFN can make accurate predictions, including on what are called time series, where past data is used to predict the next most likely data point based on complex patterns. In a new paper the Prior Labs team published in January on the non-peer-reviewed research repository arxiv.org, the team found that TabPFN outperformed existing time series prediction models. It beat the best previous small AI model for such predictions by 7.7% and beat a model that is 65 times larger than TabPFN by 3%.
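The arxiv paper’s exact featurization isn’t described here, but the general recipe of recasting a time series as tabular rows is easy to sketch: lagged values become feature columns and the next value becomes the target. The `TabPFNRegressor` class is an assumption about the current package; any scikit-learn-compatible regressor can be swapped in.

```python
# Illustrative only: forecasting framed as tabular regression via lag features.
import numpy as np

from tabpfn import TabPFNRegressor  # assumed available; any sklearn regressor works

def make_lag_table(series: np.ndarray, n_lags: int = 12):
    """Turn a 1-D series into rows of `n_lags` past values (X) and the next value (y)."""
    X = np.stack([series[i : i + n_lags] for i in range(len(series) - n_lags)])
    y = series[n_lags:]
    return X, y

rng = np.random.default_rng(0)
t = np.arange(300)
series = np.sin(t / 10.0) + 0.1 * rng.standard_normal(t.size)  # toy signal

X, y = make_lag_table(series)
reg = TabPFNRegressor()
reg.fit(X[:-24], y[:-24])     # condition on the history
pred = reg.predict(X[-24:])   # one-step-ahead forecasts for the last 24 points
print("MAE:", float(np.abs(pred - y[-24:]).mean()))
```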

Time series prediction has many applications across industries, but especially in medical and financial domains. “Hedge funds love us,” Hutter said. (One of Prior Labs’ initial customers is, in fact, a hedge fund, but Hutter said he was contractually barred from saying which one. Another initial customer with which Hutter is doing a proof of concept is software giant SAP.)

Prior Labs is offering TabPFN as an open source model—with the only license requirement being that if people use the model, they must publicly say so. So far, it has been downloaded about one million times, according to Hutter. Like most open source AI companies, Prior Labs plans to make money by working with specific customers to help them tailor the models to their use case and also by building tools and applications for specific market segments.

Prior Labs is not the only company working to crack AI’s limits when it comes to tabular data. Ikigai Labs, founded by MIT data scientist Devavrat Shah, and French startup Neuralk AI are among those applying deep learning methods, including generative AI, to tabular data. Researchers at Google and Microsoft have also been working on this problem. Google Cloud’s tabular data solutions are built in part on AutoML, a process that uses machine learning to automate the steps needed to create effective AI models, an area that Hutter helped pioneer.
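For readers unfamiliar with AutoML, a minimal sketch with auto-sklearn, one of the systems to come out of Hutter’s group, shows the idea: the user supplies data and a time budget, and the library searches over preprocessing steps, model families, and hyperparameters on its own. The constructor argument shown reflects auto-sklearn’s documented interface and may differ across versions.

```python
# A minimal AutoML sketch with auto-sklearn (pip install auto-sklearn).
from sklearn.datasets import load_digits
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

import autosklearn.classification

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The search over pipelines happens inside fit(), bounded by the time budget.
automl = autosklearn.classification.AutoSklearnClassifier(
    time_left_for_this_task=120,  # total search budget, in seconds
)
automl.fit(X_train, y_train)

print("accuracy:", accuracy_score(y_test, automl.predict(X_test)))
```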

Hutter said Prior intends to keep improving its models, working more on relational databases, time series, and building the ability to do what is called “causal discovery”—where a user asks which data points in a table have a causal relationship with other data in the table. Then there’s the chat feature that will let users ask questions of the tables using a chat-like interface. “All of this we will build in the first year,” he said.

財富中文網(wǎng)所刊載內(nèi)容之知識產(chǎn)權(quán)為財富媒體知識產(chǎn)權(quán)有限公司及/或相關(guān)權(quán)利人專屬所有或持有,。未經(jīng)許可,禁止進行轉(zhuǎn)載,、摘編,、復(fù)制及建立鏡像等任何使用,。
0條Plus
精彩評論
評論

撰寫或查看更多評論

請打開財富Plus APP

前往打開
熱讀文章