Big Data's Prediction Blind Spots
Is it hard to keep your own political beliefs separate from your work predicting elections?

It's always hard for us to be objective in any walk of life. None of us has a monopoly on reality; we all have rather jaded points of view. I do think the sports training helps, though: I can be a Detroit Tigers fan, as I am [and was] growing up, and I still thought Mike Trout [Los Angeles Angels] should have won the MVP award last year. What I think differentiates politics a bit is that you have an industry full of people who not only have views but are [also] used to manipulating public opinion. They're used to thinking they can create their own reality. That's why I think you have such trouble on the uptake there.

People think that, well, if I can spin a fact a certain way or spin polls a certain way, [the problem] goes away. When you have a political press where some people are very good, but some other people are very compliant and happy to pass along spin from the campaigns, I think that's the issue. People aren't used to getting a reality check in politics as much as in sports.

So how are you able to sift through that information then to pick out the BS?

The idea is to ignore what the politicians say and stick with publicly available data. The record shows that in general, most political observers tend to overrate the importance of a gaffe or a debate -- there are always exceptions -- but in general the polls provide a pretty reliable benchmark. And the public, who have real lives and are not constantly consuming political news, are [sometimes] weighing things in a very sophisticated way, where they're looking at things like the economy, or whether we are involved in any stupid wars, or major scandals from the administration. Those are the things that explain a lot about who wins the elections, and not so much the petty stuff that the political pundits can focus on.

There is more data now than ever before. How are you able to determine which information to pull in order to properly answer your question?

Part of it is that you do need -- as Vegas might say -- a system instead of an ad hoc way of doing it. So we have a model that we designed in 2008 and updated for 2012, and it is designed to account for every single poll. Some polls, if they're from a pollster that has a better track record, get more weight in the system. It doesn't mean that others are ignored. So it's not like we're just looking at a poll and sticking our fingers up in the air and saying, "Oh, that poll is important, and that poll's not." Basically all the hard work and all the decision-making process comes from designing this model before the fact. Based on theory and practice and past experience, what is a good set of rules for processing this information? And then sticking to that. We don't make any alterations to the model once we launch it in June every year, unless there's a bug, which fortunately there hasn't been. But the principles are always the same, and then you have a disciplined way to analyze data in that context.
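
The weighting idea described in that last answer (every poll counts, but pollsters with a better track record count for more) can be illustrated with a minimal sketch. This is not the model discussed in the interview, whose rules are not given here. The `Poll` class, the `weighted_average` function, the pollster names, the rating values and the default weight are all hypothetical, chosen only to show the principle of fixing the rules up front and then applying them to every poll.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    pollster: str           # hypothetical pollster name
    candidate_share: float  # percent support for one candidate


def weighted_average(polls, ratings, default_weight=1.0):
    """Average poll results, weighting each poll by its pollster's rating.

    Pollsters missing from `ratings` still count, using `default_weight`,
    so no poll is ignored outright (an assumption made for this sketch).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for poll in polls:
        weight = ratings.get(poll.pollster, default_weight)
        weighted_sum += weight * poll.candidate_share
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no polls with positive weight")
    return weighted_sum / total_weight


# Made-up example data: pollster "A" has the better track record.
example_polls = [Poll("A", 52.0), Poll("B", 48.0), Poll("C", 50.0)]
example_ratings = {"A": 2.0, "B": 1.0}  # "C" is unrated, gets the default

print(weighted_average(example_polls, example_ratings))  # 50.5
```

A plain average of the three made-up polls would be 50.0. The weighted figure moves toward pollster "A" because its rating is higher, while "B" and "C" still contribute.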