Scoring my 2019 Predictions

Here’s my Annual Review.

I mark each prediction True / False, and sometimes include [notes at the end.] Obviously, not all predictions can be scored yet, but this is for the items I do know. (I will remove/modify this line after I’ve updated again.)

My PRELIMINARY overall Brier score for the year is: 0.1487
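For readers who want to check the arithmetic, here is a minimal sketch of how a Brier score like this can be computed. It assumes the common binary form (the mean squared difference between the stated probability and the 0/1 outcome); the function name `brier_score` is mine. The sample data is only the five Trump approval thresholds from this post, so the result will not match the overall figure.

```python
def brier_score(predictions):
    """Mean squared error between forecast probability and outcome.

    predictions: list of (probability, came_true) pairs, with
    probability in [0, 1] and came_true a bool.
    Assumes the binary form of the score: 0 is perfect, 0.25 is what
    always saying 50% would earn.
    """
    if not predictions:
        raise ValueError("need at least one scored prediction")
    total = 0.0
    for probability, came_true in predictions:
        outcome = 1.0 if came_true else 0.0
        total += (probability - outcome) ** 2
    return total / len(predictions)


# Subset for illustration only: the approval-rating thresholds
# (above 30% / 35% / 40% / 45% / 50%) and how they resolved.
approval_predictions = [
    (0.95, True),
    (0.85, True),
    (0.50, True),
    (0.40, False),
    (0.05, False),
]

print(round(brier_score(approval_predictions), 4))
```

On this subset the score works out to about 0.0875, with most of the penalty coming from the 50% and 40% forecasts.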

My PRELIMINARY calibration for the year is:

[50–60%) / (40–50%] 2 Correct, 0 Incorrect (100%)
[60–70%) / (30–40%] 10 Correct, 3 Incorrect (77%)
[70–80%) / (20–30%] 8 Correct, 2 Incorrect (80%)
[80–90%) / (10–20%] 8 Correct, 1 Incorrect (89%)
[90–95%) / (5–10%] 1 Correct, 0 Incorrect (100%)
[95–99%) / (1–5%] 4 Correct, 1 Incorrect (80%)
≥99% / ≤1% 0 Correct, 0 Incorrect (N/A)
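The table above folds predictions below 50% into their complements, so a 40% prediction that resolves False counts as a correct 60% prediction. A rough sketch of that bookkeeping follows. The bin edges are read off the table, the function name `calibration_table` is hypothetical, and the sample is just six predictions from this post rather than the full set.

```python
from bisect import bisect_right

# Lower edges (in percent) of the confidence bins in the table.
BIN_EDGES = [50, 60, 70, 80, 90, 95, 99]


def calibration_table(predictions):
    """Count correct/incorrect predictions per confidence bin.

    predictions: iterable of (percent, came_true) pairs.
    Returns {bin_lower_edge: [correct, incorrect]}.
    """
    table = {edge: [0, 0] for edge in BIN_EDGES}
    for percent, came_true in predictions:
        # Fold sub-50% forecasts: "40% it happens" is the same claim
        # as "60% it does not happen".
        if percent < 50:
            percent, came_true = 100 - percent, not came_true
        # Find the bin whose lower edge is the largest one <= percent.
        edge = BIN_EDGES[bisect_right(BIN_EDGES, percent) - 1]
        table[edge][0 if came_true else 1] += 1
    return table


# Illustration only: the five approval thresholds plus the
# "no clear Democratic frontrunner" prediction (75%, False).
sample = [
    (95, True), (85, True), (50, True), (40, False), (5, False),
    (75, False),
]

for edge, (correct, incorrect) in calibration_table(sample).items():
    print(f"{edge}%+: {correct} Correct, {incorrect} Incorrect")
```

Working in whole percentages rather than floats avoids edge cases where something like 0.6 lands on the wrong side of a bin boundary.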

My 3-year (2017–2019) calibration curve:


True/True/True/False/False — Trump’s RCP average approval rating on 1/1/20 is above 30%/35%/40%/45%/50%, respectively: 95% / 85% / 50% / 40% / 5%. [Low of 40.8 in February, High of 45.3 in December. I’d give myself more credit if I thought this hadn’t been fairly obvious.]

True — Trump still president at end of year: 96% {90%} (Note: I was predicting this question before VFP, but they included it.) [Impeachment happened, removal is unlikely, but I win by default due to timing.]

False — VFP: No Democratic presidential candidate will become a clear frontrunner (Predictwise probability of nomination >50%) in the political prediction markets at any point in 2019: 75% {60%} [Warren got there before crashing, which is really annoying. But she isn’t anymore. Still, Dylan already admitted defeat here.]

True — VFP: The US will not enter a recession: 65% {80%} (My scoring assumes we use NBER’s retrospective peak month. They usually delay announcing for about a year, so this likely can’t be scored until 2020.) [Edit: Vox says no, which is likely true, so I’ll score it.]

True — VFP: Congress will not authorize funding for a full-length border wall: 98% {95%} (“Full length” is cheating.)

False — Added Q: Congress will authorize funding for a border wall of at least $5.7bn: 15%

True — VFP: US homicides will decline: 75% {80%} [We won’t know this with high reliability for a while (World Bank Global Development). Edit: Vox says they did, which is likely true, so I’ll score it.]


True — Added: Brexit will be delayed past March 29th (or cancelled): 51%

True — VFP: Narendra Modi will continue as Indian prime minister after the 2019 elections: 70% {60%} (I’m not better informed than Dylan and Kelsey, but I have a stronger trust in polls + stronger prior that dislike of the opposition will translate into a win.)

True — VFP: Neither India nor China will enter a recession: 80% {70%} (Similar to Dylan’s reasoning, but stronger. But joint questions are annoying.)

True — Added: India will not enter a recession: 85%

True — Added: China will not enter a recession: 85%


True[-ish] — Netanyahu is prime minister after the Israeli elections: 80% (N/A)

True — Netanyahu’s party gets the most votes: 85%. [26.4% to 26.1%. I was fairly lucky here. I didn’t really think about coalition formation and how votes get split. If Meretz or Labor had joined Blue and White, his coalition could have won easily, and his party still could have gotten beaten handily.]

True/False — Jewish Home passes threshold / gets 6 seats: 60% / 35% [5 Seats]

True — Arab parties (total) seats decline from 11: 70%. (Splitting is dumb, but seems inevitable.) [They split, ended up with a combined 7.82%, and 10 seats. In the second election they got 10.60% of the vote, and 13 seats.]


True — VFP: No additional countries will adopt a universal basic income: 80% {90%} (There are lots of countries that might do something, and the idea is gaining traction, so I’m hedging.)

(Likely) True — VFP: More animals will be killed for US human consumption in 2019 than in 2018: 75% {60%} (The trend is strong, the economy is fine. I’m confused that they have their probability so low.) [I don’t know where to find the information to resolve this. Edit: Vox says “Likely Correct”, so I’ll score it.]


True — VFP: Fully autonomous self-driving cars will not be commercially available as taxis or for sale: 70% {90%} (Even if they aren’t price competitive, there’s a huge cachet in being first to market. Someone wants to do it, even if the tech is still too expensive. But they say “real commercial product,” so they might hedge if it’s offered but far too expensive, etc.) [Not there yet. I was optimistic.]

True — VFP: DeepMind will release an AlphaZero update, or new app, capable of beating humans and existing computer programs at a task in a new domain: 60% {50%} (AlphaGo was October 2015, AlphaZero was Dec. 2017. I assume they have more projects that are in the works, but it’s unclear if they will release them.) [Resolution: Yes, and this was quick! AlphaStar was released Jan 24.]


True — VFP: Global carbon emissions will increase: 80% {80%} [Growth rate of atmospheric CO2 is up since 2018, but awaiting data on emissions. Seems clearly to be True, but waiting for now. Edit: Vox called it, so I’ll score it.]

My earlier long-term predictions:

(2017) The stock market [Edit: S&P] will go down under President Trump (Conditional on him having a 4 year term, Inauguration-Inauguration) — 60% (no change, was 60%. I’m affirming because, while a split Congress usually means markets go up, I have greater concerns. I’m updating based partly on results so far, with markets up, and partly on my suspicions that the current gyrations will get worse, and that the current economic mismanagement really is a big problem.)
(Update — This looks very unlikely.) [Yeah, I was wrong here too.]

(2018) The retrospective consensus of economists about the 2017 tax bill will be:
Unresolved — …didn’t increase GDP growth more than 0.2%: 96% (was: 95%)
Unresolved — …that, after accounting for growth, it increased the 10-year deficit more than $1tr / $1.2tr / $1.5tr, respectively: 93% / 80% / 45% (was: 90% / 70% / 40%). [But see recent article about how poorly it’s working out. Here also, “The Fed spent most of last year concerned that Trump’s tax cuts would spur a hot economy and rising inflation. Fed leaders anticipated they would need to raise interest rates twice in 2019 to tap the brakes on the economy. None of that came to pass.”]

True — (2018) The House will vote to impeach Trump before the end of his current term: 75% (was 65%) Note: 50% vote needed.
[Update: Called it.]

Unresolved — (2018) Conditional on impeachment, the Senate will convict: 10% (was 20%) Note: 67% vote needed. (Most uncertainty is if he does something additionally crazy, crazy enough to prompt short term worries about safety/stability.) [We’ll see. But I was underestimating the crazy that people would be OK with, and also underestimating how likely he was to do something that’s an objectively and clearly illegal misuse of power.]

(2018 — I neglected to include this before, but now carried over for scoring.) AI wins a Real Time Strategy game (RTS — StarCraft, etc.) in full-mode against the best human players before end of:
False — 2019: 45% [OpenAI Five for DOTA didn’t play full mode. And the early-2019 “victory” of AlphaStar in StarCraft was an automation win based on superspeed, not strategy. That was possible years ago. The more reasonable play on the StarCraft ladder clearly showed that it can’t beat the *best* human players quite yet, though it’s VERY close. But I’m biased here, so maybe this should be true?]
Unresolved — 2020: 60% [I now think this is too low. I’ll update for next year’s predictions.]
Unresolved — Within Byun Hyun Woo’s Lifetime: 98%


True — I have (some) official academic affiliation: 60%.

True/False/False/True — I have an affiliation with: F-O/CC-C/Te-I/Other: 40% / 40% / 30% / 20%

True — My multi-Agent Goodhart paper is accepted into the special issue: 60%
[Eventually. This took FOREVER.]

True/True/True — I publish or submit pre-prints of at least 1/2/3 more papers: 90%/80%/60%.
[4 have been submitted, two rejected, another already submitted to a new journal. I forgot how ridiculously long this takes.]

False/False/False — My Google Scholar H-index hits 7 / 8 / 9: 65% / 35% / 5% [No. Argh. Scholar Coverage is annoying, and citations come in slowly — I’m a single citation on one paper away from 7. (I could Goodhart this and combine 2 different publications on the same project to hit 7, but I won’t.)]

True/True — My actual (no-self cites, includes non-google sources) H-index hits 6 / 7 : 70% / 30%
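For anyone unfamiliar with the metric, an H-index of h means h papers each have at least h citations. The sketch below shows why being "a single citation on one paper away" matters. The citation counts are entirely hypothetical (they are not my real numbers), and `h_index` is just an illustrative helper.

```python
def h_index(citations):
    """Largest h such that h papers have at least h citations each."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break  # every later paper has even fewer citations
    return h


# HYPOTHETICAL citation counts, chosen only to show the threshold effect.
hypothetical_citations = [40, 22, 15, 9, 8, 7, 6, 3]
print(h_index(hypothetical_citations))

# One more citation on the seventh-ranked paper (6 -> 7) tips it over.
one_more_citation = [40, 22, 15, 9, 8, 7, 7, 3]
print(h_index(one_more_citation))
```

With these made-up counts the index is 6, and a single extra citation on the seventh paper moves it to 7, which is the kind of cliff-edge that makes the number slow and lumpy to forecast.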