I Tried the Thing

I Let the Model Ruin My Betting Card

The Participatory Lab Rat

A first-person experiment in letting a model challenge favorite bets, expose lazy assumptions, and leave one human slightly annoyed but better organized.


George Plimptonic | The Participatory Lab Rat | 7 min read

I opened the workflow with three bets I liked and the noble hope that the numbers would admire my taste. A model is most useful when it creates friction before the bet, not when it decorates the recap afterward. This is George Plimptonic's corner of the Desk: useful opinion with the workflow exposed before it becomes a receipt. The goal is not to make the bet sound cooler. The goal is to make the decision easier to repeat when the market, the app, or the group chat starts acting theatrical.

The noble plan, the three bets I liked, and the model that disagreed with all of them

The card held three bets. The first was a road favorite I had been writing about all week; the second was an over on what felt like a fun primetime game; the third was a teaser with two key numbers I had decided constituted research. The model returned a score for each. Two of the three sat in the red. The third was tagged neutral, which was the model's polite way of telling me the bet was a coin flip with a hold attached.

My first instinct was to assume the model was confused. I checked the inputs. The inputs were correct. I checked the version. The version was current. I made a coffee and came back to the screen, hoping the numbers had reconsidered. They had not. This was the first useful lesson of the experiment, which I did not yet recognize as a lesson because it felt like an irritant. A tool you ask only to confirm you is a tool you have downgraded into a yes-man. The point of building a tool is to have a colleague who will occasionally tell you no.

83
Model-vs-me disagreements logged in one NFL season

Across the 2024 NFL regular season I logged 83 disagreements between my pregame card and the model's output. Each disagreement was recorded with the bet, my reasoning, the model's reasoning, and the action I took.

Source: personal disagreement log (1 user, 2024) + model_predictions

Where the model and I disagreed, who was right
Distribution of 83 model-vs-me disagreements logged across the 2024 NFL season. The model was right more often than I was, by a clear margin. Outcome of disagreement, by count:
  • Model right, I was wrong (passed bet): 31
  • Both wrong (passed): 17
  • Model wrong, I was right (took bet): 14
  • Disagreement was noise (no clear winner): 12
  • Both right (took bet): 9
Source: model_predictions + personal disagreement log (1 user, 2024)

Eighty-three disagreements; thirty-one of them were bets I would have placed and the model talked me out of. That row is the experiment.

The mechanic of friction, and why it works

The model is not a wise teacher. It is a friction device. Most of what it does, when integrated into a betting workflow, is slow the bettor down at the moment of placement. It asks for an explicit input where the bettor would otherwise have used a vibe. It prints a number where the bettor would otherwise have used a feeling. It shows a probability where the bettor would otherwise have used a story. None of these substitutions is magic. They are the small inconveniences that prevent the bettor from clicking through her own gut while it is still warm.

The point of the friction is not that the model is always right. It is that the bettor is always interruptible. The first useful output of any model, in any field, is the interruption itself. The interruption produces a question (what does the model see that I do not?), and the question produces a small audit. Most of the bets I passed on, after the audit, I do not regret passing on. Some of the bets I took anyway, after the audit, I do not regret taking. The audit is the product. The pass or the placement is the result.

How I responded to the disagreement mattered more than who won it
Effective ROI across the 83 disagreements, grouped by my response strategy. Passing was a quiet winner; inverting was the loud one. ROI per disagreement, by response strategy:
  • Inverted (took model side): +8.3%
  • Passed on disagreement: +4.1%
  • Sized down by 50%: +2.2%
  • Took my side regardless: -5.8%
Source: personal disagreement log (1 user, 2024)

The difference between passing on disagreements (+4.1% ROI) and powering through them (-5.8% ROI) is most of the season's P&L gap.

What the disagreement log recorded that my memory would not have

I kept the disagreement log in a single tab with six columns: date, bet, my reasoning, model output, response taken, and a result column updated at settlement. The first observation, after about three weeks, was that the model was right considerably more often than my pre-experiment intuition had estimated. I had assumed the disagreements would split roughly fifty-fifty. When I cleaned the log at the end of the year, the split on the 45 disagreements where one of us was clearly wrong was thirty-one to fourteen in the model's favor. The other thirty-eight were shared errors, shared wins, or noise, which is itself a lesson: most disagreements are not the binary I had imagined.
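For anyone rebuilding the log outside a spreadsheet tab, here is a minimal sketch of the six-column row and the end-of-year split. The class, field, and function names (Disagreement, tally, clear_winner_split) are hypothetical, and it assumes the result column stores one of the chart's outcome labels at settlement. The only real numbers used are the category counts from the chart above.

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import List, Optional, Tuple

# Outcome labels copied from the article's chart categories.
MODEL_RIGHT = "Model right, I was wrong"
ME_RIGHT = "Model wrong, I was right"
BOTH_WRONG = "Both wrong"
BOTH_RIGHT = "Both right"
NOISE = "Noise (no clear winner)"


@dataclass
class Disagreement:
    """One row of the six-column log. Field names are hypothetical."""
    day: date
    bet: str
    my_reasoning: str
    model_output: str
    response: str                  # e.g. "passed", "sized down", "took my side", "inverted"
    result: Optional[str] = None   # one of the outcome labels, filled in at settlement


def tally(log: List[Disagreement]) -> Counter:
    """Count settled rows by outcome label; rows not yet settled are skipped."""
    return Counter(row.result for row in log if row.result is not None)


def clear_winner_split(counts: Counter) -> Tuple[int, int, int]:
    """Split settled disagreements into (model right, me right, everything else)."""
    model = counts[MODEL_RIGHT]
    me = counts[ME_RIGHT]
    rest = sum(counts.values()) - model - me
    return model, me, rest


if __name__ == "__main__":
    # Category counts taken from the article's chart (83 disagreements in total).
    season = Counter({MODEL_RIGHT: 31, BOTH_WRONG: 17, ME_RIGHT: 14,
                      NOISE: 12, BOTH_RIGHT: 9})
    model, me, rest = clear_winner_split(season)
    print(model, me, rest, f"{model / (model + me):.0%}")  # prints: 31 14 38 69%
```

The "everything else" bucket deliberately lumps shared errors, shared wins, and noise together, because none of them says anything about whose reasoning was better.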

The second observation was about my response strategy. The disagreements I passed on returned a small positive ROI on average (+4.1%). The disagreements I took my side on anyway returned a negative ROI (-5.8%). The disagreements I sized down on were only slightly positive (+2.2%). The disagreements I inverted, taking the model's side instead of my own, returned the highest ROI of any response (+8.3%), but those were rare enough that I do not consider the number statistically meaningful. The robust finding was simpler: when the model and I disagreed, the pass was usually the right action, and the click was usually the wrong one.
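Per-bet ROI by response is a grouped ratio: total profit divided by total stake within each response bucket. A minimal sketch follows, assuming each settled row carries a stake and a profit or loss in the same units. The article does not say how a passed bet was scored (for instance, against the line it would have been placed at), so the function name roi_by_response and the toy rows are hypothetical and are not the author's data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def roi_by_response(rows: Iterable[Tuple[str, float, float]]) -> Dict[str, float]:
    """Return {response: total profit / total stake} for each response bucket.

    Each row is (response, stake, profit), where profit is negative for a loss.
    Buckets with zero total stake are left out to avoid dividing by zero.
    """
    staked: Dict[str, float] = defaultdict(float)
    profit: Dict[str, float] = defaultdict(float)
    for response, stake, pnl in rows:
        staked[response] += stake
        profit[response] += pnl
    return {r: profit[r] / staked[r] for r in staked if staked[r] > 0}


if __name__ == "__main__":
    # Invented rows at -110 pricing: risk 110 to win 100.
    toy = [
        ("took my side", 110.0, 100.0),
        ("took my side", 110.0, -110.0),
        ("took my side", 110.0, -110.0),
        ("sized down", 55.0, 50.0),
        ("sized down", 55.0, -55.0),
    ]
    for response, roi in roi_by_response(toy).items():
        print(f"{response}: {roi:+.1%}")
```

Pooling stake before dividing, rather than averaging each bet's ROI, keeps a half-sized bet from counting as much as a full one. The season's headline gap is then a subtraction of two bucket ROIs: +4.1 minus -5.8 is 9.9 percentage points.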

31 of 45
Disagreements where the model was right and I was wrong

Of the 45 disagreements with a clear winner, the model was correct 31 times and I was correct 14 times, roughly a two-to-one ratio in the model's favor. The remaining 38 disagreements were shared errors, shared wins, or unscoreable noise.

Source: personal disagreement log (1 user, 2024)

The bets I would have placed without the friction

The bets the model talked me out of were not bad bets in the spectacular sense. They were bets at slightly worse prices than I would have admitted at placement, on theses I had constructed with more affection than research. The road favorite from week three was a team I had spent the week defending in the group chat, and the price had drifted two points toward the favorite while I was busy defending it. The over on the primetime game depended on a pace assumption I had not actually checked. The teaser was, in retrospect, a teaser, which is a category of bet I tend to enjoy more than the bankroll does.

None of these failures were egregious. All of them were small. The point of the workflow is not that any one of them was decisive. The point is that the workflow caught about two-thirds of them across the season, and the bets I avoided produced, in aggregate, a meaningful improvement in the per-bet ROI. The improvement was not the model finding edges I had missed. It was the model letting fewer of my own marginal bets slip through the placement step.

Where the model and I disagreed, who was right (the same chart of 83 disagreements shown earlier in this piece)

A second look at the same distribution, this time read as a list of bets I did not place and am grateful not to have placed.

The useful thing about a tool is not that it makes the decision for you. The useful thing is that it interrupts the part of your brain already writing the celebration tweet.

— George Plimptonic

The discipline I want to keep for next season

I have decided to keep the disagreement log as a standing feature of my workflow. The discipline is small: every bet I am about to place runs through the model first. If the model agrees, I place the bet at the originally intended stake. If the model disagrees, I write the disagreement into the log with a single sentence on my reasoning and a single sentence on what I read of the model's reasoning. The action defaults to a pass; placing the bet anyway requires me to write a third sentence explaining why my reasoning specifically beats the model's on this case.
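The rule is simple enough to write down as a function, which makes the default-to-pass behavior explicit. This is a minimal sketch: the names decide and Decision are hypothetical, the log is a plain list of dicts, and it assumes an override places the bet at the full intended stake, which the rule as written does not specify.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Decision:
    action: str    # "place" or "pass"
    stake: float   # 0.0 when passing


def decide(model_agrees: bool, intended_stake: float, log: List[dict],
           my_reason: str = "", model_reason: str = "",
           override_reason: Optional[str] = None) -> Decision:
    """Apply the agree/disagree rule; disagreements are appended to `log`."""
    if model_agrees:
        # Agreement: place at the originally intended stake, nothing to log.
        return Decision("place", intended_stake)
    # Disagreement: both one-sentence notes are required before anything else.
    if not my_reason.strip() or not model_reason.strip():
        raise ValueError("write one sentence each on my reasoning and the model's")
    log.append({"mine": my_reason, "model": model_reason,
                "override": override_reason})
    # The default is a pass; only a written third sentence unlocks placement.
    if override_reason and override_reason.strip():
        return Decision("place", intended_stake)
    return Decision("pass", 0.0)
```

Raising an error when either of the first two sentences is missing mirrors the paperwork: a disagreement cannot be waved through, or even passed, without being recorded first.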

This is not heroic. It is paperwork. It is also the only thing I have done in two years of betting that I can confidently attribute to a measurable improvement in my per-bet ROI. The improvement is modest in any single week and meaningful across a season. The cost is about thirty seconds per bet and the occasional bruised feeling when the model declines to admire a thesis I am proud of. The bruised feeling has, over time, become useful information in its own right. The bets I most resent the model for questioning have turned out, with frustrating regularity, to be the bets that most deserved questioning.

+9.9 pts
Per-bet ROI gap between passing on and forcing through disagreements

Passing on model disagreements returned +4.1% per bet across the log. Forcing the bet through anyway returned -5.8%. The gap of roughly ten percentage points accounted for most of the season's improvement in per-bet ROI.

Source: personal disagreement log (1 user, 2024)

+8.3%
Inverted-disagreement ROI (took the model's side instead of mine)

The small subset of disagreements where I inverted my position and took the model's side returned +8.3% per bet. The sample is small enough to be unreliable on its own, but it points the same way as the passing result: on disagreements, my side was usually the weaker one, which supports passing as the default action.

Source: personal disagreement log (1 user, 2024)

3 sentences
Friction protocol per disagreement

Each model disagreement requires three sentences before placement is allowed to override the pass: my reasoning, my reading of the model's reasoning, and a specific argument for why my reasoning beats the model's on this case.

Source: personal disagreement-log protocol (1 user, 2024)

The closing argument

The model did not make me a sharper handicapper. It made me a slower one. The slowness produced a smaller card of bets per week and a higher per-bet ROI on the bets that survived the workflow. None of this is the kind of result you can quote at a tailgate. It is the kind of result you can defend in a spreadsheet, which is a different but more durable form of correctness. The bets I most enjoyed talking about were, with painful regularity, the bets the model most wanted me to pass on. The bets the model approved of were, with reassuring regularity, the bets I could still defend in March.

I will keep the workflow. I will keep the log. I will keep the three-sentence rule for overriding the model, and I will keep being mildly annoyed when the rule prevents me from clicking on a bet I am sentimentally attached to. The annoyance is the feature. The bet I most regret not placing, across the entire season, lost cleanly, by the way, which is the kind of detail the workflow makes available to me and which the version of me without the workflow would have spent the whole season insisting was the one that got away.

Takeaways

  • Use tools before finalizing opinions.
  • Track disagreements explicitly.
  • Friction is a feature when it prevents lazy bets.
  • A pass can be the best model output.

Field guide

Watch: Bets the user loves before opening the tool.
Avoid: Treating disagreement as a bug instead of a prompt.
Use it when: The tool changes the bet, the stake, or the pass reason.
Desk action: Run the model before writing final picks and preserve every disagreement in the tracking sheet.

Closing argument

The workflow did not make me smarter in a cinematic way. It made me slower in a useful way. That is less exciting than genius and much easier to repeat. Keep the note, not just the feeling. The next similar decision will arrive with a new uniform and the same old pressure, and the useful bettor will recognize the pattern before paying for it twice.

Sources