Polymarket Arbitrage Bible

2026-03-13 10:56:05
Intermediate
Blockchain
This article provides a comprehensive technical roadmap for arbitrage in Polymarket prediction markets. The author explains that genuine arbitrage competition has shifted from basic additive calculations to a sophisticated mathematical arms race. This involves leveraging Integer Programming to address the $2^{63}$-scale explosion of logical dependencies, employing Bregman projection and KL divergence to identify information asymmetry under extreme probability scenarios, and utilizing the Frank-Wolfe algorithm to rapidly approximate optimal trading paths within milliseconds. The article offers a detailed breakdown of the entire system architecture, from data acquisition and preliminary logical screening to risk control for non-atomic execution.

Preface

While building @insidersdotbot, I engaged in extensive conversations with numerous high-frequency market-making and arbitrage teams. The most pressing question I encountered was how to design effective arbitrage strategies.

Our users, friends, and partners are all exploring the complex, multidimensional path of arbitrage on @Polymarket. If you're an active Twitter user, you've likely come across posts like, “I made X amount from prediction markets using Y arbitrage strategy.”

Yet, most articles drastically oversimplify the core logic of arbitrage, reducing it to “anyone can do it” or “just use Clawdbot,” without providing a systematic explanation for building your own arbitrage system.

If you want to understand how Polymarket arbitrage tools actually generate profits, I recommend this article by @RohOnChain—the most comprehensive analysis I've seen so far.

As with my previous article, because the original English text contains highly technical sections requiring further study, I've restructured and supplemented the content so you can grasp all key points without pausing to look up additional information.

Arbitrage on Polymarket: Not Just a Math Problem

Suppose you see a market on Polymarket:

YES price $0.62, NO price $0.33.

You think: 0.62 + 0.33 = 0.95, less than 1—there’s an arbitrage opportunity! Buy both YES and NO for $0.95, and no matter the outcome, you get back $1.00, netting $0.05.

That's correct.

But here's the issue—while you're manually adding those numbers, quantitative systems are performing an entirely different process.

They're simultaneously scanning 17,218 conditions across 2^63 possible outcome combinations, identifying all pricing inconsistencies within milliseconds. By the time you place your orders, the spread is gone. The system has already found similar inefficiencies in dozens of related markets, calculated optimal position sizes factoring in order book depth and fees, executed all trades in parallel, and moved capital to the next opportunity [1].

The gap isn’t just speed. It’s mathematical infrastructure.

Chapter 1: Why “Addition” Falls Short—The Marginal Polytope Problem

The Single-Market Fallacy

Consider a simple example.

Market A: “Will Trump win Pennsylvania?”

YES price $0.48, NO price $0.52. Totals exactly $1.00.

Looks perfect—no arbitrage, right?

Wrong.

Add another market, and things change.

Now, Market B: “Will the Republican Party win Pennsylvania by more than 5 percentage points?”

YES price $0.32, NO price $0.68. Again, totals $1.00.

Each market seems “normal.” But there's a logical dependency:

The US presidential election is decided state by state. Each state is a separate “battleground,” and whoever gets more votes wins all its electoral votes (winner-takes-all). Trump is the Republican candidate. So “Republicans win Pennsylvania” and “Trump wins Pennsylvania” are the same event. If the Republicans win by more than 5 points, Trump not only wins Pennsylvania but does so decisively.

In other words, Market B’s YES (Republican landslide) is a subset of Market A’s YES (Trump wins)—a landslide always means a win, but a win doesn't always mean a landslide.

This logical dependency creates arbitrage opportunities.

It’s like betting on “Will it rain tomorrow?” and “Will there be a thunderstorm tomorrow?” If there’s a thunderstorm, it must be raining (thunderstorm ⊆ rain). So “Thunderstorm YES” should never be priced higher than “Rain YES.” If the market misprices this, you can buy low and sell high simultaneously for risk-free profit. That’s arbitrage.

Exponential Explosion: Why Brute-Force Search Fails

For any market with n conditions, there are 2^n possible price combinations.

Sounds manageable? Consider a real example.

2010 NCAA Tournament market [2]: 63 games, each with win/loss outcomes. That’s 2^63 = 9,223,372,036,854,775,808 possible combinations—over 9 quintillion. The market had more than 5,000 order books.

How big is 2^63? Even checking 1 billion combinations per second would take about 292 years to exhaust them all. That’s why brute-force search is entirely impractical here.

Checking each combination individually? Computationally impossible.

Now, look at the 2024 US election. Researchers found 1,576 market pairs with possible dependencies [2]. If each pair has 10 conditions, that’s 2^20 = 1,048,576 combinations per pair. Multiply by 1,576 pairs. By the time your laptop finishes, the election is long over.

Integer Programming: Using Constraints Instead of Enumeration

Quantitative systems don’t solve this by “faster enumeration,” but by not enumerating at all.

They use integer programming [3] to describe “which outcomes are valid.”

A real example: Duke vs. Cornell market—each team has 7 order books (0 to 6 wins), totaling 14 conditions, or 2^14 = 16,384 possible combinations.

But there’s a constraint: both teams can’t win more than 5 games, since they’d meet in the semifinals (only one can advance).

How does integer programming solve this? Just three constraints:

Constraint 1: Exactly one of Duke’s 7 order books is true (Duke can have only one final win count).

Constraint 2: Exactly one of Cornell’s 7 order books is true.

Constraint 3: Duke wins 5 + Duke wins 6 + Cornell wins 5 + Cornell wins 6 ≤ 1 (they can’t both win that many).

Three linear constraints instead of 16,384 brute-force checks.

Brute-Force Search vs. Integer Programming

In other words, brute-force search is like reading every word in the dictionary to find one. Integer programming is like flipping directly to the right page. You don’t need to check every possibility—just describe what “valid answers” look like and let the algorithm find mispriced outcomes.

Real Data: 41% of Markets Offer Arbitrage

The original article notes that the research team analyzed data from April 2024 to April 2025 [2]:

• 17,218 conditions checked

• 7,051 conditions had single-market arbitrage (41%)

• Median price deviation: $0.60 (should be $1.00)

• 13 confirmed cross-market arbitrage opportunities

A median deviation of $0.60 means the market often deviates by 40%. This isn’t “nearly efficient”—it’s “widely exploitable.”

Chapter 2: Bregman Projection—Calculating the Optimal Arbitrage Trade

Finding arbitrage is one thing. Calculating the optimal trade is another.

You can’t just “average” or “tweak prices.” You need to project the current market state onto the arbitrage-free space, preserving the information structure in the prices.

Why “Straight-Line Distance” Doesn’t Work

The intuitive approach is to find the “closest valid price” and trade the difference.

Mathematically, this means minimizing the Euclidean distance: ||μ - θ||²

But this treats all price changes as equal.

A move from $0.50 to $0.60 and from $0.05 to $0.15 are both $0.10 changes—but their informational content differs drastically.

Why? Because prices represent implied probabilities. Moving from 50% to 60% is a moderate adjustment. Moving from 5% to 15% is a major shift—a near-impossible event becomes “somewhat possible.”

Imagine weighing yourself: going from 70 kg to 80 kg is “gained a bit.” But from 30 kg to 40 kg (for an adult), that’s “from near death to severe malnutrition.” Same 10 kg change, but completely different meaning. Price changes near 0 or 1 carry much more information.

Bregman Divergence: The Right “Distance”

Polymarket’s market maker uses LMSR (Logarithmic Market Scoring Rule) [4], where prices represent probability distributions.

Here, the correct distance metric isn’t Euclidean, but Bregman divergence [5].

For LMSR, Bregman divergence becomes KL divergence (Kullback-Leibler divergence) [6]—a measure of “information distance” between two probability distributions.

You don’t need to memorize the formula. Just know:

KL divergence automatically gives greater weight to changes near extreme prices. A move from $0.05 to $0.15 is “farther” under KL divergence than from $0.50 to $0.60. This matches our intuition—extreme price changes mean bigger information shocks.

A good example: in @zachxbt’s prediction market, Axiom overtook Meteora at the last minute, driven by extreme price movement.

Bregman Projection vs. Euclidean Projection

Arbitrage Profit = Bregman Projection Distance

A core conclusion from the referenced paper:

The maximum guaranteed profit from any trade equals the Bregman projection distance from the current market state to the arbitrage-free space.

Put simply: the further market prices deviate from the “valid space,” the more you can earn. The Bregman projection tells you:

  1. What to buy or sell (projection direction = trade direction)
  1. How much to buy or sell (factoring in order book depth)
  1. How much you can earn (projection distance = max profit)

The top arbitrageur earned $2,009,631.76 in one year [2]. Their strategy was simply to solve this optimization problem faster and more accurately than anyone else.

Marginal Polytope and Arbitrage

Imagine standing on a mountain, with a river (arbitrage-free space) at the foot. Your current position (market prices) is some distance from the river.

The Bregman projection helps you find the “shortest path from your position to the river”—not straight-line distance, but the shortest path accounting for the terrain (market structure). The length of this path is your maximum possible profit.

Chapter 3: Frank-Wolfe Algorithm—Making Theory Executable

Now you know: to compute optimal arbitrage, you need a Bregman projection.

But here’s the problem—directly computing the Bregman projection isn’t feasible.

Why? Because the arbitrage-free space (the marginal polytope M) has exponentially many vertices. Standard convex optimization requires access to the full constraint set, i.e., enumerating every valid outcome. That’s impossible at scale.

The Frank-Wolfe Core Idea

The brilliance of the Frank-Wolfe algorithm [7] is that it doesn’t try to solve everything at once, but converges step by step.

Here’s how it works:

Step 1: Start with a small set of known valid outcomes.

Step 2: Optimize over this set to find the current best solution.

Step 3: Use integer programming to find a new valid outcome and add it to the set.

Step 4: Check if you’re close enough to optimal. If not, return to Step 2.

Each iteration adds only one vertex. Even after 100 iterations, you’re only tracking 100 vertices—not 2^63.

Frank-Wolfe Iteration

Imagine searching for an exit in a massive maze.

The brute-force method is to walk every path. The Frank-Wolfe method is to pick a random path, then at each fork, ask a “guide” (integer programming solver): “From here, which direction is most likely to lead to the exit?” Then take a step in that direction. You don’t need to explore the entire maze—just make the right choice at each key junction.

Integer Programming Solver: The “Guide” at Every Step

Each Frank-Wolfe iteration requires solving an integer linear programming problem. In theory, this is NP-hard (no known fast general algorithm).

But modern solvers like Gurobi [8] can efficiently handle well-structured problems.

The research team used Gurobi 5.5. Actual solve times [2]:

• Early iterations (few games finished): under 1 second

• Mid-stage (30–40 games finished): 10–30 seconds

• Late stage (50+ games finished): under 5 seconds

Why is it faster later? Because as more results are determined, the feasible solution space shrinks. Fewer variables, tighter constraints, faster solves.

Gradient Explosion and Barrier Frank-Wolfe

Standard Frank-Wolfe has a technical issue: as prices approach 0, the LMSR gradient tends toward negative infinity, causing instability.

The solution is Barrier Frank-Wolfe: optimize not on the full polytope M, but on a slightly “shrunken” version M’. The shrinkage parameter ε decreases adaptively with each iteration—starting further from the boundary (for stability), then gradually approaching the true boundary (for accuracy).

Research shows that in practice, 50 to 150 iterations are sufficient for convergence [2].

Real-World Performance

The paper reveals a key finding [2]:

In the first 16 games of the NCAA tournament, the Frank-Wolfe market maker (FWMM) and a simple linear-constraint market maker (LCMM) performed similarly—because the integer programming solver was still too slow.

But after 45 games, the first successful 30-minute projection was completed.

From then on, FWMM outperformed LCMM in pricing by 38%.

The turning point: when the outcome space shrank enough for integer programming to solve within the trading window.

FWMM is like a student warming up in the first half of the exam, but once in the zone, starts dominating. LCMM is the steady but limited student. The key difference: FWMM has a more powerful “weapon” (Bregman projection), but needs time to “load” (wait for the solver).

Chapter 4: Execution—Why You Can Still Lose Money After the Calculation

You’ve detected arbitrage. You’ve used Bregman projection to calculate the optimal trade.

Now you need to execute.

This is where most strategies fail.

Non-Atomic Execution

Polymarket uses a CLOB (Central Limit Order Book) [9]. Unlike decentralized exchanges, trades on a CLOB execute sequentially—you can’t guarantee all your orders fill at once.

Your arbitrage plan:

Buy YES at $0.30. Buy NO at $0.30. Total cost: $0.60. Regardless of outcome, receive $1.00. Profit: $0.40.

Reality:

Submit YES order → filled at $0.30 ✓

Your order moves the market price.

Submit NO order → filled at $0.78 ✗

Total cost: $1.08. Payout: $1.00. Actual result: lose $0.08.

One leg fills, the other doesn't. You're exposed.

That’s why the paper only counts opportunities with profit margins over $0.05 [2]. Smaller spreads are wiped out by execution risk.

Non-Atomic Execution Risk

VWAP: The Real Execution Price

Don’t assume you can always fill at the quoted price. You must calculate the Volume Weighted Average Price (VWAP) [10].

The research team’s method: for each block on the Polygon chain (about every 2 seconds), compute the VWAP for all YES trades and all NO trades in that block. If |VWAP_yes + VWAP_no - 1.0| > 0.02, it’s logged as an arbitrage opportunity [2].

VWAP is the “average price you actually pay.” If you want to buy 10,000 tokens but the order book has $0.30 for 2,000, $0.32 for 3,000, $0.35 for 5,000—your VWAP is (2000×0.30 + 3000×0.32 + 5000×0.35) / 10000 = $0.326. That’s much higher than the “best price” of $0.30 you saw.

Liquidity Constraints: Profit Depends on Order Book Depth

Even if prices are misaligned, your profit is capped by available liquidity.

A real example [2]:

Market shows arbitrage: YES prices sum to $0.85. Potential profit: $0.15 per dollar. But order book depth at those prices is only $234. Maximum extractable profit: $234 × 0.15 = $35.10.

For cross-market arbitrage, you need liquidity on all positions simultaneously. The smallest available liquidity sets your ceiling.

This is why, on current quantitative platforms, the impact of order prices on execution price is critical.

insiders.bot

This is the answer to this issue during the public Beta period.

Chapter 5: The Complete System—What’s Actually Deployed

Theory is clean. Production is messy.

Here’s what a real, operational arbitrage system looks like [2].

Data Pipeline

Real-time data: WebSocket connection to Polymarket’s API [9], receiving order book updates (price/quantity changes), trade notifications, and market creation/settlement events.

Historical data: Query contract events via Alchemy Polygon node API—OrderFilled (trade executed), PositionSplit (new token minted), PositionsMerge (token burned).

The research team analyzed 86 million trades [2]. This scale requires infrastructure, not just scripts.

P.S. Currently, we

insiders.bot

plan to open source a high-speed trading API. If you plan to use a similar trading model and want early access, feel free to DM me.

Dependency Detection Layer

For 305 US election markets, there are 46,360 possible pairs to check. Manual analysis is impossible.

The research team used a clever approach: leveraging the DeepSeek-R1-Distill-Qwen-32B large language model for initial screening [2].

Input: condition descriptions of two markets. Output: JSON of valid outcome combinations.

Then, three layers of validation: does each market have exactly one true condition? Is the number of valid combinations less than n × m (indicating dependency)? Does the dependent subset meet arbitrage conditions?

Result: 40,057 independent pairs (no arbitrage) → 1,576 dependent pairs (potential arbitrage) → 374 pairs meeting strict criteria → 13 pairs confirmed exploitable by humans [2].

LLM accuracy on complex multi-condition markets is 81.45%. Good enough for initial screening, but manual verification is needed before execution.

Three-Layer Optimization Engine

Layer 1: Simple linear constraints (LCMM). Quick checks for basic rules—“sum of probabilities equals 1,” “if A implies B, then P(A) cannot exceed P(B).” Completes in milliseconds, eliminating obvious pricing errors.

Layer 2: Integer programming projection (Frank-Wolfe + Gurobi). This is the core. Parameters: Alpha = 0.9 (extract at least 90% of available arbitrage), initial ε = 0.1 (10% shrinkage), convergence threshold = 1e-6, time limit = 30 minutes. Typical iterations: 50–150. Solve time per iteration: 1–30 seconds [2].

Layer 3: Execution validation. Before submitting orders, simulates fills on the current order book. Checks: is liquidity sufficient? What’s the expected slippage? What’s the guaranteed profit after slippage? Does profit exceed the minimum threshold ($0.05)? Only if all criteria are met does it execute.

Position Management: Modified Kelly Formula

The standard Kelly formula [11] tells you what fraction of capital to allocate to a trade. For arbitrage, you must adjust for execution risk:

f = (b×p - q) / b × √p

Where b is the arbitrage profit percentage, p is the probability of full execution (estimated from order book depth), and q = 1 - p.

Cap: 50% of order book depth. Beyond this, your order significantly moves the market.

Final Results

From April 2024 to April 2025, total extracted profit [2]:

Single-condition arbitrage: Buy both sides low $5,899,287 + sell both sides high $4,682,075 = $10,581,362

Market rebalancing: Buy all YES low $11,092,286 + sell all YES high $612,189 + buy all NO $17,307,114 = $29,011,589

Cross-market combined arbitrage: $95,634

Total: $39,688,585

Top 10 arbitrageurs took $8,127,849 (20.5% of total). Top arbitrageur: $2,009,632 from 4,049 trades, averaging $496 per trade [2].

Not a lottery. Not luck. This is systematic, mathematically precise execution.

The Final Reality

While traders are reading “10 Prediction Market Tips,” what are quant systems doing?

They're using integer programming to detect dependencies among 17,218 conditions. They're using Bregman projection to calculate optimal arbitrage trades. They're running the Frank-Wolfe algorithm to handle gradient explosions. They're using VWAP to estimate slippage and execute orders in parallel. They're systematically extracting $40 million in guaranteed profit.

The difference isn’t luck. It’s mathematical infrastructure.

The paper is public [1]. The algorithms are known. The profits are real.

The real question: Can you build it before the next $40 million is extracted?

Quick Reference

• Marginal Polytope → The set of all valid prices. Prices must be within this region to be arbitrage-free.

• Integer Programming → Describes valid outcomes with linear constraints, avoiding brute-force enumeration. Compresses 2^63 checks into a few constraints [3].

• Bregman Divergence / KL Divergence → Measures “distance” between two probability distributions, more appropriate than Euclidean distance for price/probability scenarios. Assigns higher weight to changes near extremes [5][6].

• LMSR (Logarithmic Market Scoring Rule) → The pricing mechanism used by Polymarket’s market maker; prices represent implied probabilities [4].

• Frank-Wolfe Algorithm → An iterative optimization algorithm that adds one new vertex per iteration, avoiding enumeration of exponentially many valid outcomes [7].

• Gurobi → Industry-leading integer programming solver, the “guide” for each Frank-Wolfe iteration [8].

• CLOB (Central Limit Order Book) → Polymarket’s trading mechanism; orders execute in sequence, no atomicity [9].

• VWAP (Volume Weighted Average Price) → The average price you actually pay, accounting for order book depth. More realistic than the “best quote” [10].

• Kelly Formula → Tells you what fraction of capital to allocate to a trade, balancing return and risk [11].

• Non-Atomic Execution → The problem where multiple orders can't be guaranteed to fill simultaneously. One leg fills, the other doesn’t—exposure risk.

• DeepSeek → The large language model used for market dependency screening, 81.45% accuracy.

References

[1] Original post: https://x.com/RohOnChain/status/2017314080395296995

[2] Research paper “Unravelling the Probabilistic Forest: Arbitrage in Prediction Markets”: https://arxiv.org/abs/2508.03474

[3] Theoretical foundation “Arbitrage-Free Combinatorial Market Making via Integer Programming”: https://arxiv.org/abs/1606.02825

[4] LMSR explanation: https://www.cultivatelabs.com/crowdsourced-forecasting-guide/how-does-logarithmic-market-scoring-rule-lmsr-work

[5] Introduction to Bregman Divergences: https://mark.reid.name/blog/meet-the-bregman-divergences.html

[6] KL Divergence - Wikipedia: https://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence

[7] Frank-Wolfe Algorithm - Wikipedia: https://en.wikipedia.org/wiki/Frank%E2%80%93Wolfe_algorithm

[8] Gurobi Optimizer: https://www.gurobi.com/

[9] Polymarket CLOB API Documentation: https://docs.polymarket.com/

[10] VWAP explanation - Investopedia: https://www.investopedia.com/terms/v/vwap.asp

[11] Kelly Formula - Investopedia: https://www.investopedia.com/articles/trading/04/091504.asp

[12] Decrypt article “The $40 Million Free Money Glitch”: https://decrypt.co/339958/40-million-free-money-glitch-crypto-prediction-markets

Disclaimer

  1. This article is reproduced from [mrryanchi], with copyright belonging to the original author [@RohOnChain]. If you have any objections to this reproduction, please contact the Gate Learn team, and we will address it promptly according to relevant procedures.
  2. Disclaimer: The views and opinions expressed in this article are solely those of the author and do not constitute any investment advice.
  3. Other language versions of this article are translated by the Gate Learn team. Without explicit mention of Gate, reproduction, distribution, or plagiarism of translated articles is prohibited.

Share

Crypto Calendar
Tokenların Kilidini Aç
Wormhole, 3 Nisan'da 1.280.000.000 W token açacak ve bu, mevcut dolaşımdaki arzın yaklaşık %28,39'unu oluşturacak.
W
-7.32%
2026-04-02
Tokenların Kilidini Aç
Pyth Network, 19 May'da 2.130.000.000 PYTH tokenini serbest bırakacak ve bu, mevcut dolaşım arzının yaklaşık %36,96'sını oluşturacak.
PYTH
2.25%
2026-05-18
Tokenların Kilidini Aç
Pump.fun, 12 Temmuz'da 82,500,000,000 PUMP token'ı kilidini açacak ve bu, mevcut dolaşımdaki arzın yaklaşık %23,31'ini oluşturacak.
PUMP
-3.37%
2026-07-11
Token Kilidi Açma
Succinct, 5 Ağustos'ta mevcut dolaşımdaki arzın yaklaşık %104,17'sini oluşturan 208,330,000 PROVE token'ını serbest bırakacak.
PROVE
2026-08-04
sign up guide logosign up guide logo
sign up guide content imgsign up guide content img
Sign Up

Related Articles

The Future of Cross-Chain Bridges: Full-Chain Interoperability Becomes Inevitable, Liquidity Bridges Will Decline
Beginner

The Future of Cross-Chain Bridges: Full-Chain Interoperability Becomes Inevitable, Liquidity Bridges Will Decline

This article explores the development trends, applications, and prospects of cross-chain bridges.
2023-12-27 07:44:05
Solana Need L2s And Appchains?
Advanced

Solana Need L2s And Appchains?

Solana faces both opportunities and challenges in its development. Recently, severe network congestion has led to a high transaction failure rate and increased fees. Consequently, some have suggested using Layer 2 and appchain technologies to address this issue. This article explores the feasibility of this strategy.
2024-06-24 01:39:17
Sui: How are users leveraging its speed, security, & scalability?
Intermediate

Sui: How are users leveraging its speed, security, & scalability?

Sui is a PoS L1 blockchain with a novel architecture whose object-centric model enables parallelization of transactions through verifier level scaling. In this research paper the unique features of the Sui blockchain will be introduced, the economic prospects of SUI tokens will be presented, and it will be explained how investors can learn about which dApps are driving the use of the chain through the Sui application campaign.
2025-08-13 07:33:39
Navigating the Zero Knowledge Landscape
Advanced

Navigating the Zero Knowledge Landscape

This article introduces the technical principles, framework, and applications of Zero-Knowledge (ZK) technology, covering aspects from privacy, identity (ID), decentralized exchanges (DEX), to oracles.
2024-01-04 16:01:13
What is Tronscan and How Can You Use it in 2025?
Beginner

What is Tronscan and How Can You Use it in 2025?

Tronscan is a blockchain explorer that goes beyond the basics, offering wallet management, token tracking, smart contract insights, and governance participation. By 2025, it has evolved with enhanced security features, expanded analytics, cross-chain integration, and improved mobile experience. The platform now includes advanced biometric authentication, real-time transaction monitoring, and a comprehensive DeFi dashboard. Developers benefit from AI-powered smart contract analysis and improved testing environments, while users enjoy a unified multi-chain portfolio view and gesture-based navigation on mobile devices.
2025-05-22 03:13:17
What Is Ethereum 2.0? Understanding The Merge
Intermediate

What Is Ethereum 2.0? Understanding The Merge

A change in one of the top cryptocurrencies that might impact the whole ecosystem
2023-01-18 14:25:24