
How to Backtest a Gold Trading EA Properly:
The Complete Guide

Published 15 June 2026 · 14 min read

Quick Answer

A proper XAUUSD EA backtest requires real tick data (99% modelling quality), a realistic spread (10–15 pips, i.e. 100–150 points, for ECN), slippage of 2–3 pips (20–30 points), a minimum 3-year test period spanning multiple market regimes, and an explicit out-of-sample validation step on data you did not use during optimisation. Skipping any of these produces results that look better than reality.

The 8-Step Backtest Process


Step 1: Download historical data

What to Do

In MT5, open View → Symbols (Ctrl+U), find XAUUSD, and use the Bars tab to request M1 history from your broker's server for the period you intend to test. For longer histories (5+ years), Dukascopy offers free XAUUSD tick data in .csv format at dukascopy.com/swiss/english/marketwatch/historical/. Import it as an MT5 custom symbol or via a third-party tick data importer. Confirm that data is available for your full intended test period before proceeding.

Common Mistake

Using broker history data that only goes back 1–2 years. For a proper test you need at least 3 years, ideally 5. Dukascopy provides data from 2003 onwards.

Warning Sign

The date picker in Strategy Tester shows no data for early periods → the data has not been downloaded yet.

Step 2: Import and verify tick data

What to Do

After downloading from Dukascopy, use a tool such as Tick Data Suite, or follow the methodology in Birt's EA Backtesting Guide, to import the .csv file into MT5 as a custom symbol. Verify by opening the XAUUSD chart and scrolling back to your intended start date: price data should be visible. Then check Strategy Tester: with XAUUSD selected and your date range set, the bar count reported for the test should be consistent with the period selected.

Common Mistake

Assuming broker data is sufficient. Broker history data often has gaps, is not tick-level, or does not go back far enough. For a 5-year proper backtest, external tick data is often necessary.
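Gaps of the kind described above can be found with a quick scan of the tick timestamps before importing. The sketch below is a minimal illustration and not part of any MT5 or Dukascopy tooling: `find_gaps` and the one-hour threshold are assumptions made for the example, and weekend market closures will also be reported as gaps unless the threshold is raised or they are filtered out.

```python
from datetime import datetime, timedelta

def find_gaps(timestamps, max_gap=timedelta(hours=1)):
    """Return (previous, current) pairs where consecutive ticks are further apart than max_gap.

    timestamps must be sorted in ascending order.
    """
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        if cur - prev > max_gap:
            gaps.append((prev, cur))
    return gaps

# Hypothetical tick timestamps with a 4.5-hour hole in the middle
ticks = [
    datetime(2024, 1, 2, 10, 0, 0),
    datetime(2024, 1, 2, 10, 0, 1),
    datetime(2024, 1, 2, 14, 30, 0),
    datetime(2024, 1, 2, 14, 30, 2),
]
gaps = find_gaps(ticks)
for start, end in gaps:
    print(f"Gap of {end - start} between {start} and {end}")
```

In practice the timestamps would come from the first column of the downloaded .csv file; the exact column layout depends on the export and is not assumed here.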

Warning Sign

Strategy Tester report shows "Modelling quality: 25%" even after setup → tick data was not properly imported.

Step 3: Select the modelling mode

What to Do

View → Strategy Tester (Ctrl+R). Symbol: XAUUSD. Model: "Every Tick Based on Real Ticks" (the only option that uses actual downloaded tick data). Period: M1 (or your EA's operating timeframe). The "Open prices only" and "Every Tick" (simulated) options are significantly less accurate and should not be used as the basis for live deployment decisions.

Common Mistake

Selecting "Every Tick" instead of "Every Tick Based on Real Ticks." The first uses simulated ticks generated from OHLC bars (25–90% quality). The second uses your imported real tick data (99% quality). The naming is confusing, so look for "Based on Real Ticks" specifically.

Warning Sign

Report shows "Modelling quality: 90%" → "Every Tick" selected instead of "Every Tick Based on Real Ticks".

Step 4: Set a realistic spread

What to Do

Right-click the Symbol selector in Strategy Tester → Properties. Set "Spread" to match your broker's real ECN spread. For most ECN brokers on XAUUSD: 100–150 points (10–15 pips). Check your broker's specification page for their stated average XAUUSD spread during London and New York sessions. If you plan to run the EA mainly during London hours, use the London session average rather than the 24h average.
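Because MT5 takes the spread in points while brokers often quote it in pips, a small conversion helper can prevent an off-by-ten error. This is a sketch under stated assumptions: a 2-decimal XAUUSD quote where 1 pip equals 10 points, and a pip value of $10 per 1.00 lot (consistent with the $0.10 per pip at 0.01 lot used elsewhere in this guide). The function names are illustrative.

```python
POINTS_PER_PIP = 10  # assumption: 2-decimal XAUUSD quote, so 1 pip (0.10) = 10 points (0.01)

def points_to_pips(points):
    """Convert an MT5 spread value in points to pips."""
    return points / POINTS_PER_PIP

def spread_cost_usd(spread_points, lots, usd_per_pip_per_lot=10.0):
    """Cost of crossing the spread once; the $10 per pip per lot default is an assumption."""
    return points_to_pips(spread_points) * usd_per_pip_per_lot * lots

for spread in (100, 120, 150):
    print(f"{spread} points = {points_to_pips(spread):.0f} pips, "
          f"${spread_cost_usd(spread, 0.01):.2f} per trade at 0.01 lot")
```

Check your broker's contract specification before relying on these constants, since digit count and contract size vary between brokers.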

Common Mistake

Leaving spread at 0 (MT5 default in some versions). A 0-spread backtest on XAUUSD overstates every winning trade's net profit by 10–15 pips, often the difference between a profitable and unprofitable strategy at realistic lot sizes.

Warning Sign

Backtest profit is dramatically higher than any live broker could replicate → spread was 0.

Step 5: Add slippage

What to Do

Click "Expert Properties" → Testing tab. In the "Execution" section, ensure "Based on Real Ticks" is selected and set "Deviation" to 20–30 (MT5 uses points, so 30 points = 3 pips on standard XAUUSD). This tells the backtester to simulate market conditions where your entry may be filled up to 3 pips away from the signal price, which is a realistic average for XAUUSD on a VPS with normal broker latency.

Common Mistake

Leaving slippage at 0. Slippage adds up significantly over hundreds of trades: 200 trades per month × 2 pips average slippage × $0.10 per pip (0.01 lot) = $40/month in hidden costs that the 0-slippage backtest hides.
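The arithmetic above can be reproduced and extended to other lot sizes with a one-line helper. This is a sketch: the function name is illustrative, and the 0.10 lot figure assumes pip value scales linearly with lot size to $1.00 per pip.

```python
def monthly_slippage_cost_usd(trades_per_month, avg_slippage_pips, usd_per_pip):
    """Hidden monthly cost of slippage for a fixed lot size."""
    return trades_per_month * avg_slippage_pips * usd_per_pip

# Figures from the text: 200 trades, 2 pips average slippage, $0.10 per pip at 0.01 lot
cost_at_001_lot = monthly_slippage_cost_usd(200, 2, 0.10)
# Same trading at 0.10 lot, assuming $1.00 per pip
cost_at_010_lot = monthly_slippage_cost_usd(200, 2, 1.00)

print(f"0.01 lot: ${cost_at_001_lot:.2f}/month, 0.10 lot: ${cost_at_010_lot:.2f}/month")
```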

Warning Sign

Win rate in backtest is much higher than in demo → slippage was 0 in backtest but not on demo.

Step 6: Choose the test period

What to Do

For XAUUSD, the minimum meaningful test period is 3 years. Five years is preferred. The test must include diverse market conditions: at least one extended trending period (e.g. 2019 uptrend), one high-volatility event period (e.g. March–May 2020 COVID), and one range-bound consolidation period (e.g. 2021 H2 or 2023). Strategies that only look good in one regime type will fail when conditions change. For the in-sample / out-of-sample split: use the earliest 75% of your data for optimisation and hold back the final 25% for out-of-sample validation.
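The 75/25 split can be turned into concrete Strategy Tester dates with a short calculation. The sketch below splits by calendar days, which is one reasonable reading of "75% of your data"; the function name and the 2019–2024 range are illustrative choices.

```python
from datetime import date, timedelta

def split_in_out_of_sample(start, end, in_sample_fraction=0.75):
    """Split [start, end] into an in-sample range and a later out-of-sample range."""
    total_days = (end - start).days
    cutoff = start + timedelta(days=round(total_days * in_sample_fraction))
    in_sample = (start, cutoff)
    out_of_sample = (cutoff + timedelta(days=1), end)
    return in_sample, out_of_sample

in_sample, out_of_sample = split_in_out_of_sample(date(2019, 1, 1), date(2024, 12, 31))
print("Optimise on:", in_sample[0], "to", in_sample[1])
print("Hold back:  ", out_of_sample[0], "to", out_of_sample[1])
```

Fixing these dates before any optimisation run makes it harder to drift into the held-back period by accident.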

Common Mistake

Including only a recent 12-month period that happened to be favourable for the EA type. If your EA is a breakout strategy, a single trending year will produce a spectacular backtest that completely fails in the subsequent ranging year.

Warning Sign

All profitable months cluster in one specific calendar year → regime-dependent results.
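One way to check for this clustering is to compute each calendar year's share of total net profit from the month-by-month results. This is a minimal sketch with made-up numbers; `profit_share_by_year` is an illustrative helper, and what counts as "too concentrated" is left to the reader.

```python
from collections import defaultdict

def profit_share_by_year(monthly_profit):
    """monthly_profit maps (year, month) to net profit; returns each year's share of the total."""
    by_year = defaultdict(float)
    for (year, _month), profit in monthly_profit.items():
        by_year[year] += profit
    total = sum(by_year.values())
    if total <= 0:
        raise ValueError("share of profit is only meaningful for a net-profitable test")
    return {year: profit / total for year, profit in by_year.items()}

# Hypothetical monthly results in account currency
monthly_profit = {
    (2020, 3): 500.0, (2020, 8): 400.0,
    (2021, 2): 80.0, (2021, 9): -30.0,
    (2022, 5): 50.0,
}
shares = profit_share_by_year(monthly_profit)
dominant_year = max(shares, key=shares.get)
print(dominant_year, f"{shares[dominant_year]:.0%}")  # 2020 90%
```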

Step 7: Record the key metrics

What to Do

After running, record: Profit Factor (aim for 1.4–2.0), Max Drawdown % (read it against the lot size used: if DD is 3% at 0.01 lot, expect live DD at 0.05 lot to be roughly 5× higher, about 15%), Expected Payoff per trade, Sharpe Ratio, and Maximum Consecutive Losses. Do not focus primarily on total profit in pips or dollars: those numbers only mean something once the lot size is known, and they can be inflated trivially. Focus on the ratio metrics that remain meaningful regardless of lot size.

Common Mistake

Evaluating the backtest primarily on total profit. Running the EA at 1.0 lot will produce 100× the dollar profit of 0.01 lot, but that tells you nothing about whether the strategy is good. Profit factor does not change with lot size, and drawdown % is comparable across tests once you account for the lot size used.
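To make the lot-size point concrete, the sketch below computes profit factor, maximum consecutive losses, and maximum drawdown % from a list of per-trade results, then scales every trade by 100× to show that profit factor stays the same while total profit does not. The trade list and the $1,000 starting balance are hypothetical, and the drawdown here is measured on closed-trade balance, which may differ from the equity-based figure in an MT5 report.

```python
def profit_factor(trades):
    """Gross profit divided by gross loss."""
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    return float("inf") if gross_loss == 0 else gross_profit / gross_loss

def max_consecutive_losses(trades):
    """Length of the longest run of losing trades."""
    worst = streak = 0
    for t in trades:
        streak = streak + 1 if t < 0 else 0
        worst = max(worst, streak)
    return worst

def max_drawdown_pct(trades, starting_balance):
    """Largest peak-to-trough decline of the running balance, as a percentage of the peak."""
    balance = peak = starting_balance
    worst = 0.0
    for t in trades:
        balance += t
        peak = max(peak, balance)
        worst = max(worst, (peak - balance) / peak * 100)
    return worst

trades = [30, -20, -10, 40, -15, -15, -10, 50]  # hypothetical net results in USD
scaled = [t * 100 for t in trades]              # same strategy at 100x the lot size

print(f"Profit factor: {profit_factor(trades):.2f}")                # 1.71
print(f"Profit factor at 100x: {profit_factor(scaled):.2f}")        # 1.71
print(f"Total profit: {sum(trades)} vs {sum(scaled)}")              # 50 vs 5000
print(f"Max consecutive losses: {max_consecutive_losses(trades)}")  # 3
print(f"Max drawdown: {max_drawdown_pct(trades, 1000):.2f}%")       # 3.85%
```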

Warning Sign

Strategy looks impressive but profit factor is below 1.3 → the strategy is marginal and may not survive realistic conditions.

Step 8: Run the out-of-sample validation

What to Do

Take the settings you tested in steps 1–7 and run them only on the data period you deliberately held back (the final 25% of your total available data). Do not change any settings based on what you see: this is a one-time, read-only test. Compare the out-of-sample profit factor to the in-sample profit factor. A gap of under 30% is acceptable. A gap over 50% suggests the settings are curve-fitted to the in-sample period and will likely underperform going forward.
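The comparison can be expressed as a relative gap between the two profit factors. In the sketch below, the 30% and 50% thresholds come from the text; the "borderline" label for the range in between is an assumption, since the text does not name it.

```python
def oos_gap_pct(in_sample_pf, out_of_sample_pf):
    """Relative drop in profit factor from in-sample to out-of-sample, in percent."""
    return (in_sample_pf - out_of_sample_pf) / in_sample_pf * 100

def classify_gap(gap_pct):
    if gap_pct < 30:
        return "acceptable"
    if gap_pct > 50:
        return "likely curve-fitted"
    return "borderline"  # assumption: the text leaves the 30-50% range unnamed

for in_pf, oos_pf in [(1.8, 1.5), (2.1, 0.9)]:
    gap = oos_gap_pct(in_pf, oos_pf)
    print(f"IS {in_pf} / OOS {oos_pf}: gap {gap:.0f}% -> {classify_gap(gap)}")
```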

Common Mistake

Using the out-of-sample results to make further adjustments, then claiming the strategy is validated. Once you adjust to the OOS data, it is no longer out-of-sample: it has become part of your optimisation dataset.

Warning Sign

In-sample profit factor 2.1, out-of-sample profit factor 0.9 → strategy is almost certainly curve-fitted.

Backtest Quality Scorecard

After running your backtest, rate it on the 8 dimensions below to get an overall reliability assessment (the total score is out of 24).

1. Data Quality

2. Test Period

3. Regime Diversity

4. Spread Setting

5. Slippage

6. Out-of-Sample

7. Profit Factor

8. Max DD vs Expectation


Which Metrics to Focus On (and Which to Ignore)

Not all backtest metrics are equally useful. Here is a tiered guide to which numbers to prioritise:

Focus on these

· Profit Factor: total gross profit ÷ total gross loss. 1.4–2.0 is realistic. Above 2.5 warrants scrutiny.
· Maximum Drawdown %: peak-to-trough equity decline as a percentage. Note the lot size, because this scales linearly with it.
· Max Consecutive Losses: the worst losing streak, which determines the minimum account buffer needed to stay solvent.
· Sharpe Ratio: risk-adjusted return. Above 1.0 is acceptable, above 2.0 is strong.
· Month-by-Month Breakdown: shows whether results are consistent across time or clustered in favourable periods.
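For readers who want to sanity-check the Sharpe Ratio from their own month-by-month figures, here is a minimal sketch. It assumes a zero risk-free rate, monthly returns expressed as fractions, and annualisation by the square root of 12; the figure MT5 reports may be computed differently, so treat this as a cross-check rather than a replica. The returns are hypothetical.

```python
import math
import statistics

def sharpe_ratio(periodic_returns, periods_per_year=12):
    """Mean return over sample standard deviation, annualised. Risk-free rate assumed to be 0."""
    mean = statistics.mean(periodic_returns)
    stdev = statistics.stdev(periodic_returns)
    return mean / stdev * math.sqrt(periods_per_year)

monthly_returns = [0.02, -0.01, 0.03, 0.01, -0.02, 0.04]  # hypothetical monthly returns
sharpe = sharpe_ratio(monthly_returns)
print(f"Annualised Sharpe: {sharpe:.2f}")
```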

Secondary (useful context)

· Win Rate: the percentage of winning trades. Meaningful only in context of the average win/loss ratio.
· Average Trade Duration: indicates whether the EA behaves like a scalper, day trader, or swing trader.
· Expected Payoff: average profit per trade in currency, useful for position sizing calculations.

Ignore in isolation

· Total Pips Profit: says nothing about money without the lot size. 5,000 pips at 0.01 lot is worth the same in dollars as 500 pips at 0.1 lot.
· Total Dollar Profit: the mirror-image problem. Inflatable by simply increasing lot size.
· Number of Trades: more trades is not inherently better, since each one pays the spread. Quality over quantity.

The 3 Most Common Backtest Inflation Mistakes

Spread set to 0

Overstates every winning trade by 10–15 pips on XAUUSD. A strategy targeting 25 pips with 0 spread shows 25 pips profit. The same strategy at 12-pip spread nets only 13 pips, a 48% reduction in profit per trade.

Fix

Set spread to 100–150 points (10–15 pips) before running.

OHLC bars only (25% quality)

Intra-candle stop loss triggers and entry timing are invisible. Trailing stops exit at fictional prices. Trades that would have been stopped out mid-candle appear as winners.
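The effect is easy to reproduce with a toy example. The sketch below is an illustration of the principle, not MT5's modelling algorithm: it compares a coarse check that only looks at each bar's open and close with one that also looks at the lowest price inside the bar. The `Bar` class, the function names, and the prices are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Bar:
    open: float
    high: float
    low: float
    close: float

def stopped_out_open_close_only(bars, stop_price):
    """Coarse model: a long position's stop is checked only at each bar's open and close."""
    return any(min(bar.open, bar.close) <= stop_price for bar in bars)

def stopped_out_with_lows(bars, stop_price):
    """Finer model: the stop is also checked against the lowest price inside each bar."""
    return any(bar.low <= stop_price for bar in bars)

# Hypothetical long trade: entry 2000.00, stop 1995.00
bars = [
    Bar(open=2000.0, high=2003.0, low=1994.5, close=2002.0),  # dips through the stop mid-candle
    Bar(open=2002.0, high=2008.0, low=2001.0, close=2007.0),
]
print(stopped_out_open_close_only(bars, 1995.0))  # False: the trade looks like a winner
print(stopped_out_with_lows(bars, 1995.0))        # True: it was stopped out in the first bar
```

Even the bar low is only an approximation, since it does not say when inside the candle the price was reached; real tick data removes that ambiguity.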

Fix

Switch to "Every Tick Based on Real Ticks" after downloading tick data.

No slippage

Every entry executes at the exact signal price. In live trading, entries typically fill 1–3 pips away from signal. Over 200 trades per month, 2 pips average slippage adds up to 400 pips of hidden cost.

Fix

Set Deviation to 20–30 points in Expert Properties → Testing.


Frequently Asked Questions

The three most important metrics are: (1) Profit Factor: total gross profit divided by total gross loss. Target 1.4–2.0 for realistic strategies. (2) Maximum Drawdown %: the largest peak-to-trough equity decline. This should be assessed relative to the lot size used; a 5% DD at 0.01 lot becomes 25% at 0.05 lot. (3) Maximum Consecutive Losses: how many trades in a row the EA can lose. This determines the minimum account buffer needed to survive a worst-case losing streak. Secondary metrics worth noting: Sharpe Ratio (above 1.0 is acceptable, above 2.0 is strong), Expected Payoff per trade, and month-by-month breakdown.
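The drawdown scaling mentioned above is a simple proportion. The sketch below applies the linear-scaling approximation used in this guide; the function name is illustrative, and real drawdown can deviate from a straight line once margin limits or compounding come into play.

```python
def scale_drawdown_pct(tested_dd_pct, tested_lot, live_lot):
    """Estimate drawdown at a new lot size, assuming it scales linearly with lot size."""
    return tested_dd_pct * (live_lot / tested_lot)

live_dd = scale_drawdown_pct(5.0, tested_lot=0.01, live_lot=0.05)
print(f"Estimated live drawdown: {live_dd:.1f}%")  # 25.0%
```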

Out-of-sample (OOS) data is historical data deliberately excluded from the optimisation process. After finalising your EA settings using your main data period, you run the settings once, and only once, on the OOS period. If the results are similar to your in-sample results, that is evidence the strategy has genuine edge. If the results collapse, the strategy was over-fitted to the optimisation data. The OOS test is the most important step because it is the only genuine test of whether the strategy works on data it has never "seen." Most traders skip this step, which is why so many EAs that look good in backtest fail live.

XAUUSD has gone through distinctly different market regimes in recent years. 2019: moderate range with upward drift. 2020: extreme volatility, with the COVID crash and recovery and gold hitting all-time highs. 2021: range-bound, choppy, difficult for breakout strategies. 2022: commodity surge, strong trend up and then reversal. 2023–2024: mixed trending and consolidation. When backtesting Goldie Razor V2.8.4 specifically, using 2019–2024 as the minimum period captures all five of these distinct regime types. This is why the EA documentation recommends this period as the baseline test window: it includes the hardest conditions (2021 chop) alongside the easiest (2020 trend).

Expect live performance to be 20–40% below backtest performance on key metrics like profit factor and return. This gap comes from three sources: real spread vs backtest spread (even if you used realistic spread, live spread varies while backtest spread is fixed), real slippage vs estimated slippage, and the psychological factor of manual interventions (traders who disable the EA during drawdowns, close trades early, or adjust settings in response to losing streaks all reduce live performance below what the pure EA would deliver). A gap above 50% suggests backtest assumptions were unrealistic.
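Applying the 20–40% range to a backtest figure gives a rough band for what to expect live. This is a sketch: the function name and the 1.8 backtest profit factor are hypothetical, and the band is a rule of thumb from the text rather than a statistical forecast.

```python
def expected_live_range(backtest_value, min_haircut=0.20, max_haircut=0.40):
    """Return the (low, high) range expected live, given a 20-40% gap from backtest."""
    low = backtest_value * (1 - max_haircut)
    high = backtest_value * (1 - min_haircut)
    return low, high

low, high = expected_live_range(1.8)
print(f"Backtest PF 1.80 -> expect roughly {low:.2f} to {high:.2f} live")
```

Note that the lower end of this band falls below the 1.3 profit factor that this guide treats as marginal, which is one reason to prefer backtests with some headroom.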

When sharing XAUUSD EA backtest results, always disclose: the exact date range tested, the spread setting used, the slippage setting, the modelling quality percentage, the lot size, and whether the settings were optimised on that data or validated out-of-sample. Include the full month-by-month breakdown including losing months. State explicitly whether the test period was cherry-picked or covers a diverse range of conditions. The goal of honest disclosure is to let the reader assess the credibility of the results themselves rather than relying on your summary statistics alone.
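One way to make sure none of these items is left out is to record them in a fixed structure and generate the disclosure line from it. The sketch below is only a suggestion: the `BacktestDisclosure` class, its field names, and the sample values are all hypothetical.

```python
from dataclasses import dataclass

@dataclass
class BacktestDisclosure:
    start: str
    end: str
    spread_points: int
    slippage_points: int
    modelling_quality_pct: float
    lot_size: float
    out_of_sample: bool  # True only if this period was NOT used for optimisation

    def summary(self) -> str:
        validation = "out-of-sample" if self.out_of_sample else "optimised on this data"
        return (
            f"Period {self.start} to {self.end} | spread {self.spread_points} pts | "
            f"slippage {self.slippage_points} pts | quality {self.modelling_quality_pct:g}% | "
            f"lot {self.lot_size:g} | {validation}"
        )

disclosure = BacktestDisclosure("2019-01-01", "2024-12-31", 120, 30, 99, 0.01, False)
print(disclosure.summary())
```

The month-by-month breakdown, including losing months, would accompany this line as a separate table.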

Goldie Razor V2.8.4

M15 breakout + H4 EMA filter, built for XAUUSD on MT5
