TaylorTree: Backtesting Issues

I'm in the process of validating some investment ideas and thought I'd share some of the hidden biases that occur when comparing systems against each other and the market in general. I should know...these issues have bit me numerous times over my past 7 years of system testing. Maybe the testing methods below will help you as they have helped me.

Failure to Trade All Signals

Every signal received from your idea should be tested. Which means taking trades for a stock you already own. If your idea is to buy 20-day highs and you receive your first signal for YHOO on 10/15/2007...you buy it. If you receive another signal on 10/16/2007 for YHOO...guess what? You buy it. Despite already owning the stock. Failure to do so means your filtering the idea. This is okay if you already know the idea is valid...but if so...then why are you testing in the first place?

A big obstacle in taking all signals is money. First test the idea without money constraints. Money should not limit your trades. In fact, when testing...the only limit on your trades should be the idea.

Failure to Test the Other Side

The Trading Digest covers idea reversals in their article on the 50/200 Moving Avererage Crossover system. In essence, any idea should be reversed, tested, and compared with the original idea.

So, if the idea is selecting 20-day highs...the reverse of that idea is selecting stocks less than or equal to their 20-day highs. But, performing this test presents a problem. There are fewer stocks at their 20-day highs (idea #1) than stocks not at their 20-day highs (idea #2).

It's unfair to compare 150 trades from idea #1 against 5,000 trades from idea #2. Whose to say randomly selecting 150 trades from the 5,000 trades in the idea #2 sample set wouldn't generate as good if not better results than idea #1?

And that's just how to solve the problem...boot-strap it. Take those 5,000 trades and randomly select 150 trades. Compare those results against idea #1's sample set of 150. Of course, it wouldn't be wise to select one 150-set sample from 5,000. Take several 150-set samples...compile the results and use that for the comparison.

Remove the Best Trades

With any system...the Pareto Principle is large and in charge. You'll find only 5% to 10% of the trades generate the majority of the idea's profits. What are the chances you would have missed a few of those trades? Throw money limits into the mix and real world events like vacations and that home remodeling project your wife has been bugging you about...and it's very easy to do. So, remove them from the test sample. Don't let a few trades in the sample play such a large role in the results.

Market Exposure

When buying presents for your kids...the number one rule is every kid must receive the same number of presents. If you fail here...you will pay. Testing and comparing ideas are a lot like your kids. Give them equal time and exposure.

Use the same signal starting point for the comparisons. If your testing date range is 01/01/1995 - 12/31/2004 and idea #1 kicks off it's first signal on 05/01/1995 and idea #2 kicks off it's first signal on 02/01/1996...no fair! Idea #1 received a 9 months head start on idea #2. Might not seem like much...but with markets and babies...9 months can make a difference.

In fact, it usually pays to bucket the trades in similar timeframes. That way you can ensure not only the trades share similar signal start times but also similar bucket sizes. For example, to test 10 years of data...01/01/1995 - 12/31/2004:

Bucket 1 -> 01/01/1995 - 12/31/1996
Bucket 2 -> 01/01/1997 - 12/31/1998
Bucket 3 -> 01/01/1999 - 12/31/2000
Bucket 4 -> 01/01/2001 - 12/31/2002
Bucket 5 -> 01/01/2003 - 12/31/2004

The buckets 1 - 5 above would be the testing date ranges. After the simulations are complete for each idea and corresponding bucket...verify idea #1's bucket 1 contains roughly the same number of trades as idea #2's bucket 1. If not, then boot-strap the larger bucket size down to the smaller bucket's size. Compare results. And remember, remove the top 5% or so trades from each idea's buckets.

[Clarification: The testing date ranges in the buckets listed above are not the start and end dates for your trades. The date ranges are the start and end dates for your idea. Meaning...all ideas triggered from 01/01/1999 to 12/31/2000 would be in Bucket 3. You would need to track those signals for as long as you stay in them...which may mean all the way to your end date of 12/31/2004. Make sense?

In essence, your overall begin and end dates are 01/01/1995 - 12/31/2004. Then you slice and dice the signals that occurred during that time frame based on entry date into buckets 1 - 5.]

Questions?

What if the idea consists of several smaller ideas such as selecting stocks at their 20-day high and on that new high experience a 30-day volume surge? In this case, you would need to iterate through the above process twice.

In the first pass, you'd compare the 20-day high stocks against stocks not at their 20-day high. In the second pass (if the first pass proves fruitful), compare the 20-day high stocks with 30-day volume surges against 20-day high stocks without 30-day volume surges.

As you can see, getting an idea through the validation process is quite an ordeal. Unfortunately for us system traders; very few ideas make it out alive. Thankfully, we don't have to cut those ideas from our sample set.

Later Trades,

MT

TaylorTree

Sunday, October 21, 2007

Backtesting Issues

No comments: