## Thursday, November 25, 2010

### Running Variance

Variance - kinda the bread and butter for analysis work on a time series. Doesn't get much respect though. But, take the square root of the variance and you get the almighty standard deviation. Today, though, let's give variance its due...
For an intro into variance...check out these posts:
Problem with variance is calculating it in the traditional sense. Its costly to compute across a time series. It can be quite a drag on your simulation engine's performance. The way to reduce the cost is to calculate the running variance. And that's when you get into quite a briar patch - loss of precision and overflow issues. See John D. Cook's post covering the variance briar patch:
And a few more posts by John covering different variance formulas and their outcomes:
John does great work and I learn a lot from his posts. But, I was still having problems finding a variance formula that fit my needs:
• Reduced the precision loss issue as much as possible;
• Allowed an easy way to window the running variance;
• Allowed an easy way to memoize the call.
Thankfully, I found a post by Subluminal Messages covering his very cool Running Standard Deviations formula. The code doesn't work as is - needs correcting on a few items - but you can get the gist of the formula just fine. The formula uses the power sum of the squared differences of the values versus Welford's approach of using the sum of the squared differences of the mean. Which makes it a bit easier to memoize. Not sure if its as good in solving the precision loss and overflow issues as Welford's does....but so far I haven't found any issues with it.

So, let's start with the formula for the Power Sum Average ($$PSA$$):

$$PSA = PSA_{yesterday} + ( ( (x_{today} * x_{today}) - x_{yesterday} ) ) / n)$$

Where:
• $$x$$ = value in your time series
• $$n$$ = number of values you've analyzed so far
You also need the Simple Moving Average, which you can find in one of my previous posts here.
Once you have the $$PSA$$ and $$SMA$$; you can tackle the Running Population Variance ($$Var$$ ):

$$Population Var = (PSA_{today} * n - n * SMA_{today} * SMA_{today}) / n$$

Now, one problem with all these formulas - they don't cover how to window the running variance. Windowing the variance gives you the ability to view the 20 period running variance at bar 150. All the formulas I've mentioned above only give you the running cumulative variance. Deriving the running windowed variance is just a matter of using the same SMA I've posted about before and adjusting the Power Sum Average to the following:

$$PSA = PSA_{yesterday} + (((x_{today} * x_{today}) - (x_{yesterday} * x_{yesterday}) / n)$$

Where:
• $$x$$ = value in your time series
• $$n$$ = the period
[Update] If you want the sample Variance you just need to adjust the Var formula to the following:

$$Sample Var = (PSA_{today} * n - n * SMA_{today} * SMA_{today}) / (n - 1)$$

Okay, on to the code.

Code for the Power Sum Average:
def powersumavg(bar, series, period, pval=None):
"""
Returns the power sum average based on the blog post from
Subliminal Messages.  Use the power sum average to help derive the running
variance.
sources: http://subluminal.wordpress.com/2008/07/31/running-standard-deviations/

Keyword arguments:
bar     --  current index or location of the value in the series
series  --  list or tuple of data to average
period  -- number of values to include in average
pval    --  previous powersumavg (n - 1) of the series.
"""

if period < 1:
raise ValueError("period must be 1 or greater")

if bar < 0:
bar = 0

if pval == None:
if bar > 0:
raise ValueError("pval of None invalid when bar > 0")

pval = 0.0

newamt = float(series[bar])

if bar < period:
result = pval + (newamt * newamt - pval) / (bar + 1.0)

else:
oldamt = float(series[bar - period])
result = pval + (((newamt * newamt) - (oldamt * oldamt)) / period)

return result
Code for the Running Windowed Variance:
def running_var(bar, series, period, asma, apowsumavg):
"""
Returns the running variance based on a given time period.
sources: http://subluminal.wordpress.com/2008/07/31/running-standard-deviations/

Keyword arguments:
bar     --  current index or location of the value in the series
series  --  list or tuple of data to average
asma    --  current average of the given period
apowsumavg -- current powersumavg of the given period
"""
if period < 1:
raise ValueError("period must be 1 or greater")

if bar <= 0:
return 0.0

if asma == None:
raise ValueError("asma of None invalid when bar > 0")

if apowsumavg == None:
raise ValueError("powsumavg of None invalid when bar > 0")

windowsize = bar + 1.0
if windowsize >= period:
windowsize = period

return (apowsumavg * windowsize - windowsize * asma * asma) / windowsize


Example call and results:
list_of_values = [3, 5, 8, 10, 4, 8, 12, 15, 11, 9]
prev_powersumavg = None
prev_sma = None
prev_sma = None
period = 3
for bar, price in enumerate(list_of_values):
new_sma = running_sma(bar, list_of_values, period, prev_sma)
new_powersumavg = powersumavg(bar, list_of_values, period, prev_powersumavg)
new_var = running_var(bar, list_of_values, period, new_sma, new_powersumavg)

msg = "SMA=%.4f, PSA=%.4f, Var=%.4f" % (new_sma, new_powersumavg, new_var)
print "bar %i: %s" % (bar, msg)

prev_sma = new_sma
prev_powersumavg = new_powersumavg

----------------------------------------------------------
Results of call:
bar 0: SMA=3.0000, PSA=9.0000, Var=0.0000
bar 1: SMA=4.0000, PSA=17.0000, Var=1.0000
bar 2: SMA=5.3333, PSA=32.6667, Var=4.2222
bar 3: SMA=7.6667, PSA=63.0000, Var=4.2222
bar 4: SMA=7.3333, PSA=60.0000, Var=6.2222
bar 5: SMA=7.3333, PSA=60.0000, Var=6.2222
bar 6: SMA=8.0000, PSA=74.6667, Var=10.6667
bar 7: SMA=11.6667, PSA=144.3333, Var=8.2222
bar 8: SMA=12.6667, PSA=163.3333, Var=2.8889
bar 9: SMA=11.6667, PSA=142.3333, Var=6.2222


Of course, as I said in the beginning of this post, just take the square root of this Running Windowed Variance to obtain the Standard Deviation.

MT

## Saturday, November 06, 2010

### Portfolio Performance for October 2010

Back to back months of these kind of numbers make an old trader like me nervous. When's the trap door going to fall?

No doubt the current market environment is to the system's liking. One thing I need to explore is the system's position sizing algo. I position size based on the volatility of the stock over x days. But, lately the volatility on the stocks selected have been so small. Which is seriously underestimating the true risk of the position. So, need to perform some studies on how to handle volatility shrinkage during boom times like these.

On to the charts...
As you can see, we are officially out of the almost 3 year drawdown (knocking on wood as I type this).

What's ahead for TaylorTree? Spending what free time I have on preparing for another Missouri winter. Don't believe this Texas boy will ever get used to the cold. Also, working on the continued upgrade of the database and record structures of the simulation engine. Tests so far have proved the new structures are much faster and memory efficient...but have yet to test on the type of data demands the simulation engine handles - 10GB+.