
Sunday, January 16, 2011

Collecting Max Items in Python

Lately, I've needed a way to collect a running list of the top X values and their associated items in Python. This is useful if you'd like to know such things as:
  • Top 100 price gainers in your price series database;
  • Top 10 most volatile stocks in your price series database;
  • Top 5 longest running batch jobs in your operations arena;
  • And many more...
Here's the MaxItems code to do the job:
import heapq

class MaxItems(object):
    '''Caches the max X items of whatever you're analyzing and
    returns a list containing only those max X values and
    associated items.
    '''
    def __init__(self, size):
        self.size = size
        self._recs = []

    def push(self, value, items):
        if len(self._recs) < self.size:
            heapq.heappush(self._recs, (value, items))
        else:
            #push the new record, then pop (and discard) the smallest
            heapq.heappushpop(self._recs, (value, items))

    def items(self):
        return heapq.nlargest(self.size, self._recs)
Example call and results:
pricegains = []
pricegains.append({'symbol':'GOOG', 'gain':234.0})
pricegains.append({'symbol':'YHOO', 'gain':124.0})
pricegains.append({'symbol':'IBM', 'gain':1242.0})
pricegains.append({'symbol':'GE', 'gain':1800.0})
pricegains.append({'symbol':'ENE', 'gain':0.0})
pricegains.append({'symbol':'KREM', 'gain':12.0})
maxitems = MaxItems(3)

for row in pricegains:
    maxitems.push(row['gain'], row)

for item in maxitems.items():
    print item

Results of call:
(1800.0, {'symbol': 'GE', 'gain': 1800.0})
(1242.0, {'symbol': 'IBM', 'gain': 1242.0})
(234.0, {'symbol': 'GOOG', 'gain': 234.0})
The heapq module works nicely for this task. What's ironic is that Python's heapq module implements a min-heap, which turns out to be efficient for tracking the maximum values over a list...but not so efficient for tracking the minimum values over a list.

I'll cover the MinItems class in another post. But, to give you a hint, what does work for collecting the minimum values over a list is one of the alternatives I explored in building the MaxItems class...

Alternative yet Inefficient version of MaxItems:
import bisect

class MaxItems2(object):
    '''Caches the max X items of whatever you're analyzing and
    returns a list containing only those max X values and
    associated items.
    '''
    def __init__(self, size):
        self.size = size
        self._recs = []

    def push(self, value, items):
        if len(self._recs) < self.size:
            bisect.insort(self._recs, (value, items))

        elif bisect.bisect(self._recs, (value, items)) > 0:
            #new value beats the current minimum: insert it in order,
            #then drop the smallest record
            bisect.insort(self._recs, (value, items))
            self._recs.pop(0)

    def items(self):
        return sorted(self._recs, reverse=True)
MaxItems2 takes advantage of the bisect module, and while it works great, performance is at a minimum 2x worse on average than using the heapq method.
Test Code:
import random

pricegains = []
maxitems = MaxItems(100)
for x in xrange(500000):
    gain = random.uniform(1.0,500.0)
    maxitems.push(gain, ('This', 'is', 'Record'))

rows = maxitems.items()
Calling the above code with the wonderful timeit module produced the following results:
  • heapq method: Ten iterations finished in 1.90 seconds.
  • bisect method: Ten iterations finished in 3.80 seconds.
If you know of a faster way to collect the top x of a group of items...please share.
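By the way, if all the rows are already in memory rather than streaming in, heapq can do the whole job in one call. Here's a quick sketch of mine (not part of the original timing tests) using nlargest with a key function:

```python
import heapq

pricegains = [
    {'symbol': 'GOOG', 'gain': 234.0},
    {'symbol': 'YHOO', 'gain': 124.0},
    {'symbol': 'IBM', 'gain': 1242.0},
    {'symbol': 'GE', 'gain': 1800.0},
    {'symbol': 'ENE', 'gain': 0.0},
    {'symbol': 'KREM', 'gain': 12.0},
]

#top 3 rows by gain, no helper class needed
top3 = heapq.nlargest(3, pricegains, key=lambda row: row['gain'])
```

Same top 3 as the MaxItems example above - GE, IBM, GOOG.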

Thursday, November 25, 2010

Running Variance

Variance - kinda the bread and butter for analysis work on a time series. Doesn't get much respect though. But, take the square root of the variance and you get the almighty standard deviation. Today, though, let's give variance its due...
For an intro into variance...check out these posts:
The problem with variance is calculating it in the traditional sense: it's costly to compute across a time series and can be quite a drag on your simulation engine's performance. The way to reduce the cost is to calculate a running variance. And that's when you get into quite a briar patch - loss of precision and overflow issues. See John D. Cook's post covering the variance briar patch:
And a few more posts by John covering different variance formulas and their outcomes:
John does great work and I learn a lot from his posts. But, I was still having problems finding a variance formula that fit my needs:
  • Reduced the precision loss issue as much as possible;
  • Allowed an easy way to window the running variance;
  • Allowed an easy way to memoize the call.
Thankfully, I found a post by Subliminal Messages covering his very cool Running Standard Deviations formula. The code doesn't work as is - it needs correcting on a few items - but you can get the gist of the formula just fine. The formula uses the power sum of the squared values versus Welford's approach of summing the squared differences from the mean. That makes it a bit easier to memoize. Not sure if it's as good at solving the precision loss and overflow issues as Welford's is...but so far I haven't found any issues with it.

So, let's start with the formula for the Power Sum Average (\(PSA\)):

\( PSA_{today} = PSA_{yesterday} + ((x_{today} * x_{today}) - PSA_{yesterday}) / n \)

  • \(x\) = value in your time series
  • \(n\) = number of values you've analyzed so far
You also need the Simple Moving Average, which you can find in one of my previous posts here.
Once you have the \(PSA\) and \(SMA\), you can tackle the Running Population Variance (\(Var\)):

\( Var_{population} = (PSA_{today} * n - n * SMA_{today} * SMA_{today}) / n \)

Now, one problem with all these formulas - they don't cover how to window the running variance. Windowing the variance gives you the ability to view the 20 period running variance at bar 150. All the formulas I've mentioned above only give you the running cumulative variance. Deriving the running windowed variance is just a matter of using the same SMA I've posted about before and adjusting the Power Sum Average to the following:

\( PSA_{today} = PSA_{yesterday} + ((x_{today} * x_{today}) - (x_{today-n} * x_{today-n})) / n \)

  • \(x\) = value in your time series
  • \(n\) = the period
[Update] If you want the sample Variance you just need to adjust the Var formula to the following:

\( Var_{sample} = (PSA_{today} * n - n * SMA_{today} * SMA_{today}) / (n - 1) \)

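As a quick sanity check of the sample formula - this is my own sketch using Python 3's statistics module, not code from the original post - the PSA/SMA combination matches the textbook sample variance:

```python
import statistics

values = [3.0, 5.0, 8.0, 10.0, 4.0]
n = float(len(values))

psa = sum(v * v for v in values) / n   #power sum average over all n values
sma = sum(values) / n                  #simple average over all n values

#sample variance per the formula above
sample_var = (psa * n - n * sma * sma) / (n - 1)
assert abs(sample_var - statistics.variance(values)) < 1e-9
```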
Okay, on to the code.

Code for the Power Sum Average:
def powersumavg(bar, series, period, pval=None):
    '''Returns the power sum average based on the blog post from
    Subliminal Messages.  Use the power sum average to help derive the
    running variance.

    Keyword arguments:
    bar     --  current index or location of the value in the series
    series  --  list or tuple of data to average
    period  --  number of values to include in average
    pval    --  previous powersumavg (n - 1) of the series.
    '''
    if period < 1:
        raise ValueError("period must be 1 or greater")
    if bar < 0:
        bar = 0
    if pval is None:
        if bar > 0:
            raise ValueError("pval of None invalid when bar > 0")
        pval = 0.0
    newamt = float(series[bar])
    if bar < period:
        result = pval + (newamt * newamt - pval) / (bar + 1.0)
    else:
        oldamt = float(series[bar - period])
        result = pval + (((newamt * newamt) - (oldamt * oldamt)) / period)
    return result
Code for the Running Windowed Variance:
def running_var(bar, series, period, asma, apowsumavg):
    '''Returns the running variance based on a given time period.

    Keyword arguments:
    bar     --  current index or location of the value in the series
    series  --  list or tuple of data to average
    period  --  number of values to include in average
    asma    --  current simple moving average of the given period
    apowsumavg  --  current powersumavg of the given period
    '''
    if period < 1:
        raise ValueError("period must be 1 or greater")

    if bar <= 0:
        return 0.0

    if asma is None:
        raise ValueError("asma of None invalid when bar > 0")

    if apowsumavg is None:
        raise ValueError("apowsumavg of None invalid when bar > 0")

    windowsize = bar + 1.0
    if windowsize >= period:
        windowsize = period

    return (apowsumavg * windowsize - windowsize * asma * asma) / windowsize

Example call and results:
list_of_values = [3, 5, 8, 10, 4, 8, 12, 15, 11, 9]
prev_powersumavg = None
prev_sma = None
period = 3
for bar, price in enumerate(list_of_values):
    new_sma = running_sma(bar, list_of_values, period, prev_sma)
    new_powersumavg = powersumavg(bar, list_of_values, period, prev_powersumavg)
    new_var = running_var(bar, list_of_values, period, new_sma, new_powersumavg)

    msg = "SMA=%.4f, PSA=%.4f, Var=%.4f" % (new_sma, new_powersumavg, new_var)
    print "bar %i: %s" % (bar, msg)

    prev_sma = new_sma
    prev_powersumavg = new_powersumavg

Results of call:
bar 0: SMA=3.0000, PSA=9.0000, Var=0.0000
bar 1: SMA=4.0000, PSA=17.0000, Var=1.0000
bar 2: SMA=5.3333, PSA=32.6667, Var=4.2222
bar 3: SMA=7.6667, PSA=63.0000, Var=4.2222
bar 4: SMA=7.3333, PSA=60.0000, Var=6.2222
bar 5: SMA=7.3333, PSA=60.0000, Var=6.2222
bar 6: SMA=8.0000, PSA=74.6667, Var=10.6667
bar 7: SMA=11.6667, PSA=144.3333, Var=8.2222
bar 8: SMA=12.6667, PSA=163.3333, Var=2.8889
bar 9: SMA=11.6667, PSA=142.3333, Var=6.2222

Of course, as I said in the beginning of this post, just take the square root of this Running Windowed Variance to obtain the Standard Deviation.
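To convince yourself the windowed figures above really are population variances, here's a sketch (mine, not from the original post) checking the PSA/SMA formula against Python 3's statistics.pvariance for each window:

```python
import statistics

values = [3, 5, 8, 10, 4, 8, 12, 15, 11, 9]
period = 3

for bar in range(len(values)):
    window = values[max(0, bar - period + 1):bar + 1]
    n = float(len(window))
    psa = sum(v * v for v in window) / n   #power sum average over the window
    sma = sum(window) / n                  #simple moving average over the window
    var = (psa * n - n * sma * sma) / n    #running population variance formula
    assert abs(var - statistics.pvariance(window)) < 1e-9
```

Every bar matches the results table, e.g. 4.2222 at bar 2 and 6.2222 at bar 9.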

Later Trades,


Saturday, September 11, 2010

Running Sum

We've covered Running SMAs and EMAs...let's dig into Running Sums, often called Running Totals. The formula is as follows:
\(Sum_{today} = Sum_{yesterday} + (price_{today} - price_{today - period})\)

Where \( price_{today - period} \) represents the price that is dropping off the slice you are summing. For example:

Take a list of numbers = 20, 40, 60, 80, 100, 120.
The formula for the 3-bar running sum would be:
bar 1: 20
bar 2: 20 + 40 = 60
bar 3: 20 + 40 + 60 = 120
bar 4: 40 + 60 + 80 = 180
Or we can apply our formula from above as \( Sum_{today} = 120 + (80 - 20) \)
bar 5: 60 + 80 + 100 = 240
Or use formula of \( Sum_{today} = 180 + (100 - 40) \)
bar 6: 80 + 100 + 120 = 300
Or use formula of \( Sum_{today} = 240 + (120 - 60) \)

Coding in Python we get:
def running_sum(bar, series, period, pval=None):
    '''Returns the running sum of values in a list or tuple - avoids summing
    the entire series on each call.

    Keyword arguments:
    bar     --  current index or location of the value in the series
    series  --  list or tuple of data to sum
    period  --  number of values to include in sum
    pval    --  previous sum (n - 1) of the series.
    '''
    if period < 1:
        raise ValueError("period must be 1 or greater")

    if bar <= 0:
        return series[0]

    if bar < period:
        return pval + series[bar]

    return pval + (series[bar] - series[bar - period])
Example call and results:
list_of_values = [20, 40, 60, 80, 100, 120]
prevsum = list_of_values[0]   #first sum is the first value in the series.

for bar, price in enumerate(list_of_values):
    newsum = running_sum(bar, list_of_values, 3, pval=prevsum)
    print "bar %i: %i" % (bar, newsum)
    prevsum = newsum

bar 0: 20
bar 1: 60
bar 2: 120
bar 3: 180
bar 4: 240
bar 5: 300
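As a sanity check (my sketch, not from the original post), the running formula can be compared against a naive slice sum over each window:

```python
values = [20, 40, 60, 80, 100, 120]
period = 3

def naive_sum(bar, series, period):
    #sum the window directly - the costly loop the running formula avoids
    return sum(series[max(0, bar - period + 1):bar + 1])

sums = [values[0]]
for bar in range(1, len(values)):
    prev = sums[-1]
    if bar < period:
        sums.append(prev + values[bar])                         #window still filling
    else:
        sums.append(prev + values[bar] - values[bar - period])  #add new, drop old

assert sums == [naive_sum(b, values, period) for b in range(len(values))]
```

The running list comes out to [20, 60, 120, 180, 240, 300], matching the results above.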

Sunday, August 01, 2010

Exponential Moving Average (EMA)

Now that we've tackled Running Simple Moving Averages (SMA)...let's move on to Exponential Moving Averages (EMA). You may wonder why we're not covering Running Exponential Moving Averages? The default formula for EMA is the running method - so we're already covered.

Check out the posts below to understand the background on Exponential Moving Averages (EMA) and their calculation. Be careful using EMAs in your backtesting - or any of these running types of indicators - since all of them require a starting value. If that starting value changes, your signals change. Which can happen if you switch price quote providers that have different history requirements. It shouldn't be a big deal, but it's something to be aware of.

Let's begin. We need to calculate our smoothing factor for the time series. Typical use in technical analysis is:
\( \alpha = 2.0 / (periods + 1.0) \)

We can use any value between 0 & 1 for the smoothing factor. Closer to one is less smooth and places greater weight on the more recent values. Use a value of 1 and you get the most recent value back. Closer to zero is more smooth and places greater weight on the older values.

Now, the formula for an EMA given our smoothing factor:
\( EMA_{today} = EMA_{yesterday} + \alpha(price_{today} - EMA_{yesterday}) \)

Coding in Python we get:
def ema(bar, series, period, prevma, smoothing=None):
    '''Returns the Exponential Moving Average of a series.

    Keyword arguments:
    bar         -- current index or location of the series
    series      -- series of values to be averaged
    period      -- number of values in the series to average
    prevma      -- previous exponential moving average
    smoothing   -- smoothing factor to use in the series.
        valid values: between 0 & 1.
        default: None - which then uses formula = 2.0 / (period + 1.0)
        closer to 1 gives greater weight to recent values - less smooth
        closer to 0 gives greater weight to older values - more smooth
    '''
    if period < 1:
        raise ValueError("period must be 1 or greater")

    if smoothing is not None:
        if (smoothing < 0) or (smoothing > 1.0):
            raise ValueError("smoothing must be between 0 and 1")
    else:
        smoothing = 2.0 / (period + 1.0)

    if bar <= 0:
        return series[0]

    elif bar < period:
        return cumulative_sma(bar, series, prevma)

    return prevma + smoothing * (series[bar] - prevma)

def cumulative_sma(bar, series, prevma):
    '''Returns the cumulative or unweighted simple moving average.
    Avoids averaging the entire series on each call.

    Keyword arguments:
    bar     --  current index or location of the value in the series
    series  --  list or tuple of data to average
    prevma  --  previous average (n - 1) of the series.
    '''
    if bar <= 0:
        return series[0]

    return prevma + ((series[bar] - prevma) / (bar + 1.0))

Example call and results using the typical smoothing factor of 2 / (period + 1):
prices = [32.47, 32.70, 32.77, 33.11, 33.25, 33.23, 33.23, 33.0, 33.04, 33.21]
period = 5   #number of bars to average
prevsma = prevema = prices[0]   #1st day nothing to average

for bar, close in enumerate(prices):
    currentema = ema(bar, prices, period, prevema, smoothing=None)

    #running_sma defined in simple moving average blog post
    currentsma = running_sma(bar, prices, period, prevsma)

    print "Day %02d Value=%.2f %i-day SMA=%f and EMA=%f" % (bar + 1, close, period, currentsma, currentema)
    prevema = currentema
    prevsma = currentsma

Results of call:

Day 01 Value=32.47 5-day SMA=32.470000 and EMA=32.470000
Day 02 Value=32.70 5-day SMA=32.585000 and EMA=32.585000
Day 03 Value=32.77 5-day SMA=32.646667 and EMA=32.646667
Day 04 Value=33.11 5-day SMA=32.762500 and EMA=32.762500
Day 05 Value=33.25 5-day SMA=32.860000 and EMA=32.860000
Day 06 Value=33.23 5-day SMA=33.012000 and EMA=32.983333
Day 07 Value=33.23 5-day SMA=33.118000 and EMA=33.065556
Day 08 Value=33.00 5-day SMA=33.164000 and EMA=33.043704
Day 09 Value=33.04 5-day SMA=33.150000 and EMA=33.042469
Day 10 Value=33.21 5-day SMA=33.142000 and EMA=33.098313
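The EMA column above can be reproduced with a compact loop. This is my own sketch, using the same seeding rule as the code (cumulative average until a full period is available):

```python
prices = [32.47, 32.70, 32.77, 33.11, 33.25, 33.23, 33.23, 33.0, 33.04, 33.21]
period = 5
alpha = 2.0 / (period + 1.0)   #typical smoothing factor

ema = prices[0]   #seed with the first value
for bar in range(1, len(prices)):
    if bar < period:
        #warm-up: cumulative average until a full period is available
        ema = ema + (prices[bar] - ema) / (bar + 1.0)
    else:
        ema = ema + alpha * (prices[bar] - ema)

#final value matches Day 10 in the table: EMA=33.098313
```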

Sunday, June 20, 2010

Running Simple Moving Average (SMA)

When building a platform to test trading, one of the big issues to deal with is all the indicators that require a spin through the price series in order to calculate. For example, in order to calculate the 200-day simple moving average (SMA) of closing prices for Google today, you would have to loop back 200 - 1 days, sum the closing prices, and divide by 200.

When you are backtesting an idea you often need to start from day 1 of a stock's trading history and loop forward to the most current day. In essence, pretending each day is the current day at that point in time. Thus, you are looping back 200 - 1 data points for each day in the series. This isn't such a big deal with a stock such as Google whose trading history is rather limited (2004). But, take a stock like IBM with a more extensive trading history and your code is going to bog down with each call to the SMA indicator. Throw 20,000 securities into your backtest and the looping adds up.

Therefore, running calculations are the preferred method in order to spin just once through the data points. So, in order to calculate the running simple moving average for closing prices you apply the following formula:
\(SMA_{today} = SMA_{yesterday} + ((Price_{today} - Price_{today - n}) /n)\)
  • \(n\) = number of values included in your rolling computational window.
Straight-forward and avoids the loop. Here's the sample Python code for the Running SMA:
def cumulative_sma(bar, series, prevma):
    '''Returns the cumulative or unweighted simple moving average.
    Avoids summing the entire series on each call.

    Keyword arguments:
    bar     --  current index or location of the value in the series
    series  --  list or tuple of data to average
    prevma  --  previous average (n - 1) of the series.
    '''
    if bar <= 0:
        return series[0]

    return prevma + ((series[bar] - prevma) / (bar + 1.0))
def running_sma(bar, series, period, prevma):
    '''Returns the running simple moving average - avoids summing the
    entire series on each call.

    Keyword arguments:
    bar     --  current index or location of the value in the series
    series  --  list or tuple of data to average
    period  --  number of values to include in average
    prevma  --  previous simple moving average (n - 1) of the series
    '''
    if period < 1:
        raise ValueError("period must be 1 or greater")

    if bar <= 0:
        return series[0]

    elif bar < period:
        return cumulative_sma(bar, series, prevma)

    return prevma + ((series[bar] - series[bar - period]) / float(period))
And the example call and results:
prices = [10, 15, 25, 18, 13, 16]
prevsma = prices[0]   #1st day nothing to average so return itself.
for bar, close in enumerate(prices):
    currentsma = running_sma(bar, prices, 3, prevsma)
    print "Today's 3-day SMA = %.4f" % currentsma
    prevsma = currentsma

------- Results ----------------
Today's 3-day SMA = 10.0000
Today's 3-day SMA = 12.5000
Today's 3-day SMA = 16.6667
Today's 3-day SMA = 19.3333
Today's 3-day SMA = 18.6667
Today's 3-day SMA = 15.6667
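A quick way to verify the running results above (a sketch of mine, not from the original post) is to compare each value against a naive windowed average:

```python
values = [10, 15, 25, 18, 13, 16]
period = 3

def naive_sma(bar, series, period):
    #average the window directly - the costly approach the running SMA avoids
    window = series[max(0, bar - period + 1):bar + 1]
    return sum(window) / float(len(window))

smas = [float(values[0])]
for bar in range(1, len(values)):
    prev = smas[-1]
    if bar < period:
        smas.append(prev + (values[bar] - prev) / (bar + 1.0))   #cumulative warm-up
    else:
        smas.append(prev + (values[bar] - values[bar - period]) / float(period))

assert all(abs(a - naive_sma(b, values, period)) < 1e-9
           for b, a in enumerate(smas))
```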

Thursday, February 11, 2010

Trading Framework Part I: Tools I Use

I received a question from a reader regarding the software I use...more specifically...the open source software I use in trading. Instead of a direct response, I figured this type of question might be useful to other readers of this blog.

My basic trading framework is the following:
Operating System: Windows Vista Home Premium
Programming Languages: Python 2.6.2 & R 2.9.1
Databases: SQLite 2.4.1, Numpy 1.3.0, & CSV
Programming Editor: SciTE 1.78
Graphing Engines: Matplotlib 0.98.5 & R
GUI: HTML & JavaScript
Scheduler: Windows Task Scheduler (DOS) & Cygwin (Bash)
Historical Quotes: CSI & Yahoo Finance

Operating System
Choosing Windows as the operating system is mainly out of convenience. As you can see above, the only real item that would prevent a full move to Linux is the historical quote provider, CSI. Everything else can run on another platform or a suitable alternative is available.

Another reason I've stayed with Windows is due to my current job (windows shop). But, I will admit, I have been very close to switching to a Mac the past few months or possibly OpenSUSE. Just haven't taken the plunge yet.

On a side note, prior to my current employer...I worked for a University that was really ahead of its time. Every program we developed had to pass a compatibility test, "Could it easily run on another platform?" While this at times was an impossible task due to user requirements...we still always coded with this compatibility in mind. And I've kept this same philosophy in developing the trading simulation engine.

Programming Languages
I'm originally a Cobol programmer. Yes, that's right...if you've never heard of one, you're reading a blog by one. Cobol programmers, the good ones, are very keen on whitespace. When you're throwing a lot of code around...the whitespace is what keeps you sane. And so, when I was trying out the various scripting languages back in the day...Python really struck my fancy. I spent the better part of 9 years trying to force programmers to keep the code pretty in Cobol. Only to see Python come around and truly force programmers to code clean. Over the years, I have worked in various other languages, but I've always stuck with Python.

I think another reason I chose Python was due to WealthLab's Scripting language (Pascal-based). I felt I could build an environment similar to WealthLab that would offer the same scripting ease. So far, Python has done a great job in keeping the framework simple and extensible.

Another language I have used from time to time in my trading is R. I use R mainly to analyze trading results. A few years ago, I actually developed a prototype of the trading simulation engine in R. But, it was too slow. The loops killed it. With the recent development of Revolution Computing's ParallelR...I've often wondered what the results would now be. But, I'm past the point of no return with the engine in Python. And as far as fast analysis of CSV files goes...it's really hard to beat R.

Databases
I struggled for several years with how to store and retrieve the historical price series data for the trading simulation engine. The main problem was the data could not fit into memory yet access had to be extremely fast. So, for years I used plain CSV files to store the data.

Basically, I read the CSV files from CSI and wrote out new price CSV files with my fixes for possible bad data along with additional calculated fields. At first I stored the data into 1 big CSV file. Then used either the DOS sort or Bash sort command to sort the file by date. I was afraid I would run into file size limits (at the time I was on Windows XP 32-bit). So, I started writing the data out to thousands of files broken down by date. Basically, each file was a date containing all the prices for that date. Worked really well...except analysis on the backend became difficult. Plus, it felt kludgy.

I had always tried to use regular databases for the pricing backend...but they couldn't handle the storage and retrieval rates I required. Just too slow. And yes, I tried all of them: MySQL, PostgreSQL, Firebird, Berkeley DB, SQLite, etc.

It wasn't until I read an article by Bret Taylor covering how FriendFeed uses MySQL that I had an idea as to how to use a database to get the best of both worlds - fast storage & retrieval along with slick and easy access to the data. That's when I went back to SQLite and began a massive hacking of code while on a Texas Hill Country vacation. Really bumped the trading simulation engine to another level. The trick to fast storage & retrieval? Use fewer but bigger rows.

For a memory database? I use numpy. It's a fantastic in-memory multi-dimensional storage tool. I dump the price series from SQLite to numpy to enable row or column-wise retrieval. Only recently have I found the performance hit is a little too much. So, I've removed numpy from one side of the framework. And contemplating removing it from the other side as well. It takes more work to replicate numpy via a dictionary of dictionaries of lists. But, surprisingly, it is worth the effort when dealing with price series. Which means, I may not use numpy in the engine for long. Still a great tool to use for in-memory storage.

Graphing Engines and GUI
I really try to keep it simple in the front-end of the trading framework. I use Matplotlib to visualize price or trading results. And HTML along with Javascript to display trading statistics. Honestly, not a lot has gone into this side of things. Still very raw. My goal for 2010 is to work more in this area.

I have used R quite a bit in analyzing the output of the trading backtests. R is really powerful here. Quickly and easily chart and/or view pretty much any subset of the data you wish.

If there's certain items I look at over and over in the backtests...I'll typically replicate in Python & Matplotlib and include in the backtest results.

Editor, Schedulers, and Shells.
SciTE is hands down my favorite Python editor. I don't like the fancy IDE type stuff. SciTE keeps it simple.

Windows Task Scheduler is for the birds. I should know...my main job is centered around enterprise scheduling. But, the Windows task scheduler gets the job done most of the time. I just have to code around the times it misses or doesn't get things quite right. Which is okay...that's life. That's one of the main reasons I have thought about switching to a nix box for cron and the like.

The DOS shell or Bash shell...I don't get too fancy in either. I do use the Bash shell quite a bit in performing global changes in the python code. Or back when the database was CSV-based. Again, nix boxes win here. But, we Windows developers can always get a copy of Cygwin to save the day.

Historical Quotes
I have used CSIdata for many years. Mainly for the following reasons:
  • Dividend-adjusted quotes which are essential if analyzing long-term trading systems.
  • Adjusted closing price - needed if you wish to test the exclusion of data based on the actual price traded - not the split-adjusted price.
  • CSV files - CSI does a great job of building and maintaining CSV files of price history.
  • Delisted data - I thought this would be a bigger deal but didn't really impact test results...but still nice to have for confirmation.
  • Data is used by several hedge funds and web sites such as Yahoo Finance.
The only drawback I have to CSI is the daily limit to the number of stocks you can export out of the product. It can get frustrating trying to work around the limit. Of course, you can always pony up for a higher limit.

This covers Part I of the series. Next up? The type of analysis I perform with the trading framework.

Later Trades,


Friday, November 14, 2008

What I'm Researching...

Jim Barry's Rexx Tutor Part2

Posted: 13 Nov 2008 01:00 PM CST

great summaries on the classic rexx functions.

Project Aardvark

Posted: 13 Nov 2008 12:53 PM CST

Joel on Software's Real World. A must see!

Reading List: Fog Creek Software Management Training Program - Joel on Software

Posted: 13 Nov 2008 12:50 PM CST

great reading list!

In Python how do I sort a list of dictionaries by values of the dictionary? - Stack Overflow

Posted: 09 Nov 2008 09:29 PM CST

nice efficient sorting of values in a python dictionary.

AT&T Labs Research - Yoix / YWAIT

Posted: 07 Nov 2008 07:36 AM CST

Interesting way to build a web application. Wonder how complex this would be to use versus traditional web-based systems (LAMP)? This may be easier to deploy if the goal of the software is simulation/visualizations. Something to toy with.

AT&T Labs Research - Yoix / Byzgraf

Posted: 07 Nov 2008 07:33 AM CST

Another great looking toolset using Yoix that enables plotting functions: line, bar, histograms, etc.

AT&T Labs Research - Yoix / YDAT

Posted: 07 Nov 2008 07:32 AM CST

Extremely cool visualization toolset from AT&T Labs Research. Handles graphviz files.

Friday, November 07, 2008

What I'm Researching...

Overview of RAMFS and TMPFS on Linux

Posted: 06 Nov 2008 11:02 PM CST

Map your memory as a drive? Wonder how this would work if you built a linux server with 32gb memory and mapped at least half that dedicated for simulations? How much faster would this be versus traditional disk-based sims?

Replacing multiple occurrences in nested arrays - Stack Overflow

Posted: 06 Nov 2008 10:58 PM CST

will this work in updating a dictionary of prices? if you have a dictionary of portfolio positions with values being python lists...would this be a good solution in updating the closing price of the stock (one of the items in the list)?

Monday, October 20, 2008

What I'm Researching...

RocketDock - About RocketDock

Posted: 20 Oct 2008 12:17 AM CDT

extremely cool application dock for windows.

Python Programming/Lists - Wikibooks, collection of open-content textbooks

Posted: 20 Oct 2008 12:12 AM CDT

Great collection of python list examples.

Introduction To New-Style Classes in Python

Posted: 19 Oct 2008 01:18 AM CDT

great explanation of python classes. check out the final part discussing the __slots__ feature. basically, reserve attributes...those not defined cannot be assigned.

PyTables User's Guide

Posted: 18 Oct 2008 12:30 PM CDT

html version of the pytables userguide.

rdoc:graphics:barplot [R Wiki]

Posted: 17 Oct 2008 04:22 PM CDT

R doc for barplot

Welcome to DrQueue Commercial Website

Posted: 12 Oct 2008 11:44 PM CDT

queue manager with python binding. looks to be used as a render manager...but could see other uses as well.

Building home linux render cluster

Posted: 12 Oct 2008 11:30 PM CDT

excellent article on building a cheap 24 core x 48GB ram linux cluster.

Wednesday, October 08, 2008

What I'm Researching...

Linus' blog: .. so I got one of the new Intel SSD's

Posted: 07 Oct 2008 10:02 PM CDT

great analysis on evaluating SSD hard drives. read the comments for more info. as an aside...linus has a

pymc - Google Code

Posted: 07 Oct 2008 12:45 PM CDT

monte carlo in python? looks worth exploring further.

Tuesday, October 07, 2008

What I'm Researching...

The Sect of Homokaasu - The Rasterbator

Posted: 07 Oct 2008 01:45 AM CDT

Cool, print huge posters from normal paper - software breaks up images to fit on 8.5 x 11 paper. Hat-tip to my wife for finding this site.

PerTrac Support - Statistics

Posted: 06 Oct 2008 12:43 PM CDT

Great site covering formulas of investment stats. Useful for coding the performance part of the testing platform.

pickle(cPickle) vs numpy tofile/fromfile - Python - Snipplr

Posted: 05 Oct 2008 11:09 PM CDT

interesting code snippet comparing performance of cpickle and numpy to/from file routines. been thinking about this lately...using numpy directly or cpickle instead of using a bloated dbms for persistent storage of time series on the testing platform.

HintsForSQLUsers - Hierarchical Datasets in Python

Posted: 05 Oct 2008 11:06 PM CDT

covers many of the faq of SQL developers when developing with PyTables.

EasyvizDocumentation - scitools - Google Code - Easyviz Documentation

Posted: 05 Oct 2008 09:55 PM CDT

Python plotting interface to various backend plotting engines: Gnuplot, Matplotlib, Grace, Veusz, PyX, VTK, VisIt, OpenDX, and a few more. Seems like a fairly straight-forward interface. And choosing the backend used is a one-line import statement. Interesting.

PyX - Python graphics package

Posted: 05 Oct 2008 12:25 PM CDT

looks like a dead-simple plotting library in python to produce pub quality pdf/ps images. Need to explore.

Sunday, October 05, 2008

What I'm Researching...

TinyMCE - Home

Posted: 05 Oct 2008 12:12 AM CDT

WYSIWYG JavaScript editor - haven't tried it...but may be worth testing on a new project of mine.

PyTables - Hierarchical Datasets in Python

Posted: 04 Oct 2008 01:35 PM CDT

the original python interface to the HDF5 library. Have tested this before...need to test again using new architecture. Original tests found speeds that were equivalent to SQLite but of course slower than CSV files.

Python bindings for the HDF5 library — h5py v0.3.1 documentation

Posted: 04 Oct 2008 01:33 PM CDT

a python interface to the excellent HDF5 library. worth testing in project.

Dive into Erlang

Posted: 04 Oct 2008 12:24 PM CDT

enjoyed reading this guy's take on Erlang. Of course, he had me with quoting Unix philosophy, "Do one thing and do it well."

Optimal RAID setup for SQL server - Stack Overflow

Posted: 04 Oct 2008 10:35 AM CDT

Excellent Q&A on choosing the optimal RAID config for disk i/o performance. By the by, stackoverflow is an awesome site for programmers!!!

Friday, September 21, 2007

Recent Links for 09/21/2007

Newbie - converting csv files to arrays in NumPy
Great message thread on how to convert csv files to numpy arrays.
Cookbook/InputOutput - Numpy and Scipy
File processing examples using numpy, scipy, and matplotlib. How to read/write a numpy array from/to ascii/binary files.
Numpy Example List
Examples of Numpy functions such as fromfile(), hsplit(), recarray(), shuffle(), sort(), split(), sqrt(), std(), tofile(), unique(), var(), vsplit(), where(), zeros(), empty(), and many more.
Introducing Plists: An Erlang module for doing list operations in parallel
Could you spawn a trading system process for each stock of a given day's trading (a list)? What if you had 20,000 stocks for a given day? Can plists/erlang handle 20,000 processes without hitting memory constraints?

Monday, September 17, 2007

Recent Links for 09/17/2007

Sunday, September 16, 2007

Recent Links for 09/15/2007

Links for 2007-09-15

Posted: 16 Sep 2007 12:00 AM CDT

Wednesday, September 05, 2007

Recent Links 09/05/2007

Speed up R, Python, and MATLAB - Going Parallel

Tuesday, September 04, 2007

Recent Links 09/04/2007

World Beta - Engineering Targeted Returns and Risk: More On The Endowment Style Of Investing  Annotated

    • World Beta shares some links covering the endowment investing side of things...
      • A link to Frontier Capital Management - check out their knowledge section for more great papers similar to the ones Faber links to.
      • Faber mentions a great upcoming book covering the twelve top endowment CIOs.
      • from Alpha Magazine...Highbridge Capital Management shares its office organization - putting traders and developers together.  I've always thought this would be a great idea in any shop.  By putting users and developers together - manual tasks can be seen and automation can happen.

     - post by taylortree

tkdiff

  • Great little file compare utility.  Graphic front end to the diff program.
    note:  tested this today against a large file/program (well, not that large in my line of work...but I guess it is by Google's standards)...couldn't handle it.  But, works great on small files.
     - post by taylortree

Google Mondrian: web-based code review and storage

  • Online code review that works like a blog/wiki.  Is it possible to create a code review system similar to Mondrian within a source management toolset such as Subversion?  Seems like most of the backend is there already...would only need to add some front-end tools to display the changes being committed and allow comments on those changes.
     - post by taylortree

Monday, September 03, 2007

Recent Links 09/03/2007 -- Numerical Python Basics

Programming in R

Finding Duplicate Elements in an Array :: Phil! Gregory  Annotated

Now, suppose that the array is of length n and only contains positive
integers less than n. We can be sure (by the pigeonhole principle)
that there is at least one duplicate.
    So, how do we find the beginning of the cycle? The easiest approach is to
    use Floyd's cycle-finding algorithm. It works roughly like this:
    Start at the beginning of the sequence. Keep track of two values (call
    them \(a_i\) and \(a_j\)). At each step of the algorithm, move \(a_i\)
    one step along the sequence, but move \(a_j\) two steps. Stop when
    \(a_i = a_j\).
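That description translates to a short function. This is my own sketch, assuming the usual framing of the puzzle: n + 1 values drawn from 1..n, so the pigeonhole principle guarantees a duplicate.

```python
def find_duplicate(nums):
    #Floyd's cycle-finding on the implicit sequence i -> nums[i].
    #With n + 1 values drawn from 1..n the sequence must cycle,
    #and the cycle's entry point is a duplicated value.
    slow = fast = nums[0]
    while True:
        slow = nums[slow]            #a_i moves one step
        fast = nums[nums[fast]]      #a_j moves two steps
        if slow == fast:
            break
    #phase two: restart one pointer; the two meet at the cycle start
    slow = nums[0]
    while slow != fast:
        slow = nums[slow]
        fast = nums[fast]
    return slow

assert find_duplicate([1, 3, 4, 2, 2]) == 2
```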