TaylorTree: programming


Thursday, February 11, 2010

Trading Framework Part I: Tools I Use

I received a question from a reader regarding the software I use...more specifically...the open source software I use in trading. Instead of a direct response, I figured this type of question might be useful to other readers of this blog.

My basic trading framework is the following:

Operating System:	Windows Vista Home Premium
Programming Languages:	Python 2.6.2 & R 2.9.1
Databases:	SQLite 2.4.1, Numpy 1.3.0, & CSV
Programming Editor:	SciTE 1.78
Graphing Engines:	Matplotlib 0.98.5 & R
GUI:	HTML & JavaScript
Scheduler:	Windows Task Scheduler
Shells:	Command.com (DOS) & Cygwin (Bash)
Historical Quotes:	CSI & Yahoo Finance

Operating System
Choosing Windows as the operating system is mainly out of convenience. As you can see above, the only real item that would prevent a full move to Linux is the historical quote provider, CSI. Everything else can run on another platform or a suitable alternative is available.

Another reason I've stayed with Windows is due to my current job (windows shop). But, I will admit, I have been very close to switching to a Mac the past few months or possibly OpenSUSE. Just haven't taken the bite yet.

On a side note, prior to my current employer...I worked for a University that was really ahead of its time. Every program we developed had to pass a compatibility test, "Could it easily run on another platform?" While this at times was an impossible task due to user requirements...we still always coded with this compatibility in mind. And I've kept this same philosophy in developing the trading simulation engine.

Programming Languages
I'm originally a Cobol programmer. Yes, that's right...if you've never heard of one...now you're reading a blog by one. Cobol programmers, the good ones, are very keen on whitespace. When you're throwing a lot of code around...the whitespace is what keeps you sane. And so, when I was trying out the various scripting languages back in the day...Python really struck my fancy. I spent the better part of 9 years trying to force programmers to keep the code pretty in Cobol. Only to see Python come around and truly force programmers to code clean. Over the years, I have worked in various other languages, but I've always stuck with Python.

I think another reason I chose Python was due to WealthLab's Scripting language (Pascal-based). I felt I could build an environment similar to WealthLab that would offer the same scripting ease. So far, Python has done a great job in keeping the framework simple and extensible.

Another language I have used from time to time in my trading is R. I use R mainly to analyze trading results. A few years ago, I actually developed a prototype of the trading simulation engine in R. But, it was too slow. The loops killed it. With the recent development of Revolution Computing's ParallelR...I've often wondered what the results would now be. But, I'm past the point of no return with the engine in Python. Still, as far as fast analysis of CSV files...it is really hard to beat R.

Databases
I struggled several years with how to store and retrieve the historical price series data for the trading simulation engine. The main problem was the data could not fit into memory yet access had to be extremely fast. So, for years I used plain CSV files to store the data.

Basically, I read the CSV files from CSI and wrote out new price CSV files with my fixes for possible bad data, along with additional calculated fields. At first I stored the data in one big CSV file. Then used either the DOS sort or Bash sort command to sort the file by date. I was afraid I would run into file size limits (at the time I was on Windows XP 32-bit). So, I started writing the data out to thousands of files broken down by date. Each file covered one date and contained all the prices for that date. Worked really well...except analysis on the backend became difficult. Plus, it felt kludgy.
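The one-file-per-date layout described above can be sketched roughly as follows. This is a minimal illustration in modern Python 3, not the original code: the `split_by_date` helper, the three-column layout, and the sample rows are all assumptions made for the example.

```python
import csv
import os
import tempfile
from collections import defaultdict

FIELDS = ["date", "symbol", "close"]  # assumed columns, for illustration only


def split_by_date(rows, out_dir):
    """Write one CSV per trading date; return the paths written, in date order."""
    by_date = defaultdict(list)
    for row in rows:
        by_date[row["date"]].append(row)

    written = []
    for date in sorted(by_date):
        path = os.path.join(out_dir, date + ".csv")
        with open(path, "w", newline="") as handle:
            writer = csv.DictWriter(handle, fieldnames=FIELDS)
            writer.writeheader()
            writer.writerows(by_date[date])
        written.append(path)
    return written


# Hypothetical sample data; a real quote export carries many more fields.
sample = [
    {"date": "2010-02-10", "symbol": "AAA", "close": "10.50"},
    {"date": "2010-02-11", "symbol": "AAA", "close": "10.75"},
    {"date": "2010-02-10", "symbol": "BBB", "close": "20.10"},
]
out_dir = tempfile.mkdtemp()
files = split_by_date(sample, out_dir)
print([os.path.basename(path) for path in files])
```

Reading one date back is then a single small file read, which is why the layout was fast. Any analysis across dates has to stitch many files together, which fits the backend difficulty mentioned above.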

I had always tried to use regular databases for the pricing backend...but they couldn't handle the storage and retrieval rates I required. Just too slow. And yes, I tried all of them: MySQL, PostgreSQL, Firebird, Berkeley DB, SQLite, etc.

It wasn't until I read an article by Bret Taylor covering how FriendFeed uses MySQL that I had an idea as to how to use a database to get the best of both worlds - fast storage & retrieval along with slick and easy access to the data. That's when I went back to SQLite and began a massive hacking of code while on a Texas Hill Country vacation. Really bumped the trading simulation engine to another level. The trick to fast storage & retrieval? Use fewer but bigger rows.
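The post does not spell out the actual schema, so the following is only a sketch of the "fewer but bigger rows" idea. It assumes one row per symbol with the whole price series serialized into a single column. The `series` table, the JSON encoding, and the sample history are all hypothetical stand-ins.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE series (symbol TEXT PRIMARY KEY, payload TEXT)")

# Hypothetical price history: (date, close) pairs per symbol.
history = {
    "AAA": [("2010-02-10", 10.50), ("2010-02-11", 10.75)],
    "BBB": [("2010-02-10", 20.10), ("2010-02-11", 19.95)],
}

# One row per symbol: the whole series travels as a single payload,
# instead of one row per (symbol, date) pair.
with conn:
    conn.executemany(
        "INSERT INTO series (symbol, payload) VALUES (?, ?)",
        [(symbol, json.dumps(bars)) for symbol, bars in history.items()],
    )


def load_series(symbol):
    """Fetch one wide row and decode it back into (date, close) pairs."""
    row = conn.execute(
        "SELECT payload FROM series WHERE symbol = ?", (symbol,)
    ).fetchone()
    return [tuple(bar) for bar in json.loads(row[0])] if row else []


row_count = conn.execute("SELECT COUNT(*) FROM series").fetchone()[0]
print(row_count, load_series("AAA"))
```

With this layout the table holds as many rows as there are symbols, no matter how long each history is, so a lookup touches a single row.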

For a memory database? I use numpy. It's a fantastic in-memory multi-dimensional storage tool. I dump the price series from SQLite to numpy to enable row or column-wise retrieval. Only recently have I found the performance hit is a little too much. So, I've removed numpy from one side of the framework. And contemplating removing it from the other side as well. It takes more work to replicate numpy via a dictionary of dictionaries of lists. But, surprisingly, it is worth the effort when dealing with price series. Which means, I may not use numpy in the engine for long. Still a great tool to use for in-memory storage.
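A dictionary of dictionaries of lists can cover both access patterns mentioned above. The sketch below is one possible arrangement, not the engine's actual structure. The `prices` and `dates` names, the field names, and the values are made up for illustration.

```python
# prices[symbol][field] -> list of values, aligned with dates[symbol]
dates = {"AAA": ["2010-02-10", "2010-02-11", "2010-02-12"]}
prices = {
    "AAA": {
        "open": [10.40, 10.55, 10.80],
        "close": [10.50, 10.75, 10.60],
    }
}


def column(symbol, field):
    """Column-wise access: every value of one field for a symbol."""
    return prices[symbol][field]


def row(symbol, index):
    """Row-wise access: all fields for one bar, keyed by field name."""
    bar = {field: values[index] for field, values in prices[symbol].items()}
    bar["date"] = dates[symbol][index]
    return bar


closes = column("AAA", "close")
last_bar = row("AAA", -1)
print(closes, last_bar)
```

Column access is a plain dictionary lookup. Row access has to rebuild the bar field by field, which is the extra work compared with a NumPy record array.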

Graphing Engines and GUI

I really try to keep it simple in the front-end of the trading framework. I use Matplotlib to visualize price or trading results. And HTML along with JavaScript to display trading statistics. Honestly, not a lot has gone into this side of things. Still very raw. My goal for 2010 is to work more in this area.

I have used R quite a bit in analyzing the output of the trading backtests. R is really powerful here. Quickly and easily chart and/or view pretty much any subset of the data you wish.

If there are certain items I look at over and over in the backtests...I'll typically replicate them in Python & Matplotlib and include them in the backtest results.

Editor, Schedulers, and Shells
SciTE is hands down my favorite Python editor. I don't like the fancy IDE type stuff. SciTE keeps it simple.

Windows Task Scheduler is for the birds. I should know...my main job is centered around Enterprise Scheduling. But, the Windows Task Scheduler gets the job done most of the time. I just have to code around the times it misses a run or doesn't get things quite right. Which is okay...that's life. That's one of the main reasons I have thought about switching to a *nix box for cron and the like.

The DOS shell or Bash shell...I don't get too fancy in either. I do use the Bash shell quite a bit in performing global changes in the Python code. I also leaned on it back when the database was CSV based. Again, *nix boxes win here. But, we Windows developers can hopefully always get a copy of Cygwin to save the day.

Historical Quotes
I have used CSIdata for many years. Mainly for the following reasons:

Dividend-adjusted quotes which are essential if analyzing long-term trading systems.
Unadjusted closing price - needed if you wish to test the exclusion of data based on the actual price traded - not the split-adjusted price.
CSV files - CSI does a great job of building and maintaining CSV files of price history.
Delisted data - I thought this would be a bigger deal but didn't really impact test results...but still nice to have for confirmation.
Data is used by several hedge funds and web sites such as Yahoo Finance.

The only drawback I have to CSI is the daily limit to the number of stocks you can export out of the product. It can get frustrating trying to work around the limit. Of course, you can always pony up for a higher limit.

This covers Part I of the series. Next up? The type of analysis I perform with the trading framework.

Later Trades,

Monday, July 27, 2009

Portfolio Performance for June 2009

June was a great month, both personally and for the portfolio. My family and I headed off to Texas for a few weeks to spend time with family and escape to the hill country for some good old R&R.

It was great visiting with everyone, checking out the beautiful Texas scenery, and enjoying some great food. There's a place in Liberty called Jax that served great catfish and an equally great ambiance. The restaurant is just across the street from the courthouse...so the place is the true heartbeat of the town. Could have been a setting out of a John Grisham novel. Very cool.

The picture above is of a rainbow we caught on our way back from dinner, just before sunset. Felt it was appropriate considering the portfolio beat the market for the first month since February.

Not obvious in the above chart, but the S&P 500 returned only 0.02% for the month of June. And the portfolio returned 1.58%. Not much to brag about but nice to breathe some air for a month.

For the month of June, the portfolio is approximately 21% in cash which is quickly dropping due to the high number of signals received in the month of July.

As far as the portfolio simulation engine...I've had some exciting progress this past month.

It really helped getting some quiet time. Each morning, I would get my coffee, sit out on the front porch, watch the hummingbirds go to war, the doves get lovey, and hack away on the simulation engine. I've created a new database that utilizes Python's struct module for binary storage, SQLite for storing pointers to the records, and Numpy for field-named record access. The best part is the database requires very little memory, has a small disk footprint, and is faster than anything I've worked with before. Previously, using a database of any kind was not an option. The aha moment was in realizing the bottleneck in performance was due to the number of records stored, not the size of the records. Therefore, my main goal was to reduce the number of rows in the table and scale horizontally in the table versus vertically. This drastically reduced the lookup time.
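A rough sketch of that design, under stated assumptions: the record layout, the `pointers` table, and the helper names below are hypothetical, and an in-memory `io.BytesIO` buffer stands in for the binary file on disk. To stay within the standard library, a `namedtuple` takes the place of NumPy for field-named access. In NumPy, a structured dtype passed to `numpy.frombuffer` would play that role.

```python
import io
import sqlite3
import struct
from collections import namedtuple

# Assumed fixed-width record: date as a YYYYMMDD integer plus four floats.
Bar = namedtuple("Bar", "date open high low close")
BAR_FORMAT = struct.Struct("<Idddd")

store = io.BytesIO()  # stands in for a binary file on disk
index = sqlite3.connect(":memory:")
index.execute(
    "CREATE TABLE pointers (symbol TEXT PRIMARY KEY, start INTEGER, nbytes INTEGER)"
)


def save_series(symbol, bars):
    """Append all bars for a symbol as one packed blob; index it with one row."""
    blob = b"".join(BAR_FORMAT.pack(*bar) for bar in bars)
    start = store.seek(0, io.SEEK_END)
    store.write(blob)
    with index:
        index.execute(
            "INSERT INTO pointers VALUES (?, ?, ?)", (symbol, start, len(blob))
        )


def load_series(symbol):
    """Look up the pointer row, then unpack the blob into named records."""
    found = index.execute(
        "SELECT start, nbytes FROM pointers WHERE symbol = ?", (symbol,)
    ).fetchone()
    if found is None:
        return []
    start, nbytes = found
    store.seek(start)
    blob = store.read(nbytes)
    return [Bar(*fields) for fields in BAR_FORMAT.iter_unpack(blob)]


save_series("AAA", [(20090601, 10.0, 10.6, 9.9, 10.5),
                    (20090602, 10.5, 10.9, 10.4, 10.75)])
save_series("BBB", [(20090601, 20.0, 20.3, 19.8, 20.1)])
bars = load_series("AAA")
print(len(bars), bars[-1].close)
```

The SQLite table holds one row per symbol however many bars are stored, which matches the point above that lookup time depends on the number of rows rather than their size.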

Now, I'm in the process of refining the reporting engine and building a price series plotting framework with Matplotlib and Numpy. So far, the results look very promising. Nice to finally get some pretty charts out of the simulation engine instead of a clunky MS Excel interface. Still more work ahead.

Heck, I've made so much progress sitting out watching those birds in the morning that I came home and started on a flower garden in our backyard. I've just finished tilling up the dirt and planting a few plants. We've already got hummingbirds fighting and a squirrel trying to figure out how to open that bird feeder. Now, I just need to get to hacking!

Later Trades,

MT

Tuesday, July 21, 2009

R on Stack Overflow...

Funny, I was working through a problem in R today and was seriously wishing R had the same presence as Python over at Stack Overflow. Looks like others have this wish as well...and they're doing something about it.

In concert with users online across the country, this session will lead a flashmob to populate Stack Overflow with R language content.

Very cool! Check out R on Stack Overflow. And post those questions!

MT

Friday, November 14, 2008

What I'm Researching...

Jim Barry's Rexx Tutor Part2

Posted: 13 Nov 2008 01:00 PM CST

great summaries on the classic rexx functions.

Project Aardvark

Posted: 13 Nov 2008 12:53 PM CST

Joel on Software's Real World. A must see!

Reading List: Fog Creek Software Management Training Program - Joel on Software

Posted: 13 Nov 2008 12:50 PM CST

great reading list!

In Python how do I sort a list of dictionaries by values of the dictionary? - Stack Overflow

Posted: 09 Nov 2008 09:29 PM CST

nice efficient sorting of values in a python dictionary.

AT&T Labs Research - Yoix / YWAIT

Posted: 07 Nov 2008 07:36 AM CST

Interesting way to build a web application. Wonder how complex this would be to use versus traditional web-based systems (LAMP)? This may be easier to deploy if the goal of the software is simulation/visualizations. Something to toy with.

AT&T Labs Research - Yoix / Byzgraf

Posted: 07 Nov 2008 07:33 AM CST

Another great looking toolset using Yoix that enables plotting functions: line, bar, histograms, etc.

AT&T Labs Research - Yoix / YDAT

Posted: 07 Nov 2008 07:32 AM CST

Extremely cool visualization toolset from AT&T Labs Research. Handles graphviz files.
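For the Stack Overflow link above on sorting a list of dictionaries, the usual approach is `sorted` with `operator.itemgetter` as the key. A short sketch follows. The `trades` records and their keys are invented for illustration.

```python
from operator import itemgetter

# Hypothetical backtest results; the keys are illustrative only.
trades = [
    {"symbol": "BBB", "pct_return": 4.2},
    {"symbol": "AAA", "pct_return": -1.3},
    {"symbol": "CCC", "pct_return": 9.8},
]

# sorted() returns a new list and leaves the original order untouched.
by_symbol = sorted(trades, key=itemgetter("symbol"))
best_first = sorted(trades, key=itemgetter("pct_return"), reverse=True)

print([t["symbol"] for t in by_symbol])
print([t["symbol"] for t in best_first])
```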

Tuesday, September 23, 2008

Barplot function in R

Much of my backtesting platform is text driven. Not that I'm opposed to graphs...just felt my time was better spent developing the foundation for the platform before adding bells and whistles. Little did I realize how difficult it is to find a simple graphing engine for the platform. Problem is...I'm old school...couldn't care less about flash graphs. Keep it simple.

Since I'm using python...figured I had to give the matplotlib library a try. It is nice...simple...but something was missing. Couldn't put my finger on it. So, dug around and played with the R language plotting libraries. A bit more my speed...though a bit particular in the settings. Anyway, here's a function I wrote to generate bar charts using R with a replacement for pie charts in mind...


#-----------------------------------------------------------------
# Simple bar chart - use instead of pie chart when possible.
#-----------------------------------------------------------------
barPie <- function(xSeries, chTitle="Your Bar Chart", xLab="X Label",
                     xDesc="%")
{
 xSeries <- sort(xSeries)

 # save off original settings and register the reset right away,
 # so par is restored even if plotting fails part way through
 oldPar <- par(no.readonly=TRUE)
 on.exit(par(oldPar))

 plot.new()

 # set page margins in inches
 par(mai=c(1,1.5,1,1))

 # pad 30% for labels
 # start plotting at 0.0 unless negative
 if (min(xSeries) < 0.0)
 {
     # keep 0.0 inside the range even if every value is negative
     xLim <- c((min(xSeries) * 1.3), (max(xSeries, 0.0) * 1.3))
 }
 else
 {
     xLim <- c(0.00, (max(xSeries) * 1.3))
 }

 # horizontal barplot in color baby!
 bp <- barplot(xSeries, horiz=TRUE,
       xlab=xLab, las=1, col=rainbow(length(xSeries)),
       xlim=xLim,
       axes=FALSE, cex.names=0.7, main=chTitle)

 # if x negative then start label at 0.0
 # otherwise, start label at value of x.
 xVals <- ifelse(xSeries < 0.0, 0.0, xSeries)
 text(xVals, bp, paste(xSeries, xDesc, sep=""), pos=4, cex=0.65)

 # format x axis: numeric tick positions, labels with one decimal place
 xTicks <- pretty(xSeries)
 axis(1, at=xTicks, labels=formatC(xTicks, digits=1, format="f"),
      cex.axis=0.75)
 box()
}

Used data from my portfolio to plot sector allocations and called the function...


sectors <- c(10.64,119.83,162.66,66.48,71.78,35.44,32.77,161.17,53.91,
               101.81,53.38,231.45,31.24,103.01)
sectors <- round((sectors/sum(sectors)*100.00), 1)

# write to png driver
png("c:/taylortrade/rlang/sectors_test.png")

barPie(sectors, "Sector Allocation", "Pct Allocated")

# stop writing to png driver
dev.off()

And here's the result...

Tuesday, December 11, 2007

To Design or Code?

"The one who does the work decides." -- KDE principle

Jeff Atwood over at the Coding Horror blog discusses a fascinating problem in software development. Doers and talkers. Designers and coders.

I believe all developers need to have a bit of both in their toolbox. Mainly, because the first design is most always changed due to scope creep (failure to see all the pieces to the puzzle). If you spend all your time talking about that first design...you never get to coding. And if you can't get to the coding...you'll fail to find those missing puzzle pieces. And fail to deliver a prototype for the customers to evaluate.

Designers, this means getting your hands dirty in order to create something to improve. Developing systems is an iterative process. Design, code, design, code. And yes, even code, design, code, design. The ultimate goal is to refine the process until you and your customers are satisfied. Whatever it takes. And yes, that means moving to the coding stage even when the optimal design has yet to reveal itself. It's really a Kaizen process. Small, accomplished improvements to the initial design pay dividends to all.

Coders, this means before plunging forward hacking away at the problem...ask for feedback on your idea and possible alternatives to the problem. It's important to start coding down the right path. One that encompasses as much information about the problem as possible. This means you'll need to bring some information to the design table yourself. Perhaps a bit of discovery coding must take place to figure out what elements are involved and possible problems or bottlenecks in your proposed solution. This also helps in keeping the design discussion focused.

So, how does this apply to investing? Well, how many investors/traders do you know that invest without a plan? Without a design? An overriding investment philosophy? Just plunge ahead into the market?

These types of investors would be well served by stepping back a bit and designing their investment model. Then asking for feedback on their proposed design. It's okay to perform some discovery trading first. Determining how the market handles your ideas. But, gather what you need and then design. Then invest with the goal of continually refining your design.

What about investors/traders who are afraid of the market? Have not found an investment model that is perfect? And refuse to step a toe in the market waters until they feel 100% comfortable in their design? Problem with this thinking is knowledge requires experience. And nothing is ever 100%...especially in the market. So, create your investment manifesto and then try it out. You can't improve upon something that isn't there to improve. And you can't design a successful investment strategy if you don't have market experience.

Later trades,

MT

Saturday, December 08, 2007

Breakthrough Programming

“The reasonable man adapts himself to the world: the unreasonable one persists in trying to adapt the world to himself. Therefore all progress depends on the unreasonable man.” -- George Bernard Shaw

Scott over at the scottberkun.com blog links to an amazing paper on Managing for Breakthroughs in Productivity. The article discusses how breakthroughs occur...and don't occur. My favorite quote...

For breakthroughs to occur, people must be given a chance to do work that can not be proven: ambition and risk are necessary for breakthroughs. If individuals are not trusted to take risks, breakthroughs are unlikely.

Spot on! Programming and risk go hand in hand. As do programming and ambition.

To be an effective programmer you must have the ambition to automate tasks...all tasks. That's your job. If you spend your time manually putting widgets together...then you're not really programming.

And in order to automate tasks...a programmer has to take risks. Cause you are programming something that has never been done by a computer before. At least that computer. And most likely, never been programmed by you before.

MT

Friday, November 30, 2007

Programming Culture

Hat tip to Howard Lindzon for sharing this post...Software Engineering Tips for Startups

Alex's tips for startups is one of the best posts I've read in a long time on building a programming culture. I realize the focus is on tech startups. But, I believe his points are applicable to all companies with programming departments.

Some of my favorite quotes...

So the first tip is to always have a strong technical co-founder. Someone who shares or invents the business along with others, but also has the technical feet on the ground. Someone who can make sure the business is mapped onto technology correctly.

Avoid hiring managers...What you need are experienced technical people who love coding. These are going to be natural mentors for your younger engineers. Mentors and not managers.

Coding becomes sculpting. Starting with a shapeless form you continuously refine the code to satisfy the business requirements and make sure that the system is designed and implemented correctly.

And probably my favorite in regard to hiring programmers...

...candidates need to demonstrate love for simple and elegant code.

Simple and elegant code = Simple and elegant company.

I would also like to add one more trait to look for in programmers. Willingness to share knowledge. Evidence of this sharing trait in the interview and in past performance.

For example, all programmers develop tools to make their jobs easier. But, do they develop those tools for themselves only? Or for others to use as well? We all know...developing anything for others to use is the harder path to follow. But, without knowledge transfer, the wheel will be re-invented...again, again, and again.

Have a great weekend,

MT

Monday, September 03, 2007

Recent Links 09/03/2007

ONLamp.com -- Numerical Python Basics

Numpy basics.

Programming in R

Finding Duplicate Elements in an Array :: Phil! Gregory Annotated

Interesting way to find duplicates in an array. Enjoyed the links on the pigeonhole principle and Floyd's cycle-finding algorithm.

Now, suppose that the array is of length n and only contains positive integers less than n. We can be sure (by the pigeonhole principle) that there is at least one duplicate.

So, how do we find the beginning of the cycle? The easiest approach is to use Floyd's cycle-finding algorithm. It works roughly like this: Start at the beginning of the sequence. Keep track of two values (call them a_i and a_j). At each step of the algorithm, move a_i one step along the sequence, but move a_j two steps. Stop when a_i = a_j.
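The excerpt stops at the point where the two values meet. Below is a sketch of the whole idea in Python, treating each array value as the index of the next element to visit. The second phase, which walks from the start to the beginning of the cycle, is the standard continuation of Floyd's algorithm and is not quoted from the linked article. The function name and the sample input are assumptions.

```python
def find_duplicate(values):
    """Return a duplicated value from a list of n integers, each in 1..n-1."""
    n = len(values)
    if n < 2 or any(not 0 < v < n for v in values):
        raise ValueError("expected n >= 2 integers, each in 1..n-1")

    # Phase 1: slow moves one step, fast moves two, until they meet.
    slow = values[0]
    fast = values[values[0]]
    while slow != fast:
        slow = values[slow]
        fast = values[values[fast]]

    # Phase 2: restart one pointer at index 0 and step both one at a time.
    # They meet at the start of the cycle, which is the duplicated value.
    slow = 0
    while slow != fast:
        slow = values[slow]
        fast = values[fast]
    return slow


duplicate = find_duplicate([1, 3, 4, 2, 2])
print(duplicate)
```

Because no value equals 0, nothing ever points back to index 0, so the walk always starts on a tail that leads into a cycle. The entry point of that cycle is an index reached from two different places, which is to say a value that appears twice.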
