- Top 100 price gainers in your price series database;
- Top 10 most volatile stocks in your price series database;
- Top 5 longest running batch jobs in your operations arena;
- Any many more...
import heapq
class MaxItems(object):
    """
    Caches the max X items of whatever you're analyzing and
    returns a list containing only those max X values and
    associated items.
    """
    def __init__(self, size):
        self.size = size
        self._recs = []
    def push(self, value, items):
        if len(self._recs) < self.size:
            heapq.heappush(self._recs, (value, items))
        else:
            minitem = heapq.heappushpop(self._recs, (value, items))
    def items(self):
        return heapq.nlargest(self.size, self._recs)
Example call and results:
pricegains = []
pricegains.append({'symbol':'GOOG', 'gain':234.0})
pricegains.append({'symbol':'YHOO', 'gain':124.0})
pricegains.append({'symbol':'IBM', 'gain':1242.0})
pricegains.append({'symbol':'GE', 'gain':1800.0})
pricegains.append({'symbol':'ENE', 'gain':0.0})
pricegains.append({'symbol':'KREM', 'gain':12.0})
maxitems = MaxItems(3)
for row in pricegains:
    maxitems.push(row['gain'], row)
print maxitems.items()
----------------------------------------------------------
Results of call:
(1800.0, {'symbol': 'GE', 'gain': 1800.0})
(1242.0, {'symbol': 'IBM', 'gain': 1242.0})
(234.0, {'symbol': 'GOOG', 'gain': 234.0})
The heapq module works nicely in accomplishing the task.  What's ironic is Python's heapq module implements the min-heap algorithm which works out nicely and efficiently in determining the maximum values over a list.  But, does not work out so efficiently for determining the minimum values over a list.
I'll cover the MinItems class in another post. But, to give you a hint of what does work in collecting the minimum values over a list is one of the alternatives I explored in building the MaxItems class...
Alternative yet Inefficient version of MaxItems:
import bisect
class MaxItems2(object):
    """
    Caches the max X items of whatever you're analyzing and
    returns a list containing only those max X values and
    associated items.
    """
    def __init__(self, size):
        self.size = size
        self._recs = []
    def push(self, value, items):
        if len(self._recs) < self.size:
            bisect.insort(self._recs, (value, items))
        elif bisect.bisect(self._recs, (value, items)) > self.size:
            bisect.insort(self._recs, (value, items))
            minitem = self._recs.pop(0)
    def items(self):
        return sorted(self._recs, reverse=True)
MaxItems2 takes advantage of the bisect module and while it works great; performance is at a minimum 2x worse on average than using the heapq method.Test Code:
import random
pricegains = []
maxitems = MaxItems(100)
for x in xrange(500000):
    gain = random.uniform(1.0,500.0)
    maxitems.push(gain, ('This', 'is', 'Record'))
rows = maxitems.items()
Calling the above code with the wonderful timeit module produced the following results:- heapq method: Ten iterations finished in 1.90 seconds.
- bisect method: Ten iterations finished in 3.80 seconds.
MT
 
 


















