# -*- coding: Latin-1 -*-

"""Heap queue algorithm (a.k.a. priority queue).

Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
all k, counting elements from 0.  For the sake of comparison,
non-existing elements are considered to be infinite.  The interesting
property of a heap is that a[0] is always its smallest element.

Usage:

heap = []            # creates an empty heap
heappush(heap, item) # pushes a new item on the heap
item = heappop(heap) # pops the smallest item from the heap
item = heap[0]       # smallest item on the heap without popping it
heapify(x)           # transforms list into a heap, in-place, in linear time
item = heapreplace(heap, item) # pops and returns smallest item, and adds
                               # new item; the heap size is unchanged

Our API differs from textbook heap algorithms as follows:

- We use 0-based indexing.  This makes the relationship between the
  index for a node and the indexes for its children slightly less
  obvious, but is more suitable since Python uses 0-based indexing.

- Our heappop() method returns the smallest item, not the largest.

These two make it possible to view the heap as a regular Python list
without surprises: heap[0] is the smallest item, and heap.sort()
maintains the heap invariant!
"""

# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger

__about__ = """Heap queues

[explanation by François Pinard]

Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
all k, counting elements from 0.  For the sake of comparison,
non-existing elements are considered to be infinite.  The interesting
property of a heap is that a[0] is always its smallest element.

The strange invariant above is meant to be an efficient memory
representation for a tournament.  The numbers below are `k', not a[k]:

                                   0

                  1                                 2

          3               4                5               6

      7       8       9       10      11      12      13      14

    15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30


In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'.  In
the usual binary tournament we see in sports, each cell is the winner
over the two cells it tops, and we can trace the winner down the tree
to see all the opponents s/he had.  However, in many computer
applications of such tournaments, we do not need to trace the history
of a winner.  To be more memory efficient, when a winner is promoted,
we try to replace it by something else at a lower level, and the rule
becomes that a cell and the two cells it tops contain three different
items, but the top cell "wins" over the two topped cells.

If this heap invariant is protected at all times, index 0 is clearly
the overall winner.  The simplest algorithmic way to remove it and
find the "next" winner is to move some loser (let's say cell 30 in the
diagram above) into the 0 position, and then percolate this new 0 down
the tree, exchanging values, until the invariant is re-established.
This is clearly logarithmic in the total number of items in the tree.
By iterating over all items, you get an O(n log n) sort.

A nice feature of this sort is that you can efficiently insert new
items while the sort is going on, provided that the inserted items are
not "better" than the last 0'th element you extracted.  This is
especially useful in simulation contexts, where the tree holds all
incoming events, and the "win" condition means the smallest scheduled
time.  When an event schedules other events for execution, they are
scheduled into the future, so they can easily go into the heap.  So, a
heap is a good structure for implementing schedulers (this is what I
used for my MIDI sequencer :-).
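
For example (a small sketch added here for illustration; it is not
part of the original essay), such a scheduler loop could look like:

    events = []                         # heap of (time, action) pairs
    heappush(events, (5, 'stop'))
    heappush(events, (1, 'start'))
    while events:
        when, action = heappop(events)  # always the earliest event
        print when, action              # handlers may heappush() new events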

Various structures for implementing schedulers have been extensively
studied, and heaps are good for this, as they are reasonably speedy,
the speed is almost constant, and the worst case is not much different
from the average case.  However, there are other representations which
are more efficient overall, yet the worst cases might be terrible.

Heaps are also very useful in big disk sorts.  You most probably all
know that a big sort implies producing "runs" (which are pre-sorted
sequences, whose size is usually related to the amount of CPU memory),
followed by merging passes for these runs, and the merging is often
very cleverly organised[1].  It is very important that the initial
sort produces the longest runs possible.  Tournaments are a good way
to achieve that.  If, using all the memory available to hold a
tournament, you replace and percolate items that happen to fit the
current run, you'll produce runs which are twice the size of the
memory for random input, and much better for input that is fuzzily
ordered.

Moreover, if you output the 0'th item on disk and get an input which
may not fit in the current tournament (because the value "wins" over
the last output value), it cannot fit in the heap, so the size of the
heap decreases.  The freed memory could be cleverly reused immediately
for progressively building a second heap, which grows at exactly the
same rate as the first heap is melting.  When the first heap
completely vanishes, you switch heaps and start a new run.  Clever and
quite effective!
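
A tiny sketch of that "replace or shrink" run-building idea (added
here for illustration; the small input list stands in for the incoming
data, and a full implementation would start a new run whenever the
heap emptied):

    data = iter([5, 9, 1, 7, 3, 8, 2])
    heap = [data.next() for _ in range(3)]       # "memory" holds 3 items
    heapify(heap)
    run, pending = [], []
    for item in data:
        if item >= heap[0]:
            run.append(heapreplace(heap, item))  # item fits the current run
        else:
            run.append(heappop(heap))            # shrink; item must wait
            pending.append(item)                 # seeds for the next run
    run.extend(sorted(heap))                     # drain what is left
    # run == [1, 5, 7, 8, 9]; pending == [3, 2]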

In a word, heaps are useful memory structures to know.  I use them in
a few applications, and I think it is good to keep a `heap' module
around. :-)

--------------------
[1] The disk balancing algorithms which are current, nowadays, are
more annoying than clever, and this is a consequence of the seeking
capabilities of the disks.  On devices which cannot seek, like big
tape drives, the story was quite different, and one had to be very
clever to ensure (far in advance) that each tape movement would be the
most effective possible (that is, would best participate in
"progressing" the merge).  Some tapes were even able to read
backwards, and this was also used to avoid the rewinding time.
Believe me, really good tape sorts were quite spectacular to watch!
From time immemorial, sorting has always been a Great Art! :-)
"""

__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'nlargest',
           'nsmallest']

from itertools import islice, repeat
import bisect

def heappush(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    heap.append(item)
    _siftdown(heap, 0, len(heap)-1)

def heappop(heap):
    """Pop the smallest item off the heap, maintaining the heap invariant."""
    lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftup(heap, 0)
    else:
        returnitem = lastelt
    return returnitem

def heapreplace(heap, item):
    """Pop and return the current smallest value, and add the new item.

    This is more efficient than heappop() followed by heappush(), and can be
    more appropriate when using a fixed-size heap.  Note that the value
    returned may be larger than item!  That constrains reasonable uses of
    this routine unless written as part of a conditional replacement:

        if item > heap[0]:
            item = heapreplace(heap, item)
    """
    returnitem = heap[0]    # raises appropriate IndexError if heap is empty
    heap[0] = item
    _siftup(heap, 0)
    return returnitem

def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    # Transform bottom-up.  The largest index there's any point to looking at
    # is the largest with a child index in-range, so must have 2*i + 1 < n,
    # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
    # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
    for i in reversed(xrange(n//2)):
        _siftup(x, i)

def nlargest(iterable, n):
    """Find the n largest elements in a dataset.

    Equivalent to:  sorted(iterable, reverse=True)[:n]
    """
    it = iter(iterable)
    result = list(islice(it, n))
    if not result:
        return result
    heapify(result)
    _heapreplace = heapreplace
    sol = result[0]         # sol --> smallest of the nlargest
    for elem in it:
        if elem <= sol:
            continue
        _heapreplace(result, elem)
        sol = result[0]
    result.sort(reverse=True)
    return result
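
# Example (illustrative; not part of the module's documented tests).
# Note that this version takes the iterable first and the count second:
#
#     >>> nlargest([1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2], 3)
#     [42, 37, 23]
#     >>> nsmallest([1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2], 3)
#     [-4, 1, 2]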

def nsmallest(iterable, n):
    """Find the n smallest elements in a dataset.

    Equivalent to:  sorted(iterable)[:n]
    """
    if hasattr(iterable, '__len__') and n * 10 <= len(iterable):
        # For smaller values of n, the bisect method is faster than a minheap.
        # It is also memory efficient, consuming only n elements of space.
        it = iter(iterable)
        result = sorted(islice(it, 0, n))
        if not result:
            return result
        insort = bisect.insort
        pop = result.pop
        los = result[-1]    # los --> Largest of the nsmallest
        for elem in it:
            if los <= elem:
                continue
            insort(result, elem)
            pop()
            los = result[-1]
        return result
    # An alternative approach manifests the whole iterable in memory but
    # saves comparisons by heapifying all at once.  Also, saves time
    # over bisect.insort() which has O(n) data movement time for every
    # insertion.  Finding the n smallest of an m length iterable requires
    # O(m) + O(n log m) comparisons.
    h = list(iterable)
    heapify(h)
    return map(heappop, repeat(h, min(n, len(h))))

# 'heap' is a heap at all indices >= startpos, except possibly for pos.  pos
# is the index of a leaf with a possibly out-of-order value.  Restore the
# heap invariant.
def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if parent <= newitem:
            break
        heap[pos] = parent
        pos = parentpos
    heap[pos] = newitem
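
# Worked example (added for illustration): pushing 0 onto the heap [1, 3, 2]
# appends it as a leaf at index 3, breaking the invariant; _siftdown() then
# moves the parents 3 and then 1 down past it:
#
#     heap = [1, 3, 2]
#     heap.append(0)
#     _siftdown(heap, 0, 3)
#     assert heap == [0, 1, 2, 3]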

# The child indices of heap index pos are already heaps, and we want to make
# a heap at index pos too.  We do this by bubbling the smaller child of
# pos up (and so on with that child's children, etc) until hitting a leaf,
# then using _siftdown to move the oddball originally at index pos into place.
#
# We *could* break out of the loop as soon as we find a pos where newitem <=
# both its children, but turns out that's not a good idea, and despite that
# many books write the algorithm that way.  During a heap pop, the last array
# element is sifted in, and that tends to be large, so that comparing it
# against values starting from the root usually doesn't pay (= usually doesn't
# get us out of the loop early).  See Knuth, Volume 3, where this is
# explained and quantified in an exercise.
#
# Cutting the # of comparisons is important, since these routines have no
# way to extract "the priority" from an array element, so that intelligence
# is likely to be hiding in custom __cmp__ methods, or in array elements
# storing (priority, record) tuples.  Comparisons are thus potentially
# expensive.
#
# On random arrays of length 1000, making this change cut the number of
# comparisons made by heapify() a little, and those made by exhaustive
# heappop() a lot, in accord with theory.  Here are typical results from 3
# runs (3 just to demonstrate how small the variance is):
#
# Compares needed by heapify     Compares needed by 1000 heappops
# --------------------------     --------------------------------
#        1837 cut to 1663               14996 cut to 8680
#        1855 cut to 1659               14966 cut to 8678
#        1847 cut to 1660               15024 cut to 8703
#
# Building the heap by using heappush() 1000 times instead required
# 2198, 2148, and 2219 compares:  heapify() is more efficient, when
# you can use it.
#
# The total compares needed by list.sort() on the same lists were 8627,
# 8627, and 8632 (this should be compared to the sum of heapify() and
# heappop() compares):  list.sort() is (unsurprisingly!) more efficient
# for sorting.

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and heap[rightpos] <= heap[childpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

# If available, use C implementation
try:
    from _heapq import heappush, heappop, heapify, heapreplace, nlargest, nsmallest
except ImportError:
    pass

if __name__ == "__main__":
    # Simple sanity test
    heap = []
    data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
    for item in data:
        heappush(heap, item)
    sort = []
    while heap:
        sort.append(heappop(heap))
    print sort
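
    # A few extra spot checks (an illustrative sketch; recall that this
    # version's nsmallest()/nlargest() take (iterable, n)).
    heapify(data)               # rebuild the same list as a heap, in-place
    assert data[0] == min(data)
    print nsmallest(data, 3)    # expected: [0, 1, 2]
    print nlargest(data, 3)     # expected: [9, 8, 7]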