# -*- coding: Latin-1 -*-

"""Heap queue algorithm (a.k.a. priority queue).

Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
all k, counting elements from 0.  For the sake of comparison,
non-existing elements are considered to be infinite.  The interesting
property of a heap is that a[0] is always its smallest element.

Usage:

heap = []            # creates an empty heap
heappush(heap, item) # pushes a new item on the heap
item = heappop(heap) # pops the smallest item from the heap
item = heap[0]       # smallest item on the heap without popping it
heapify(x)           # transforms list into a heap, in-place, in linear time
item = heapreplace(heap, item) # pops and returns smallest item, and adds
                               # new item; the heap size is unchanged

Our API differs from textbook heap algorithms as follows:

- We use 0-based indexing.  This makes the relationship between the
  index for a node and the indexes for its children slightly less
  obvious, but is more suitable since Python uses 0-based indexing.

- Our heappop() method returns the smallest item, not the largest.

These two make it possible to view the heap as a regular Python list
without surprises: heap[0] is the smallest item, and heap.sort()
maintains the heap invariant!
"""

# Original code by Kevin O'Connor, augmented by Tim Peters and Raymond Hettinger

__about__ = """Heap queues

[explanation by François Pinard]

Heaps are arrays for which a[k] <= a[2*k+1] and a[k] <= a[2*k+2] for
all k, counting elements from 0.  For the sake of comparison,
non-existing elements are considered to be infinite.  The interesting
property of a heap is that a[0] is always its smallest element.

The strange invariant above is meant to be an efficient memory
representation for a tournament.  The numbers below are `k', not a[k]:

                                   0

                  1                                 2

          3               4                5               6

      7       8       9       10      11      12      13      14

    15 16   17 18   19 20   21 22   23 24   25 26   27 28   29 30


In the tree above, each cell `k' is topping `2*k+1' and `2*k+2'.  In
the usual binary tournament we see in sports, each cell is the winner
over the two cells it tops, and we can trace the winner down the tree
to see all the opponents s/he had.  However, in many computer
applications of such tournaments, we do not need to trace the history
of a winner.  To be more memory efficient, when a winner is promoted,
we try to replace it by something else at a lower level, and the rule
becomes that a cell and the two cells it tops contain three different
items, but the top cell "wins" over the two topped cells.

If this heap invariant is protected at all times, index 0 is clearly
the overall winner.  The simplest algorithmic way to remove it and
find the "next" winner is to move some loser (let's say cell 30 in the
diagram above) into the 0 position, and then percolate this new 0 down
the tree, exchanging values, until the invariant is re-established.
This is clearly logarithmic in the total number of items in the tree.
By iterating over all items, you get an O(n log n) sort.

A nice feature of this sort is that you can efficiently insert new
items while the sort is going on, provided that the inserted items are
not "better" than the last 0'th element you extracted.  This is
especially useful in simulation contexts, where the tree holds all
incoming events, and the "win" condition means the smallest scheduled
time.  When an event schedules other events for execution, they are
scheduled into the future, so they can easily go into the heap.  So, a
heap is a good structure for implementing schedulers (this is what I
used for my MIDI sequencer :-).
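
For example (a small sketch added here for illustration; it is not
part of the original essay), such a scheduler loop could look like:

    events = []                         # heap of (time, action) pairs
    heappush(events, (5, 'stop'))
    heappush(events, (1, 'start'))
    while events:
        when, action = heappop(events)  # always the earliest event
        print when, action              # handlers may heappush() new events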

Various structures for implementing schedulers have been extensively
studied, and heaps are good for this, as they are reasonably speedy,
the speed is almost constant, and the worst case is not much different
from the average case.  However, there are other representations which
are more efficient overall, yet the worst cases might be terrible.

Heaps are also very useful in big disk sorts.  You most probably all
know that a big sort implies producing "runs" (which are pre-sorted
sequences, whose size is usually related to the amount of CPU memory),
followed by merging passes for these runs, and the merging is often
very cleverly organised[1].  It is very important that the initial
sort produces the longest runs possible.  Tournaments are a good way
to achieve that.  If, using all the memory available to hold a
tournament, you replace and percolate items that happen to fit the
current run, you'll produce runs which are twice the size of the
memory for random input, and much better for input that is fuzzily
ordered.

Moreover, if you output the 0'th item on disk and get an input which
may not fit in the current tournament (because the value "wins" over
the last output value), it cannot fit in the heap, so the size of the
heap decreases.  The freed memory could be cleverly reused immediately
for progressively building a second heap, which grows at exactly the
same rate as the first heap is melting.  When the first heap
completely vanishes, you switch heaps and start a new run.  Clever and
quite effective!
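
A tiny sketch of that "replace or shrink" run-building idea (added
here for illustration; the small input list stands in for the incoming
data, and a full implementation would start a new run whenever the
heap emptied):

    data = iter([5, 9, 1, 7, 3, 8, 2])
    heap = [data.next() for _ in range(3)]       # "memory" holds 3 items
    heapify(heap)
    run, pending = [], []
    for item in data:
        if item >= heap[0]:
            run.append(heapreplace(heap, item))  # item fits the current run
        else:
            run.append(heappop(heap))            # shrink; item must wait
            pending.append(item)                 # seeds for the next run
    run.extend(sorted(heap))                     # drain what is left
    # run == [1, 5, 7, 8, 9]; pending == [3, 2]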

In a word, heaps are useful memory structures to know.  I use them in
a few applications, and I think it is good to keep a `heap' module
around. :-)

--------------------
[1] The disk balancing algorithms which are current, nowadays, are
more annoying than clever, and this is a consequence of the seeking
capabilities of the disks.  On devices which cannot seek, like big
tape drives, the story was quite different, and one had to be very
clever to ensure (far in advance) that each tape movement would be the
most effective possible (that is, would best participate in
"progressing" the merge).  Some tapes were even able to read
backwards, and this was also used to avoid the rewinding time.
Believe me, really good tape sorts were quite spectacular to watch!
From time immemorial, sorting has always been a Great Art! :-)
"""

__all__ = ['heappush', 'heappop', 'heapify', 'heapreplace', 'nlargest',
           'nsmallest']

from itertools import islice, repeat
import bisect

def heappush(heap, item):
    """Push item onto heap, maintaining the heap invariant."""
    heap.append(item)
    _siftdown(heap, 0, len(heap)-1)

def heappop(heap):
    """Pop the smallest item off the heap, maintaining the heap invariant."""
    lastelt = heap.pop()    # raises appropriate IndexError if heap is empty
    if heap:
        returnitem = heap[0]
        heap[0] = lastelt
        _siftup(heap, 0)
    else:
        returnitem = lastelt
    return returnitem

def heapreplace(heap, item):
    """Pop and return the current smallest value, and add the new item.

    This is more efficient than heappop() followed by heappush(), and can be
    more appropriate when using a fixed-size heap.  Note that the value
    returned may be larger than item!  That constrains reasonable uses of
    this routine unless written as part of a conditional replacement:

        if item > heap[0]:
            item = heapreplace(heap, item)
    """
    returnitem = heap[0]    # raises appropriate IndexError if heap is empty
    heap[0] = item
    _siftup(heap, 0)
    return returnitem

def heapify(x):
    """Transform list into a heap, in-place, in O(len(x)) time."""
    n = len(x)
    # Transform bottom-up.  The largest index there's any point to looking at
    # is the largest with a child index in-range, so must have 2*i + 1 < n,
    # or i < (n-1)/2.  If n is even = 2*j, this is (2*j-1)/2 = j-1/2 so
    # j-1 is the largest, which is n//2 - 1.  If n is odd = 2*j+1, this is
    # (2*j+1-1)/2 = j so j-1 is the largest, and that's again n//2-1.
    for i in reversed(xrange(n//2)):
        _siftup(x, i)

def nlargest(iterable, n):
    """Find the n largest elements in a dataset.

    Equivalent to:  sorted(iterable, reverse=True)[:n]
    """
    it = iter(iterable)
    result = list(islice(it, n))
    if not result:
        return result
    heapify(result)
    _heapreplace = heapreplace
    sol = result[0]         # sol --> smallest of the nlargest
    for elem in it:
        if elem <= sol:
            continue
        _heapreplace(result, elem)
        sol = result[0]
    result.sort(reverse=True)
    return result
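
# Example (illustrative; not part of the module's documented tests).
# Note that this version takes the iterable first and the count second:
#
#     >>> nlargest([1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2], 3)
#     [42, 37, 23]
#     >>> nsmallest([1, 8, 2, 23, 7, -4, 18, 23, 42, 37, 2], 3)
#     [-4, 1, 2]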

def nsmallest(iterable, n):
    """Find the n smallest elements in a dataset.

    Equivalent to:  sorted(iterable)[:n]
    """
    if hasattr(iterable, '__len__') and n * 10 <= len(iterable):
        # For smaller values of n, the bisect method is faster than a minheap.
        # It is also memory efficient, consuming only n elements of space.
        it = iter(iterable)
        result = sorted(islice(it, 0, n))
        if not result:
            return result
        insort = bisect.insort
        pop = result.pop
        los = result[-1]    # los --> Largest of the nsmallest
        for elem in it:
            if los <= elem:
                continue
            insort(result, elem)
            pop()
            los = result[-1]
        return result
    # An alternative approach manifests the whole iterable in memory but
    # saves comparisons by heapifying all at once.  Also, saves time
    # over bisect.insort() which has O(n) data movement time for every
    # insertion.  Finding the n smallest of an m length iterable requires
    # O(m) + O(n log m) comparisons.
    h = list(iterable)
    heapify(h)
    return map(heappop, repeat(h, min(n, len(h))))

# 'heap' is a heap at all indices >= startpos, except possibly for pos.  pos
# is the index of a leaf with a possibly out-of-order value.  Restore the
# heap invariant.
def _siftdown(heap, startpos, pos):
    newitem = heap[pos]
    # Follow the path to the root, moving parents down until finding a place
    # newitem fits.
    while pos > startpos:
        parentpos = (pos - 1) >> 1
        parent = heap[parentpos]
        if parent <= newitem:
            break
        heap[pos] = parent
        pos = parentpos
    heap[pos] = newitem
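
# Worked example (added for illustration): pushing 0 onto the heap [1, 3, 2]
# appends it as a leaf at index 3, breaking the invariant; _siftdown() then
# moves the parents 3 and then 1 down past it:
#
#     heap = [1, 3, 2]
#     heap.append(0)
#     _siftdown(heap, 0, 3)
#     assert heap == [0, 1, 2, 3]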

# The child indices of heap index pos are already heaps, and we want to make
# a heap at index pos too.  We do this by bubbling the smaller child of
# pos up (and so on with that child's children, etc) until hitting a leaf,
# then using _siftdown to move the oddball originally at index pos into place.
#
# We *could* break out of the loop as soon as we find a pos where newitem <=
# both its children, but turns out that's not a good idea, and despite that
# many books write the algorithm that way.  During a heap pop, the last array
# element is sifted in, and that tends to be large, so that comparing it
# against values starting from the root usually doesn't pay (= usually doesn't
# get us out of the loop early).  See Knuth, Volume 3, where this is
# explained and quantified in an exercise.
#
# Cutting the # of comparisons is important, since these routines have no
# way to extract "the priority" from an array element, so that intelligence
# is likely to be hiding in custom __cmp__ methods, or in array elements
# storing (priority, record) tuples.  Comparisons are thus potentially
# expensive.
#
# On random arrays of length 1000, making this change cut the number of
# comparisons made by heapify() a little, and those made by exhaustive
# heappop() a lot, in accord with theory.  Here are typical results from 3
# runs (3 just to demonstrate how small the variance is):
#
# Compares needed by heapify     Compares needed by 1000 heappops
# --------------------------     --------------------------------
#        1837 cut to 1663               14996 cut to 8680
#        1855 cut to 1659               14966 cut to 8678
#        1847 cut to 1660               15024 cut to 8703
#
# Building the heap by using heappush() 1000 times instead required
# 2198, 2148, and 2219 compares:  heapify() is more efficient, when
# you can use it.
#
# The total compares needed by list.sort() on the same lists were 8627,
# 8627, and 8632 (this should be compared to the sum of heapify() and
# heappop() compares):  list.sort() is (unsurprisingly!) more efficient
# for sorting.

def _siftup(heap, pos):
    endpos = len(heap)
    startpos = pos
    newitem = heap[pos]
    # Bubble up the smaller child until hitting a leaf.
    childpos = 2*pos + 1    # leftmost child position
    while childpos < endpos:
        # Set childpos to index of smaller child.
        rightpos = childpos + 1
        if rightpos < endpos and heap[rightpos] <= heap[childpos]:
            childpos = rightpos
        # Move the smaller child up.
        heap[pos] = heap[childpos]
        pos = childpos
        childpos = 2*pos + 1
    # The leaf at pos is empty now.  Put newitem there, and bubble it up
    # to its final resting place (by sifting its parents down).
    heap[pos] = newitem
    _siftdown(heap, startpos, pos)

# If available, use C implementation
try:
    from _heapq import heappush, heappop, heapify, heapreplace, nlargest, nsmallest
except ImportError:
    pass

if __name__ == "__main__":
    # Simple sanity test
    heap = []
    data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0]
    for item in data:
        heappush(heap, item)
    sort = []
    while heap:
        sort.append(heappop(heap))
    print sort
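
    # A few extra spot checks (an illustrative sketch; recall that this
    # version's nsmallest()/nlargest() take (iterable, n)).
    heapify(data)               # rebuild the same list as a heap, in-place
    assert data[0] == min(data)
    print nsmallest(data, 3)    # expected: [0, 1, 2]
    print nlargest(data, 3)     # expected: [9, 8, 7]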