
CS421 - Advanced Data Structures and Algorithm Development - Chapter 4 - Sorting

Problem: sort a list of n elements into increasing order. Assume: elements are compared pairwise (the comparison model; but see "Changing the Model" below). Goal: minimize the number of comparisons and swaps
  1. Strategies
      four important strategies, based on how two parts of the list are ranked and sorted relative to each other
    • two parts are unranked relative to each other:
      • one part is sorted (insert sort - linear insert sort, binary insert sort, jump insert sort)
      • both parts are sorted (merge sort - merge sort)
    • two parts are ranked relative to each other:
      • neither part is sorted (split sort - quick sort)
      • one part is sorted (select sort - bubble sort, linear select sort, heap sort)
    merge sort and split sort strategies suggest recursion, while insert and select sorts suggest iteration
      other ways to sort
    • count sort - compare every pair of elements, and arrange each element based on the number of elements it is less than (Θ(n²) comparisons)
    • swap sort - repeatedly swap any two elements that are out of order relative to each other
    count and swap sorts least efficient;
    insert and select sorts moderately efficient;
    merge and split sorts most efficient

    advantage of insert sort over select sort: it sorts online (each new element can be inserted as it arrives)
    advantage of select sort over insert sort: after i passes the i largest elements are already in place, even before the sort ends

  2. Swap Sorts
    	BubbleSort (List, lower, upper)
    	{ Sort List[lower..upper] in increasing order.
    	  upper >= lower > 0 }
    	
    	for unsorted from upper downto lower + 1
    		for index from lower to unsorted - 1
    			if List[index] > List[index+1]
    				List[index] <-> List[index+1]
    	

    The ith scan costs up to n-i comparisons and there are n-1 scans, so BubbleSort can use up to n(n-1)/2 = O(n²) comparisons and up to the same number of swaps

    BubbleSort is good only when the input is nearly sorted to begin with
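
    For concreteness, a minimal runnable sketch of the same scan in Python (a name like bubble_sort is my own, and the swapped flag is an early-exit addition not in the pseudocode above; it is what makes nearly sorted inputs cheap):

      def bubble_sort(lst):
          """Sort lst in place in increasing order by repeated adjacent swaps."""
          n = len(lst)
          for unsorted in range(n - 1, 0, -1):     # shrink the unsorted prefix
              swapped = False
              for i in range(unsorted):            # bubble the largest element up
                  if lst[i] > lst[i + 1]:
                      lst[i], lst[i + 1] = lst[i + 1], lst[i]
                      swapped = True
              if not swapped:                      # no swaps: already sorted, stop early
                  break
          return lst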

    • The Average Cost
      inversion - the number of smaller elements initially to the right of an element (the sum of all inversions measures how much work BubbleSort does: each swap removes exactly one inversion)

      The worst cost occurs when the input is in decreasing order, and the best cost occurs when the input is in increasing order.
      The number of inversions in the worst case is C(n,2) = n(n-1)/2
      The number of inversions in the best case is 0
      Therefore the average number of inversions is n(n-1)/4, which is O(n²)
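
      A brute-force inversion counter (my own illustration, not from the text) makes the measure concrete:

        def count_inversions(lst):
            """Number of pairs (i, j) with i < j and lst[i] > lst[j]."""
            n = len(lst)
            return sum(1 for i in range(n)
                         for j in range(i + 1, n) if lst[i] > lst[j])

        # count_inversions([3, 2, 1]) == 3   worst case for n = 3: C(3,2) = 3
        # count_inversions([1, 2, 3]) == 0   best case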

  3. Insert Sorts
    • Linear Insert Sort
      	LinearInsertSort (List, lower, upper)
      	{ Sort List[lower..upper] in increasing order.
      	  upper >= lower > 0 }
      	
      	for sorted from lower + 1 to upper
      		placeholder = List[sorted]
      		index = sorted - 1
      		while index >= lower and List[index] > placeholder
      				List[index+1] =  List[index]
      				index--
      		List[index+1] = placeholder
      	

      The ith scan costs up to i-1 comparisons and swaps and there are n-1 scans so LinearInsertSort can use up to n(n-1)/2 = O(n²) comparisons and swaps

      If all n! inputs are equally likely, LinearInsertSort is also O(n²) on average; the number of swaps equals the number of comparisons in both the average and worst cases
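
      A runnable Python version of LinearInsertSort, using 0-based indices (a sketch; linear_insert_sort is my own name):

        def linear_insert_sort(lst):
            """Sort lst in place in increasing order by repeated insertion."""
            for sorted_end in range(1, len(lst)):
                placeholder = lst[sorted_end]
                i = sorted_end - 1
                while i >= 0 and lst[i] > placeholder:   # shift larger elements right
                    lst[i + 1] = lst[i]
                    i -= 1
                lst[i + 1] = placeholder                 # drop the element into its slot
            return lst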

    • Binary Insert Sort (use binary search instead of linear search)
      maximum number of comparisons = n ceil[lg n] - 2^ceil[lg n] + 1 = O(n lg n)
      still requires O(n²) swaps in the worst case
      to reduce the O(n²) swaps to O(n lg n) requires more storage and more overhead per insertion (tree sort)
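
      Python's bisect module shows the comparison/swap split directly: the binary search costs O(lg n) comparisons, but each insertion still moves O(n) elements (a sketch under that assumption, not the textbook's code):

        import bisect

        def binary_insert_sort(lst):
            """Return a new list sorted in increasing order."""
            result = []
            for x in lst:
                bisect.insort(result, x)   # O(lg n) comparisons, O(n) data movement
            return result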
  4. Select Sorts
    • Linear Select Sort
      	LinearSelectSort (List, lower, upper)
      	{ Sort List[lower..upper] in increasing order.
      	  upper >= lower > 0 }
      	
      	for unsorted from upper downto lower + 1
      		nextmax = unsorted
      		for index from lower to unsorted - 1
      			if List[index] > List[nextmax]
      				nextmax = index
      		List[unsorted] <-> List[nextmax]
      	

      The ith scan costs up to n-i comparisons and there are n-1 scans so LinearSelectSort can use up to n(n-1)/2 = O(n²) comparisons, but only O(n) swaps
      The average numbers of comparisons and swaps are the same as in the worst case

      Both LinearInsertSort and LinearSelectSort are easy to code
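
      For example, LinearSelectSort is a few lines of Python (0-based; note that the single swap per scan happens after the inner loop):

        def linear_select_sort(lst):
            """Sort lst in place by repeatedly moving the largest unsorted element to the end."""
            for unsorted in range(len(lst) - 1, 0, -1):
                nextmax = unsorted
                for i in range(unsorted):
                    if lst[i] > lst[nextmax]:
                        nextmax = i
                lst[unsorted], lst[nextmax] = lst[nextmax], lst[unsorted]   # one swap per scan
            return lst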

    • Heap Sort
      A heap is a rooted binary tree such that the value of any node is at least as large as the values of its children
      The root is the largest element, replace it with one of the leaves and "reheap"
      Two problems:
      • create a heap efficiently
      • fix the heap efficiently("reheap")
      Given a heap of height m, each removal costs up to 2m comparisons to reheap, so removing all n elements costs a total of about 2nm comparisons.
      A height m binary tree has at most 2^(m+1) - 1 nodes, so the shortest possible n-node binary tree has height at least floor[lg n].
      A complete binary tree is an ordered, rooted binary tree in which every non-leaf has two children and with all its leaves on one level.
      A left-complete binary tree is a complete binary tree with zero or more of its rightmost leaves deleted.
      An n-node left-complete binary tree has height floor[lg n] and no n-node binary tree is shorter.

      Two ways to build a heap:

      • incrementally add leaves to an initially empty heap = O(n lg n)
      • recursively create two subheaps and add a root
        f(n) <= 2 f(n/2) + 2 lg n, which solves to f(n) <= 4n - 2 lg n - 4 = O(n)

        Build a heap as a left-complete binary tree (no more than 2n - 4 comparisons)
        Repeatedly remove the root and replace it with any leaf and "reheap" until the tree is empty ( O(n lg n) comparisons)

    • Programming (represent the left-complete binary tree implicitly in an array using level order)
      leftchild(i) = 2i - lower + 1
      rightchild(i) = 2i - lower + 2
      parent(i) = floor[(i + lower - 1)/2]
      before "reheap", swap root with the rightmost leaf

      	HeapSort (List, lower, upper)
      	{ Sort List[lower..upper] in increasing order.
      	  First build a heap, then repeatedly replace the root with the
      	  next rightmost leaf using "trickle-down"
      	  upper >= lower > 0 }
      	
      	smallestparent = floor[(upper + lower - 1) / 2]
      	for index from smallestparent downto lower 
      		FixHeap (List, index, upper)
      		
      	for index from upper downto lower+1
      		List[lower] <-> List[index]
      		FixHeap (List, lower, index-1)
      	

      	FixHeap(List, low, high)
      	{ Create a subheap within List[lower..upper] in positions low..high
      	  List[low..high] is already a heap except that the root node, List[low],
      	  may be smaller than its children. lower and upper are global variables;
      	  they are used in the functions leftchild and rightchild.
      	  upper >= high >= low >= lower > 0 }
      
      	root = low
      	if high >= leftchild(root)
      		if high >= rightchild(root) and List[rightchild(root)] > List[leftchild(root)]
      			largestchild = rightchild(root)
      		else
      			largestchild = leftchild(root)
      		if List[largestchild] > List[root]
      			List[largestchild] <-> List[root]
      			FixHeap(List, largestchild, high)
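
      A runnable Python sketch of the same plan, but with 0-based indices, so leftchild(i) = 2i + 1 and rightchild(i) = 2i + 2 rather than the lower-based formulas above (fix_heap here loops instead of recursing; otherwise it mirrors FixHeap):

        def fix_heap(lst, low, high):
            """Trickle lst[low] down until lst[low..high] is a heap again."""
            root = low
            while True:
                left, right = 2 * root + 1, 2 * root + 2
                largest = root
                if left <= high and lst[left] > lst[largest]:
                    largest = left
                if right <= high and lst[right] > lst[largest]:
                    largest = right
                if largest == root:
                    return
                lst[root], lst[largest] = lst[largest], lst[root]
                root = largest

        def heap_sort(lst):
            """Sort lst in place: build a heap, then repeatedly remove the root."""
            n = len(lst)
            for i in range(n // 2 - 1, -1, -1):      # build the heap bottom-up, O(n)
                fix_heap(lst, i, n - 1)
            for end in range(n - 1, 0, -1):          # move the root (max) into place
                lst[0], lst[end] = lst[end], lst[0]
                fix_heap(lst, 0, end - 1)
            return lst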
      	
    • A Theoretical Improvement
      Use a binary search to find the position to insert a new element
  5. Merge Sorts
    	LinearMergeSort (List, lower, upper)
    	{ Sort List[lower..upper] in increasing order.
    	  upper >= lower > 0 }
    	
    	if upper > lower
    		mid = floor[(lower+upper)/2]
    		LinearMergeSort(List, lower, mid)
    		LinearMergeSort(List, mid+1, upper)
    		LinearMerge(List, lower, mid, upper)
    		
    	LinearMerge (List, lower, mid, upper)
    	{ Merge List[lower..mid] and List[mid+1..upper],
    	  where each is sorted in increasing order.
    	  Use the array Save[lower..upper] as extra storage.
    	  upper >= mid >= lower > 0 }
    	  
    	next = lower
    	lower1 = lower
    	lower2 = mid + 1
    	while (mid >= lower1) and (upper >= lower2)
    		if List[lower1] > List[lower2]
    			Save[next] = List[lower2]
    			lower2++
    		else
    			Save[next] = List[lower1]
    			lower1++
    		next++
    	if mid >= lower1
    		Save[next..upper] = List[lower1..mid]
    	else
    		Save[next..upper] = List[lower2..upper]
    	List[lower..upper] = Save[lower..upper]
    	

    f(n) = n ceil[lg n] - 2^ceil[lg n] + 1 = O(n lg n)
    requires twice as much storage
    copies both sublists into Save (and back into List) on every merge
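
    The same algorithm as a short runnable Python sketch (merge_sort is my own name; the Save array shows up as the merged list that is copied back):

      def merge_sort(lst, lower=0, upper=None):
          """Sort lst[lower..upper] in place in increasing order."""
          if upper is None:
              upper = len(lst) - 1
          if upper > lower:
              mid = (lower + upper) // 2
              merge_sort(lst, lower, mid)
              merge_sort(lst, mid + 1, upper)
              save = []                            # the extra O(n) storage
              i, j = lower, mid + 1
              while i <= mid and j <= upper:
                  if lst[i] > lst[j]:
                      save.append(lst[j])
                      j += 1
                  else:
                      save.append(lst[i])
                      i += 1
              save.extend(lst[i:mid + 1])          # one of these two slices is empty
              save.extend(lst[j:upper + 1])
              lst[lower:upper + 1] = save          # copy the merged run back
          return lst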

  6. Split Sorts
    • QuickSort
      	QuickSort (List, lower, upper)
      	{ Sort List[lower..upper] in increasing order.
      	  upper >= lower > 0 }
      
      	if upper > lower
      		index = uniform(lower, upper)
      		pivotloc = Split(List, lower, upper, index)
      		QuickSort(List, lower, pivotloc-1)
      		QuickSort(List, pivotloc+1, upper)	
      	

      worst case cost = O(n²) comparisons
      Split does n-1 comparisons per call, and QuickSort is O(n lg n) comparisons on average

    • Programming (see the sketch below)
      • Use LinearSelectSort or LinearInsertSort for "short" sublists to minimize the overhead due to recursion
      • extra space is used for the "stack frames" created by the recursion
      • How do we select the pivot index?
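
      A hedged Python sketch combining these points: a random pivot (one common answer to the last question) plus a cutoff to insertion sort for short sublists. The cutoff value 16 is an arbitrary illustration, not a figure from the text:

        import random

        def quick_sort(lst, lower=0, upper=None, cutoff=16):
            """Sort lst[lower..upper] in place in increasing order."""
            if upper is None:
                upper = len(lst) - 1
            if upper - lower + 1 <= cutoff:              # short sublist: insertion sort
                for k in range(lower + 1, upper + 1):
                    placeholder, i = lst[k], k - 1
                    while i >= lower and lst[i] > placeholder:
                        lst[i + 1] = lst[i]
                        i -= 1
                    lst[i + 1] = placeholder
                return lst
            pivot_index = random.randint(lower, upper)   # random pivot selection
            lst[lower], lst[pivot_index] = lst[pivot_index], lst[lower]
            pivot, pivotloc = lst[lower], lower
            for i in range(lower + 1, upper + 1):        # Split: n-1 comparisons
                if lst[i] < pivot:
                    pivotloc += 1
                    lst[i], lst[pivotloc] = lst[pivotloc], lst[i]
            lst[lower], lst[pivotloc] = lst[pivotloc], lst[lower]
            quick_sort(lst, lower, pivotloc - 1, cutoff)
            quick_sort(lst, pivotloc + 1, upper, cutoff)
            return lst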
  7. Lower Bounds on Sorting
    • finding the largest & smallest of n elements = building a poset, starting from the poset with no relations (requires ceil[3n/2] - 2 comparisons, which can be thought of as a weak lower bound on sorting). What is the optimal cost for sorting?
    • Information Theory
      • information lower bound - if an algorithm can use only binary decisions to distinguish between n possibilities, then it must use at least lg n such decisions on average.
      • Given an experiment with n disjoint events having probabilities p1, p2, ... pn, the entropy of the experiment is
        H(p1, p2, ..., pn) = - SUM over k of (pk lg pk)
      • entropy of a sum = sum of the entropies
      • If a list of n elements is unsorted, searching it needs at least n comparisons in the worst case; information theory gives a bound of at least ceil[lg(n+1)] binary decisions (n possible positions plus "not found").
    • Worst Cost
      • from information theory, ceil[lg n!] is a lower bound on the worst-case number of comparisons necessary to sort n things. But lg n! = Θ(n lg n) [see page 264 for proof]
      • In the worst case, sorting requires Ω(n lg n) comparisons [see pages 265-267 for proof]
    • Average Cost - since all permutations are equally likely, sorting costs Ω(n lg n) comparisons on average (a quick numeric check of the ceil[lg n!] bound follows below)
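
      A quick numeric check of the ceil[lg n!] bound against n lg n (my own illustration):

        import math

        for n in (4, 8, 16, 64):
            bound = math.ceil(math.log2(math.factorial(n)))   # ceil[lg n!]
            print(n, bound, round(n * math.log2(n)))          # bound vs. n lg n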
  8. Optimal Sorting
    What is the minimum number of comparisons needed to sort?
    binary search is optimal for searching, but binary insert sort is only asymptotically optimal for sorting.
    • Binary Merging
      Merge(List1, List2, n, m)
      	{Merge List1[1..n] and List2[1..m], where each is sorted in increasing order
      	 Return the merged array List[1..(n+m)].
      	 Use next to keep track of the next free location in List[1..(n+m)]
      	 n >= m > 0 }
      	 
      	 next = n + m
      	 while n > 0 and m > 0 
      	 	if n > m then
      			next = BinaryMerge(List1, List2, n, m, List, next)
      		else
      			next = BinaryMerge(List2, List1, m , n, List, next)
      			
      	if n = 0 then
      		List[(n+1)..(n+m)] = List2[1..m]
      	else
      		List[(m+1)..(n+m)] = List1[1..n]
      	return List
      	
      BinaryMerge(Lista, Listb, n, m, List, next)
      	{Merge Lista[1..n] and Listb[1..m], where each is sorted in increasing order
      	 List[next] is the next free location counting from the end of List.
      	 n >= m > 0 }
      	 
      	 k = 2^floor[lg(n/m)] - 1
      	 if Lista[n-k] >= Listb[m] then
      	 	List[(next-k)..next] = Lista[(n-k)..n]
      		n = n-k-1
      		next = next-k-1
      	 else
      	 	l = BinarySearch(Lista, n-k, n, Listb[m])
      		List[(next-n+l+1)..next] = Lista[(l+1)..n]
      		next = next-n+l
      		List[next] = Listb[m]
      		n = l
      		m--
      		next--
      	return next 
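
      Not the BinaryMerge routine above, but a simplified Python illustration of the same idea: when m is much smaller than n, placing each element of the short list by binary search costs roughly m lg n comparisons instead of the n + m - 1 of a linear merge (binary_merge_simple is my own name):

        import bisect

        def binary_merge_simple(long_list, short_list):
            """Merge two sorted lists, binary-searching a position for each short-list element."""
            result = list(long_list)
            for x in short_list:
                position = bisect.bisect_right(result, x)   # O(lg n) comparisons
                result.insert(position, x)                  # data movement, not comparisons
            return result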
      
  9. Changing the Model
    • Distribute Sorts
      RadixSort(List, lower, upper, digits)
      	{ Sort the integers List[lower..upper] in increasing order.
      	  Each integer is a decimal number in the range 0..(10^digits - 1)
      	  Use lists Sublist[d] (9 >= d >= 0) as temporary storage.
      	  upper >= lower > 0; digits > 0. }
      	  
      	  for i = digits downto 1
      	  	for d = 0 to 9
      	  		make Sublist[d] empty
      	  	for next = lower to upper
      	  		k = ith digit of List[next]
      	  		add List[next] to Sublist[k]
      	  	make List empty
      	  	for d = 0 to 9
      	  		add Sublist[d] to List
      

      O((n+m)k) where k = number of digits and the numbers lie in the range 0..m
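
      A runnable Python sketch of RadixSort for non-negative decimal integers, distributing by the least significant digit first with ten buckets per pass (radix_sort is my own name):

        def radix_sort(lst, digits):
            """Return the integers of lst (each with at most `digits` decimal digits) in increasing order."""
            for d in range(digits):                        # least significant digit first
                sublists = [[] for _ in range(10)]         # one bucket per digit value 0..9
                for x in lst:
                    k = (x // 10 ** d) % 10                # the d-th digit of x, counting from the right
                    sublists[k].append(x)
                lst = [x for bucket in sublists for x in bucket]   # collect buckets in order
            return lst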
  10. Coda - Sorting Out Sorting
