To Previous Chapter To Table of Contents To Bottom of Page To Next Chapter

CS421 - Advanced Data Structures and Algorithm Development - Chapter 3: Selecting

  1. Rankings
    • asymmetry - if a precedes b then b cannot precede a
    • transitivity - if a precedes b and precedes c then a precedes c
    • partial order - transitive asymmetric relation
    • PAUSE: Is "sister of" a partial order? "cousin of"?
    • relation on a set is a set of ordered pairs of elements of the set
    • If every pair of elements in a set S is relatable by the p.o. relation R, then R is a linear order and S is orderable
    • PAUSE: Is "funnier than" a linear order
    • poset - partially ordered set
    • One poset is a subposet of another if all its relations are contained within the other
    • The poset containing n unrelated elements, called singletons, is a subposet of every poset with n elements
    • voter's paradox (from game theory - decision making in uncertain, risky, or competitive situations
  2. Finding the Best
    FindMax (List, upper, lower)
    	{ Find the index of the largest of List[lower, upper];
    	  upper >= lower > 0	}
    	max = lower
    	for index = lower+1 to upper
    		if List[index] > List[max] then
    			max = index
    	return max
    

    n-1 comparisons
    max poset - any poset solving the problem of finding the best
    must connect every node to every other node; connecting n nodes requires at least n - 1 connections ==> FindMax has optimal worst cost of n-1
    • The Average Cost - (n-1)/2????
      consider the n! equally likely permutations of the n elements
      probability that L[i] is better than all previous elements is 1/i
      The average number of assignments is 1 + sum(P(L[i] > L[j] for all j < i) x 1 = 1 + sum(1/i) {i from 2 to n} [nth harmonic number
      H2k = 1 + 1/2 + 1/3 + 1/4 + ... + 1/2k
      = 1 + 1/2 + 2/4 + 4/8 + ... + 2k-1/2k
      = 1 + k/2
      BUT
      H2k = 1 + 1/2 + 1/3 + 1/4 + ... + 1/2k
      = 1 + 2/2 + 4/4 + 8/8 + ... + 2k-1/2k + 1/2k
      = k + 1/2k
      k + 1/2k >= H2k >= k/2 + 1
      Since 2ceil(lg n) >= n >= 2floor(lg n)
      ceil(lg n) + 1/2ceil(lg n) >= Hn >= floor(lg n) /2 + 1
      Hence Hn = Q(lg n)
    • Statistics - know the average number of assignments, but not the likely variation
      • statistics - area of mathematics that analyzes, summarizes, and interprets numerical data
      • variance - average of the square of the differences of the numbers of assignments from the average
      • standard deviation - square root of the variance
      • random variable - function associated with an experiment that can take on numerical values of some property of the experiment
      • average of a random variable X = m and the standard deviation is s
      • 75% of a distribution lies within two standard deviations of the distribution's average (can be shown using Chebyshev's inequality)
      • With the assumption that each permutation of L is equally likely, the average # of assignments FindMax does is approx. ln n with a std. dev. of sqrt(ln n).
  3. Finding the Second Best
    2n-3 algorithm (find best, discard and find second best)
    Is this optimal?
    What if we break the problem into two halves and use FindMax on both halves , then find the second best (ceil[3n/2]-2)

    FindSecondBest(List, lower, upper)
    	{Find the index of the second largest of List[lower..upper]
    	 Break the list into 2 and find the largest of each half, then
    	 find the largest of the candidates for second largest.}
    	 
    	 mid = (lower+upper)/2
    	 index1 = FindMax(List, lower, mid)
    	 index2 = FindMax(List, mid+1, upper)
    	 if List[index1] > List[index2] then
    	 	List[index1] = List[index2]
    		return FindMax(List, lower, mid)
    	else
    		List[index2] = List[index1]
    		return FindMax(List, mid+1, upper)
    

    Is ceil[3n/2]-1 optimal? ---- break the first part into quarters: ceil[5n/4]-1...

    A height m binomial tree is

    • a single node if m = 0
    • two height m-1 binomial trees connected at their roots for m > 0
    A height m binomial tree has 2m nodes, so an n-node binomial tree has height lg n
    there are (lg nl) level l nodes
    (ij) = (i-1j) + (i-1j-1)
    By building a binomial tree, we can find the second best of n things using at most n + ceil[lg n] - 2 comparisons

    BuildBinomialTree(List, lower, upper)
    	{Build a binomial tree on List[lower..upper], upper - lower + 1 is
    	 a power of two. upper >= lower > 0 }
    	
    	if upper > lower then
    		mid = (lower + upper) / 2
    		BuildBinomialTree(List, lower, mid)
    		BuildBinomialTree(List, mid+1, upper)
    		if List[mid] > List[upper] then
    			List[lower..mid] >==< List[mid+1..upper]
    

    BuildBinomialTree has worst number of comparisons = 2lg n - 1 = n-1
    Could build binomial tree using pointers but requires more space (two pointers for each node)

    FindSecondMax(List, lower, upper)
    	{Find the index of the second largest of List[lower..upper].
    	 Rearrange List to form a binomial tree, then find the largest of the
    	 candidates. upper-lower+1 is power of 2. upper > lower > 0 }
    	 
    	 BuildBinomialTree(List, lower, upper)
    	 
    	 2max = upper - 1
    	 for i = 1 to lg(upper-lower+1) - 1
    	 	if List[upper-2i] > List[2max] then
    			2max = upper -2i
    	return 2max
    

    FindSecondMax does up to (n lg n) / 2 swaps
    first phase does n - 1 comparisons
    second phase does lg n - 1 comparisons
    Is n + lg n - 2 optimal?

    • Adversaries: fiends who try to make all algorithms solving a problem work as hard as possible (selects inputs that make solving the problem the hardest) -- Hangman example
    • The Lower Bound
      PAUSE: To find the third best, do we need to know the best?
      second best algorithms: must be n-1 losers to best, and k-1 of thise must lose twice, so there must be at least n+k-2 comparisons
      let Alice's strength=a and Bob's strength=b, winner will have strength=a+b
      if a>=b then 2a >= a + b, so in the worst case the strength doubles and each original strength starts at 1
      strength must equal or exceed the number of contestants n, where 2k >= n > 2k-1 where k is the number of "fights" and k < ceil[lg n]
      therefore, finding the second best requires at least n + ceil[lg n] - 2 comparisons in the worst case and the FindSecondMax has the optimal worst cost
  4. Finding the Best and the Worst
    Use FindMax twice with different orders: 2n-3 comparisons
    FindMaxMin(List, lower, upper)
    	{Find the indicies of the largest and smallest of List[lower..upper]
    	 upper >= lower > 0 }
    	
    	case upper - lower + 1
    		= 1 : return lower, lower
    		= 2 : if List[lower] > List[upper] then
    				return lower, upper
    			  else
    			  	return upper, lower
    		> 2 : 
    				mid = (lower+upper)/2
    				max1, min1 = FindMaxMin(List, lower, mid)
    				max2, min2 = FindMaxMin(List, mid+1, upper)
    				if List[max1] > List[max2] then
    					max = max1
    				else
    					max = max2
    				if List[min1] > List[min2] then
    					min = min2
    				else
    					min = min1
    				return max, min	
    

    f(n) = 0 if n=1; 1 if n=2; f(floor[n/2])+f(ceil[n/2])+2 if n > 0
    f(n) = 3n/2 - 2 + (n-2floor[lg n])/2 if n <= 3x2floor[lg n]-1;
    = 3n/2 -2 + (2floor[lg n]+1 - n)/2 otherwise
    PAUSE: Check that f(n) = 3n/2 -2 when n = 2i or n = 2i+/- 1
    PAUSE: Since the algorithm is good when n = 2i, what should we do if n = 2i + 2i? What does that decomposition suggest?
    Split six things into two parts (3x3 vs. 4x2)
    • Designing Algorithms Using Recurrence
      f(n) = 0 if n=1; 1 if n=2; min(1<=k<=n-1) {f(k) + f(n-k) + 2 if n > 2
      e.g. f(n) = 0 if n=1; 1 if n=2; f(2) + f(n-2) + 2 if n > 2
      f(n) = ceil[3n/2]-2

      FindMaxMin(List, lower, upper)
      	{Find the indicies of the largest and smallest of List[lower..upper]
      	 upper >= lower > 0 }
      	
      	case upper - lower + 1
      		= 1 : return lower, lower
      		= 2 : if List[lower] > List[upper] then
      				return lower, upper
      			  else
      			  	return upper, lower
      		> 2 : 
      				if List[lower] > List[lower+1] then
      					max1 = lower; min1 = lower+1
      				else
      					max1 = lower+1; min1 = lower
      				max2, min2 = FindMaxMin(List, lower+2, upper)
      				if List[max1] > List[max2] then
      					max = max1
      				else
      					max = max2
      				if List[min1] > List[min2] then
      					min = min2
      				else
      					min = min1
      				return max, min	
      
      

      f(n) = 0 if n=1; 1 if n=2; f(n-2) + 3 if n>2
    • The Lower Bound
      Show that ceil[3n/2]-2 is optimal by examining all possible states any algorithm solving the problem must go through. [state space lower bound]
        four kinds of elements
      • Novices: not been compared yet
      • Winners: won at least once, but not lost
      • Losers: lost at least one, but not yet won
      • Moderates: both won and lost at least once

      starting state: (n,0,0,0) ==> ending state: (0,1,1,n-2) [wnat to show that it requires (3n/2)-2 comparisons to go from start to end
      six types of useful comparisons (N:N, W:W, L:L, N:W, N:L, W:L) [Table 3.4]
      use an adversary to force every transition to be the least efficient: winners always win except against other winners and losers always lose except against another loser [therefore we ignore W:L comparisons] [Table 3.5]
      Compare novices with novices requires floor[n/2] comparisons; then compare W:W and L:L for n-2 comparisons; overall comparisons is floor[3n/2]-2 (optimal worst cost)
  5. Finding the ith Best

    Split(List, lower, upper, pivot-loc)
    	{Split List[lower..upper] into two parts, those less than List[pivot-loc]
    	 on the left and those greater than it on the right. Return the pivot's
    	 new position. upper >= pivot-loc >= lower > 0}
    	 
    	 pivot = List[pivot-loc]
    	 List[lower] <=> List[pivot-loc]
    	 pivot-loc = lower
    	 for index = lower+1 to upper
    	 	if pivot > List[index] then
    			pivot-loc++
    			List[index] <=> List[pivot-loc]
    	List[lower] <=> List[pivot-loc]
    	return pivot-loc
    
    • Finding the ith Best By Randomizing
      Find(List, lower, upper, i)
      	{Find the index of the ith largest of List[lower..upper]
      	 upper >= lower > 0; upper - lower + 1 >= i >= 1 }
      	 
      	 index = uniform(lower, upper)
      	 pivot-loc = Split(List, lower, upper, index)
      	 case upper - pivot-loc
      	 	< i-1: Find(List, lower, pivot-loc - 1, i-upper+ pivot-loc - 1)
      		= i-1: return pivot-loc
      		> i-1: Find(List, pivot-loc + 1, upper, i)
      
    • Finding the ith Best Without Randomizing
      Select(List, lower, upper, i)
      	{Find the index of the ith largest of List[lower..upper].
      	 Use the list Medians[1..ceil[n/5]] as extra storage.
      	 upper >= lower > 0; upper - lower + 1 >= i >= 1 }
      	 
      	 n = upper - lower + 1	 
      	 for j = 0 to ceil[n/5] - 1
      	 	Medians[j+1] = median of List[(lower+5j)..(lower+5j+4)]
      	 if n is not a multiple of 5 then
      		Medians[ceil[n/5]] = median of the remaining elements of List
      	 index = Select(Medians, 1, ceil[n/5], ceil[n/10])
      	 Make index the index of the corresponding element of List
      	 pivot-loc = Split(List, lower, upper, index)
      	 case upper - pivot-loc
      	 	< i-1: Select(List, lower, pivot-loc - 1, i - upper + pivot-loc -1_
      		= i-1: return pivot-loc
      		> i-1: Select(List, pivot-loc + 1, upper, i)
      	
      

  6. The Partition Problem
    What is the worst cost for find the i1 best elements, the i2 next best elements, ..., the ik worst elements?
    selection is half a serach problem (finding the ith) and half a structured problem (order the input to five the ith).
    searching is a partition problem where our input is a sorted list plus one more element and we have to produce a sorted list.
    sorting is a partition problem where each ij is one element
  7. Changing the Model
    Want to find the "approximate best":
    Let X = "approximate best" produced by algorithm and Y = "actual best":
    • X = Y
    • X is close to Y:
      rank(X) <= rank(Y) + "small" [approximation algorithm)
    • X is usually Y:
      P(X = Y) >= large [probablistic algorithm]
    • X is "usually" "close" to Y:
      P(rank(X) <= rank(Y) + "small") >= "large" [heuristic]
    last three algorithms are relaxed algorithms
  8. Coda - Artists and Artisans

To Previous Chapter To Table of Contents To top of page To Next Chapter