Counting Inversions

Why would we want to count the number of inversions? One application may be to see how close two different people’s rankings of the same products are. If one person ranks a set of movies, then we can use his ranking as the standard, and count the number of inversions in the second person’s rankings with respect to the first person’s. This way we can find which person out of a whole pool has very similar tastes to the “golden” person.

I have no idea if this is really a robust way of doing it, though.

Counting the Number of Inversions

The brute force approach for counting the number of inversions is \(\theta(N^{2})\). We can do better by piggy-backing on merge sort: Let’s say you have left and right sublists. The total number of inversions is the number within the left sublist, the number within the right sublist, and the number between the two sublists (i.e. the elements in the right sublist that are greater than the ones in the left sublist).

To calculate this inter-list number, we utilize the fact that each sublist is already sorted. Hence, if a number on the right sublist is smaller than an entry on the left, then it is smaller than all the other “future” entries in the left sublist (utilizing sortedness here). So counting during the merge operation is \(\Theta(N)\).

The down side to this approach is that you are modifying the list.

Implementation

Python

def simple_compare(left, right):
    """
    Keyword Arguments:
    left  -- 
    right -- 
    """
    if left < right:
	return -1
    elif left > right:
	return 1
    else:
	return 0
    
def merge(lsta, lstb, comparator):
    """
    Merge two sorted lists together and return the merged list. This may be
    inefficient as I take a lot of slices and produce temporary lists in several
    places.

    Also returns the number of inversions between the right and the left
    sublists.
    
    Keyword Arguments:
    lsta -- Left list
    lstb -- Right list
    comparator -- Comparator function. It should take in two arguments, and
                  return -1 if the left is smaller, 1 if the right is smaller,
		  and 0 if they are equal. If none is provided, then it will
		  assume a numerical sort.
    """
    new_list = []
    index_a = 0
    index_b = 0
    completed = False
    inversions = 0
    len_a = len(lsta)
    while not completed:
	if comparator(lsta[index_a], lstb[index_b]) > 0:
	    start = index_b
	    while comparator(lsta[index_a], lstb[index_b]) > 0:
		index_b += 1
		inversions += len_a - index_a
		if index_b == len(lstb):
		    completed = True
		    break
	    new_list.extend(lstb[start:index_b])
	    if completed:
		new_list.extend(lsta[index_a:])
	else:
	    start = index_a
	    while comparator(lstb[index_b], lsta[index_a]) > 0:
		index_a += 1
		if index_a == len(lsta):
		    completed = True
		    break
	    new_list.extend(lsta[start:index_a])
	    if completed:
		new_list.extend(lstb[index_b:])
    return new_list, inversions

def count_inversions_int(lst, merge_method=merge, comparator=simple_compare):
    """
    Count the inversions in the list using a mergesort. So this *will* modify
    the list.

    Keyword Arguments:
    lst        -- List to sort
    merge_method     -- Function for merging. 
    comparator -- Comparator function. It should take in two arguments, and
                  return -1 if the left is smaller, 1 if the right is smaller,
		  and 0 if they are equal. If none is provided, then it will
		  assume a numerical sort.
    """
    if len(lst) == 1:
	return lst, 0
    split = len(lst)/2
    left_lst, linversions = count_inversions_int(lst[:split],
						 merge_method=merge_method,
						 comparator=comparator)
    right_lst, rinversions = count_inversions_int(lst[split:],
						  merge_method=merge_method,
						  comparator=comparator)
    new_list, minversions = merge_method(left_lst, right_lst,
					 comparator=comparator)
    return new_list, linversions + rinversions + minversions

def count_inversions(lst):
    """
    Count the inversions in the list using a mergesort. So this *will* modify
    the list.

    Keyword Arguments:
    lst        -- List to sort
    """
    return count_inversions_int(lst)[1]