The 10 test cases for MEDIAN have been designed to detect performance
differences as exhibited by 16 different algorithms (also see below):
OPE = Onion Peeling Eliminiation
LISF = Linear Insertion Sort Using Full List
LISH = Linear Insertion Sort Using Half List
LISZ = Linear Insertion Sort Using Zoom List
BISF = Binary Insertion Sort Using Full List
BISH = Binary Insertion Sort Using Half List
BISZ = Binary Insertion Sort Using Zoom List
TISF = Ternary Insertion Sort Using Full List
TISH = Ternary Insertion Sort Using Half List
TISZ = Ternary Insertion Sort Using Zoom List
TPFS = Ternary Partioning Find Using Straddled Pivots
TPFF = Ternary Partioning Find Using First Pivots
TPFP = Ternary Partioning Find Using Proportional Pivots
TPFR = Ternary Partioning Find Using Random Pivots
SLSB = Sorted List of Sorted Buckets
HTSB = Heap-like Tree of Sorted Buckets
The next table shows how many calls each algorithm made for each test case
solved within the bound of 7777 calls. The rightmost column shows the score.
Case # | 1 2 3 4 5 6 7 8 9 10 |
N | 5 177 577 975 1087 1267 1357 1415 1415 1499 |
Cat | M R N R R R R R A R | Pts
Alg
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
OPE | 4 7744 | 20
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
LISF | 4 4062 619 | 30
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
LISH | 3 2590 598 | 30
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
LISZ | 3 2160 598 | 30
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
BISF | 4 861 4175 7051 | 40
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
BISH | 5 843 4108 6803 | 40
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
BISZ | 4 730 3621 6269 7078 | 50
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
TISF | 3 712 2918 5415 6143 7376 | 60
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
TISH | 3 669 2707 5349 6011 7103 7642 | 70
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
TISZ | 3 609 2537 4889 5540 6641 7191 7511 7572 | 90
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
TPFS | 3 517 1525 2842 3257 3531 2231 3218 | 80
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
TPFF | 4 395 2205 2378 3635 3601 2663 2493 | 80
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
TPFP | 5 331 848 3512 1705 2291 3093 2860 2863 2985 | 100
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
TPFR | 4 372 1778 2201 2507 2981 3377 3987 3279 3540 | 100
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
SLSB | 4 491 1954 3242 3605 4258 4578 4824 4149 5147 | 100
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
HTSB | 4 508 2218 3184 3902 4517 4862 5074 4389 5354 | 100
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
The 10 test cases belong to 4 categories:
M = Manually designed
R = Randomly generated
N = Nearly sorted
A = Alternating outside-to-inside (1 3 5 ... 6 4 2)
Here is a similar table showing the number of calls for cases where
the algorithm FAILS (does not stay within the bound). When the number
of calls exceeds 9999, only an approximate value in "scientific
notation" is given, where >XeY means that the number of calls
exceeds X*10^Y, but does not exceed (X+1)*10^Y. One extra column
has been added on the right. It indicates whether for N = 1499 and
under worst-case conditions (W), the algorithm stays within the bound
of 7777 (shown as <=) or not (shown as >). The library, however, is
not able to create such worst-case conditions dynamically.
Case # | 1 2 3 4 5 6 7 8 9 10 |
N | 5 177 577 975 1087 1267 1357 1415 1415 1499 |1499
Cat | M R N R R R R R A R | W
Alg
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
OPE | >8e4 >2e5 >2e5 >4e5 >4e5 >4e5 >4e5 >5e5 | >
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
LISF | >1e5 >1e5 >1e5 >2e5 >2e5 >2e5 >2e5 | >
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
LISH | >7e4 >8e4 >1e5 >1e5 >1e5 >1e5 >1e5 | >
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
LISZ | >5e4 >7e4 >1e5 >1e5 >1e5 >6e4 >1e5 | >
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
BISF | 7791 9532 >1e4 >1e4 >1e4 >1e4 | >
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
BISH | 7811 9414 >1e4 >1e4 >1e4 >1e4 | >
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
BISZ | 8537 9299 9803 >1e4 >1e4 | >
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
TISF | 7981 8386 8339 8993 | >
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
TISH | 7980 7946 8519 | >
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
TISZ | 8032 | >
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
TPFS | >7e4 >1e4 | >
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
TPFF | >5e4 >1e4 | >
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
TPFP | | >
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
TPFR | | >
==== + ==== ==== ==== ==== ==== ==== ==== ==== ==== ==== + ===
SLSB | | <=
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
HTSB | | <=
---- + ---- ---- ---- ---- ---- ---- ---- ---- ---- ---- + ---
Information about the algorithms
OPE: Repeatedly eliminate the two extremes (min and max strength).
This takes (N-1)^2 / 4 calls.
All Insertion Sort methods: Maintain a sorted list of objects
investigated so far (sorted modulo up or down) and repeatedly
insert a next object. The location to insert can be found
by linear, binary, or ternary search. Linear insertion is
quadratic in both worst and average cases, and linear in best cases.
Binary and ternary insertion have N*log N complexity (ternary
has smaller constant factor).
Instead of maintaining the Full list (LISF, BISF, TISF) containing
all the objects in the end, it is enough to limit the list to
contain no more than half the number of objects (Half List: LISH,
BISH, TISH). Reason: after having considered (N+1)/2 objects,
the element at the end of the list cannot be the median,
because more than (N-1)/2 objects are stronger/weaker than this
object.
In fact, both extremes in the sorted list can be eliminated
once (N+1)/2 objects have investigated (Zoom List: LISZ, BISZ,
TISZ). That way, the list increases in length during the
first half, and descreases in length during the second half,
until only one candidate remains (which then must be the median);
it zooms out and then in on the median.
All Partitioning Find methods: Compare to median selection by
partitioning (as in QuickSort, discarding the segment that is known
not to contain the median). Only partitioning into three
parts (based on choosing two pivot objects) have been considered.
In general these methods are quadratic in worst case, but linear
in average and best case. There are various ways to choose the
pivots: one at each end (Straddled: TPFS), both at one end (First:
TPFF), at one third and two thirds in the list (Proportional: TPFP),
and Random (TPFR). For TPFS and TPFF, the sorted input is bad,
but for TPFP and TPFR it is (very) good. TPFR has no specific
worst case inputs. Worst case input for TPFP depends on details
of rounding when choosing the proportional pivots.
SLSB: Maintains buckets of at most K objects (for some K; K=8 is a
good choice). Each bucket is sorted (with respect to the order
of two reference objects), and the list of buckets is sorted on
the minimum of the buckets (w.r.t. the same reference objects).
Compared to insertion sort into a list of single objects (Full,
Half, or Zoom) this saves calls (over 2000 in the worst case
of the task), because only a partial order instead of a total
order is constructed. You can calculate the number of calls
in the worst case for N=1499, and it is just below 7777. Average
case behavior is better than worst case.
HTSB: This takes the idea of SLSB one step further by maintaining
the buckets in a heap-like leaftree. The data structure is
more complicated, and this method is not needed to obtain
a perfect score. It shows that more advanced data structures
can do even better, not only on average but also in the worst
case. Note that the advantage is not visible for "small" N
(such as 1499) on random cases.