EECS 380 Homework 3
Winter Semester '01
Assigned Feb 15th 2001.
Due Monday, March 12th, 6pm, EECS 380 box in room 2420 EECS
You are to turn this in with all parts stapled together.
Please use two-sided printing to reduce the amount of paper in homeworks,
if you can (graders are having trouble carrying all homeworks!).
Make sure to mention the following on the first page:
your name, uniquename and the number/time of your discussion section.
Activities
- Racing STL's sort() versus the qsort() function from stdlibc
- Improving the worst-case time of the quicksort algorithm
- Incremental sorting
- Visualization of bubble sort and shaker sort
1. Racing STL's sort() versus qsort() from stdlibc
You need to compare the runtime of sort() from STL
to qsort() (provided in the C standard library) on
large arrays of large integers. You need to conclude
which one is faster on average and by how much
(a multiplicative factor).
A sample call to STL's sort is given in the STL's
documentation.
A sample call to qsort is given in Sedgewick on p. 303.
Additionally, all functions from the C standard library have
man pages (try man qsort
, a caveat in this man page
is discussed in the newsgourp).
You need to use the following comparison methodology:
- For a given N, populate an array with N randomly generated large
integers. For this use the srand() and rand() calls (or srandom()
and random()). Type
man rand
and man random
for details on those two (here's an optional
journal article
on rand() and random()). It is recommended that you print
your random numbers when testing the code to make sure they
"look random".
- You will need two copies of the array --- one for sort and the
other for qsort. Run sort and qsort once, and measure the cpu time
for each. Make sure that N is large enough to give reliable readings
without repeated runs. You must compile your code with the compiler
flag
-O3
- Change the random seed (otherwise, you will always generate the same
"random" numbers), repeat the experiment for the same N,
compute the average runtime over all experiments done so far
for the same N. You can stop when the average stabilizes to within 10%.
- For a given N, you need to figure out which sorting call
is faster and by how much on average (by which factor).
Use three values N>10000, each at least 2x apart from each other.
You do not need to plot anything in this question. You need to produce
9 numbers: for each of the three N values you select, you will have two
averaged runtimes and a comparison factor (the ratio between the two
runtimes). The final answer in English should say whether qsort() or
sort() is recommended when speed is all that matters. You should compile
with -O3
, and are very likely to get a wrong answer otherwise.
Feel free to explain your results, but explanations will not be graded.
2. Improving the worst-case time of qsort
Using random inputs and the averaging methodology from part1,
empirically show that the STL's
nth_element
function runs in linear time on average when it is used to find the
median (i.e., n
is half the size of the container).
For this, build a plot of its average run time as
a function of the input size for at least 6 (significantly
different) values of N. Every datapoint needs to be averaged as
explained in part1, but you don't need to compare it to anything.
Assume that you are given a function median()
(with whatever arguments you wish) that computes the median
in worst-case linear time (STL's implementation
apparently does not). A very non-trivial algorithm for this
is explained in detail in the CLR book (pp.189-196), but you are
not required to read that --- just assume that you can use its
implementation.
- Given the example of worst-case input for the original quick sort
algorithm described in the Sedgwick book (page 321); explain why
the median-of-three pivoting and choosing a pivot at random, each,
do not improve the worst-case big-theta complexity of qsort.
- Consider modifying the original qsort algorithm using linear-time
median selection so that the modified algorithm will have better
worst-case complexity. Consider two cases: (i) each element in the
array can only repeat O(1) times, (ii) the general case. You need
to provide a proof of its worst-case complexity (it can refer to the
proof of worst case complexity of qsort and simply analyze
modifications).
In practice, linear-time median does a lot of work,
and "improved" qsort()
may not be competitive
with original qsort
on average, despite better
asymptotic worst-case performance and asymptotically equivalent
average-case performance.
3. Incremental sorting
Generate large random almost-sorted ranges. To do this you will
start with a randomly generated array of integers from part 1
and sort such an array; this will give you a large random sorted range.
Now let's spoil it. Pick two indices from 0 to N-1
at random ( idx=rand() % N; ) and swap the values.
In some application this can correspond to a perturbation
of an already-sorted range, so we need to bring it back to sorted order.
- Implement a generic sorting algorithm that takes linear worst-case time
on inputs generated as explained in the previous paragraph --- you
are allowed to paste any code from Chapter 6 of our text. Empirically
show that it takes linear time on average (similar to experiments in
part 2) and compare its runtime to the faster of sort()/qsort() (similar to
the experiments in part 1).
Remember that all inputs must be almost-sorted ranges from 3a).
4. Visualization of bubble sort and shaker sort
- You need to have a working implementation of bubble sort for arrays
(can paste from Sedgewick). Shaker sort is a variant of bubble sort
in which passes are performed left-to-right and right-to-left in turns,
rather than in one direction as in bubble sort (why would this be useful?).
You need a shaker sort implementation as well. Both implementations
should terminate once the range is sorted (rather than perform "dummy"
passes).
- Using the index-sorting implementation in the
handout,
write a function plot() that takes a range and saves its sorted index
so that the index can be plotted by gnuplot (i.e., two unsigned's
per line).
- Use N=200 for your experiments, use your student ID as random seed.
Generate an array of N random numbers as in part 1. Find out how
many passes bubble sort and shaker sort perform on this array.
Modify the codes of bubble sort and shaker sort so that plot()
is called half way through (e.g., if there are 90 passes total,
plot() should be called at pass 45).
You need to produce two pictures. They will probably look similar
to Fig 6.7 in Sedgewick.
Turn-in check-list
Part 1: * Source code (one file).
* A table with 9 numbers.
* A recommendation in English as to whether
sort() or qsort() should be used when speed matters.
Part 2: * A plot showing that STL's median selection runs in linear time
on average.
* A description of worst-case inputs for qsort.
* An explanation of why both median-of-three and random pivoting
do not change worst-case big-theta.
* A [theoretical] proof of the worst-case run time when using
the median as the pivot.
[You do not need to implement this modified quicksort]
Part 3: * Source code (one file).
* One plot showing that your algorithm has linear average-case
complexity on the almost-sorted ranges considered (plot as in
part 2)
* A comparison between your algorithm and the faster of
sort()/sqort() --- table as in part 1.
Part 4: * Source code for bubble and shaker sort.
* An explanation why it makes sense to perform bubble-passes in
both directions.
* Two plots. (Plots produced by different students
must be different. If plots submitted by different students are
the same, both homeworks may be disqualified (0 score for the
whole homework)).