*Maximum Segment Sum
Just to be clear, we're trying to find the largest sum or the longest sequence for
any set of matching sums.
If you talk with Paul (who was one of my interviewers, but I do not know his last name),
he may remember the first step. I worked out the rest while standing in the ticket line at LAX.
Actually, I'd really like for you to share this with my interviewers.
====================
= Algorithm
====================
1. We compact the sequence of integers such that subsequences of
positive integers are summed together as are sequences of negative integers.
2. If a zero is next to a positive integer or zero, it compacts as if it is positive.
3. If a zero is between two negative integers, it compacts as if it is negative.
4.a. A negative or a zero and a negative at the leftmost end point may be discarded, as
neither case increases the maximum segment sum.
4.b. A negative or a negative and a zero at the rightmost end point may also be discarded.
However, if we discard a rightmost element, we must record the new rightmost index
as one minus the leftmost index of the element we discard.
Note: Both, 4.a. and 4.b., may be ignored if we do not plan on an implementation that expects
both endpoints to be positive. I've include this assumption for simplicity of later steps.
Our sequence now has a pattern: positive, negative, positive, negative,. . . ., negative, positive.
We exploit this pattern, though by the note above, a real implementation may choose to allow negative
elements at the ends.
If we allow Step 4, then we have an even number of positive elements and an odd number of negative
elements. So the sequence has an odd number of elements.
5. We start from the leftmost positive integer (e.g., the start of the sequence).
6. We add three integers, which are sequenced positive, negative, positive.
7.a. If the resulting sum is greater than or equal to the first and second positive integer, then
we accumulate this result and keep its leftmost index.
7.a.1. We return to step 6, with our accumulator leading the next two integers,
sequenced negative, positive.
7.b. If the resulting sum is less than the first and second positive integer, we store the first
positive integer and the negative integer, and their leftmost indices.
7.b.1. We return to step 6, with our second positive integer in the leading place.
8. Accumulation stops when we exhaust the sequence of integers.
9. We restart the accumulation from the leftmost index of the compacted sequence until
it compacts no more (e.g., using a boolean variable set when compaction occurs and reset at step 8).
10. The Maximum Segment Sum is now the largest positive integer in the compacted sequence.
The leftmost index of the segment is stored with the sum. If the segment sum has a neighbor to
right, its rightmost index is its neighbor's leftmost index minus one. If their is no neighbor to
the right, the rightmost index is the rightmost index of the original sequence.
11. If there is a set of equal sums, choose the sum with the largest segment length.
======================
= End Algorithm.
======================
======================
= An example:
======================
The sequence is generated using a sipmle Java program, but the compaction steps are by hand.
For simplicity, all numbers are [-10,10].
Length: 20 integers
Sequence: 5 -6 1 6 2 7 -2 0 -8 4 -6 3 8 2 -6 9 0 0 2 4
Indices: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
From steps 1 - 3:
Compaction: 5 -6 16 -10 4 -6 13 -6 15
Indices: 1 2 3 7 10 11 12 15 16
Step 4 is unnecessary.
Steps 5 - 7.a.:
Compaction: 5 -6 16 -10 4 -6 13 -6 15
Indices: 1 2 3 7 10 11 12 15 16
Steps 7.a.1. to 9:
Compaction: 5 -6 16 -10 4 -6 22
Indices: 1 2 3 7 10 11 12
Step 10:
Maximum Segment Sum: 22, from indices 12 to 20.
========================
= End Example.
========================
========================
= Notes on Complexity
========================
We can remove worrying about the complexity of steps 1-4 as an efficient implementation
will merge them into step 5 such that each accumulation operation does
double duty (for the same price) on the first round.
So, in all cases of complexity I will discuss, I assume that getting to step 5 is basically free or
amortized and can be ignored. I cannot think of an example in which this is not the case.
========================
= Best Case Complexity
========================
The absolute best case is a known sequence of positive integers. This is an Order(n) summation.
Next best is that the accumulation from the left continues unabaited all the way to the rightmost
element. This is also Order(n), but a little more expensive given the comparisons that are
necessary after every second addition. This doesn't change the asymptotic behaviour.
Example:
5 -2 2 -3 3 -4 4 -5 5
5 -3 3 -4 4 -5 5 2 adds
5 -4 4 -5 5 2 adds
5 -5 5 2 adds
5 2 adds
The proof that this is Order(n) is simply that we use two additions to add three numbers together
and only store one number. Thus, every element in the sequence is added only once to the accumulator.
Also, we cannot get a sequence of integers to sum at better than Order(n).
========================
= Worst Case Complexity
========================
Well, in the worst case, the sequence of integers is already evenly mixed negative and positive and
all integers are about the absolute value. This will allow very little compaction by the first step.
This also means that there will be very little accumulation by subsequent steps, so the sequence will
remain of generally fixed size. However, as there is little compaction, there will be few rounds of accumulation
and the entire sequence will not have to be traveled often.
Additionally, the final search step at the end will be over the full sequence size. However, a simple
variation that keeps track of the currently known largest segment sum, updated on every round of
accumulation will get rid of the separate search phase.
So, the absolute worst case is a sequence of N integers that takes forever to compact. The model for this
is a sequence which does not compact until it reaches its rightmost triple of integers. The following
sequence details this:
5 -5 4 -4 3 -3 2 -2 1 -1 5
This sequence exibits worst-case behavior because it will only compact from right to left once per round of
accumulation. For example:
n = 9
5 -5 4 -4 3 -3 2 -2 5
5 -5 4 -4 3 -3 5 8 adds (n-1)
5 -5 4 -4 5 6 adds (n-3)
5 -5 5 4 adds (n-5)
5 2 adds (n-7)
So MSS is 5, with indices 1 to 9. This took 20 additions and 4 rounds of accumulation.
Also, most of these additions result in moving the accumulation to the next positive integer.
A slightly larger example which exihibits the same behavior:
n = 13
7 -7 5 -5 4 -4 3 -3 2 -2 1 -1 7
7 -7 5 -5 4 -4 3 -3 2 -2 7 12 adds
7 -7 5 -5 4 -4 3 -3 7 10 adds
7 -7 5 -5 4 -4 7 8 adds
7 -7 5 -5 7 6 adds
7 -7 7 4 adds
7 2 adds
MSS is 7, with indices of 1 to 13. This takes 42 additions and 6 rounds of accumulation.
Notice, worst case behavior will compact all the way to the left from the rightmost element.
It is easy to see that this is Order(n^2) behavior. I will not prove this, but I'm not completely ducking out.
(If you absolutely must have the proof, I have notes and I'll keep working on it, but I'd rather actually
get this to you today, than get stuck on a soon-to-be-irrelevant detail).
=========================
= HOWEVER
=========================
Any intelligent implementation will make note that only the rightmost elements are accumulating.
Reversing the list (and keeping track of the now rightmost indices) brings back the Best Case Complexity behavior.
A more intelligent implementation may notice this behavior for interior subsequences or alternate directions
for each run of the accumulation.
There is plenty of room to allow the worst case for simplicity of implementation and then build in the more
intelligent behavior as necessary.
=========================
= Average Complexity
=========================
How well should we expect this to perform given a random sequence of integers that does not have best or
worst case complexity?
At worst, we'll have some sequence that only accumulates once per round of accumulation, but continues
to accumulate nearly the entire sequence. If it comes from the right (which is possible since we're left biased
in the general algorithm) then we can make use of the reversal trick for some subsequence until it stops accumulating.
We can then pick up at this new element and go left again.
In fact, it may make sense to experiement with a variation that always reverses when there is a from-the-right accumulation.
So, even if the sequence see-saws (i.e., go left, go right, go left, go right), it is still only Order(n) complexity.
A better variation is to accumulate in the opposite direction when accumulation stalls. This would possibly see-saw a
bit, but it would have the effect of compacting the entire sequence in one round of accumulation.
Even here, it cannot see-saw n times because accumulation will have to end and begin with the next positive integer.
I'm going to ponder this one a bit longer.
So, to the best of my abilities to discern (though I'll keep trying tonight), this algorithm takes is Order(n)
for any intelligent implementation.
An unintelligent implementation will be bounded by Order(n) and Order(n^2).
===========================
= Update Log
===========================
20060626 -- 8:30pm EST -- Updated to fix error in example and add Step 11.
20060626 -- 6 pm EST -- Original posting.