educative.io

About the Pattern: Two Heaps - Sliding Window Median (hard) category


#1

Have questions about Sliding Window Median (hard)? Go for it!

View the lesson here.


#2

I don’t like the official answer here. There are better solutions to this problem.

  • A very simple one is to maintain a fixed and ordered array of size k and use the technique from insertion sort. This would only cost O(K) to compute the median of each window, the overall time is O(NK).
  • We can also a multiset with 2 pointers pointing to the lower median and higher median. Update the pointers after each insertion and deletion. The overall cost is O(NlgK).
  • Another approach is to use 2 multiset instead of 2 heaps since multiset naturally supports insertion and deletion in log time.

#3

More Readable solution using the patterns we discussed earlier

const Heap = require('./collections/heap'); //http://www.collectionsjs.com

class SlidingWindowMedian {

  find_sliding_window_median(nums, k) {
    result = [];
    let start=0;
    let medianStream = new MedianStream();
    for (let end=0; end < nums.length; end++) {
      medianStream.insertNum(nums[end]);

      if (end - start + 1 === k) {
        result.push(medianStream.findMedian());
        medianStream.deleteNum(nums[start]);
        start += 1;
      }
    }
    // TODO: Write your code here
    return result;
  }
};

class MedianStream {
  constructor() {
    this.maxHeap = new Heap([], null, (a,b) => a - b);
    this.minHeap = new Heap([], null, (a,b) => b - a);
  }

  insertNum(num) {
    if (this.maxHeap.length === 0 || this.maxHeap.peek() >= num) {
      this.maxHeap.push(num);
    } else {
      this.minHeap.push(num);
    }

    this.rebalanceHeap();
  }
  
  rebalanceHeap() {
    if (this.maxHeap.length > this.minHeap.length+1) {
      this.minHeap.push(this.maxHeap.pop());
    } else if (this.minHeap.length > this.maxHeap.length) {
      this.maxHeap.push(this.minHeap.pop());
    }
  }

  deleteNum(num) {
    if (num <= this.maxHeap.peek()) {
      this.maxHeap.delete(num);
    } else {
      this.minHeap.delete(num);
    }
    this.rebalanceHeap();
  }

  findMedian() {
    if (this.maxHeap.length > this.minHeap.length) {
      return this.maxHeap.peek();
    }

    return (this.maxHeap.peek() + this.minHeap.peek()) / 2;
  }
};



var slidingWindowMedian = new SlidingWindowMedian()
result = slidingWindowMedian.find_sliding_window_median(
  [1, 2, -1, 3, 5], 2)

console.log(`Sliding window medians are: ${result}`)

slidingWindowMedian = new SlidingWindowMedian()
result = slidingWindowMedian.find_sliding_window_median(
  [1, 2, -1, 3, 5], 3)
console.log(`Sliding window medians are: ${result}`)


#4

What do you mean by insertion sort? Insertion sort takes O(N^2) since it needs to swap elements based on the incoming elements. Heap is our best bet as i can think of. Can you provide us a code for your NlogK solution. I would be curious to see if there is way to solve this without a Heap. Appreciate your time.


#5

For the (modified) insertion sort, you only need to performing linear number of swaps to get the correct ordering. Since the window size is fixed, you first fill a vector with the first K elements and sort it which costs O(KlgK). Afterward, you slide the window meaning you replace an old element in this ordered vector with a new element. This will break the ordering, however there’s only one element that breaks the ordering. For instance, you might have an ordered vector as [1,2,3,4,5,6], then the element 4 is replaced by a new element say 10, you would have [1,2,3,10,5,6]. Now you swap 10 to the right [1,2,3,5,10,6], now swap to right again [1,2,3,5,6,10]. Therefore, it only costs linear time to make the vector ordered again! Remember that for an arbitrary array, it costs O(N*N) for insertion sort and O(NlgN) for quick/merge sort, but you can leverage some properties of partially ordered array to make sorting much efficient. I have 2 more solutions and they passed test on Leetcode (the same problem).
Here is one solution using 2 multiset:

// 2 multisets
class Solution {
public:
    vector<double> medianSlidingWindow(vector<int>& nums, int k) {
        vector<double> ret;
        int in_num, out_num;
        for (int i = 0; i < k; ++i) {
            insertData(nums[i]);
        }
        ret.push_back(getMedian());
        for (int i = k; i < nums.size(); ++i) {
            in_num = nums[i];
            out_num = nums[i - k];
            deleteData(out_num);
            insertData(in_num);
            ret.push_back(getMedian());
        }
        return ret;
    }
private:
    multiset<int> lo, hi;
    
    void insertData(int num) {
        lo.insert(num);
        auto lo_max = lo.end();
        lo_max--;
        hi.insert(*lo_max);
        lo.erase(lo_max);
        
        if (lo.size() < hi.size()) {
            auto hi_min = hi.begin();
            lo.insert(*hi_min);
            hi.erase(hi_min);
        }
    }
    
    void deleteData(int num) {
        if (lo.find(num) != lo.end()) {
            lo.erase(lo.find(num));
        } else {
            hi.erase(hi.find(num));
        }
        
        if (lo.size() < hi.size()) {
            auto hi_min = hi.begin();
            lo.insert(*hi_min);
            hi.erase(hi_min);
        } else if (lo.size() - hi.size() >= 2) {
            auto lo_max = lo.end();
            lo_max--;
            hi.insert(*lo_max);
            lo.erase(lo_max);
        }
    }
    
    double getMedian() {
        auto lo_max = lo.end();
        lo_max--;
        if (lo.size() > hi.size()) {
            return *lo_max;
        } else {
            return double(*lo_max) * 0.5 + double(*hi.begin()) * 0.5;
        }
    }
};

And another solution using 2 pointers together with a multiset:

// multiset with 2 pointers
class Solution {
public:
    vector<double> medianSlidingWindow(vector<int>& nums, int k) {
        vector<double> ret;
        int in_num, out_num;
        lo = data.end();
        hi = data.end();
        for (int i = 0; i < k; ++i) {
            insertData(nums[i]);
        }
        ret.push_back(getMedian());
        for (int i = k; i < nums.size(); ++i) {
            in_num = nums[i];
            out_num = nums[i - k];
            insertData(in_num);
            deleteData(out_num);
            ret.push_back(getMedian());
        }
        return ret;
    }
private:
    multiset<int> data; // size = k
    multiset<int>::iterator lo, hi;
    
    void insertData(int num) {
        int m = data.size();
        data.insert(num);
        
        if (m == 0) { // empty
            lo = hi = data.begin();
        } else if (m % 2 == 1) { // odd
            if (num >= *hi) hi++;
            else lo--;
        } else { // even
            if (num >= *hi) {
                lo++;
            } else if (num < *lo) {
                hi--;
            } else if (num >= *lo && num < *hi) {
                lo++;
                hi--;
            }
        }
    }
    
    void deleteData(int num) {
        int m = data.size();
        auto del_ptr = data.upper_bound(num);
        del_ptr--;
        
        if (m == 1) { // only 1 element
            lo = hi = data.end();
        } else if (m % 2 == 1) { // odd >= 3
            if (num > *lo) { // larger than median
                lo--;
            } else if (num == *lo){ // equal to median
                lo--;
                if (del_ptr == hi) hi++;
            } else { // smaller than median
                hi++;
            }
        } else { // even >= 2
            if (num >= *hi) hi--;
            else if (num <= *lo) lo++;
        }
        data.erase(del_ptr);
    }
    
    double getMedian() {
        return double(*lo) * 0.5 + double(*hi) * 0.5;
    }
};

#6

I recommend you solve the problem Median from Data Stream using the following 2 different approaches first:

  • 2 multiset: one for the smaller half, another for the larger half. when you need to compute the median, just grab the max of the smaller half and min of the larger half. multiset allows you to get min/max in O(1).
  • single multiset with 2 pointers: one pointer points to the lower median and another points to the higher median. You need to update the pointers after each insertion or deletion.

The reason why I prefer to use set/multiset is that they are implemented as Red-Black-Tree and all operations are O(lgN), most importantly they naturally support deletion. Also, getting min/max is O(1) since you have access to the begin and end iterators immediately.