Is hashing a DNA sequence really required?

Vikas_Kumar1 · January 23, 2024, 4:11am

In the recommended solution under topic ** Hashing and comparison in constant time** we are using polynomial rolling hash technique. Is it really required because if I use a hash map to store keys then storing and accessing should be O(1), right?

Course: Grokking Coding Interview Patterns in Go - Learn Interactively
Lesson: Solution: Repeated DNA Sequences

Dian_Us_Suqlain · January 24, 2024, 12:23pm

Hi @Vikas_Kumar1,

If we’re not using hashing and just comparing substrings on every iteration using perhaps, a hash map, we would have to generate a k-length substring (O(k)) for every (n - k + 1) iteration of the for loop, leading to the average and worst case time complexity becoming the same, i.e. O((n -k) * k).

On the contrary, in our solution, even though the worst case time complexity remains the same, the average case time complexity is significantly improved, since we’re only generating a k-length substring when a repeated sequence is encountered leading to the average case time complexity becoming O(n).

I hope this explanation helps clear your understanding regarding the polynomial rolling hash technique. Feel free to share more suggestions and feedback. We’d be happy to help. Thanks!

Happy learning!

Vikas_Kumar1 · January 26, 2024, 2:06pm

Okay. I was ignoring the k-length substring generation part here. This makes sense. Thanks!
I am wondering why the test cases didn’t time out even when this approach was not implemented.