Repeated DNA Sequences Space Complexity

sanchez.ericat · March 4, 2024, 4:05am

Hello!

Your solution says the optimized approach using the rolling hash has space complexity O(n), suggesting it contains on the order of n constant-sized hashes. However, it looks like the hash function increases with the size of k, so it seems like space complexity would have to be at least O(nk) unless we find a way to constrain the size of the hash to constant size.

Is such a constraint implied somewhere that I’m missing? If not, how do we modify the solution to keep constant-sized hashes without sacrificing time complexity?

Thanks!

Dian_Us_Suqlain · April 4, 2024, 12:42am

Hey @sanchez.ericat,

Apologies for the delayed response. Yes, in the context of this problem, we have specified the constraint in the problem statement section of the lesson. It specifies that k (the length of contiguous subsequences) is at most 10, utilizing a rolling hash function allows us to keep the hash representation constant-sized, mitigating potential space complexity concerns that would arise with larger k values. Thus, when k = 10, the space complexity O(n-k+1) simplifies to O(n) and remains linear with respect to the input size.

On the other hand, in practical applications, especially in bioinformatics where this problem is common, the hash values are often constrained to a fixed size by using a modulo operation with a large prime number. This technique ensures that regardless of the value of k , each hash value can be stored in a constant amount of space, typically fitting within a standard integer data type in most programming languages.

I hope it clears out any confusion. Feel free to share further suggestions and feedback. We’d be happy to help. Thanks!

Happy learning!!