Repeated DNA Sequences - Optimized approach worst case

Isaac_Haseley · November 6, 2023, 8:57pm

For the problem Repeated DNA Sequences: Solution: Repeated DNA Sequences

The optimized solution’s worst case runtime is:

To handle this case, I think we could create a third hash set to track which substrings we’ve already added to our result set, to prevent repeatedly creating and adding the same substring to the result set.

To demonstrate a worst case that we can’t navigate, we could maybe use an example like this: K=4, S=[1111 1111 1112 1112 1121 1121 1211 1211…] where the number of results is proportional to the length of the input.

Dian_Us_Suqlain · November 13, 2023, 10:57am

Hello @Isaac_Haseley,

Based on your proposed solution, if we use a third hash set to store the k-length substrings that are repeated for the first time, we would have to write an if condition such as the one below to ensure that substrings repeated more than one time are not added to the output set:

if hash_value in hash_set and s[i : i + k] not in third_hash_set: output.add(s[i : i + k]) third_hash_set.add(s[i : i + k])

However, this if condition will execute on every iteration of the for loop, leading to a k-length substring being generated n-k times. So the time complexity of this solution will be O((n-k) * k) (worst case) for every string s.

I hope this answers your query regarding the worst case approach. Please feel free to share any suggestions and feedbacks, we’d be happy to help. Thanks!

Happy learning!

Isaac_Haseley · November 14, 2023, 10:42pm

Thanks for your response @Dian_Us_Suqlain!

I was thinking more like: if hash_value in hash_set and hash_value not in third_hash_set: output.add(s[i : i + k]) third_hash_set.add(hash_value)