performance - Python efficient way to check if very large string contains a substring

Question

Welcome To Ask or Share your Answers For Others

performance - Python efficient way to check if very large string contains a substring

asked Oct 24, 2021 in Technique[技术] by 深蓝 (71.8m points)

Python is not my best language, and so I'm not all that good at finding the most efficient solutions to some of my problems. I have a very large string (coming from a 30 MB file) and I need to check if that file contains a smaller substring (this string is only a few dozen characters). The way I am currently doing it is:

if small_string in large_string:
    # logic here

But this seems to be very inefficient because it will check every possible sequence of characters in the file. I know that there will only be an exact match on a newline, so would it be better to read the file in as a list and iterate through that list to match?

EDIT: To clear up some confusion on "matching on a newline only", here's an example:

small_string = "This is a line"
big_string = "This is a line
This is another line
This is yet another"

If I'm not mistake, the in keyword will check all sequences, not just every line.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙；凝视深渊过久,深渊将回以凝视…

1.0k views

1 Answer

深蓝 · Answer 1 · 2021-10-23T19:22:50+0000

Is it really slow? You're talking about 30MB string; let's try it with even bigger string:

In [12]: string="agu82934u"*50*1024*1024+"string to be found"

In [13]: len(string)
Out[13]: 471859218

In [14]: %timeit "string to be found" in string
1 loops, best of 3: 335 ms per loop

In [15]: %timeit "string not to be found" in string
1 loops, best of 3: 200 ms per loop

I wouldn't say that 335 ms is much time looking for substring in 450MB string.

Categories

performance - Python efficient way to check if very large string contains a substring

Please log in or register to add a comment.

Please log in or register to answer this question.

1 Answer

Please log in or register to add a comment.

Just Browsing Browsing

Most popular tags