Match beginning of two strings


Problem description





Hi,

I have about 200GB of data that I need to go through and extract the
common first part of a line. Something like this.

a = "abcdefghijklmnopqrstuvwxyz"
b = "abcdefghijklmnopBHLHT"
c = extract(a,b)
print c





"abcdefghijklmnop"

Here I want to extract the common string "abcdefghijklmnop". Basically I
need a fast way to do that for any two given strings. For my situation,
the common string will always be at the beginning of both strings. I can
use regular expressions to do this, but from what I understand there is
a lot of overhead. New data is being generated at the rate of about 1GB
per hour, so this needs to be reasonably fast while leaving CPU time for
other processes.

Thanks
Ravi
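For comparison, here is a minimal sketch of the `extract` function the question describes, written as a plain character-by-character loop in Python 3 syntax. `extract` is only the asker's placeholder name, not a library function. The sketch relies on the stated assumption that the shared part always starts at the beginning of both strings, so no regular expression is needed.

```python
def extract(a, b):
    """Return the longest leading substring shared by a and b.

    `extract` is the hypothetical name used in the question.
    """
    # Never look past the end of the shorter string.
    n = min(len(a), len(b))
    i = 0
    # Advance while the characters at position i agree.
    while i < n and a[i] == b[i]:
        i += 1
    return a[:i]


a = "abcdefghijklmnopqrstuvwxyz"
b = "abcdefghijklmnopBHLHT"
print(extract(a, b))  # abcdefghijklmnop
```

Each call does at most `min(len(a), len(b))` comparisons and stops at the first mismatch.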

Recommended answer

While you can be forgiven for not having guessed, os.path is the place to
look:
import os.path
a = "abcdefghijklmnopqrstuvwxyz"
b = "abcdefghijklmnopBHLHT"
print(os.path.commonprefix([a, b]))

-Scott David Daniels
Sc***********@Acm.Org
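`os.path.commonprefix` compares plain strings character by character, so it is not limited to paths, and it accepts a list of any length. CPython's implementation essentially takes the lexicographic `min` and `max` of the list and compares only those two, since any prefix they share is also shared by every string that sorts between them. The sketch below reproduces that idea; `common_prefix` is a hypothetical helper name used only for illustration.

```python
import os.path


def common_prefix(strings):
    """Hypothetical re-implementation of the min/max idea."""
    if not strings:
        return ""
    # The smallest and largest strings (in sort order) differ the earliest;
    # whatever prefix they share, every string between them shares too.
    lo = min(strings)
    hi = max(strings)
    # lo <= hi guarantees hi[i] exists until the first mismatch.
    for i, ch in enumerate(lo):
        if ch != hi[i]:
            return lo[:i]
    return lo


words = ["abcdefghijklmnopqrstuvwxyz", "abcdefghijklmnopBHLHT", "abcdefXYZ"]
print(common_prefix(words))         # abcdef
print(os.path.commonprefix(words))  # abcdef
# commonprefix works per character, not per path component:
print(os.path.commonprefix(["/usr/lib", "/usr/local"]))  # /usr/l
```

The last line shows the flip side: because the comparison is per character, the result for real paths can end in the middle of a directory name. In Python 3.5 and later, `os.path.commonpath` compares whole path components instead.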







Certainly not where I was expecting it. Thanks!

Ravi


-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Sat, 02 Aug 2003 17:39:26 -0400,
Ravi <rx****@cwru.edu> wrote:


Hi,

I have about 200GB of data that I need to go through and extract the
common first part of a line. Something like this.
a = "abcdefghijklmnopqrstuvwxyz"
b = "abcdefghijklmnopBHLHT"
c = extract(a,b)
print c




"abcdefghijklmnop"

Here I want to extract the common string "abcdefghijklmnop". Basically I
need a fast way to do that for any two given strings. For my situation,
the common string will always be at the beginning of both strings. I can
use regular expressions to do this, but from what I understand there is
a lot of overhead. New data is being generated at the rate of about 1GB
per hour, so this needs to be reasonably fast while leaving CPU time for
other processes.

Thanks
Ravi







Are you trying to match any to any strings? or only a pair as above?
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)

iD8DBQE/LENWd90bcYOAWPYRAtWhAJ4ozTD1G3xLzVkeuJvPDJTsLbkcBQ CfX4E0
YR/+zWSPDwX0uUf8y0QkxJs=
=sGTb
-----END PGP SIGNATURE-----

--
Jim Richardson http://www.eskimo.com/~warlock

Linux, because eventually, you grow up enough to be trusted with a fork()


Are you trying to match any to any strings? or only a pair as above?







Just a pair at a time, and I only want the first N characters that are
common to both strings. os.path.commonprefix works nicely. Thanks
for your help.

Ravi
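Given the volume of data described in the question, one way to apply `os.path.commonprefix` pair by pair is to stream the input a line at a time instead of reading it all into memory. This is a sketch under an assumed input format that the thread never specifies: each line holds one pair of strings separated by a tab. `prefixes` and `process_file` are hypothetical names.

```python
import os.path


def prefixes(lines, sep="\t"):
    """Yield the common leading part of each pair, one pair per line.

    Assumed (hypothetical) format: two strings per line, separated by `sep`.
    """
    for line in lines:
        # Raises ValueError if a line contains no separator.
        a, b = line.rstrip("\n").split(sep, 1)
        yield os.path.commonprefix([a, b])


def process_file(in_path, out_path, sep="\t"):
    # Iterating over a file object reads one line at a time, so memory use
    # stays roughly constant regardless of the input file's size.
    with open(in_path) as src, open(out_path, "w") as dst:
        for p in prefixes(src, sep):
            dst.write(p + "\n")


sample = ["abcdefghijklmnopqrstuvwxyz\tabcdefghijklmnopBHLHT\n"]
print(list(prefixes(sample)))  # ['abcdefghijklmnop']
```

The splitting step is the part to adapt to the real data layout; the prefix computation itself stays a single `os.path.commonprefix` call per pair.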

