Number of distinct lines in a file


Question

I have a million-line text file with 100 characters per line,
and simply need to determine how many of the lines are distinct.

On my PC, this little program just goes to never-never land:

    def number_distinct(fn):
        f = open(fn)
        x = f.readline().strip()
        L = []
        while x != '':
            if x not in L:
                L = L + [x]
            x = f.readline().strip()
        return len(L)

Would anyone care to point out improvements?
Is there a better algorithm for doing this?

Recommended answer

Sounds like homework, but I'll bite.

    def number_distinct(fn):
        hash_dict = {}
        total_lines = 0
        for line in open(fn, 'r'):
            total_lines += 1
            # Note: only the hash is stored, so two different lines that
            # happen to share a hash value would be counted once.
            key = hash(line.strip())
            if key in hash_dict:
                continue
            hash_dict[key] = 1

        return total_lines, len(hash_dict)

    if __name__ == "__main__":
        fn = 'c:\\test.txt'
        total_lines, distinct_lines = number_distinct(fn)
        print("Total lines=%i, distinct lines=%i" % (total_lines, distinct_lines))

-Larry Bates
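
As a variation on the answer above, here is a minimal sketch that stores the stripped lines themselves in a set rather than their hash values, so a hash collision cannot merge two different lines. The name `count_lines` is hypothetical and not from the thread. It assumes the distinct lines fit in memory; for a million 100-character lines that is roughly 100 MB of text plus per-object overhead.

```python
def count_lines(path):
    """Return (total_lines, distinct_lines) for the text file at `path`."""
    seen = set()   # holds the stripped lines themselves, not their hashes
    total = 0
    with open(path, 'r') as f:   # the with-block closes the file afterwards
        for line in f:
            total += 1
            seen.add(line.strip())
    return total, len(seen)
```

The trade-off is memory: keeping only hash values is smaller, while keeping the lines gives an exact count.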


"r.e.s." <r.*@ZZmindspring.com> writes:
> I have a million-line text file with 100 characters per line,
> and simply need to determine how many of the lines are distinct.

I'd generalise it by allowing the caller to pass any iterable set of
items. A file handle can be iterated this way, but so can any
sequence or iterable.

    def count_distinct(seq):
        """ Count the number of distinct items """
        counts = dict()
        for item in seq:
            if item not in counts:
                counts[item] = 0
            counts[item] += 1
        return len(counts)

    >>> infile = open('foo.txt')
    >>> for line in open('foo.txt'):
    ...     print(line, end='')
    ...
    abc
    def
    ghi
    abc
    ghi
    def
    xyz
    abc
    abc
    def
    >>> infile = open('foo.txt')
    >>> print(count_distinct(infile))
    5


--
\ "A man may be a fool and not know it -- but not if he is |
`\ married." -- Henry L. Mencken |
_o__) |
Ben Finney
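
The session above prints 5 although only four different strings (abc, def, ghi, xyz) are visible. One possible explanation, which is an assumption and not something stated in the thread, is that lines read from a file keep their trailing newline, so a final line lacking one would be counted separately. The sketch below uses the standard library's `collections.Counter` to do the same per-item counting and strips the newline first. The name `count_items` and the sample list are hypothetical, and unlike `count_distinct` this version only accepts strings.

```python
from collections import Counter

def count_items(seq):
    """Return a Counter mapping each distinct string in `seq` to its frequency."""
    # Strip the trailing newline so "def\n" and a final "def" compare equal.
    return Counter(item.rstrip('\n') for item in seq)

# Sample data mirroring foo.txt, assuming its last line has no newline.
lines = ["abc\n", "def\n", "ghi\n", "abc\n", "ghi\n",
         "def\n", "xyz\n", "abc\n", "abc\n", "def"]

counts = count_items(lines)
print(len(counts))            # 4 distinct items after stripping
print(counts.most_common(1))  # [('abc', 4)]
print(len(set(lines)))        # 5 when the newlines are left in place
```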


r.e.s. wrote:
> I have a million-line text file with 100 characters per line,
> and simply need to determine how many of the lines are distinct.
>
> On my PC, this little program just goes to never-never land:
>
>     def number_distinct(fn):
>         f = open(fn)
>         x = f.readline().strip()
>         L = []
>         while x != '':
>             if x not in L:
>                 L = L + [x]
>             x = f.readline().strip()
>         return len(L)

ouch.

> Would anyone care to point out improvements?
> Is there a better algorithm for doing this?

try this:

    def number_distinct(fn):
        return len(set(s.strip() for s in open(fn)))

</F>
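
To illustrate why the set-based one-liner helps, here is a rough timing sketch. The list version mirrors the question's loop: `x not in L` scans the whole list and `L + [x]` copies it, so the work grows roughly with the square of the number of distinct lines, whereas set membership takes constant time on average. The function names and the synthetic data are assumptions for illustration, and the timings will vary by machine.

```python
import time

def distinct_with_list(lines):
    """Approach from the question: linear scan of a growing list."""
    seen = []
    for x in lines:
        if x not in seen:       # scans the whole list for every line
            seen = seen + [x]   # builds a new list each time, as in the question
    return len(seen)

def distinct_with_set(lines):
    """Set-based approach: average constant-time membership test."""
    return len(set(lines))

# Synthetic data: 5000 lines, all distinct (the worst case for the list).
lines = ["line %06d" % i for i in range(5000)]

for fn in (distinct_with_list, distinct_with_set):
    start = time.perf_counter()
    n = fn(lines)
    elapsed = time.perf_counter() - start
    print("%s: %d distinct in %.4f s" % (fn.__name__, n, elapsed))
```

Doubling the number of lines should roughly quadruple the list version's time, which is consistent with the original program appearing to hang on a million lines.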

