Number of different lines in a file

Views: 67  Posted: 2019/6/5 1:17:37  python

Problem description

I have a million-line text file with 100 characters per line,
and simply need to determine how many of the lines are distinct.

On my PC, this little program just goes to never-never land:

    def number_distinct(fn):
        f = file(fn)
        x = f.readline().strip()
        L = []
        while x <> '':
            if x not in L:
                L = L + [x]
            x = f.readline().strip()
        return len(L)

Would anyone care to point out improvements?
Is there a better algorithm for doing this?

Recommended answers

Sounds like homework, but I'll bite.

    def number_distinct(fn):
        hash_dict={}
        total_lines=0
        for line in open(fn, 'r'):
            total_lines+=1
            key=hash(line.strip())
            if hash_dict.has_key(key): continue
            hash_dict[key]=1

        return total_lines, len(hash_dict.keys())

    if __name__=="__main__":
        fn='c:\\test.txt'
        total_lines, distinct_lines=number_distinct(fn)
        print "Total lines=%i, distinct lines=%i" % (total_lines, distinct_lines)

-Larry Bates
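
For readers on a current interpreter, here is a Python 3 sketch of the same idea. One caveat: the version above stores only `hash(line)`, so two different lines that happened to share a hash value would be counted once. The sketch below keeps the stripped line itself in a set, which avoids that at the cost of more memory. The name `count_lines` and the use of `io.StringIO` as a stand-in for a real file are assumptions made for illustration, not part of the original post.

```python
import io


def count_lines(lines):
    """Return (total_lines, distinct_lines) for an iterable of text lines."""
    seen = set()
    total_lines = 0
    for line in lines:
        total_lines += 1
        # Store the stripped line itself rather than hash(line), so that
        # different lines with colliding hashes are still counted separately.
        seen.add(line.strip())
    return total_lines, len(seen)


# Stand-in for an open text file; with a real file, pass open(path) instead.
sample = io.StringIO("abc\ndef\nabc\nghi\n")
total_lines, distinct_lines = count_lines(sample)
print("Total lines=%i, distinct lines=%i" % (total_lines, distinct_lines))
```

With the four sample lines this reports 4 total and 3 distinct.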

"r.e.s." <r.*@ZZmindspring.com> writes:

I have a million-line text file with 100 characters per line,
and simply need to determine how many of the lines are distinct.

I'd generalise it by allowing the caller to pass any iterable set of
items. A file handle can be iterated this way, but so can any
sequence or iterable.

    def count_distinct(seq):
        """ Count the number of distinct items """
        counts = dict()
        for item in seq:
            if not item in counts:
                counts[item] = 0
            counts[item] += 1
        return len(counts)

    >>> infile = file('foo.txt')
    >>> for line in file('foo.txt'):
    ...     print line,
    ...
    abc
    def
    ghi
    abc
    ghi
    def
    xyz
    abc
    abc
    def
    >>> infile = file('foo.txt')
    >>> print count_distinct(infile)
    5

--
\ "A man may be a fool and not know it -- but not if he is |
`\ married." -- Henry L. Mencken |
_o__) |
Ben Finney
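
Because `count_distinct` accepts any iterable, it can be tried without a file on disk. The sketch below is a Python 3 port (the posts above are Python 2) and uses `io.StringIO` as a hypothetical stand-in for `foo.txt`. Note that the function compares lines unstripped, so a final line with no trailing newline counts as different from the same text followed by a newline. That would be one way to arrive at 5 rather than 4 for the ten lines shown, but the thread does not say whether this was the case; the missing final newline in the sample text is purely an assumption.

```python
import io


def count_distinct(seq):
    """Count the number of distinct items in any iterable."""
    counts = {}
    for item in seq:
        # dict.get supplies 0 the first time an item is seen.
        counts[item] = counts.get(item, 0) + 1
    return len(counts)


# Assumed contents of foo.txt: ten lines, the last one without a newline.
text = "abc\ndef\nghi\nabc\nghi\ndef\nxyz\nabc\nabc\ndef"

print(count_distinct(io.StringIO(text)))   # lines compared as-is
print(count_distinct(text.splitlines()))   # line endings removed first
print(count_distinct([3, 1, 3, 2, 1]))     # any iterable works
```

Run as written, this prints 5, 4 and 3. Stripping each line before counting, as the other answers do, makes the result independent of how the file ends.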

r.e.s. wrote:

I have a million-line text file with 100 characters per line,
and simply need to determine how many of the lines are distinct.

On my PC, this little program just goes to never-never land:

    def number_distinct(fn):
        f = file(fn)
        x = f.readline().strip()
        L = []
        while x <> '':
            if x not in L:
                L = L + [x]
            x = f.readline().strip()
        return len(L)

ouch.

Would anyone care to point out improvements?
Is there a better algorithm for doing this?

try this:

    def number_distinct(fn):
        return len(set(s.strip() for s in open(fn)))

</F>
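
To see why the set-based one-liner helps, it may be worth putting it next to a list-based loop like the one in the question. There, `x not in L` scans the whole list for every line and `L = L + [x]` builds a new list each time, so the work grows roughly with the square of the number of distinct lines. A set tests membership by hashing instead. The sketch below is Python 3; the helper names `number_distinct_list` and `number_distinct_set` and the temporary file are assumptions made for illustration, and the list version uses `append` rather than the original list concatenation.

```python
import os
import tempfile


def number_distinct_list(fn):
    """List-based counting in the spirit of the question (slow on big files)."""
    seen = []
    with open(fn) as f:
        for line in f:
            x = line.strip()
            if x not in seen:   # linear scan of everything seen so far
                seen.append(x)
    return len(seen)


def number_distinct_set(fn):
    """Set-based counting: each membership test is a hash lookup."""
    with open(fn) as f:
        return len({line.strip() for line in f})


# Write a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("abc\ndef\nghi\nabc\nghi\ndef\nxyz\nabc\nabc\ndef\n")
    path = tmp.name

try:
    print(number_distinct_list(path), number_distinct_set(path))
finally:
    os.remove(path)
```

Both functions return the same count; the difference only shows up in running time as the file grows. The `with` blocks also close the file promptly, which the one-liner leaves to the garbage collector.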
