Fast/Efficient counting of list of space delimited strings in Python


Question

Given the input:

x = ['foo bar', 'bar blah', 'black sheep']

I could do this to get the count of each word in the list of space-delimited strings:

from itertools import chain
from collections import Counter
c = Counter(chain(*map(str.split, x)))

Or I could simply iterate through and get:

c = Counter()
for sent in x:
    for word in sent.split():
        c[word]+=1

[out]:

Counter({'bar': 2, 'sheep': 1, 'blah': 1, 'foo': 1, 'black': 1})

The question is: which is more efficient if the input list of strings is extremely large? Are there other ways to build the same Counter object?

Imagine it's a text file object that has billions of lines with 10-20 words each.

Recommended answer

The answer to your question is profiling.

Here are some profiling tools:

  • Print time.time() at strategic places (or use the Unix time command)
  • cProfile
  • line_profiler
  • heapy tracks all objects inside Python’s memory (good for memory leaks)
  • For long-running systems, use dowser: allows live objects introspection (web browser interface)
  • memory_profiler for RAM usage
  • examine Python bytecode with dis
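
As a rough illustration of the profiling advice above, the sketch below times the two approaches from the question and then runs cProfile on one of them. It uses the standard-library timeit module instead of hand-placed time.time() calls. The function names (count_with_chain, count_with_loop, time_candidates) and the repeated sample data are hypothetical, chosen only for this example; real measurements should use data that resembles the actual workload.

```python
import cProfile
import pstats
import timeit
from collections import Counter
from itertools import chain

# Hypothetical stand-in data: the question's sample list repeated many times.
x = ['foo bar', 'bar blah', 'black sheep'] * 10000


def count_with_chain(lines):
    # First approach from the question: split every line, flatten, count once.
    return Counter(chain(*map(str.split, lines)))


def count_with_loop(lines):
    # Second approach from the question: increment one word at a time.
    c = Counter()
    for sent in lines:
        for word in sent.split():
            c[word] += 1
    return c


def time_candidates(lines, repeats=3):
    # Best-of-N wall-clock time per candidate, in seconds.
    return {
        func.__name__: min(timeit.repeat(lambda: func(lines), number=1, repeat=repeats))
        for func in (count_with_chain, count_with_loop)
    }


timings = time_candidates(x)
for name, seconds in timings.items():
    print(f"{name}: {seconds:.4f} s")

# cProfile shows where the time goes inside a single candidate.
profiler = cProfile.Profile()
profiler.enable()
count_with_loop(x)
profiler.disable()
pstats.Stats(profiler).sort_stats("cumulative").print_stats(5)
```

The timings depend on the machine, the Python version, and the shape of the data, so no particular winner is assumed here. One thing worth measuring separately: chain(*...) has to build every split list before counting starts, whereas chain.from_iterable consumes them lazily, which may matter for memory when the input has billions of lines.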
