How to input a line word by word in Python?


Question

I have multiple files, each containing a single line of about 10M numbers. I want to check each file and print 0 if any number in it is repeated and 1 otherwise.

I am using a list to count frequencies. Because each line holds so many numbers, I want to update the frequency after reading each number and break as soon as I find a repeat. This is simple in C, but I have no idea how to do it in Python.

How do I read a line word by word without storing (or taking as input) the whole line?

I also need a way to do this from live input rather than a file.

Recommended answer

Read the line, split it, and copy the resulting list into a set. If the set is smaller than the list, the file contains repeated elements.

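with open('filename', 'r') as f:
    for line in f:
        numbers = line.split()
        # A set drops duplicates, so a smaller set means some number repeats
        print(0 if len(set(numbers)) < len(numbers) else 1)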

To read the file word by word, try this:

def readWords(file_object):
    word = ""
    # iter() with a sentinel calls read(1) repeatedly until it returns "" at EOF
    for ch in iter(lambda: file_object.read(1), ""):
        if ch.isspace():
            if word:  # In case of multiple spaces
                yield word
                word = ""
            continue
        word += ch
    if word:
        yield word  # Handles last word before EOF

Then you can do:

with open('filename', 'r') as f:
    seen = set()
    for num in map(int, readWords(f)):
        if num in seen:  # Repeated number found, stop reading
            break
        seen.add(num)
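To connect this back to the question's 0/1 output, here is a minimal sketch of the whole check with an early exit. The names `read_words` and `uniqueness_flag` are hypothetical, chosen for this illustration, and `io.StringIO` stands in for an opened file so the example runs on its own.

```python
import io


def read_words(stream):
    """Yield whitespace-delimited words, reading one character at a time."""
    word = ""
    # iter() with a sentinel keeps calling read(1) until it returns "" at EOF
    for ch in iter(lambda: stream.read(1), ""):
        if ch.isspace():
            if word:  # ignore runs of consecutive whitespace
                yield word
                word = ""
        else:
            word += ch
    if word:
        yield word  # last word before EOF


def uniqueness_flag(stream):
    """Return 0 as soon as a number repeats, or 1 if no number repeats."""
    seen = set()
    for num in map(int, read_words(stream)):
        if num in seen:
            return 0  # stop reading at the first repeat
        seen.add(num)
    return 1


# io.StringIO is only a stand-in for a real file object here
print(uniqueness_flag(io.StringIO("7 3 9 3 5")))
print(uniqueness_flag(io.StringIO("7 3 9 5")))
```

A set is used instead of the frequency list from the question because only membership matters here, not the counts.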

This method should also work for streams, because it reads only one character at a time and yields each whitespace-delimited word as soon as it is complete.
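As a rough illustration of that laziness, the sketch below stops at the first repeated word and then checks how far into the stream it has read. `read_words` and `chars_read_until_repeat` are hypothetical names for this example, and `io.StringIO` simulates the stream. Passing `sys.stdin` instead should behave the same way, although a terminal typically delivers input only after Enter is pressed.

```python
import io


def read_words(stream):
    """Yield whitespace-delimited words, reading one character at a time."""
    word = ""
    for ch in iter(lambda: stream.read(1), ""):
        if ch.isspace():
            if word:
                yield word
                word = ""
        else:
            word += ch
    if word:
        yield word


def chars_read_until_repeat(stream):
    """Consume words until one repeats, then report the stream position."""
    seen = set()
    for word in read_words(stream):
        if word in seen:
            break  # the rest of the stream is never read
        seen.add(word)
    return stream.tell()


text = "3 1 4 1 5 9"
consumed = chars_read_until_repeat(io.StringIO(text))
print(consumed, "of", len(text), "characters read")
```

The position stops just past the second `1`, so the remaining numbers are never pulled from the stream.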

After giving this answer, I've updated this method quite a bit. Have a look: https://gist.github.com/smac89/bddb27d975c59a5f053256c893630cdc
