How do I limit (or truncate) a text file by number of lines?


Question


I would like to use a terminal/shell to truncate or otherwise limit a text file to a certain number of lines.

I have a whole directory of text files, for each of which only the first ~50k lines are useful.

How do I delete all lines over 50000?

Solution

In-place truncation

To truncate the file in-place with sed, you can do the following:

sed -i '50001,$ d' filename

  • -i means in place.
  • d means delete.
  • 50001,$ means the lines from 50001 to the end.

You can make a backup of the file by adding an extension argument to -i, for example, .backup or .bak:

sed -i.backup '50001,$ d' filename

On macOS (OS X) or FreeBSD, you must provide an argument to -i, so to truncate without making a backup, pass an empty string:

sed -i '' '50001,$ d' filename

The long argument name version (GNU sed only) is as follows, without and with the backup argument:

sed --in-place '50001,$ d' filename
sed --in-place=.backup '50001,$ d' filename

New File

To create a new truncated file, just redirect from head to the new file:

head -n50000 oldfilename > newfilename

  • -n50000 sets the number of lines to output; without it, head defaults to 10.
  • > means to redirect into, overwriting anything else that might be there.
  • Substitute >> for > if you mean to append into the new file.

Unfortunately, you cannot redirect into the same file: the shell truncates the output file before head reads it, so you would end up with an empty file. This is why sed is recommended for in-place truncation.
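If sed is not an option, the usual workaround for the redirection problem is to write the kept lines to a temporary file and then rename it over the original. Below is a minimal Python sketch of that pattern. The helper name head_in_place is hypothetical, and the sketch does not preserve the original file's permissions or ownership.

```python
import os
import tempfile
from itertools import islice

def head_in_place(filename, lines):
    """Keep only the first `lines` lines of `filename` (hypothetical helper)."""
    directory = os.path.dirname(os.path.abspath(filename))
    # Create the temporary file next to the original so the final
    # rename stays on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, 'wb') as dst, open(filename, 'rb') as src:
            # islice stops after `lines` lines, like head -n.
            dst.writelines(islice(src, lines))
        # Rename the temporary file over the original.
        os.replace(tmp_path, filename)
    except BaseException:
        # Clean up the temporary file if anything went wrong.
        os.unlink(tmp_path)
        raise
```

Because the original is only replaced after the new file is fully written, an interrupted run leaves the original file untouched.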

No sed? Try Python!

This is a bit more typing than sed. sed is short for "stream editor", after all, which is another reason to use it: this is the job the tool is designed for.

This was tested on Linux and Windows with Python 3:

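from collections import deque
from itertools import islice

def truncate(filename, lines):
    """Truncate filename in place, keeping only its first `lines` lines."""
    with open(filename, 'r+') as f:
        # A deque with maxlen=0 discards everything fed to it.
        blackhole = deque((), 0).extend
        # Call f.readline until it returns '' (end of file), so f.tell() stays usable.
        file_iterator = iter(f.readline, '')
        # Read and discard the first `lines` lines.
        blackhole(islice(file_iterator, lines))
        # Cut the file off at the current read position.
        f.truncate(f.tell())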

To explain the Python:

The blackhole works like /dev/null. It's a bound extend method on a deque with maxlen=0, which is the fastest way to exhaust an iterator in Python (that I'm aware of).
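To see the blackhole in isolation, here is a small sketch that is not part of the original answer. It uses a plain range iterator (the name numbers is just for illustration) in place of the file:

```python
from collections import deque
from itertools import islice

numbers = iter(range(10))
blackhole = deque((), 0).extend   # bound method of a deque that stores nothing

blackhole(islice(numbers, 3))     # consumes 0, 1, 2 and discards them
remaining = list(numbers)         # the iterator resumes at 3
print(remaining)                  # prints [3, 4, 5, 6, 7, 8, 9]
```

The same thing happens with the file: the first N lines are read and thrown away, which leaves the file position just past line N.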

We can't simply loop over the file object, because in Python 3 iterating over a text file disables the tell method (it raises an OSError), so we need the iter(f.readline, '') trick.
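The difference can be checked with a short experiment. This is a sketch assuming CPython 3 and a throwaway temporary file; the variable names are illustrative only:

```python
import os
import tempfile

# Write a small throwaway file to experiment on.
fd, path = tempfile.mkstemp(suffix='.txt')
with os.fdopen(fd, 'w') as f:
    f.write('a\nb\nc\n')

# Looping over the file object: tell() is disabled inside the loop.
with open(path) as f:
    for line in f:
        try:
            f.tell()
            tell_blocked = False
        except OSError:
            tell_blocked = True
        break

# Going through readline instead: tell() keeps working.
with open(path) as f:
    first = next(iter(f.readline, ''))
    position = f.tell()

os.remove(path)
print(tell_blocked, repr(first), position)
```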

This function uses a context manager (the with statement) to close the file. That is somewhat superfluous here, since CPython would close the file anyway once the function returns and the file object is garbage-collected. Usage is simply:

>>> truncate('filename', 50000)
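Since the question is about a whole directory of files, the function can be applied in a loop. The following is a rough sketch rather than part of the original answer: truncate_all is a hypothetical helper and the '*.txt' default pattern is an assumption about how the files are named.

```python
from collections import deque
from itertools import islice
from pathlib import Path

def truncate(filename, lines):
    # Same approach as the function above, in compact form.
    with open(filename, 'r+') as f:
        deque((), 0).extend(islice(iter(f.readline, ''), lines))
        f.truncate(f.tell())

def truncate_all(directory, lines, pattern='*.txt'):
    """Truncate every file in `directory` matching `pattern`; return the paths processed."""
    processed = []
    for path in sorted(Path(directory).glob(pattern)):
        if path.is_file():  # skip subdirectories that happen to match
            truncate(path, lines)
            processed.append(path)
    return processed
```

Calling truncate_all('some_directory', 50000) would then shorten each matching file in place. There is no backup step, so it is worth trying it on a copy of the directory first.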
