使用剪切在bash上的文件具有唯一deliminter [英] Using cut in bash on a file with a unique deliminter

查看:265
本文介绍了使用剪切在bash上的文件具有唯一deliminter的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

剪切在bash用于与¬定界符?

Can cut be used in bash with the ¬ delimiter?

这问题是涉及的话题<一的延伸href=\"http://stackoverflow.com/questions/492090/least-used-delimiter-character-in-normal-text-ascii-128\">here.在该链路的目标之一间pretation是使用不能在人类文本中找到(或很少发现)的分隔符。假设我们选择'不签(¬)作为分隔符。我的问题是关于使用剪切的拉动与文件中的特定列说分隔符。

This question is an extension of the topic covered here. One interpretation of the goal in that link is to use a delimiter that can not be found (or very rarely found) in human text. Say we choose the 'Not Sign' (¬) as a delimiter. My question is regarding the use of cut to pull specific columns of a file with said delimiter.

举例来说,假设我们创建一个¬分隔符的文件。该文件prac.txt可能类似于:

For example, say that we create a file with the ¬ delimiter. The file prac.txt might look like:

$cat prac.txt
"Billy""Car"¬"Red"¬"Garage"¬"3"
"Rob"¬"Truck"¬"Blue"¬"Street"¬"14" 

下面的过程中产生的错误:

The following process produces an error:

$cut -d'¬' -f1 prac.txt  
cut: the delimiter must be a single character
Try `cut --help' for more information.

正确的输出是:

"Billy"
"Rob"

从蟒蛇可能有用的信息:

Possibly useful info from python:

import unicodedata
>>>unicodedata.lookup('Not sign')
u'\xac'

可能有用的字符转换链接

我的猜测是, -d 标志使用,我还没有尝试过¬的一些重新presentation否则它只能与一个ascii字符工作。感谢您事先的任何帮助。

My guess is that the -d flag uses some representation of '¬' that I have not tried yet or else it only works with single ascii characters. Thanks in advance for any help.

推荐答案

在UTF-8,不签是连接两个字节codeD C2 AC 。和剪切不解决这个问题,这无疑是一个错误。请参见这个讨论上unix.stackexchange。

In UTF-8, the "not sign" is encoded in two bytes c2 ac. and cut doesn't handle this, which is arguably a bug. See this discussion on unix.stackexchange.

这篇关于使用剪切在bash上的文件具有唯一deliminter的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆