使用剪切在bash上的文件具有唯一deliminter [英] Using cut in bash on a file with a unique deliminter
问题描述
能剪切
在bash用于与¬
定界符?
Can cut
be used in bash with the ¬
delimiter?
这问题是涉及的话题<一的延伸href=\"http://stackoverflow.com/questions/492090/least-used-delimiter-character-in-normal-text-ascii-128\">here.在该链路的目标之一间pretation是使用不能在人类文本中找到(或很少发现)的分隔符。假设我们选择'不签(¬
)作为分隔符。我的问题是关于使用剪切
的拉动与文件中的特定列说分隔符。
This question is an extension of the topic covered here. One interpretation of the goal in that link is to use a delimiter that can not be found (or very rarely found) in human text. Say we choose the 'Not Sign' (¬
) as a delimiter. My question is regarding the use of cut
to pull specific columns of a file with said delimiter.
举例来说,假设我们创建一个¬
分隔符的文件。该文件prac.txt可能类似于:
For example, say that we create a file with the ¬
delimiter. The file prac.txt might look like:
$cat prac.txt
"Billy""Car"¬"Red"¬"Garage"¬"3"
"Rob"¬"Truck"¬"Blue"¬"Street"¬"14"
下面的过程中产生的错误:
The following process produces an error:
$cut -d'¬' -f1 prac.txt
cut: the delimiter must be a single character
Try `cut --help' for more information.
正确的输出是:
"Billy"
"Rob"
从蟒蛇可能有用的信息:
Possibly useful info from python:
import unicodedata
>>>unicodedata.lookup('Not sign')
u'\xac'
可能有用的字符转换链接。
我的猜测是, -d
标志使用,我还没有尝试过¬的一些重新presentation否则它只能与一个ascii字符工作。感谢您事先的任何帮助。
My guess is that the -d
flag uses some representation of '¬' that I have not tried yet or else it only works with single ascii characters. Thanks in advance for any help.
推荐答案
在UTF-8,不签是连接两个字节codeD C2 AC
。和剪切
不解决这个问题,这无疑是一个错误。请参见这个讨论上unix.stackexchange。
In UTF-8, the "not sign" is encoded in two bytes c2 ac
. and cut
doesn't handle this, which is arguably a bug. See this discussion on unix.stackexchange.
这篇关于使用剪切在bash上的文件具有唯一deliminter的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!