sort: string comparison failed: Invalid or incomplete multibyte or wide character


Problem description



I'm trying to use the following command on a text file:

$ sort <m.txt | uniq -c | sort -nr >m.dict 

However I get the following error message:

sort: string comparison failed: Invalid or incomplete multibyte or wide character
sort: Set LC_ALL='C' to work around the problem.
sort: The strings compared were ‘enwedig\r’ and ‘mwy\r’.

I'm using Cygwin on Windows 7 and was having trouble earlier editing m.txt to put each word within the file on a new line. Please see:

Using AWK to place each word in a text file on a new line

I'm not sure whether I'm getting these errors because of this, or because m.txt contains characters from the Welsh alphabet (when I was working with Welsh text in Python, I had to change the encoding to 'Latin-1').

I tried following the error message's advice and set LC_ALL='C', but this has not helped. Can anyone explain the errors I'm receiving and give any advice on how to solve this problem?

UPDATE:

When I tried dos2unix, it reported errors about invalid characters on certain lines. It turns out these were not Welsh characters but other strange characters (arrows, etc.). I went through my text file removing these characters until dos2unix ran without errors. However, after running dos2unix, all the text was concatenated (no spaces, newlines or anything), whereas each word in the file should have been on a separate line. I then used unix2dos and the text file was back to normal. How can I put each word on its own line and use the sort command without it giving me errors about '\r' characters?

Solution

Looks like a Windows line-ending related problem (\r\n versus \n). You can convert m.txt to Unix line-endings with

dos2unix m.txt

and then rerun your command.
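If you would rather not rewrite m.txt in place, a similar effect can be had inside the pipeline itself. The following is a minimal sketch, assuming a POSIX shell (such as Cygwin's bash) with the usual coreutils and that words are separated by whitespace; `word_freq` is a hypothetical helper name, not part of the original answer. It deletes carriage returns with tr, puts one word per line, and runs the whole pipeline in the C locale, as the error message suggests, so the comparisons work on raw bytes. Note that a bare LC_ALL='C' assignment in bash only reaches child processes such as sort if LC_ALL is exported; that may be (an assumption, since the question does not show how it was set) why the earlier attempt appeared to have no effect. Even in the C locale, the '\r' bytes would otherwise still end up in m.dict.

```shell
#!/bin/sh
# word_freq is a hypothetical helper, not part of the original answer.
# It prints "count word" lines, most frequent first, for the file given as $1,
# without rewriting the file. The ( ... ) body runs in a subshell, so the
# exported LC_ALL applies only to the commands inside it.
word_freq() (
    # Export LC_ALL so child processes (tr, sed, sort, uniq) see the C locale;
    # a bare LC_ALL='C' assignment would not reach them unless LC_ALL
    # was already exported.
    export LC_ALL=C
    tr -d '\r' < "$1" |            # delete every carriage return (CRLF -> LF)
        tr -s '[:space:]' '\n' |   # one word per line, squeezing whitespace runs
        sed '/^$/d' |              # drop a leading empty line, if any
        sort |                     # byte-wise sort in the C locale
        uniq -c |                  # prefix each distinct word with its count
        sort -nr                   # most frequent first
)

# Usage, mirroring the question's command:
# word_freq m.txt > m.dict
```

Running everything in one subshell keeps the locale change local to this command, so the rest of the shell session keeps its normal locale.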
