sort: string comparison failed: Invalid or incomplete multibyte or wide character

Views: 3034   Published: 2017/3/9 21:27:09   Tags: string, sorting, unix, cygwin, uniq


Problem description

 
I'm trying to use the following command on a text file:

  $ sort < m.txt | uniq -c | sort -nr > m.dict

However, I get the following error message:

  sort: string comparison failed: Invalid or incomplete multibyte or wide character
  sort: Set LC_ALL='C' to work around the problem.
  sort: The strings compared were ‘enwedig\r’ and ‘mwy\r’.

I'm using Cygwin on Windows 7 and was having trouble earlier editing m.txt to put each word within the file on a new line. Please see:

Using AWK to place each word in a text file on a new line

I'm not sure whether I'm getting these errors because of this, or because m.txt contains characters from the Welsh alphabet (when I was working with Welsh text in Python, I was required to change the encoding to 'Latin-1').

I tried following the error message's advice and setting LC_ALL='C', but this has not helped. Can anyone elaborate on the errors I'm receiving and suggest how I might go about solving this problem?

UPDATE:

When I tried dos2unix, errors were displayed about invalid characters at certain lines. It turns out these were not Welsh characters, but other strange characters (arrows etc.). I went through my text file removing these characters until I was able to run dos2unix without error. However, after running dos2unix all the text was concatenated (no spaces, newlines or anything, whereas each word in the file should have been on a separate line). I then ran unix2dos and the text file was back to normal. How can I put each word on its own line and use the sort command without it giving me errors about '\r' characters?
Solution
This looks like a Windows line-ending problem (\r\n versus \n). You can convert m.txt to Unix line endings with

  $ dos2unix m.txt

and then rerun your command.
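
If dos2unix is not convenient, the same idea can be sketched with tr, which deletes the carriage returns on the way into the pipeline. This is only an illustration, assuming standard tr, sort and uniq as shipped with Cygwin; sample.txt and sample.dict are made-up names standing in for m.txt and m.dict. LC_ALL=C is placed in front of each sort so that it applies to that process; an assignment made earlier in the shell without export may not reach sort at all.

```shell
# Build a small sample with Windows (CRLF) line endings.
# sample.txt is a hypothetical stand-in for m.txt.
printf 'mwy\r\nenwedig\r\nmwy\r\n' > sample.txt

# Delete every carriage return, then count and rank the words.
# LC_ALL=C makes sort compare raw bytes instead of using the
# current locale's multibyte collation rules.
tr -d '\r' < sample.txt | LC_ALL=C sort | uniq -c | LC_ALL=C sort -nr > sample.dict

# Show the resulting frequency list.
cat sample.dict
```

The input file is left untouched, so it still opens normally in Windows editors that expect \r\n line endings.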
