How to remove non-ASCII characters and append a space in the field where the non-ASCII characters were, using a Perl one-liner?

Views: 516 | Posted: 2018/2/4 11:40:12 | Tags: regex, perl, formatting


Problem description

Hi Stack Overflow community, 


I have the following problem.  

I got this file called bad, with the following contents:
SPAM EATER       PO BOX 5555          FAKE STREET
FOO BAR          Ã¬PO BOX 1234         LOLLERCOASTER VILLAGE
LOL MAN          PO BOX 9876          NEXT DOOR
I want to remove the non-ASCII character from it (at the start of the second column of the second record), in order to get a file free of strange characters and with all its columns aligned.  Plus, there's a requirement to achieve this using a Perl one-liner, so no awk, sed, or similar commands can be used.  I tried the following, but the third column ended up one space to the left of where it should be:
$ perl -plne 's/[^[:ascii:]]//g' bad > bad.clean

$ cat bad.clean
SPAM EATER       PO BOX 5555          FAKE STREET
FOO BAR          PO BOX 1234         LOLLERCOASTER VILLAGE
LOL MAN          PO BOX 9876          NEXT DOOR
I also tried the same one-liner, but this time replacing the non-ASCII character with a space.  In this case, the record ended up with two extra spaces in the second column and one extra space in the third:
$ perl -plne 's/[^[:ascii:]]/ /g' bad > bad.clean.space

$ cat bad.clean.space
SPAM EATER       PO BOX 5555          FAKE STREET
FOO BAR            PO BOX 1234         LOLLERCOASTER VILLAGE
LOL MAN          PO BOX 9876          NEXT DOOR
Somehow, the non-ASCII character seems to be taking 2 bytes instead of one. Is this correct, or am I missing something?
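The two-byte observation is consistent with a UTF-8 file. A minimal sketch, assuming the stray character is U+00EC ("ì"), whose UTF-8 encoding is the bytes 0xC3 0xAC (which render as "Ã¬" when shown as Latin-1): without decoding, Perl works on raw bytes, so [^[:ascii:]] matches twice; after decoding with the core Encode module, the same data is a single character. The sample string is made up for illustration. If the file was padded by character count, this would also explain the alignment: in the sample, the affected field is 21 characters but 22 bytes wide, so deleting both bytes leaves it one position short.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Assumption: the file is UTF-8 and the stray character is U+00EC ("ì"),
# stored as the two bytes 0xC3 0xAC.
my $bytes = "\xC3\xACPO BOX 1234";

# Undecoded, each byte is its own "character": both are outside ASCII,
# so the class matches twice.
my $byte_hits = () = $bytes =~ /[^[:ascii:]]/g;

# Decoded from UTF-8, the two bytes become one non-ASCII character.
my $chars     = decode('UTF-8', $bytes);
my $char_hits = () = $chars =~ /[^[:ascii:]]/g;

printf "bytes: length %d, non-ASCII matches %d\n", length($bytes), $byte_hits;
printf "chars: length %d, non-ASCII matches %d\n", length($chars), $char_hits;
```

Run as-is, this should report a length of 13 with 2 matches for the raw bytes, and a length of 12 with 1 match after decoding.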

The expected output is this:
SPAM EATER       PO BOX 5555          FAKE STREET
FOO BAR          PO BOX 1234          LOLLERCOASTER VILLAGE
LOL MAN          PO BOX 9876          NEXT DOOR
Is there a way, using a Perl one-liner, to get the expected result?  I was thinking of adding one space, after removing the non-ASCII character, to the field in which the change was made, but I can't find a way to do it.  In addition, the non-ASCII character can appear in any field, not only the second one.

By the way, some info that might be useful:  This is an AIX machine, running Perl v5.8.8.

Thank you!



Edit:

As @ThisSuitIsBlackNot mentions, there are two non-ASCII characters.  Therefore, I guess I just want to add one space to the end of that field if at least one non-ASCII character gets removed by the command.  Is there a way to include this extra space in the same substitution, so it can still be done as a one-liner?



Edit:

After reviewing a large set of data, I can tell that the non-ASCII characters always appear in pairs, and the next field in the original file (before running the one-liner) is always one space to the right compared to the same column in the other records.  So, I'm changing the title of this question to match the requirement:  Perl one-liner to remove non-ASCII characters and append a space in the field where the non-ASCII characters were
Solution
Remove the pair of non-ASCII bytes and add one space after the field.

The regex uses the non-ASCII pair and a run of 3 spaces as the opening and closing delimiters of the field.
 #  s/[^[:ascii:]]{2}(.*?[ ]{3})/$1 /g

 [^[:ascii:]]{2} 
 ( .*? [ ]{3} )
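The same substitution can be written with the /x modifier so each part carries a comment. A sketch, with a hypothetical helper name (fix_line) and a sample record rebuilt from the question's data; it assumes, as the question's last edit states, that the non-ASCII bytes always come in pairs and that columns are separated by runs of 3 or more spaces. Because "." never matches a newline, the substitution should also work line by line, which gives the one-liner the question asked for:
$ perl -pe 's/[^[:ascii:]]{2}(.*?[ ]{3})/$1 /g' bad > bad.clean

```perl
use strict;
use warnings;

# Hypothetical helper (not from the answer): applies the answer's
# substitution to one line. Assumes non-ASCII bytes come in pairs and
# columns are separated by runs of 3+ spaces.
sub fix_line {
    my ($line) = @_;
    $line =~ s{
        [^[:ascii:]]{2}    # the pair of non-ASCII bytes opening the field
        ( .*? [ ]{3} )     # capture: rest of the field through the first 3 spaces
    }{$1 }gx;              # keep the capture, drop the pair, pad with one space
    return $line;
}

# Sample record built from the question's data (0xC3 0xAC is the pair).
my $line = "FOO BAR" . (' ' x 10) . "\xC3\xACPO BOX 1234" . (' ' x 9)
         . "LOLLERCOASTER VILLAGE";
print fix_line($line), "\n";
```

Under /x, whitespace inside a bracketed class such as [ ] is still literal, which is why the 3-space delimiter survives the extended formatting.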
Perl test case  
use strict;
use warnings;

local $/;    # slurp mode: read all of DATA into one string
my $str = <DATA>;
$str =~ s/[^[:ascii:]]{2}(.*?[ ]{3})/$1 /g;
print $str;

__DATA__
SPAM EATER       PO BOX 5555          FAKE STREET
FOO BAR          Ã¬PO BOX 1234         LOLLERCOASTER VILLAGE
LOL MAN          PO BOX 9876          NEXT DOOR
Output >>  
SPAM EATER       PO BOX 5555          FAKE STREET
FOO BAR          PO BOX 1234          LOLLERCOASTER VILLAGE
LOL MAN          PO BOX 9876          NEXT DOOR
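The pattern above only fires when the field is followed by a run of three spaces, so a pair in the last column of a line (the question says the characters can appear in any field) would be left in place. A hedged variant, assuming an extra trailing space is acceptable on such lines, also accepts the end of the line as the closing delimiter; as a one-liner it would read:
$ perl -pe 's/[^[:ascii:]]{2}(.*?)([ ]{3}|$)/$1$2 /g' bad > bad.clean
The helper name fix_line_any_field and the sample lines below are hypothetical, built from the question's data.

```perl
use strict;
use warnings;

# Hypothetical variant of the answer's substitution: a field may end at a
# run of 3 spaces (as before) OR at the end of the line, so a non-ASCII
# pair in the last column is also removed and padded with one space.
# Assumption: a trailing space on such lines is acceptable.
sub fix_line_any_field {
    my ($line) = @_;
    # $2 is either the 3-space delimiter or an empty end-of-line match.
    $line =~ s/[^[:ascii:]]{2}(.*?)([ ]{3}|$)/$1$2 /g;
    return $line;
}

my @samples = (
    "FOO BAR" . (' ' x 10) . "\xC3\xACPO BOX 1234" . (' ' x 9) . "LOLLERCOASTER VILLAGE\n",
    "LOL MAN" . (' ' x 10) . "PO BOX 9876" . (' ' x 10) . "\xC3\xACNEXT DOOR\n",
);
print fix_line_any_field($_) for @samples;
```

Inside a pattern, Perl does not interpolate "$)" as a variable, so the end-of-line alternative is safe to write this way.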


                        