Skip/remove non-ASCII characters with sed

Question

Chip,Dirkland,DrobæSphere Inc,cdirkland@hotmail.com,usa

I've been trying to use sed to modify email addresses in a .csv file, but the line above keeps tripping me up. Using a command like:

sed -i 's/[\d128-\d255]//' FILENAME

(from this Stack Overflow question) doesn't seem to work, as I get an 'invalid collation character' error.

Ideally I don't want to change that combined AE character at all. I'd rather sed just skipped right over it, since I'm not trying to manipulate that text, only the email addresses. As long as that AE is in there, though, my sed substitution fails after one line. If I delete the character, sed processes the whole file fine.

Any ideas?

Recommended answer

This might work for you (GNU sed):

echo "Chip,Dirkland,DrobæSphere Inc,cdirkland@hotmail.com,usa" |
sed 's/\o346/a+e/g'
Chip,Dirkland,Droba+eSphere Inc,cdirkland@hotmail.com,usa

Then do whatever you need to do, and afterwards revert the substitution:

echo "Chip,Dirkland,Droba+eSphere Inc,cdirkland@hotmail.com,usa" | 
sed 's/a+e/\o346/g'
Chip,Dirkland,DrobæSphere Inc,cdirkland@hotmail.com,usa
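
The two substitutions above can be chained around the actual edit. What follows is only a minimal sketch, assuming GNU sed and Latin-1 (ISO-8859-1) input in which æ is the single byte octal 346. The function name edit_emails and the example.com replacement are hypothetical, and LC_ALL=C is an addition here so that the first and last sed treat the input as raw bytes whatever the current locale is.

```shell
#!/bin/sh
# Sketch: protect the byte \346, edit the email domain, then restore the byte.
# Assumes GNU sed (for the \o346 escape). edit_emails is a hypothetical name.
edit_emails() {
    # Stage 1: swap the byte for an ASCII placeholder.
    # Stage 2: the actual edit (illustrative only), now on pure ASCII text.
    # Stage 3: put the original byte back.
    LC_ALL=C sed 's/\o346/a+e/g' |
        sed 's/@hotmail\.com/@example.com/' |
        LC_ALL=C sed 's/a+e/\o346/g'
}

# printf emits the raw byte, so this does not depend on the terminal encoding.
printf 'Chip,Dirkland,Drob\346Sphere Inc,cdirkland@hotmail.com,usa\n' | edit_emails
```

Note that the placeholder has to be a string that cannot already occur in the data: any literal a+e in the input would also be turned into the byte \346 by the last stage.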

If you have tricky characters in your strings and want to understand how sed sees them, use the l0 command (see here). It is also very useful for debugging difficult regexps.

echo "Chip,Dirkland,DrobæSphere Inc,cdirkland@hotmail.com,usa" | 
sed -n 'l0'
Chip,Dirkland,Drob\346Sphere Inc,cdirkland@hotmail.com,usa$
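
Which escape to match depends on how the file is encoded. As a hedged illustration (assuming GNU sed, with LC_ALL=C added here so that every byte is listed individually), the same inspection on UTF-8 input shows æ as two bytes. In that case \o346 alone would not match, and both bytes have to be named:

```shell
#!/bin/sh
# æ is U+00E6: one byte (octal 346) in Latin-1, two bytes (octal 303 246) in UTF-8.
# printf writes the raw bytes, independent of the terminal encoding.

# Latin-1 input: a single byte.
printf 'Drob\346Sphere\n' | LC_ALL=C sed -n 'l0'
# prints: Drob\346Sphere$

# UTF-8 input: two bytes.
printf 'Drob\303\246Sphere\n' | LC_ALL=C sed -n 'l0'
# prints: Drob\303\246Sphere$

# Matching the UTF-8 form therefore needs both bytes.
printf 'Drob\303\246Sphere\n' | LC_ALL=C sed 's/\o303\o246/a+e/g'
# prints: Droba+eSphere
```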
