Regex: remove digits and - at the beginning

Question

I'm processing a list of strings, and I want to clean them up so they don't look ugly to the user. An example list would be:

2736162 Magazines
23-2311 Numbers
1-38122 Faces
5-231123 Newspapers
31-31235 Armynews
33-12331 Celebrities 1
33-22113 Celebrities 2
Cars
Glasses

What I want is to trim the beginning so that the ugly sequence of digits and "-" is left out and the user only sees the part that makes sense, like:

Magazines
Numbers
Faces
Newspapers
Armynews
Celebrities 1
Celebrities 2
Cars
Glasses

How would I trim the leading digits and dashes with a regex?

EDIT: Would it be possible to design the same regex to also strip these values from:

FFKKA9101U- Aquatic Environmental Chemistry
FLVKB0381U- Clinical Drug Development
4761-F-Filosofisk kulturkritik
B22-1U-Dynamic biochemistry

to:

Aquatic Environmental Chemistry
Clinical Drug Development
Filosofisk kulturkritik
Dynamic biochemistry

The rule I have in mind is that if there are only capital letters, digits, and - or + signs before a -, that part only makes sense to the machine and is not an actual word, so it should be stripped out. I don't know how to express this in a regex, though.

Recommended answer

It looks like you can match and replace ^[\d-]*\s* with the empty string.

The […] is a character class. Something like [aeiou] matches one of any of the lowercase vowels. \d is the shorthand for the digit character class, so [\d-] matches either a digit or a dash. The \s is the shorthand for the whitespace character class.

The ^ is the beginning of the line anchor. The * is "zero-or-more" repetition.

Thus the pattern matches, at the beginning of a line, a sequence of digits or dashes, followed by a sequence of whitespace characters.
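
As a minimal sketch of the pattern applied one line at a time (so no multiline flag is involved), the following Java snippet may help. The class name LeadingDigitsDemo and the helper strip are hypothetical names chosen for illustration; they are not part of the original answer.

```java
import java.util.List;
import java.util.regex.Pattern;

class LeadingDigitsDemo {
    // ^       start of the input (each string here is a single line)
    // [\d-]*  zero or more digits or dashes
    // \s*     zero or more whitespace characters
    // Backslashes are doubled because the pattern is a Java string literal.
    static final Pattern PREFIX = Pattern.compile("^[\\d-]*\\s*");

    // Hypothetical helper: removes the matched prefix from one line.
    static String strip(String line) {
        return PREFIX.matcher(line).replaceFirst("");
    }

    public static void main(String[] args) {
        for (String line : List.of("23-2311 Numbers", "33-12331 Celebrities 1", "Cars")) {
            System.out.println(strip(line));
        }
    }
}
```

Because every part of the pattern is allowed to match nothing, a line such as "Cars" simply yields an empty match and is returned unchanged.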

It's not clear from the question, but if the input is a multiline text (instead of applying the regex one line at a time), then you'd want to enable the multiline mode as well.

Here's an example snippet in C#:

using System;
using System.Text.RegularExpressions;

var text = @"
2736162 Magazines
23-2311 Numbers
1-38122 Faces
5-231123 Newspapers
31-31235 Armynews
33-12331 Celebrities 1
33-22113 Celebrities 2
Cars
Glasses
";

Console.WriteLine(
  Regex.Replace(
     text,
     @"^[\d-]*\s*",
     "",
     RegexOptions.Multiline
  )
);

The output is (as seen on ideone.com):

Magazines
Numbers
Faces
Newspapers
Armynews
Celebrities 1
Celebrities 2
Cars
Glasses

Depending on flavor, you may have to specify the multiline mode as a /m flag (or (?m) embedded). You may also have to double the backslash if you're representing the pattern as a string literal, e.g. in Java you can use text.replaceAll("(?m)^[\\d-]*\\s*", "").
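
To make the Java variant concrete, here is a small runnable sketch that applies the pattern to a whole multiline string, once with the embedded (?m) flag and once with Pattern.MULTILINE. The class name MultilineDemo and both method names are hypothetical, and the sample text is a shortened version of the list from the question.

```java
import java.util.regex.Pattern;

class MultilineDemo {
    // (?m) makes ^ match at the start of every line, not only at the start of the input.
    static String stripEmbeddedFlag(String text) {
        return text.replaceAll("(?m)^[\\d-]*\\s*", "");
    }

    // Same pattern, with multiline mode passed as a compile flag instead.
    static String stripCompileFlag(String text) {
        return Pattern.compile("^[\\d-]*\\s*", Pattern.MULTILINE)
                      .matcher(text)
                      .replaceAll("");
    }

    public static void main(String[] args) {
        String text = "2736162 Magazines\n23-2311 Numbers\nCars\nGlasses";
        System.out.println(stripEmbeddedFlag(text));
        System.out.println(stripCompileFlag(text));
    }
}
```

One caveat with this pattern: \s also matches line breaks, so a line consisting only of digits and dashes would be removed together with its line break. If that matters for your data, replacing \s* with [ \t]* is one way to keep the match on a single line.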

Do be careful when including the - inside a […] character class, since it can signify a range instead of a literal - character. Something like [a-z] matches a lowercase letter. Something like [az-] matches either 'a', 'z', or '-'.
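
The answer above does not cover the EDIT in the question. One possible way to express that rule, offered only as a sketch, is an alternation: either digits and dashes followed by whitespace, or a run of capital letters, digits, + and - that ends in a dash. The pattern, the class name CodePrefixDemo and the helper strip are assumptions made for illustration, not something taken from the original answer. Note that the - is written last in each character class so that it is a literal dash rather than a range.

```java
import java.util.List;
import java.util.regex.Pattern;

class CodePrefixDemo {
    // Alternative 1: [\d-]+\s+        digits/dashes followed by whitespace, e.g. "23-2311 "
    // Alternative 2: [A-Z0-9+-]*-\s*  capitals, digits, '+' or '-' ending in a dash, e.g. "B22-1U-"
    // This pattern is an assumption about what the asker wants, not a verified rule.
    static final Pattern PREFIX =
        Pattern.compile("^(?:[\\d-]+\\s+|[A-Z0-9+-]*-\\s*)");

    // Hypothetical helper: removes a machine-readable prefix from one line.
    static String strip(String line) {
        return PREFIX.matcher(line).replaceFirst("");
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "FFKKA9101U- Aquatic Environmental Chemistry",
            "4761-F-Filosofisk kulturkritik",
            "B22-1U-Dynamic biochemistry",
            "23-2311 Numbers",
            "Celebrities 1");
        for (String line : lines) {
            System.out.println(strip(line));
        }
    }
}
```

This rule is a heuristic and has false positives: a real title that starts with a capital letter and a dash, such as "X-ray imaging", would lose its "X-". Whether that is acceptable depends on the data.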
