How do I extract PART of a specific string from a huge chunk of ugly strings?


Question

I have a variable that holds the entire source of a web page. It's a large string with lots of words, strings, special characters, etc.

I want to go through this variable and extract the ticket number, which appears after tickets/ and before .json. In the following case, my list would have only one item, the value 15.

https://company.zendesk.com/api/v2/tickets/15.json

The web page will contain many links like this one, scattered among lots of other text. In the following case, my list would have 2 items, the values 19 and 20.

https://company.zendesk.com/api/v2/tickets/19.json blahblahblajlkdfjfaiofjd3289239lkdj
2398283j;lkjfe89j2pefj2efljefkj
https://company.zendesk.com/api/v2/tickets/20.json blah blhahblbahlhkaldk

How would I go about extracting JUST the ticket numbers from these links in this huge string and putting them into a list?

Would I use Regex? I'm not really sure how I'd approach this.

By the way, there is no format to this page. It's not like it's an XML doc or anything.

Thanks!

Recommended answer

Something like this should get you started:

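        using System;
        using System.Text.RegularExpressions;

        // Dots are escaped so that "." matches only a literal period.
        string pattern = @"https://company\.zendesk\.com/api/v2/tickets/\d+\.json";
        Regex regex = new Regex(pattern);
        MatchCollection mc = regex.Matches("input string here");

        // Each match is one complete ticket URL; print one per line.
        foreach (Match m in mc)
        {
            Console.WriteLine(m.Value);
        }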

@"https://company\.zendesk\.com/api/v2/tickets/\d+\.json";

Take note of three parts of the pattern: the @, the \d, and the +. The @ makes it a verbatim string literal, so you don't have to double-escape your \. The \d is a stand-in for any digit. The + means the preceding element occurs 1 or more times; * would mean that it occurs 0 or more times. Also note that an unescaped . matches any character, so write \. when you mean a literal period.
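As a small illustration of those pieces, the sketch below compares a verbatim literal with a regular one and checks how + and * behave when no digits are present. The shortened patterns and sample inputs are made up for the example and are not from the original page.

```csharp
using System;
using System.Text.RegularExpressions;

// A verbatim (@) literal and a regular literal with doubled backslashes
// contain exactly the same pattern text.
string verbatim = @"tickets/\d+";
string escaped = "tickets/\\d+";
Console.WriteLine(verbatim == escaped);

// \d+ requires at least one digit; \d* also accepts zero digits.
bool plusNoDigits = Regex.IsMatch("tickets/.json", @"tickets/\d+\.json");
bool starNoDigits = Regex.IsMatch("tickets/.json", @"tickets/\d*\.json");
bool plusDigits = Regex.IsMatch("tickets/15.json", @"tickets/\d+\.json");

Console.WriteLine($"{plusNoDigits} {starNoDigits} {plusDigits}");
```

For ticket URLs, + is the safer choice, since a link with no number at all is probably not one you want in the list.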

Here's a reference on how you can further customize the pattern: http://msdn.microsoft.com/en-us/library/az24scfc.aspx

To get just the ticket numbers, you can put the \d+ in parentheses:
@"https://company\.zendesk\.com/api/v2/tickets/(\d+)\.json"

Each match will then have a property called Groups, and your ticket number will be one of those groups:

            Console.Write(m.Groups[1].Value);

Groups[0] always holds the full match, and Groups[1] holds whatever the first pair of parentheses captured, which here is the ticket number. So there is no need to filter the groups by string length or with another regex; index 1 gives you the number directly.
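Putting the pieces together, a minimal sketch that collects the ticket numbers into a list might look like the following. The helper name ExtractTicketNumbers is hypothetical, the sample text is taken from the question, and the code assumes every ticket number is made of plain ASCII digits and fits in an int.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: returns every ticket number found in pageSource.
static List<int> ExtractTicketNumbers(string pageSource)
{
    // The parentheses capture only the digits between "tickets/" and ".json".
    const string pattern = @"https://company\.zendesk\.com/api/v2/tickets/(\d+)\.json";
    var ticketNumbers = new List<int>();

    foreach (Match m in Regex.Matches(pageSource, pattern))
    {
        // Groups[0] is the whole URL; Groups[1] is the first capture group.
        // Assumption: the captured digits are ASCII and fit in an int.
        ticketNumbers.Add(int.Parse(m.Groups[1].Value));
    }

    return ticketNumbers;
}

string sample =
    "https://company.zendesk.com/api/v2/tickets/19.json blahblahblajlkdfjfaiofjd3289239lkdj\n" +
    "2398283j;lkjfe89j2pefj2efljefkj\n" +
    "https://company.zendesk.com/api/v2/tickets/20.json blah blhahblbahlhkaldk";

List<int> tickets = ExtractTicketNumbers(sample);
Console.WriteLine(string.Join(", ", tickets)); // prints "19, 20"
```

If the ticket numbers could be very large, keeping them as strings (a List<string> of m.Groups[1].Value) avoids the parsing step entirely.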
