How to extract specific text/numeric values from a string accurately?


Problem description

I fetched some HTML content from an exchange-rate website and split it so that a string holds everything from "USD -US Dollar" onwards. Now I want to read the exchange rates (126.0, 125.2739, 130.0) from the string below into separate variables. I tried the IndexOf method, but because the same tags repeat throughout the HTML content, it is hard to extract these values.

Can someone help me with this? What I need is to extract the numeric values (126.0, 125.2739, 130.0) under the USD -US Dollar label. Thanks in advance!

<b>USD -US Dollar</B></td>
<td bgcolor='gold' align='right' width='15%'>126.0</td>
<td bgcolor='gold' align='right' width='20%'>125.2739</td>
<td bgcolor='gold' align='right' width='20%'>130.0</td>
</tr>
</body>
</table>
</html>

Recommended answer

Use the following Regex:
public static Regex regex = new Regex(
    @"\<td.*\>(?<num>.*)\<\/td\>",
    RegexOptions.IgnoreCase
    | RegexOptions.Multiline
    | RegexOptions.IgnorePatternWhitespace
    | RegexOptions.Compiled
    );


Each match then exposes the captured text through a named group called num.
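
As a minimal sketch of how the named group might be read back, the snippet below loops over the matches and takes Groups["num"] from each one. The class name NamedGroupDemo, the method ReadNums, and the one-line sample string are illustrative assumptions, not part of the original answer. The pattern also narrows .* to [^<>]* so that a single match cannot run across several tags; that narrowing is an assumption made for this sketch.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupDemo
{
    // Sample input (assumption): the three cells from the question, joined on one line.
    private const string Html =
        "<td bgcolor='gold' align='right' width='15%'>126.0</td>" +
        "<td bgcolor='gold' align='right' width='20%'>125.2739</td>" +
        "<td bgcolor='gold' align='right' width='20%'>130.0</td>";

    // Named group "num" captures the text between <td ...> and </td>.
    // [^<>]* is used instead of .* so one match stays inside one cell.
    private static readonly Regex CellRegex = new Regex(
        @"<td[^<>]*>(?<num>[^<>]*)</td>",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Returns the value of the "num" group for every match, in document order.
    public static string[] ReadNums(string html)
    {
        MatchCollection matches = CellRegex.Matches(html);
        var result = new string[matches.Count];
        for (int i = 0; i < matches.Count; i++)
        {
            result[i] = matches[i].Groups["num"].Value;
        }
        return result;
    }

    public static void Main()
    {
        foreach (string num in ReadNums(Html))
        {
            Console.WriteLine(num);
        }
    }
}
```

Reading the group by name rather than by index keeps the calling code unchanged if more groups are later added to the pattern.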


Solution 1 given by Mehdi Gholam is excellent. The only issue is that the .* in the regex is greedy and matches everything, including < and >, so a single match can swallow several cells and we get only the last number, i.e. 130.0. So .* has to be replaced by [^<>]*.
Further, after capturing all the numbers with Matches, the captured number is Groups[1] in each match, because the pattern has only one capture group. So Groups[2] and Groups[3], which the OP mentioned in a comment, may not give the correct result.
Hence, the following code can be used to set the values as required.
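// comp is assumed to hold the HTML string shown in the question.
Regex regex = new Regex(@"\<td[^<>]*\>([^<>]*)\<\/td\>", RegexOptions.IgnoreCase | RegexOptions.Multiline);
var matches = regex.Matches(comp);
// Declared outside the if block so the values remain usable afterwards.
string v = null, r = null, s = null;
if (matches.Count == 3)
{
    // Groups[1] is the text between <td ...> and </td> in each match.
    v = matches[0].Groups[1].Value;
    r = matches[1].Groups[1].Value;
    s = matches[2].Groups[1].Value;
}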
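
Since the question mentions that the same tags repeat across the page, one possible refinement is to cut out only the table row that contains the "USD -US Dollar" label before applying the regex, and then parse the captured text into numbers. The sketch below is one way to do that under stated assumptions: the names UsdRateExtractor and ExtractRates are hypothetical, the EUR row in the sample is made-up data used only to show that other rows are ignored, and the row is assumed to end at the next </tr>.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text.RegularExpressions;

public static class UsdRateExtractor
{
    // Matches a <td> cell whose content is a plain decimal number.
    private static readonly Regex CellRegex = new Regex(
        @"<td[^<>]*>\s*([0-9]+(?:\.[0-9]+)?)\s*</td>",
        RegexOptions.IgnoreCase);

    // Returns the numeric cells found between the label and the next </tr>.
    public static List<decimal> ExtractRates(string html, string label)
    {
        var rates = new List<decimal>();
        int start = html.IndexOf(label, StringComparison.OrdinalIgnoreCase);
        if (start < 0)
        {
            return rates; // label not present: nothing to extract
        }

        // Assumption: the row that holds the label ends at the next </tr>.
        int end = html.IndexOf("</tr>", start, StringComparison.OrdinalIgnoreCase);
        string row = end < 0 ? html.Substring(start) : html.Substring(start, end - start);

        foreach (Match m in CellRegex.Matches(row))
        {
            // InvariantCulture so that "." is always read as the decimal separator.
            rates.Add(decimal.Parse(m.Groups[1].Value, CultureInfo.InvariantCulture));
        }
        return rates;
    }

    public static void Main()
    {
        // The EUR row is invented sample data; the USD row is from the question.
        string html =
            "<tr><td><b>EUR -Euro</b></td>\n" +
            "<td bgcolor='gold' align='right' width='15%'>140.5</td>\n" +
            "</tr>\n" +
            "<tr><td><b>USD -US Dollar</B></td>\n" +
            "<td bgcolor='gold' align='right' width='15%'>126.0</td>\n" +
            "<td bgcolor='gold' align='right' width='20%'>125.2739</td>\n" +
            "<td bgcolor='gold' align='right' width='20%'>130.0</td>\n" +
            "</tr>";

        foreach (decimal rate in ExtractRates(html, "USD -US Dollar"))
        {
            Console.WriteLine(rate.ToString(CultureInfo.InvariantCulture));
        }
    }
}
```

For pages with less regular markup, a dedicated HTML parser would likely be more robust than a regex; the approach here only relies on the simple cell structure shown in the question.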

