Extract substring from a long string using starting string and ending string?
Problem description
I have this long string (it's one long, continuous string):
Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, UTTAR PRADESH INDIA +911112121212 Last Updated: 12-JUN-12 Semester/Term-time Accommodation Type: Hall of residence (private provider) Semester/Term-time address A121A SOME APPARTMENT SOME LANE CITY COUNTY OX3 7FJ +91 1212121212 Last Updated: 12-SEP-12 Mobile Telephone Number : 01212121212
If you look at the string above, it follows this pattern:
<home_address_text><space><the_address><space><last_updated_text><last_updated_date><space><accommodation_type_text><accommodation_type><space><semester_time_address_text><semester_time_address><space><last_updated_text><last_updated_date><space><mobile_number_text><mobile_number>
I want to extract specific parts of this string, like:
1. H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR NOIDA- 121212, UTTAR PRADESH INDIA
2. Hall of residence (private provider)
3. A121A SOME APARTMENT SOMELANE CITY COUNTY OX3 7FJ
4. 01212121212
This information differs from person to person, so I can't just compute offsets and use substring to extract it: both the length of the whole string and the length of each part I want to extract are variable.
How can I extract specific parts of the string, as explained above, using Java? I've been looking for a long time but couldn't find a way. Any help would be very much appreciated.
Recommended answer
This worked for me, based on your (single) example. Learn to use the reluctant quantifiers in regular expressions. They'll help you a lot in situations like this.
For example, to match the first part: "Home address (.+?) \+\d+ Last Updated:"
This regex will not swallow the "Last Updated" string or the "+dd" (digits) that we don't want. The expression "(.+?)" is reluctant (not greedy), so it stops before the + sign and the digits, leaving them to be matched by the rest of the expression.
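To make the difference concrete, here is a minimal sketch that runs the same pattern with a reluctant and a greedy quantifier. The class name ReluctantDemo, the helper firstGroup, and the shortened input are made up for illustration and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReluctantDemo
{
    // Hypothetical helper: returns capturing group 1 of the first match,
    // or null if the pattern does not match at all.
    static String firstGroup( String regex, String input )
    {
        Matcher matcher = Pattern.compile( regex ).matcher( input );
        return matcher.find() ? matcher.group( 1 ) : null;
    }

    public static void main( String[] args )
    {
        // Made-up input with two "Last Updated:" markers, like the real string.
        final String input = "Home address 1 MAIN ST +911 Last Updated: 12-JUN-12 "
            + "Term address 2 SIDE ST +912 Last Updated: 12-SEP-12";

        // Reluctant: the group ends at the first place the rest can match.
        System.out.println( firstGroup( "Home address (.+?) \\+\\d+ Last Updated:", input ) );

        // Greedy: the group runs on to the last place the rest can match.
        System.out.println( firstGroup( "Home address (.+) \\+\\d+ Last Updated:", input ) );
    }
}
```

The reluctant version captures only "1 MAIN ST", while the greedy version runs through the first "Last Updated:" and captures everything up to the second phone number.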
You can use this to match substrings that are surrounded by static text. Here I'm using capturing groups to locate the text I want. (Capturing groups are the parts in parentheses.)
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class Goofy
{
    public static void main( String[] args )
    {
        final String input
            = "Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR " +
              "NOIDA- 121212, UTTAR PRADESH INDIA +911112121212 " +
              "Last Updated: 12-JUN-12 Semester/Term-time " +
              "Accommodation Type: Hall of residence (private " +
              "provider) Semester/Term-time address A121A SOME " +
              "APPARTMENT SOME LANE CITY COUNTY OX3 7FJ +91 " +
              "1212121212 Last Updated: 12-SEP-12 Mobile Telephone " +
              "Number : 01212121212";

        // Static labels anchor the pattern; each reluctant (.+?) group
        // captures the variable text between two labels.
        final String regex = "Home address (.+?) \\+\\d+ Last Updated: " +
            "\\S+ Semester/Term-time Accommodation Type: (.+?) " +
            "Semester/Term-time address (.+?) \\+\\d\\d \\d+ " +
            "Last Updated.+ Number : (\\d+)";

        Pattern pattern = Pattern.compile( regex );
        Matcher matcher = pattern.matcher( input );
        if( matcher.find() ) {
            // group() with no argument is the whole matched text.
            System.out.println( "Found: " + matcher.group() );
            // Groups 1..groupCount() are the four extracted parts.
            for( int i = 1; i <= matcher.groupCount(); i++ ) {
                System.out.println( "  Match " + i + ": " + matcher.group( i ) );
            }
        }
    }
}
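When both the starting and ending labels are fixed text, the same extraction can also be sketched without a regex, using only String.indexOf and String.substring. This is an alternative to the approach above, not part of the original answer; the class name Between and the helper between are hypothetical.

```java
class Between
{
    // Hypothetical helper: returns the trimmed text between the first
    // occurrence of start and the next occurrence of end after it,
    // or null if either marker is missing.
    static String between( String text, String start, String end )
    {
        int from = text.indexOf( start );
        if( from < 0 ) return null;
        from += start.length();
        int to = text.indexOf( end, from );
        if( to < 0 ) return null;
        return text.substring( from, to ).trim();
    }

    public static void main( String[] args )
    {
        final String input
            = "Home address H.NO- 12 SECTOR- 12 GAUTAM BUDH NAGAR " +
              "NOIDA- 121212, UTTAR PRADESH INDIA +911112121212 " +
              "Last Updated: 12-JUN-12 Semester/Term-time " +
              "Accommodation Type: Hall of residence (private " +
              "provider) Semester/Term-time address A121A SOME " +
              "APPARTMENT SOME LANE CITY COUNTY OX3 7FJ +91 " +
              "1212121212 Last Updated: 12-SEP-12 Mobile Telephone " +
              "Number : 01212121212";

        // Both markers are fixed labels, so plain searching is enough here.
        System.out.println( between( input, "Accommodation Type:",
                                     "Semester/Term-time address" ) );
    }
}
```

This only works where the text on both sides is constant. The addresses end in a phone number that changes from person to person, which is where the regex with \+\d+ remains the better fit.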