Extract strings using Jsoup

Problem description

I'm trying to get some names from a class attribute within a website's HTML page using the Jsoup library. The problem is that I get the elements by class using getElementsByClass("name") and store the result in a String variable, and the result comes out like this: "mike andro rob banks maria gerardo louis ... etc.". What I want is to separate the individual names and store them in an array. The following is the code snippet:

public String processText(String htmlPage) {

    Document html = Jsoup.parse(htmlPage);
    String names = html.body().getElementsByClass("name").text();
    return names;
}

More information:

The source page is an HTML page. I save the full HTML code in a string and then process the string to extract only the elements with class="name".

The htmlPage that I pass to the processText method is similar to the following:

<div class="name">
							Rob Kardashian
						</div>
					</div>
				</a>
			</div>
					<div class="channelListEntry">
				<a href="/zayn_malik">
					<div class="image">
						<img src="http://cdn.posh24.com/images/:profile/014cf47ca44daf8f44a3e0720929ee327" alt="Zayn Malik"/>
					</div>
					
					 
										<div class="info">
						<div class="status-container">
							<div class="position">4</div>
							 
								<div class="img pos"></div>
								<div class="value">+12</div>
													
						</div>
						<div class="name">
							Zayn Malik
						</div>
					</div>
				</a>
			</div>
					<div class="channelListEntry">
				<a href="/kanye_west">
					<div class="image">
						<img src="http://cdn.posh24.com/images/:profile/03f352f71ffab135cd81821eb190d4832" alt="Kanye West"/>
					</div>
					
					 
										<div class="info">
						<div class="status-container">
							<div class="position">5</div>
							 
								<div class="img pos"></div>
								<div class="value">+16</div>
													
						</div>
						<div class="name">
							Kanye West
						</div>
					</div>
				</a>
			</div>
					<div class="channelListEntry">
				<a href="/kendall_jenner">
					<div class="image">
						<img src="http://cdn.posh24.com/images/:profile/066d5c02547c4357f1bc5f633c68f4085" alt="Kendall Jenner"/>
					</div>

Solution

You can simply use the split function to get an array from the string:

String[] arr = names.trim().split("\\s");

If the names are separated by a mix of spaces and tabs, split on runs of whitespace instead:

String[] arr = names.split("\\s+");
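One caveat with splitting on whitespace is that every word becomes its own entry, so a two-word name is broken in half. The sketch below illustrates this with a hypothetical sample string that resembles the combined text for the page in the question; the class and method names are made up for the illustration.

```java
import java.util.Arrays;

class SplitDemo {
    // Splits on runs of whitespace, like the snippets above.
    static String[] splitOnWhitespace(String names) {
        return names.trim().split("\\s+");
    }

    public static void main(String[] args) {
        // Hypothetical sample of the combined text returned for several elements.
        String names = "Rob Kardashian Zayn Malik Kanye West";
        System.out.println(Arrays.toString(splitOnWhitespace(names)));
        // prints [Rob, Kardashian, Zayn, Malik, Kanye, West]
    }
}
```

This is fine when every name is a single word, but not for names such as "Zayn Malik".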

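Update: because the names themselves contain spaces, collect the text of each element separately instead of splitting the combined string:

ArrayList<String> name = new ArrayList<String>();
for (Element output : html.body().getElementsByClass("name")) {
    name.add(output.text());
}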
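Put together as a complete method, the per-element approach might look like the following sketch. It assumes the jsoup library is on the classpath; the class name NameExtractor, the method name extractNames, and the sample HTML in main are hypothetical and only serve the illustration.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

class NameExtractor {
    // Returns the text of every element with class="name", one entry per element.
    static List<String> extractNames(String htmlPage) {
        Document html = Jsoup.parse(htmlPage);
        List<String> names = new ArrayList<>();
        for (Element element : html.getElementsByClass("name")) {
            // text() returns the element's text with whitespace normalized and trimmed.
            names.add(element.text());
        }
        return names;
    }

    public static void main(String[] args) {
        // Hypothetical fragment shaped like the page in the question.
        String htmlPage = "<div class=\"name\">\n\t\tZayn Malik\n\t</div>"
                + "<div class=\"name\">\n\t\tKanye West\n\t</div>";
        System.out.println(extractNames(htmlPage));
    }
}
```

Because each element is read on its own, a name like "Zayn Malik" stays in one entry regardless of the whitespace around it in the markup.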

The resulting list can then be converted to an array, for example with name.toArray(new String[0]).
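Converting the collected list to an array can be sketched as follows, using only the standard library. The class name ListToArray and the sample values are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ListToArray {
    // Copies the list into a new String[]; the zero-length array only conveys the element type.
    static String[] toArray(List<String> names) {
        return names.toArray(new String[0]);
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        names.add("Zayn Malik");
        names.add("Kanye West");
        String[] arr = toArray(names);
        System.out.println(Arrays.toString(arr));
        // prints [Zayn Malik, Kanye West]
    }
}
```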
