Extract strings using Jsoup
Problem description
I'm trying to get some names from the class attribute of elements in a website's HTML page using the Jsoup library. The problem is that I get the elements by class with getElementsByClass("name") and store the result in a String variable, and the result comes out like this: "mike andro rob banks maria gerardo louis....etc".
What I want is to separate the individual names and store them in an array.
The following is the code snippet:
public String processText(String htmlPage) {
    Document html = Jsoup.parse(htmlPage);
    String names = html.body().getElementsByClass("name").text();
    return names;
}
More information:
The source page is an HTML page. I save the full HTML code in a string and then process that string to extract only the elements with class="name".
The htmlPage that I pass to the processText method is similar to the following:
<div class="name">
Rob Kardashian
</div>
</div>
</a>
</div>
<div class="channelListEntry">
<a href="/zayn_malik">
<div class="image">
<img src="http://cdn.posh24.com/images/:profile/014cf47ca44daf8f44a3e0720929ee327" alt="Zayn Malik"/>
</div>
<div class="info">
<div class="status-container">
<div class="position">4</div>
<div class="img pos"></div>
<div class="value">+12</div>
</div>
<div class="name">
Zayn Malik
</div>
</div>
</a>
</div>
<div class="channelListEntry">
<a href="/kanye_west">
<div class="image">
<img src="http://cdn.posh24.com/images/:profile/03f352f71ffab135cd81821eb190d4832" alt="Kanye West"/>
</div>
<div class="info">
<div class="status-container">
<div class="position">5</div>
<div class="img pos"></div>
<div class="value">+16</div>
</div>
<div class="name">
Kanye West
</div>
</div>
</a>
</div>
<div class="channelListEntry">
<a href="/kendall_jenner">
<div class="image">
<img src="http://cdn.posh24.com/images/:profile/066d5c02547c4357f1bc5f633c68f4085" alt="Kendall Jenner"/>
</div>
You can simply use the split function to get an array from the string:
String[] arr = names.trim().split("\\s");
If the names are separated by a mix of spaces and tabs, use this instead:
String[] arr = names.trim().split("\\s+");
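One caveat with splitting on whitespace is that it cannot tell where one name ends and the next begins, so a two-word name such as "Rob Kardashian" ends up as two array entries. The sketch below illustrates this with plain Java; the class name WhitespaceSplitDemo and the helper splitOnWhitespace are made up for this illustration and are not part of Jsoup.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical demo class: shows what split("\\s+") does to multi-word names.
class WhitespaceSplitDemo {

    // Splits the concatenated text on runs of whitespace (spaces, tabs, newlines).
    static List<String> splitOnWhitespace(String names) {
        return Arrays.asList(names.trim().split("\\s+"));
    }

    public static void main(String[] args) {
        // Similar to what Elements.text() returns for the sample page: all names joined by spaces.
        String names = "Rob Kardashian Zayn Malik Kanye West";
        // prints [Rob, Kardashian, Zayn, Malik, Kanye, West]
        System.out.println(splitOnWhitespace(names));
    }
}
```

Because first and last names are separated by the same character as the names themselves, the original grouping is lost once the text has been joined into one string.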
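Update: to keep each full name together as one entry, add the text of each matching element to a list instead of splitting the combined string:
ArrayList<String> name = new ArrayList<String>();
for (Element output : html.body().getElementsByClass("name")) {
    name.add(output.text());
}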
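Putting the loop into a complete method might look like the following sketch. It assumes Jsoup is on the classpath; the class name NameExtractor and the method name extractNames are hypothetical stand-ins for the asker's processText, changed here to return a list rather than a single String.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical wrapper class for illustration only.
class NameExtractor {

    // Returns the text of every element with class "name", one list entry per element.
    static List<String> extractNames(String htmlPage) {
        Document html = Jsoup.parse(htmlPage);
        List<String> names = new ArrayList<>();
        for (Element element : html.getElementsByClass("name")) {
            // Element.text() trims and normalizes whitespace inside the element.
            names.add(element.text());
        }
        return names;
    }

    public static void main(String[] args) {
        String htmlPage =
                "<div class=\"info\"><div class=\"name\">\n Zayn Malik\n</div></div>"
              + "<div class=\"info\"><div class=\"name\">\n Kanye West\n</div></div>";
        List<String> names = extractNames(htmlPage);
        // Convert to an array if one is required.
        String[] arr = names.toArray(new String[0]);
        System.out.println(names);       // prints [Zayn Malik, Kanye West]
        System.out.println(arr.length);  // prints 2
    }
}
```

Reading each element separately keeps "Zayn Malik" as a single entry, which the whitespace split cannot do.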