How to find not the next sibling but the 2nd sibling, or find sibling "a" OR sibling "b"
Problem description
I have some HTML which looks like this, and I want to scrape out the
href stuff (the www.cnn.com part):
<div class="noFood">Cheese</div>
<div class="food">Blue</div>
<a class="btn" href = "http://www.cnn.com">
So I wrote this code, which scrapes it perfectly:
# row is assumed to be a Beautiful Soup Tag; calling a Tag is shorthand for findAll()
for incident in row('div', {'class': 'noFood'}):
    b = incident.findNextSibling('div', {'class': 'food'})
    print(b)
    n = b.findNextSibling('a', {'class': 'btn'})
    print(n)
    link = n['href'] + "','"
The problem is that the 2nd tag, the <div class="food"> tag, is
sometimes called food and sometimes called drink, so sometimes it looks
like this:
<div class="noFood">Cheese</div>
<div class="drink">Pepsi</div>
<a class="btn" href = "http://www.cnn.com">
How do I alter my script to take into account that the class name will
sometimes be food and sometimes drink? Is there a way to say "look for
food or drink", or a way to say "look for this incident and then find
not the next sibling but the 2nd next sibling", if that makes any sense?
Thanks
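The second idea in the question, skipping the middle div and taking the second tag after the noFood div, can be sketched as follows. This assumes Beautiful Soup 4 (`bs4`, which spells the methods in snake_case such as `find_next_siblings`) with Python's built-in `html.parser`; the `SAMPLES` markup and the `second_sibling_href` helper are hypothetical names built from the snippets above. Passing `True` as the tag name matches any tag, so the whitespace text between tags is not counted as a sibling.

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup: the two variants shown in the question.
SAMPLES = [
    '<div class="noFood">Cheese</div>\n'
    '<div class="food">Blue</div>\n'
    '<a class="btn" href="http://www.cnn.com">link</a>',
    '<div class="noFood">Cheese</div>\n'
    '<div class="drink">Pepsi</div>\n'
    '<a class="btn" href="http://www.cnn.com">link</a>',
]


def second_sibling_href(html):
    """Return the href of the 2nd tag after each noFood div (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for incident in soup.find_all("div", class_="noFood"):
        # True matches any tag (text nodes are skipped); limit=2 stops after two tags.
        siblings = incident.find_next_siblings(True, limit=2)
        if len(siblings) == 2 and siblings[1].name == "a":
            links.append(siblings[1]["href"])
    return links


for html in SAMPLES:
    print(second_sibling_href(html))
```

Because this version relies on position rather than on the class name, it works whatever the middle div is called, but it would silently pick the wrong tag if the page layout ever adds or removes an element between them.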
Recommended answer
lo************@gmail.com wrote:
problem is that sometimes the 2nd tag, the <div class="food"> tag, is
sometimes called food, sometimes called drink.
Apparently you are using Beautiful Soup. The value in the attribute
dictionary can be a callable; try this:
def isFoodOrDrink(attr):
    return attr in ['food', 'drink']

b = incident.findNextSibling('div', {'class': isFoodOrDrink})
Alternatively, you could omit the class spec and check for it in code.
Kent
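Kent's alternative, dropping the class filter and checking the class in ordinary Python, might look like this sketch. It again assumes `bs4` and `html.parser`; `ALLOWED` and `find_link` are hypothetical names. Note that bs4 treats `class` as a multi-valued attribute, so `tag.get('class', [])` returns a list such as `['drink']` rather than a plain string.

```python
from bs4 import BeautifulSoup

# Hypothetical set of acceptable class names for the middle div.
ALLOWED = {"food", "drink"}


def find_link(html):
    """Return the btn href after a noFood div whose next div has an allowed class."""
    soup = BeautifulSoup(html, "html.parser")
    for incident in soup.find_all("div", class_="noFood"):
        b = incident.find_next_sibling("div")  # no class filter here
        # bs4 stores class as a list, e.g. ['drink']; check it in plain Python.
        if b is None or not ALLOWED.intersection(b.get("class", [])):
            continue
        n = b.find_next_sibling("a", class_="btn")
        if n is not None:
            return n["href"]
    return None


html = ('<div class="noFood">Cheese</div>'
        '<div class="drink">Pepsi</div>'
        '<a class="btn" href="http://www.cnn.com">link</a>')
print(find_link(html))
```

Keeping the check in Python makes it easy to add more conditions later (logging unexpected class names, for example) without changing the search call.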
I actually realized there are three possible class names: food, drink,
or dessert. So my question is whether I can alter your function to look
like this:
def isFoodOrDrinkOrDessert(attr):
    return attr in ['food', 'drink', 'dessert']
Thanks in advance for the help.
Kent Johnson wrote:
Apparently you are using Beautiful Soup. The value in the attribute
dictionary can be a callable; try this:
def isFoodOrDrink(attr):
    return attr in ['food', 'drink']

b = incident.findNextSibling('div', {'class': isFoodOrDrink})
Alternatively, you could omit the class spec and check for it in code.
Kent
lo************@gmail.com wrote:
I actually realized there are three possible class names: food, drink,
or dessert. So my question is whether I can alter your function to look
like this:
def isFoodOrDrinkOrDessert(attr):
    return attr in ['food', 'drink', 'dessert']
What happens when you try that?
</F>
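Trying it is indeed the quickest check. The sketch below assumes `bs4` and `html.parser`, with hypothetical names `MEAL_CLASSES`, `is_meal_class`, `extract_links` and `TEMPLATE`, and runs the callable class filter against each candidate class name. The strings must match the real class names exactly, so a misspelling such as `desert` would simply never match a `dessert` div.

```python
from bs4 import BeautifulSoup

# Hypothetical list of class names the middle div may carry.
MEAL_CLASSES = ("food", "drink", "dessert")


def is_meal_class(attr):
    # bs4 calls this with a class name, or with None if the tag has no class.
    return attr in MEAL_CLASSES


def extract_links(html):
    """Collect btn hrefs that follow a noFood div and a food/drink/dessert div."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    for incident in soup.find_all("div", {"class": "noFood"}):
        b = incident.find_next_sibling("div", {"class": is_meal_class})
        if b is None:
            continue  # middle div had some other class
        n = b.find_next_sibling("a", {"class": "btn"})
        if n is not None:
            links.append(n["href"])
    return links


# Hypothetical markup with a placeholder for the middle div's class.
TEMPLATE = ('<div class="noFood">Cheese</div>'
            '<div class="{}">X</div>'
            '<a class="btn" href="http://www.cnn.com">link</a>')

for cls in ("food", "drink", "dessert", "snack"):
    print(cls, extract_links(TEMPLATE.format(cls)))
```

bs4 also accepts a list as an attribute filter, so `{'class': ['food', 'drink', 'dessert']}` should behave the same way without a helper function.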