Remove u' from a webscrape output
Question
Hi, I'm using BeautifulSoup to parse a website and get a name as output. But after running the script, I get [u'word1', u'word2', u'word3'] as output. What I'm looking for is 'word1 word2 word3'. How do I get rid of this u' and make the result a single string?
from bs4 import BeautifulSoup
import urllib2
import re

myfile = open("base/dogs.txt","w+")
myfile.close()

url="http://trackinfo.com/entries-race.jsp?raceid=GBR$20140302A01"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
names=soup.findAll('a',{'href':re.compile("dog")})

myfile = open("base/dogs.txt","w+")
for eachname in names:
    d = (str(eachname.string.split()))+"\n"
    print [x.encode('ascii') for x in d]
    myfile.write(d)
myfile.close()
Recommended answer
The answers here that use .encode() give you what you asked for, but probably not what you need. The u prefix is not part of your data: it only shows up because converting a list to a string displays the repr of each element, and in Python 2 the repr of a unicode string starts with u. You can keep the strings as unicode, which avoids breaking support for languages that can't be represented in ASCII, and join them so that they are still [u'word1', u'word2', u'word3'] internally but print as word1 word2 word3.
Just do:
for eachname in names:
    d = ' '.join(eachname.string.split()) + '\n'
    print d
    myfile.write(d)
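To see why joining helps, here is a minimal standalone sketch that needs no network access. The name join_words and the sample string raw are made up for illustration and stand in for eachname.string; the snippet is written so that it should run on both Python 2 and Python 3. It contrasts converting the list to a string (which shows the list's repr, including the u prefix on Python 2) with joining the pieces into one line.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function


def join_words(text):
    """Collapse any run of whitespace in `text` into single spaces.

    Hypothetical helper, not part of BeautifulSoup.
    """
    return ' '.join(text.split())


# Stand-in for eachname.string: a unicode string with messy whitespace.
raw = u'word1  word2\n word3'

words = raw.split()          # a list of three unicode strings
as_list_repr = str(words)    # the list's repr; on Python 2 each item shows a u prefix
as_line = join_words(raw)    # a single string with no brackets or quotes

print(as_list_repr)
print(as_line)
```

The join keeps the text as unicode, so names containing non-ASCII characters pass through unchanged instead of raising an error in .encode('ascii').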