Remove 'u from a webscrape output

Question

Hi, I'm using BeautifulSoup to parse a website and get a name as output. But after running the script, I get [u'word1', u'word2', u'word3'] as output. What I'm looking for is 'word1 word2 word3'. How do I get rid of this u' and turn the result into a single string?

from bs4 import BeautifulSoup
import urllib2
import re

myfile = open("base/dogs.txt","w+")
myfile.close()

url="http://trackinfo.com/entries-race.jsp?raceid=GBR$20140302A01"
page=urllib2.urlopen(url)
soup = BeautifulSoup(page.read())
names=soup.findAll('a',{'href':re.compile("dog")})
myfile = open("base/dogs.txt","w+")
for eachname in names:
    d = (str(eachname.string.split()))+"\n"
    print [x.encode('ascii') for x in d]
    myfile.write(d)

myfile.close()

Recommended answer

The answers here that use .encode() give you what you asked for, but probably not what you need. You can keep the text as unicode and simply stop displaying it in a form that reveals its type. The u'...' prefixes come from calling str() on a list, which shows the repr of each element; they are not part of the data itself. The words remain unicode objects (which avoids breaking support for languages that can't be represented in ASCII), but they are printed as word1 word2 word3.
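A minimal sketch of the difference between printing a list and printing the joined text, using a hypothetical sample string in place of `eachname.string` from the question. It uses only built-ins and should run on Python 2.7 and 3.3+; on Python 3 the list repr simply has no `u` prefixes, because all text is already `str`.

```python
# Runs on Python 2.7 and 3.3+ (u'' literals are accepted by both)

# Hypothetical stand-in for eachname.string in the question
text = u'  word1\n  word2   word3 '

# split() with no argument splits on any run of whitespace
# and drops leading/trailing whitespace
words = text.split()

# str() of a list is built from each element's repr: brackets, quotes and,
# on Python 2, the u prefix that marks a unicode object
print(str(words))   # Python 2: [u'word1', u'word2', u'word3']

# ' '.join() concatenates the strings themselves, so no repr is involved
line = u' '.join(words)
print(line)         # word1 word2 word3
```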

Simply:

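for eachname in names:
    # split() breaks the text into words; ' '.join() glues them back into one
    # unicode string, so no list repr (and no u'...' prefix) is produced
    d = ' '.join(eachname.string.split()) + '\n'
    print d
    # encode when writing, so names with non-ASCII characters don't raise UnicodeEncodeError
    myfile.write(d.encode('utf-8'))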
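On Python 2, writing a unicode string to a file opened with plain `open()` implicitly encodes it as ASCII, which raises `UnicodeEncodeError` for names with accented characters. One alternative to encoding each line by hand is to open the file with `io.open` and an explicit encoding, so the file object does the encoding. This is a sketch under assumptions: the sample names and the temporary output path are hypothetical stand-ins for the scraped `names` and `base/dogs.txt`.

```python
import io
import os
import tempfile

# Hypothetical sample names; on the real page these would come from eachname.string
raw_names = [u'  Swift   Runner ', u'Bj\u00f6rk\n Lad']

# Hypothetical output path in a temp dir (the question writes to base/dogs.txt)
path = os.path.join(tempfile.mkdtemp(), 'dogs.txt')

# io.open with an explicit encoding accepts unicode text on both Python 2 and 3
# and encodes it to UTF-8 on write
with io.open(path, 'w', encoding='utf-8') as myfile:
    for raw in raw_names:
        # collapse whitespace and join the words into a single line
        line = u' '.join(raw.split()) + u'\n'
        myfile.write(line)

# read the file back with the same encoding to check the result
with io.open(path, encoding='utf-8') as f:
    print(f.read())
```

Reading the file back with the same encoding returns the cleaned lines `Swift Runner` and `Björk Lad`, with the text never turned into a list repr along the way.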
