Can't remove line breaks from BeautifulSoup text output (Python 2.7.5)

Problem description

I'm trying to write a program to parse a series of HTML files and store the resulting data in a .csv spreadsheet, which is incredibly reliant on newlines being in exactly the right place. I've tried every method I can find to strip the linebreaks away from certain pieces of text, to no avail. The relevant code looks like this:

soup = BeautifulSoup(f)
ID = soup.td.get_text()
ID.strip()
ID.rstrip()
ID.replace("\t", "").replace("\r", "").replace("\n", "")
dateCreated = soup.td.find_next("td").get_text()
dateCreated.replace("\t", "").replace("\r", "").replace("\n", "")
dateCreated.strip()
dateCreated.rstrip()
# debug
print('ID:' + ID + 'Date Created:' + dateCreated)

The resulting output looks like this:

ID:
FOO
Date Created:
BAR

This and another problem with the same program have been driving me up the wall. Help would be fantastic. Thanks.

Figured it out, and it was a pretty stupid mistake. Instead of just doing

ID.replace("\t", "").replace("\r", "").replace("\n", "")

I should have done

ID = ID.replace("\t", "").replace("\r", "").replace("\n", "")

Recommended answer

Your issue at hand is that you're expecting in-place operations from what are actually operations that return new values.

ID.strip()  # returns the stripped value; doesn't change ID.
ID = ID.strip() # Would be more appropriate.
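
To make the point concrete, here is a minimal sketch of why the bare call has no effect: Python strings are immutable, so `strip()` hands back a new string and leaves the original alone. The sample value `raw_id` is made up for illustration and merely stands in for what `soup.td.get_text()` might return.

```python
# Hypothetical sample text, standing in for soup.td.get_text().
raw_id = "\n\tFOO\r\n"

# Calling strip() without using the result discards the new string.
raw_id.strip()
still_dirty = raw_id      # the original is untouched

# Rebinding a name to the return value is what keeps the cleaned text.
cleaned = raw_id.strip()

print(repr(still_dirty))
print(repr(cleaned))
```

The same applies to `rstrip()` and `replace()`: each returns a new string, so the result has to be assigned to something.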

You could use a regex, though that is overkill here. If the unwanted characters are only at the beginning and end of the string, just pass them to strip:

ID = ID.strip('\t\r\n')
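
One caveat worth sketching: `strip('\t\r\n')` only removes those three characters from the ends, so leading spaces or line breaks in the middle of a cell survive. If the CSV needs each value on a single line, something like the following might help. The helper name `clean_cell` and the sample strings are assumptions for illustration, not part of the original program.

```python
def clean_cell(text):
    """Collapse every run of whitespace (tabs, CR, LF, spaces) into one space
    and drop leading/trailing whitespace. Hypothetical helper for illustration."""
    # split() with no arguments splits on any whitespace run and
    # discards empty pieces, so joining with " " normalizes the text.
    return " ".join(text.split())


# Hypothetical sample values, standing in for get_text() results.
raw_id = "\n\tFOO\r\n"
raw_date = "\r\n  BAR\n"

# strip() with an explicit character set leaves the two leading spaces in place.
partly_clean = raw_date.strip("\t\r\n")

line = "ID:" + clean_cell(raw_id) + " Date Created:" + clean_cell(raw_date)
print(repr(partly_clean))
print(line)
```

Calling `strip()` with no arguments removes all leading and trailing whitespace, which is often enough on its own. BeautifulSoup 4's `get_text()` also accepts `strip=True`, which strips each text fragment as it is extracted.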
