Regex in a for loop over a list in Python

Views: 35 | Published: 2021/4/15 19:05:53 | Tags: python, html, for-loop, beautifulsoup

Question

I have this list:

[<th align="left">
 <a href="blablabla">F</a>ojweousa</th>,
 <th align="left">
 <a href="blablabla">S</a>awdefrgt</th>, ...]

and I want the letter inside the <a> element and the text between </a> and </th> to be concatenated so that I can move on with my life.

This is my code:

item2 = []
for element in items2:
    first_letter = re.search('">.</a', str(items2))
    second_letter = re.search(r'</a>[a-zA-Z0-9]</th>,', str(items2))
    item2.append([str(first_letter) + str(second_letter)])

I know I should do something like item2.group or item2.join, but if I do, the output gets even messier. Here is the output with the current code:

[['<re.Match object; span=(155, 161), match=\'">F</a\'>None'],
 ['<re.Match object; span=(155, 161), match=\'">F</a\'>None'],
 ...]

I would like the output to look like this so that I can use it in a pd.DataFrame:

[Fojweousa, Sawdefrgt, ...]

It is a list, which is why I can't use the bs4 HTML or select methods.

Recommended answer

You can use BeautifulSoup's get_text() to get the plain text from each element you found with find_all, and strip() to get rid of leading and trailing whitespace:

items2 = table.find_all('th', attrs={'align': 'left'})[1:]
result = [x.get_text().strip() for x in items2]

Here, .find_all('th', attrs={'align': 'left'}) finds all th elements whose align attribute equals left, and [1:] skips the first occurrence.

Next, [x.get_text().strip() for x in items2] is a list comprehension that iterates over the found items (x is each element of items2), gets the plain text of each element with x.get_text(), and removes leading and trailing whitespace with strip().
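
If the elements are only available as plain strings, the regex approach from the question can probably be made to work as well. The following is a minimal sketch, not the answerer's method: it assumes the two sample elements from the question are held as strings (the items2 list and the pattern below are illustrative). It differs from the question's loop in three ways: it searches each element rather than str(items2), which always finds the first match in the whole list; it allows any run of text between </a> and </th> instead of a single character followed by a comma, which is why the second search returned None; and it calls .group() on the match instead of str(), which only produces the Match object's repr.

```python
import re

# Sample elements from the question, held as plain strings (an assumption for
# this sketch; with BeautifulSoup Tag objects, pass str(element) to search()).
items2 = [
    '<th align="left">\n<a href="blablabla">F</a>ojweousa</th>',
    '<th align="left">\n<a href="blablabla">S</a>awdefrgt</th>',
]

# Group 1: the single character inside <a>...</a>.
# Group 2: everything between </a> and </th>.
pattern = re.compile(r'">(.)</a>([^<]*)</th>')

result = []
for element in items2:
    match = pattern.search(element)  # search this element, not the whole list
    if match:                        # search() returns None when nothing matches
        result.append(match.group(1) + match.group(2).strip())

print(result)  # ['Fojweousa', 'Sawdefrgt']
```

When the list does contain BeautifulSoup elements, the get_text() approach above is simpler and does not depend on the exact markup.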
