Getting an error while using apply on a list in Python
Problem description
I have a DataFrame in which the txt column contains a list. I want to clean the txt column using the function clean_text().
import pandas
data = {'value': ['abc.txt', 'cda.txt'],
        'txt': [['2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart'],
                ['2019/02/01-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart']]}
df = pandas.DataFrame(data=data)
df
value txt
abc.txt ['2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart']
cda.txt ['2019/02/01-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart']
import re

def clean_text(text):
    """
    :param text: the plain text (a single str)
    :return: cleaned text
    """
    patterns = [r"^.{53}",                                   # first 53 characters (log prefix)
                r"[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*",  # tokens mixing letters and digits
                r"[-=/':,?${}\[\]-_()>.~;+]"]                # punctuation characters
    for p in patterns:
        text = re.sub(p, '', text)
    return text
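To see what each pattern in clean_text removes, here is a small sketch that applies them one at a time. The constant names (PREFIX, MIXED_TOKENS, PUNCTUATION) and the short sample strings for steps 2 and 3 are hypothetical, chosen only for illustration; the patterns themselves are the question's, with the third one written as a single string literal. Note that ^.{53} counts characters rather than fields, so it strips the log prefix cleanly only if that prefix always has the same width.

```python
import re

# Hypothetical names for the three patterns used in clean_text.
PREFIX = r"^.{53}"                                          # any first 53 characters
MIXED_TOKENS = r"[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*"   # letters and digits mixed
# Note: inside the brackets, \]-_ is a range covering ], ^ and _.
PUNCTUATION = r"[-=/':,?${}\[\]-_()>.~;+]"

line = "2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart"

# 1. Removes exactly the first 53 characters, whatever they happen to be.
step1 = re.sub(PREFIX, "", line)
print(repr(step1))

# 2. Removes tokens such as the hex ID 7F0ED4CDC704 or abc123;
#    purely alphabetic words are kept.
step2 = re.sub(MIXED_TOKENS, "", "id 7F0ED4CDC704 abc123 plain")
print(repr(step2))

# 3. Removes individual punctuation characters such as ':'.
step3 = re.sub(PUNCTUATION, "", "asfasnfs: remove datepart")
print(repr(step3))
```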
My solution:
df['txt'] = df['txt'].apply(lambda x: clean_text(x))
But I am getting the following errors:
df['txt'] = df['txt'].apply(lambda x: clean_text(x))
AttributeError: 'list' object has no attribute 'apply'
clean_text(df['txt'][1])
TypeError: expected string or bytes-like object
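The TypeError comes from re.sub: clean_text passes its argument straight to re.sub, which only searches a str (or bytes). Since df['txt'][1] is a list, the call fails. The sketch below uses one hypothetical row and a simplified pattern (just removing ':') to contrast passing the whole list with passing each string inside it.

```python
import re

# One cell of the txt column: a list holding a single string.
row = ["2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart"]

# Passing the list itself: re.sub rejects it with TypeError.
try:
    re.sub(r":", "", row)
    list_failed = False
except TypeError as exc:
    list_failed = True
    print("TypeError:", exc)

# Passing each string in the list works.
cleaned = [re.sub(r":", "", s) for s in row]
print(cleaned)
```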
I am not sure how to use numpy.where in this problem.
Recommended answer
Based on the revision to your question, and discussion in the comments, I believe you need to use the following line:
df['txt'] = df['txt'].apply(lambda x: [clean_text(z) for z in x])
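In this approach, apply is used with a lambda to loop over each element of the txt Series. Each element is itself a list, so a list comprehension (Python's compact form of a for-loop) iterates over the items in that sub-list and passes each string to clean_text. This is necessary because clean_text hands its argument to re.sub, which accepts only a string (or bytes); calling clean_text on the whole list is what raised the TypeError.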
I have tested that snippet with the following value for data:
data = {
'value': [
'abc.txt',
'cda.txt',
],
'txt':[
[
'2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart',
],
[
'2019/02/01-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart',
],
]
}
Here is a snippet of console output showing the dataframe before and after transformation:
>>> df
value txt
0 abc.txt [2019/01/31-11:56:23.288258 1886 7F0ED4CDC...
1 cda.txt [2019/02/01-11:56:23.288258 1886 7F0ED4CDC...
>>> df['txt'] = df['txt'].apply(lambda x: [clean_text(z) for z in x])
>>> df
value txt
0 abc.txt [asfasnfs remove datepart]
1 cda.txt [asfasnfs remove datepart]
>>>
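To rerun this outside an interactive session, here is a self-contained sketch that combines the question's clean_text (re-indented, with the third pattern written as one string literal) with the answer's test data and its apply line. It assumes pandas is imported as pd. What ^.{53} leaves behind depends on the exact width of the log prefix in your real data, so the cleaned strings may differ from the session above if the spacing differs.

```python
import re
import pandas as pd


def clean_text(text):
    """Clean a single string (not a list) with three regex passes."""
    patterns = [r"^.{53}",
                r"[A-Za-z]+[\d]+[\w]*|[\d]+[A-Za-z]+[\w]*",
                r"[-=/':,?${}\[\]-_()>.~;+]"]
    for p in patterns:
        text = re.sub(p, '', text)
    return text


data = {
    'value': ['abc.txt', 'cda.txt'],
    'txt': [
        ['2019/01/31-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart'],
        ['2019/02/01-11:56:23.288258 1886 7F0ED4CDC704 asfasnfs: remove datepart'],
    ],
}
df = pd.DataFrame(data=data)

# apply visits each cell of the txt Series (each cell is a list);
# the list comprehension then cleans every string inside that list.
df['txt'] = df['txt'].apply(lambda x: [clean_text(z) for z in x])
print(df)
```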