Column with list of strings in python

Problem description

I have a pandas dataframe like the following:

                                          categories  review_count
0                  [Burgers, Fast Food, Restaurants]           137
1                         [Steakhouses, Restaurants]           176
2  [Food, Coffee & Tea, American (New), Restaurants]           390
...                                          ....              ...
...                                          ....              ...
...                                          ....              ...

From this DataFrame, I would like to extract only those rows where the list in the 'categories' column contains the category 'Restaurants'. So far I have tried df[[df.categories.isin('Restaurants'),review_count]]; since the DataFrame has other columns as well, I specified the two columns that I want to extract. But I get the error:

TypeError: unhashable type: 'list'

I don't have much idea what this error means, as I am very new to pandas. Please let me know how I can extract only those rows from the DataFrame where the 'categories' column for that row has the string 'Restaurants' as part of the categories_list. Any help would be much appreciated.

Thanks in advance!

Recommended answer

I think you may have to use a lambda function for this. isin tests whether each value in your column is a member of some sequence, but pandas doesn't seem to provide a function for the reverse test, i.e. whether the sequence stored in each cell of your column contains some value:

import pandas as pd
categories = [['fast_food', 'restaurant'], ['coffee', 'cafe'], ['burger', 'restaurant']]
counts = [137, 176, 390]
df = pd.DataFrame({'categories': categories, 'review_count': counts})
# Show which rows contain 'restaurant'
df.categories.map(lambda x: 'restaurant' in x)
# Subset the dataframe using this:
df[df.categories.map(lambda x: 'restaurant' in x)]

Output:

Out[11]: 
                categories  review_count
0  [fast_food, restaurant]           137
2     [burger, restaurant]           390
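
The question also asked for only two of the DataFrame's columns. A minimal sketch of one way to combine the row filter above with a column selection, using .loc, follows. The sample data is made up for illustration: the 'city' column is hypothetical, and the third row has been altered so that it does not contain 'Restaurants'. The explode-based mask is an alternative that avoids the lambda; it assumes pandas 0.25 or later and the default unique index.

```python
import pandas as pd

# Illustrative data only; 'city' is a hypothetical extra column.
df = pd.DataFrame({
    "categories": [
        ["Burgers", "Fast Food", "Restaurants"],
        ["Steakhouses", "Restaurants"],
        ["Food", "Coffee & Tea"],
    ],
    "review_count": [137, 176, 390],
    "city": ["Phoenix", "Tempe", "Mesa"],
})

# Boolean mask: True where the row's list contains 'Restaurants'.
mask = df["categories"].map(lambda cats: "Restaurants" in cats)

# Alternative mask without a lambda: put each list element on its own row,
# compare, then collapse back to one boolean per original row.
mask_alt = (
    df["categories"]
    .explode()
    .eq("Restaurants")
    .groupby(level=0)
    .any()
)

# Filter rows with the mask and keep only the two columns of interest.
result = df.loc[mask, ["categories", "review_count"]]
print(result)
```

Passing the boolean mask as the row selector and a list of column names as the column selector keeps the two concerns separate, which is what the original df[[...]] attempt was mixing together.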
