Extract entities from a DataFrame using spaCy
Question
I read the contents of an Excel file using pandas:
import pandas as pd
df = pd.read_excel("FAM_template_Update 1911274_JS.xlsx" )
df
When I try to extract entities using spaCy:
import spacy
nlp = spacy.load("en_core_web_sm")
doc = nlp(df)
for entity in doc.ents:
    print(entity.text)
I get this error: TypeError: Argument 'string' has incorrect type (expected str, got DataFrame)
It is raised on line 3: doc = nlp(df)
Recommended answer
This is expected: nlp expects a single string, so spaCy cannot process a DataFrame as-is. You need to do some work before you can print the entities. Start by identifying the column that contains the text you want to run nlp on. Then extract that column's values as a list and pass them to nlp one at a time. Let's suppose the column that contains the text is named Question.
# nlp() expects a str, so process one cell of the column at a time
for text in df['Question'].tolist():
    doc = nlp(text)
    for entity in doc.ents:
        print(entity.text)
This iterates over each text (row) in your DataFrame and prints its entities.
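If you would rather keep the entities than print them, one possible variation is sketched below. It is not part of the original answer: the helper name extract_entities and the idea of skipping non-string cells are assumptions made for illustration. It relies on nlp.pipe, which processes texts as a stream and yields one Doc per text in order, and it returns an empty list for empty cells (which pandas typically reads from Excel as NaN) so the result stays aligned with the rows.

```python
def extract_entities(texts, nlp):
    """Return one list of (entity text, entity label) pairs per input cell.

    `texts` is any iterable of cells, e.g. df["Question"] (hypothetical usage);
    `nlp` is a loaded spaCy pipeline such as spacy.load("en_core_web_sm").
    """
    cells = list(texts)
    # Keep only real strings: empty Excel cells usually arrive as NaN (a float),
    # which nlp would reject with the same kind of TypeError.
    valid = [(i, cell) for i, cell in enumerate(cells) if isinstance(cell, str)]
    results = [[] for _ in cells]
    # nlp.pipe yields Doc objects in the same order as the texts it receives.
    docs = nlp.pipe(text for _, text in valid)
    for (i, _), doc in zip(valid, docs):
        results[i] = [(ent.text, ent.label_) for ent in doc.ents]
    return results
```

With the DataFrame from the question, this could be used as df["entities"] = extract_entities(df["Question"], nlp), where "entities" is just an illustrative column name. Each row then holds a list of (text, label) pairs, and rows with no text get an empty list.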