Extracting rows from a CSV file based on specific keywords
Question
I wrote some code to help me retrieve data from a CSV file:
import re

keywords = {"metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"}  # all your keywords
# currently only "energy" is searched for, case-insensitively
keyre = re.compile("energy", re.IGNORECASE)
with open("2006-data-8-8-2016.csv") as infile:
    with open("new_data.csv", "w") as outfile:
        outfile.write(infile.readline())  # Save the header
        for line in infile:
            if len(keyre.findall(line)) > 0:
                outfile.write(line)
I need it to look for each keyword in two main columns, "position" and "Job description", and then write every whole row that contains any of these words to the new file. Any ideas on how this can be done in the simplest way?
Recommended answer
You can do this with pandas as follows, if you are looking for rows where the value in either column is exactly one of the keywords:
import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"]
# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows whose value equals one of the keywords
# in the position or the Job description columns
df = df[df["position"].isin(keywords) | df["Job description"].isin(keywords)]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)
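To make the exact-match behaviour concrete, here is a minimal sketch on a small in-memory DataFrame. The sample rows and the shortened keyword list are made up for illustration and are not taken from the real file; the point is only that `isin` compares the whole cell value, so a cell such as "financial engineering" does not match the keyword "financial".

```python
import pandas as pd

# Hypothetical sample rows standing in for the real CSV file
df = pd.DataFrame({
    "position": ["energy", "financial engineering", "nurse"],
    "Job description": ["plant operator", "builds pricing models", "team"],
})
keywords = ["energy", "financial", "team"]  # shortened list, for illustration

# isin compares each whole cell value against the list of keywords
exact = df[df["position"].isin(keywords) | df["Job description"].isin(keywords)]
print(exact)
```

The first and third rows are kept because one of their cells is exactly a keyword; the "financial engineering" row is dropped.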
If you are looking for substrings in the rows (e.g. looking for financial in financial engineering), then you can do the following:
import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial", "transportation", "electrical", "scientists",
            "electronic", "workers"]
# join the keywords into one regular expression: "metal|energy|..."
searched_keywords = "|".join(keywords)
# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows that contain one of the keywords
# in the position or the Job description columns
# case=False ignores case (like re.IGNORECASE in the question);
# na=False treats empty cells as non-matches
df = df[df["position"].str.contains(searched_keywords, case=False, na=False)
        | df["Job description"].str.contains(searched_keywords, case=False, na=False)]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)
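If you would rather stay close to the original `re`-based script and avoid pandas, the same substring search can be sketched with the standard library's `csv` module. This is only one possible approach, not the answer's method: the function name `filter_rows` and the in-memory sample data are hypothetical, and the column names are assumed to match the header of the real file.

```python
import csv
import io
import re

KEYWORDS = ["metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists", "electronic", "workers"]
COLUMNS = ["position", "Job description"]  # column names taken from the question


def filter_rows(infile, outfile, keywords=KEYWORDS, columns=COLUMNS):
    """Copy rows whose chosen columns contain any keyword; return the count."""
    # re.escape guards against keywords containing regex metacharacters
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    kept = 0
    for row in reader:
        # Missing cells come back as None, so fall back to an empty string
        if any(pattern.search(row[col] or "") for col in columns):
            writer.writerow(row)
            kept += 1
    return kept


# Hypothetical sample data standing in for the real file
sample = io.StringIO(
    "id,position,Job description\n"
    "1,Solar Technician,installs panels\n"
    "2,Nurse,patient care\n"
    "3,Analyst,Financial engineering\n"
)
result = io.StringIO()
kept_count = filter_rows(sample, result)
print(result.getvalue())
```

With real files, open both with `newline=""`, as the `csv` module documentation recommends, for example `open("2006-data-8-8-2016.csv", newline="")` for reading and `open("new_data.csv", "w", newline="")` for writing, and pass the two file objects to `filter_rows`.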