Which data structure should I use to search for a string in a CSV file?

Question

I have a CSV file with nearly 200,000 rows and two columns, name and job. The user inputs a name, say user_name, and I have to search the entire CSV for the names that contain the pattern user_name and print them to the screen. I implemented this in Java by loading all the names from the CSV into an ArrayList and then searching it for the pattern, but that makes the search O(n). Is there another data structure in Java that would let me search in O(log n), or at least more efficiently than an ArrayList? I can't use a database. Also, if a good data structure for this exists in another language, please suggest it.
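
The loading step described above might look like the minimal sketch below. It assumes a header row, comma-separated fields, and no quoted commas inside names; real CSV data with quoting needs a proper CSV parser. `CsvNameLoader` and `loadNames` are hypothetical names used only for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

class CsvNameLoader {
    // Reads the first ("name") column of a two-column CSV into a list.
    // Assumptions: line 1 is a header, fields are separated by commas,
    // and names never contain quoted commas.
    static List<String> loadNames(BufferedReader reader) throws IOException {
        List<String> names = new ArrayList<>();
        reader.readLine(); // skip the header row
        String line;
        while ((line = reader.readLine()) != null) {
            if (line.isBlank()) {
                continue; // ignore empty lines
            }
            int comma = line.indexOf(',');
            // Without a comma, treat the whole line as the name.
            names.add(comma >= 0 ? line.substring(0, comma).trim() : line.trim());
        }
        return names;
    }

    public static void main(String[] args) throws IOException {
        String csv = "name,job\nJackson,Engineer\nMadison,Teacher\n";
        List<String> names = loadNames(new BufferedReader(new StringReader(csv)));
        System.out.println(names); // [Jackson, Madison]
    }
}
```

For a real file, the reader could come from `java.nio.file.Files.newBufferedReader(Path.of("names.csv"))`, where the file name is a placeholder.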

Edit: The output should be the names in the CSV that end with user_name. E.g., if the input is "son", it should return "jackson", etc. So far I read the name column of the CSV into an ArrayList of strings, then check each element with a regular expression (Java's Pattern/Matcher) to see whether it ends with user_name; if it does, I print it. If I implement this in a multithreaded environment, will it improve the scalability and performance of my program?
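
A sketch of the linear scan described in the edit, assuming the names are already in a `List<String>`. It shows the regex approach (with the pattern compiled once and the input quoted so characters like "." are literal), an equivalent `String.endsWith` check, and a parallel-stream variant for the multithreading question. The class and method names are hypothetical. Every variant is still O(n): running the scan in parallel can reduce wall-clock time on a multi-core machine, but it does not change the complexity, so it is worth measuring before assuming a speedup.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class LinearSuffixSearch {
    // Regex version: Pattern.quote makes the user's input literal and "\\z"
    // anchors the match at the very end of the name. Compiled once, not per row.
    static List<String> findWithRegex(List<String> names, String userName) {
        Pattern suffix = Pattern.compile(Pattern.quote(userName) + "\\z");
        return names.stream()
                .filter(name -> suffix.matcher(name).find())
                .collect(Collectors.toList());
    }

    // Same result without regex: String.endsWith is a plain suffix comparison.
    static List<String> findWithEndsWith(List<String> names, String userName) {
        return names.stream()
                .filter(name -> name.endsWith(userName))
                .collect(Collectors.toList());
    }

    // Parallel variant: the stream is split across the common ForkJoinPool.
    // Total work is still O(n); Collectors.toList() keeps the list's order.
    static List<String> findParallel(List<String> names, String userName) {
        return names.parallelStream()
                .filter(name -> name.endsWith(userName))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = List.of("Jackson", "Madison", "Sonia", "Allison");
        System.out.println(findWithEndsWith(names, "son")); // [Jackson, Madison, Allison]
    }
}
```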

Answer

You can use:

  • TreeMap, which is a sorted red-black tree with O(log n) get, put, and remove.
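
A sorted map does not directly answer an "ends with" query, since TreeMap orders keys by their beginning. One common way to make it fit (an assumption here, not something the answer spells out) is to key the map by the reversed name: a name ends with "son" exactly when its reversal starts with "nos", and all keys sharing a prefix sit next to each other in sorted order. Finding the start of that range takes O(log n), and reading the k matches adds O(k). The sketch below illustrates this; `ReversedNameIndex`, `add`, and `endingWith` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class ReversedNameIndex {
    // Key: a name written backwards; value: every original name with that key
    // (a list, so duplicate names in the CSV are kept).
    private final TreeMap<String, List<String>> byReversedName = new TreeMap<>();

    private static String reverse(String s) {
        return new StringBuilder(s).reverse().toString();
    }

    // O(log n) per insertion, so indexing n names costs O(n log n) once.
    void add(String name) {
        byReversedName.computeIfAbsent(reverse(name), k -> new ArrayList<>()).add(name);
    }

    // A name ends with `suffix` exactly when its reversal starts with
    // reverse(suffix). tailMap jumps to the first candidate key in O(log n);
    // the loop stops at the first key that no longer shares the prefix.
    List<String> endingWith(String suffix) {
        String prefix = reverse(suffix);
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, List<String>> e : byReversedName.tailMap(prefix, true).entrySet()) {
            if (!e.getKey().startsWith(prefix)) {
                break; // left the contiguous block of matching keys
            }
            result.addAll(e.getValue());
        }
        return result;
    }

    public static void main(String[] args) {
        ReversedNameIndex index = new ReversedNameIndex();
        for (String name : List.of("Jackson", "Madison", "Sonia", "Allison")) {
            index.add(name);
        }
        System.out.println(index.endingWith("son")); // [Madison, Allison, Jackson]
    }
}
```

Results come back in the order of the reversed names, not in CSV order. Because building the index costs O(n log n) up front, this only pays off when many queries run against the same data; for a single query the linear scan does the same amount of work.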
