Simple, fast SQL queries for flat files
Question
Does anyone know of a tool that provides simple, fast queries of flat files using a SQL-like declarative query language? I'd rather not pay the overhead of loading the file into a DB, since the input data is typically thrown out almost immediately after the query is run.
Consider the data file "animals.txt":
dog 15
cat 20
dog 10
cat 30
dog 5
cat 40
Suppose I want to extract the highest value for each unique animal. I would like to write something like:
cat animals.txt | foo "select $1, max(convert($2 using decimal)) group by $1"
I can get nearly the same result using sort:
cat animals.txt | sort -t " " -k1,1 -k2,2nr
And I can always drop into awk from there, but this all feels a bit awkward (couldn't resist) when a SQL-like language would seem to solve the problem so cleanly.
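For comparison, here is a minimal sketch of what the awk route mentioned above might look like for this toy problem. The function name max_by_animal is made up for illustration, and the sketch assumes whitespace-separated columns with a numeric second column, as in the sample file. The trailing sort is only there to make the output order predictable, since awk does not guarantee the order of "for (k in ...)".

```shell
# Recreate the sample file from the question.
printf '%s\n' 'dog 15' 'cat 20' 'dog 10' 'cat 30' 'dog 5' 'cat 40' > animals.txt

# Roughly: select $1, max($2) group by $1
# max[] maps each animal to the largest value seen so far;
# "$2 + 0" forces a numeric rather than string comparison.
max_by_animal() {
  awk '!($1 in max) || $2 + 0 > max[$1] { max[$1] = $2 + 0 }
       END { for (k in max) print k, max[k] }' "$1" | sort
}

max_by_animal animals.txt
```

This reads the file once and keeps one array entry per distinct animal, so the input does not need to be sorted first.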
I've considered writing a wrapper for SQLite that would automatically create a table based on the input data, and I've looked into using Hive in single-processor mode, but I can't help but feel this problem has been solved before. Am I missing something? Is this functionality already implemented by another standard tool?
Recommended answer
I never managed to find a satisfying answer to my question, but I did at least find a solution to my toy problem using uniq's "-f" option, which I had been unaware of:
cat animals.txt | sort -t " " -k1,1 -k2,2nr \
| awk -F' ' '{print $2, " ", $1}' | uniq -f 1
The awk portion above could, obviously, be skipped entirely if the input file were created with its columns in the opposite order.
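The column swap can also be avoided without changing the input, at the cost of keeping awk in the pipeline. The following is a sketch under the same assumptions as the toy problem (whitespace-separated columns, numeric second column); top_per_animal is a hypothetical name. It swaps uniq -f for a common awk idiom that prints a line only the first time its key appears.

```shell
# Recreate the sample file from the question.
printf '%s\n' 'dog 15' 'cat 20' 'dog 10' 'cat 30' 'dog 5' 'cat 40' > animals.txt

# Sort by animal, then by value in descending numeric order, so the
# largest value for each animal comes first within its group.
# seen[$1]++ is 0 (false) the first time a key appears, so !seen[$1]++
# is true only for that first line, which awk then prints by default.
top_per_animal() {
  sort -k1,1 -k2,2nr "$1" | awk '!seen[$1]++'
}

top_per_animal animals.txt
```

Unlike the uniq version, this keeps the columns in their original order, because the key is taken from the first field directly.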
I'm still holding out hope for a SQL-like tool, though.