Regular expressions in PySpark textFile command

Views: 162 Published: 2016/5/22 16:35:43 Tags: python regex apache-spark glob pyspark

Question

I'm trying to figure out how far I can push this command when selecting multiple files of interest. For example, I'm using the following wildcard to pick up all files of interest across multiple directories, but I'd like to use regular expressions or something similar to place limits on, say, the length of the directory name.

lines = sc.textFile("/home/spark-1.4.0/A/B_2*/Output/CSV.csv")

Instead of *, can I restrict the length of the directory name, for example with ^[0-9]{8}$? Or is there any way of doing this without resorting to pre-filtering to build a list of valid directories?

Recommended answer

Just to keep things straight: what you want here is a simple glob, not a regular expression. A glob has no {8} repetition operator, but you can get the same effect by repeating the [0-9] character class eight times. You can do something like this:

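# Repeat the [0-9] character class 8 times so the directory name is "B_2" plus exactly 8 digits.
# The variable is named `pattern` to avoid shadowing the standard-library `glob` module.
pattern = "/home/spark-1.4.0/A/B_2{0}/Output/CSV.csv".format("[0-9]" * 8)
lines = sc.textFile(pattern)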
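
To check which directory names such a pattern would accept without starting Spark, here is a minimal sketch that uses Python's fnmatch module as a local stand-in for the glob matching done by the underlying file system. This is an assumption made for illustration: fnmatch and Hadoop-style globs agree on [0-9] character classes, but they are not identical in general (for example, fnmatch lets * match across "/"). The candidate paths are made up.

```python
import fnmatch
import re

# Same construction as in the answer: "B_2" followed by exactly 8 digits.
pattern = "/home/spark-1.4.0/A/B_2{0}/Output/CSV.csv".format("[0-9]" * 8)

# Hypothetical paths, invented for this example only.
candidates = [
    "/home/spark-1.4.0/A/B_212345678/Output/CSV.csv",   # 8 digits after B_2
    "/home/spark-1.4.0/A/B_2123/Output/CSV.csv",        # too few digits
    "/home/spark-1.4.0/A/B_2123456789/Output/CSV.csv",  # too many digits
    "/home/spark-1.4.0/A/B_21234567x/Output/CSV.csv",   # contains a non-digit
]

# fnmatchcase does a case-sensitive match of the whole string against the pattern.
matched = [p for p in candidates if fnmatch.fnmatchcase(p, pattern)]

# The regular expression the question had in mind, for comparison.
regex = re.compile(r"^/home/spark-1\.4\.0/A/B_2[0-9]{8}/Output/CSV\.csv$")
matched_by_regex = [p for p in candidates if regex.match(p)]

print(matched)
print(matched == matched_by_regex)
```

Only the first candidate is accepted by both the glob and the regular expression, which shows that repeating the character class gives the fixed-length restriction asked for in the question.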
