Spark-Java: How to convert a Dataset string column of format "yyyy-MM-ddThh:mm:ss.SSS+0000" to a timestamp with a format?
I have a Dataset with one column lastModified of type String in the format "yyyy-MM-ddThh:mm:ss.SSS+0000" (sample data: 2018-08-17T19:58:46.000+0000).
I have to add a new column lastModif_mapped of type Timestamp by converting lastModified's value to the format "yyyy-MM-dd hh:mm:ss.SSS".
I tried the code below, but the new column gets the value null:
Dataset<Row> filtered = ds1.select(ds1.col("id"), ds1.col("lastmodified"))
    .withColumn("lastModif_mapped",
        functions.unix_timestamp(ds1.col("lastmodified"), "yyyy-MM-dd HH:mm:ss.SSS").cast("timestamp"))
    .alias("lastModif_mapped");
Where am I going wrong?
- As I answered in your original question, your input data String field doesn't match the formats allowed by unix_timestamp(Column s, String p):
  "If a string, the data must be in a format that can be cast to a timestamp, such as yyyy-MM-dd or yyyy-MM-dd HH:mm:ss.SSSS"
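This mismatch can be reproduced outside Spark with a plain SimpleDateFormat (a minimal sketch; Spark 2.x's unix_timestamp uses SimpleDateFormat-style patterns internally and, rather than throwing, returns null when parsing fails, which is why the new column came back null):

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;

public class PatternMismatchDemo {
    public static void main(String[] args) {
        // The pattern expects a space between date and time, but the input
        // has a literal 'T', so parsing fails. Spark swallows this kind of
        // failure and produces null instead of an error.
        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss.SSS");
        try {
            sdf.parse("2018-08-17T19:58:46.000+0000");
            System.out.println("parsed");
        } catch (ParseException e) {
            System.out.println("parse failed");
        }
    }
}
```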
- For your case, you need to use to_timestamp(Column s, String fmt):
import static org.apache.spark.sql.functions.to_timestamp;
...
to_timestamp(ds1.col("lastmodified"), "yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
And you don't need to cast explicitly to Timestamp, since to_timestamp already returns a Timestamp.
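The corrected pattern can be sanity-checked with plain java.time (a sketch, not Spark itself; note that DateTimeFormatter is stricter than Spark 2.x's SimpleDateFormat-based parser, so the colon-less offset "+0000" is matched here with the two-letter XX pattern rather than XXX):

```java
import java.time.OffsetDateTime;
import java.time.format.DateTimeFormatter;

public class ToTimestampPatternCheck {
    public static void main(String[] args) {
        // 'T' is quoted as a literal; XX matches a zone offset without
        // a colon, such as +0000.
        DateTimeFormatter fmt =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSSXX");
        OffsetDateTime ts =
            OffsetDateTime.parse("2018-08-17T19:58:46.000+0000", fmt);
        System.out.println(ts); // 2018-08-17T19:58:46Z
    }
}
```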
- When you use withColumn("lastModif_mapped", ...), you don't need to add alias("lastModif_mapped"), because withColumn creates a new column with the provided name.