Differences between SparkSQL and Hive when casting to boolean

As the title says.

While looking at some data today, I noticed that SparkSQL and Hive differ in how they handle cast conversions.
Hive version: 1.2.1
SparkSQL version: 1.6.0
Summary:
In Hive, the implicit conversion to boolean turns any non-boolean, non-null value into true by default,
whereas SparkSQL evaluates the value according to its data type and returns a result based on that.
Hive
cast(expr as <type>) converts the result of the expression expr to <type>. For example,
cast('1' as BIGINT) will convert the string '1' to its integral representation.
A null is returned if the conversion does not succeed.
For cast(expr as boolean), Hive returns true for a non-empty string.
hive> select cast('false' as boolean) from default.dule;
OK
true
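
Going by the documentation quoted above, any non-empty string should come out as true regardless of its content, and an empty string should then come out as false. The extra probes below are only a sketch of what that rule implies (expected results, not output captured from a real session):

hive> select cast('abc' as boolean) from default.dule;
OK
true
hive> select cast('' as boolean) from default.dule;
OK
false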

SparkSQL
In SparkSQL, if the value is a string, it is checked against the string sets enumerated in StringUtils: a recognized "true" string returns true, a recognized "false" string returns false, and anything else returns null. The other atomic types are compared with 0: a non-zero value returns true, otherwise false.
The relevant code is shown below.
classname: org.apache.spark.sql.catalyst.expressions.Cast

// UDFToBoolean
private[this] def castToBoolean(from: DataType): Any => Any = from match {
  case StringType =>
    buildCast[UTF8String](_, s => {
      if (StringUtils.isTrueString(s)) {
        true
      } else if (StringUtils.isFalseString(s)) {
        false
      } else {
        null
      }
    })
  case TimestampType =>
    buildCast[Long](_, t => t != 0)
  case DateType =>
    // Hive would return null when cast from date to boolean
    buildCast[Int](_, d => null)
  case LongType =>
    buildCast[Long](_, _ != 0)
  case IntegerType =>
    buildCast[Int](_, _ != 0)
  case ShortType =>
    buildCast[Short](_, _ != 0)
  case ByteType =>
    buildCast[Byte](_, _ != 0)
  case DecimalType() =>
    buildCast[Decimal](_, !_.isZero)
  case DoubleType =>
    buildCast[Double](_, _ != 0)
  case FloatType =>
    buildCast[Float](_, _ != 0)
}
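
Tracing these branches from the SQL layer, the same style of query through the spark-sql shell should behave as sketched below (expected results derived from the code above, not pasted output, and assuming default.dule holds a single row as in the Hive example):

spark-sql> select cast('false' as boolean) from default.dule;
false
spark-sql> select cast('abc' as boolean) from default.dule;
NULL
spark-sql> select cast(2 as boolean) from default.dule;
true
spark-sql> select cast(0 as boolean) from default.dule;
false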

classname: org.apache.spark.sql.catalyst.util.StringUtils
// Accepted spellings of true and false; the lookup below is case-insensitive
private[this] val trueStrings = Set("t", "true", "y", "yes", "1").map(UTF8String.fromString)
private[this] val falseStrings = Set("f", "false", "n", "no", "0").map(UTF8String.fromString)

def isTrueString(s: UTF8String): Boolean = trueStrings.contains(s.toLowerCase)
def isFalseString(s: UTF8String): Boolean = falseStrings.contains(s.toLowerCase)
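
Because isTrueString and isFalseString lower-case the input before the lookup, the match is case-insensitive, and any string outside the two sets falls through to null; note in particular that the string '2' becomes null even though the integer 2 becomes true. Again, the session below is only a sketch of the expected results:

spark-sql> select cast('YES' as boolean) from default.dule;
true
spark-sql> select cast('N' as boolean) from default.dule;
false
spark-sql> select cast('2' as boolean) from default.dule;
NULL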