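Row (Catalyst Row) represents one row of output from a relational operator. It is a generic row object with an ordered collection of fields, which can be accessed by index (generic access by ordinal), by field name, or with Scala pattern matching. To create a new Row, use RowFactory.create() in Java or Row.apply() in Scala.

The Row companion object provides factory methods that create Row instances from a list of elements (apply), a sequence of elements (fromSeq), or a tuple (fromTuple).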
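Access by field name only works when the Row carries a schema, which is the case for rows that come out of a DataFrame. The sketch below attaches a schema by hand to keep the example small. It assumes Spark SQL is on the classpath (for example in spark-shell); the field names `id` and `greeting` are made up for illustration, and `GenericRowWithSchema` is an internal Catalyst class used here only as a shortcut, not something the original text prescribes.

```scala
import org.apache.spark.sql.Row
import org.apache.spark.sql.catalyst.expressions.GenericRowWithSchema
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical two-field schema; the names are illustrative only.
val schema = StructType(Seq(
  StructField("id", IntegerType, nullable = false),
  StructField("greeting", StringType, nullable = true)
))

// Attach the schema to the values by hand (assumption: an internal class is acceptable here).
val rowWithSchema: Row = new GenericRowWithSchema(Array[Any](1, "hello"), schema)

// Look fields up by name instead of by position.
val id = rowWithSchema.getAs[Int]("id")
val greeting = rowWithSchema.getAs[String]("greeting")
val position = rowWithSchema.fieldIndex("greeting")

// A Row built with Row(...) has no schema, so a lookup by name fails.
val plainRow = Row(1, "hello")
val lookup = scala.util.Try(plainRow.fieldIndex("greeting"))
```

In everyday code the schema usually comes from the DataFrame itself, so rows obtained from `collect()` or `head()` support the same `getAs[T](name)` calls without any manual wiring.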
    import org.apache.spark.sql.Row

    // Create a Row from values.
    scala> Row(1, "hello")
    res0: org.apache.spark.sql.Row = [1,hello]

    // The same, calling apply explicitly.
    scala> Row.apply(1, "hello")
    res1: org.apache.spark.sql.Row = [1,hello]

    // Create a Row from a Seq of values.
    scala> Row.fromSeq(Seq(1, "hello"))
    res2: org.apache.spark.sql.Row = [1,hello]

    // Create a Row from a tuple of values.
    scala> Row.fromTuple((0, "hello"))
    res3: org.apache.spark.sql.Row = [0,hello]

In general, the values of a Row can be read by index (generic access by ordinal):
    import org.apache.spark.sql.Row

    scala> val row = Row(1, true, "a string", null)
    row: org.apache.spark.sql.Row = [1,true,a string,null]

    // by index
    scala> val firstValue = row(0)
    firstValue: Any = 1

    scala> val fourthValue = row(3)
    fourthValue: Any = null

    // by get
    scala> val firstValue = row.get(0)
    firstValue: Any = 1

    scala> val fourthValue = row.get(3)
    fourthValue: Any = null

    // by apply
    scala> val firstValue = row.apply(0)
    firstValue: Any = 1

    scala> val fourthValue = row.apply(3)
    fourthValue: Any = null

Generic access by ordinal (with an index, apply, or get) returns a value of type Any. To read a field with its proper type, use getAs with the index:
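    scala> val row = Row(1, "hello")
    row: org.apache.spark.sql.Row = [1,hello]

    scala> row.getAs[Int](0)
    res1: Int = 1

    scala> row.getAs[String](1)
    res2: String = hello

In Scala, the fields of a Row can also be extracted with pattern matching. For example: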
    scala> val res4 = Row(1, "hello") match { case Row(key: Int, value: String) => key -> value }
    res4: (Int, String) = (1,hello)
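A single-case match like the one above throws a MatchError when a row has a different shape or holds a null where a String is expected. The following sketch shows one way to guard against that. It assumes Spark SQL is on the classpath; the sample rows and the `describe` helper are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.sql.Row

// Hypothetical sample data: the third row holds a null, the fourth has an unexpected shape.
val rows = Seq(Row(1, "hello"), Row(2, "world"), Row(3, null), Row("oops"))

// collect keeps only the rows that match the pattern; a null never matches `value: String`.
val pairs: Seq[(Int, String)] = rows.collect {
  case Row(key: Int, value: String) => key -> value
}

// A total match: handle the null case explicitly and fall back for any other shape.
def describe(row: Row): String = row match {
  case Row(key: Int, value: String) => s"$key -> $value"
  case Row(key: Int, null)          => s"$key -> <null>"
  case other                        => s"unexpected row with ${other.length} field(s)"
}

val descriptions = rows.map(describe)
```

Using `collect` silently drops rows that do not fit, while the total `match` in `describe` makes every case visible; which one is appropriate depends on whether malformed rows should be ignored or reported.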
