Spark-MLlib 学习入门到掌握-Interaction特征交互[22]

    技术2022-07-12  88

    Interaction是一个Transformer。它使用向量或double列,并生成单个向量列,其中包含每个输入列的一个值的所有组合的乘积。例如,如果您有两个向量类型列,每个列有3个维度作为输入列,那么您将获得一个9维向量作为输出列。

    def Interaction(){ import org.apache.spark.ml.feature.Interaction import org.apache.spark.ml.feature.VectorAssembler val spark: SparkSession = SparkSession.builder().appName("implicits").master("local[2]").getOrCreate() val df = spark.createDataFrame(Seq( (1, 1, 2, 3, 8, 4, 5), (2, 4, 3, 8, 7, 9, 8), (3, 6, 1, 9, 2, 3, 6), (4, 10, 8, 6, 9, 4, 5), (5, 9, 2, 7, 10, 7, 3), (6, 1, 1, 4, 2, 8, 4) )).toDF("id1", "id2", "id3", "id4", "id5", "id6", "id7") val assembler1 = new VectorAssembler(). setInputCols(Array("id2", "id3", "id4")). setOutputCol("vec1") val assembled1 = assembler1.transform(df) val assembler2 = new VectorAssembler(). setInputCols(Array("id5", "id6", "id7")). setOutputCol("vec2") val assembled2 = assembler2.transform(assembled1).select("id1", "vec1", "vec2") val interaction = new Interaction() .setInputCols(Array("id1", "vec1", "vec2")) .setOutputCol("interactedCol") val interacted = interaction.transform(assembled2) interacted.show(truncate = false) }

    运行结果

    Processed: 0.011, SQL: 9