XGBoost


Bagging and Boosting

Similarity: both are ensemble models. Differences: (1) Bagging is an ensemble of overfitting (high-variance) weak learners, and it averages the individual models' predictions, as in random forests; (2) Boosting is an ensemble of underfitting (high-bias) weak learners, and it sums the individual models' predictions.
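
To make the aggregation difference concrete, here is a minimal sketch using scikit-learn decision trees as base learners (an illustrative setup, not from the original post): bagging averages deep trees trained on bootstrap samples, while boosting sums shallow trees trained sequentially on residuals.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=300)

# Bagging: deep (high-variance) trees on bootstrap samples, predictions averaged.
preds = []
for _ in range(10):
    idx = rng.integers(0, len(X), size=len(X))          # bootstrap resample
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])  # unrestricted depth
    preds.append(tree.predict(X))
bagging_pred = np.mean(preds, axis=0)                   # average the models

# Boosting: shallow (high-bias) stumps fit one after another, predictions summed.
boosting_pred = np.zeros_like(y)
for _ in range(10):
    stump = DecisionTreeRegressor(max_depth=1).fit(X, y - boosting_pred)
    boosting_pred += stump.predict(X)                   # sum the models
```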

XGBoost Boosted Trees: Training on Residuals

Example:

Train on the data $(x, y)$ and obtain the prediction $\hat{y}_1$; the residual $y - \hat{y}_1$ is large. Train again on $(x,\ y - \hat{y}_1)$ and obtain $\hat{y}_2$; the residual $y - \hat{y}_1 - \hat{y}_2$ is still fairly large. Train once more on $(x,\ y - \hat{y}_1 - \hat{y}_2)$ and obtain $\hat{y}_3$; the residual $y - \hat{y}_1 - \hat{y}_2 - \hat{y}_3$ is now small.

In this way, $\hat{y}_1 + \hat{y}_2 + \hat{y}_3$ gives a more accurate prediction of $y$.
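
This residual-fitting loop translates almost line for line into code. A toy sketch (the data, tree depth, and round count are made up for illustration; scikit-learn trees stand in for XGBoost's internal trees):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + 0.1 * rng.normal(size=200)

pred = np.zeros_like(y)                       # \hat{y}^{(0)} = 0
for k in range(1, 4):
    residual = y - pred                       # new target: what is still unexplained
    tree = DecisionTreeRegressor(max_depth=2).fit(X, residual)
    pred += tree.predict(X)                   # \hat{y}^{(k)} = \hat{y}^{(k-1)} + f_k(x)
    print(f"after tree {k}: mean |residual| = {np.mean(np.abs(y - pred)):.3f}")
# The printed residual shrinks round by round, mirroring the example above.
```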

Model

$$
\begin{aligned}
\hat{y}^{(0)} &= f_0(x) = 0 \\
\hat{y}^{(1)} &= \hat{y}^{(0)} + f_1(x) = f_1(x) \\
&\;\;\vdots \\
{\color{red}\hat{y}} = \hat{y}^{(K)} &= \hat{y}^{(K-1)} + f_K(x) = \sum_{k=1}^{K} f_k(x)
\end{aligned}
$$

Constructing the Objective Function

$$
L = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i\right) + \sum_{k=1}^{K} \Omega(f_k)
$$

where $l(y_i, \hat{y}_i)$ is the loss on the $i$-th sample and $\Omega(f_k)$ is the complexity of the $k$-th tree. After the first $K-1$ trees have been generated, training the $K$-th tree gives

$$
\begin{aligned}
L &= \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(K-1)} + f_K(x_i)\right) + \sum_{k=1}^{K} \Omega(f_k) \\
&\approx \sum_{i=1}^{n} \left\{ l\left(y_i, \hat{y}_i^{(K-1)}\right)
+ \partial_{\hat{y}_i^{(K-1)}} l\left(y_i, \hat{y}_i^{(K-1)}\right) f_K(x_i)
+ \frac{1}{2}\, \partial^2_{\hat{y}_i^{(K-1)}} l\left(y_i, \hat{y}_i^{(K-1)}\right) f_K^2(x_i) \right\}
+ \sum_{k=1}^{K} \Omega(f_k),
\end{aligned}
$$

where the second line is a second-order Taylor expansion of $l$ around $\hat{y}_i^{(K-1)}$. Since the part contributed by the first $K-1$ trees is already known (the losses $l(y_i, \hat{y}_i^{(K-1)})$ and the complexities $\Omega(f_1), \dots, \Omega(f_{K-1})$ are constants), the objective can be abbreviated as

$$
L^{(K)} = \sum_{i=1}^{n} \left\{ g_i f_K(x_i) + \frac{1}{2} h_i f_K^2(x_i) \right\} + \Omega(f_K),
$$

where

$$
g_i = \partial_{\hat{y}_i^{(K-1)}} l\left(y_i, \hat{y}_i^{(K-1)}\right), \qquad
h_i = \partial^2_{\hat{y}_i^{(K-1)}} l\left(y_i, \hat{y}_i^{(K-1)}\right).
$$
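
As a concrete instance, for the squared-error loss $l(y, \hat{y}) = \frac{1}{2}(y - \hat{y})^2$ the derivatives are $g_i = \hat{y}_i^{(K-1)} - y_i$ and $h_i = 1$. A minimal sketch (function name is illustrative):

```python
import numpy as np

def grad_hess_squared_error(y, y_pred):
    """g_i and h_i for l(y, yhat) = 0.5 * (y - yhat)^2.

    g_i = d l / d yhat     = yhat - y
    h_i = d^2 l / d yhat^2 = 1
    """
    g = y_pred - y
    h = np.ones_like(y)
    return g, h

y      = np.array([1.0, 0.0, 2.0])
y_pred = np.array([0.8, 0.3, 1.5])   # \hat{y}^{(K-1)}: predictions so far
g, h = grad_hess_squared_error(y, y_pred)
print(g, h)                          # [-0.2  0.3 -0.5] [1. 1. 1.]
```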

The Tree $f_K$ and Its Complexity $\Omega(f_K)$

Assume the shape of the tree is known. Let $w = (w_1, \cdots, w_T)$ denote the values of the leaf nodes, let $q(x_i)$ denote the index of the leaf node that sample $x_i$ falls into, and write $I_j = \{\, i \mid q(x_i) = j \,\}$ for the set of samples in leaf $j$. Then $f_K(x_i) = w_{q(x_i)}$.
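
In code, this representation is just an array of leaf values indexed by a leaf-assignment array. A small sketch (all values made up for illustration):

```python
import numpy as np

# For a tree with T leaves, w[j] is the value of leaf j, and q maps each
# sample to the index of the leaf it falls into, so f_K(x_i) = w[q[i]].
w = np.array([0.3, -1.2, 0.8])        # leaf values w_1..w_T (T = 3)
q = np.array([0, 2, 2, 1, 0])         # leaf index q(x_i) for 5 samples

f_K = w[q]                            # f_K(x_i) = w_{q(x_i)}
I = [np.flatnonzero(q == j) for j in range(len(w))]  # I_j = {i : q(x_i) = j}
print(f_K)                            # [ 0.3  0.8  0.8 -1.2  0.3]
print(I)                              # [array([0, 4]), array([3]), array([1, 2])]
```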

Tree Complexity

$$
\Omega(f_K) = \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2,
$$

where $T$ is the number of leaf nodes.

The New Objective Function

Substituting $f_K$ and $\Omega(f_K)$ into the loss function gives

$$
\begin{aligned}
obj = L^{(K)} &= \sum_{i=1}^{n} \left\{ g_i f_K(x_i) + \frac{1}{2} h_i f_K^2(x_i) \right\} + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \\
&= \sum_{i=1}^{n} \left\{ g_i w_{q(x_i)} + \frac{1}{2} h_i w_{q(x_i)}^2 \right\} + \gamma T + \frac{1}{2} \lambda \sum_{j=1}^{T} w_j^2 \\
&= \sum_{j=1}^{T} \left[ \left( \sum_{i \in I_j} g_i \right) w_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) w_j^2 \right] + \gamma T.
\end{aligned}
$$

This is a quadratic in each $w_j$, so minimizing leaf by leaf yields

$$
w_j^* = -\frac{G_j}{H_j + \lambda}, \qquad
obj^* = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^2}{H_j + \lambda} + \gamma T,
$$

where $G_j = \sum\limits_{i \in I_j} g_i$ and $H_j = \sum\limits_{i \in I_j} h_i$. So when the shape of the tree is known, the optimal parameters (the value of each leaf node) can be computed in closed form.
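
A sketch of this closed-form solution (names are illustrative; `lam` and `gamma` play the roles of $\lambda$ and $\gamma$):

```python
import numpy as np

# Per leaf j: G_j = sum of g_i and H_j = sum of h_i over samples in I_j,
# then w_j* = -G_j / (H_j + lambda) and
# obj* = -0.5 * sum_j G_j^2 / (H_j + lambda) + gamma * T.
def optimal_leaves(g, h, q, T, lam=1.0, gamma=0.0):
    G = np.bincount(q, weights=g, minlength=T)   # G_j for each leaf
    H = np.bincount(q, weights=h, minlength=T)   # H_j for each leaf
    w_star = -G / (H + lam)
    obj_star = -0.5 * np.sum(G**2 / (H + lam)) + gamma * T
    return w_star, obj_star

g = np.array([-0.2, 0.3, -0.5, 0.1])   # gradients g_i
h = np.ones(4)                          # hessians h_i (squared error: all 1)
q = np.array([0, 1, 1, 0])              # leaf assignment q(x_i)
print(optimal_leaves(g, h, q, T=2))
```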

How to Find the Shape of the Tree?

Greedy algorithm: at each step, find a feature $k$ (and a split point on it) that minimizes $obj^{*(new)}$, i.e., maximizes the gain $obj^{*(old)} - obj^{*(new)}$, where

$$
obj^{*(old)} - obj^{*(new)} = \frac{1}{2} \left[ \frac{\left( \sum_{i \in I_L} g_i \right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left( \sum_{i \in I_R} g_i \right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left( \sum_{i \in I} g_i \right)^2}{\sum_{i \in I} h_i + \lambda} \right] - \gamma,
$$

with $I_L$ and $I_R$ the samples falling into the left and right children of the split, $I = I_L \cup I_R$, and the $-\gamma$ term accounting for the extra leaf the split adds.
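
A simplified sketch of the exact greedy split search over a single feature under this gain (an illustration, not XGBoost's optimized implementation): sort the samples by feature value, sweep candidate split points, and keep the best-scoring one.

```python
import numpy as np

def best_split(x, g, h, lam=1.0, gamma=0.0):
    order = np.argsort(x)
    x, g, h = x[order], g[order], h[order]
    G, H = g.sum(), h.sum()                   # parent-node totals
    GL = HL = 0.0
    best_gain, best_thresh = 0.0, None        # keep a split only if gain > 0
    for i in range(len(x) - 1):               # split between x[i] and x[i+1]
        GL += g[i]
        HL += h[i]
        if x[i] == x[i + 1]:                  # cannot split between equal values
            continue
        GR, HR = G - GL, H - HL
        gain = 0.5 * (GL**2 / (HL + lam)
                      + GR**2 / (HR + lam)
                      - G**2 / (H + lam)) - gamma
        if gain > best_gain:
            best_gain, best_thresh = gain, (x[i] + x[i + 1]) / 2
    return best_gain, best_thresh
```

Running `best_split` over every feature and recursing on the resulting children grows the tree; splitting stops where no candidate achieves positive gain.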
