Sklearn - Linear Models - Logistic Regression

Technology 2022-07-10 228

Contents: Logistic regression; sklearn.linear_model.LogisticRegression

Logistic regression

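Despite the word "regression" in its name, logistic regression is a linear model for classification, not regression. In the literature it is also known as logit regression, maximum-entropy classification (MaxEnt), or the log-linear classifier. The model uses the logistic function to model the probabilities of the possible outcomes of a single trial. In scikit-learn, the LogisticRegression class implements binary, one-vs-rest, and multinomial logistic regression, with optional L1, L2, or Elastic-Net regularization.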
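To make the role of the logistic function concrete, here is a minimal sketch for the binary case. It assumes the breast cancer dataset bundled with scikit-learn and standardized features (both are choices made for this illustration, not part of the original text), and checks that applying the logistic function to the linear decision score reproduces the probability reported by predict_proba.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Binary dataset chosen for illustration; features are standardized
# so that the default solver converges quickly.
X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

clf = LogisticRegression().fit(X, y)

# Linear score w.x + b for every sample.
scores = clf.decision_function(X)

# The logistic (sigmoid) function maps a score to P(y = 1 | x).
probs_manual = 1.0 / (1.0 + np.exp(-scores))

# predict_proba returns one column per class; column 1 is the positive class.
probs_sklearn = clf.predict_proba(X)[:, 1]

print(np.allclose(probs_manual, probs_sklearn))
```

This identity holds for the binary case with default settings; with more than two classes the probabilities come from a normalization over all class scores instead.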

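The LogisticRegression class implements the following solvers: liblinear, newton-cg, lbfgs, sag, and saga.

The liblinear solver uses a coordinate descent (CD) algorithm and relies on the high-performance C++ library LIBLINEAR, which is shipped with scikit-learn. The CD algorithm cannot learn a true multinomial (multiclass) model, however. Instead, the optimization problem is decomposed in a "one-vs-rest" fashion, so a separate binary classifier is trained for each class. Because this happens under the hood, a LogisticRegression instance using this solver still behaves like a multiclass classifier. When the L1 penalty is used, sklearn.svm.l1_min_c computes a lower bound for C; staying above it avoids an empty model, that is, one in which all feature weights are zero.

The lbfgs, sag, and newton-cg solvers support only the L2 penalty or no penalty, and they converge faster on some high-dimensional data. Setting multi_class to "multinomial" with these solvers trains a true multinomial logistic regression model [5], whose probability estimates are more accurate than those of the "one-vs-rest" setting.

The sag solver uses Stochastic Average Gradient descent [6]. It is faster on large datasets, meaning datasets in which both the number of samples and the number of features are large.

The saga solver [7] is a variant of sag that also supports the non-smooth L1 penalty (penalty="l1"), which makes it the usual choice for sparse multinomial logistic regression. It is also the only solver that supports Elastic-Net regularization.

The lbfgs solver is an optimization algorithm that approximates the Broyden–Fletcher–Goldfarb–Shanno algorithm [8], which belongs to the family of quasi-Newton methods. It is recommended for smaller datasets; its performance suffers on larger ones.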
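The lower bound computed by sklearn.svm.l1_min_c can be checked with a small experiment. The sketch below assumes the breast cancer dataset and the liblinear solver (both picked for illustration); count_nonzero_weights is a hypothetical helper defined here, not a scikit-learn function. It fits one model with C below the bound and one with C well above it, then counts the non-zero weights.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler
from sklearn.svm import l1_min_c

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Lower bound on C for the L1-penalized logistic loss.
c_min = l1_min_c(X, y, loss="log")


def count_nonzero_weights(C):
    """Hypothetical helper: fit an L1-penalized model and count non-zero weights."""
    clf = LogisticRegression(penalty="l1", solver="liblinear", C=C, tol=1e-6)
    clf.fit(X, y)
    return int(np.count_nonzero(clf.coef_))


n_below = count_nonzero_weights(0.5 * c_min)  # below the bound: empty model
n_above = count_nonzero_weights(10 * c_min)   # above the bound: some weights survive

print("c_min:", c_min)
print("non-zero weights below / above the bound:", n_below, n_above)
```

In practice the bound is useful as the starting point of a grid of C values when tracing a regularization path.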

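The lbfgs solver is used by default because of its robustness. For large datasets, the saga solver is usually faster. For large datasets you can also use SGDClassifier with the log loss, which may be faster still but requires more tuning.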

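LogisticRegressionCV implements logistic regression with built-in cross-validation, which can be used to find the optimal C and l1_ratio parameters. Thanks to warm-starting, the newton-cg, sag, saga, and lbfgs solvers are faster on high-dimensional data in this setting.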
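A minimal sketch of the cross-validated search is shown below. It assumes the breast cancer dataset, a grid of 5 values of C, and 3 folds (all illustrative choices), and it searches over C only, leaving l1_ratio aside.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegressionCV
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X = StandardScaler().fit_transform(X)

# Cs=5 asks for a grid of 5 values of C on a logarithmic scale;
# cv=3 uses 3-fold cross-validation to score each value.
cv_clf = LogisticRegressionCV(Cs=5, cv=3, max_iter=1000)
cv_clf.fit(X, y)

# Cs_ holds the grid that was searched, C_ the selected value.
# np.ravel keeps the lookup working whether C_ is an array or a scalar.
best_C = float(np.ravel(cv_clf.C_)[0])
print("grid searched:", cv_clf.Cs_)
print("selected C:", best_C)
```

After fitting, cv_clf can be used like a regular LogisticRegression that has been refit with the selected C.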

sklearn.linear_model.LogisticRegression

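class sklearn.linear_model.LogisticRegression(penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)

Parameters:

penalty: {'l1', 'l2', 'elasticnet', 'none'}, default='l2'
    Norm used in the penalization.

dual: bool, default=False
    Dual or primal formulation. The dual formulation is only implemented for the L2 penalty with the liblinear solver. Prefer dual=False when n_samples > n_features.

tol: float, default=1e-4
    Tolerance for the stopping criterion.

C: float, default=1.0
    Inverse of regularization strength; must be a positive float. As in support vector machines, smaller values specify stronger regularization.

fit_intercept: bool, default=True
    Specifies whether a constant (also called bias or intercept) should be added to the decision function.

intercept_scaling: float, default=1
    Only used when the solver is 'liblinear' and fit_intercept is True. In that case a synthetic feature with constant value intercept_scaling is appended to each instance vector.

class_weight: dict or 'balanced', default=None
    Weights associated with classes, in the form {class_label: weight}. If not given, all classes are assumed to have weight one. The 'balanced' mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data, as n_samples / (n_classes * np.bincount(y)). Note that these weights are multiplied by sample_weight (passed through the fit method) if sample_weight is specified.

random_state: int, RandomState instance, default=None
    Used when solver is 'sag', 'saga', or 'liblinear' to shuffle the data.

solver: {'newton-cg', 'lbfgs', 'liblinear', 'sag', 'saga'}, default='lbfgs'
    Algorithm used in the optimization problem.
    For small datasets, 'liblinear' is a good choice, whereas 'sag' and 'saga' are faster for large ones.
    For multiclass problems, only 'newton-cg', 'sag', 'saga', and 'lbfgs' handle the multinomial loss; 'liblinear' is limited to one-vs-rest schemes.
    'newton-cg', 'lbfgs', 'sag', and 'saga' handle L2 or no penalty.
    'liblinear' and 'saga' also handle the L1 penalty.
    'saga' also supports the 'elasticnet' penalty.
    'liblinear' does not support penalty='none'.

max_iter: int, default=100
    Maximum number of iterations taken for the solvers to converge.

multi_class: {'auto', 'ovr', 'multinomial'}, default='auto'
    If the option chosen is 'ovr', a binary problem is fit for each label. For 'multinomial', the loss minimized is the multinomial loss fit across the entire probability distribution, even when the data is binary. 'multinomial' is unavailable when solver='liblinear'. 'auto' selects 'ovr' if the data is binary or if solver='liblinear', and otherwise selects 'multinomial'.

verbose: int, default=0
    For the liblinear and lbfgs solvers, set verbose to any positive number for verbosity.

warm_start: bool, default=False
    When set to True, reuse the solution of the previous call to fit as initialization; otherwise, just erase the previous solution. Has no effect for the liblinear solver.

n_jobs: int, default=None
    Number of CPU cores used when parallelizing over classes if multi_class='ovr'. This parameter is ignored when the solver is set to 'liblinear', regardless of whether multi_class is specified. -1 means using all processors.

l1_ratio: float, default=None
    The Elastic-Net mixing parameter, with 0 <= l1_ratio <= 1. Only used if penalty='elasticnet'.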
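The 'balanced' formula for class_weight is easy to verify by hand. The sketch below uses a small made-up label vector (hypothetical data, chosen only for illustration) and compares the formula with the compute_class_weight helper from sklearn.utils.class_weight.

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 8 samples of class 0, 2 samples of class 1.
y = np.array([0] * 8 + [1] * 2)

n_samples = len(y)
n_classes = len(np.unique(y))

# Formula from the class_weight description:
# n_samples / (n_classes * np.bincount(y))
# Here: 10 / (2 * 8) = 0.625 for class 0 and 10 / (2 * 2) = 2.5 for class 1.
manual_weights = n_samples / (n_classes * np.bincount(y))

# The same heuristic through scikit-learn's helper.
sklearn_weights = compute_class_weight(
    class_weight="balanced", classes=np.unique(y), y=y
)

print(manual_weights)
print(sklearn_weights)
```

The rare class gets the larger weight, so errors on it count more in the loss.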

Attributes:

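classes_: ndarray of shape (n_classes,)
    A list of class labels known to the classifier.

coef_: ndarray of shape (1, n_features) or (n_classes, n_features)
    Coefficients of the features in the decision function.

intercept_: ndarray of shape (1,) or (n_classes,)
    Intercept (also called bias) added to the decision function.

n_iter_: ndarray of shape (n_classes,) or (1,)
    Actual number of iterations for all classes.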
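The two possible shapes of coef_ and intercept_ can be seen by fitting the same estimator on a multiclass and on a binary problem. This is a small sketch assuming the iris dataset, with the binary problem obtained by keeping only the first two classes; max_iter=1000 is an illustrative setting to give the solver room to converge.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)  # 3 classes, 4 features

# Multiclass fit: one row of coefficients per class.
multi_clf = LogisticRegression(max_iter=1000).fit(X, y)

# Binary fit on classes 0 and 1 only: a single row of coefficients.
binary_mask = y < 2
binary_clf = LogisticRegression(max_iter=1000).fit(X[binary_mask], y[binary_mask])

print("classes:", multi_clf.classes_)
print("multiclass:", multi_clf.coef_.shape, multi_clf.intercept_.shape)
print("binary:", binary_clf.coef_.shape, binary_clf.intercept_.shape)
print("iterations:", multi_clf.n_iter_)
```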

See also: SGDClassifier, LogisticRegressionCV

Example:

>>> from sklearn.datasets import load_iris
>>> from sklearn.linear_model import LogisticRegression
>>> X, y = load_iris(return_X_y=True)
>>> clf = LogisticRegression(random_state=0).fit(X, y)
>>> clf.predict(X[:2, :])
array([0, 0])
>>> clf.predict_proba(X[:2, :])
array([[9.8...e-01, 1.8...e-02, 1.4...e-08],
       [9.7...e-01, 2.8...e-02, ...e-08]])
>>> clf.score(X, y)
0.97...

Methods:

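__init__(self, penalty='l2', *, dual=False, tol=0.0001, C=1.0, fit_intercept=True, intercept_scaling=1, class_weight=None, random_state=None, solver='lbfgs', max_iter=100, multi_class='auto', verbose=0, warm_start=False, n_jobs=None, l1_ratio=None)

decision_function(self, X)
    Predict confidence scores for samples.
    Parameters:
        X: array_like or sparse matrix, shape (n_samples, n_features). Samples.
    Returns:
        array, shape=(n_samples,) if n_classes == 2 else (n_samples, n_classes). Confidence score for each (sample, class) combination.

densify(self)
    Convert the coefficient matrix to dense array format.
    Returns:
        self: the fitted estimator.

fit(self, X, y, sample_weight=None)
    Fit the model according to the given training data.
    Parameters:
        X: array-like of shape (n_samples, n_features). Training data.
        y: array-like of shape (n_samples,). Target values.
    Returns:
        self: object. The fitted estimator itself.

get_params(self, deep=True)
    Get the parameters of this estimator.
    Parameters:
        deep: bool, default=True. If True, return the parameters of this estimator and of contained sub-objects that are estimators.
    Returns:
        params: mapping of string to any. Parameter names mapped to their values.

predict(self, X)
    Predict class labels for the samples in X.
    Parameters:
        X: array_like or sparse matrix, shape (n_samples, n_features). Samples.
    Returns:
        C: array, shape (n_samples,). Predicted class label for each sample.

predict_log_proba(self, X)
    Predict the logarithm of the probability estimates. The returned estimates for all classes are ordered by class label.
    Parameters:
        X: array-like of shape (n_samples, n_features).
    Returns:
        T: array-like of shape (n_samples, n_classes). Log-probability of each sample for each class in the model.

predict_proba(self, X)
    Probability estimates. The returned estimates for all classes are ordered by class label.
    Parameters:
        X: array-like of shape (n_samples, n_features). Vectors to be scored, where n_samples is the number of samples and n_features is the number of features.
    Returns:
        T: array-like of shape (n_samples, n_classes). Probability of each sample for each class in the model, where classes are ordered as they are in self.classes_.

score(self, X, y, sample_weight=None)
    Return the mean accuracy on the given test data and labels. (This is a classifier, so the score is accuracy, not the coefficient of determination R^2 that regressors report.)
    Parameters:
        X: array-like of shape (n_samples, n_features). Test samples.
        y: array-like of shape (n_samples,) or (n_samples, n_outputs). True labels for X.
        sample_weight: array-like of shape (n_samples,), default=None. Sample weights.
    Returns:
        score: float. Mean accuracy of self.predict(X) with respect to y.

set_params(self, **params)
    Set the parameters of this estimator.
    Parameters:
        **params: dict. Estimator parameters.
    Returns:
        self: object. The estimator instance.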
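The prediction methods are closely related, which a short sketch can make explicit. Assuming the iris dataset (an illustrative choice), the snippet checks that predict picks the class with the highest decision score, that predict_log_proba is the logarithm of predict_proba, and that score is the accuracy of predict.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
clf = LogisticRegression(max_iter=1000).fit(X, y)

# One confidence score per (sample, class) pair.
scores = clf.decision_function(X)

# predict returns the label whose score is highest.
pred = clf.predict(X)
pred_from_scores = clf.classes_[np.argmax(scores, axis=1)]

# Probabilities and their logarithms; columns follow clf.classes_.
proba = clf.predict_proba(X)
log_proba = clf.predict_log_proba(X)

# score is the mean accuracy of predict on the given data.
acc = clf.score(X, y)

print(np.array_equal(pred, pred_from_scores))
print(np.allclose(log_proba, np.log(proba)))
print(acc == accuracy_score(y, pred))
```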

Example: L1 penalty and sparse coefficients in logistic regression

#!/usr/bin/env python
# -*- coding: utf-8 -*-
# @Time : 2020/6/30 20:37
# @Author : LaoChen
"""
Compare the sparsity (percentage of zero coefficients) of the solutions
obtained with L1, L2, and Elastic-Net penalties for different values of C.
Large values of C give the model more freedom, while smaller values of C
constrain the model more. With the L1 penalty this leads to sparser
solutions. The sparsity of the Elastic-Net penalty lies between that of
L1 and L2.
"""
import numpy as np
import matplotlib.pyplot as plt

from sklearn.linear_model import LogisticRegression
from sklearn import datasets
from sklearn.preprocessing import StandardScaler

X, y = datasets.load_digits(return_X_y=True)
X = StandardScaler().fit_transform(X)

# classify small against large digits
y = (y > 4).astype(int)

l1_ratio = 0.5  # L1 weight in the Elastic-Net regularization

fig, axes = plt.subplots(3, 3)

# Set regularization parameter
for i, (C, axes_row) in enumerate(zip((1, 0.1, 0.01), axes)):
    # turn down tolerance for short training time
    clf_l1_LR = LogisticRegression(C=C, penalty='l1', tol=0.01, solver='saga')
    clf_l2_LR = LogisticRegression(C=C, penalty='l2', tol=0.01, solver='saga')
    clf_en_LR = LogisticRegression(C=C, penalty='elasticnet', solver='saga',
                                   l1_ratio=l1_ratio, tol=0.01)
    clf_l1_LR.fit(X, y)
    clf_l2_LR.fit(X, y)
    clf_en_LR.fit(X, y)

    coef_l1_LR = clf_l1_LR.coef_.ravel()
    coef_l2_LR = clf_l2_LR.coef_.ravel()
    coef_en_LR = clf_en_LR.coef_.ravel()

    # coef_l1_LR contains zeros due to the
    # L1 sparsity inducing norm
    sparsity_l1_LR = np.mean(coef_l1_LR == 0) * 100
    sparsity_l2_LR = np.mean(coef_l2_LR == 0) * 100
    sparsity_en_LR = np.mean(coef_en_LR == 0) * 100

    print("C=%.2f" % C)
    print("{:<40} {:.2f}%".format("Sparsity with L1 penalty:", sparsity_l1_LR))
    print("{:<40} {:.2f}%".format("Sparsity with Elastic-Net penalty:",
                                  sparsity_en_LR))
    print("{:<40} {:.2f}%".format("Sparsity with L2 penalty:", sparsity_l2_LR))
    print("{:<40} {:.2f}".format("Score with L1 penalty:",
                                 clf_l1_LR.score(X, y)))
    print("{:<40} {:.2f}".format("Score with Elastic-Net penalty:",
                                 clf_en_LR.score(X, y)))
    print("{:<40} {:.2f}".format("Score with L2 penalty:",
                                 clf_l2_LR.score(X, y)))

    # column titles go on the first row only
    if i == 0:
        axes_row[0].set_title("L1 penalty")
        axes_row[1].set_title("Elastic-Net\nl1_ratio = %s" % l1_ratio)
        axes_row[2].set_title("L2 penalty")

    # show the absolute coefficients as an 8x8 image, one panel per penalty
    for ax, coefs in zip(axes_row, [coef_l1_LR, coef_en_LR, coef_l2_LR]):
        ax.imshow(np.abs(coefs.reshape(8, 8)), interpolation='nearest',
                  cmap='binary', vmax=1, vmin=0)
        ax.set_xticks(())
        ax.set_yticks(())

    axes_row[0].set_ylabel('C = %s' % C)

plt.show()
