Handling Imbalanced Positive and Negative Samples with SMOTE in Python

    Tech  2023-07-05

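    Imbalance between positive and negative samples is hard to avoid in machine learning. There are generally two ways to handle it: (1) oversampling, which adds samples to the minority class (here, the positive class); and (2) undersampling, which removes samples from the majority class (here, the negative class), at the cost of possibly discarding important information. SMOTE is an oversampling method: instead of duplicating existing minority samples, it synthesizes new ones by interpolating between a minority sample and its nearest minority-class neighbours.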
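    The interpolation idea behind SMOTE can be sketched in a few lines. This is only a minimal illustration written with the standard library, not the imblearn implementation; the function name `smote_sketch`, its parameters, and the sample points are hypothetical and chosen for this example.

```python
import math
import random


def smote_sketch(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points from a list of minority-class points.

    Hypothetical helper for illustration only. Each synthetic point lies on
    the line segment between a randomly chosen minority point and one of its
    k nearest minority neighbours.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Pick a random position along the segment from base to neighbour.
        gap = rng.random()
        synthetic.append(tuple(b + gap * (n - b) for b, n in zip(base, neighbour)))
    return synthetic


# Five made-up minority points with features in the 50-80 range.
minority = [(50, 52), (55, 60), (61, 58), (70, 75), (80, 66)]
new_points = smote_sketch(minority, n_new=10, k=3)
print(len(new_points))
```

    Because every synthetic point is a convex combination of two real minority points, the new samples stay inside the region already occupied by the minority class rather than spreading into the majority class.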

    Code

    # Other over-samplers in imblearn that can be swapped in:
    # from imblearn.over_sampling import BorderlineSMOTE
    # from imblearn.over_sampling import SMOTENC
    # from imblearn.over_sampling import SVMSMOTE
    # from imblearn.over_sampling import KMeansSMOTE
    # from imblearn.over_sampling import ADASYN
    # from imblearn.over_sampling import RandomOverSampler
    from collections import Counter

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd
    from imblearn.over_sampling import SMOTE  # SMOTE over-sampler from the imblearn library

    # Build a sample set with a 9:1 ratio of class 0 to class 1.
    # x holds the features, y the corresponding labels.
    x1 = [np.random.randint(1, 31) for _ in range(90)] + [np.random.randint(50, 81) for _ in range(10)]
    x2 = [np.random.randint(1, 31) for _ in range(90)] + [np.random.randint(50, 81) for _ in range(10)]
    y = [0 for _ in range(90)] + [1 for _ in range(10)]
    x = pd.DataFrame({'x1': x1, 'x2': x2})
    y = pd.Series(y, name='label')

    # Check the class distribution: 90 samples of class 0 and 10 of class 1,
    # so the data set is imbalanced.
    print(Counter(y))
    fig1 = plt.figure(1)
    plt.scatter(x['x1'], x['x2'])
    plt.show()

    # Define the SMOTE model; random_state acts as the random seed.
    smo = SMOTE(sampling_strategy='auto', random_state=10)
    # fit_resample returns the resampled features and labels
    # (older imbalanced-learn releases called this method fit_sample).
    x_smo, y_smo = smo.fit_resample(x, y)
    print(Counter(y_smo))
    fig2 = plt.figure(2)
    plt.scatter(x_smo['x1'], x_smo['x2'])
    plt.show()
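    As a rough illustration of what `sampling_strategy='auto'` asks for in the code above: for over-samplers, imblearn documents 'auto' as equivalent to 'not majority', meaning every class except the largest is topped up to the size of the largest class. The sketch below only reproduces that arithmetic; the helper name `samples_needed` is hypothetical and not part of imblearn.

```python
from collections import Counter


def samples_needed(labels):
    """Return how many synthetic samples each non-majority class would need.

    Hypothetical helper: it mirrors the 'not majority' rule by topping every
    smaller class up to the size of the largest class.
    """
    counts = Counter(labels)
    majority = max(counts.values())
    return {cls: majority - n for cls, n in counts.items() if n < majority}


# Same 9:1 label distribution as in the example above.
labels = [0] * 90 + [1] * 10
needed = samples_needed(labels)
print(needed)
```

    With 90 samples of class 0 and 10 of class 1, class 1 needs 80 synthetic samples, so the resampled data set should end up with 90 samples in each class.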

    Results

    Before resampling

    After resampling
