Seaborn variable analysis + heatmaps

Tech, 2022-07-12

Univariate

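Plot a histogram to see how the data are distributed. Note that distplot, used below, has been deprecated since seaborn 0.11; histplot, kdeplot and displot replace it.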

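import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats

x = np.random.gamma(6, size=200)  # sample data (hypothetical)

plt.figure(figsize=(12, 5))
# Default: histogram plus KDE curve
plt.subplot(141)
sns.distplot(x)
# No histogram, KDE curve only
plt.subplot(142)
sns.distplot(x, hist=False)
# No KDE curve, histogram only, with 20 bins
plt.subplot(143)
sns.distplot(x, kde=False, bins=20)
# Fit a gamma distribution and overlay its density
plt.subplot(144)
sns.distplot(x, kde=False, fit=stats.gamma)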

Multivariate

Use one of seaborn's built-in example datasets for the analysis.

import seaborn as sns

iris = sns.load_dataset("iris")
sns.jointplot(x="sepal_length", y="sepal_width", data=iris)

When there are many points and you want to see where they concentrate, use a hexbin plot instead:

with sns.axes_style("white"):
    sns.jointplot(x="sepal_length", y="sepal_width", data=iris, kind="hex", color="k")
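The hex option bins the plane into hexagons and colours each one by the number of points it contains. Below is a sketch of that counting with matplotlib's hexbin. The 5,000 correlated points are hypothetical and were chosen so that a plain scatter plot would overplot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical data: many correlated points
x = rng.normal(size=5000)
y = 0.6 * x + rng.normal(scale=0.8, size=5000)

fig, ax = plt.subplots(figsize=(5, 4))
# Each hexagon's colour encodes how many points fall inside it
hb = ax.hexbin(x, y, gridsize=25, cmap="Greys")
fig.colorbar(hb, ax=ax, label="count")

counts = hb.get_array()  # one count per hexagon
print("points in densest hexagon:", int(counts.max()))
```

Every point lands in exactly one hexagon, so the counts add up to the number of points. Darker cells mark where the data cluster.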

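Plot the pairwise relationships between all numeric variables: a scatter plot for each pair, with each variable's distribution on the diagonal.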

sns.pairplot(iris)

Categorical variables

Categorical scatter plots

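import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

plt.figure(figsize=(12, 8))
# Basic strip plot; recent versions jitter points by default (jitter=True), pass a float to set the spread
plt.subplot(221)
sns.stripplot(x="day", y="total_bill", data=tips)
# Swarm plot: points are shifted so that they do not overlap
plt.subplot(222)
sns.swarmplot(x="day", y="total_bill", data=tips)
# Split by a second categorical variable with hue
plt.subplot(223)
sns.swarmplot(x="day", y="total_bill", hue="sex", data=tips)
# Horizontal layout: swap x and y
plt.subplot(224)
sns.swarmplot(x="total_bill", y="day", hue="sex", data=tips)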
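Jitter works by adding a small random offset along the categorical axis, which separates points that would otherwise sit on a single vertical line. The sketch below illustrates that idea with plain matplotlib. It is not seaborn's exact implementation, and the half-width of 0.1 and the bill values are hypothetical. swarmplot avoids overlap differently: it places points deterministically rather than at random.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
order = ["Thur", "Fri", "Sat", "Sun"]
codes = np.repeat(np.arange(len(order)), 50)   # category index of each point
y = rng.gamma(4, 5, size=codes.size)           # hypothetical bill amounts

half_width = 0.1  # hypothetical jitter half-width, in category-axis units
x_jittered = codes + rng.uniform(-half_width, half_width, size=codes.size)

fig, (ax0, ax1) = plt.subplots(1, 2, figsize=(10, 4), sharey=True)
ax0.scatter(codes, y, s=8)        # no jitter: each category collapses onto one line
ax1.scatter(x_jittered, y, s=8)   # jitter: the same points spread sideways
for ax in (ax0, ax1):
    ax.set_xticks(range(len(order)))
    ax.set_xticklabels(order)
```

The offset never exceeds the half-width, so each point stays visually attached to its category.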

Box plots / violin plots

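import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

plt.figure(figsize=(12, 5))
# Box plot
plt.subplot(121)
sns.boxplot(x="day", y="total_bill", hue="sex", data=tips)
# Violin plot
plt.subplot(122)
sns.violinplot(x="day", y="total_bill", hue="sex", data=tips)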
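A box plot draws a box from the first to the third quartile with a line at the median. The whiskers reach the most extreme points that lie within 1.5 × IQR of the box; seaborn's whis parameter defaults to 1.5. Anything beyond the whiskers is drawn as an outlier. The helper box_stats below is hypothetical and computes those numbers with NumPy so the rule is explicit.

```python
import numpy as np


def box_stats(values, whis=1.5):
    """Quartiles, whisker ends and outliers following the 1.5 x IQR rule."""
    v = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(v, [25, 50, 75])
    iqr = q3 - q1
    lo_fence, hi_fence = q1 - whis * iqr, q3 + whis * iqr
    inside = v[(v >= lo_fence) & (v <= hi_fence)]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the furthest data points still inside the fences
        "whiskers": (inside.min(), inside.max()),
        "outliers": v[(v < lo_fence) | (v > hi_fence)],
    }


summary = box_stats(list(range(1, 12)) + [30])
print(summary)
```

For 1 to 11 plus 30, the quartiles are 3.75 and 9.25, so the upper fence is 17.5. The whiskers therefore end at 1 and 11, and 30 is flagged as an outlier.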

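plt.figure(figsize=(12, 5))
# Split violins: each hue level gets one half of the violin, for direct comparison
plt.subplot(121)
sns.violinplot(x="day", y="total_bill", hue="sex", data=tips, split=True)
# Combined: violins without inner marks, with a swarm of the raw points drawn on top
plt.subplot(122)
sns.violinplot(x="day", y="total_bill", data=tips, inner=None)
sns.swarmplot(x="day", y="total_bill", data=tips, color="w", alpha=.5)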
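Each violin is a kernel density estimate mirrored around the category position. With split=True, each hue level contributes one half. The sketch below computes a Gaussian KDE by hand, using Scott's rule for the bandwidth (std × n^(-1/5), which is what scipy.stats.gaussian_kde uses by default in one dimension), and then draws a single violin with fill_betweenx. The sample and the 0.4 half-width scaling are hypothetical, and seaborn's own scaling options are not reproduced here.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np


def gaussian_kde_1d(data, grid):
    """Gaussian kernel density estimate with a Scott's-rule bandwidth."""
    data = np.asarray(data, dtype=float)
    n = data.size
    h = data.std(ddof=1) * n ** (-1 / 5)        # Scott's rule in one dimension
    z = (grid[:, None] - data[None, :]) / h      # scaled distance grid point -> sample
    kernels = np.exp(-0.5 * z**2) / np.sqrt(2 * np.pi)
    return kernels.sum(axis=1) / (n * h)


rng = np.random.default_rng(4)
sample = rng.normal(20, 8, size=300)   # hypothetical total_bill-like values
grid = np.linspace(sample.min() - 20, sample.max() + 20, 2000)
density = gaussian_kde_1d(sample, grid)

# Mirror the density around x = 0 to get the violin outline
half = 0.4 * density / density.max()
fig, ax = plt.subplots(figsize=(3, 5))
ax.fill_betweenx(grid, -half, half, alpha=0.6)
```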

factorplot

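factorplot is a wrapper that draws several kinds of categorical plot (chosen with kind) on a FacetGrid. In seaborn 0.9 it was renamed catplot, and its size parameter became height, so use catplot in current versions.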

seaborn.factorplot(x=None, y=None, hue=None, data=None, row=None, col=None, col_wrap=None, estimator=<function mean>, ci=95, n_boot=1000, units=None, order=None, hue_order=None, row_order=None, col_order=None, kind='point', size=4, aspect=1, orient=None, color=None, palette=None, legend=True, legend_out=True, sharex=True, sharey=True, margin_titles=False, facet_kws=None, **kwargs)

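Parameters (name: meaning (type)):
- x, y, hue: variables in the dataset (variable names)
- data: the dataset (DataFrame)
- row, col: additional categorical variables whose levels are tiled into separate facets (variable names)
- col_wrap: maximum number of facets per row (int)
- estimator: function that maps the values in each category to a single scalar (callable)
- ci: size of the confidence interval (float or None)
- n_boot: number of bootstrap iterations used to compute the confidence interval (int)
- units: identifier of sampling units, used for a multilevel bootstrap and for repeated-measures designs (variable name or vector data)
- order, hue_order: order in which the levels are plotted (list of strings)
- row_order, col_order: order of the facet rows and columns (list of strings)
- kind: point (default), bar (bar chart), count (counts), box (box plot), violin (violin plot), strip (scatter), swarm (non-overlapping scatter)
- size: height of each facet in inches (scalar)
- aspect: aspect ratio of each facet, so width = aspect × size (scalar)
- orient: orientation, "v" or "h"
- color: a single color (matplotlib color)
- palette: colors for the hue levels (seaborn palette or dict)
- legend: whether to draw a legend for hue (True/False)
- legend_out: whether to extend the figure and draw the legend outside the axes, right of center (True/False)
- sharex, sharey: whether facets share the x/y axis (True/False)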
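estimator, ci and n_boot work together. The point or bar height is the estimator applied to each category, and the error bar is a bootstrap confidence interval: resample the category's values with replacement n_boot times, apply the estimator to each resample, and take the central ci percent of those results. The helper bootstrap_ci below is a hypothetical sketch of that percentile bootstrap, not seaborn's own code, and the bill values are made up.

```python
import numpy as np


def bootstrap_ci(values, estimator=np.mean, ci=95, n_boot=1000, seed=0):
    """Point estimate plus a percentile-bootstrap confidence interval."""
    rng = np.random.default_rng(seed)
    v = np.asarray(values, dtype=float)
    # n_boot resamples of the same size, drawn with replacement
    idx = rng.integers(0, v.size, size=(n_boot, v.size))
    boots = np.array([estimator(v[row]) for row in idx])
    # Keep the central ci% of the bootstrap distribution
    tail = (100 - ci) / 2
    low, high = np.percentile(boots, [tail, 100 - tail])
    return estimator(v), (low, high)


rng = np.random.default_rng(5)
bills = rng.gamma(4, 5, size=60)   # hypothetical total_bill values for one category
est, (low, high) = bootstrap_ci(bills)
print(f"mean={est:.2f}, 95% CI=({low:.2f}, {high:.2f})")
```

Passing estimator=np.median gives a median with its own bootstrap interval, the same way a different estimator changes what each bar represents.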

Example

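import seaborn as sns

tips = sns.load_dataset("tips")

# Box plots, one facet per day (on seaborn < 0.9: sns.factorplot(..., kind="box", size=4, aspect=.5))
sns.catplot(x="time", y="total_bill", hue="smoker", col="day", data=tips, kind="box", height=4, aspect=.5)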

Heatmaps

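import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

heat_data = np.random.rand(10, 12)  # sample data (hypothetical), values in [0, 1)

plt.figure(figsize=(18, 8))
# Default heatmap
plt.subplot(231)
sns.heatmap(heat_data)
# Limit the colour scale to [vmin, vmax]
plt.subplot(232)
sns.heatmap(heat_data, vmin=0.2, vmax=0.5)
# Centre the colormap on a given value
plt.subplot(233)
sns.heatmap(heat_data, center=0)
# Write each value in its cell; fmt=".2f" shows two decimals (fmt="d" works only for integer data)
plt.subplot(234)
sns.heatmap(heat_data, annot=True, fmt=".2f")
# Draw gaps between cells
plt.subplot(235)
sns.heatmap(heat_data, linewidths=.5)
# Choose a colormap
plt.subplot(236)
sns.heatmap(heat_data, cmap="YlGnBu")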
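A common use of these options is a correlation-matrix heatmap. Fixing vmin=-1, vmax=1 and center=0 makes positive and negative correlations symmetric in colour. Because the matrix is symmetric, the mask argument can hide the upper triangle; this mask idiom also appears in seaborn's example gallery. The four-column DataFrame below is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

rng = np.random.default_rng(6)
base = rng.normal(size=200)
# Hypothetical data with strong positive, moderate negative and no correlation
df = pd.DataFrame({
    "w": base,
    "x": base + rng.normal(scale=0.5, size=200),
    "y": -base + rng.normal(scale=1.0, size=200),
    "z": rng.normal(size=200),
})
corr = df.corr()

# True entries are hidden: the upper triangle (and diagonal) repeat information
mask = np.triu(np.ones_like(corr, dtype=bool))
fig, ax = plt.subplots(figsize=(5, 4))
sns.heatmap(corr, mask=mask, annot=True, fmt=".2f",
            vmin=-1, vmax=1, center=0, cmap="vlag", ax=ax)
```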
