Computing Correlation Coefficients with Pandas corr()


2022-07-16 18:38 · 胡桃の壶 · Python

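This article shows how to compute correlation coefficients with the Pandas corr() method. I hope it is a useful reference. If anything is wrong or incomplete, corrections are welcome.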

Computing correlation coefficients with Pandas corr()

Construct the following data:


    import pandas as pd

    data = pd.DataFrame(
        data={
            'age': [2, 7, 10, 16, 20],
            'height': [70, 90, 143, 166, 178],
            'score': [98, 37, 103, 76, 53]
        }
    )

Use the corr() method to compute the correlation coefficients:


    # All columns passed to corr() must be numeric!
    corr = data.loc[:, ["age", "height"]].corr()  # computes the coefficient for every pair of columns
    print("corr\n", corr)
    """
    corr
                age   height
    age     1.00000  0.96043
    height  0.96043  1.00000
    """

A column's correlation with itself is always 1, which is why the diagonal of the matrix is 1.
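For a single pair of columns, `Series.corr()` returns the coefficient as a plain float rather than a matrix. A minimal sketch using the article's data (rebuilt here so the snippet runs on its own), which also checks that the matrix diagonal is 1:

```python
import numpy as np
import pandas as pd

# Same data as in the article
data = pd.DataFrame({
    'age': [2, 7, 10, 16, 20],
    'height': [70, 90, 143, 166, 178],
    'score': [98, 37, 103, 76, 53],
})

# Series.corr: one pair of columns -> one float (Pearson by default)
r_age_height = data['age'].corr(data['height'])
print(round(r_age_height, 5))  # 0.96043

# The same value appears off the diagonal of the DataFrame.corr() matrix,
# and the diagonal (each column correlated with itself) is 1
matrix = data[['age', 'height']].corr()
print(np.isclose(matrix.loc['age', 'height'], r_age_height))  # True
print(np.allclose(np.diag(matrix), 1.0))  # True
```

`np.isclose` is used instead of `==` because the Series and DataFrame methods may take slightly different numerical paths.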

Compute the correlation coefficients for several columns at once:


    corr = data.loc[:, ["age", "height", "score"]].corr()
    print("corr\n", corr)
    """
    corr
                 age    height     score
    age     1.000000  0.960430 -0.340053
    height  0.960430  1.000000 -0.096782
    score  -0.340053 -0.096782  1.000000
    """

Parameters of pandas DataFrame.corr()


    DataFrame.corr(method='pearson', min_periods=1)
    # pandas >= 1.5 also accepts numeric_only; its default became False in pandas 2.0
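`min_periods` only matters when the data contain missing values: `corr()` excludes missing values pair by pair, and any pair of columns with fewer than `min_periods` complete observations gets `NaN`. A small sketch with made-up values (the DataFrame `df` is hypothetical):

```python
import numpy as np
import pandas as pd

# Hypothetical data: column 'b' has two missing values
df = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [2.0, np.nan, 6.0, np.nan, 10.0],
    'c': [5.0, 3.0, 4.0, 1.0, 2.0],
})

# Pairs involving 'b' share only 3 complete rows; the pair ('a', 'c') has 5
lenient = df.corr(min_periods=1)
strict = df.corr(min_periods=4)

print(lenient.loc['a', 'b'])  # rows 0, 2, 4 lie on a straight line
print(strict.loc['a', 'b'])   # nan: 3 observations < min_periods=4
print(strict.loc['a', 'c'])   # still computed: 5 observations
```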

Parameters:

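method: one of {'pearson', 'kendall', 'spearman'}; the default is 'pearson'.
pearson: the Pearson correlation coefficient measures how closely two variables follow a straight line, i.e. it quantifies linear relationships. For relationships that are not linear it can understate, or miss entirely, how strongly the variables are related.
kendall: Kendall's tau, a rank correlation based on the number of concordant and discordant pairs of observations. It suits ordinal (ranked) data and data that are not normally distributed.
spearman: Spearman's rank correlation, i.e. the Pearson coefficient computed on the ranks of the values. It measures monotonic relationships, whether linear or not, and does not assume normally distributed data.
min_periods: the minimum number of observations required for each pair of columns to produce a valid result (currently applied only to Pearson and Spearman).
Returns: a DataFrame holding the pairwise correlation matrix of the columns.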
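The definitions above can be checked directly. The sketch below computes the Pearson coefficient from its formula with NumPy and confirms that Spearman's coefficient equals the Pearson coefficient of the ranks. It assumes average ranks for ties, which is what `DataFrame.rank()` uses by default; the helper `pearson` is hypothetical and written only for this illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'age': [2, 7, 10, 16, 20],
    'height': [70, 90, 143, 166, 178],
    'score': [98, 37, 103, 76, 53],
})

def pearson(x, y):
    """Hypothetical helper: Pearson r straight from its definition."""
    dx = x - x.mean()  # deviations from the mean
    dy = y - y.mean()
    return (dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum())

manual = pearson(data['age'].to_numpy(dtype=float),
                 data['height'].to_numpy(dtype=float))
print(np.isclose(manual, data.corr().loc['age', 'height']))  # True

# Spearman = Pearson applied to the ranks (ties get their average rank)
via_ranks = data.rank().corr()
spearman = data.corr(method='spearman')
print(np.allclose(via_ranks, spearman))  # True
```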

To see how the methods differ, run the following experiment:


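    from pandas import DataFrame
    import pandas as pd

    x = [a for a in range(100)]

    # Build a quadratic function: a nonlinear relationship between x and y
    def y_x(x):
        return 2*x**2 + 4

    y = [y_x(i) for i in x]
    data = DataFrame({'x': x, 'y': y})

    # Inspect the structure of data
    data.head()
    Out[34]:
       x   y
    0  0   4
    1  1   6
    2  2  12
    3  3  22
    4  4  36

    data.corr()
    Out[35]:
              x         y
    x  1.000000  0.967736
    y  0.967736  1.000000

    data.corr(method='spearman')
    Out[36]:
         x    y
    x  1.0  1.0
    y  1.0  1.0

    # method='kendall' relies on SciPy being installed
    data.corr(method='kendall')
    Out[37]:
         x    y
    x  1.0  1.0
    y  1.0  1.0

Pearson returns about 0.97 because the points do not lie on a straight line. Spearman and Kendall return exactly 1.0: for x ≥ 0, y = 2x² + 4 increases strictly with x, so the ranks of x and y agree perfectly, and these two coefficients depend only on ranks.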
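Rank-based coefficients measure monotonic relationships, not arbitrary nonlinear ones. As a sketch, take a range symmetric around zero so that `y = x**2` first falls and then rises: y is completely determined by x, yet both Pearson and Spearman come out at about zero. Kendall is left out only because pandas computes it through SciPy, which this snippet does not assume is installed.

```python
import numpy as np
import pandas as pd

# x runs from -50 to 50, so y = x**2 decreases and then increases (not monotonic)
x = np.arange(-50, 51)
df = pd.DataFrame({'x': x, 'y': x ** 2})

pearson = df.corr().loc['x', 'y']
spearman = df.corr(method='spearman').loc['x', 'y']

# Both are zero up to floating-point noise, although y depends entirely on x
print(pearson, spearman)
```

In other words, a coefficient near zero means "no linear (or monotonic) trend", not "no relationship".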