III. Interval Estimation: Interval Estimates for the Parameters of Two Normal Populations with Python

Technology, 2022-07-11

Let the samples $(X_1, \ldots, X_{n_1})$ and $(Y_1, \ldots, Y_{n_2})$ be drawn from the populations $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, respectively, and let them be mutually independent. Denote the sample means by $\overline X, \overline Y$ and the sample variances by $S_1^2, S_2^2$. The confidence level is $1-\alpha$. Throughout, $Z_{\alpha/2}$, $t_{\alpha/2}$ and $F_{\alpha/2}$ denote upper $\alpha/2$ quantiles, that is, points with probability $\alpha/2$ to their right.
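In NumPy, the sample mean is `np.mean`. The sample variance $S^2$, which uses the divisor $n-1$, is `np.var(..., ddof=1)`; the default `ddof=0` divides by $n$ instead. The sketch below uses the machine-A data from the worked example in section 4:

```python
import numpy as np

# Machine-A diameters (mm) from the worked example in section 4
x = np.array([15.0, 14.8, 15.2, 15.4, 14.9, 15.1, 15.2, 14.8])

x_bar = np.mean(x)         # sample mean
s2 = np.var(x, ddof=1)     # sample variance S^2, divisor n - 1
s2_biased = np.var(x)      # ddof=0: divisor n, so this is NOT S^2

print(x_bar, s2, s2_biased)
```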

1. Confidence interval for $\mu_1-\mu_2$

1.1. When $\sigma_1^2, \sigma_2^2$ are known

The point estimator of $\mu_1 - \mu_2$ is $\overline X - \overline Y$. Its distribution gives the pivot

$$\frac{(\overline X - \overline Y)-(\mu_1-\mu_2)}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}\sim N(0,1),$$

which yields the confidence interval

$$(\overline X - \overline Y) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}.$$
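To map the formula onto SciPy, note that `ppf` returns lower quantiles. The upper quantile $Z_{\alpha/2}$ is therefore `stats.norm.ppf(1 - alpha/2)`. The sketch below plugs in the data of part (1) of the worked example ($\sigma_1=0.18$, $\sigma_2=0.24$, $1-\alpha=0.9$):

```python
import numpy as np
from scipy import stats

x = np.array([15.0, 14.8, 15.2, 15.4, 14.9, 15.1, 15.2, 14.8])
y = np.array([15.2, 15.0, 14.8, 15.1, 14.6, 14.8, 15.1, 14.5, 15.0])
sigma1, sigma2, alpha = 0.18, 0.24, 0.10

z = stats.norm.ppf(1 - alpha / 2)      # upper alpha/2 quantile Z_{alpha/2}
se = np.sqrt(sigma1**2 / len(x) + sigma2**2 / len(y))
diff = x.mean() - y.mean()             # point estimate of mu1 - mu2
lower, upper = diff - z * se, diff + z * se
print(lower, upper)
```

The result should agree with the interval reported for part (1) in section 4.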

1.2. When $\sigma_1^2 = \sigma_2^2 = \sigma^2$ is unknown

Replace $\sigma^2$ with the pooled sample variance (and let $S_w = \sqrt{S_w^2}$)

$$S_w^2=\frac{(n_1-1)S_1^2 +(n_2-1)S_2^2}{n_1+n_2-2}.$$

This gives the pivot

$$\frac{(\overline X - \overline Y)-(\mu_1 - \mu_2)}{S_w\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}}\sim t(n_1+n_2 -2),$$

so the confidence interval is

$$(\overline X - \overline Y)\pm t_{\alpha/2}(n_1+n_2 -2)\,S_w\sqrt{\frac{1}{n_1}+\frac{1}{n_2}}.$$
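The following cross-check is an addition and not part of the original derivation. Evaluated at $\mu_1-\mu_2=0$, the pooled pivot is exactly the statistic that `scipy.stats.ttest_ind(..., equal_var=True)` returns. That makes it possible to verify a hand-built $S_w$ against SciPy. The sketch uses the worked-example data:

```python
import numpy as np
from scipy import stats

x = np.array([15.0, 14.8, 15.2, 15.4, 14.9, 15.1, 15.2, 14.8])
y = np.array([15.2, 15.0, 14.8, 15.1, 14.6, 14.8, 15.1, 14.5, 15.0])
n1, n2, alpha = len(x), len(y), 0.10

# pooled variance S_w^2
sw2 = ((n1 - 1) * x.var(ddof=1) + (n2 - 1) * y.var(ddof=1)) / (n1 + n2 - 2)
se = np.sqrt(sw2) * np.sqrt(1 / n1 + 1 / n2)
diff = x.mean() - y.mean()

# pivot evaluated at mu1 - mu2 = 0 should equal SciPy's pooled t statistic
t_stat = diff / se
ref = stats.ttest_ind(x, y, equal_var=True)
print(t_stat, ref.statistic)

t_crit = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2)   # t_{alpha/2}(n1+n2-2)
print(diff - t_crit * se, diff + t_crit * se)
```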

1.3. When $\sigma_1^2 \neq \sigma_2^2$ and both are unknown

Estimate $\sigma_1^2$ by $S_1^2$ and $\sigma_2^2$ by $S_2^2$. When both $n_1$ and $n_2$ are sufficiently large (typically above 30), the following holds approximately:

$$\frac{(\overline X - \overline Y)-(\mu_1 - \mu_2)}{\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}}\sim N(0,1).$$

This gives the approximate confidence interval

$$(\overline X - \overline Y)\pm Z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}.$$

When the samples are small, the following holds approximately:

$$\frac{(\overline X - \overline Y)-(\mu_1-\mu_2)}{\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}}\sim t(k),$$

where $k \approx \min(n_1-1, n_2-1)$. The approximate confidence interval is then

$$(\overline X - \overline Y) \pm t_{\alpha/2}(k)\sqrt{\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}}.$$
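The rule $k\approx\min(n_1-1,n_2-1)$ is a conservative shortcut. A widely used alternative, which is not the method used in this article, is the Welch-Satterthwaite approximation

$$k \approx \frac{\left(\frac{S_1^2}{n_1}+\frac{S_2^2}{n_2}\right)^2}{\frac{(S_1^2/n_1)^2}{n_1-1}+\frac{(S_2^2/n_2)^2}{n_2-1}}.$$

`scipy.stats.ttest_ind(..., equal_var=False)` uses this approximation. The sketch below compares both choices of $k$ on the worked-example data:

```python
import numpy as np
from scipy import stats

x = np.array([15.0, 14.8, 15.2, 15.4, 14.9, 15.1, 15.2, 14.8])
y = np.array([15.2, 15.0, 14.8, 15.1, 14.6, 14.8, 15.1, 14.5, 15.0])
n1, n2, alpha = len(x), len(y), 0.10

a = x.var(ddof=1) / n1          # S1^2 / n1
b = y.var(ddof=1) / n2          # S2^2 / n2
se = np.sqrt(a + b)
diff = x.mean() - y.mean()

k_min = min(n1 - 1, n2 - 1)                                    # rule used in the text
k_welch = (a + b) ** 2 / (a**2 / (n1 - 1) + b**2 / (n2 - 1))   # Welch-Satterthwaite

for name, k in [("min rule", k_min), ("Welch", k_welch)]:
    half = stats.t.ppf(1 - alpha / 2, df=k) * se   # t.ppf accepts non-integer df
    print(f"{name}: df={k:.2f}, CI=({diff - half:.4f}, {diff + half:.4f})")

# SciPy's Welch t statistic uses the same standard error
assert np.isclose(stats.ttest_ind(x, y, equal_var=False).statistic, diff / se)
```

The Welch degrees of freedom are never smaller than $\min(n_1-1,n_2-1)$. The Welch interval is therefore never wider than the one from the min rule.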

2. Confidence interval for $\frac{\sigma_1^2}{\sigma_2^2}$ ($\mu_1, \mu_2$ unknown)

The estimator of $\frac{\sigma_1^2}{\sigma_2^2}$ is $\frac{S_1^2}{S_2^2}$, which gives the pivot

$$\frac{S_1^2/S_2^2}{\sigma_1^2/\sigma_2^2}\sim F(n_1-1, n_2-1),$$

and hence the confidence interval

$$\left(\frac{S_1^2}{S_2^2}\frac{1}{F_{\alpha/2}(n_1-1, n_2-1)},\ \frac{S_1^2}{S_2^2}\frac{1}{F_{1-\alpha/2}(n_1-1, n_2-1)}\right).$$
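Because $F_{\alpha/2}$ and $F_{1-\alpha/2}$ are upper quantiles, $F_{\alpha/2}(m,n)$ corresponds to `stats.f.ppf(1 - alpha/2, m, n)` in SciPy. The sketch below evaluates the formula for part (4) of the worked example. It also checks the standard identity $F_{1-\alpha/2}(m,n)=1/F_{\alpha/2}(n,m)$:

```python
import numpy as np
from scipy import stats

x = np.array([15.0, 14.8, 15.2, 15.4, 14.9, 15.1, 15.2, 14.8])
y = np.array([15.2, 15.0, 14.8, 15.1, 14.6, 14.8, 15.1, 14.5, 15.0])
m, n, alpha = len(x) - 1, len(y) - 1, 0.10

ratio = x.var(ddof=1) / y.var(ddof=1)        # S1^2 / S2^2
f_upper = stats.f.ppf(1 - alpha / 2, m, n)   # F_{alpha/2}(m, n), upper quantile
f_lower = stats.f.ppf(alpha / 2, m, n)       # F_{1-alpha/2}(m, n)

print(ratio / f_upper, ratio / f_lower)      # confidence interval endpoints

# identity F_{1-alpha/2}(m, n) = 1 / F_{alpha/2}(n, m)
print(f_lower, 1 / stats.f.ppf(1 - alpha / 2, n, m))
```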

3. Implementing the interval estimates in Python

3.1. Difference of means

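```python
import numpy as np
from scipy import stats


def confidence_interval_udif(data1, data2, sigma1=-1, sigma2=-2, alpha=0.05):
    """Two-sided 1-alpha confidence interval for mu1 - mu2.

    sigma1, sigma2 > 0: known population standard deviations (Z interval).
    Otherwise the variances are treated as unknown: pass equal non-positive
    values (e.g. -1, -1) for "equal variances", or different values (the
    default -1, -2) for "unequal variances".
    """
    xb1 = np.mean(data1)
    xb2 = np.mean(data2)
    n1 = len(data1)
    n2 = len(data2)
    if sigma1 > 0 and sigma2 > 0:
        # variances known
        tmp = np.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
        Z = stats.norm(loc=0., scale=1.)
        # Z.ppf(alpha/2) is the lower quantile, i.e. -Z_{alpha/2}
        return ((xb1 - xb2) + tmp * Z.ppf(alpha / 2),
                (xb1 - xb2) - tmp * Z.ppf(alpha / 2))
    elif sigma1 == sigma2:
        # variances unknown but equal: pooled variance S_w^2
        sw2 = ((n1 - 1) * np.var(data1, ddof=1)
               + (n2 - 1) * np.var(data2, ddof=1)) / (n1 + n2 - 2)
        tmp = np.sqrt(sw2) * np.sqrt(1 / n1 + 1 / n2)
        T = stats.t(df=n1 + n2 - 2)
        return ((xb1 - xb2) + tmp * T.ppf(alpha / 2),
                (xb1 - xb2) - tmp * T.ppf(alpha / 2))
    else:
        # variances unknown and unequal: t(k) with k = min(n1-1, n2-1)
        tmp = np.sqrt(np.var(data1, ddof=1) / n1 + np.var(data2, ddof=1) / n2)
        k = min(n1 - 1, n2 - 1)
        T = stats.t(df=k)
        return ((xb1 - xb2) + tmp * T.ppf(alpha / 2),
                (xb1 - xb2) - tmp * T.ppf(alpha / 2))
```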
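A Monte Carlo coverage test is one way to sanity-check interval code like the function above. The idea is to draw many samples from known normal populations and count how often the interval contains the true $\mu_1-\mu_2$; that proportion should be close to $1-\alpha$. The sketch below does this for the pooled-$t$ case (1.2). The helper `pooled_t_interval` is hypothetical, and the "true" parameters, seed and replication count are arbitrary choices for illustration:

```python
import numpy as np
from scipy import stats


def pooled_t_interval(x, y, alpha):
    """Hypothetical helper: pooled-t CI for mu1 - mu2 (equal, unknown variances)."""
    n1, n2 = len(x), len(y)
    sw2 = ((n1 - 1) * np.var(x, ddof=1) + (n2 - 1) * np.var(y, ddof=1)) / (n1 + n2 - 2)
    half = stats.t.ppf(1 - alpha / 2, df=n1 + n2 - 2) * np.sqrt(sw2 * (1 / n1 + 1 / n2))
    d = np.mean(x) - np.mean(y)
    return d - half, d + half


rng = np.random.default_rng(0)
mu1, mu2, sigma, alpha = 15.0, 14.9, 0.2, 0.10   # assumed "true" parameters
reps = 5000
hits = 0
for _ in range(reps):
    x = rng.normal(mu1, sigma, size=8)           # same sample sizes as the example
    y = rng.normal(mu2, sigma, size=9)
    lo, hi = pooled_t_interval(x, y, alpha)
    hits += bool(lo <= mu1 - mu2 <= hi)
coverage = hits / reps
print(coverage)   # should be close to 1 - alpha = 0.90
```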

3.2. Variance ratio

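```python
import numpy as np
from scipy import stats


def confidence_interval_varRatio(data1, data2, alpha=0.05):
    """Two-sided 1-alpha confidence interval for sigma1^2 / sigma2^2."""
    n1 = len(data1)
    n2 = len(data2)
    # ratio of sample variances S1^2 / S2^2
    tmp = np.var(data1, ddof=1) / np.var(data2, ddof=1)
    F = stats.f(dfn=n1 - 1, dfd=n2 - 1)
    # F.ppf(1 - alpha/2) is the upper quantile F_{alpha/2}(n1-1, n2-1)
    return tmp / F.ppf(1 - alpha / 2), tmp / F.ppf(alpha / 2)
```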
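Here is a quick property check for the variance-ratio code. Swapping the two samples estimates $\sigma_2^2/\sigma_1^2$ instead, so the new interval should be the reciprocal of the old one with its endpoints exchanged. This follows from $F_{1-\alpha/2}(m,n)=1/F_{\alpha/2}(n,m)$. The sketch restates the same computation under the hypothetical name `var_ratio_ci` so that it runs on its own:

```python
import numpy as np
from scipy import stats


def var_ratio_ci(data1, data2, alpha=0.05):
    """Hypothetical copy of confidence_interval_varRatio above."""
    n1, n2 = len(data1), len(data2)
    r = np.var(data1, ddof=1) / np.var(data2, ddof=1)
    F = stats.f(dfn=n1 - 1, dfd=n2 - 1)
    return r / F.ppf(1 - alpha / 2), r / F.ppf(alpha / 2)


x = np.array([15.0, 14.8, 15.2, 15.4, 14.9, 15.1, 15.2, 14.8])
y = np.array([15.2, 15.0, 14.8, 15.1, 14.6, 14.8, 15.1, 14.5, 15.0])

lo12, hi12 = var_ratio_ci(x, y, alpha=0.1)   # CI for sigma1^2 / sigma2^2
lo21, hi21 = var_ratio_ci(y, x, alpha=0.1)   # CI for sigma2^2 / sigma1^2
print(np.isclose(lo21, 1 / hi12), np.isclose(hi21, 1 / lo12))
```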

4. Worked example

Example: Two machine tools produce ball bearings of the same model. Eight balls are taken from machine A's output and nine from machine B's. Their diameters (in mm) are:

Machine A: 15.0, 14.8, 15.2, 15.4, 14.9, 15.1, 15.2, 14.8
Machine B: 15.2, 15.0, 14.8, 15.1, 14.6, 14.8, 15.1, 14.5, 15.0

Let $X$ and $Y$ be the diameters of the balls produced by the two machines, with $X\sim N(\mu_1, \sigma_1^2)$ and $Y\sim N(\mu_2, \sigma_2^2)$. Find two-sided confidence intervals at confidence level 0.90:

(1) for $\mu_1 - \mu_2$ when $\sigma_1=0.18$, $\sigma_2=0.24$;
(2) for $\mu_1 - \mu_2$ when $\sigma_1=\sigma_2$ is unknown;
(3) for $\mu_1 - \mu_2$ when $\sigma_1 \neq \sigma_2$ (both unknown);
(4) for $\frac{\sigma_1^2}{\sigma_2^2}$ when $\mu_1, \mu_2$ are unknown.

Solution: (1)

```python
data1 = np.array([15.0, 14.8, 15.2, 15.4, 14.9, 15.1, 15.2, 14.8])
data2 = np.array([15.2, 15.0, 14.8, 15.1, 14.6, 14.8, 15.1, 14.5, 15.0])
confidence_interval_udif(data1, data2, 0.18, 0.24, 0.1)
# result: (-0.018145559249408555, 0.31814555924941279)
```

    (2)

```python
data1 = np.array([15.0, 14.8, 15.2, 15.4, 14.9, 15.1, 15.2, 14.8])
data2 = np.array([15.2, 15.0, 14.8, 15.1, 14.6, 14.8, 15.1, 14.5, 15.0])
# -1, -1: variances unknown but equal
confidence_interval_udif(data1, data2, -1, -1, 0.1)
# result: (-0.044246980022314808, 0.34424698002231907)
```

    (3)

```python
data1 = np.array([15.0, 14.8, 15.2, 15.4, 14.9, 15.1, 15.2, 14.8])
data2 = np.array([15.2, 15.0, 14.8, 15.1, 14.6, 14.8, 15.1, 14.5, 15.0])
# -1, -2: variances unknown and unequal
confidence_interval_udif(data1, data2, -1, -2, 0.1)
# result: (-0.058430983560407906, 0.35843098356041214)
```

    (4)

```python
data1 = np.array([15.0, 14.8, 15.2, 15.4, 14.9, 15.1, 15.2, 14.8])
data2 = np.array([15.2, 15.0, 14.8, 15.1, 14.6, 14.8, 15.1, 14.5, 15.0])
confidence_interval_varRatio(data1, data2, alpha=0.1)
# result: (0.22712162982480297, 2.9620673328677332)
```

5. References

Zhejiang University, *Probability Theory and Mathematical Statistics* (《概率论与数理统计》); NumPy and SciPy documentation.

6. Questions and discussion welcome

Email: hflag@163.com; QQ: 532843488. I have long taught *Probability Theory and Mathematical Statistics*, and readers who run into problems are welcome to contact me.