<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>Estimating the Required Sample Sizes for a Chi-Square Test for the Standard Deviation</title>
<link rel="stylesheet" href="../../../../../../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.76.1">
<link rel="home" href="../../../../../index.html" title="Math Toolkit">
<link rel="up" href="../cs_eg.html" title="Chi Squared Distribution Examples">
<link rel="prev" href="chi_sq_test.html" title="Chi-Square Test for the Standard Deviation">
<link rel="next" href="../f_eg.html" title="F Distribution Examples">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../../../../boost.png"></td>
<td align="center"><a href="../../../../../../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="chi_sq_test.html"><img src="../../../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../cs_eg.html"><img src="../../../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../../index.html"><img src="../../../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../f_eg.html"><img src="../../../../../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section math_toolkit_dist_stat_tut_weg_cs_eg_chi_sq_size">
<div class="titlepage"><div><div><h6 class="title">
<a name="math_toolkit.dist.stat_tut.weg.cs_eg.chi_sq_size"></a><a class="link" href="chi_sq_size.html" title="Estimating the Required Sample Sizes for a Chi-Square Test for the Standard Deviation">Estimating
            the Required Sample Sizes for a Chi-Square Test for the Standard Deviation</a>
</h6></div></div></div>
<p>
              Suppose we conduct a Chi Squared test for standard deviation and the
              result is borderline. A legitimate question to ask is then "How large
              would the sample size have to be in order to produce a definitive result?"
            </p>
<p>
              The class template <a class="link" href="../../../dist_ref/dists/chi_squared_dist.html" title="Chi Squared Distribution">chi_squared_distribution</a>
              has a static method <code class="computeroutput"><span class="identifier">find_degrees_of_freedom</span></code>
              that will calculate the required degrees of freedom, from which the
              sample size follows, for some acceptable risk of type I failure
              <span class="emphasis"><em>alpha</em></span>, type II failure <span class="emphasis"><em>beta</em></span>,
              and difference from the variance <span class="emphasis"><em>diff</em></span>.
              Please note that the method works on variance, and not on standard
              deviation as is usual for the Chi Squared Test.
            </p>
<p>
              The code for this example is located in <a href="../../../../../../../../example/chi_square_std_dev_test.cpp" target="_top">chi_square_std_dev_test.cpp</a>.
            </p>
<p>
              We begin by defining a procedure to print out the sample sizes required
              for various risk levels:
            </p>
<pre class="programlisting"><span class="keyword">void</span> <span class="identifier">chi_squared_sample_sized</span><span class="special">(</span>
     <span class="keyword">double</span> <span class="identifier">diff</span><span class="special">,</span>      <span class="comment">// difference from variance to detect</span>
     <span class="keyword">double</span> <span class="identifier">variance</span><span class="special">)</span>  <span class="comment">// true variance</span>
<span class="special">{</span>
</pre>
<p>
              The procedure begins by printing out the input data:
            </p>
<pre class="programlisting"><span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">std</span><span class="special">;</span>
<span class="keyword">using</span> <span class="keyword">namespace</span> <span class="identifier">boost</span><span class="special">::</span><span class="identifier">math</span><span class="special">;</span>

<span class="comment">// Print out general info:</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span>
   <span class="string">"_____________________________________________________________\n"</span>
   <span class="string">"Estimated sample sizes required for various confidence levels\n"</span>
   <span class="string">"_____________________________________________________________\n\n"</span><span class="special">;</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">5</span><span class="special">);</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"True Variance"</span> <span class="special">&lt;&lt;</span> <span class="string">"=  "</span> <span class="special">&lt;&lt;</span> <span class="identifier">variance</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">40</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">left</span> <span class="special">&lt;&lt;</span> <span class="string">"Difference to detect"</span> <span class="special">&lt;&lt;</span> <span class="string">"=  "</span> <span class="special">&lt;&lt;</span> <span class="identifier">diff</span> <span class="special">&lt;&lt;</span> <span class="string">"\n"</span><span class="special">;</span>
</pre>
<p>
              It then defines a table of significance levels for which we'll calculate
              sample sizes:
            </p>
<pre class="programlisting"><span class="keyword">double</span> <span class="identifier">alpha</span><span class="special">[]</span> <span class="special">=</span> <span class="special">{</span> <span class="number">0.5</span><span class="special">,</span> <span class="number">0.25</span><span class="special">,</span> <span class="number">0.1</span><span class="special">,</span> <span class="number">0.05</span><span class="special">,</span> <span class="number">0.01</span><span class="special">,</span> <span class="number">0.001</span><span class="special">,</span> <span class="number">0.0001</span><span class="special">,</span> <span class="number">0.00001</span> <span class="special">};</span>
</pre>
<p>
              For each value of alpha we can calculate two sample sizes: one where
              the sample variance is less than the true value by <span class="emphasis"><em>diff</em></span>
              and one where it is greater than the true value by <span class="emphasis"><em>diff</em></span>.
              Because the Chi Squared distribution is asymmetric, these two values
              will not be the same, even though their calculation
              differs only in the sign of <span class="emphasis"><em>diff</em></span> that's passed
              to <code class="computeroutput"><span class="identifier">find_degrees_of_freedom</span></code>.
              Finally, in this example we'll simplify things and let the risk level <span class="emphasis"><em>beta</em></span>
              be the same as <span class="emphasis"><em>alpha</em></span>:
            </p>
<pre class="programlisting"><span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="string">"\n\n"</span>
        <span class="string">"_______________________________________________________________\n"</span>
        <span class="string">"Confidence       Estimated          Estimated\n"</span>
        <span class="string">" Value (%)      Sample Size        Sample Size\n"</span>
        <span class="string">"                (lower one         (upper one\n"</span>
        <span class="string">"                 sided test)        sided test)\n"</span>
        <span class="string">"_______________________________________________________________\n"</span><span class="special">;</span>
<span class="comment">//</span>
<span class="comment">// Now print out the data for the table rows.</span>
<span class="comment">//</span>
<span class="keyword">for</span><span class="special">(</span><span class="keyword">unsigned</span> <span class="identifier">i</span> <span class="special">=</span> <span class="number">0</span><span class="special">;</span> <span class="identifier">i</span> <span class="special">&lt;</span> <span class="keyword">sizeof</span><span class="special">(</span><span class="identifier">alpha</span><span class="special">)/</span><span class="keyword">sizeof</span><span class="special">(</span><span class="identifier">alpha</span><span class="special">[</span><span class="number">0</span><span class="special">]);</span> <span class="special">++</span><span class="identifier">i</span><span class="special">)</span>
<span class="special">{</span>
   <span class="comment">// Confidence value:</span>
   <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">fixed</span> <span class="special">&lt;&lt;</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">3</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">10</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">right</span> <span class="special">&lt;&lt;</span> <span class="number">100</span> <span class="special">*</span> <span class="special">(</span><span class="number">1</span><span class="special">-</span><span class="identifier">alpha</span><span class="special">[</span><span class="identifier">i</span><span class="special">]);</span>
   <span class="comment">// calculate df for a lower single sided test:</span>
   <span class="keyword">double</span> <span class="identifier">df</span> <span class="special">=</span> <span class="identifier">chi_squared</span><span class="special">::</span><span class="identifier">find_degrees_of_freedom</span><span class="special">(</span>
      <span class="special">-</span><span class="identifier">diff</span><span class="special">,</span> <span class="identifier">alpha</span><span class="special">[</span><span class="identifier">i</span><span class="special">],</span> <span class="identifier">alpha</span><span class="special">[</span><span class="identifier">i</span><span class="special">],</span> <span class="identifier">variance</span><span class="special">);</span>
   <span class="comment">// convert to sample size:</span>
   <span class="keyword">double</span> <span class="identifier">size</span> <span class="special">=</span> <span class="identifier">ceil</span><span class="special">(</span><span class="identifier">df</span><span class="special">)</span> <span class="special">+</span> <span class="number">1</span><span class="special">;</span>
   <span class="comment">// Print size:</span>
   <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">fixed</span> <span class="special">&lt;&lt;</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">0</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">16</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">right</span> <span class="special">&lt;&lt;</span> <span class="identifier">size</span><span class="special">;</span>
   <span class="comment">// calculate df for an upper single sided test:</span>
   <span class="identifier">df</span> <span class="special">=</span> <span class="identifier">chi_squared</span><span class="special">::</span><span class="identifier">find_degrees_of_freedom</span><span class="special">(</span>
      <span class="identifier">diff</span><span class="special">,</span> <span class="identifier">alpha</span><span class="special">[</span><span class="identifier">i</span><span class="special">],</span> <span class="identifier">alpha</span><span class="special">[</span><span class="identifier">i</span><span class="special">],</span> <span class="identifier">variance</span><span class="special">);</span>
   <span class="comment">// convert to sample size:</span>
   <span class="identifier">size</span> <span class="special">=</span> <span class="identifier">ceil</span><span class="special">(</span><span class="identifier">df</span><span class="special">)</span> <span class="special">+</span> <span class="number">1</span><span class="special">;</span>
   <span class="comment">// Print size:</span>
   <span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">fixed</span> <span class="special">&lt;&lt;</span> <span class="identifier">setprecision</span><span class="special">(</span><span class="number">0</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">setw</span><span class="special">(</span><span class="number">16</span><span class="special">)</span> <span class="special">&lt;&lt;</span> <span class="identifier">right</span> <span class="special">&lt;&lt;</span> <span class="identifier">size</span> <span class="special">&lt;&lt;</span> <span class="identifier">endl</span><span class="special">;</span>
<span class="special">}</span>
<span class="identifier">cout</span> <span class="special">&lt;&lt;</span> <span class="identifier">endl</span><span class="special">;</span>
</pre>
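<p>
              The "convert to sample size" step in the loop above relies on the
              fact that a Chi Squared test for the variance of N observations
              has N - 1 degrees of freedom. A minimal sketch of that conversion
              in isolation follows; <code class="computeroutput">sample_size_from_df</code>
              is a hypothetical name introduced here for illustration, and unlike
              the listing above it returns an integer type.
            </p>

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: turn a (possibly fractional) number of degrees of
// freedom into a whole-number sample size, N = ceil(df) + 1.
inline long sample_size_from_df(double df)
{
   assert(df > 0.0);  // precondition assumed by this sketch
   // Round up so the sample is never smaller than required, then add one
   // because N observations give N - 1 degrees of freedom.
   return static_cast<long>(std::ceil(df)) + 1;
}
```

<p>
              Rounding up rather than to the nearest integer keeps the estimate
              on the conservative side.
            </p>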
<p>
              For some example output, consider the <a href="http://www.itl.nist.gov/div898/handbook/prc/section2/prc23.htm" target="_top">silicon
              wafer data</a> from the <a href="http://www.itl.nist.gov/div898/handbook/" target="_top">NIST/SEMATECH
              e-Handbook of Statistical Methods</a>. In this scenario a supplier
              of 100 ohm.cm silicon wafers claims that his fabrication process can
              produce wafers with sufficient consistency so that the standard deviation
              of resistivity for the lot does not exceed 10 ohm.cm. A sample of N
              = 10 wafers taken from the lot has a standard deviation of 13.97 ohm.cm,
              and the question we ask ourselves is "How large would our sample
              have to be to reliably detect this difference?".
            </p>
<p>
              To use our procedure above, we have to convert the standard deviations
              to variance (square them), after which the program output looks like
              this:
            </p>
<pre class="programlisting">_____________________________________________________________
Estimated sample sizes required for various confidence levels
_____________________________________________________________

True Variance                           =  100.00000
Difference to detect                    =  95.16090


_______________________________________________________________
Confidence       Estimated          Estimated
 Value (%)      Sample Size        Sample Size
                (lower one         (upper one
                 sided test)        sided test)
_______________________________________________________________
    50.000               2               2
    75.000               2              10
    90.000               4              32
    95.000               5              51
    99.000               7              99
    99.900              11             174
    99.990              15             251
    99.999              20             330
</pre>
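<p>
              The two inputs printed at the top of this output follow from squaring
              the standard deviations: the true variance is 10 squared, and the
              difference to detect is 13.97 squared minus that value. A small sketch
              of the conversion is shown below; the struct and function names are
              hypothetical and are not taken from the example program.
            </p>

```cpp
#include <cassert>

// Hypothetical holder for the two inputs expected by the procedure above.
struct variance_inputs
{
   double variance;  // true (claimed) variance
   double diff;      // difference from the variance to detect
};

// Convert a claimed and an observed standard deviation into variance terms.
inline variance_inputs to_variance_inputs(double claimed_sd, double observed_sd)
{
   assert(claimed_sd > 0.0 && observed_sd >= 0.0);  // assumed preconditions
   variance_inputs r;
   r.variance = claimed_sd * claimed_sd;
   r.diff = observed_sd * observed_sd - r.variance;
   return r;
}
```

<p>
              For the wafer data, <code class="computeroutput">to_variance_inputs(10.0, 13.97)</code>
              gives a variance of 100 and a difference of approximately 95.1609,
              matching the values printed above.
            </p>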
<p>
              In this case we are interested in an upper one-sided test. So for
              example, if the maximum acceptable risk of falsely rejecting the null
              hypothesis is 0.05 (Type I error), and the maximum acceptable risk of failing
              to reject a false null hypothesis is also 0.05 (Type II error), we estimate
              that we would need a sample size of 51.
            </p>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright &#169; 2006-2010 John Maddock, Paul A. Bristow, Hubert Holin, Xiaogang Zhang, Bruno
      Lalande, Johan R&#229;de, Gautam Sewani, Thijs van den Berg and Benjamin Sobotta<p>
        Distributed under the Boost Software License, Version 1.0. (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="chi_sq_test.html"><img src="../../../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../cs_eg.html"><img src="../../../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../../index.html"><img src="../../../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="../f_eg.html"><img src="../../../../../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>