<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>Generic operations common to all distributions are non-member functions</title>
<link rel="stylesheet" href="../../../../../../../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.76.1">
<link rel="home" href="../../../../index.html" title="Math Toolkit">
<link rel="up" href="../overview.html" title="Overview of Distributions">
<link rel="prev" href="objects.html" title="Distributions are Objects">
<link rel="next" href="complements.html" title="Complements are supported too - and when to use them">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../../../../../../../boost.png"></td>
<td align="center"><a href="../../../../../../../../../index.html">Home</a></td>
<td align="center"><a href="../../../../../../../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../../../../../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="objects.html"><img src="../../../../../../../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../overview.html"><img src="../../../../../../../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../../../../index.html"><img src="../../../../../../../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="complements.html"><img src="../../../../../../../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section math_toolkit_dist_stat_tut_overview_generic">
<div class="titlepage"><div><div><h5 class="title">
<a name="math_toolkit.dist.stat_tut.overview.generic"></a><a class="link" href="generic.html" title="Generic operations common to all distributions are non-member functions">Generic
          operations common to all distributions are non-member functions</a>
</h5></div></div></div>
<p>
            Want to calculate the PDF (Probability Density Function) of a distribution?
            No problem, just use:
          </p>
<pre class="programlisting"><span class="identifier">pdf</span><span class="special">(</span><span class="identifier">my_dist</span><span class="special">,</span> <span class="identifier">x</span><span class="special">);</span>  <span class="comment">// Returns PDF (density) at point x of distribution my_dist.</span>
</pre>
<p>
            Or how about the CDF (Cumulative Distribution Function):
          </p>
<pre class="programlisting"><span class="identifier">cdf</span><span class="special">(</span><span class="identifier">my_dist</span><span class="special">,</span> <span class="identifier">x</span><span class="special">);</span>  <span class="comment">// Returns CDF (integral from -infinity to point x)</span>
                  <span class="comment">// of distribution my_dist.</span>
</pre>
<p>
            And quantiles are just the same:
          </p>
<pre class="programlisting"><span class="identifier">quantile</span><span class="special">(</span><span class="identifier">my_dist</span><span class="special">,</span> <span class="identifier">p</span><span class="special">);</span>  <span class="comment">// Returns the value of the random variable x</span>
                       <span class="comment">// such that cdf(my_dist, x) == p.</span>
</pre>
<p>
            If you're wondering why these aren't member functions, it's to make the
            library more easily extensible: if you want to add additional generic
            operations - let's say the <span class="emphasis"><em>n'th moment</em></span> - then all
            you have to do is add the appropriate non-member functions, overloaded
            for each implemented distribution type.
          </p>
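<p>
            The extensibility argument can be sketched with two toy types. Everything
            below is a hypothetical stand-in written against the C++ standard library
            only: <code class="computeroutput">toy_uniform</code>, <code class="computeroutput">toy_exponential</code>
            and <code class="computeroutput">raw_moment</code> are illustrative names, not part
            of the library. The point is that the n'th raw moment is added as a new set
            of overloads without touching either type.
          </p>

```cpp
#include <cassert>
#include <cmath>

// Toy stand-ins for distribution types (hypothetical, not library classes).
struct toy_uniform     { double lower, upper; };  // uniform on [lower, upper]
struct toy_exponential { double lambda; };        // rate parameter lambda > 0

// Generic operations as non-member functions, overloaded per distribution.
inline double pdf(const toy_uniform& d, double x) {
    return (x < d.lower || x > d.upper) ? 0.0 : 1.0 / (d.upper - d.lower);
}
inline double pdf(const toy_exponential& d, double x) {
    return x < 0 ? 0.0 : d.lambda * std::exp(-d.lambda * x);
}

inline double cdf(const toy_uniform& d, double x) {
    if (x <= d.lower) return 0.0;
    if (x >= d.upper) return 1.0;
    return (x - d.lower) / (d.upper - d.lower);
}
inline double cdf(const toy_exponential& d, double x) {
    return x < 0 ? 0.0 : -std::expm1(-d.lambda * x);  // 1 - exp(-lambda * x)
}

inline double quantile(const toy_uniform& d, double p) {
    assert(p >= 0 && p <= 1);
    return d.lower + p * (d.upper - d.lower);
}
inline double quantile(const toy_exponential& d, double p) {
    assert(p >= 0 && p < 1);
    return -std::log1p(-p) / d.lambda;  // solves cdf(d, x) == p for x
}

// A generic operation added later: the n'th raw moment E[X^n].
// Neither struct above had to change to support it.
inline double raw_moment(const toy_uniform& d, int n) {
    return (std::pow(d.upper, n + 1) - std::pow(d.lower, n + 1))
           / ((n + 1) * (d.upper - d.lower));
}
inline double raw_moment(const toy_exponential& d, int n) {
    return std::tgamma(n + 1.0) / std::pow(d.lambda, n);  // n! / lambda^n
}
```

<p>
            A call such as <code class="computeroutput">raw_moment(toy_uniform{0.0, 1.0}, 2)</code>
            then reads exactly like the calls to <code class="computeroutput">pdf</code>,
            <code class="computeroutput">cdf</code> and <code class="computeroutput">quantile</code>.
          </p>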
<div class="tip"><table border="0" summary="Tip">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Tip]" src="../../../../../../../../../doc/src/images/tip.png"></td>
<th align="left">Tip</th>
</tr>
<tr><td align="left" valign="top">
<p>
              <span class="bold"><strong>Random numbers that approximate Quantiles of
              Distributions</strong></span>
            </p>
<p>
              If you want random numbers that are distributed in a specific way,
              for example uniform, normal or triangular, see <a href="http://www.boost.org/libs/random/" target="_top">Boost.Random</a>.
            </p>
<p>
              Whilst in principle there's nothing to prevent you from using the quantile
              function to convert a uniformly distributed random number to another
              distribution, in practice there are much more efficient algorithms
              available that are specific to random number generation.
            </p>
</td></tr>
</table></div>
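<p>
            To make the tip concrete, here is a minimal sketch of the quantile-based
            conversion it mentions (inverse transform sampling), using only the C++
            standard library. The helper names <code class="computeroutput">exponential_quantile</code>,
            <code class="computeroutput">sample_by_inversion</code> and
            <code class="computeroutput">mean_of_inversion_samples</code> are hypothetical,
            and the exponential distribution is chosen only because its quantile has
            a closed form.
          </p>

```cpp
#include <cassert>
#include <cmath>
#include <random>

// Quantile of an exponential distribution with rate lambda. A toy helper
// standing in for a call of the form quantile(my_dist, p).
inline double exponential_quantile(double lambda, double p) {
    assert(lambda > 0 && p >= 0 && p < 1);
    return -std::log1p(-p) / lambda;
}

// Inverse transform sampling: draw u uniformly from [0, 1) and map it
// through the quantile function.
template <class Engine>
double sample_by_inversion(Engine& engine, double lambda) {
    std::uniform_real_distribution<double> uniform01(0.0, 1.0);
    return exponential_quantile(lambda, uniform01(engine));
}

// Average of n samples drawn by inversion; it should approach 1 / lambda.
inline double mean_of_inversion_samples(double lambda, int n, unsigned seed) {
    std::mt19937 engine(seed);
    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += sample_by_inversion(engine, lambda);
    return sum / n;
}
```

<p>
            This works, but as the tip says a dedicated generator is usually the better
            choice: the standard library's <code class="computeroutput">std::exponential_distribution</code>
            is one such generator for this particular case.
          </p>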
<p>
            Take the binomial distribution as a running example. It has two parameters:
            n (the number of trials) and p (the probability of success on any one trial).
          </p>
<p>
            The <code class="computeroutput"><span class="identifier">binomial_distribution</span></code>
            constructor therefore has two parameters:
          </p>
<p>
            <code class="computeroutput"><span class="identifier">binomial_distribution</span><span class="special">(</span><span class="identifier">RealType</span>
            <span class="identifier">n</span><span class="special">,</span>
            <span class="identifier">RealType</span> <span class="identifier">p</span><span class="special">);</span></code>
          </p>
<p>
            For this distribution the <a href="http://en.wikipedia.org/wiki/Random_variate" target="_top">random
            variate</a> is k: the number of successes observed. The probability
            density/mass function (pdf) is therefore written as <span class="emphasis"><em>f(k; n,
            p)</em></span>.
          </p>
<div class="note"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../../../doc/src/images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top">
<p>
              <span class="bold"><strong>Random Variates and Distribution Parameters</strong></span>
            </p>
<p>
              The concept of a <a href="http://en.wikipedia.org/wiki/Random_variable" target="_top">random
              variable</a> is closely linked to the term <a href="http://en.wikipedia.org/wiki/Random_variate" target="_top">random
              variate</a>: a random variate is a particular value (outcome) of
              a random variable. Random variates and <a href="http://en.wikipedia.org/wiki/Parameter" target="_top">distribution
              parameters</a> are conventionally distinguished (for example in
              Wikipedia and Wolfram MathWorld) by placing a semi-colon or vertical
              bar <span class="emphasis"><em>after</em></span> the <a href="http://en.wikipedia.org/wiki/Random_variable" target="_top">random
              variable</a> (whose value you 'choose'), to separate the variate
              from the parameter(s) that define the shape of the distribution.<br>
              For example, the binomial distribution probability density/mass function
              (PDF) is written as <span class="emphasis"><em>f(k| n, p)</em></span> = Pr(K = k|n, p)
              = probability of observing k successes out of n trials. K is the <a href="http://en.wikipedia.org/wiki/Random_variable" target="_top">random variable</a>,
              k is the <a href="http://en.wikipedia.org/wiki/Random_variate" target="_top">random
              variate</a>, and the parameters are n (trials) and p (probability of success).
            </p>
</td></tr>
</table></div>
<div class="note"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../../../doc/src/images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top"><p>
              By convention, <a href="http://en.wikipedia.org/wiki/Random_variate" target="_top">random
              variates</a> are lower case, usually k if integral, x if real, and
              <a href="http://en.wikipedia.org/wiki/Random_variable" target="_top">random variables</a>
              are upper case, K if integral, X if real. But this implementation treats
              all as floating point values <code class="computeroutput"><span class="identifier">RealType</span></code>,
              so if you really want an integral result, you must round: see note
              on Discrete Probability Distributions below for details.
            </p></td></tr>
</table></div>
<p>
            As noted above the non-member function <code class="computeroutput"><span class="identifier">pdf</span></code>
            has one parameter for the distribution object, and a second for the random
            variate. So taking our binomial distribution example, we would write:
          </p>
<p>
            <code class="computeroutput"><span class="identifier">pdf</span><span class="special">(</span><span class="identifier">binomial_distribution</span><span class="special">&lt;</span><span class="identifier">RealType</span><span class="special">&gt;(</span><span class="identifier">n</span><span class="special">,</span> <span class="identifier">p</span><span class="special">),</span> <span class="identifier">k</span><span class="special">);</span></code>
          </p>
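<p>
            The shape of that call can be mimicked with a toy type, sketched here with
            the C++ standard library only. <code class="computeroutput">toy_binomial</code>
            is a hypothetical stand-in and is not how the library computes the result.
            It evaluates <span class="emphasis"><em>f(k; n, p)</em></span> through
            <code class="computeroutput">std::lgamma</code>, a continuous function, so
            it also accepts a non-integral k.
          </p>

```cpp
#include <cassert>
#include <cmath>

// Toy binomial distribution object holding its two parameters
// (hypothetical stand-in, not the library's binomial_distribution).
struct toy_binomial { double n; double p; };

// Non-member pdf: first argument is the distribution, second the variate k.
// Uses lgamma, so k need not be an integer.
inline double pdf(const toy_binomial& d, double k) {
    assert(d.n >= 0 && d.p > 0 && d.p < 1);
    if (k < 0 || k > d.n) return 0.0;
    const double log_choose =
        std::lgamma(d.n + 1) - std::lgamma(k + 1) - std::lgamma(d.n - k + 1);
    return std::exp(log_choose + k * std::log(d.p) + (d.n - k) * std::log1p(-d.p));
}
```

<p>
            With this sketch, <code class="computeroutput">pdf(toy_binomial{10.0, 0.5}, 5.0)</code>
            evaluates 252/1024, the probability of 5 successes in 10 fair trials. A caller
            who wants strictly integral behaviour can pass <code class="computeroutput">std::floor(k)</code>
            instead of <code class="computeroutput">k</code>.
          </p>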
<p>
            The ranges of <a href="http://en.wikipedia.org/wiki/Random_variate" target="_top">random
            variate</a> values that are permitted and are supported can be tested
            by using two functions <code class="computeroutput"><span class="identifier">range</span></code>
            and <code class="computeroutput"><span class="identifier">support</span></code>.
          </p>
<p>
            The distribution (effectively the <a href="http://en.wikipedia.org/wiki/Random_variate" target="_top">random
            variate</a>) is said to be 'supported' over a range that is <a href="http://en.wikipedia.org/wiki/Probability_distribution" target="_top">"the
            smallest closed set whose complement has probability zero"</a>.
            MathWorld uses the word 'defined' for this range. Non-mathematicians
            might say it means the 'interesting' smallest range of random variate
            x that has the cdf going from zero to unity. Outside are uninteresting
            zones where the pdf is zero, and the cdf zero or unity.
          </p>
<p>
            For most distributions, with probability distribution functions one might
            describe as 'well-behaved', we have decided that it is most useful for
            the supported range to <span class="bold"><strong>exclude</strong></span> random
            variate values like exact zero <span class="bold"><strong>if the end point
            is discontinuous</strong></span>. For example, the pdf of the Weibull (scale 1, shape
            1) distribution smoothly heads for unity as the random variate x declines
            towards zero. But at x = zero, the value of the pdf is suddenly exactly
            zero, by definition. If you are plotting the PDF, or otherwise calculating,
            zero is not the most useful value for the lower limit of the supported range, as
            we discovered. So for this, and similar distributions, we have decided
            it is most numerically useful to use the closest value to zero, min_value,
            for the limit of the supported range. (The <code class="computeroutput"><span class="identifier">range</span></code>
            remains from zero, so you will still get <code class="computeroutput"><span class="identifier">pdf</span><span class="special">(</span><span class="identifier">weibull</span><span class="special">,</span> <span class="number">0</span><span class="special">)</span>
            <span class="special">==</span> <span class="number">0</span></code>).
            (Exponential and gamma distributions have similarly discontinuous functions).
          </p>
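<p>
            The difference between the two ranges can be sketched with a toy Weibull
            type, again using only the C++ standard library. <code class="computeroutput">toy_weibull</code>
            is hypothetical, and the sketch simply follows the convention described
            above: each function returns a pair of lower and upper limits, and the smallest
            positive normalized <code class="computeroutput">double</code> is assumed
            as a stand-in for min_value.
          </p>

```cpp
#include <cassert>
#include <cmath>
#include <limits>
#include <utility>

// Toy Weibull type (hypothetical stand-in, not a library class).
struct toy_weibull { double shape; double scale; };

// Permitted values of the random variate: zero is allowed.
inline std::pair<double, double> range(const toy_weibull&) {
    return {0.0, std::numeric_limits<double>::max()};
}

// Supported values: the lower limit is the smallest positive normalized
// double rather than zero, because the pdf jumps at x == 0.
inline std::pair<double, double> support(const toy_weibull&) {
    return {std::numeric_limits<double>::min(),
            std::numeric_limits<double>::max()};
}

// Density following the convention in the text: exactly zero at x == 0.
inline double pdf(const toy_weibull& d, double x) {
    assert(d.shape > 0 && d.scale > 0 && x >= 0);
    if (x == 0) return 0.0;
    const double z = x / d.scale;
    return (d.shape / d.scale) * std::pow(z, d.shape - 1)
           * std::exp(-std::pow(z, d.shape));
}
```

<p>
            For shape 1 and scale 1, evaluating the density at <code class="computeroutput">support(d).first</code>
            gives a value indistinguishable from 1, while evaluating it at
            <code class="computeroutput">range(d).first</code> gives 0. That jump is why
            a plot is better started from the supported limit.
          </p>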
<p>
            Mathematically, the functions may make sense with an (+ or -) infinite
            value, but except for a few special cases (in the Normal and Cauchy distributions)
            this implementation limits random variates to finite values from the
            <code class="computeroutput"><span class="identifier">max</span></code> to <code class="computeroutput"><span class="identifier">min</span></code> for the <code class="computeroutput"><span class="identifier">RealType</span></code>.
            (See <a class="link" href="../../../backgrounders/implementation.html#math_toolkit.backgrounders.implementation.handling_of_floating_point_infinity">Handling
            of Floating-Point Infinity</a> for rationale).
          </p>
<div class="note"><table border="0" summary="Note">
<tr>
<td rowspan="2" align="center" valign="top" width="25"><img alt="[Note]" src="../../../../../../../../../doc/src/images/note.png"></td>
<th align="left">Note</th>
</tr>
<tr><td align="left" valign="top">
<p>
              <span class="bold"><strong>Discrete Probability Distributions</strong></span>
            </p>
<p>
              Note that the <a href="http://en.wikipedia.org/wiki/Discrete_probability_distribution" target="_top">discrete
              distributions</a>, including the binomial, negative binomial, Poisson
              &amp; Bernoulli, are all mathematically defined as discrete functions:
              that is to say the functions <code class="computeroutput"><span class="identifier">cdf</span></code>
              and <code class="computeroutput"><span class="identifier">pdf</span></code> are only defined
              for integral values of the random variate.
            </p>
<p>
              However, because the method of calculation often uses continuous functions
              it is convenient to treat them as if they were continuous functions,
              and permit non-integral values of their parameters.
            </p>
<p>
              Users wanting to enforce a strict mathematical model may use <code class="computeroutput"><span class="identifier">floor</span></code> or <code class="computeroutput"><span class="identifier">ceil</span></code>
              functions on the random variate prior to calling the distribution function.
            </p>
<p>
              The quantile functions for these distributions are hard to specify
              in a manner that will satisfy everyone all of the time. The default
              behaviour is to return an integer result, that has been rounded <span class="emphasis"><em>outwards</em></span>:
              that is to say, lower quantiles - where the probability is less than
              0.5 - are rounded down, while upper quantiles - where the probability
              is greater than 0.5 - are rounded up. This behaviour ensures that if
              an X% quantile is requested, then <span class="emphasis"><em>at least</em></span> the
              requested coverage will be present in the central region, and <span class="emphasis"><em>no
              more than</em></span> the requested coverage will be present in the
              tails.
            </p>
<p>
              This behaviour can be changed so that the quantile functions are rounded
              differently, or return a real-valued result using <a class="link" href="../../../policy/pol_overview.html" title="Policy Overview">Policies</a>.
              It is strongly recommended that you read the tutorial <a class="link" href="../../../policy/pol_tutorial/understand_dis_quant.html" title="Understanding Quantiles of Discrete Distributions">Understanding
              Quantiles of Discrete Distributions</a> before using the quantile
              function on a discrete distribution. The <a class="link" href="../../../policy/pol_ref/discrete_quant_ref.html" title="Discrete Quantile Policies">reference
              docs</a> describe how to change the rounding policy for these distributions.
            </p>
<p>
              For similar reasons continuous distributions with parameters like "degrees
              of freedom" that might appear to be integral, are treated as real
              values (and are promoted from integer to floating-point if necessary).
              In this case however, there are a small number of situations where
              non-integral degrees of freedom do have a genuine meaning.
            </p>
</td></tr>
</table></div>
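<p>
            The outward rounding described in the note above can be illustrated with
            a brute-force sketch over integer k, using only the C++ standard library.
            The names <code class="computeroutput">toy_binomial_pmf</code>,
            <code class="computeroutput">toy_binomial_cdf</code> and
            <code class="computeroutput">toy_quantile_outwards</code> are hypothetical,
            and the linear search is an assumption made for clarity rather than the
            library's algorithm. Treating a probability of exactly 0.5 as an upper
            quantile is likewise an arbitrary choice of this sketch.
          </p>

```cpp
#include <cassert>
#include <cmath>

// Toy binomial mass function for integer k (hypothetical helper).
inline double toy_binomial_pmf(int n, double p, int k) {
    assert(n >= 0 && p > 0 && p < 1);
    if (k < 0 || k > n) return 0.0;
    const double log_choose =
        std::lgamma(n + 1.0) - std::lgamma(k + 1.0) - std::lgamma(n - k + 1.0);
    return std::exp(log_choose + k * std::log(p) + (n - k) * std::log1p(-p));
}

// Toy cumulative function: Pr(K <= k), by direct summation.
inline double toy_binomial_cdf(int n, double p, int k) {
    double sum = 0.0;
    for (int i = 0; i <= k && i <= n; ++i) sum += toy_binomial_pmf(n, p, i);
    return sum < 1.0 ? sum : 1.0;
}

// Integer quantile rounded outwards:
//   q <  0.5 : largest k with cdf(k) <= q (rounded down, never below 0)
//   q >= 0.5 : smallest k with cdf(k) >= q (rounded up)
inline int toy_quantile_outwards(int n, double p, double q) {
    assert(q >= 0 && q <= 1);
    int k = 0;
    if (q < 0.5) {
        while (k < n && toy_binomial_cdf(n, p, k + 1) <= q) ++k;
    } else {
        while (k < n && toy_binomial_cdf(n, p, k) < q) ++k;
    }
    return k;
}
```

<p>
            For 10 fair trials, the 5% and 95% quantiles found this way are 1 and 8.
            The tail below 1 then holds about 0.1% and the tail above 8 about 1.1%,
            each no more than the 5% requested, so the central region from 1 to 8 covers
            at least the requested 90%.
          </p>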
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright &#169; 2006-2010 John Maddock, Paul A. Bristow, Hubert Holin, Xiaogang Zhang, Bruno
      Lalande, Johan R&#229;de, Gautam Sewani, Thijs van den Berg and Benjamin Sobotta<p>
        Distributed under the Boost Software License, Version 1.0. (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div></td>
</tr></table>
</body>
</html>