On A Rational Question


One of the limitations of Professor B------'s remarkable Experimental Clockwork Mathematical Apparatus[1] is that the numbers upon which it works are but a subset of the decimal fractions[2]. The scheme by which they are represented[3] is extremely flexible and serves us well for most every kind of calculation, but my fellow students and I were curious as to just how badly they might let us down...

A Die Casting Problem

A classic problem in the theory of wager is to determine the probability that the first six in a series of casts of a fair die will appear on the \(k^\mathrm{th}\) cast. To figure it, we need only note that it implies that each of the first \(k-1\) casts showed between one and five spots, with probability \(\frac56\) apiece, and that the \(k^\mathrm{th}\) cast showed six spots, with probability \(\frac16\), and so
\[ \Pr(N=k) = \frac16 \times \left(\frac56\right)^{k-1} \]
This can be generalised to an experiment that succeeds with probability \(p\) with the function
\[ p_p(k) = \Pr(N=k) = p \times (1-p)^{k-1} \]
being the probability mass function of the geometric distribution, \(Geom(p)\), so named because its values comprise a geometric sequence.
The probability that the first success occurs no later than the \(k^\mathrm{th}\) experiment, the cumulative distribution function, must consequently be defined by a geometric series
\[ P_p(k) = \Pr(N \leqslant k) = \sum_{i=1}^k p \times (1-p)^{i-1} = 1-(1-p)^k \]
where \(\sum\) is the symbol that we students use to denote a sum. That this is the correct result for this particular sum is easily revealed by noting that
\[ \begin{align*} P_p(1) &= 1 - (1-p) = p = p_p(1)\\ P_p(k)-P_p(k-1) &= \left(1-(1-p)^k\right) - \left(1-(1-p)^{k-1}\right)\\ &= (1-p)^{k-1} - (1-p)^k\\ &= (1 - (1-p)) \times (1-p)^{k-1}\\ &= p \times (1-p)^{k-1} = p_p(k)\\ \end{align*} \]
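The reader wishing to check this identity numerically may do so with a few lines of Python; a sketch of my own devising, quite independent of the decks below, with `geom_pmf` and `geom_cdf` being illustrative names rather than anything from the Apparatus:

```python
# Check that the closed-form geometric CDF matches the partial sums
# of the geometric PMF, term by term.
def geom_pmf(k, p):
    return p * (1.0 - p) ** (k - 1)      # Pr(N = k)

def geom_cdf(k, p):
    return 1.0 - (1.0 - p) ** k          # Pr(N <= k)

p = 1.0 / 6.0
total = 0.0
for k in range(1, 51):
    total += geom_pmf(k, p)
    # the running sum of the PMF should agree with the closed form
    assert abs(total - geom_cdf(k, p)) < 1e-12
```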
We can spare ourselves the burden of such repeated experimentation by exploiting the inverse of this CDF
\[ P^{-1}_p(u) = 1+\Bigg\lfloor \frac{\ln(1-u)}{\ln(1-p)} \Bigg\rfloor \]
where the odd-looking braces stand for the largest integer less than or equal to the number that they embrace. That this holds is demonstrated by deck 1.

Deck 1: Geometric Random Numbers
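Deck 1 itself is written against the ak library; as a rough stand-in, the same inverse-CDF trick might be sketched in Python as follows, with `geom_inv_cdf` and `geom_rand` being names of my own invention:

```python
import math
import random

def geom_inv_cdf(u, p):
    # 1 + floor(ln(1-u) / ln(1-p)), the inverse of the geometric CDF
    return 1 + int(math.floor(math.log(1.0 - u) / math.log(1.0 - p)))

def geom_rand(p, rng=random):
    # push a uniform draw through the inverse CDF to get a Geom(p) variate
    return geom_inv_cdf(rng.random(), p)

# the mean of Geom(p) is 1/p, so a large sample should average close to 6
rng = random.Random(1066)
p = 1.0 / 6.0
mean = sum(geom_rand(p, rng) for _ in range(100_000)) / 100_000
```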

A Rational Distribution

The problem that we posed to demonstrate the limits of ECMA's numbers was that if \(N_1\) and \(N_2\) are distributed as \(Geom(p)\), then how is \(\frac{N_1}{N_2}\) distributed?
Trivially, every ECMA number is rational and so has a non-zero probability of being the result of just such a random ratio. However, the rationals are but an infinitesimal subset of the real numbers, meaning that the probability that a randomly chosen real number might be the result of such a ratio is almost surely zero, a term of art meaning that, whilst such ratios may fall within the range of that random real number, the probability that we might witness one chosen is zero.

For example, given an observation of the standard log normal distribution, \(X\), then
\[ \Pr\left(\frac{N_1}{N_2} = X\right) \underset{a.s.}{=} 0 \]
but for any attempt to approximate them using ECMA, it must be the case that
\[ \Pr\left(\frac{N_1}{N_2} = X\right) > 0 \]
although it may be too small a figure for ECMA itself to distinguish from zero!

It is difficult to imagine a calculation being any more incorrect than this, and it quite piqued our interest in the properties of this distribution, which we have dubbed \(Rat(p)\).

The first thing that we concluded was that, since \(N_1\) and \(N_2\) have the same distribution, if \(R\) is distributed as \(Rat(p)\) then so must be \(\frac{1}{R}\).
The next was that if \(n_1\) and \(n_2\) are positive integers with no common factor greater than one, which is to say that they cannot both be wholly divided by any integer greater than one, then
\[ \begin{align*} &\Pr\left(R = \frac{n_1}{n_2}\right)\\ &\quad= \Pr\left(\left(N_1 = n_1 \wedge N_2 = n_2\right) \vee \left(N_1 = 2n_1 \wedge N_2 = 2n_2\right) \vee \left(N_1 = 3n_1 \wedge N_2 = 3n_2\right) \vee \dots\right)\\ &\quad= \sum_{k=1}^\infty \Pr\left(N_1 = k n_1\right) \times \Pr\left(N_2 = k n_2\right) \end{align*} \]
where \(\wedge\) stands for and and \(\vee\) stands for or. Rewriting this in terms of the probability mass functions of \(N_1\) and \(N_2\) yields
\[ \begin{align*} \Pr\left(R = \frac{n_1}{n_2}\right) &= \sum_{k=1}^\infty p \times (1-p)^{kn_1-1} \times p \times (1-p)^{kn_2-1}\\ &= \sum_{k=1}^\infty p^2 (1-p)^{k(n_1+n_2)-2}\\ &= \sum_{k=1}^\infty p^2 (1-p)^{-2} \left((1-p)^{n_1+n_2}\right)^k\\ &= \sum_{k=1}^\infty p^2 (1-p)^{n_1+n_2-2} \left((1-p)^{n_1+n_2}\right)^{k-1} \end{align*} \]
Now, this is just another geometric series, albeit an infinite one, and since \(0 < (1-p)^{n_1+n_2} < 1\) its powers vanish in the limit, so it must consequently resolve to
\[ \Pr\left(R = \frac{n_1}{n_2}\right) = p^2 (1-p)^{n_1+n_2-2} \lim_{K \to \infty} \frac{1 - \left((1-p)^{n_1+n_2}\right)^K}{1 - (1-p)^{n_1+n_2}} = \frac{p^2 (1-p)^{n_1+n_2-2}}{1 - (1-p)^{n_1+n_2}} \]
which is a rather unusual looking PMF, as demonstrated by deck 2.

Deck 2: The Rat PMF

We can satisfy ourselves with only plotting the PMF for arguments with numerators and denominators no greater than fifty since the probabilities of such fractions are already practically indistinguishable from the horizontal axis. Note that we are identifying the highest common factor of each numerator and denominator with ak.hcf[4].
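In lieu of the deck, a Python rendering of this PMF might look as follows, with the standard library's `math.gcd` standing in for ak.hcf; since every pair of geometric variates reduces to exactly one fraction in lowest terms, summing it over such fractions should very nearly recover one:

```python
import math

def rat_pmf(n1, n2, p):
    # Pr(R = n1/n2) for a fraction already in lowest terms
    assert math.gcd(n1, n2) == 1
    s = n1 + n2
    return p * p * (1.0 - p) ** (s - 2) / (1.0 - (1.0 - p) ** s)

# the PMF should sum to one over all reduced fractions; for p = 1/2
# a modest cutoff on numerator and denominator already suffices
p = 0.5
total = sum(rat_pmf(a, b, p)
            for a in range(1, 61)
            for b in range(1, 61)
            if math.gcd(a, b) == 1)
assert abs(total - 1.0) < 1e-9
```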

To confirm that this is indeed the PMF of our distribution we need only compare its predicted probabilities against those observed from a ratio of geometrically distributed random numbers. Taking care to remove common factors in those random numbers, deck 3 does just this for fractions that can be expressed with a denominator of d.

Deck 3: Confirming That The PMF Is A Rat
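Deck 3's experiment is readily mimicked in plain Python: draw pairs of geometric variates, strike out their common factors, with `math.gcd` again standing in for ak.hcf, and compare the observed frequency of a particular fraction with that predicted by our formula. The names below are, once more, my own illustrative choices:

```python
import math
import random

def geom_rand(p, rng):
    # invert the geometric CDF to draw a Geom(p) variate
    return 1 + int(math.floor(math.log(1.0 - rng.random()) / math.log(1.0 - p)))

def rat_pmf(n1, n2, p):
    s = n1 + n2
    return p * p * (1.0 - p) ** (s - 2) / (1.0 - (1.0 - p) ** s)

rng = random.Random(1812)
p, trials, hits = 0.5, 200_000, 0
for _ in range(trials):
    n1, n2 = geom_rand(p, rng), geom_rand(p, rng)
    g = math.gcd(n1, n2)                 # remove common factors before comparing
    if (n1 // g, n2 // g) == (2, 3):
        hits += 1

# Pr(R = 2/3) with p = 1/2 is roughly 0.032; the observed frequency
# should land within sampling error of it
assert abs(hits / trials - rat_pmf(2, 3, p)) < 0.003
```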

This strikes me as a most convincing demonstration of the correctness of our formula for the PMF!

The Cumulative Distribution Function

On the face of it, it would seem a near impossible task to figure the CDF; how could we possibly take a sum over every fraction less than some given value? However, appearances can be deceptive and with a satisfyingly simple trick we can do just that.
Specifically, we need only observe that
\[ \Pr\left(R \leqslant x\right) = \Pr\left(\frac{N_1}{N_2} \leqslant x\right) = \Pr\left(N_1 \leqslant x \times N_2\right) = \sum_{k=1}^\infty \Pr(N_2 = k) \times \Pr(N_1 \leqslant kx) \]
Note that \(x\) need not be an integer, or even a rational for that matter, and so this implies that
\[ \begin{align*} \Pr\left(R \leqslant x\right) &= \sum_{k=1}^\infty \Pr(N_2 = k) \times \Pr(N_1 \leqslant \lfloor kx \rfloor)\\ &= \sum_{k=1}^\infty p \times (1-p)^{k-1} \times \left(1 - (1-p)^{\lfloor kx \rfloor}\right) \end{align*} \]
Of course this formula only holds true for non-negative \(x\) since \(R\) must, by definition, be positive. For \(x\) equal to zero it correctly yields zero, since the third factor in each term of the sum is zero for all \(k\), but for negative \(x\) we must replace it with a formula that is everywhere zero.
Unfortunately my fellow students and I were not able to render it any simpler than this; the rounding down of \(kx\) frustrated our attempts to do so in short order.
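Though we could not simplify it, the sum is perfectly serviceable for computation; truncating it once its terms have grown negligible gives, in Python (`rat_cdf` being an illustrative name of my own), and noting that at \(x=1\) the floor disappears and the sum collapses to a handy closed form for checking:

```python
import math

def rat_cdf(x, p, terms=1000):
    # Pr(R <= x), truncating the sum once (1-p)^(k-1) is negligible
    if x < 0.0:
        return 0.0
    return sum(p * (1.0 - p) ** (k - 1) * (1.0 - (1.0 - p) ** math.floor(k * x))
               for k in range(1, terms + 1))

# at x = 1 the geometric series collapses to 1 - (1-p)/(2-p)
p = 0.25
assert rat_cdf(-1.0, p) == 0.0
assert rat_cdf(0.0, p) == 0.0
assert abs(rat_cdf(1.0, p) - (1.0 - (1.0 - p) / (2.0 - p))) < 1e-10
```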

This rounding down can also rather frustrate our attempts to figure the CDF with ECMA, in the sense that if \(x\) is a value that it cannot exactly represent, the rounded value will, on occasion, be smaller than it should be, somewhat throwing off the result. However, in such cases it will have simply shifted the correctly rounded value's location slightly away from zero and consequently ECMA is not remotely as confounded by the CDF as it was by the PMF, as illustrated by deck 4.

Deck 4: The Rat CDF

The jagged form of the CDF is a direct consequence of the fact that the PMF only has non-zero mass for positive rationals; if a point has mass, then it must have infinite density. Since every such point of mass is a jump in the CDF, whilst the CDF is defined for all real values, it can only be continuous at irrational values, none of which we can represent with ECMA's numbers!

We can confirm that this is the correct form for the CDF by plotting it against a sum of values of the PMF. The latter must necessarily miss out many pertinent fractions, what with there being infinitely many of them, but provided that we include those that account for almost all outcomes, we should expect reasonable agreement between the two. Deck 5 plots their graphs, using those fractions with numerators and denominators no greater than fifty for the latter.

Deck 5: Confirming That The CDF Is A Rat
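As with the PMF, the comparison made by deck 5 may be approximated in plain Python by summing the PMF over reduced fractions no greater than the argument; `pmf_sum` is, once again, a name of my own invention:

```python
import math

def rat_pmf(n1, n2, p):
    s = n1 + n2
    return p * p * (1.0 - p) ** (s - 2) / (1.0 - (1.0 - p) ** s)

def rat_cdf(x, p, terms=1000):
    # Pr(R <= x), truncating the infinite sum
    if x < 0.0:
        return 0.0
    return sum(p * (1.0 - p) ** (k - 1) * (1.0 - (1.0 - p) ** math.floor(k * x))
               for k in range(1, terms + 1))

def pmf_sum(x, p, limit=50):
    # sum the PMF over reduced fractions a/b <= x with a, b <= limit
    return sum(rat_pmf(a, b, p)
               for a in range(1, limit + 1)
               for b in range(1, limit + 1)
               if math.gcd(a, b) == 1 and a <= x * b)

# for p = 1/2 the fractions up to fifty over fifty carry almost all of
# the mass, so the two figures should all but coincide
p = 0.5
for x in (0.5, 1.0, 2.0):
    assert abs(pmf_sum(x, p) - rat_cdf(x, p)) < 1e-6
```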

Once again, the results of our experimentation are convincingly aligned with the results of our reasoning, reassuring me that we are correct in the form of the CDF too!

I think it unlikely that we shall ever face quite so pathological a function as our PMF in practice, but it does serve to warn us never to forget that ECMA's numbers are most emphatically not the reals!
\(\Box\)

References

[1] On An Age Of Wonders, www.thusspakeak.com, 2014.

[2] You're Going To Have To Think! Why [Insert Technique Here] Won't Cure Your Floating Point Blues, www.thusspakeak.com, 2013.

[3] IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std 754-1985, 1985.

[4] You're Going To Have To Think! Why Rational Arithmetic Won't Cure Your Floating Point Blues, www.thusspakeak.com, 2013.
