a.k. tag:www.thusspakeak.com,2013-04-25:/ak//9 2020-12-04T19:57:45Z Movable Type 5.2.13 Regressive Tendencies tag:www.thusspakeak.com,2020:/ak//9.341 2020-12-04T20:00:00Z 2020-12-04T19:57:45Z Several months ago we saw how we could use basis functions to interpolate between points upon arbitrary curves or surfaces to approximate the values between them. Related to that is linear regression which fits a straight line, or a flat plane, though points that have values that are assumed to be the results of a linear function with independent random errors, having means of zero and equal standard deviations, in order to reveal the underlying relationship between them. Specifically, we want to find the linear function that minimises the differences between its results and the values at those points. a.k. Several months ago we saw how we could use basis functions to interpolate between points upon arbitrary curves or surfaces to approximate the values between them. Related to that is linear regression which fits a straight line, or a flat plane, though points that have values that are assumed to be the results of a linear function with independent random errors, having means of zero and equal standard deviations, in order to reveal the underlying relationship between them. Specifically, we want to find the linear function that minimises the differences between its results and the values at those points. ]]> What's The Lucky Number? tag:www.thusspakeak.com,2020:/ak//9.340 2020-11-06T20:00:00Z 2020-02-28T04:29:53Z Over the last few months we have been looking at Bernoulli processes which are sequences of Bernoulli trails, being observations of a Bernoulli distributed random variable with a success probability of p. We have seen that the number of failures before the first success follows the geometric distribution and the number of failures before the rth success follows the negative binomial distribution, which are the discrete analogues of the exponential and gamma distributions respectively. This time we shall take a look at the binomial distribution which governs the number of successes out of n trials and is the discrete version of the Poisson distribution. a.k. Over the last few months we have been looking at Bernoulli processes which are sequences of Bernoulli trails, being observations of a Bernoulli distributed random variable with a success probability of p. We have seen that the number of failures before the first success follows the geometric distribution and the number of failures before the rth success follows the negative binomial distribution, which are the discrete analogues of the exponential and gamma distributions respectively.
This time we shall take a look at the binomial distribution which governs the number of successes out of n trials and is the discrete version of the Poisson distribution. ]]>
Bad Luck Comes In Ks tag:www.thusspakeak.com,2020:/ak//9.339 2020-10-02T19:00:00Z 2020-10-02T19:02:02Z Lately we have been looking at Bernoulli processes which are sequences of independent experiments, known as Bernoulli trials, whose successes or failures are given by observations of a Bernoulli distributed random variable. Last time we saw that the number of failures before the first success was governed by the geometric distribution which is the discrete analogue of the exponential distribution and, like it, is a memoryless waiting time distribution in the sense that the distribution for the number of failures before the next success is identical no matter how many failures have already occurred whilst we've been waiting. This time we shall take a look at the distribution of the number of failures before a given number of successes, which is a discrete version of the gamma distribution which defines the probabilities of how long we must wait for multiple exponentially distributed events to occur. a.k. Lately we have been looking at Bernoulli processes which are sequences of independent experiments, known as Bernoulli trials, whose successes or failures are given by observations of a Bernoulli distributed random variable. Last time we saw that the number of failures before the first success was governed by the geometric distribution which is the discrete analogue of the exponential distribution and, like it, is a memoryless waiting time distribution in the sense that the distribution for the number of failures before the next success is identical no matter how many failures have already occurred whilst we've been waiting.
This time we shall take a look at the distribution of the number of failures before a given number of successes, which is a discrete version of the gamma distribution which defines the probabilities of how long we must wait for multiple exponentially distributed events to occur. ]]>
If At First You Don't Succeed tag:www.thusspakeak.com,2020:/ak//9.338 2020-09-04T19:00:00Z 2020-09-04T19:08:56Z Last time we took a first look at Bernoulli processes which are formed from a sequence of independent experiments, known as Bernoulli trials, each of which is governed by the Bernoulli distribution with a probability p of success. Since the outcome of one trial has no effect upon the next, such processes are memoryless meaning that the number of trials that we need to perform before getting a success is independent of how many we have already performed whilst waiting for one. We have already seen that if waiting times for memoryless events with fixed average arrival rates are continuous then they must be exponentially distributed and in this post we shall be looking at the discrete analogue. a.k. Last time we took a first look at Bernoulli processes which are formed from a sequence of independent experiments, known as Bernoulli trials, each of which is governed by the Bernoulli distribution with a probability p of success. Since the outcome of one trial has no effect upon the next, such processes are memoryless meaning that the number of trials that we need to perform before getting a success is independent of how many we have already performed whilst waiting for one.
We have already seen that if waiting times for memoryless events with fixed average arrival rates are continuous then they must be exponentially distributed and in this post we shall be looking at the discrete analogue. ]]>
One Thing Or Another tag:www.thusspakeak.com,2020:/ak//9.337 2020-08-07T19:00:00Z 2020-08-07T19:00:32Z Several years ago we took a look at memoryless processes in which the probability that we should wait for any given length of time for an event to occur is independent of how long we have already been waiting. We found that this implied that the waiting time must be exponentially distributed, that the waiting time for several events must be gamma distributed and that the number of events occuring in a unit of time must be Poisson distributed. These govern continuous memoryless processes in which events can occur at any time but not those in which events can only occur at specified times, such as the roll of a die coming up six, known as Bernoulli processes. Observations of such processes are known as Bernoulli trials and their successes and failures are governed by the Bernoulli distribution, which we shall take a look at in this post. a.k. Several years ago we took a look at memoryless processes in which the probability that we should wait for any given length of time for an event to occur is independent of how long we have already been waiting. We found that this implied that the waiting time must be exponentially distributed, that the waiting time for several events must be gamma distributed and that the number of events occuring in a unit of time must be Poisson distributed.
These govern continuous memoryless processes in which events can occur at any time but not those in which events can only occur at specified times, such as the roll of a die coming up six, known as Bernoulli processes. Observations of such processes are known as Bernoulli trials and their successes and failures are governed by the Bernoulli distribution, which we shall take a look at in this post. ]]>
Beta Animals tag:www.thusspakeak.com,2020:/ak//9.329 2020-07-03T19:00:00Z 2020-07-03T19:59:56Z a.k. Several years ago we took a look at the gamma function Γ, which is a generalisation of the factorial to non-integers, being equal to the factorial of a non-negative integer n when passed an argument of n+1 and smoothly interpolating between them. Like the normal cumulative distribution function Φ, it and its related functions are examples of special functions; so named because they frequently crop up in the solutions to interesting mathematical problems but can't be expressed as simple formulae, forcing us to resort to numerical approximation.
This time we shall take a look at another family of special functions derived from the beta function B. ]]>
Slashing The Odds tag:www.thusspakeak.com,2020:/ak//9.326 2020-06-05T19:00:00Z 2020-06-05T18:59:15Z In the previous post we explored the Cauchy distribution, which, having undefined means and standard deviations, is an example of a pathological distribution. We saw that this is because it has a relatively high probability of generating extremely large values which we concluded was a consequence of its standard random variable being equal to the ratio of two independent standard normally distributed random variables, so that the magnitudes of observations of it can be significantly increased by the not particularly unlikely event that observations of the denominator are close to zero. Whilst we didn't originally derive the Cauchy distribution in this way, there are others, known as ratio distributions, that are explicitly constructed in this manner and in this post we shall take a look at one of them. a.k. In the previous post we explored the Cauchy distribution, which, having undefined means and standard deviations, is an example of a pathological distribution. We saw that this is because it has a relatively high probability of generating extremely large values which we concluded was a consequence of its standard random variable being equal to the ratio of two independent standard normally distributed random variables, so that the magnitudes of observations of it can be significantly increased by the not particularly unlikely event that observations of the denominator are close to zero.
Whilst we didn't originally derive the Cauchy distribution in this way, there are others, known as ratio distributions, that are explicitly constructed in this manner and in this post we shall take a look at one of them. ]]>
Moments Of Pathological Behaviour tag:www.thusspakeak.com,2020:/ak//9.322 2020-05-01T19:00:00Z 2020-05-01T18:59:29Z Last time we took a look at basis function interpolation with which we approximate functions from their values at given sets of arguments, known as nodes, using weighted sums of distinct functions, known as basis functions. We began by constructing approximations using polynomials before moving on to using bell shaped curves, such as the normal probability density function, centred at the nodes. The latter are particularly useful for approximating multi-dimensional functions, as we saw by using multivariate normal PDFs. An easy way to create rotationally symmetric functions, known as radial basis functions, is to apply univariate functions that are symmetric about zero to the distance between the interpolation's argument and their associated nodes. PDFs are a rich source of such functions and, in fact, the second bell shaped curve that we considered is related to that of the Cauchy distribution, which has some rather interesting properties. a.k. Last time we took a look at basis function interpolation with which we approximate functions from their values at given sets of arguments, known as nodes, using weighted sums of distinct functions, known as basis functions. We began by constructing approximations using polynomials before moving on to using bell shaped curves, such as the normal probability density function, centred at the nodes. The latter are particularly useful for approximating multi-dimensional functions, as we saw by using multivariate normal PDFs.
An easy way to create rotationally symmetric functions, known as radial basis functions, is to apply univariate functions that are symmetric about zero to the distance between the interpolation's argument and their associated nodes. PDFs are a rich source of such functions and, in fact, the second bell shaped curve that we considered is related to that of the Cauchy distribution, which has some rather interesting properties. ]]>
All Your Basis Are Belong To Us tag:www.thusspakeak.com,2020:/ak//9.321 2020-04-03T19:00:00Z 2020-04-03T22:11:31Z A few years ago we saw how we could approximate a function f between pairs of points (xi, f(xi)) and (xi+1, f(xi+1)) by linear and cubic spline interpolation which connect them with straight lines and cubic polynomials respectively, the latter of which yield smooth curves at the cost of somewhat arbitrary choices about their exact shapes. An alternative approach is to construct a single function that passes through all of the points and, given that nth order polynomials are uniquely defined by n+1 values at distinct xi, it's tempting to use them. a.k. A few years ago we saw how we could approximate a function f between pairs of points (xi, f(xi)) and (xi+1, f(xi+1)) by linear and cubic spline interpolation which connect them with straight lines and cubic polynomials respectively, the latter of which yield smooth curves at the cost of somewhat arbitrary choices about their exact shapes.
An alternative approach is to construct a single function that passes through all of the points and, given that nth order polynomials are uniquely defined by n+1 values at distinct xi, it's tempting to use them. ]]>
The Spectral Apparition tag:www.thusspakeak.com,2020:/ak//9.312 2020-03-06T20:00:00Z 2020-03-06T20:00:46Z a.k. Over the last few months we have seen how we can efficiently implement the Householder transformations and shifted Givens rotations used by Francis's algorithm to diagonalise a real symmetric matrix M, yielding its eigensystem in a matrix V whose columns are its eigenvectors and a diagonal matrix Λ whose diagonal elements are their associated eigenvalues, which satisfy

M = V × Λ × VT

and together are known as the spectral decomposition of M.
In this post, we shall add it to the `ak` library using the `householder` and `givens` functions that we have put so much effort into optimising.
]]>
Funky Givens tag:www.thusspakeak.com,2020:/ak//9.311 2020-02-07T20:00:00Z 2020-02-07T20:03:04Z a.k. We have recently been looking at how we can use a special case of Francis's QR transformation to reduce a real symmetric matrix M to a diagonal matrix Λ by first applying Householder transformations to put it in tridiagonal form and then using shifted Givens rotations to zero out the off diagonal elements.
The columns of the matrix of transformations V and the elements on the leading diagonal of Λ are the unit eigenvectors and eigenvalues of M respectively and they consequently satisfy

M × V = V × Λ

and, since the product of V and its transpose is the identity matrix

M = V × Λ × VT

which is known as the spectral decomposition of M.
Last time we saw how we could efficiently apply the Householder transformations in-place, replacing the elements of M with those of the matrix of accumulated transformations Q and creating a pair of arrays to represent the leading and off diagonal elements of the tridiagonal matrix. This time we shall see how we can similarly improve the implementation of the Givens rotations. ]]>
A Well Managed Household tag:www.thusspakeak.com,2020:/ak//9.310 2020-01-03T20:00:00Z 2020-01-04T02:32:25Z a.k. Over the last few months we have seen how we can use a sequence of Householder transformations followed by a sequence of shifted Givens rotations to efficiently find the spectral decomposition of a symmetric real matrix M, formed from a matrix V and a diagonal matrix Λ satisfying

M × V = V × Λ

implying that the columns of V are the unit eigenvectors of M and their associated elements on the diagonal of Λ are their eigenvalues so that

V × VT = I

where I is the identity matrix, and therefore

M = V × Λ × VT

From a mathematical perspective the combination of Householder transformations and shifted Givens rotations is particularly appealing, converging on the spectral decomposition after relatively few matrix multiplications, but from an implementation perspective using `ak.matrix` multiplication operations is less than satisfactory since it wastefully creates new `ak.matrix` objects at each step and so in this post we shall start to see how we can do better. ]]>
Spryer Francis tag:www.thusspakeak.com,2019:/ak//9.309 2019-12-06T20:00:00Z 2020-01-16T06:51:26Z a.k. Last time we saw how we could use a sequence of Householder transformations to reduce a symmetric real matrix M to a symmetric tridiagonal matrix, having zeros everywhere other than upon the leading, upper and lower diagonals, which we could then further reduce to a diagonal matrix Λ using a sequence of Givens rotations to iteratively transform the elements upon the upper and lower diagonals to zero so that the columns of the accumulated transformations V were the unit eigenvectors of M and the elements on the leading diagonal of the result were their associated eigenvalues, satisfying

M × V = V × Λ

and, since the transpose of V is its own inverse

M = V × Λ × VT

which is known as the spectral decomposition of M.
Unfortunately, the way that we used Givens rotations to diagonalise tridiagonal symmetric matrices wasn't particularly efficient and I concluded by stating that it could be significantly improved with a relatively minor change. In this post we shall see what it is and why it works. ]]>
FAO The Householder tag:www.thusspakeak.com,2019:/ak//9.308 2019-11-01T19:00:00Z 2019-11-01T20:22:28Z a.k. Some years ago we saw how we could use the Jacobi algorithm to find the eigensystem of a real valued symmetric matrix M, which is defined as the set of pairs of non-zero vectors vi and scalars λi that satisfy

M × vi = λi × vi

known as the eigenvectors and the eigenvalues respectively, with the vectors typically restricted to those of unit length in which case we can define its spectral decomposition as the product

M = V × Λ × VT

where the columns of V are the unit eigenvectors, Λ is a diagonal matrix whose ith diagonal element is the eigenvalue associated with the ith column of V and the T superscript denotes the transpose, in which the rows and columns of the matrix are swapped.
You may recall that this is a particularly convenient representation of the matrix since we can use it to generalise any scalar function to it with

f(M) = V × f(Λ) × VT

where f(Λ) is the diagonal matrix whose ith diagonal element is the result of applying f to the ith diagonal element of Λ.
You may also recall that I suggested that there's a more efficient way to find eigensystems and I think that it's high time that we took a look at it. ]]>
A Little Bit Slinky tag:www.thusspakeak.com,2019:/ak//9.307 2019-10-04T19:00:00Z 2019-10-04T19:13:35Z For several months we've for been taking a look at cluster analysis which seeks to partition sets of data into subsets of similar data, known as clusters. Most recently we have focused our attention on hierarchical clusterings, which are sequences of sets of clusters in which pairs of data that belong to the same cluster at one step belong to the same cluster in the next step. A simple way of constructing them is to initially place each datum in its own cluster and then iteratively merge the closest pairs of clusters in each clustering to produce the next one in the sequence, stopping when all of the data belong to a single cluster. We have considered three ways of measuring the distance between pairs of clusters, the average distance between their members, the distance between their closest members and the distance between their farthest members, known as average linkage, single linkage and complete linkage respectively, and implemented a reasonably efficient algorithm for generating hierarchical clusterings defined with them, using a min-heap structure to cache the distances between clusters. Finally, I claimed that there is a more efficient algorithm for generating single linkage hierarchical clusterings that would make the sorting of clusters by size in our ak.clustering type too expensive and so last time we implemented the ak.rawClustering type to represent clusterings without sorting their clusters which we shall now use in the implementation of that algorithm. a.k. For several months we've for been taking a look at cluster analysis which seeks to partition sets of data into subsets of similar data, known as clusters. Most recently we have focused our attention on hierarchical clusterings, which are sequences of sets of clusters in which pairs of data that belong to the same cluster at one step belong to the same cluster in the next step.
A simple way of constructing them is to initially place each datum in its own cluster and then iteratively merge the closest pairs of clusters in each clustering to produce the next one in the sequence, stopping when all of the data belong to a single cluster. We have considered three ways of measuring the distance between pairs of clusters, the average distance between their members, the distance between their closest members and the distance between their farthest members, known as average linkage, single linkage and complete linkage respectively, and implemented a reasonably efficient algorithm for generating hierarchical clusterings defined with them, using a min-heap structure to cache the distances between clusters.
Finally, I claimed that there is a more efficient algorithm for generating single linkage hierarchical clusterings that would make the sorting of clusters by size in our `ak.clustering` type too expensive and so last time we implemented the `ak.rawClustering` type to represent clusterings without sorting their clusters which we shall now use in the implementation of that algorithm. ]]>