a.k., www.thusspakeak.com

Finding The Middle Ground (a.k., 2021-10-01)
Last time we saw how we can use Euler's method to approximate the solutions of ordinary differential equations, or ODEs, which define the derivative of one variable with respect to another as a function of them both, so that they cannot, in general, be solved by direct integration. Specifically, it uses Taylor's theorem to estimate the change in the first variable that results from a small step in the second, iteratively accumulating the results for steps of a constant length to approximate the value of the former at some particular value of the latter. Unfortunately it isn't very accurate, yielding an accumulated error proportional to the step length, and so this time we shall take a look at a way to improve it.
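The summary doesn't say which improvement the post makes, although the title hints at a midpoint scheme. As an assumption, here is a minimal Python sketch that compares Euler's method with the explicit midpoint method, a second order Runge-Kutta scheme, on dy/dx = y, whose exact solution with y(0) = 1 is e^x. Halving the step length should roughly halve Euler's accumulated error and roughly quarter the midpoint method's. The function names are ours.

```python
import math

def euler(f, x0, y0, x1, n):
    """Approximate y(x1) for dy/dx = f(x, y), y(x0) = y0, with n Euler steps."""
    h = (x1 - x0) / n
    x, y = x0, y0
    for _ in range(n):
        y += h * f(x, y)                    # first order Taylor step
        x += h
    return y

def midpoint(f, x0, y0, x1, n):
    """Explicit midpoint method (assumed here as the improvement): take a half
    Euler step, then use the derivative at that midpoint for the full step."""
    h = (x1 - x0) / n
    x, y = x0, y0
    for _ in range(n):
        y_mid = y + 0.5 * h * f(x, y)       # estimate y halfway through the step
        y += h * f(x + 0.5 * h, y_mid)      # full step using the midpoint slope
        x += h
    return y

def f(x, y):
    return y                                # dy/dx = y, so y(x) = e^x when y(0) = 1

for n in (10, 20, 40):
    err_euler = abs(euler(f, 0.0, 1.0, 1.0, n) - math.e)
    err_mid = abs(midpoint(f, 0.0, 1.0, 1.0, n) - math.e)
    print(n, err_euler, err_mid)
```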

Out Of The Ordinary (a.k., 2021-09-03)
Several years ago we saw how to use the trapezium rule to approximate integrals. This works by dividing the interval of integration into a set of equally spaced values, evaluating the function being integrated, or integrand, at each of them and calculating the area under the curve formed by connecting adjacent points with straight lines, which splits it into trapeziums. This was an improvement over an even more rudimentary scheme which instead placed rectangles spanning adjacent values, with heights equal to the values of the function at their midpoints, to approximate the area. Whilst there really wasn't much point in implementing this since it offers no advantage over the trapezium rule, it is a reasonable first approach to approximating the solutions to another type of problem involving calculus: ordinary differential equations, or ODEs.
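As a reminder of the two integration schemes, here is a minimal Python sketch of the trapezium rule and of the midpoint rectangle rule. Both are applied to the integral of sin over [0, π], which is exactly 2. The function names are ours, not the post's.

```python
import math

def trapezium(f, a, b, n):
    """Trapezium rule: join the n + 1 equally spaced points with straight lines."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return h * total

def midpoint_rule(f, a, b, n):
    """Rectangles spanning adjacent points, with heights taken at their midpoints."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

print(trapezium(math.sin, 0.0, math.pi, 16), midpoint_rule(math.sin, 0.0, math.pi, 16))
```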

Will They Blend? (a.k., 2021-08-06)
Last time we saw how we can create new random variables from sets of random variables with given probabilities of observation. To make an observation of such a random variable we randomly select one of its components, according to their probabilities, and make an observation of it. Furthermore, their associated probability density functions, or PDFs, cumulative distribution functions, or CDFs, and characteristic functions, or CFs, are simply sums of the component functions weighted by their probabilities of observation. Now there is nothing about such distributions, known as mixture distributions, that requires that the components are univariate. Given that copulas are simply multivariate distributions with standard uniformly distributed marginals, being the distributions of each element considered independently of the others, we can use the same technique to create new copulas too.
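Here is a minimal sketch of a mixture copula. The two components are chosen purely for illustration: the independence copula C(u, v) = uv and the comonotonic copula C(u, v) = min(u, v), mixed with probabilities w and 1 - w. Since both components have standard uniform marginals, so does the mixture. The function names are hypothetical.

```python
import random

def independence_cdf(u, v):
    return u * v                        # independent standard uniforms

def comonotonic_cdf(u, v):
    return min(u, v)                    # perfectly dependent standard uniforms

def mixture_copula_cdf(u, v, w):
    """CDF of the mixture: the component CDFs weighted by their probabilities."""
    return w * independence_cdf(u, v) + (1 - w) * comonotonic_cdf(u, v)

def draw_mixture_copula(w, rng=random):
    """Select a component with probability w or 1 - w, then observe it."""
    if rng.random() < w:
        return rng.random(), rng.random()
    u = rng.random()
    return u, u

rng = random.Random(1)
sample = [draw_mixture_copula(0.4, rng) for _ in range(20000)]
print(sum(u for u, _ in sample) / len(sample))   # marginal mean, close to 0.5
```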

Mixing It Up (a.k., 2021-07-02)
Last year we took a look at basis function interpolation, which fits a weighted sum of n linearly independent functions, known as basis functions, through observations of an arbitrary function's values at a set of n points in order to approximate it at unobserved points. In particular, we saw that symmetric probability density functions, or PDFs, make reasonable basis functions for approximating both univariate and multivariate functions. It is quite tempting, therefore, to use weighted sums of PDFs to construct new PDFs, and in this post we shall see how we can use a simple probabilistic argument to do so.
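The probabilistic argument can be sketched in a few lines. This version assumes normal components for illustration, and the class name NormalMixture is hypothetical. An observation picks a component according to its probability and then observes it. The PDF and CDF are the correspondingly weighted sums of the component functions, so the PDF remains non-negative and integrates to one.

```python
import math
import random

def normal_pdf(x, mu, sigma):
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

def normal_cdf(x, mu, sigma):
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

class NormalMixture:
    """Hypothetical mixture of normal components, the i-th observed with
    probability weights[i] / sum(weights)."""

    def __init__(self, weights, mus, sigmas):
        total = sum(weights)
        self.weights = [w / total for w in weights]
        self.components = list(zip(mus, sigmas))

    def pdf(self, x):
        # weighted sum of the component PDFs
        return sum(w * normal_pdf(x, m, s)
                   for w, (m, s) in zip(self.weights, self.components))

    def cdf(self, x):
        # weighted sum of the component CDFs
        return sum(w * normal_cdf(x, m, s)
                   for w, (m, s) in zip(self.weights, self.components))

    def draw(self, rng=random):
        # select a component according to its probability, then observe it
        m, s = rng.choices(self.components, weights=self.weights)[0]
        return rng.gauss(m, s)

mix = NormalMixture([1, 3], [-2.0, 2.0], [1.0, 0.5])
rng = random.Random(2)
draws = [mix.draw(rng) for _ in range(20000)]
print(mix.cdf(0.0), sum(draws) / len(draws))  # sample mean near 0.25*(-2) + 0.75*2 = 1
```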

A PR Exercise (a.k., 2021-06-04)
In the last few posts we've been looking at the BFGS quasi-Newton algorithm for minimising multivariate functions. This uses iteratively updated approximations of the Hessian matrix of second partial derivatives in order to choose directions in which to search for univariate minima, saving the expense of calculating it explicitly. A particularly useful property of the algorithm is that if the line search satisfies the Wolfe conditions then the positive definiteness of the approximate Hessian is preserved, meaning that the implied locally quadratic approximation of the function must have a minimum. Unfortunately, for large numbers of dimensions the approximation will still be relatively expensive to calculate and will require a significant amount of memory to store, and so in this post we shall take a look at an algorithm that only uses the vector of first partial derivatives.
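This sketch assumes that the "PR" in the title refers to the Polak-Ribière formula for nonlinear conjugate gradient. To keep it short it minimises a quadratic, for which an exact line search has a closed form; a general implementation would use a Wolfe line search instead. The search directions are built from gradients alone, and no Hessian approximation is stored between iterations. The function name is ours.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def pr_conjugate_gradient(A, b, x, iters=10, tol=1e-12):
    """Minimise f(x) = x.A.x / 2 - b.x by Polak-Ribiere conjugate gradient.
    A is used only for the gradient A.x - b and the closed form line search."""
    g = [gi - bi for gi, bi in zip(mat_vec(A, x), b)]
    d = [-gi for gi in g]                       # start with steepest descent
    for _ in range(iters):
        if dot(g, g) < tol:
            break
        Ad = mat_vec(A, d)
        alpha = -dot(g, d) / dot(d, Ad)         # exact line search for a quadratic
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g_new = [gi - bi for gi, bi in zip(mat_vec(A, x), b)]
        # Polak-Ribiere coefficient, clipped at zero (the "PR+" variant)
        beta = max(0.0, dot(g_new, [gn - go for gn, go in zip(g_new, g)]) / dot(g, g))
        d = [-gn + beta * di for gn, di in zip(g_new, d)]
        g = g_new
    return x

A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
print(pr_conjugate_gradient(A, b, [0.0, 0.0]))   # the solution of A.x = b
```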

Bring Out The Big Flipping GunS (a.k., 2021-05-07)
Last month we took a look at quasi-Newton multivariate function minimisation algorithms, which use approximations of the Hessian matrix of second partial derivatives to choose line search directions. We demonstrated that the BFGS rule for updating the approximate Hessian after each line search maintains its positive definiteness if the line searches conform to the Wolfe conditions, ensuring that the locally quadratic approximation of the function defined by its value, the vector of first partial derivatives and the Hessian has a minimum. Now that we've got the theoretical details out of the way it's time to get on with the implementation.
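Here is a minimal sketch of the standard BFGS update of a Hessian approximation B. It takes the step s made by a line search and the resulting change y in the gradient, and it uses plain lists rather than any particular matrix library; the function name is ours. The check that y·s > 0 corresponds to the curvature condition that a Wolfe line search guarantees. When that condition holds, the updated matrix satisfies the secant condition B_new s = y and remains positive definite.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def mat_vec(M, v):
    return [dot(row, v) for row in M]

def bfgs_update(B, s, y):
    """Return B - (B s)(B s)^T / (s^T B s) + y y^T / (y^T s) for symmetric B."""
    ys = dot(y, s)
    if ys <= 0:
        raise ValueError("curvature condition y.s > 0 is not satisfied")
    Bs = mat_vec(B, s)
    sBs = dot(s, Bs)
    n = len(s)
    return [[B[i][j] - Bs[i] * Bs[j] / sBs + y[i] * y[j] / ys for j in range(n)]
            for i in range(n)]

B_new = bfgs_update([[1.0, 0.0], [0.0, 1.0]], [1.0, 0.5], [2.0, 0.3])
print(mat_vec(B_new, [1.0, 0.5]))   # secant condition: equals y up to rounding
```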

Big Friendly GiantS (a.k., 2021-04-02)
In the previous post we saw how we could perform a univariate line search for a point that satisfies the Wolfe conditions, meaning that it is reasonably close to a minimum and takes a lot less work to find than the minimum itself. Line searches are used in a class of multivariate minimisation algorithms which iteratively choose directions in which to proceed, in particular those that use approximations of the Hessian matrix of second partial derivatives of a function to do so. This is similar to how the Levenberg-Marquardt multivariate inversion algorithm uses a diagonal matrix in place of the sum of the products of the Hessian matrix of each element and the error in that element's current value. In this post we shall take a look at one such algorithm.
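The role of the Hessian approximation in such algorithms can be sketched briefly. The search direction d solves B d = -g, where g is the gradient. If B is positive definite then g·d = -d·B d < 0, so d points downhill and a line search along it can reduce the function. This minimal sketch handles only the two dimensional case, using Cramer's rule, to stay self-contained; the function name is hypothetical.

```python
def quasi_newton_direction(B, g):
    """Solve B d = -g for a 2x2 matrix B using Cramer's rule."""
    (b00, b01), (b10, b11) = B
    det = b00 * b11 - b01 * b10
    return [(-g[0] * b11 + b01 * g[1]) / det,
            (-b00 * g[1] + b10 * g[0]) / det]

B = [[2.0, 0.5], [0.5, 1.0]]     # a positive definite Hessian approximation
g = [1.0, -2.0]                  # the gradient at the current point
d = quasi_newton_direction(B, g)
print(d, g[0] * d[0] + g[1] * d[1])   # the directional derivative is negative
```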

Wolfe It Down (a.k., 2021-03-05)
Last time we saw how we could efficiently invert a vector valued multivariate function with the Levenberg-Marquardt algorithm, which replaces the sum of the Hessian matrices of each element of its result, multiplied by the difference between that element and its target value, with a diagonal matrix. Similarly, there are minimisation algorithms that use approximations of the Hessian matrix of second partial derivatives to estimate directions in which the value of the function will decrease. Before we take a look at them, however, we'll need a way to step toward minima in such directions, known as a line search, and in this post we shall see how we might reasonably do so.
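The summary doesn't describe how the post's line search proceeds, so the following is only a minimal sketch of one simple approach, assumed for illustration. It brackets and bisects until a step length a satisfies the weak Wolfe conditions: sufficient decrease, f(x + a d) <= f(x) + c1 a ∇f(x)·d, and curvature, ∇f(x + a d)·d >= c2 ∇f(x)·d, with 0 < c1 < c2 < 1. It works with phi(a) = f(x + a d) and its derivative, so it is independent of the number of dimensions. The function name and default constants are our choices.

```python
import math

def wolfe_line_search(phi, dphi, c1=1e-4, c2=0.9, a=1.0, max_iter=100):
    """Bracket and bisect for a step a > 0 satisfying the weak Wolfe conditions,
    where phi(a) = f(x + a*d) and dphi(a) is its derivative with respect to a."""
    phi0, dphi0 = phi(0.0), dphi(0.0)
    if dphi0 >= 0:
        raise ValueError("d is not a descent direction")
    lo, hi = 0.0, math.inf
    for _ in range(max_iter):
        if phi(a) > phi0 + c1 * a * dphi0:
            hi = a                      # too far: not enough decrease
        elif dphi(a) < c2 * dphi0:
            lo = a                      # too short: still descending steeply
        else:
            return a                    # both conditions hold
        a = 0.5 * (lo + hi) if hi < math.inf else 2.0 * lo
    return a

def phi(a):
    return (1 - 10 * a) ** 2            # minimum at a = 0.1

def dphi(a):
    return -20 * (1 - 10 * a)

print(wolfe_line_search(phi, dphi))     # shrinks from a = 1 until both conditions hold
```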

Found In Space (a.k., 2021-02-05)
Some time ago we saw how Newton's method used the derivative of a univariate scalar valued function to guide the search for an argument at which it took a specific value. A related problem is finding a vector at which a multivariate vector valued function takes a specific vector value, or at least comes as close as possible to it. In particular, we should often like to fit an arbitrary parametrically defined scalar valued functional form to a set of points with possibly noisy values, much as we did using linear regression to find the best fitting weighted sum of a given set of functions, and in this post we shall see how we can generalise Newton's method to solve such problems.

Smooth Operator (a.k., 2021-01-01)
Last time we took a look at linear regression, which finds the linear function that minimises the differences between its results and the values at a set of points that are presumed, possibly after applying some specified transformation, to be random deviations from a straight line or, in multiple dimensions, a flat plane. The purpose was to reveal the underlying relationship between the independent variable represented by the points and the dependent variable represented by the values at them. This time we shall see how we can approximate the function that defines the relationship between them without actually revealing what it is.
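The summary doesn't say which technique the post uses. Kernel smoothing is one common way to approximate such a relationship without stating a formula for it, and the Nadaraya-Watson estimator with a Gaussian kernel is assumed here purely for illustration. It estimates the value at x as an average of the observed values, weighted by how close their points are to x. The bandwidth of 0.2 is an arbitrary choice, and the function names are hypothetical.

```python
import math
import random

def gaussian_kernel(u):
    return math.exp(-0.5 * u * u)

def kernel_smoother(xs, ys, bandwidth):
    """Return a function that estimates the value at x as an average of the
    observed ys, weighted by a kernel centred on x (Nadaraya-Watson)."""
    def smoothed(x):
        weights = [gaussian_kernel((x - xi) / bandwidth) for xi in xs]
        return sum(w * yi for w, yi in zip(weights, ys)) / sum(weights)
    return smoothed

rng = random.Random(3)
xs = [i / 10 for i in range(101)]
ys = [math.sin(x) + rng.gauss(0.0, 0.05) for x in xs]   # noisy observations
smooth = kernel_smoother(xs, ys, 0.2)
print(smooth(5.0), math.sin(5.0))
```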

Regressive Tendencies (a.k., 2020-12-04)
Several months ago we saw how we could use basis functions to interpolate between points upon arbitrary curves or surfaces to approximate the values between them. Related to that is linear regression, which fits a straight line, or a flat plane, through points whose values are assumed to be the results of a linear function plus independent random errors, having means of zero and equal standard deviations, in order to reveal the underlying relationship between them. Specifically, we want to find the linear function that minimises the differences between its results and the values at those points.

What's The Lucky Number? (a.k., 2020-11-06)
Over the last few months we have been looking at Bernoulli processes, which are sequences of Bernoulli trials, being observations of a Bernoulli distributed random variable with a success probability of p. We have seen that the number of failures before the first success follows the geometric distribution and the number of failures before the rth success follows the negative binomial distribution, which are the discrete analogues of the exponential and gamma distributions respectively. This time we shall take a look at the binomial distribution, which governs the number of successes out of n trials and is the discrete-time analogue of the Poisson distribution.
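Here is a minimal sketch of the binomial probability mass function, P(K = k) = C(n, k) p^k (1 - p)^(n - k). A simulation alongside it simply counts the successes in n Bernoulli trials. It needs Python 3.8 or later for math.comb. The final lines illustrate the relationship with the Poisson distribution: holding np = λ fixed as n grows, the binomial probabilities approach the Poisson ones.

```python
import math
import random

def binomial_pmf(k, n, p):
    """Probability of exactly k successes out of n independent Bernoulli trials."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binomial_draw(n, p, rng=random):
    """Simulate n Bernoulli trials and count the successes."""
    return sum(1 for _ in range(n) if rng.random() < p)

# For large n and small p = lam / n the binomial approaches the Poisson distribution.
lam = 3.0
print(binomial_pmf(2, 10000, lam / 10000), math.exp(-lam) * lam ** 2 / 2)
```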

Bad Luck Comes In Ks (a.k., 2020-10-02)
Lately we have been looking at Bernoulli processes, which are sequences of independent experiments, known as Bernoulli trials, whose successes or failures are given by observations of a Bernoulli distributed random variable. Last time we saw that the number of failures before the first success is governed by the geometric distribution, which is the discrete analogue of the exponential distribution and, like it, is a memoryless waiting time distribution in the sense that the distribution of the number of failures before the next success is identical no matter how many failures have already occurred whilst we've been waiting. This time we shall take a look at the distribution of the number of failures before a given number of successes, which is a discrete version of the gamma distribution that defines the probabilities of how long we must wait for multiple exponentially distributed events to occur.
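Here is a minimal sketch of that distribution. The probability of k failures before the r-th success is C(k + r - 1, k) p^r (1 - p)^k. The last trial must be the r-th success, and the other r - 1 successes can fall anywhere among the preceding k + r - 1 trials. With r = 1 it reduces to the geometric distribution. The function names are ours, and math.comb needs Python 3.8 or later.

```python
import math
import random

def negative_binomial_pmf(k, r, p):
    """Probability of exactly k failures before the r-th success."""
    return math.comb(k + r - 1, k) * p ** r * (1 - p) ** k

def negative_binomial_draw(r, p, rng=random):
    """Run Bernoulli trials until r successes, returning the number of failures."""
    failures = successes = 0
    while successes < r:
        if rng.random() < p:
            successes += 1
        else:
            failures += 1
    return failures

rng = random.Random(5)
draws = [negative_binomial_draw(3, 0.4, rng) for _ in range(20000)]
print(sum(draws) / len(draws))   # close to the mean r(1 - p)/p = 4.5
```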

If At First You Don't Succeed (a.k., 2020-09-04)
Last time we took a first look at Bernoulli processes, which are formed from a sequence of independent experiments, known as Bernoulli trials, each of which is governed by the Bernoulli distribution with a probability p of success. Since the outcome of one trial has no effect upon the next, such processes are memoryless, meaning that the distribution of the number of trials that we need to perform before getting a success is independent of how many we have already performed whilst waiting for one. We have already seen that if waiting times for memoryless events with fixed average arrival rates are continuous then they must be exponentially distributed, and in this post we shall be looking at the discrete analogue.
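Here is a minimal sketch of the geometric distribution for the number of failures before the first success, P(K = k) = (1 - p)^k p. A numerical check of the memorylessness described above accompanies it: P(K >= m + k | K >= m) = P(K >= k). The function names and parameter values are ours.

```python
def geometric_pmf(k, p):
    """Probability of exactly k failures before the first success."""
    return (1 - p) ** k * p

def geometric_survival(k, p):
    """Probability of at least k failures, i.e. that the first k trials all fail."""
    return (1 - p) ** k

# Memorylessness: having already seen m failures, the chance of at least k more
# is the same as the chance of at least k from the start.
p, m, k = 0.2, 3, 4
print(geometric_survival(m + k, p) / geometric_survival(m, p), geometric_survival(k, p))
```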

One Thing Or Another (a.k., 2020-08-07)
Several years ago we took a look at memoryless processes, in which the probability that we should wait for any given length of time for an event to occur is independent of how long we have already been waiting. We found that this implied that the waiting time must be exponentially distributed, that the waiting time for several events must be gamma distributed and that the number of events occurring in a unit of time must be Poisson distributed. These govern continuous memoryless processes, in which events can occur at any time, but not those in which events can only occur at specified times, such as the roll of a die coming up six, which are known as Bernoulli processes. Observations of such processes are known as Bernoulli trials and their successes and failures are governed by the Bernoulli distribution, which we shall take a look at in this post.
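Here is a minimal sketch of the Bernoulli distribution's probability mass function and cumulative distribution function, together with a simulated trial. It uses the die example above, with a six counting as a success. The distribution's mean is p and its variance is p(1 - p). The function names are ours.

```python
import random

def bernoulli_pmf(x, p):
    """P(X = 1) = p for a success and P(X = 0) = 1 - p for a failure."""
    if x == 1:
        return p
    if x == 0:
        return 1 - p
    return 0.0

def bernoulli_cdf(x, p):
    """P(X <= x): a step from 1 - p at x = 0 to 1 at x = 1."""
    if x < 0:
        return 0.0
    if x < 1:
        return 1 - p
    return 1.0

def bernoulli_draw(p, rng=random):
    """A single Bernoulli trial: 1 (success) with probability p, else 0."""
    return 1 if rng.random() < p else 0

# Rolling a fair die and counting a six as a success.
rng = random.Random(6)
rolls = [bernoulli_draw(1 / 6, rng) for _ in range(60000)]
print(sum(rolls) / len(rolls))   # close to the mean p = 1/6
```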