Simulation of dependent random variables using copulas
MATLAB is an ideal tool for running simulations that incorporate random inputs or noise. The Statistics Toolbox provides functions to create sequences of random data according to many common univariate distributions. The Toolbox also includes a few functions to generate random data from multivariate distributions, such as the multivariate normal and multivariate t. However, there is no built-in way to generate data from multivariate distributions that have complicated relationships among the variables, or where the individual variables are from different distributions.
Recently, copulas have become popular in simulation models. Copulas are functions that describe dependencies among variables, and provide a way to create distributions to model correlated multivariate data. Using a copula, a data analyst can construct a multivariate distribution by specifying marginal univariate distributions, and choosing a particular copula to provide a correlation structure between variables. Bivariate distributions, as well as distributions in higher dimensions, are possible. In this demo, we discuss how to use copulas to generate dependent multivariate random data in MATLAB, using the Statistics Toolbox.
Contents
* Dependence between simulation inputs
* A more general method for constructing dependent bivariate distributions
* Rank correlation coefficients
* Copulas
* t copulas
* Higher-order copulas
* Copulas and empirical marginal distributions
Dependence between simulation inputs
One of the design decisions for a Monte-Carlo simulation is a choice of probability distributions for the random inputs. Selecting a distribution for each individual variable is often straightforward, but deciding what dependencies should exist between the inputs may not be. Ideally, input data to a simulation should reflect what is known about dependence among the real quantities being modelled. However, there may be little or no information on which to base any dependence in the simulation, and in such cases, it is a good idea to experiment with different possibilities, in order to determine the model's sensitivity.
However, it can be difficult to actually generate random inputs with dependence when they have distributions that are not from a standard multivariate distribution. Further, some of the standard multivariate distributions can model only very limited types of dependence. It's always possible to make the inputs independent, and while that is a simple choice, it's not always sensible and can lead to the wrong conclusions.
For example, a Monte-Carlo simulation of financial risk might have two random inputs that represent different sources of insurance losses. These inputs might be modeled as lognormal random variables. A reasonable question to ask is how dependence between these two inputs affects the results of the simulation. Indeed, it might be known from real data that the same random conditions affect both sources, and ignoring that in the simulation could lead to the wrong conclusions.
Simulation of independent lognormal random variables is trivial. The simplest way would be to use the LOGNRND function. Here, we'll use the MVNRND function to generate n pairs of independent normal random variables, and then exponentiate them. Notice that the covariance matrix used here is diagonal, which means the columns of Z are independent.
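The step described above can be sketched as follows. This is a minimal illustration in Python (standard library only) rather than MATLAB, and it is not the demo's original code: the function name independent_lognormal_pairs, the values n = 1000 and sigma = 0.5, and the fixed seed are all assumptions chosen for the example. Drawing each normal coordinate separately plays the role of calling MVNRND with a diagonal covariance matrix.

```python
import math
import random


def independent_lognormal_pairs(n, sigma=0.5, seed=0):
    """Return n pairs (x1, x2) of independent lognormal values.

    Hypothetical helper: sigma is the standard deviation of the
    underlying zero-mean normal variables.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        # A diagonal covariance matrix means the two normal coordinates
        # are independent, so each one can be drawn on its own.
        z1 = rng.gauss(0.0, sigma)
        z2 = rng.gauss(0.0, sigma)
        # Exponentiating a normal value gives a lognormal value.
        pairs.append((math.exp(z1), math.exp(z2)))
    return pairs


x_ind = independent_lognormal_pairs(1000)
```

A scatter plot of the two columns of x_ind should show no visible association between X1 and X2.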
Dependent bivariate lognormal random variables can be generated in the same way, using a covariance matrix with nonzero off-diagonal terms. Comparing the two datasets, it's clear that there is more of a tendency in the second dataset for large values of X1 to be associated with large values of X2, and similarly for small values. This dependence is determined by the correlation parameter, rho, of the underlying bivariate normal. The conclusions drawn from the simulation could well depend on whether X1 and X2 were generated with dependence.
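The dependent case can be sketched in the same spirit. Again this is a Python stand-in, not the demo's MATLAB code, and the names dependent_lognormal_pairs and pearson, along with rho = 0.7, sigma = 0.5, n = 1000 and the seed, are assumptions made for illustration. The pair (z1, z2) is built so that its covariance matrix is sigma^2 * [1 rho; rho 1], which is what a call to MVNRND with nonzero off-diagonal terms would produce.

```python
import math
import random


def dependent_lognormal_pairs(n, rho, sigma=0.5, seed=0):
    """Return n pairs (x1, x2) of lognormal values whose underlying
    normals have correlation rho (hypothetical helper)."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        e1 = rng.gauss(0.0, 1.0)
        e2 = rng.gauss(0.0, 1.0)
        # Mix two independent standard normals so that
        # var(z1) = var(z2) = sigma^2 and cov(z1, z2) = rho * sigma^2.
        z1 = sigma * e1
        z2 = sigma * (rho * e1 + math.sqrt(1.0 - rho * rho) * e2)
        pairs.append((math.exp(z1), math.exp(z2)))
    return pairs


def pearson(xs, ys):
    """Sample linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


x_dep = dependent_lognormal_pairs(1000, rho=0.7)

# Taking logs recovers the underlying normals, whose correlation
# should be close to the rho that was requested.
log_corr = pearson([math.log(a) for a, _ in x_dep],
                   [math.log(b) for _, b in x_dep])
```

Rerunning with a different rho is a simple way to explore how sensitive a simulation is to the assumed dependence.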
The bivariate lognormal distribution is a simple solution in this case, and of course it easily generalizes to higher dimensions and to cases where the marginal distributions are _different_ lognormals. Other multivariate distributions also exist; for example, the multivariate t and the Dirichlet distributions are used to simulate dependent t and beta random variables, respectively. But the list of simple multivariate distributions is not long, and they only apply in cases where the marginals are all in the same family (or even the exact same distributions). This can be a real limitation in many situations.
A more general method for constructing dependent bivariate distributions
Although the above construction that creates a bivariate lognormal is simple, it serves to illustrate a method which is more generally applicable. First, we generate pairs of values from a bivariate normal distribution. There is statistical dependence between these two variables, and each has a normal marginal distribution. Next, a transformation (the exponential function) is applied separately to each variable, changing the marginal distributions into lognormals. The transformed variables still have a statistical dependence.
If a suitable transformation could be found, this method could be generalized to create dependent bivariate random vectors with other marginal distributions. In fact, a general method of constructing such a transformation does exist, although it is not as simple as exponentiation alone.
By definition, applying the normal CDF (denoted here by PHI) to a standard normal random variable results in a r.v. that is uniform on the interval [0, 1]. To see this, if Z has a standard normal distribution, then the CDF of U = PHI(Z) is

Pr{U <= u0} = Pr{PHI(Z) <= u0} = Pr{Z <= PHI^(-1)(u0)} = u0,

and that is the CDF of a U(0,1) r.v.
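The claim that PHI(Z) is uniform on [0, 1] can also be checked numerically. The sketch below is in Python (standard library only) rather than MATLAB; the function name normal_cdf_transform, the sample size and the seed are assumptions for illustration. It uses statistics.NormalDist for the standard normal CDF and counts how many transformed values fall into each of ten equal-width bins.

```python
import random
from statistics import NormalDist


def normal_cdf_transform(n, seed=0):
    """Draw n standard normal values and apply the normal CDF to each
    (hypothetical helper for this example)."""
    rng = random.Random(seed)
    phi = NormalDist(mu=0.0, sigma=1.0)
    return [phi.cdf(rng.gauss(0.0, 1.0)) for _ in range(n)]


u = normal_cdf_transform(10000)

# Count the values in ten equal-width bins on [0, 1]. If U is uniform,
# each bin should hold roughly one tenth of the sample.
counts = [0] * 10
for v in u:
    counts[min(int(v * 10), 9)] += 1
fractions = [c / len(u) for c in counts]
```

A histogram of u should look approximately flat, in contrast to the bell shape of the original normal sample.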