6/5/2012
Matlab Functions for DoE
Introduction
Design of Experiments (DoE) is the practice of deliberately changing one or more input variables (factors) while observing the effect of these changes on the response variables. Part of DoE is planning an experimental design that specifies how the input factors are changed from run to run. Experiments fall into three categories: comparative experiments, screening experiments and response surface modelling of a process, and each category uses its own set of experimental designs. In this part, these designs are analysed along with their corresponding Matlab functions.
Definitions
Before we begin, some commonly used terms are defined to ease the analysis.
- Factor: a variable of the experiment (e.g. temperature, humidity)
- Level: a value that a factor takes
- Permutation: a sequence containing each element from a finite set once and only once.
- Mean vector: a vector whose elements are the arithmetic means of each variable of a multivariate data set.
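- Variance-covariance matrix: a square matrix with the variance of each variable on the diagonal and the covariance of each pair of variables off the diagonal.
- Uniform distribution: a distribution that has constant probability density over its range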
- Normal distribution: a probability distribution with a bell-shaped probability density function
- Correlation: a measure of the strength of the relationship between two variables.
- Multivariate data: data collected on several variables
Comparative Experiments
For comparative experiments (choosing between alternatives), one experimental design that can be used is Latin Hypercube Sampling (LHS). This method generates a random sample of n points for p variables. The range of each variable's probability distribution is split into n intervals of equal probability, and each interval is sampled exactly once.
The corresponding Matlab functions are:
X = lhsdesign(n,p)
n is the number of sampling points
p is the number of factors
The LHS sample is an n-by-p matrix with values between 0 and 1. It is produced by dividing the range of each of the p variables into n equally probable intervals (0,1/n), (1/n,2/n), ..., (1-1/n,1), and sampling each interval once at a random position. The n sample points are then randomly permuted within each dimension (column), which produces the Latin hypercube design.
X = lhsdesign(n,p,'smooth','off')
With this option, the function takes the midpoints of the above intervals as the sample points, which are again randomly permuted within each column. When the flag is 'on' (the default value), the points are placed randomly within the intervals and the output is of the same kind as lhsdesign(n,p).
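The interval structure described above can be checked directly. The following is a minimal sketch, assuming the Statistics Toolbox is available; the seed and the sizes n = 5 and p = 2 are arbitrary choices made for illustration.

```matlab
% Sketch: draw a small Latin hypercube sample and verify its structure.
rng(1);                          % fix the random seed (arbitrary value)
n = 5;  p = 2;                   % illustrative sizes
X = lhsdesign(n, p);             % n-by-p matrix with values in (0,1)

% Index (1..n) of the interval that each sample value falls in
bins = ceil(X * n);
% In a Latin hypercube every column visits each interval exactly once,
% so each sorted column must equal 1:n
isLatin = isequal(sort(bins), repmat((1:n)', 1, p));

% With smoothing off, the points sit at the interval midpoints
Xmid = lhsdesign(n, p, 'smooth', 'off');
midpoints = ((1:n)' - 0.5) / n;  % 0.1, 0.3, 0.5, 0.7, 0.9 for n = 5
isMid = max(max(abs(sort(Xmid) - repmat(midpoints, 1, p)))) < 1e-12;

fprintf('one point per interval: %d, midpoints only: %d\n', isLatin, isMid);
```

Sorting each column removes the random permutation, which is why the comparison against 1:n and against the midpoints works regardless of the seed.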
X = lhsdesign(n,p,'criterion',criterion)
A criterion is set to improve the design iteratively and can be one of the following:
'none' uses no iterative method to improve the design
'maximin' produces the design with the greatest minimum distance between the sample points
'correlation' chooses the design with the smallest correlation between the variables.
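As a rough illustration of what the criteria act on, the sketch below generates one design per criterion and computes the two quantities they refer to. It assumes the Statistics Toolbox (for lhsdesign and pdist) and the 'iterations' option of lhsdesign; the sizes and the iteration count are arbitrary. Because the designs are random, no particular numbers should be expected.

```matlab
% Sketch: compare the quantities that the criteria try to improve.
n = 10;  p = 3;                                  % illustrative sizes
Xnone = lhsdesign(n, p, 'criterion', 'none');
Xmm   = lhsdesign(n, p, 'criterion', 'maximin',     'iterations', 20);
Xcorr = lhsdesign(n, p, 'criterion', 'correlation', 'iterations', 20);

% Smallest distance between any two sample points (maximin makes it large)
minDist = @(X) min(pdist(X));
% Largest absolute correlation between two different columns
% ('correlation' makes it small)
maxCorr = @(X) max(max(abs(corrcoef(X) - eye(size(X, 2)))));

designs = {Xnone, Xmm, Xcorr};
names   = {'none', 'maximin', 'correlation'};
for i = 1:numel(designs)
    fprintf('%-12s min distance = %.3f, max |corr| = %.3f\n', ...
            names{i}, minDist(designs{i}), maxCorr(designs{i}));
end
```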
Matlab also supports generating a Latin hypercube sample from a multivariate normal distribution. (Needs further information)
X = lhsnorm(mu,sigma,n)
mu is the mean vector of a multivariate data set
sigma is the variance-covariance matrix, with the variances on the diagonal. Sigma should be a square matrix with size equal to the number of columns of mu, or a row vector with length equal to the number of columns of mu. For a 1-D sample, the covariance matrix reduces to a single variance, sigma^2.
n is the number of sampling points
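X = lhsnorm(mu,sigma,n,flag)
flag controls the smoothing of the sample. When it is 'on' (the default), the points of each column are distributed uniformly within their intervals on the probability scale; when it is 'off', the points are equally spaced on the probability scale, at the midpoints of the intervals.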
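A short usage sketch for lhsnorm follows, assuming the Statistics Toolbox. The mean vector, the covariance matrix and the sample size are made-up values chosen only for illustration.

```matlab
% Sketch: Latin hypercube sample from a bivariate normal distribution.
rng(2);                          % fix the random seed (arbitrary value)
mu    = [10 50];                 % mean vector (illustrative values)
sigma = [4 1.5; 1.5 9];          % covariance matrix, variances on the diagonal
n     = 200;                     % number of sampling points

X    = lhsnorm(mu, sigma, n);          % n-by-2 sample, smoothing on by default
Xoff = lhsnorm(mu, sigma, n, 'off');   % points equally spaced on the probability scale

% The sample statistics should lie close to the requested mu and sigma
sampleMean = mean(X);
sampleCov  = cov(X);
disp(sampleMean);
disp(sampleCov);
```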
Screening Experiments
Another category of experiments is the so-called “screening experiments”, which are used to select the factors that have a large effect on the response. The experimental designs used in screening experiments are full factorial and fractional factorial designs.
Full Factorial Design
The corresponding Matlab functions are:
dFF2 = ff2n(n)
This function outputs a two-level full factorial design in standard order, meaning that no randomisation or center points are used and all input factors are set at two levels, high and low (1, 0). If n is the number of factors, the two-level full factorial design has m = 2^n runs and the output is an m-by-n matrix. The rows of this design can also be read as a binary counter from 0 to 2^n - 1.
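The binary-counter reading can be verified with a few lines. This is a minimal sketch, assuming the Statistics Toolbox; n = 3 is an arbitrary choice, and the conversion to -1/+1 coded units at the end is added only for illustration.

```matlab
% Sketch: the rows of ff2n(n) count from 0 to 2^n - 1 in binary.
n = 3;
d = ff2n(n);                           % 8-by-3 matrix of 0s and 1s

% Weight the columns by powers of two to read each row as a number
weights = (2 .^ (n-1:-1:0))';          % [4; 2; 1] for n = 3
counter = d * weights;                 % value encoded by each row
isCounter = isequal(counter, (0:2^n - 1)');

% The same design in coded units: low = -1, high = +1
dCoded = 2 * d - 1;
disp(d);
```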
dFF = fullfact(levels)
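The input vector levels has length n, and its elements are the numbers of levels that each input factor takes. The design has m = levels(1) × levels(2) × ... × levels(n) runs and the output is an m-by-n matrix, in which each entry is the index of a level (an integer from 1 to the number of levels of that factor).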
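Since fullfact returns level indices rather than physical values, a mapping step is usually needed. The sketch below assumes the Statistics Toolbox; the factor names and their values (tempValues, humValues) are hypothetical and serve only as an example.

```matlab
% Sketch: full factorial design for one 2-level and one 3-level factor.
levels = [2 3];                  % factor 1 has 2 levels, factor 2 has 3
d = fullfact(levels);            % 6-by-2 matrix of level indices

% Hypothetical physical values for each level (not from the toolbox)
tempValues = [20 80];            % levels of factor 1
humValues  = [30 50 70];         % levels of factor 2

% Replace each level index by the corresponding physical value
t = tempValues(d(:, 1));
h = humValues(d(:, 2));
runs = [t(:) h(:)];              % one row per run
disp(runs);
```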
Fractional Factorial Designs
In full factorial designs, the number of runs can quickly become very large. A solution to this problem is to use only a fraction of the runs of the full factorial design. We denote these designs as 2^(k-p), which is also the number of runs, where k is the total number of input factors and p is the number of factors whose levels are generated from interactions of the other k-p factors. As a result, the effects of these p factors become confounded: we cannot estimate their effects on the response separately, because each of them is “contaminated” by the interaction that generates it. For example, suppose we have three input factors 'a', 'b' and 'c'. A full factorial design would need 2^3 = 8 runs, while the fractional factorial design 2^(3-1) needs only 4 runs. We write the full factorial design for 'a' and 'b' with the levels coded as -1 and +1, while the column for 'c' is derived by multiplying the 'a' and 'b' entries of each row. Since c = a*b, the main effect of factor 'c' cannot be distinguished from the a*b interaction, and the two are said to be confounded. The degree to which the effects are confounded by interactions is given by the resolution of the design. There are three main types of resolution:
- Resolution III: Main effects are confounded with two-factor interactions.
- Resolution IV: No main effects are confounded with two-factor interactions, but two-factor interactions are confounded with each other.
- Resolution V: No main effect or two-factor interaction is confounded with any other main effect or two-factor interaction, but two-factor interactions are confounded with three-factor interactions.
Another important concept in fractional factorial designs is the “design generator”. In the example given above, the generator is c = a*b. Multiplying both sides by c gives c*c = a*b*c and, since the levels are coded as -1 and +1, c*c equals the identity column I (all +1), so I = a*b*c. This relationship is the defining relation of the design. Multiplying it by any effect gives the effects that are confounded with it (for example, a = b*c and b = a*c), which yields the complete confounding pattern of the design.
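The three-factor example can be built by hand to make the generator and the defining relation concrete. This sketch uses only base Matlab and assumes the usual -1/+1 coding of the levels; it is an illustration of the idea, not how the toolbox functions are implemented.

```matlab
% Sketch: build the 2^(3-1) design with generator c = a*b by hand.
[a, b] = ndgrid([-1 1], [-1 1]);   % full factorial in a and b
a = a(:);  b = b(:);               % 4 runs, coded -1/+1
c = a .* b;                        % generated factor: c = a*b
X = [a b c];                       % 4-by-3 fractional factorial design

% Defining relation I = a*b*c: the product is +1 in every run
I = a .* b .* c;
holdsDefiningRelation = all(I == 1);

% Consequence of the defining relation: a is aliased with b*c
aIsAliasedWithBC = isequal(a, b .* c);
disp(X);
```

The design matrix X contains the rows (-1,-1,+1), (+1,-1,-1), (-1,+1,-1) and (+1,+1,+1), which is half of the 8 runs of the full 2^3 design.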
The corresponding Matlab functions are:
Design Generators
fracfactgen(terms)
fracfactgen(terms, k)
fracfactgen(terms,k,R)
fracfactgen(terms,k,R,basic)
These functions produce generators for fractional factorial designs. 'terms' is a text string consisting of “words” formed from the letters 'a'-'z' and 'A'-'Z'; a word is either a single letter (a main effect) or a combination of letters (an interaction). The words are separated by spaces and define which effects must be estimable in the design. For example, fracfactgen('a b c abc') will produce generators for a design in which the main effects of a, b and c and the three-factor interaction abc can be estimated. Terms can also be a matrix of 0's and 1's with one column per factor, where the 1's in each row specify a main effect or an interaction of the model. For example, fracfactgen([1 0 0 0;1 0 0 1],3) requests a design with 2^3 = 8 runs in which 'a' and the interaction between 'a' and 'd' can be estimated.
We input 'k' to specify the number of runs (2^k) of our design. If it is not specified, the function finds the design with the fewest runs.
'R' defines the resolution of our design, which determines which effects may be confounded with each other. The commonly used resolutions are 3, 4 and 5, which are explained above. The default value is 3.
'basic' is a vector of factor indices specifying which factors are treated as basic factors, that is, which factors get a two-level full factorial design; the remaining factors are confounded with interactions of the basic factors.
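The following sketch shows how the arguments fit together, assuming the Statistics Toolbox. The choice of six factors, k = 4 and R = 4 is only an example; the generators that are actually returned are displayed rather than assumed here.

```matlab
% Sketch: ask for generators for 6 factors in 2^4 = 16 runs, resolution IV.
terms = 'a b c d e f';           % the six main effects must be estimable
k = 4;                           % 2^k = 16 runs
R = 4;                           % requested resolution

gens = fracfactgen(terms, k, R); % cell array of generator words
disp(gens);

% The generators can be passed on to fracfact to obtain the design
X = fracfact(gens);              % one row per run, one column per factor
fprintf('%d runs for %d factors\n', size(X, 1), size(X, 2));
```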
Fractional Factorial Design
X = fracfact(gen)
This function produces a two-level fractional factorial design, using the words defined in gen. Gen is a string (or array) of words consisting of the letters 'a'-'z' and 'A'-'Z', a total of 52 case-sensitive letters. The user can define their own generators in gen or use the generators produced by the fracfactgen function. The output is an m-by-n matrix with entries -1 and +1, where m = 2^k and k is the number of distinct letters in gen, while n is the number of words (single letters or combinations) in gen.
[X,conf] = fracfact(gen)
[X,conf] = fracfact(gen,Name,Value)
These functions also return the confounding pattern of the design in conf, showing how the factors are confounded with each other. Name and Value are optional parameter pairs; for example, 'FactorNames' lets the user specify the names of the factors. The default names are {'X1','X2',...}.
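As a small check of the output size and of the role of the words, consider the sketch below. It assumes the Statistics Toolbox; the generator string 'a b c abc' is just an example with three distinct letters and four words.

```matlab
% Sketch: a 2^(4-1) design in which the fourth factor is generated by abc.
gen = 'a b c abc';               % 3 distinct letters, 4 words
X = fracfact(gen);               % 2^3 = 8 runs, 4 columns, entries -1/+1

[m, n] = size(X);                % m = 8 runs, n = 4 factors

% The fourth column is the elementwise product of the first three
fourthIsProduct = isequal(X(:, 4), X(:, 1) .* X(:, 2) .* X(:, 3));
disp(X);
```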
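The confounding pattern of the three-factor example (c = a*b) can be displayed as follows. This is a sketch that assumes the Statistics Toolbox and the 'FactorNames' option of fracfact; the factor names passed in are arbitrary labels.

```matlab
% Sketch: confounding pattern of the 2^(3-1) design with generator c = ab.
[X, conf] = fracfact('a b ab', 'FactorNames', {'a', 'b', 'c'});

% conf is a cell array listing each term, its generator and the terms
% it is confounded with; here c should appear together with a*b
disp(conf);

% The third column of the design is the product of the first two
cIsAB = isequal(X(:, 3), X(:, 1) .* X(:, 2));
```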
Response Surface Methodology
Another category of experimental designs is Response Surface Designs, used when the model includes quadratic terms, which create curvature in the response. In this situation, each factor needs at least 3 levels so that the quadratic curvature can be estimated. Using three-level full factorial designs for this purpose not only requires a large number of runs, but the resulting design also lacks the property of rotatability. Rotatability means that the variance of the predicted response depends only on the distance of a point from the design's center and not on its direction. Two commonly used experimental designs for this situation are the Central Composite Design and the Box-Behnken Design.
Central Composite Design
Central Composite Designs use a factorial design with center points (points in the middle of each factor's range) and add a set of star points that allow curvature to be estimated. If we have k factors, there are 2*k star points, which represent new levels of the input factors. The design space is considered a cube whose vertices are the points with coordinates +1 and -1 produced by the factorial design.
There are three CC design types:
- Circumscribed (CCC): Star points are at some distance α from the center of the design space, outside the cube defined by the factorial points. The design has spherical or circular symmetry and requires 5 levels for each input factor.
- Inscribed (CCI): The design takes the limits of the factors as the star points and produces a factorial design within those limits. Like CCC, it requires 5 levels of each input factor. It can also be described as a scaled-down CCC.
- Face-centered (CCF): The star points are at the center of each face of the factorial space, at a distance α = 1 from the center. It requires 3 levels for each input factor.
The corresponding Matlab functions are:
dCC = ccdesign(n)
This function creates a central composite design for n >= 2 factors. The output is an m-by-n matrix, where m is the number of runs and n is the number of factors. Factor values are normalised so that the cube points take values between -1 and +1.
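To see the three kinds of points in the output, the rows can be classified by their coordinates. The sketch below assumes the Statistics Toolbox and the default (circumscribed) design type; the tolerance is an arbitrary small number.

```matlab
% Sketch: separate cube, star and center points of a 2-factor CCD.
d = ccdesign(2);                           % default central composite design
tol = 1e-12;

isCube   = all(abs(abs(d) - 1) < tol, 2);  % every coordinate is -1 or +1
isCenter = all(abs(d) < tol, 2);           % every coordinate is 0
isStar   = ~isCube & ~isCenter;            % the remaining runs

% Distance of the star points from the center along their axis
alpha = max(abs(d(isStar, :)), [], 2);

fprintf('cube: %d, star: %d, center: %d, alpha = %.4f\n', ...
        sum(isCube), sum(isStar), sum(isCenter), alpha(1));
```

For two factors there are 2^2 = 4 cube points and 2*2 = 4 star points; the number of center points depends on the 'center' setting.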
[dCC,blocks] = ccdesign(n)
This function outputs the requested ccdesign and a column vector 'blocks' consisting of m rows. This vector indicates which runs have to be measured under similar conditions, to minimize any systematic effects on the response.
dCC = ccdesign(n,'Name',value)
This function allows additional options for the central composite design, in which the user specifies the name of the parameter and its corresponding value.
'Name' can take one of the following parameters:
- 'center': The number of center points. Its value can be an integer, indicating how many center points to include, 'uniform' for uniform precision, or 'orthogonal', which is the default.
- 'fraction': Uses only a fraction of the full factorial design for the cube points. The value is the exponent of 1/2: 0 (the default) for the whole design, 1 for a ½ fraction or 2 for a ¼ fraction.
- 'type': It defines the type of the Central Composite Design. The values are 'circumscribed', 'inscribed' and 'faced', which correspond to the description above.
- 'blocksize': The maximum number of points in a block; its value is an integer.
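The effect of the 'type' parameter on the number of levels can be inspected with a short loop. This is a sketch assuming the Statistics Toolbox; three factors and a single center point are arbitrary choices, and the rounding is there only to avoid counting floating-point noise as extra levels.

```matlab
% Sketch: compare the three central composite design types for 3 factors.
types = {'circumscribed', 'inscribed', 'faced'};
nLevels = zeros(1, numel(types));
maxAbs  = zeros(1, numel(types));

for i = 1:numel(types)
    d = ccdesign(3, 'type', types{i}, 'center', 1);
    % Distinct values taken by the first factor
    nLevels(i) = numel(unique(round(d(:, 1) * 1e6) / 1e6));
    % Largest coordinate magnitude in the design
    maxAbs(i) = max(abs(d(:)));
    fprintf('%-14s runs = %2d, levels = %d, max |x| = %.3f\n', ...
            types{i}, size(d, 1), nLevels(i), maxAbs(i));
end
```

The circumscribed and inscribed types use 5 levels per factor and the face-centered type uses 3, in line with the descriptions of CCC, CCI and CCF.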
Box-Behnken Design
This design is also used in Response Surface Methodology and requires fewer runs than a central composite design. It requires 3 levels of each factor and, for three factors, it works by keeping one factor fixed at its center level while varying the other two factors in a full factorial way.
The corresponding Matlab functions are:
dBB = bbdesign(n)
This function outputs a Box-Behnken design for n factors with n >= 3. The output is an m-by-n matrix, where m is the number of runs (treatments). The factor values are scaled so that the cube points take values between -1 and +1.
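The structure of the design for three factors can be checked as follows. This is a minimal sketch assuming the Statistics Toolbox and the default settings of bbdesign.

```matlab
% Sketch: structure of the 3-factor Box-Behnken design.
d = bbdesign(3);                       % m-by-3 design matrix

isCenter = all(d == 0, 2);             % runs with all factors at level 0
edgeRuns = d(~isCenter, :);            % the remaining runs

% In every non-center run exactly one factor is held at its middle level
zerosPerRun = sum(edgeRuns == 0, 2);
oneFactorHeld = all(zerosPerRun == 1);

% Only three levels are used: -1, 0 and +1
levelsUsed = unique(d(:))';

fprintf('%d runs, %d of them center points\n', size(d, 1), sum(isCenter));
```

For three factors there are 3 pairs of factors with 4 factorial combinations each, so 12 runs lie away from the center.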
[dBB,blocks] = bbdesign(n)
This function outputs a Box-Behnken design and a column vector with m rows, indicating the runs that have to be measured under similar conditions to avoid any systematic effects on the response.
[…] = bbdesign(n,parameter,value)
This function allows the user to set more parameters for the design. The parameters are:
- 'center': The number of center points included in the design, specified by an integer.
- 'blocksize': It indicates the maximum number of points in each block and it is specified by an integer.
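A brief usage sketch for the 'center' parameter follows, assuming the Statistics Toolbox. The value 5 is arbitrary, and the block vector is only displayed here, since its contents depend on the number of factors and on 'blocksize'.

```matlab
% Sketch: request a specific number of center points.
nCenterRequested = 5;                  % arbitrary illustrative value
[d, blocks] = bbdesign(3, 'center', nCenterRequested);

nCenter = sum(all(d == 0, 2));         % center runs actually present
fprintf('%d runs in total, %d center points\n', size(d, 1), nCenter);

% blocks has one entry per run and gives the block each run belongs to
disp(unique(blocks)');
```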
Both the Central Composite Design and the Box-Behnken Design can be visualised with the 'plot' command to see the effects that the parameters have on the design.
Over the next few days, the functions of the Doe_beta toolbox will be analysed; these will form the basis for the project.
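One possible way to draw the two designs for three factors is sketched below, using the base Matlab function plot3 and assuming the Statistics Toolbox for the designs themselves. The layout and the axis labels are arbitrary choices.

```matlab
% Sketch: plot the design points of a CCD and a Box-Behnken design in 3-D.
dCC = ccdesign(3, 'type', 'circumscribed');
dBB = bbdesign(3);

figure;
subplot(1, 2, 1);
plot3(dCC(:, 1), dCC(:, 2), dCC(:, 3), 'o');
grid on;  axis equal;
xlabel('Factor 1');  ylabel('Factor 2');  zlabel('Factor 3');
title('Central Composite Design');

subplot(1, 2, 2);
plot3(dBB(:, 1), dBB(:, 2), dBB(:, 3), 'o');
grid on;  axis equal;
xlabel('Factor 1');  ylabel('Factor 2');  zlabel('Factor 3');
title('Box-Behnken Design');
```

Changing 'type' or 'center' in the calls and re-running the plot shows how the star points and center points move or multiply.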