13/5/2012

In this report, we present the functions included in the DoE_beta toolbox by Yann Collette. The functions cover random number generation, factorial designs, response surface methodology (RSM) and computation of statistical values.

**Random Number Generation**

*Result = doe_prbs(init,feedback)*

This is a pseudorandom binary signal generation function.

- init: initial vector of 1's and 0's
- feedback: vector whose elements indicate the feedback positions

*lhs_matrix = doe_lhs(nb_dims, x_min, x_max, nb_div, nb_iterations, nb_points, random)*

This function computes a Latin hypercube sampling.

- nb_dims: number of variables
- x_min: a vector containing the lower bounds of each variable
- x_max: a vector containing the upper bounds of each variable
- nb_div:
- nb_iterations: number of iterations (default value nb_iterations=3*nb_points)
- nb_points: number of sampling points in the design
- random: if set to "%T", the sampling points are randomly positioned inside their cells; if set to "%F", the sampling points are placed deterministically
- lhs_matrix: a nb_points-by-nb_dims matrix containing the sampling points
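To illustrate the idea behind Latin hypercube sampling, here is a minimal Python sketch (not the toolbox's Scilab code, and with illustrative parameter names): each variable's range is split into nb_points intervals, and each interval is used exactly once per dimension.

```python
import random

def lhs(nb_points, nb_dims, x_min, x_max, randomize=True):
    """Basic Latin hypercube sample: each variable's range is divided into
    nb_points intervals and every interval is used exactly once."""
    columns = []
    for d in range(nb_dims):
        cells = list(range(nb_points))
        random.shuffle(cells)  # random pairing of intervals across dimensions
        column = []
        for c in cells:
            u = random.random() if randomize else 0.5  # position inside the cell
            frac = (c + u) / nb_points                 # value in [0, 1)
            column.append(x_min[d] + frac * (x_max[d] - x_min[d]))
        columns.append(column)
    # transpose to a nb_points-by-nb_dims matrix
    return [[columns[d][p] for d in range(nb_dims)] for p in range(nb_points)]
```

With randomize=False the points sit at the centre of their cells, which mirrors the deterministic placement controlled by the toolbox's random flag.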

**Quasi-random sequences**

Quasi-random (low-discrepancy) sequences are less random but more evenly distributed than pseudo-random sequences: successive points are correlated by construction, so that the set of points covers the space uniformly. Depending on the way the point sets are generated, quasi-random sequences can be Hammersley, Halton, Faure or Sobol sequences.

*r = doe_hammersley(dim_num, n, step, seed, leap, base)*

This function computes a Hammersley data set.

- dim_num: number of dimensions
- n: number of points to be generated

The function works with only the two parameters above; the remaining parameters (step, seed, leap, base) are optional.

*r = doe_halton(dim_num,n,step,seed,leap,base)*

This function computes a Halton point set.

- dim_num: number of dimensions
- n: number of points generated
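Both the Hammersley and Halton point sets are built from the radical-inverse (van der Corput) function. The following Python sketch, assuming the first few primes as bases, shows the construction:

```python
def radical_inverse(i, base):
    """Van der Corput radical inverse: reflect the base-b digits of i
    about the radix point, giving a value in [0, 1)."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f
        i //= base
        f /= base
    return inv

def halton(n, dim_num, bases=(2, 3, 5, 7, 11, 13)):
    """First n points of a Halton sequence: dimension d uses the d-th prime base."""
    return [[radical_inverse(i, bases[d]) for d in range(dim_num)]
            for i in range(1, n + 1)]
```

For example, halton(4, 2) yields the points (1/2, 1/3), (1/4, 2/3), (3/4, 1/9), (1/8, 4/9). A Hammersley set replaces the first coordinate by i/n.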

*r = doe_sobol(n)*

This function computes a Sobol data set.

- n: the dimension of the generated vector (at most 6)
- r: vector containing Sobol quasi-random points

The user first initialises the function by calling *doe_sobol(-1)* and then repeatedly calls *doe_sobol(n)*.

**Factorial Design**

Factorial designs were analysed in the 2012-05-06 report of DoE.

*r = doe_factorial(nb_var)*

- nb_var: number of input factors
- r: a 2^{nb_var}-by-nb_var matrix with all possible combinations of the factor levels
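A two-level full factorial design of this kind can be sketched in a few lines of Python:

```python
from itertools import product

def factorial_design(nb_var):
    """All 2**nb_var combinations of the two levels -1 and +1,
    returned as a 2**nb_var-by-nb_var matrix."""
    return [list(row) for row in product([-1, 1], repeat=nb_var)]
```

For nb_var = 3 this produces the 8 rows (-1,-1,-1), (-1,-1,+1), ..., (+1,+1,+1).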

Yates Algorithm

The Yates algorithm estimates the effects in factorial designs. In this toolbox it is implemented in doe_yates.sci.

*[ef,id] = doe_yates(y, sort_eff)*

- y: response from a 2-level full factorial design
- sort_eff: whether to sort the effects
- ef: vector of average response, main effects and interaction effects
- id: identification vector of main and interaction effects

The Reverse Yates algorithm estimates the response given the effects.

*[y,id] = doe_ryates(ef)*

- ef: vector of average response, main effects and interaction effects
- y: the estimated response
- id: identification vector of main and interaction effects
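The forward Yates algorithm itself is a short sum-and-difference recursion. A Python sketch (responses assumed to be in standard Yates order; not the toolbox's implementation):

```python
def yates(y):
    """Yates algorithm for a 2-level full factorial: y is the response
    vector in standard order (length 2**k).  Returns the average response
    followed by the main and interaction effects."""
    n = len(y)
    k = n.bit_length() - 1
    assert n == 2 ** k, "response length must be a power of two"
    col = list(y)
    for _ in range(k):
        # replace the column by pairwise sums followed by pairwise differences
        sums = [col[i] + col[i + 1] for i in range(0, n, 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, n, 2)]
        col = sums + diffs
    # first entry / n -> grand average; remaining entries / (n/2) -> effects
    return [col[0] / n] + [c / (n / 2) for c in col[1:]]
```

For y = [10, 20, 30, 60] this returns [30.0, 20.0, 30.0, 10.0]: the average response 30, main effects A = 20 and B = 30, and interaction AB = 10.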

Hadamard matrix

A Hadamard matrix is a square matrix consisting of +1's and -1's, whose rows are mutually orthogonal vectors.

*H = hadamard(n)*

- n: the number of repetitions of the base matrix H=[1,1;-1,1]
- H: a 2^n-by-2^n matrix with mutually orthogonal rows
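The construction can be sketched as repeated Kronecker products of the 2x2 base block. A Python illustration (assuming the base block quoted above; conventions for the 2x2 block vary):

```python
def hadamard(n):
    """2**n-by-2**n Hadamard matrix: the n-th Kronecker power of the
    2x2 block [[1, 1], [-1, 1]]."""
    H = [[1]]
    base = [[1, 1], [-1, 1]]
    for _ in range(n):
        size = len(H)
        # Kronecker product base (x) H, doubling the matrix each step
        H = [[base[i // size][j // size] * H[i % size][j % size]
              for j in range(2 * size)] for i in range(2 * size)]
    return H
```

Since the Kronecker product of Hadamard matrices is again Hadamard, every pair of distinct rows of the result is orthogonal.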

**Response Surface Methodology**

In the DoE_beta toolbox, there are functions producing Box-Behnken and central composite designs.

*H = doe_box_benkhen(nb_var,nb_center)*

- nb_var: number of variables
- nb_center: number of repetitions of the central point

*H = doe_composite(nb_var,alpha)*

- nb_var: number of variables
- alpha: normalised value of the star points (default sqrt(2))

*H = doe_star(nb_var)*

This function outputs a 2*nb_var-by-nb_var matrix, consisting of a diagonal matrix of +1's stacked on a diagonal matrix of -1's.

- nb_var: number of variables
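A Python sketch of the star-point matrix described above:

```python
def star_design(nb_var):
    """2*nb_var-by-nb_var star-point matrix: a +1 diagonal block stacked
    on a -1 diagonal block (the axial points of a composite design)."""
    plus = [[1 if i == j else 0 for j in range(nb_var)] for i in range(nb_var)]
    minus = [[-1 if i == j else 0 for j in range(nb_var)] for i in range(nb_var)]
    return plus + minus
```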

**Computer-aided designs**

Computer-aided designs are experimental designs that are generated based on a particular optimality criterion and are generally optimal only for a specified model. The most used criteria are the following:

- A-optimal Design: It minimizes the trace of the inverse of the information matrix

It is implemented by *[M_doe,history] = doe_a_opti(M_init, M_cand, doe_size, model, l_bounds, u_bounds, ItMx, p_level, Log, size_tabu_list)* and *comp_a_opti_crit.sci*

Input Parameters (the input parameters are the same for all the optimal functions)

- M_init: an initial design of experiments
- M_cand: set of candidate points
- doe_size: desired number of points in the optimised doe
- model: list of monomials
- l_bounds: vector of lower bounds for each variable
- u_bounds: vector of upper bounds for each variable
- ItMX: maximum number of iterations
- p_level: progress level
- D-optimal Design: It maximizes the determinant of the information matrix X'X of the design

*[M_doe,history] = doe_d_opti(M_init, M_cand, doe_size, model, l_bounds, u_bounds, ItMX, p_level, Log, size_tabu_list)*

*comp_d_opti_crit.sci*
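As an illustration of the D-optimality criterion (not the toolbox's implementation), the following Python sketch builds the information matrix X'X of a regression matrix X and returns its determinant:

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d                      # row swap flips the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d

def d_criterion(X):
    """D-optimality criterion: determinant of the information matrix X'X."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
           for i in range(p)]
    return det(XtX)
```

For the 2^2 full factorial with an intercept column, X'X = 4I and the criterion equals 64; a D-optimal search keeps the candidate points that maximise this value.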

- G-optimal Design: It minimizes the maximum prediction variance d = x'(X'X)^{-1}x over a specified set of design points

*[M_doe,history] = doe_g_opti(M_init, M_cand, doe_size, model, l_bounds, u_bounds, ItMX, p_level, Log, size_tabu_list)*

*comp_g_opti_crit.sci*

- O-optimal Design:

*[M_doe,history] = doe_o_opti(M_init, M_cand, doe_size, model, l_bounds, u_bounds, ItMX, p_level, Log, size_tabu_list)*

*comp_o_opti_crit.sci*

**Supersaturated experiments**

When the number of factors exceeds the number of runs, the design is called supersaturated. Such designs can be computed by applying the a-optimal, d-optimal, correlation and khi2 (chi-squared) criteria. The following functions implement such designs:

*Result = comp_ssd_a_value_crit(M_doe,model)*

This function uses the a-optimal criterion mentioned above for the computation of a supersaturated design.

- M_doe: input data set on which the criterion is performed
- model: a list of monomials (it can be generated by the function doe_poly_model() which is described later)

These inputs are the same in all ssd functions.

*Result = comp_ssd_ave_khi2_crit(M_doe,Model)* (Needs more information)

*Result = comp_ssd_max_khi2_crit(M_doe,Model)* (Needs more information)

*Result = comp_ssd_r_value_crit(M_doe,Model)* (Needs more information)

This function uses the correlation criterion to compute such a design.

**Computing Statistical Values**

*result = doe_test_mean(x, y, level, operation)*

It tests whether the means of two samples x and y are equal.

- x: first sample
- y: second sample
- level: the level of confidence of the comparison, which must be between 0 and 1
- operation: the user can input '==', '>=' or '<='
- result: 'T' if the operation is true and 'F' if it is false.

*result = doe_test_var(x, y, level, operation)*

It tests whether the variances of two samples x and y are equal.

Input and output parameters are the same as in doe_test_mean().

*result = doe_test_significance(param_mean,param_var,size_stat,val_to_comp,level,operation)*

*retval = skewness(x)*

This function measures the skewness, i.e. the asymmetry of the probability distribution. If the skewness is positive, the mass of the distribution is concentrated on the left; if it is negative, it is concentrated on the right; and if it is zero, the distribution is symmetrical.

- x: the input data set

*retval = kurtosis(x)*

This function measures kurtosis, the degree of peakedness of a distribution.

- x: the input data set
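Both statistics are ratios of central moments. A Python sketch (this uses population moments, with kurtosis equal to 3 for a normal distribution; the toolbox's exact normalisation is not specified here and may differ):

```python
def moments(x):
    """Return (skewness, kurtosis) of the data set x from its
    2nd, 3rd and 4th central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5   # 0 for a symmetric distribution
    kurt = m4 / m2 ** 2     # 3 for a normal distribution
    return skew, kurt
```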

Discrepancy

The following functions compute the difference between the empirical cumulative distribution function of a design and the uniform cumulative distribution function, using the Centered-L2 and Wrap-around-L2 discrepancy criteria.

*Result = comp_CL2_crit(Data)*

*Result = comp_WD2_crit(M_doe,Model)*

Input Parameters

- Data: The data points
- M_doe: an experimental design
- Model: a list of monomials to represent the model

**Data sets**

*X_norm = normalize(X_in,replace)*

This function normalises a given data set.

- X_in: the input data set
- replace: a boolean flag
- X_norm: the normalised data set

*X_std = standardize(X_in)*

This function normalises and centers the given data set.

- X_in: the input data set
- X_std: the normalised and centered data set
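A minimal Python sketch of such a standardisation (column-wise centering and scaling; the toolbox's exact conventions may differ):

```python
def standardize(X):
    """Center each column to mean 0 and scale it to standard deviation 1."""
    n, p = len(X), len(X[0])
    out = [row[:] for row in X]
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        for i in range(n):
            # constant columns are mapped to 0 to avoid division by zero
            out[i][j] = (X[i][j] - mean) / std if std > 0 else 0.0
    return out
```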

*H = doe_scramble(H1,N)*

This function scrambles a given design of experiments.

- H1: the design
- N: number of random permutations

*H = doe_union(H1,H2)*

*H = doe_merge(H1,H2)*

These functions merge two given designs of experiments, H1 and H2.

*H = doe_diff(H1,H2)*

This function outputs a vector H containing the common points between two data sets H1 and H2.

*[s_opt,b_opt,res_mean,res_std] = crossvalidate(fun,K,steps,X,y,varargin)*

Cross-validation measures how accurately a model will perform in practice. All observations are used for both training and validation, and each observation is used for validation exactly once.
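The K-fold scheme described above can be sketched as follows (illustrative helper names, not the toolbox's API):

```python
def k_fold_indices(n, K):
    """Split indices 0..n-1 into K consecutive folds; every observation
    falls into exactly one validation fold."""
    folds, start = [], 0
    for f in range(K):
        size = n // K + (1 if f < n % K else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(fit, predict, loss, X, y, K=5):
    """Average validation loss over K folds: train on K-1 folds,
    validate on the remaining one, and rotate."""
    folds = k_fold_indices(len(X), K)
    scores = []
    for val in folds:
        train = [i for i in range(len(X)) if i not in val]
        model = fit([X[i] for i in train], [y[i] for i in train])
        err = [loss(predict(model, X[i]), y[i]) for i in val]
        scores.append(sum(err) / len(err))
    return sum(scores) / K
```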

*H_unnorm = unnorm_doe_matrix(H,min_levels,max_levels)*

This function translates a design of experiments containing +1's and -1's to maximum and minimum levels.

- H: the experimental design
- min_levels: the values the -1's will be translated to
- max_levels: the values the +1's will be translated to

**Regression**

*R = build_regression_matrix(H,model,build)*

This function computes the regression matrix of a given model.

- H: a given experimental design
- model: a list of monomials
- build: By default all the monomials of the model are selected. If build(i)==%T then only the i-th monomial is selected.
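One way to sketch the regression-matrix construction in Python, representing each monomial as a tuple of exponents (an assumption for illustration; the toolbox's model representation may differ):

```python
def build_regression_matrix(H, model):
    """Regression matrix of a monomial model: each monomial is a tuple of
    exponents, one per variable, e.g. (0, 0) -> 1, (1, 0) -> x1,
    (2, 0) -> x1**2, (1, 1) -> x1*x2."""
    R = []
    for point in H:
        row = []
        for mono in model:
            v = 1.0
            for x, e in zip(point, mono):
                v *= x ** e     # evaluate the monomial at this design point
            row.append(v)
        R.append(row)
    return R
```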

*var = var_regression_matrix(H,x,model,sigma)*

This function computes the variance of a given model.

- H: the input experimental design
- x: the point where the variance will be computed
- model: a list of monomials
- sigma: the variance of the model
- var: the variance of the given design at a given point

*model = doe_poly_model(mod_type,nb_var,order)*

This function produces a list of monomials that represent a polynomial model.

- mod_type: the type of polynomial to be produced. It can be set to 'lin' for linear model, 'poly' for a polynomial model and to 'inter' for a linear model with interactions.
- nb_var: number of variables
- order: the order of the polynomial
- model: a list of monomials, e.g. 1, x1, x2, x1^2, etc.

*[model_new,coeff_new] = doe_model_bselect(nb_var,model_old,measures,Log)*

This function removes unnecessary monomials from an input model and selects the best subset.

- nb_var: number of variables
- model_old: the input model from which monomials are to be removed
- measures: a set of data; the last column of the data set must contain the measure of the output
- Log: if %T then some intermediate messages are printed in the console

*[model_new,coeff_new] = doe_model_fselect(nb_var,model_old,measures,Log)*

This function starts with one monomial of a model and progressively adds the best monomials.

The input and output parameters are the same as in doe_model_bselect().