13/5/2012

In this report, we present the functions included in the DoE_beta toolbox by Yann Collette. The functions cover random number generation, factorial designs, response surface methodology (RSM) and computation of statistical values.

**Random Number Generation**

*Result = doe_prbs(init,feedback)*

This is a pseudorandom binary signal generation function.

- init: initial vector of 1's and 0's
- feedback: vector whose elements indicate the feedback positions

*lhs_matrix = doe_lhs(nb_dims, x_min, x_max, nb_div, nb_iterations, nb_points, random)*

This function computes a Latin hypercube sampling.

- nb_dims: number of variables
- x_min: a vector containing the lower bounds of each variable
- x_max: a vector containing the upper bounds of each variable
- nb_div:
- nb_iterations: number of iterations (default value nb_iterations=3*nb_points)
- nb_points: number of sampling points in the design
- random: if set to "%T", the sampling points are randomly positioned inside their cells; if set to "%F", the sampling points are placed deterministically
- lhs_matrix: a nb_points-by-nb_dims matrix containing the sampling points
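To illustrate the idea behind Latin hypercube sampling, here is a minimal Python sketch (not the toolbox's Scilab code, and with illustrative parameter names): each variable's range is split into nb_points intervals, and each interval is used exactly once per dimension.

```python
import random

def lhs(nb_points, nb_dims, x_min, x_max, randomize=True):
    """Basic Latin hypercube sample: each variable's range is divided into
    nb_points intervals and every interval is used exactly once."""
    columns = []
    for d in range(nb_dims):
        cells = list(range(nb_points))
        random.shuffle(cells)  # random pairing of intervals across dimensions
        column = []
        for c in cells:
            u = random.random() if randomize else 0.5  # position inside the cell
            frac = (c + u) / nb_points                 # value in [0, 1)
            column.append(x_min[d] + frac * (x_max[d] - x_min[d]))
        columns.append(column)
    # transpose to a nb_points-by-nb_dims matrix
    return [[columns[d][p] for d in range(nb_dims)] for p in range(nb_points)]
```

With randomize=False the points sit at the centre of their cells, which mirrors the deterministic placement controlled by the toolbox's random flag.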

**Quasi-random sequences**

Quasi-random (low-discrepancy) sequences are less random but more evenly distributed than pseudo-random sequences: successive points are correlated by construction, so that the set of points covers the space uniformly. Depending on the way the point sets are generated, quasi-random sequences can be Hammersley, Halton, Faure or Sobol sequences.

*r = doe_hammersley(dim_num, n, step, seed, leap, base)*

This function computes a Hammersley data set.

- dim_num: number of dimensions
- n: number of points to be generated

The function works with only the two parameters above; the remaining parameters (step, seed, leap, base) are optional.

*r = doe_halton(dim_num,n,step,seed,leap,base)*

This function computes a Halton point set.

- dim_num: number of dimensions
- n: number of points generated
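Both the Hammersley and Halton point sets are built from the radical-inverse (van der Corput) function. The following Python sketch, assuming the first few primes as bases, shows the construction:

```python
def radical_inverse(i, base):
    """Van der Corput radical inverse: reflect the base-b digits of i
    about the radix point, giving a value in [0, 1)."""
    inv, f = 0.0, 1.0 / base
    while i > 0:
        inv += (i % base) * f
        i //= base
        f /= base
    return inv

def halton(n, dim_num, bases=(2, 3, 5, 7, 11, 13)):
    """First n points of a Halton sequence: dimension d uses the d-th prime base."""
    return [[radical_inverse(i, bases[d]) for d in range(dim_num)]
            for i in range(1, n + 1)]
```

For example, halton(4, 2) yields the points (1/2, 1/3), (1/4, 2/3), (3/4, 1/9), (1/8, 4/9). A Hammersley set replaces the first coordinate by i/n.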

*r = doe_sobol(n)*

This function computes a Sobol data set.

- n: the dimension of the generated vector (at most 6)
- r: vector containing Sobol quasi-random points

The user first initialises the function by calling *doe_sobol(-1)* and then repeatedly calls *doe_sobol(n)*.

**Factorial Design**

Factorial designs were analysed in the 2012-05-06 report of DoE.

*r = doe_factorial(nb_var)*

- nb_var: number of input factors
- r: a 2^{nb_var}-by-nb_var matrix with all possible combinations of the factor levels
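A two-level full factorial design of this kind can be sketched in a few lines of Python:

```python
from itertools import product

def factorial_design(nb_var):
    """All 2**nb_var combinations of the two levels -1 and +1,
    returned as a 2**nb_var-by-nb_var matrix."""
    return [list(row) for row in product([-1, 1], repeat=nb_var)]
```

For nb_var = 3 this produces the 8 rows (-1,-1,-1), (-1,-1,+1), ..., (+1,+1,+1).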

Yates Algorithm

The Yates algorithm estimates the effects in factorial designs. In this toolbox it is implemented in doe_yates.sci.

*[ef,id] = doe_yates(y, sort_eff)*

- y: response from a 2-level full factorial design
- sort_eff: whether to sort the effects
- ef: vector of average response, main effects and interaction effects
- id: identification vector of main and interaction effects

The Reverse Yates algorithm estimates the response given the effects.

*[y,id] = doe_ryates(ef)*

- ef: vector of average response, main effects and interaction effects
- y: the estimated response
- id: identification vector of main and interaction effects
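The forward Yates algorithm itself is a short sum-and-difference recursion. A Python sketch (responses assumed to be in standard Yates order; not the toolbox's implementation):

```python
def yates(y):
    """Yates algorithm for a 2-level full factorial: y is the response
    vector in standard order (length 2**k).  Returns the average response
    followed by the main and interaction effects."""
    n = len(y)
    k = n.bit_length() - 1
    assert n == 2 ** k, "response length must be a power of two"
    col = list(y)
    for _ in range(k):
        # replace the column by pairwise sums followed by pairwise differences
        sums = [col[i] + col[i + 1] for i in range(0, n, 2)]
        diffs = [col[i + 1] - col[i] for i in range(0, n, 2)]
        col = sums + diffs
    # first entry / n -> grand average; remaining entries / (n/2) -> effects
    return [col[0] / n] + [c / (n / 2) for c in col[1:]]
```

For y = [10, 20, 30, 60] this returns [30.0, 20.0, 30.0, 10.0]: the average response 30, main effects A = 20 and B = 30, and interaction AB = 10.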

Hadamard matrix

A Hadamard matrix is a square matrix consisting of +1's and -1's, whose rows are mutually orthogonal vectors.

*H = hadamard(n)*

- n: the number of repetitions of the base matrix H=[1,1;-1,1]
- H: a 2^n-by-2^n matrix with mutually orthogonal rows
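The construction can be sketched as repeated Kronecker products of the 2x2 base block. A Python illustration (assuming the base block quoted above; conventions for the 2x2 block vary):

```python
def hadamard(n):
    """2**n-by-2**n Hadamard matrix: the n-th Kronecker power of the
    2x2 block [[1, 1], [-1, 1]]."""
    H = [[1]]
    base = [[1, 1], [-1, 1]]
    for _ in range(n):
        size = len(H)
        # Kronecker product base (x) H, doubling the matrix each step
        H = [[base[i // size][j // size] * H[i % size][j % size]
              for j in range(2 * size)] for i in range(2 * size)]
    return H
```

Since the Kronecker product of Hadamard matrices is again Hadamard, every pair of distinct rows of the result is orthogonal.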

**Response Surface Methodology**

In the DoE_beta toolbox, there are functions producing Box-Behnken and central composite designs.

*H = doe_box_benkhen(nb_var,nb_center)*

- nb_var: number of variables
- nb_center: number of repetitions of the central point

*H = doe_composite(nb_var,alpha)*

- nb_var: number of variables
- alpha: normalised value of the star points (default sqrt(2))

*H = doe_star(nb_var)*

This function outputs a 2*nb_var-by-nb_var matrix, consisting of a diagonal matrix of +1's stacked on a diagonal matrix of -1's.

- nb_var: number of variables
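A Python sketch of the star-point matrix described above:

```python
def star_design(nb_var):
    """2*nb_var-by-nb_var star-point matrix: a +1 diagonal block stacked
    on a -1 diagonal block (the axial points of a composite design)."""
    plus = [[1 if i == j else 0 for j in range(nb_var)] for i in range(nb_var)]
    minus = [[-1 if i == j else 0 for j in range(nb_var)] for i in range(nb_var)]
    return plus + minus
```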

**Computer-aided designs**

Computer-aided designs are experimental designs that are generated based on a particular optimality criterion and are generally optimal only for a specified model. The most used criteria are the following:

- A-optimal Design: It minimizes the trace of the inverse of the information matrix

It is implemented by *[M_doe,history] = doe_a_opti(M_init, M_cand, doe_size, model, l_bounds, u_bounds, ItMx, p_level, Log, size_tabu_list)* and *comp_a_opti_crit.sci*

Input Parameters (the input parameters are the same for all the optimal functions)

- M_init: an initial design of experiments
- M_cand: set of candidate points
- doe_size: desired number of points in the optimised doe
- model: list of monomials
- l_bounds: vector of lower bounds for each variable
- u_bounds: vector of upper bounds for each variable
- ItMX: maximum number of iterations
- p_level: progress level
- D-optimal Design: It maximizes the determinant of the information matrix X'X of the design

*[M_doe,history] = doe_d_opti(M_init, M_cand, doe_size, model, l_bounds, u_bounds, ItMX, p_level, Log, size_tabu_list)*

*comp_d_opti_crit.sci*
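As an illustration of the D-optimality criterion (not the toolbox's implementation), the following Python sketch builds the information matrix X'X of a regression matrix X and returns its determinant:

```python
def det(M):
    """Determinant by Gaussian elimination with partial pivoting."""
    M = [row[:] for row in M]
    n, d = len(M), 1.0
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        if abs(M[p][c]) < 1e-12:
            return 0.0
        if p != c:
            M[c], M[p] = M[p], M[c]
            d = -d                      # row swap flips the sign
        d *= M[c][c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
    return d

def d_criterion(X):
    """D-optimality criterion: determinant of the information matrix X'X."""
    n, p = len(X), len(X[0])
    XtX = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(p)]
           for i in range(p)]
    return det(XtX)
```

For the 2^2 full factorial with an intercept column, X'X = 4I and the criterion equals 64; a D-optimal search keeps the candidate points that maximise this value.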

- G-optimal Design: It minimizes the maximum prediction variance d = x'(X'X)^{-1}x over a specified set of design points

*[M_doe,history] = doe_g_opti(M_init, M_cand, doe_size, model, l_bounds, u_bounds, ItMX, p_level, Log, size_tabu_list)*

*comp_g_opti_crit.sci*

- O-optimal Design:

*[M_doe,history] = doe_o_opti(M_init, M_cand, doe_size, model, l_bounds, u_bounds, ItMX, p_level, Log, size_tabu_list)*

*comp_o_opti_crit.sci*

**Supersaturated experiments**

When the number of factors exceeds the number of runs, the design is called supersaturated. Such designs can be computed by applying the a-optimal, d-optimal, correlation and khi2 (chi-squared) criteria. The following functions implement such designs:

*Result = comp_ssd_a_value_crit(M_doe,model)*

This function uses the a-optimal criterion mentioned above for the computation of a supersaturated design.

- M_doe: input data set on which the criterion is performed
- model: a list of monomials (it can be generated by the function doe_poly_model() which is described later)

These inputs are the same in all ssd functions.

*Result = comp_ssd_ave_khi2_crit(M_doe,Model)* (Needs more information)

*Result = comp_ssd_max_khi2_crit(M_doe,Model)* (Needs more information)

*Result = comp_ssd_r_value_crit(M_doe,Model)* (Needs more information)

This function uses the correlation criterion to compute such a design.

**Computing Statistical Values**

*result = doe_test_mean(x, y, level, operation)*

It tests whether the means of two samples x and y are equal.

- x: first sample
- y: second sample
- level: the level of confidence of the comparison, which must be between 0 and 1
- operation: the user can input '==', '>=' or '<='
- result: 'T' if the operation is true and 'F' if it is false.

*result = doe_test_var(x, y, level, operation)*

It tests whether the variances of two samples x and y are equal.

Input and output parameters are the same as in doe_test_mean().

*result = doe_test_significance(param_mean,param_var,size_stat,val_to_comp,level,operation)*

*retval = skewness(x)*

This function measures the skewness, i.e. the asymmetry of the probability distribution. If the skewness is positive, the mass of the distribution is concentrated on the left; if it is negative, it is concentrated on the right; and if it is zero, the distribution is symmetrical.

- x: the input data set

*retval = kurtosis(x)*

This function measures kurtosis, the degree of peakedness of a distribution.

- x: the input data set
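Both statistics are ratios of central moments. A Python sketch (this uses population moments, with kurtosis equal to 3 for a normal distribution; the toolbox's exact normalisation is not specified here and may differ):

```python
def moments(x):
    """Return (skewness, kurtosis) of the data set x from its
    2nd, 3rd and 4th central moments."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    skew = m3 / m2 ** 1.5   # 0 for a symmetric distribution
    kurt = m4 / m2 ** 2     # 3 for a normal distribution
    return skew, kurt
```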

Discrepancy

The following functions compute the difference between the empirical cumulative distribution function of a design and the uniform cumulative distribution function, using the Centered-L2 and Wrap-around-L2 discrepancy criteria.

*Result = comp_CL2_crit(Data)*

*Result = comp_WD2_crit(M_doe,Model)*

Input Parameters

- Data: The data points
- M_doe: an experimental design
- Model: a list of monomials to represent the model

**Data sets**

*X_norm = normalize(X_in,replace)*

This function normalises a given data set.

- X_in: the input data set
- replace: a boolean flag
- X_norm: the normalised data set

*X_std = standardize(X_in)*

This function normalises and centers the given data set.

- X_in: the input data set
- X_std: the normalised and centered data set
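A minimal Python sketch of such a standardisation (column-wise centering and scaling; the toolbox's exact conventions may differ):

```python
def standardize(X):
    """Center each column to mean 0 and scale it to standard deviation 1."""
    n, p = len(X), len(X[0])
    out = [row[:] for row in X]
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5
        for i in range(n):
            # constant columns are mapped to 0 to avoid division by zero
            out[i][j] = (X[i][j] - mean) / std if std > 0 else 0.0
    return out
```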

*H = doe_scramble(H1,N)*

This function scrambles a given design of experiments.

- H1: the design
- N: number of random permutations

*H = doe_union(H1,H2)*

*H = doe_merge(H1,H2)*

These functions merge two given designs of experiments, H1 and H2.

*H = doe_diff(H1,H2)*

This function outputs a vector H containing the common points between two data sets H1 and H2.

*[s_opt,b_opt,res_mean,res_std] = crossvalidate(fun,K,steps,X,y,varargin)*

Cross-validation measures how accurately a model will perform in practice. All observations are used for both training and validation, and each observation is used for validation exactly once.
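The K-fold scheme described above can be sketched as follows (illustrative helper names, not the toolbox's API):

```python
def k_fold_indices(n, K):
    """Split indices 0..n-1 into K consecutive folds; every observation
    falls into exactly one validation fold."""
    folds, start = [], 0
    for f in range(K):
        size = n // K + (1 if f < n % K else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validate(fit, predict, loss, X, y, K=5):
    """Average validation loss over K folds: train on K-1 folds,
    validate on the remaining one, and rotate."""
    folds = k_fold_indices(len(X), K)
    scores = []
    for val in folds:
        train = [i for i in range(len(X)) if i not in val]
        model = fit([X[i] for i in train], [y[i] for i in train])
        err = [loss(predict(model, X[i]), y[i]) for i in val]
        scores.append(sum(err) / len(err))
    return sum(scores) / K
```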

*H_unnorm = unnorm_doe_matrix(H,min_levels,max_levels)*

This function translates a design of experiments containing +1's and -1's to maximum and minimum levels.

- H: the experimental design
- min_levels: the values the -1's will be translated to
- max_levels: the values the +1's will be translated to

**Regression**

*R = build_regression_matrix(H,model,build)*

This function computes the regression matrix of a given model.

- H: a given experimental design
- model: a list of monomials
- build: By default all the monomials of the model are selected. If build(i)==%T then only the i-th monomial is selected.
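One way to sketch the regression-matrix construction in Python, representing each monomial as a tuple of exponents (an assumption for illustration; the toolbox's model representation may differ):

```python
def build_regression_matrix(H, model):
    """Regression matrix of a monomial model: each monomial is a tuple of
    exponents, one per variable, e.g. (0, 0) -> 1, (1, 0) -> x1,
    (2, 0) -> x1**2, (1, 1) -> x1*x2."""
    R = []
    for point in H:
        row = []
        for mono in model:
            v = 1.0
            for x, e in zip(point, mono):
                v *= x ** e     # evaluate the monomial at this design point
            row.append(v)
        R.append(row)
    return R
```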

*var = var_regression_matrix(H,x,model,sigma)*

This function computes the variance of a given model.

- H: the input experimental design
- x: the point where the variance will be computed
- model: a list of monomials
- sigma: the variance of the model
- var: the variance of the given design at a given point

*model = doe_poly_model(mod_type,nb_var,order)*

This function produces a list of monomials that represent a polynomial model.

- mod_type: the type of polynomial to be produced. It can be set to 'lin' for linear model, 'poly' for a polynomial model and to 'inter' for a linear model with interactions.
- nb_var: number of variables
- order: the order of the polynomial
- model: a list of monomials, e.g. 1, x1, x2, x1^2, etc.

*[model_new,coeff_new] = doe_model_bselect(nb_var,model_old,measures,Log)*

This function removes unnecessary monomials from an input model and selects the best subset.

- nb_var: number of variables
- model_old: the input model from which monomials are to be removed
- measures: a set of data; the last column of the data set must contain the measure of the output
- Log: if %T then some intermediate messages are printed in the console

*[model_new,coeff_new] = doe_model_fselect(nb_var,model_old,measures,Log)*

This function starts with one monomial of a model and progressively adds the best monomials.

The input and output parameters are the same as in doe_model_bselect().