Guidelines To Design a Module (also called Toolbox)

Abstract

In this page, we give some general principles which can be used when we design a Scilab module. We do not focus on the technical details of actually creating a toolbox. Instead, we give general methods for the design of a public module. These rules are based on our own day-to-day experience and on our contacts with other Scilab users. The core idea of this page is to present simple proposals which are often neglected when we do not think from the point of view of other users.

Introduction

The first step when we learn how to create a module is to read the "Toolbox skeleton", a complete example module which is available in SCI/contrib/toolbox_skeleton. Beginners may also read the wiki page on how to create a toolbox:

Create a toolbox

In order to manage the sources of the module, we can use the Scilab Forge:

http://forge.scilab.org/

The Scilab Forge is a platform to host the sources and share the work in a team of developers, who are sometimes not located in the same country. It provides both the Subversion (SVN) and Git version control systems. The Forge also lets us manage the downloads of the module and its tickets (bug reports).

Once this is done, we can use ATOMS to distribute our work. This is described at:

https://scilab.gitlab.io/legacy_wiki/ATOMS.

Creating a toolbox for personal use and distributing a module for public use are quite different tasks: the latter often brings new issues and requires a different point of view. All in all, writing a public set of functions requires that we think in terms of Application Programming Interface (API): which public functions should this module provide?

In this document, we analyse methods that we can use to solve practical issues with the design of a toolbox. Indeed, not all problems are caused by technical bugs. Some problems are caused by the design of the toolbox itself, and are less obvious. These issues often correspond to parts of the toolbox which are missing, which makes them hard to identify.

Avoid function name conflicts

We should avoid giving our functions short names such as "euler" or "dt", because such names may cause conflicts between our module and Scilab or other modules.

For example, we may design our module with Scilab v5.2.2. But, if Scilab v5.2.3 introduces a new function named "euler", then our module may not work anymore.
Alternatively, there may be another module, say "MyExtraModule", that we have never used or even heard of, which also provides an "euler" function. Then, users who use both our module and the "MyExtraModule" module will have trouble with the "euler" function.
It may happen that users of our module call a function, say "foo1", where "dt" is used as a variable. In this case, they may see a warning message about the redefinition of the "dt" function.

There is one simple way to solve this issue. Assume that the name of our module is "scifoo". A good naming rule is to use an underscore to separate the name of the module from the name of the function. For example, we may name our functions "scifoo_euler", "scifoo_dt", etc. The naming rule "scifooEuler", "scifooDt" would also work.

An additional advantage of this naming convention is that auto-completion in the console works well. That is, if we type "scifoo" in the console and then press the TAB key, Scilab displays the list of all functions whose name begins with "scifoo".
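A minimal sketch of the naming rule, for the hypothetical "scifoo" module. The explicit Euler method below is only a placeholder algorithm; the point is that every public function name, including the helper scifoo_decay, starts with the "scifoo_" prefix.

```scilab
// Hypothetical module "scifoo": every public function name starts with
// "scifoo_", so that it cannot clash with an "euler" function elsewhere.
// scifoo_euler integrates y' = f(t,y) with n steps of the explicit Euler
// method (a placeholder algorithm; only the naming matters here).
function y = scifoo_euler ( f , t0 , y0 , h , n )
  t = t0;
  y = y0;
  for k = 1 : n
    y = y + h * f(t, y);  // one explicit Euler step
    t = t + h;
  end
endfunction

// Right-hand side of y' = -y, also prefixed since it belongs to the module
function dy = scifoo_decay ( t , y )
  dy = -y;
endfunction

// Usage: integrate y' = -y, y(0) = 1, from t = 0 to t = 1 with 10 steps
y1 = scifoo_euler ( scifoo_decay , 0 , 1 , 0.1 , 10 );
```

Typing "scifoo" followed by TAB in the console would then list both functions together.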

Provide help pages

In general, all the public functions should be associated with a help page.

The content of the .xml file associated with the help page can be produced automatically from the .sci macro with the help_from_sci function:

help help_from_sci

The help_from_sci function generates the .xml file from the comments in the .sci file.

Calling the help_from_sci function without input arguments opens the editor with a template to be filled in:

help_from_sci()

For example, the following mystery function contains comments which give a description of the function.

function y = mystery ( x )
  // Randomly performs a mysterious computation.
  //
  // Calling Sequence
  //   y = mystery ( x )
  //
  // Parameters
  //   x : an m-by-n matrix of doubles
  //   y : an m-by-n matrix of doubles, the randomly computed values.
  //
  // Description
  // Randomly computes results.
  //
  // Examples
  // x = mystery ([1 2 4 5])
  //
  // Authors
  // Bill Smith, 2010

  y = grand(1,"prm",x(:)); // random permutation of the entries of x, as a column
  y = matrix(y,size(x));   // reshape so that y has the same size as x
endfunction

In the previous function, the words "Calling Sequence", "Parameters", "Description", "Examples" and "Authors" are automatically recognized by the help_from_sci() function, which creates the associated sections in the .xml file. In this case, the following statement would automatically generate the associated help/en_US/mystery.xml file, based on the macros/mystery.sci macro.

help_from_sci("macros","help/en_US")

Another solution is to use the helptbx module to automate the update of the .xml files of a toolbox from its .sci files. This module sequentially calls the help_from_sci function to generate the .xml files from the .sci files of a given directory. Moreover, the .xml files can be updated automatically depending on the time stamps of the .sci files. This is convenient when we update the comments in the .sci files.

As an example, we can analyse how the help pages of the "number" module (http://forge.scilab.org/index.php/p/number) were created. In the help/en_US directory of the module, the update_help.sce script updates the help pages (.xml) from the macros (.sci). The helptbx_helpupdate function takes as input arguments a matrix of strings defining the functions to update, the help directory and the macros directory, followed by the demos directory, the module name and a boolean option.

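// Directory of this script, i.e. help/en_US/ (with a trailing separator)
helpdir = get_absolute_file_path("update_help.sce");
// Functions whose help pages must be updated
funmat = [
  "number_carmichael"
  "number_coprime"
  "number_extendedeuclid"
  "number_factor"
  "number_fermat"
  ];
// The macros directory, relative to the help directory
macrosdir = helpdir + "../../macros";
// Empty: no demos directory
demosdir = [];
modulename = "number";
helptbx_helpupdate ( funmat , helpdir , macrosdir , demosdir , modulename , %t );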

When the help pages are for functions based on gateways (written in the C language, for example), the previous method cannot be used directly. In this case, a possible solution is to create pseudo-macros as .sci files, which only contain the comments required by the helptbx module.

For example, the "accsum" module is partly based on gateways. The help/en_US/pseudomacros directory

http://forge.scilab.org/index.php/p/accsum/source/tree/HEAD/help/en_US/pseudomacros

contains a set of .sci files which only contain comments: the body of each function is empty, since the actual implementation is done in the C gateways. These .sci files are used in the associated update_help.sce script, which automatically generates the .xml files.
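A pseudo-macro can be sketched as follows, for a hypothetical gateway function scifoo_fastsum. The file contains only the function header and the help comments, in the same format as the mystery example above, and an empty body.

```scilab
// Hypothetical file help/en_US/pseudomacros/scifoo_fastsum.sci.
// The real scifoo_fastsum is assumed to be implemented in a C gateway:
// this file only carries the comments used to generate the .xml page.
function s = scifoo_fastsum ( x )
  // Computes the sum of the entries of x.
  //
  // Calling Sequence
  //   s = scifoo_fastsum ( x )
  //
  // Parameters
  //   x : an m-by-n matrix of doubles
  //   s : a 1-by-1 matrix of doubles, the sum of the entries of x
  //
  // Description
  // Computes the sum of the entries of x, using a compiled gateway.
  //
  // Examples
  // s = scifoo_fastsum ( [1 2 3] )
  //
  // Authors
  // Bill Smith, 2010
endfunction
```

Assuming such files are stored in help/en_US/pseudomacros, a statement such as help_from_sci("help/en_US/pseudomacros","help/en_US") would generate the corresponding .xml files. These files should not be placed in the macros directory: if they were loaded, the empty pseudo-macro could hide the actual gateway.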

More information on this topic is available in the help of the helptbx module:

http://atoms.scilab.org/toolboxes/helptbx

Provide examples

In the help pages of our module, we should consistently provide an example for each function. This example should work whatever the context of the user may be, that is, we should not assume that some variable already exists or that some function is defined.

Examples are self-contained

A typically wrong example is:

plot(x,sin(x))

This example works only if the variable x is already defined. Hence, if we execute it in a new session, we get:

-->plot(x,sin(x))
       !--error 4 
Undefined variable: x

All the examples should be self-contained. A better example is:

x = linspace(-%pi,%pi,1000);
plot(x,sin(x))

Only valid statements

Another wrong example is:

We can plot the sine function with:
plot(linspace(-%pi,%pi,100),sin)

This example does not work, because "We can plot the sine function with:" is not a valid Scilab statement. We get the error:

-->We can plot the sine function with:
  !--error 4 
Undefined variable: We

Provide scripts, not sessions

Another typically wrong example is the following:

-->sin(3)
 ans  =
    0.1411200

which is the transcript of a session, not an example: it cannot be copied and executed as is.

A good example

A typically good example is:

x=linspace(-%pi,%pi,200);
scf();
plot(x,sin(x))

This example can be copied by the user and pasted into the console: it directly works.

All functions must have at least one example

All the functions in a module should be associated with an example, without exception.

Including the output produced in a typical case is a good idea, since it shows the user the expected result without even running the example. In all cases, this output should be kept separate from the example itself: both are valuable.

Designing the API

In this section, we provide practical advice on the design of the functions of the module, that is, its Application Programming Interface (API).

We generally write our functions for a specific application. But, when we publish our module, we have to think differently, so that the functions can work in other contexts, probably contexts that we are not used to.

In this section, we present common problems and errors which occur when we design a module.

Optional input arguments

In this section, we analyze and compare several methods to manage optional input arguments.

This section assumes that we already know how to write a flexible function. More details on this topic are presented in [1], in the section "Management of functions".

The most common methods to manage optional input arguments are:

using varargin,
using varargin and the empty matrix,
using the key=value syntax and the exists function.

One of the problems to solve is to be able to "skip" an input argument, that is, to set the optional argument #3 while using the default value of the optional argument #2. Hence, we must also compare the methods with respect to this criterion.

Here are the advantages and drawbacks of these methods.

Using the varargin variable alone is an interesting method which works very well in practice. But it does not allow us to "skip" an argument.
The key=value syntax is a simple method to provide optional arguments to a function, and it allows us to "skip" an argument. Still, it has several drawbacks which make it error-prone: these drawbacks lead to development problems and bugs. More details on this topic are presented in the page "Why using key=value syntax to manage input arguments is not a good idea".
The method based on varargin and the empty matrix is a safe programming practice. It allows us to "skip" an argument safely.
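A minimal sketch of the method based on varargin and the empty matrix, for a hypothetical function y = scifoo_f(x,a,b) with assumed default values a = 1 and b = 2. An optional argument which is omitted or set to [] takes its default value, so the caller can set b while skipping a.

```scilab
// Hypothetical function y = scifoo_f(x,a,b) computing y = a*x + b,
// where a (default 1) and b (default 2) are optional.
function y = scifoo_f ( x , varargin )
  [lhs, rhs] = argn();
  if ( rhs < 1 | rhs > 3 ) then
    error(msprintf(gettext("%s: Wrong number of input arguments: %d to %d expected.\n"), "scifoo_f", 1, 3));
  end
  // Default values
  a = 1;
  b = 2;
  // Use the provided value only if the argument is present and not []
  if ( rhs >= 2 ) then
    if ( ~isempty(varargin(1)) ) then
      a = varargin(1);
    end
  end
  if ( rhs >= 3 ) then
    if ( ~isempty(varargin(2)) ) then
      b = varargin(2);
    end
  end
  y = a * x + b;
endfunction

y1 = scifoo_f ( 3 );          // a = 1, b = 2
y2 = scifoo_f ( 3 , [] , 10 ); // a is skipped (default 1), b = 10
```

With this convention, y1 is 5 and y2 is 13: the empty matrix acts as an explicit "use the default" marker.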

A correct order for optional input arguments

In some cases, the function we are designing has optional input arguments. In this case, an optional argument which is not provided by the user is replaced by its default value. In this section, we discuss how to order the optional arguments so that they best fit the user's needs.

Assume that we have a function f with the following calling sequence:

y = f(x)
y = f(x,a)
y = f(x,a,b)

where x is a mandatory input argument, y is a mandatory output argument and a and b are optional input arguments.

Which of the following two sets of calling sequences should we choose?

y = f(x,a)
y = f(x,a,b)

or

y = f(x,b)
y = f(x,b,a)

In other words, how should we order the optional input arguments?

In practice, the optional arguments are probably not all used equally often. For example, the option b may be used much more frequently than the option a. In this case, b should come before a in the calling sequence.

A correct order for optional output arguments

In this section, we discuss how to order the optional output arguments of a function.

Assume that we have a function f where x is a mandatory input argument, y is a mandatory output argument and z is an optional output argument.

Which of the following calling sequences should we choose?

y = f(x)
[y,z] = f(x)

or

z = f(x)
[z,y] = f(x)

In other words, how should we order the optional output arguments?

The problem is that, if the user wants to get only z, then the calling sequence [y,z] = f(x) forces the function to compute y, even if y is not needed.

In order to choose, we might think in terms of computational cost: which argument costs the most CPU time?

It might happen that the cost of y is much larger than the cost of z: the output argument z then comes almost for "free". In this case, the calling sequence [y,z] = f(x) is fine.
On the other hand, if z is much cheaper than y, choosing [y,z] = f(x) is a bad idea. Indeed, in the particular situation where y is unnecessary, we must still compute y. In this case, the calling sequence [z,y] = f(x) is a better choice.

We might also think in terms of practical uses. For example, it might happen that, if z is required, then y is also required. In this case, the calling sequence [y,z] = f(x) is a good choice.

For example, the derivative function, which computes numerical derivatives, has the calling sequence [J,H]=derivative(f,x), where f is a function, x is the current point, J is the Jacobian matrix and H is the Hessian matrix. In practice, when we need the Hessian matrix, we also need the Jacobian matrix. Moreover, the cost of H is much larger than the cost of J. This is why the calling sequence [J,H]=derivative(f,x) is a good choice. On the other hand, in the cases where only H is required, the derivative function still has to compute J, so that useless function evaluations are performed. For this use case, we should create two separate functions, for example derivativeJacobian and derivativeHessian.
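The order matters because, inside a Scilab function, argn only tells us how many outputs the caller requested. A trailing optional output can therefore be skipped when it is not requested, whereas a leading output always has to be computed. The sketch below uses a hypothetical function scifoo_g, where cumprod merely stands in for an expensive computation: z is computed only when the caller writes [y,z] = scifoo_g(x).

```scilab
// Hypothetical function [y,z] = scifoo_g(x): the optional output z is
// computed only when the caller requests two output arguments.
function [y, z] = scifoo_g ( x )
  [lhs, rhs] = argn();
  y = sum(x);             // the mandatory output
  if ( lhs >= 2 ) then
    z = cumprod(x);       // stands for an expensive optional output
  end
endfunction

y = scifoo_g ( [1 2 3] );         // z is not computed
[y, z] = scifoo_g ( [1 2 3] );    // both outputs are computed
```

There is no symmetric way to skip y when only z is wanted, which is why the cheap or frequently needed output should come first.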

Argument checking

Functions should be designed so that they are robust against wrong uses. The warning and error functions are the basis of robust functions.

For example, a user may pass a string instead of a function as an input argument. This might generate an unexpected error. Or the user may swap two input arguments: this might not generate an error, but may produce a completely wrong result.

This is why we often check the input arguments of functions, so that the error message displayed to the user is as clear as possible. In general, we should consider the following checks:

number of input/output arguments,
type of input arguments,
size of input arguments,
content of input arguments.

More details on this topic are presented in [1], in the section "Robust functions".
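A minimal sketch of these checks, for a hypothetical function scifoo_sqrtsum which returns sum(sqrt(x)) for a vector x of non-negative doubles. The error messages are intended to follow the standard Scilab messages (see "Localization in English - Standard messages"), but their exact wording should be checked against that list.

```scilab
// Hypothetical function y = scifoo_sqrtsum(x): checks the number, type,
// size and content of its input argument before computing.
function y = scifoo_sqrtsum ( x )
  fname = "scifoo_sqrtsum";
  // Number of input arguments
  [lhs, rhs] = argn();
  if ( rhs <> 1 ) then
    error(msprintf(gettext("%s: Wrong number of input arguments: %d expected.\n"), fname, 1));
  end
  // Type: a matrix of real doubles (tested in two steps, since isreal
  // is only meaningful once we know that x is a matrix of doubles)
  if ( typeof(x) <> "constant" ) then
    error(msprintf(gettext("%s: Wrong type for input argument #%d: A real matrix expected.\n"), fname, 1));
  end
  if ( ~isreal(x) ) then
    error(msprintf(gettext("%s: Wrong type for input argument #%d: A real matrix expected.\n"), fname, 1));
  end
  // Size: a non-empty row or column vector
  if ( size(x, "r") <> 1 & size(x, "c") <> 1 ) then
    error(msprintf(gettext("%s: Wrong size for input argument #%d: A vector expected.\n"), fname, 1));
  end
  // Content: non-negative entries
  if ( or(x < 0) ) then
    error(msprintf(gettext("%s: Wrong value for input argument #%d: Must be >= %d.\n"), fname, 1, 0));
  end
  y = sum(sqrt(x));
endfunction
```

Each failing check stops the function with a message naming the function and the faulty argument, instead of an obscure error deep inside the computation.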

Localization of messages

The gettext function manages the messages which are produced by Scilab in various languages. In general, we should write functions which can be localized. In order to make this localization easier, we should use the standard error messages which are presented at:

Localization in English - Standard messages

Indeed, if we use these standard messages, which are already translated, our function is automatically localized, without any extra work from the developer.

Output messages within functions

It might happen that we need to print a message or create a plot during the execution of the function. This is specifically the case for iterative methods, where the algorithm runs through a number of steps before reaching the "final" step, usually when some tolerance is met. Since the process can be long, we may want some feedback during the process. In this case, executing the function may generate the following output.

-->y=myfunction(x0,itermax)
Iteration #1, y=1.0, x=12.0
Iteration #2, y=0.7, x=10.0
Iteration #3, y=0.5, x=8.0
Iteration #4, y=0.3, x=6.0
[...]

This is bad, because the user has to see the messages, whether they want to or not. Even worse are the functions which force the creation of graphics (generally 2D plots, where the first axis is the iteration number). Worse still are the functions which display their licence, as in the following hypothetical session.

-->y=myfunction(x0,itermax)
   Copyright (C) 2011 - foo (foo@blabla.com)
   This function is provided under the FOO licence.
   Use at your own risks.
   Commercial use is permitted, within the limits of this licence.
   Please contact the support if needed.
   Please mention the authors of the function in any publication
   making use of this module.
[...]

Indeed, in most cases, we do not need these messages. Would we be happy if Scilab printed messages each time we compute the sine function, as in the following hypothetical session?

-->y=sin(2)
I am here
I am there
I should go here
But I go there
[...]

By default, a computational function should stay quiet. To provide messages, we can add a verbose option, which is false by default but can be enabled if necessary. The calling sequences of the function would then be:

y=myfunction(x0,itermax)
y=myfunction(x0,itermax,verbose)

where the first calling sequence is associated with the default false value of the verbose variable.
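A minimal sketch of a quiet-by-default function, assuming a hypothetical function scifoo_iterate whose "algorithm" simply halves x at each iteration. Messages are printed with mprintf only when the optional verbose argument is %t.

```scilab
// Hypothetical iterative function y = scifoo_iterate(x0,itermax,verbose).
// It halves x at each iteration (a placeholder for a real algorithm).
function y = scifoo_iterate ( x0 , itermax , verbose )
  [lhs, rhs] = argn();
  if ( rhs < 3 ) then
    // Default: stay quiet. Setting verbose explicitly also avoids reading
    // a "verbose" variable from the calling context.
    verbose = %f;
  end
  x = x0;
  for k = 1 : itermax
    x = x / 2;
    if ( verbose ) then
      mprintf("Iteration #%d, x=%f\n", k, x);
    end
  end
  y = x;
endfunction
```

Calling scifoo_iterate(8,3) prints nothing, while scifoo_iterate(8,3,%t) prints one line per iteration; both return the same value.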

Orthogonality between modules

Our module should contain the functions which make it "special", and only these functions. This reduces the development time and the risk of backward-compatibility issues, and it makes it easier to test the module thoroughly and keep its quality high.

One common issue is to put all our functions into the same module, even if they are completely independent. This is the "all-in-one" pitfall: we have one single module, and we put everything we need into it. Instead, if functions are completely independent, we should create two (or more) separate modules.

Assume, for example, that we design a module to model a car: this would be an "Automotive Toolbox". In this module, we should not include a general-purpose optimization algorithm, or statistical functions, or Finite Elements functions, for example. Indeed, even if the practical use of the module requires an optimization algorithm, this is not the purpose of the current module. Here is a short list of reasons.

Another module may already provide the feature. If not, we can create another module to provide the optimization feature that we need. This keeps the dependencies of the project separate. It additionally allows other users to use and test the optimization algorithms that we created, without requiring them to use the whole "Automotive Toolbox".
As module developers and maintainers, if users actually use this optimization algorithm, we will have to maintain this function throughout the versions. Eventually, some other module, or even Scilab itself, may provide the feature, so that our work may be lost. Removing the feature may then create backward-compatibility issues.
Bug fixes, help pages and tests will have to be written for this support function, even if it is not the goal of the module. Therefore, developing the module will take more time and the release process will slow down.
It may still happen that an algorithm is necessary, but not in the "core scope". In this case, we may provide it without an associated help page. This particular function is then not part of the "public API", but is still used in the "private API". This way, we can change the internal details of the module without breaking backward compatibility.

A check-list

The following table is a check-list that we may use to check that a function is ready for release.

Topic	Question
Optional input arguments	Can we "skip" arguments safely?
A correct order for optional input arguments	Is the order of input arguments correct?
A correct order for optional output arguments	Is the order of output arguments correct?
Argument checking	Are the arguments checked for number, type, size and content?
Localization of messages	Are the standard messages used?
Output messages within functions	Can the messages be controlled?
Orthogonality between modules	Are the functions complementary?

Do not stay alone: communicate!

Most of the previous errors can be avoided by communicating sufficiently with the Scilab community. Indeed, being able to discuss these technical topics with other users is one of the many advantages of using Scilab.

There are many ways to communicate with other users and developers. Some of these ways are described at:

At any stage of the development of the toolbox, we should communicate with the community. For example, we may not be aware of similar tools which are independently developed by other users.

In this context, a common scenario is the following.

We begin by writing simple scripts, later gathered in a module.
Several months later, this collection of scripts grows into a bigger module.
Several months later, this becomes a public module.
We communicate on the "new module".
A user tells us that another module has (partly or completely) the same feature.

Hence, a simple email such as "We are starting to work on a module to compute this and that." can generate a lot of interesting connections with other projects and can save us a lot of time.

Conclusion

Readers who want to learn more about the creation of Scilab modules might be interested in [2]. The following is the abstract of that document.

"In the first part, we focus on the use of external modules. We describe their general organization and how to install a module from ATOMS. Then we describe how to build a module from the sources. In the second part, we present the management of a toolbox, and the purpose of each directory. We emphasize the use of simple methods to automatically create the help pages and to manage the unit tests. Then we present the creation of interfaces, which allows to connect Scilab to a compiled C, C++ or Fortran library. We consider the example of a simple function in the C language and explore several ways to make this function available to Scilab. We consider a simple method based on exchanging data by file. We then present a method based on the call function. Finally, we present the classical, but more advanced, method to create a gateway and how to use the Scilab API. The two last sections focus on designing issues, such as managing the optional input or output arguments or designing examples."

Bibliography

[1] "Programming in Scilab", Michael Baudin, 2011, http://forge.scilab.org/index.php/p/docprogscilab/downloads/
[2] "Writing Scilab extensions", Michael Baudin, 2012, http://forge.scilab.org/index.php/p/docsciextensions/downloads/