Statistical Visualization in Scilab
Scilab provides a few statistical visualization functions, including:
- princomp — Principal components analysis
- show_pca — Visualization of principal components analysis results
Several existing toolboxes provide statistical visualization features, including:
- stixbox : http://forge.scilab.org/index.php/p/stixbox/ (e.g. matrix of scatter plots, bubble chart, QQ-plot)
- distfun : http://forge.scilab.org/index.php/p/distfun/ (e.g. integer histogram)
- Nan-toolbox : http://forge.scilab.org/index.php/p/nan-toolbox/ (e.g. parallel coordinate plot, QQ-plot, matrix of scatter plots)
The problem is that:
- most functions are not compatible with Matlab
- most functions have fewer features than their Matlab counterparts
- most functions have fewer tests than required
- most functions have fewer help pages than required
- some functions are duplicated in several toolboxes : this spreads the development effort into several pieces, instead of focusing on a small set of high-quality functions
This conclusion was shared by several toolbox authors, including:
- Michael Baudin, author of Distfun, Stixbox (contributor)
- Holger Nahrstaedt, author of Nan-Toolbox
- Torbjørn Pettersen, author of regtools
This led us to write our "Ideal" statistics module at :
The resulting collection of statistical visualization functions is defined below.
We think that this is a fun project for a GSoC student, and that it would be extremely useful for engineering and research purposes.
Here is the list of functions that we suggest developing:
- statvis_identify : Identify points on a plot by clicking with the mouse (draft from Stixbox)
- statvis_plotsym : Plot with symbols (draft from Stixbox)
- statvis_qqnorm : Normal probability paper (draft from Stixbox)
- statvis_qqplot : Plot empirical quantile vs empirical quantile (draft from Stixbox, from Nan-Toolbox)
- statvis_boxplot : Draw a box-and-whiskers plot for data provided as column vectors (draft from Stixbox)
- statvis_cdfplot : Plot the empirical cumulative distribution function - http://www.mathworks.fr/fr/help/stats/cdfplot.html (draft from Stixbox, and nan_cdfplot from Nan-Toolbox)
- statvis_normplot : Produce a normal probability plot for each column of X (draft from Stixbox)
- statvis_plotmatrix : Scatter plot matrix - http://www.mathworks.fr/help/techdoc/ref/plotmatrix.html (draft from Stixbox, from Nan-Toolbox)
- statvis_gscatter : Scatter plot by group - http://www.mathworks.fr/fr/help/stats/gscatter.html (draft = nan_gscatter from Nan-Toolbox)
- statvis_hist : Histogram - http://www.mathworks.fr/fr/help/matlab/ref/hist.html (draft = histo from Stixbox, and nan_hist from Nan-Toolbox)
- statvis_bubblechart : Plot a bubble chart
- statvis_bubblematrix : Plot a bubble chart matrix
- statvis_inthisto : Discrete histogram (draft is distfun_inthisto in distfun)
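To give an idea of the size of such a function, here is a minimal sketch of what the computational core of statvis_cdfplot could look like. The names ecdf_sketch and cdfplot_sketch are hypothetical and chosen for illustration only: this is not the Stixbox or Nan-Toolbox code, and it ignores options such as Matlab-compatible output arguments.

```scilab
// Hypothetical sketch, not the Stixbox or Nan-Toolbox implementation.
function [x, p] = ecdf_sketch(data)
    // Empirical cumulative distribution function of a vector of data.
    // x : the observations sorted in increasing order (column vector)
    // p : p(i) = i/n, the CDF level reached at the i-th smallest observation
    data = data(:);                 // force a column vector
    n = size(data, "*");
    x = gsort(data, "g", "i");      // sort in increasing order
    p = (1:n)' / n;
endfunction

function cdfplot_sketch(data)
    // Draw the empirical CDF as a step function.
    [x, p] = ecdf_sketch(data);
    plot2d2(x, p);
    xtitle("Empirical CDF", "x", "F(x)");
endfunction
```

Separating the computation (ecdf_sketch) from the drawing (cdfplot_sketch) is one possible design: the numerical part can then be unit tested without opening a graphics window. A typical call would be cdfplot_sketch(grand(100, 1, "nor", 0, 1)).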
These are some examples of statistical graphics.
The following is a bubble chart.
The following is a matrix of scatter plots.
The following is a matrix of QQ-plots.
A suggested roadmap
In this section, we gather a set of steps required to achieve this goal.
- Identify the existing functions in the various Scilab toolboxes.
- Identify the existing functions in Matlab.
- Clarify the required functions in the new "statviz" toolbox : see which functions are
- Set priorities to the functions :
- Create the "statviz" project on Scilab Forge.
- Create a draft of 6 high priority functions, with
- Matlab compatibility,
- argument checking,
- unit tests,
- argument description.
- Create a tutorial help page in XML showing how to get started quickly with these functions.
- Create an XML help page with a gallery of graphics.
- Release the v0.1 on ATOMS.
- Increase the set of functions from 6 to 12.
- Remove the duplicated functions from Stixbox, Nan-Toolbox, Distfun and other toolboxes.
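The roadmap asks that each draft function come with argument checking and unit tests. As a rough illustration of what this could mean in practice, here is a sketch of a discrete histogram count with both. The name inthisto_sketch, the error messages and the choice of checks are assumptions made for this example; this is not the code of distfun_inthisto.

```scilab
// Hypothetical sketch: inthisto_sketch is an illustrative name,
// not the distfun_inthisto implementation.
function [values, counts] = inthisto_sketch(data)
    // Count the occurrences of each distinct integer value in data.
    fname = "inthisto_sketch";
    [lhs, rhs] = argn();
    // Argument checking: number, type and content of the input.
    if rhs <> 1 then
        error(msprintf("%s: Wrong number of input arguments: %d expected.", fname, 1));
    end
    if type(data) <> 1 then
        error(msprintf("%s: Wrong type for input argument #%d: Real matrix expected.", fname, 1));
    end
    if ~isreal(data) then
        error(msprintf("%s: Wrong type for input argument #%d: Real matrix expected.", fname, 1));
    end
    if or(data <> round(data)) then
        error(msprintf("%s: Wrong value for input argument #%d: Integer values expected.", fname, 1));
    end
    data = data(:);
    values = unique(data);          // distinct values, sorted increasingly
    n = size(values, "*");
    counts = zeros(n, 1);
    for i = 1:n
        counts(i) = sum(data == values(i));
    end
endfunction

// Unit tests: nominal case, then two invalid inputs.
[v, c] = inthisto_sketch([1 3 3 2 3 1]);
assert_checkequal(v, [1; 2; 3]);
assert_checkequal(c, [2; 1; 3]);
ierr = execstr("inthisto_sketch(1.5)", "errcatch");
assert_checktrue(ierr <> 0);
ierr = execstr("inthisto_sketch(""a"")", "errcatch");
assert_checktrue(ierr <> 0);
```

In an actual toolbox, the tests would more likely live in a separate .tst file, and the drawing of the bars would be layered on top of the counts computed here.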