# Statistical Visualization in Scilab

## Introduction

Scilab provides a few statistical visualization functions, including:

- princomp : Principal components analysis
- show_pca : Visualization of principal components analysis results

Several existing toolboxes provide statistical visualization features, including:

- Stixbox : http://forge.scilab.org/index.php/p/stixbox/ (e.g. matrix of scatter plots, bubble chart, QQ-plot)
- Distfun : http://forge.scilab.org/index.php/p/distfun/ (e.g. integer histogram)
- Nan-Toolbox : http://forge.scilab.org/index.php/p/nan-toolbox/ (e.g. parallel coordinate plot, QQ-plot, matrix of scatter plots)

The problem is that:

- most functions are not compatible with Matlab,
- most functions have fewer features than their Matlab counterparts,
- most functions have fewer tests than required,
- most functions have fewer help pages than required,
- some functions are duplicated in several toolboxes: this splits the development effort instead of focusing it on a small set of high-quality functions.

This conclusion is shared by several toolbox authors, including:

- Michael Baudin, author of Distfun, Stixbox (contributor)
- Holger Nahrstaedt, author of Nan-Toolbox
- Torbjørn Pettersen, author of regtools

This led us to write our "Ideal" statistics module at:

http://wiki.scilab.org/The-Ideal-Statistics-Module

The set of statistical visualization functions that we arrived at is listed below.

## Proposal

We think that this is a *fun* project for a GSOC student, and extremely useful for engineering and research purposes.

Here is the list of functions that we suggest developing.

- statvis_identify : Identify points on a plot by clicking with the mouse (draft from Stixbox)
- statvis_plotsym : Plot with symbols (draft from Stixbox)
- statvis_qqnorm : Normal probability paper (draft from Stixbox)
- statvis_qqplot : Plot empirical quantiles vs empirical quantiles (draft from Stixbox, from Nan-Toolbox)
- statvis_boxplot : Draw a box-and-whiskers plot for data provided as column vectors (draft from Stixbox)
- statvis_cdfplot : Plot the empirical cumulative distribution function - http://www.mathworks.fr/fr/help/stats/cdfplot.html (draft from Stixbox, and nan_cdfplot from Nan-Toolbox)
- statvis_normplot : Produce a normal probability plot for each column of X (draft from Stixbox)
- statvis_plotmatrix : Scatter plot matrix - http://www.mathworks.fr/help/techdoc/ref/plotmatrix.html (draft from Stixbox, from Nan-Toolbox)
- statvis_gscatter : http://www.mathworks.fr/fr/help/stats/gscatter.html (draft = nan_gscatter from Nan-Toolbox)
- statvis_andrewsplot
- statvis_hist : http://www.mathworks.fr/fr/help/matlab/ref/hist.html (draft = histo from Stixbox, and nan_hist from Nan-Toolbox)
- statvis_ecdfhist
- statvis_fscatter3
- statvis_gplotmatrix
- statvis_parallelcoords
- statvis_errorb
- statvis_errorbar
- statvis_nhist
- statvis_bubblechart : Plot a bubble chart
- statvis_bubblematrix : Plot a bubble chart matrix
- statvis_inthisto : Discrete histogram (draft is distfun_inthisto in distfun)
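
To give a rough idea of what two of these functions would compute before anything is drawn, the sketch below derives the points of an empirical CDF and of a normal QQ-plot. It is written in Python purely as neutral notation, not as a proposed Scilab interface: the names `ecdf_points` and `qqnorm_points` are hypothetical and do not come from Stixbox, Nan-Toolbox or Matlab, and the plotting position (i - 0.5)/n is only one of several common conventions.

```python
from statistics import NormalDist


def ecdf_points(data):
    """Return (xs, heights): sorted samples and the ECDF step height at each.

    heights[i] = (i + 1) / n. With tied samples the same x is repeated with
    increasing heights, which a staircase plot renders as a single jump.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    return xs, [(i + 1) / n for i in range(n)]


def qqnorm_points(data):
    """Return (theoretical, sample) quantile pairs for a normal QQ-plot.

    Assumption: plotting positions p_i = (i - 0.5) / n for i = 1..n.
    Other conventions exist and would shift the points slightly.
    """
    xs = sorted(data)
    n = len(xs)
    if n == 0:
        raise ValueError("data must not be empty")
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return theoretical, xs


if __name__ == "__main__":
    sample = [2.1, -0.3, 0.8, 1.5, -1.2]
    print(ecdf_points(sample))
    print(qqnorm_points(sample))
```

A plotting function would then only have to draw these pairs (as a staircase for the empirical CDF, as markers for the QQ-plot), which keeps the numerical part easy to unit test separately from the graphics.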

## Examples

These are some examples of statistical graphics.

The following is a bubble chart.

The following is a matrix of scatter plots.

The following is a matrix of QQ-plots.

## A suggested roadmap

In this section, we gather a set of steps required to achieve this goal.

- Identify the existing functions in the various Scilab toolboxes.
- Identify the existing functions in Matlab.
- Clarify the scope of the new "statviz" toolbox: decide which functions are
  - required,
  - optional,
  - unnecessary.

- Assign a priority to each function:
  - high,
  - medium,
  - low.

- Create the "statviz" project on Scilab Forge.
- Create a draft of 6 high-priority functions, with
  - Matlab compatibility,
  - argument checking,
  - unit tests,
  - examples,
  - argument description.

- Create a tutorial help page in XML showing how to get started quickly with these functions.
- Create an XML help page with a gallery of graphics.
- Release v0.1 on ATOMS.
- Increase the set of functions from 6 to 12.
- Remove the duplicated functions from Stixbox, Nan-Toolbox, Distfun and other toolboxes.
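
The per-function checklist in the roadmap (argument checking, unit tests, examples, argument description) can be illustrated on a small helper. The sketch below is again in Python as neutral notation: `boxplot_stats` is a hypothetical name, and the quartile rule (linear interpolation between order statistics) is an assumption chosen for simplicity; a Matlab-compatible function would have to follow Matlab's own quantile convention.

```python
def boxplot_stats(data):
    """Five-number summary that a box-and-whiskers plot would draw.

    Argument description:
        data -- non-empty list or tuple of real numbers (int or float).
    Returns:
        dict with keys "min", "q1", "median", "q3", "max".
    """
    # Argument checking: fail early with an explicit message.
    if not isinstance(data, (list, tuple)):
        raise TypeError("data must be a list or tuple of numbers")
    if len(data) == 0:
        raise ValueError("data must not be empty")
    for value in data:
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("data must contain only real numbers")

    xs = sorted(data)

    def quantile(p):
        # Assumed convention: linear interpolation between order statistics.
        pos = p * (len(xs) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(xs) - 1)
        return xs[lo] + (pos - lo) * (xs[hi] - xs[lo])

    return {
        "min": xs[0],
        "q1": quantile(0.25),
        "median": quantile(0.5),
        "q3": quantile(0.75),
        "max": xs[-1],
    }


def run_unit_tests():
    """Unit tests: one nominal case and one error case."""
    expected = {"min": 1, "q1": 2, "median": 3, "q3": 4, "max": 5}
    assert boxplot_stats([5, 1, 4, 2, 3]) == expected
    try:
        boxplot_stats([])
    except ValueError:
        pass
    else:
        raise AssertionError("empty input should raise ValueError")


if __name__ == "__main__":
    run_unit_tests()
```

Keeping the checks, the documented arguments and the tests next to each draft function from the start makes it easier to compare the competing drafts from the existing toolboxes before removing the duplicates.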