1. GSoC 2017 - Machine Learning Toolbox in Scilab

1.1. Project Link

Machine Learning Toolbox in Scilab

1.2. Student and Mentors

Student Name -

Mandar Deshpande

Mentors -

Yann Debray
Philippe Saadé
Dhruv Khattar
Caio Souza

1.3. Introduction

This project aims to develop machine learning features in Scilab, which will be available to the end user as a toolbox or as direct function calls. The project has been divided into two parts:

All major machine learning algorithms will be implemented in Scilab code.
An integration approach will be followed to adapt the popular ML library TensorFlow for Scilab, using PIMS.

ML libraries to be adapted from Python to Scilab:

Scikit-learn

1.4. Community Bonding Period

(5th to 30th May, 2017)

Tasks	Description	Status
Getting to know my mentors and finalizing a rough starting point for the project	I was asked to revise all major machine learning models/algorithms and get clarity about their mathematical modeling	Done
Getting in-depth knowledge of the scikit-learn library	I went through the documentation and tutorials on the scikit-learn portal, and got hands-on experience with the various modules in sklearn by trying them on Kaggle datasets	Done
Getting acquainted with deep learning libraries like TensorFlow and Keras	Since a neural network module already exists in Scilab, the current need is a deep learning module/library implemented in Scilab. I studied how neural networks are built and modeled with the TensorFlow and Keras libraries for Python	Done
Scilab syntax and toolboxes	Here I had to make myself comfortable using Scilab for the entire development period, and learn how and where Scilab differs from MATLAB (which I am used to)	Done

1.5. Coding Period Begins

(30th May 2017)

The first task assigned to me was to study the existing machine learning toolboxes for Scilab, to get a clear understanding of what already exists and what needs to be developed.

Here are some of the modules currently available on ATOMS:

Artificial neural network toolbox (https://atoms.scilab.org/toolboxes/ANN_Toolbox)
Neural Network Module (https://atoms.scilab.org/toolboxes/neuralnetwork/2.0) - A Scilab neural network module which covers supervised and unsupervised training algorithms
Regression tools (https://atoms.scilab.org/toolboxes/regtools/0.42) - A toolbox for linear and nonlinear regression analysis
NaN-toolbox (https://atoms.scilab.org/toolboxes/nan/1.3.4) - A statistics and machine learning toolbox with a naive Bayes classifier and k-means clustering
libsvm and liblinear (https://atoms.scilab.org/toolboxes/libsvm) - Libraries for SVM and large-scale linear classification
Stixbox (https://atoms.scilab.org/toolboxes/stixbox) - For logistic regression
Clustering Toolbox (https://atoms.scilab.org/toolboxes/CLUSTER/3.2)
Fuzzy Logic (https://atoms.scilab.org/toolboxes/sciFLT/0.4.7)

1.6. Week 1-2 Report

(31st May 2017 - 14th June 2017)

Since there was little documentation on how to use these toolboxes, I studied the workings of the neural network module through material provided by Tritytech. I took the following 2 courses to get started with Scilab toolboxes:
1. Neural network Module
2. Artificial Intelligence
Once I was done, PIMS (a Python integration mechanism for Scilab) was selected as the integration approach to follow. I worked with PIMS for a week, getting to know its syntax and finally making it import the scikit-learn library in Scilab.
Most of the time was spent trying to use PIMS and scikit-learn to implement basic ML models in Scilab. I was able to successfully port linear regression to Scilab.
Link to the source code is here
After discussions with Philippe Saade and Yann Debray, it was decided to take up a more complex ML problem, to test for any issues that arise when using PIMS.
I tried to port this example to Scilab through PIMS.
All the issues I faced have been documented in this file.
On 14th June 2017, I had a detailed discussion with Simon Marchetto regarding possible methods to resolve the issues reported above. Further discussions about the final approach to follow are to be held tomorrow.

1.7. Week 3-4 Report

(15th June 2017 - 30th June 2017)

Due to the issues faced in using PIMS, and since PIMS would require wrapping as many Python functions as possible for individual use in Scilab (which would be impractical to maintain on a large scale), it was decided to follow a different approach: Jupyter server-client.
This approach allows native Python scripts to be used directly, without wrapping them for Scilab; only the returned objects (e.g. NumPy arrays) need to be converted.

Following are the important points of this approach:

It is essential to note that we are not planning to have an interactive Python environment within Scilab, as it would be infeasible to manage so many libraries and versions efficiently.
Python scripts will be written outside the Scilab interface, and will be called only when their outputs or trained machine learning models need to be used in a Scilab context.
This would involve the following steps:
1. Writing the required machine learning script and saving it as a '.py' file.
2. Sending the created ML script to the Python kernel running on the Jupyter server.
3. Once execution completes, passing Python objects such as a regression model back to Scilab and converting them into a Scilab-compatible form.
4. Using the converted object for any required operation in Scilab, such as solving a differential equation.
Two major parts of this approach are:
[1] Passing the Python script file to the Jupyter server, and/or letting the Python kernel know where this file exists.
[2] Converting Python objects into a Scilab-compatible form.

Part [1] involves passing the path of the script file to the Jupyter server so that the Python kernel can execute it. This can be achieved with Python code that transfers/copies the script file to the kernel's path. Whether we follow the PIMS approach or continue with this Jupyter server method, we will still need to handle part [2].
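The steps and the two parts above can be sketched in Python. This sketch depends on some deliberate assumptions and is not the toolbox's code. A local subprocess stands in for the remote Jupyter kernel, so no jupyter_client is used. The script name train_model.py, the helper names run_on_kernel and demo, and the JSON attributes file are all hypothetical. JSON is used only to keep the example dependency-free.

```python
import json
import os
import shutil
import subprocess
import sys
import tempfile

# Step 1: a hypothetical ML script, written outside Scilab and saved as '.py'.
# It fits y = a*x + b by least squares (stdlib only) and saves the learned
# attributes as JSON next to itself, i.e. in the "kernel" working directory.
ML_SCRIPT = '''
import json, os
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
n = len(xs)
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx
out = os.path.join(os.path.dirname(os.path.abspath(__file__)), "attributes.json")
with open(out, "w") as f:
    json.dump({"coef_": [a], "intercept_": b}, f)
'''


def run_on_kernel(script_path, kernel_dir):
    """Stand-in for the remote Jupyter kernel (a local subprocess, NOT jupyter_client).

    Part [1] / step 2: copy the script to where the kernel can find it and run it.
    Step 3: read back the attributes the script saved.
    """
    target = shutil.copy(script_path, kernel_dir)  # returns the copied file's path
    subprocess.run([sys.executable, target], check=True)
    with open(os.path.join(kernel_dir, "attributes.json")) as f:
        return json.load(f)


def demo():
    """Run steps 1-3 end to end in two temporary directories."""
    with tempfile.TemporaryDirectory() as local_dir:
        with tempfile.TemporaryDirectory() as kernel_dir:
            script = os.path.join(local_dir, "train_model.py")
            with open(script, "w") as f:
                f.write(ML_SCRIPT)
            return run_on_kernel(script, kernel_dir)


if __name__ == "__main__":
    # The returned dict is what part [2] would convert for Scilab.
    print(demo())  # {'coef_': [2.0], 'intercept_': 1.0}
```

In this sketch, only plain data crosses back from the "kernel". This mirrors the point above that only the returned objects, not the scripts, need converting for Scilab.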

An illustration of the Jupyter-client approach for machine learning in Scilab

After detailed discussions with Philippe Saade, it was decided to first transfer the coefficient arrays of the trained model to the Scilab side as matrices. Once converted, the coefficient matrix can be used for predicting test values or for other Scilab operations, such as solving a differential equation.
I took a simple linear regression example and trained the model on both the Scilab side (using PIMS) and the Python side.
I used matplotlib to plot predictions against the input data on the Python side, and used the coefficient matrix on the Scilab side to make a similar plot, with a user-defined 'predict' method for linear_model.
I compared the plots from the Python and Scilab sides, and they were exactly the same. See the attached images below:

Comparison of prediction results on Python and Scilab side
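The Python half of this comparison can be sketched without scikit-learn. The sketch assumes the learned attributes are a coefficient list and an intercept, as for a fitted linear model. The hypothetical helpers to_scilab_matrix and predict print those attributes as Scilab matrix literals. They also mirror the computation a user-defined Scilab 'predict' would perform, y = X*coef + intercept. The numbers are made up for illustration.

```python
def to_scilab_matrix(rows):
    """Render a list of rows as a Scilab matrix literal.

    Columns are separated by spaces and rows by ';'. repr() of a float is the
    shortest string that round-trips, so no precision is lost in transit.
    """
    return "[" + "; ".join(" ".join(repr(float(v)) for v in row) for row in rows) + "]"


def predict(X, coef, intercept):
    """y = X*coef + intercept, the same computation a user-defined Scilab
    'predict' function performs with the transferred coefficient matrix."""
    return [sum(x * c for x, c in zip(row, coef)) + intercept for row in X]


if __name__ == "__main__":
    coef, intercept = [2.0, -1.0], 0.5  # hypothetical learned attributes
    # Emit Scilab statements; coef becomes a column vector so X*coef is valid.
    print("coef = " + to_scilab_matrix([[c] for c in coef]) + ";")  # coef = [2.0; -1.0];
    print("intercept = " + repr(intercept) + ";")
    print("// Scilab side: y = X * coef + intercept")
    print(predict([[1.0, 1.0], [2.0, 0.0]], coef, intercept))  # [1.5, 4.5]
```

Emitting coef as a column vector keeps the Scilab expression X * coef dimensionally valid for an n-by-2 input matrix X.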

There are currently issues related to network port declaration in the jupyter_client script. I am working with Simon Marchetto to resolve them.
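Some background on the port issue helps here. A Jupyter kernel writes a JSON connection file listing the IP address, transport, HMAC signing key and five ports it listens on: shell, iopub, stdin, control and heartbeat. A client on another machine must be able to reach all five ports. The sketch below uses made-up values and hypothetical helper names (kernel_ports, client_copy). It reads such a file and prepares a client-side copy that points at the server. It assumes the kernel already listens on an address the client can reach. If the kernel listens only on 127.0.0.1, editing the client's copy is not enough, and the ports would have to be tunnelled instead (e.g. over SSH).

```python
import json

# The five ZeroMQ sockets of a Jupyter kernel, as named in its connection file.
PORT_KEYS = ("shell_port", "iopub_port", "stdin_port", "control_port", "hb_port")

# Hypothetical connection file contents (port numbers and key are made up).
EXAMPLE = json.dumps({
    "shell_port": 50001, "iopub_port": 50002, "stdin_port": 50003,
    "control_port": 50004, "hb_port": 50005,
    "ip": "127.0.0.1", "key": "not-a-real-key",
    "transport": "tcp", "signature_scheme": "hmac-sha256",
})


def kernel_ports(connection_text):
    """Ports the Scilab-side client must be able to reach (opened or forwarded)."""
    info = json.loads(connection_text)
    return {k: info[k] for k in PORT_KEYS}


def client_copy(connection_text, server_ip):
    """Connection info for the local machine: same ports, key and transport,
    but 'ip' points at the remote server instead of the loopback address."""
    info = json.loads(connection_text)
    info["ip"] = server_ip
    return json.dumps(info, indent=1)


if __name__ == "__main__":
    print(sorted(kernel_ports(EXAMPLE).values()))  # [50001, 50002, 50003, 50004, 50005]
    print(client_copy(EXAMPLE, "192.0.2.10"))  # 192.0.2.10 is a documentation-only address
```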

1.8. Week 5-6 Report

(1st July 2017 - 15th July 2017)

Daily Reports can be found here

To cover examples that can be demonstrated through the Jupyter approach, I have investigated the source code of the following models:
1. Linear Regression
2. Ridge Regression
3. Kernel Ridge Regression
4. SVR
5. KMeans Clustering
I have also written PIMS implementations of these models' examples, along with the original Python scripts. These have been committed to the forge at this link.
In the Python implementation, all of these models' attributes have been retrieved from the Jupyter server. To transfer the learned model attributes from the Jupyter server to the local machine, I have considered using the 'pickle' library to store the NumPy objects in a '.p' file. Here are a few rough steps to achieve this:
1. After training is complete, the attributes.p file is saved on the remote server.
2. The pickle file will be transferred to the local machine, at a location specified by the IP address and working directory of the local machine.
3. There, python_local.py will extract the attributes from the pickle file.
4. python_local.py will then send these attributes to Scilab.
So pickle will be able to handle steps 1 and 3 mentioned above.
So far, I have read the pickle documentation on saving and loading Python objects across different Python instances, and was able to demonstrate this for ML model attributes through the kernel_ridge regression example. I have committed the sample code here.
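Steps 1 and 3 can be sketched with the standard pickle module. The helper names save_attributes and load_attributes are hypothetical. Plain lists stand in for the NumPy arrays a real scikit-learn model would hold; dual_coef_ and X_fit_ are the attribute names used by scikit-learn's KernelRidge. Loading real NumPy arrays would require NumPy on the local side as well.

```python
import os
import pickle
import tempfile


def save_attributes(path, **attributes):
    """Step 1 (remote server): store learned attributes in a '.p' file.

    protocol=2 is the highest pickle protocol Python 2.7 can read, which helps
    if the server and the local machine run different Python versions.
    """
    with open(path, "wb") as f:
        pickle.dump(attributes, f, protocol=2)


def load_attributes(path):
    """Step 3 (local machine, e.g. inside python_local.py): load them back.

    Only unpickle files from a trusted source: unpickling can execute code.
    """
    with open(path, "rb") as f:
        return pickle.load(f)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as remote_dir:
        path = os.path.join(remote_dir, "attributes.p")
        # Hypothetical KernelRidge-style attributes; lists stand in for arrays.
        save_attributes(path, dual_coef_=[0.25, -0.75], X_fit_=[[0.0], [1.0]])
        print(load_attributes(path))
```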
For step 2, we need to use SSH to send the pickle file back to the local machine. For this, I was considering a Python package such as paramiko or fabric.
In a discussion with Philippe Saade about these protocols, "sshfs" was suggested for the case where both the client and server machines are Linux-based. In my case, however, I have one Windows and one Linux machine, so it would be better to use Samba on the server and allow folder sharing, as Philippe suggested. This would also suit general usage of Scilab by Windows and Linux users alike.
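With a Samba share mounted on the local machine, step 2 reduces to an ordinary file copy. The sketch below assumes the share is already mounted at some path. That path could be a UNC path such as \\mlserver\share on Windows or a mount point such as /mnt/mlshare on Linux; both paths are hypothetical. The helper name fetch_from_share is also invented for illustration. The file is copied under a temporary name and then renamed. This way the local reader (e.g. python_local.py) never picks up a half-copied pickle.

```python
import os
import shutil
import tempfile


def fetch_from_share(share_dir, local_dir, name="attributes.p"):
    """Step 2 over a shared folder: copy the pickle from the mounted share
    into the local working directory and return the local path."""
    src = os.path.join(share_dir, name)
    final = os.path.join(local_dir, name)
    partial = final + ".part"
    shutil.copy2(src, partial)  # copy contents (and timestamps) under a temp name
    os.replace(partial, final)  # then move it into place in one rename
    return final


if __name__ == "__main__":
    # Two temporary directories stand in for the mounted share and the local folder.
    with tempfile.TemporaryDirectory() as share:
        with tempfile.TemporaryDirectory() as local:
            with open(os.path.join(share, "attributes.p"), "wb") as f:
                f.write(b"pickled attributes would be here")
            print(fetch_from_share(share, local))
```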
I will be reading more about them, while also covering more scikit-learn classes for the Jupyter approach.

1.9. 16th July - 21st August 2017

Daily Reports can be found here

1.10. Source Code

Link

The Scilab forge is currently down, so I have created a copy of my repo on GitHub: https://github.com/mandroid6/machine-learning-Toolbox-SCILAB/blob/master/Final%20Submission/

1.11. Ideas/Direction To Work in Future

At present, machine learning can be implemented in Scilab using Python libraries, as supported by this toolbox. However, the steps involved may prove somewhat elaborate for a non-Python programmer. The current version of this machine learning toolbox is at an early stage, and will require several iterations before it can be finalized for the end user. Right now, the user can run Python scripts on a remote server, as planned earlier. However, this approach doesn't support multiple users working on the same server at the same time, because the IPython kernel on the remote machine offers a single common workspace.
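The multi-user limitation can be illustrated with plain Python namespaces. This is only an analogy and not how IPython works internally. One shared dictionary plays the role of the single kernel workspace, so two users' scripts overwrite each other's variables. One dictionary per user mimics the per-user isolation that a JupyterHub-style setup would provide. The helper names run_shared and run_isolated are hypothetical.

```python
# One shared namespace stands in for the single workspace of the remote kernel.
shared_workspace = {}


def run_shared(code):
    """Execute a user's script in the common workspace."""
    exec(code, shared_workspace)


# One namespace per user stands in for one kernel per user.
user_workspaces = {}


def run_isolated(user, code):
    """Execute a user's script in that user's own workspace and return it."""
    ns = user_workspaces.setdefault(user, {})
    exec(code, ns)
    return ns


if __name__ == "__main__":
    run_shared("model = 'ridge model trained by user A'")
    run_shared("model = 'SVR trained by user B'")
    print(shared_workspace["model"])  # user A's model has been overwritten
    a = run_isolated("A", "model = 'ridge model trained by user A'")
    b = run_isolated("B", "model = 'SVR trained by user B'")
    print(a["model"], "|", b["model"])  # both models survive
```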

Here are a few ideas which, if implemented later, would significantly improve ML in Scilab:

Making a JupyterHub implementation available within Scilab, but with non-interactive execution.
Automating the transfer of the connection_file corresponding to the kernel running on the remote server.
Adding support for Python 3, which would open doors to many other ML libraries unavailable under Python 2.7.
Opening Scilab to well-known ML libraries like TensorFlow (currently, execution is supported on the server side, but the learned models cannot be used on the local Scilab machine).
Trying out Jupyter kernels to run scripts in the R programming language, since it is widely used in the field of data science.
Support for inline visualizations within Scilab using libraries such as matplotlib, seaborn (Python) and ggplot2 (R).