12.07 report - midterm evaluations -
Over the past two weeks (almost), I've worked longer hours and dedicated more time to GSoC, because I'm hoping to take a few days off for vacation.
Here's a list of things that I've achieved lately:
- I have rewritten the patch creator script, as will be described below.
- I have given up on PHP for the updater server and instead moved to Python too (using Apache's mod_wsgi module). PHP did not fulfill my needs.
- I have written the /check_for_updates and /download_updates pages for the web server; details are below.
- I have fixed more bugs in the toolbox for check_for_updates and made some code design changes. I have also written the code for download_updates. More details about how everything works are below.
Components
I have moved from the notion of module to the notion of component (in the updater). This makes it easier to divide Scilab into parts that need to be updated. A component is described by its name, its version and its path (or list of paths) relative to the SCI directory.
Currently, we have a component for each module, example:
- core -> modules/core/
- output_stream -> modules/output_stream/*
- ...
In order to cover all of Scilab, we also have a component called scilab, which contains all files that are not in any other component. This currently means that it contains all files that are not part of any module (such as bin/*).
The version of the scilab component is that of Scilab and the version of any other component is the one from version.xml. If version.xml doesn't exist or can't be parsed, then the version of that component is that of scilab.
When patching between SCI_OLD and SCI_NEW, a component that does not exist in SCI_OLD but exists in SCI_NEW is given version 0.0.0.0 in SCI_OLD. This rule does not apply in the other direction, when a component exists in SCI_OLD but doesn't exist in SCI_NEW.
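The version rules above can be sketched in a few lines of Python. This is only an illustration, not the updater's actual code: the `Component` class, the function names and the version numbers in the usage note are hypothetical, and reading version.xml is left out because its format is not described here.

```python
from dataclasses import dataclass

# Version assigned to a component that is absent from SCI_OLD.
MISSING_VERSION = "0.0.0.0"


@dataclass(frozen=True)
class Component:
    """Hypothetical record: name, version and paths relative to SCI."""
    name: str
    version: str
    paths: tuple


def effective_version(module_version, scilab_version):
    """Fallback rule: without a usable version.xml (None here),
    the component takes the version of the scilab component."""
    return module_version if module_version is not None else scilab_version


def old_version(old_components, name):
    """Version of `name` in SCI_OLD, or 0.0.0.0 if it does not exist there."""
    component = old_components.get(name)
    return component.version if component is not None else MISSING_VERSION
```

For example, with `old = {"core": Component("core", "5.3.0.0", ("modules/core/",))}`, calling `old_version(old, "core")` gives the stored version, while a component that only exists in SCI_NEW gets `"0.0.0.0"`.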
How things work
The design of the database of the updater server
Currently, the database uses the following tables:
versions - Keeps track of possible version numbers
components - Keeps track of the existing components, their names and a colon-separated list of their paths
platforms - A list of platforms, their short and long names
algorithms - A list of binary diff-ing algorithms and their names
updates - This table has a list of updates, where each row in the table means: There's an update in component X from version A to B, on platform P
patches - This table lists, for each update, the files that need to be patched (In update X [entry from table updates], file Y [relative to SCI/] needs to be patched)
concrete_patches - This table lists, for each entry from table patches the algorithms that are available, the location on the server's hard-disk for each of the algorithms and the size of the patch.
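To make the relationships between these tables easier to follow, here is a rough sketch of how they could be declared. Only the table names come from the list above. The column names, the foreign keys and the choice of SQLite (through Python's standard sqlite3 module) are assumptions made for illustration, and the real schema may differ.

```python
import sqlite3

# Table names follow the list above; every column name is hypothetical.
SCHEMA = """
CREATE TABLE versions   (id INTEGER PRIMARY KEY, number TEXT UNIQUE);
CREATE TABLE components (id INTEGER PRIMARY KEY, name TEXT UNIQUE,
                         paths TEXT);  -- colon-separated list of paths
CREATE TABLE platforms  (id INTEGER PRIMARY KEY, short_name TEXT, long_name TEXT);
CREATE TABLE algorithms (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE updates (
    id              INTEGER PRIMARY KEY,
    component_id    INTEGER REFERENCES components(id),
    from_version_id INTEGER REFERENCES versions(id),
    to_version_id   INTEGER REFERENCES versions(id),
    platform_id     INTEGER REFERENCES platforms(id)
);
CREATE TABLE patches (
    id        INTEGER PRIMARY KEY,
    update_id INTEGER REFERENCES updates(id),
    filename  TEXT  -- relative to SCI/
);
CREATE TABLE concrete_patches (
    patch_id     INTEGER REFERENCES patches(id),
    algorithm_id INTEGER REFERENCES algorithms(id),
    location     TEXT,    -- where the patch is stored on the server
    size         INTEGER  -- size of the patch
);
"""


def create_database(path=":memory:"):
    """Create an empty database with the sketched schema."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn
```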
The /check_for_updates webpage
This is the webpage that the scilab updater toolbox connects to when it wants to see what updates are available.
This webpage takes the following information as input (from the scilab client):
- The complete list of components that the client has, as well as the versions
- The platform that the client is on
In response, this webpage provides a list of all the components that have updates and their newest updatable version.
This is done by taking each component, and looking for updates for the user's platform. The updates are "chained" together, so if there's an update from 1.0.0.1 to 1.0.0.2 and an update from 1.0.0.2 to 1.0.3.1 then the webpage reports the new version to be 1.0.3.1, even if the user's version is 1.0.0.1.
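The chaining step can be illustrated with a small sketch. It assumes that the updates of one component on one platform have already been loaded into a dictionary mapping each old version to its new version, with at most one update per old version. The function name is hypothetical and this is not the server's actual code.

```python
def newest_reachable_version(current, updates):
    """Follow chained updates starting from `current`.

    `updates` maps an old version to the new version of one update
    (one component, one platform).
    """
    seen = {current}
    while current in updates:
        current = updates[current]
        if current in seen:  # guard against a cyclic chain in the data
            break
        seen.add(current)
    return current
```

With `updates = {"1.0.0.1": "1.0.0.2", "1.0.0.2": "1.0.3.1"}`, a client on 1.0.0.1 is told about 1.0.3.1, and a client whose version has no outgoing update gets its own version back.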
The /download_updates webpage
This is the webpage that the scilab updater toolbox connects to when it wants to download updates for certain components.
This webpage takes the following information as input (from the scilab client):
- A list of triplets of the form: (component, old version of component, new version of component). This is the exact output of the /check_for_updates webpage
- A list of algorithms that the client has available (for binary diff-ing)
- The platform of the client
The webpage outputs a .tar file corresponding to the patch.
Similarly, the download_updates webpage takes each component that the client requested to be updated and picks all the patches from the database (the concrete_patches table) that are for the client's platform and support one of the client's algorithms (for each patch, it picks the one with the minimum size).
The webpage then writes all the patches into an updates.txt file, in the format algorithm_id:filename:patch_filename_on_the_archive. The filename is relative to the SCI folder, for example: 4:modules/cacsd/macros/bloc2ss.sci:bloc2ss-9414.sci.
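The selection rule described above (keep only the algorithms the client supports, then take the smallest patch for each file) could look roughly like this. The tuple layout and the function name are assumptions for illustration, and the rows are assumed to be already filtered by platform.

```python
def pick_smallest_patches(concrete_patches, client_algorithms):
    """Pick, for each file, the smallest patch the client can apply.

    `concrete_patches` is an iterable of hypothetical
    (filename, algorithm_id, location, size) tuples.
    Returns {filename: (algorithm_id, location, size)}.
    """
    best = {}
    for filename, algorithm_id, location, size in concrete_patches:
        if algorithm_id not in client_algorithms:
            continue  # the client cannot apply this kind of patch
        if filename not in best or size < best[filename][2]:
            best[filename] = (algorithm_id, location, size)
    return best
```

Files for which none of the client's algorithms is available are simply absent from the result, so the caller can detect them and decide what to do.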
All the patch files, including the updates.txt are now TAR-ed on-the-fly (in memory) and then streamed to the client.
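As an illustration of the last two steps, here is a minimal sketch that writes the updates.txt lines and builds the archive in memory with Python's standard tarfile module. The function name and the input tuple layout are hypothetical, and the one-entry-per-line layout of updates.txt is an assumption. Only the algorithm_id:filename:patch_filename_on_the_archive format comes from the description above.

```python
import io
import tarfile


def build_update_archive(patches):
    """Build the .tar archive in memory and return its bytes.

    `patches` is a list of hypothetical
    (algorithm_id, filename, archive_name, data) tuples, where `data`
    holds the patch contents as bytes.
    """
    # One updates.txt entry per patch (newline-separated: an assumption).
    lines = ["%d:%s:%s" % (alg, filename, archive_name)
             for alg, filename, archive_name, _data in patches]
    manifest = ("\n".join(lines) + "\n").encode("utf-8")

    members = [("updates.txt", manifest)]
    members += [(archive_name, data) for _alg, _fn, archive_name, data in patches]

    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w") as tar:
        for name, data in members:
            info = tarfile.TarInfo(name=name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buffer.getvalue()
```

Because the whole archive lives in one buffer, memory use grows with the size of the update, which matches the memory problem listed under "Things that need to be done".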
How to create an update between two versions of Scilab
In order to create an update from one version of Scilab to another, you will have to use the _patch_creator.py script. First of all, this script stores an internal list of all the components and their paths relative to the SCI/ directory.
Whenever you run _patch_creator.py, it will go through the list of components, and find differences in the tree structure for each component (this means new files, deleted files or modified text/binary files). Whenever this tree diff is performed, a few other factors are taken into consideration:
- the files listed in SCI/.updateignore are not tracked (global ignores)
- the files listed in SCI/path_to_current_component/.updateignore are not tracked (local ignores)
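The tree diff with ignores could be sketched as follows. This is not the actual script: the function names are hypothetical, and since the format of .updateignore is not described here, the sketch simply takes an already-parsed set of ignored relative paths.

```python
import filecmp
import os


def list_files(root, ignored=frozenset()):
    """Return the set of file paths under `root`, relative to it,
    using "/" as separator and skipping ignored paths."""
    found = set()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            rel = os.path.relpath(os.path.join(dirpath, name), root)
            rel = rel.replace(os.sep, "/")
            if rel not in ignored:
                found.add(rel)
    return found


def diff_trees(old_root, new_root, ignored=frozenset()):
    """Classify files as added, deleted or modified between two trees."""
    old_files = list_files(old_root, ignored)
    new_files = list_files(new_root, ignored)
    modified = {
        rel for rel in old_files & new_files
        # shallow=False forces a byte-by-byte comparison of the contents
        if not filecmp.cmp(os.path.join(old_root, rel),
                           os.path.join(new_root, rel), shallow=False)
    }
    return {
        "added": new_files - old_files,
        "deleted": old_files - new_files,
        "modified": modified,
    }
```

Running `diff_trees` once per component path, with the union of the global and local ignores, would give the per-component list of changes that the script prints.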
By default, _patch_creator.py does not create patches, but it merely prints a list of what changes were found and in which component. This allows you to do a few things before actually creating the update, such as selecting only a few components for the update.
It also allows viewing which files will be updated but shouldn't be (such as a text file that should only exist locally). Those can be removed by appending to the right .updateignore file.
Moreover, since patch_creator.py relies on version numbers (for modules it's version.xml, for scilab it's the version of scilab), patch_creator.py will warn you if a component has changes but the version number does not differ.
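The version check behind this warning is simple enough to sketch. The function name and the shape of its inputs are hypothetical: `changes` maps a component name to the list of files that differ, and the two other dictionaries map component names to their versions in the old and new trees.

```python
def components_missing_version_bump(changes, old_versions, new_versions):
    """Return the names of components that have changed files
    but the same version number in both trees."""
    return sorted(
        name
        for name, changed_files in changes.items()
        if changed_files and old_versions.get(name) == new_versions.get(name)
    )
```

Each name returned would correspond to one warning, to be fixed by bumping the component's version before creating the patch.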
After all warnings have been fixed, _patch_creator.py -w will actually create the patch between two given scilab directories and into a given output directory (default settings can be made in patch_creator_config.py).
If everything has been successful, then the output directory now contains all the patch files, along with a text file called track.txt containing information about all the patches: what component they belong to, between which versions the patch is made, etc.
That output directory can now be used as input for the next Python script, called patch_into_database.py. This script reads the track.txt file and fills the database (the one described above) with all the updates.
As you can see, creating a patch for all platforms will probably take a long time to do manually, which is why I'm considering somehow using Scilab's buildbot to also build updates for all platforms.
Things that need to be done
- /download_updates doesn't do any caching on the database yet, so the server will be under high load when many people download a new update at the same time.
- Moreover, the Python script for /download_updates currently generates the whole .tar file before sending it, which means that it requires a lot of memory. Generating a big .tar file takes a long time, and it shouldn't be done more than once for the same update.
- The check_for_updates/download_updates scilab commands don't run on a separate thread/process yet, which is problematic for performing automatic updates in the background. These commands are also not embedded in the GUI yet. It would help if the user had a "check for updates" button.
- The /download_updates webpage does not perform dependency checks yet, so scilab won't get properly updated if the client asks to update only some modules without updating their dependencies.
- I'm developing on Windows, although I'm not using any Windows-specific features. I need to make the toolbox compile and work on Linux afterward too.
- Also, there is currently no code to check whether a Scilab binary is the official build (although we could fix that easily by compiling the official builds with a special -DSCILAB_IS_OFFICIAL_BUILD compile flag).
- sci_download_updates is written although not well tested, and sci_apply_updates is not written yet (parts of it are written but they are not put together).
Roadmap
Although most of the roadmap is described above, here's a more graphical view of it.
| Task | Status | Observations |
|---|---|---|
| Gather information about available binary patching algorithms | Done | |
| Create the updater toolbox skeleton | Done | |
| Create scripts that generate patches using all of the tested binary patching algorithms | Done | |
| Create a script that tests the algorithms between various versions and outputs to .csv | Done | |
| Based on the generated statistics, pick the appropriate algorithm for each platform | Still deciding | |
| Write libcurl easy-to-use C++ wrapper | Done | |
| Design the database for the updater server | Done | |
| Write a script that can put updates into the database | Done | |
| Write a script that generates dependencies between Scilab modules and scilab .dll-s | Done | |
| Write the /check_for_updates webpage | Almost done | Do some caching |
| Write the /download_updates webpage | In progress | Technically it works, but it creates the .tar file in memory every time and Python often runs out of memory before it finishes creating the .tar file. Need to add caching and tweak the memory. Also, it doesn't do dependency checking. |
| Caching | Not done | |
| Write sci_check_for_updates | Almost done | Doesn't yet detect platform correctly, doesn't use threads, not tested on Linux/Mac OS X |
| Write sci_download_updates | In progress | Code is written but not tested, doesn't use threads, not tested on Linux/Mac OS X. Also, doesn't allow to only update some modules yet (though server allows it). |
| Write sci_apply_downloaded_updates | In progress | Pieces of code exist, but are not put together. Same problems as above |
| Periodical automatic updater and interprocess communication between scilab and automatic updater | Not done | |
| Interaction with the graphical interface of scilab | Not done | |
| Make the updater script interact with scilab's buildbot | Not done | |
| Fix some cross-DLL bugs and more testing | Not done | |
| Tweak it so it works on Linux and Mac OS X | Not done | |