Last report for Binary patching

Since my last report, I've spent most of my time working on the UpdaterThread class. This class is a singleton that owns a thread (Boost.Thread). The thread periodically checks for updates, asks the main thread for confirmation to download them, and then downloads them if permission is given.

The sci_check_for_updates and sci_download_updates functions can now either do their work on the current thread or delegate it to the updater thread. In both cases they remain synchronous operations (they block, waiting for the response).

In addition, I've added sci_start_periodical_updates_checker and sci_stop_periodical_updates_checker, which spawn and close the updater thread respectively.

Here is pseudocode for how the updater thread works:

START:

check for updates
if there are updates, go to HAS_UPDATES
otherwise, wait for 30 hours or until main thread asks us to check for updates again
go back to START

HAS_UPDATES:

if configured, notify the main thread that we're done checking for updates, then wait until the main thread allows us to resume (we only wait if configured to)
if configured to expect confirmation, ask the main thread for confirmation to download the updates
if confirmation is not given, exit the thread
download the updates and record them in the unapplied_updates file
if configured, report to the main thread that we're done downloading updates
exit

You may see "if configured" repeated a lot above. The main thread can configure the way the updater thread works in a few ways, which makes the thread easier to drive internally. This is also useful for user configuration, for example depending on whether a user wants to be asked before updates are downloaded.

I've also created sci_get_updates, which is more high-level and works in a non-blocking way. It simply starts the updater thread (if it's not started already) and forces it to check for updates and download them without asking for any confirmation.
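To make the control flow a bit more concrete, here is a minimal sketch of the loop above in Python. It is only an illustration, not the actual C++ code: the class name, the callbacks and the log list are all hypothetical, and the optional "notify the main thread" steps are left out.

```python
import threading


class UpdaterSketch:
    """Hypothetical stand-in for the updater thread; all names are illustrative."""

    def __init__(self, check, download, ask_confirmation=None,
                 wait_seconds=30 * 3600):
        self.check = check                        # returns a list of updates
        self.download = download                  # downloads the given updates
        self.ask_confirmation = ask_confirmation  # None means "do not ask"
        self.wait_seconds = wait_seconds
        self.wake = threading.Event()   # set by the main thread to force a check
        self.stop = threading.Event()   # set by the main thread to end the loop
        self.log = []                   # plays the role of the debug log file

    def run(self):
        # START: keep checking until updates are found or we are stopped.
        updates = []
        while not updates:
            if self.stop.is_set():
                self.log.append("stopped")
                return
            updates = self.check()
            if not updates:
                self.log.append("waiting")
                # Sleep until the timeout expires or the main thread wakes us.
                self.wake.wait(self.wait_seconds)
                self.wake.clear()
        # HAS_UPDATES: optionally ask for confirmation, then download.
        self.log.append("found %d update(s)" % len(updates))
        if self.ask_confirmation is not None and not self.ask_confirmation(updates):
            self.log.append("download refused")
            return
        self.download(updates)
        self.log.append("downloaded")
```

The main thread would run this with threading.Thread(target=updater.run) and call updater.wake.set() to force an immediate check. Passing ask_confirmation=None corresponds to the "download without asking" configuration.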

The updater thread has built-in debugging capability, so if enabled, every step that the updater thread performs can be observed in a log file. This helped me track down bugs and deadlocks. I have used a lot of mutexes and condition variables, and that's a place where things can so easily go wrong :).

After I finished writing the updater thread, I started working on cross-platform support. Some features were not implemented for POSIX, some were implemented but buggy, and some of the code didn't even compile with gcc.

However, after some headaches I've managed to get it working on Linux. I couldn't test it on OS X because I don't own a Mac (and I haven't had time to try to install a VM), but my assumption is that there will be no problems there. For non-Windows platforms I have only used POSIX functions, which are also available on OS X, so I doubt there will be any issues.

Relying on Boost libraries

Although I initially tried not to rely on Boost because I wanted to avoid the extra dependency for the toolbox, I realized after a while that I didn't have much of a choice if I wanted to get things done in a reasonable amount of time.

First of all, Boost.Thread saved me a lot of trouble. Writing a wrapper that works the same way with both pthreads and WinAPI threads would have taken quite some time. Boost.Thread is the only Boost dependency which is not header-only (i.e. you need to link to a library).

Secondly, I've used Boost.Variant, which is a replacement for my IntOrString type (capable of storing either an int or a string). I initially thought I could easily implement such a type myself but, as it turns out, my code wasn't standard-compliant (the memory alignment required by string was not known, so I couldn't guarantee it). Therefore, I gave up and used this header-only Boost library instead.

An example on how to manually test the code

Probably the best way of understanding how things work is to see them at work, so here is a series of instructions for Linux on how to install everything needed for the code. We will try to update scilab-5.3.2 to scilab-5.3.3.

Clone my git repo into ~/binary-patch:

git clone git@git.forge.scilab.org:binary-patch.git ~/binary-patch

Extract scilab-5.3.2 (64bit) into ~/scilab-5.3.2.

Extract scilab-5.3.3 (64bit) into ~/scilab-5.3.3.

Install bsdiff (I have taken the version from Mozilla Firefox, which is also available in the repo now). After installing it, go to ~/binary-patch/server/updaters/init.py and configure the path to bsdiff. Also, make sure you have a fairly recent version of Python installed (both 2.6 and 2.7 worked fine for me).

Create a patch between scilab-5.3.2 and scilab-5.3.3:

python ~/binary-patch/server/_patch_creator.py -f ~/scilab-5.3.2/ \
-t ~/scilab-5.3.3/ -o ~/my_first_patch \
-a bsdiff -P l64 -w

The flags have the following meanings:

-f  the source (old) scilab directory
-t  the destination (new) scilab directory
-o  the directory the patch is written to
-a  the patching algorithm to use
-P  the platform (see --help for possible values)
-w  do the actual work rather than just a preview (you can try it without -w first to see what it looks like in preview mode)

Now, the first time you try this command, it will fail. The error message will tell you that a certain module has changed between scilab 5.3.2 and scilab 5.3.3 without its version changing. Normally, when a module does not have a version.xml file, it is considered to have the same version as scilab. But when a module does have a version file, the version is taken from that XML.

It just happens that some modules have version 1.0.0 in both scilab 5.3.2 and scilab 5.3.3. In other words, the modules were changed, but their version numbers were not updated. In order to easily fix this, do:

rm -f ~/scilab-5.3.3/share/scilab/modules/modulename/version.xml

Replace modulename with the module you're having problems with. This removes the version.xml file, so the module in 5.3.3 takes the same version as scilab, which is newer than 1.0.0. That gets rid of the error.
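The check that triggers this error can be sketched roughly as follows. This is a simplified illustration of the rule described above, not the code in _patch_creator.py: the function names, the dictionaries and the use of a content digest to detect changes are all assumptions of mine.

```python
def effective_version(module_version, scilab_version):
    """A module without its own version (None) inherits scilab's version."""
    return scilab_version if module_version is None else module_version


def unversioned_changes(old, new, old_scilab, new_scilab):
    """Return modules whose content changed while their version did not.

    `old` and `new` map a module name to a pair
    (own_version_or_None, content_digest). Both are hypothetical inputs.
    """
    conflicts = []
    for name in sorted(set(old) & set(new)):
        old_version, old_digest = old[name]
        new_version, new_digest = new[name]
        same_version = (effective_version(old_version, old_scilab)
                        == effective_version(new_version, new_scilab))
        if same_version and old_digest != new_digest:
            conflicts.append(name)
    return conflicts
```

With this rule, a module that ships a version.xml saying 1.0.0 on both sides is reported as a conflict when its files differ. Once its version.xml is removed from the 5.3.3 tree, it inherits 5.3.3 and the conflict disappears.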

Now that you've fixed this problem, you have to tell the patch creator to resume its work. You can do this using the -c (continue) flag. You MUST keep all the flags that you previously used. So the command you have to issue is:

python ~/binary-patch/server/_patch_creator.py -f ~/scilab-5.3.2/ \
-t ~/scilab-5.3.3/ -o ~/my_first_patch \
-a bsdiff -P l64 -w -c

You may have to repeat the procedure described above a few times, because there are more modules which have changed but have the same version.

After you're done with all that, _patch_creator.py should have finished creating a patch in your ~/my_first_patch.

In order to get a working server, you now have to install a mysql server. After installing it, create the database for the updater (called scilab_updater):

mysql -h localhost -u <user> -p < ~/binary-patch/server/database.sql

You can now go ahead and configure the server to work with your mysql server by modifying ~/binary-patch/server/docroot/serverconfig.py.

Once you've done that, you have to insert the patch that you've just created into the database:

python ~/binary-patch/server/patch_into_database.py -p ~/my_first_patch/

Now, you have to set up a webserver that can serve requests from clients. Install Apache and mod_wsgi (mod_wsgi lets you use Python for server-side scripting). Enable mod_userdir and add the following directive to the Apache configuration:

WSGIScriptAlias /~youruser /home/youruser/public_html/handler.py

Create public_html in your home directory as a symbolic link to the server's docroot:

ln -s ~/binary-patch/server/docroot ~/public_html

At this point, you should have a webserver set up, capable of serving an update from 5.3.2 to 5.3.3.

You can now go to ~/binary-patch/client-toolbox and grep the .cpp files for occurrences of "http://127.0.0.1". You will find two of them, one "http://127.0.0.1/check_for_updates" and one "http://127.0.0.1/download_updates".

You will have to replace those with "http://127.0.0.1/~youruser/check_for_updates" and "http://127.0.0.1/~youruser/download_updates" respectively.

Once you've done that, you can go ahead and try to compile the toolbox. Please note that you cannot use scilab-5.3.2 to compile the toolbox because it lacks some files.

You will have to get scilab's source code from scilab's git (you surely already have it). I first checked out tag origin/5.3.3 and then compiled scilab. After you compile scilab (or obtain, in whatever way you prefer, a scilab that has all the files necessary for the toolbox to compile), you can go to ~/binary-patch/client-toolbox/updater/config.sce.

Configure those few settings correctly. If you don't have boost, you will have to download and build it too (or get a package if you want). Configure the path to boost in config.sce, the platform and the compiler, and then start scilab-git.

After starting it, execute:

exec('~/binary-patch/updater/builder.sce')

Provided you've done everything correctly and I haven't made a mistake in these instructions, you should be able to compile the toolbox at this point.

Once it has compiled, EXIT scilab-git, start the old ~/scilab-5.3.2 and execute:

exec('~/binary-patch/updater/loader.sce')

If you're lucky, everything worked fine and you can now use the toolbox. If you now type check_for_updates, you should get a list of all the modules that will be updated, along with their current and new versions.

Afterward, you can run download_updates, which will connect to the server, extract the update to a temporary directory, and record that directory in the unapplied_updates file.

Applying the updates

Although the program which applies the updates is available, it is not currently started automatically by scilab. sci_apply_downloaded_updates sets a flag that marks the updates in unapplied_updates as ready to be applied.

That flag should be checked on scilab startup. If it is set, scilab should stop and start scilab_update_applier instead. However, scilab currently doesn't check for that flag (I haven't had time to make that happen).

If you would like to see this program apply some updates, go to ~/binary-patch/client-toolbox/src/patch_applier_app and type:

make BOOST_DIR=path_to_boost_dir/

Final details on how the updater works

I won't get into a lot of detail here. I have explained the internal functionality of the server and the client throughout my reports, and repeating myself over and over could get annoying. You can find more information in my midterm report and my SEP. However, my SEP was written at the very beginning, and many things have changed since then (but many of the ideas are the same). I am planning to update my SEP soon. Also, please note that the roadmap written in the midterm page has not been updated since midterm. That said, here's how things work (in short):

The toolbox (client side)

The toolbox periodically connects to the updater server using a separate thread and checks for updates. If any updates are available, it downloads them and extracts them to a temporary folder, which is then recorded in the unapplied_updates file. The separate thread starts and does work only when there aren't any unapplied updates reported in the unapplied_updates file.

The toolbox has a list of installed patching algorithms (such as bsdiff, text diff, send whole file, delete file etc.), which are reported to the server whenever it downloads updates. The server then sends a patch which only uses the patching algorithms available to the user.
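As a rough illustration of this negotiation, the server-side choice for a single file could look like the sketch below. The function name and the input format are hypothetical, and picking the smallest usable patch is my assumption for the example. The text above only requires that the chosen algorithm is one the client has installed.

```python
def choose_algorithm(candidates, client_algorithms):
    """Pick a patch for one file among those the client can apply.

    `candidates` maps an algorithm name to the size in bytes of the patch
    it produced for this file (hypothetical input format).
    """
    usable = {name: size for name, size in candidates.items()
              if name in client_algorithms}
    if not usable:
        raise ValueError("no patching algorithm in common with the client")
    # Assumption: prefer the smallest patch; ties are broken by name.
    return min(usable, key=lambda name: (usable[name], name))
```

For instance, a client that only reports bsdiff and whole-file replacement would never be sent a courgette patch, even if that patch happened to be smaller.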

The patch applier (client side)

This application should normally be started by scilab. It takes the entry in the unapplied_updates file and applies every patch found in that update's temporary directory (the rules for applying them are described in a special file called updates.txt, which is part of the directory). Resuming is supported: a store.txt file records exactly how much of the patch has been applied, so if the application closes unexpectedly, it can resume from where it stopped.
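The resuming idea can be sketched like this. It is a minimal illustration under my own assumptions: here store.txt holds just the number of completed steps, and the function and parameter names are hypothetical. The real store.txt format may well be different.

```python
import os


def apply_patch(steps, store_path, apply_step):
    """Apply `steps` in order, recording progress so a rerun can resume.

    Returns the number of steps applied during this call.
    """
    done = 0
    if os.path.exists(store_path):
        with open(store_path) as store:
            done = int(store.read().strip() or 0)
    for index in range(done, len(steps)):
        apply_step(steps[index])
        # Record progress only after the step has succeeded.
        with open(store_path, "w") as store:
            store.write(str(index + 1))
    return len(steps) - done
```

Note that a crash in the middle of a step means that step is attempted again on the next run, so in a scheme like this each step has to be safe to repeat (for example by writing to a temporary file and renaming it at the end).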

The server

The /check_for_updates script expects a list of modules and their version from the client, as well as the version of scilab. It then reports back which modules need to be updated and to which versions.
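The core of that comparison might look like the following sketch. The function names and the dictionary-based inputs are hypothetical, and I am assuming here that the table of latest versions has already been selected for the client's scilab version.

```python
def parse_version(text):
    """Turn '5.3.3' into (5, 3, 3) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def check_for_updates(installed, latest):
    """Return {module: (installed_version, latest_version)} for outdated modules.

    `installed` is what the client reports; `latest` is what the server
    knows about. Modules unknown to the server are left alone.
    """
    updates = {}
    for module, current in installed.items():
        newest = latest.get(module)
        if newest is not None and parse_version(newest) > parse_version(current):
            updates[module] = (current, newest)
    return updates
```

Comparing versions as tuples of integers rather than as strings matters once a component reaches two digits: as strings, "5.3.10" would sort before "5.3.9".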

The /download_updates script expects a list of modules, along with their old and expected new versions. It sends back a .tar.bz2 file containing the patches to be applied and an updates.txt file describing how to apply each of them.
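Building such an archive in Python can be done with the standard tarfile module. The sketch below is only an illustration: the function name, the in-memory inputs and the contents of updates.txt are made up for the example and do not reflect the real file format.

```python
import io
import tarfile


def build_update_archive(patches, instructions):
    """Return the bytes of a .tar.bz2 holding the patches plus updates.txt.

    `patches` maps an archive member name to the patch bytes;
    `instructions` is the text to store as updates.txt.
    """
    members = dict(patches)
    members["updates.txt"] = instructions.encode("utf-8")
    buffer = io.BytesIO()
    with tarfile.open(fileobj=buffer, mode="w:bz2") as archive:
        for name in sorted(members):
            info = tarfile.TarInfo(name=name)
            info.size = len(members[name])
            archive.addfile(info, io.BytesIO(members[name]))
    return buffer.getvalue()
```

Because compressing a large archive takes a while, the bytes returned by a function like this are a natural thing to cache per (old version, new version) pair.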

Documentation

In the first version of binary-patch, the user doesn't really have to know much, so not a lot of documentation is needed. Binary-patch should come precompiled (because I doubt it will be possible to compile it using only the toolbox tools). I have written some user documentation for my sci_* functions, but it's pretty old and I would like to rewrite that before posting it.

As for technical documentation, most of it is in the source code. I've tried to write the source code as cleanly as possible (long identifier names, no more than 80 characters per line). However, it could still use more comments around some functions, and I'm willing to do that too.

Is this ready to be released? 0.1 or 1.0

Although I've managed to do what is, IMO, a decent amount of work, I cannot claim that binary-patch is ready to be released yet. Below is a list of ideas and features that I would like implemented before the first release.

Planning to continue

I am expecting feedback from the developers. I have enjoyed working on this project, and it has been really challenging to understand the internals of such an immense project written over so many years. In fact, I think the hardest part was getting used to the large codebase and the environment. I am willing to continue working on this toolbox until it is ready to be released (and to maintain it afterward). Here are a few things that should be in binary-patch but aren't:

Check for the flag in unapplied_updates automatically on scilab startup, so updates can be applied without having to call scilab_updates_applier manually. Scilab should refuse to start in this situation. [easy]
Interaction with the graphical interface. Unfortunately, I did not have the time to make this happen. I would like the user to be asked for confirmation when downloading updates, and to be able to configure whether they want to be asked or not. I would also like the user to be asked for confirmation before an update is applied (if configured so, of course). The updater thread is already capable of doing such things, but I didn't find a way to make it interact with the GUI thread. [difficult]
Support download resuming. Updates can be large in size, and scilab should be able to resume a download in case of failure. [easy to medium]
Better caching and optimization for the web server. Creating a big .tar.bz2 takes quite some time. [easy]
Although I have worked on statistical comparisons, I haven't had a chance to put those results to good use (I'm still using bsdiff mostly). Attach courgette and patchapi to the patching algorithms manager. [easy]
A quite subtle and unlikely scenario: if the same user has the same version of scilab installed more than once, the installations will share the same appdata (~/.Scilab/version, on Linux), and therefore the same unapplied_updates. This can cause problems when an update is applied to one of them but not to the other. Still, this is not a priority. Such a scenario probably causes bigger problems than this.

My priorities for the first release of this project are the first four of the things listed above. For the second release of the project, we should allow the user to pick which modules they want updated.

The parts that took me longest were the threading and getting used to working in scilab's environment. I don't think my subject would normally be that difficult, but making automatic updates for scilab is certainly not as easy as doing it for a simple, small application.

I realize that I've delivered somewhat less work than I initially expected, but I think I did well for the given amount of time: getting used to scilab, writing scripts for generating statistics on various patching algorithms, writing scripts for generating dependency graphs (graphviz) between Visual Studio projects and between DLL files, writing the PHP updater server (which I dropped), writing the Python scripts for creating updates and for putting them into the database, designing the database, writing the updater server in Python, and writing the cross-platform updater toolbox and the update applier application.