[Contents] [TitleIndex] [WordIndex

Contributor - Binary patch (GSoC 2011)

Description

The purpose of this external module is to perform automatic updates for Scilab by querying a remote server for new versions of each of its installed modules.

Implementation

As a starting point, multiple binary patching algorithms are being selected for verification. Afterwards, they will each be used to generate patches between various versions of scilab and statistical data will be gathered.

This data will indicate exactly which algorithm has performed best in which situations (e.g. average of patch size, how often did it perform best/worst, statistics on specific file extensions (e.g. *.dll) etc.).

Currently, a few of the updaters that are being researched include bspatch, xdelta, patchapi, msdelta, update engine, exediff, courgette (a more detailed description of each is available in the links repo). Note that all of the algorithms that are being tested are GPL compatible.

When enough information is given by statistical analysis, we can proceed on choosing which of the algorithms will actually end up as part of the module (criteria for selection can include: how small the patches, which platform it supports, how well it's maintained, and less importantly, how fast it can apply patches and maybe some other factors too).

The updater module will be able to use multiple binary patching algorithms which means that the updater client will report to the server not only the version of each of its modules, but also the algorithms it supports.

The update server (dispatcher) is supposed to hold patches created using all of the algorithms that are implemented (different platforms might support different algorithms, but a single platform can have multiple algorithms too). Whenever a client claims to support multiple algorithms, the server will choose, for each binary file that needs to be patched, the smallest patch that will work on the client.

Here's an example:

Let's say that two binary files are out-of-date and need to be patched (file1 and file2), and there are 3 possible algorithms (1, 2, 3).

Here are the sizes of the patches:

file1: 1 -> 5KB; 2-> 3.6KB; 3->1.8KB;
file2: 1 -> 3.4KB; 2-> 5.6KB; 3->1.2KB;

Now, let us say that the client only supports algorithms 1 and 2. The server will therefore send the patch for file1 using algorithm 2 and the patch for file2 using algorithm 1.

This is because file1's patch is smaller using algorithm 2, whereas file2's patch si smaller using algorithm 1. The patches are smaller using algorithm 3, but the client doesn't support it (meaning that algorithm 3 is for a different platform), which means that the update server ignores that algorithm in finding the smallest patch for a file.

Solving dependencies

I'm currently writing some python scripts which generate dependency trees for scilab.

One of them will generate a graph which represents dependencies between (visual studio) projects. The other one will show dllimport dependencies (retrived via e.g. dumpbin /imports).

These graphs might help better understand scilab code, as well as see exactly what DLL-s need to be distributed and which don't.

However, the updater will likely not need to take care about dependencies in most cases, because the toolchain does so already. Whenever some project is recompiled, all projects that depend on that one are also recompiled. This will make the dependencies change their content too, thus marking them for update.

Report summaries

05/07/2011

I've written a proof-of-concept module that acts as an updater by connecting to a HTTP server via libcurl, reporting the available algorithms. If the server reports that updates are available, a .tar file is downloaded (compressed in transmit) and extracted to a temporary directory. The file updates.txt that was just uncompressed is parsed for files that need to be patched and using which algorithm. It then applies those patches.

I've also written a python script which takes as input two different versions of scilab and generates an update collection of files. It basically goes through all the files which:

  1. Are in one version of scilab but not in the other. This appends to updates.txt a line indicating that:
    1. a file needs to be deleted (if the file is existent in old but not in new)
    2. a file needs to be created (file in new but not in old). the contents of this file will be added as part of the update collection.
  2. Are in both versions but are different.
    1. The file is text - the diff is stored as part of the update collection, and a line is appended to updates.txt to indicate that the file needs to be patched using the given diff file.
    2. The file is binary - same above, except a binary diff (or more) is/are used.

For more details on how this is used, try using the script (link is below). In addition, I've developed a script to generate the dependency tree for scilab projects, as well as a PHP script that is meant to run on the webserver. For now, the PHP script randomly returns either up-to-date or tar-s the directory update/ and sends it to the client.

I'm currently working on writing a script that finds dllimport dependencies and creates a tree out of it. I'm also working on a script that creates a spreadsheet (for more details on what it will contain see links), as well as transforming the updater into an external module.

04/06/2011

More details about recent changes are available in my draft SEP at [6]. This and the next week are the finals, so it's a busy period for me. Vacation starts at 10/06/2011.

17/06/2011

See here

  1. GSoC proposal A copy of my GSoC proposal

  2. Updaters A document providing details about the binary patching algorithms that are being tested

  3. Dependency tree generators provides code and examples for dependency trees, as well as a doc on how to use the script

  4. Statistical analysis Script that will generate a spreadsheet with the size of the patch for each file using each algorithm. Various formulas can be used to gain insight as to which performs best and when

  5. Patch creator A script that will create a complete patch from two different versions of scilab

  6. Draft SEP An unfinalized version of my SEP

  7. Chromium-dev thread My thread on chromium-dev


2022-09-08 09:26