Scilab function variables: representation and manipulation.
Functions in Scilab are stored as variables on the stack. Functions are generated from parsing Scilab source text by a process called "compilation", which is really a translation into a condensed, tokenized internal representation (somewhat improperly called pseudocode).
There is no published description either of the parsing of the function text which produces the function pseudocode, or of the storage conventions and their implications. It is not clear to me whether this reflects an unstated intention of keeping the innermost proprietary details of Scilab deliberately cryptic, or is just a result of Scilab's development history.
Nevertheless, a better understanding of the workings of the parser, and of its way of storing and traversing code data, would benefit any attempt at designing or improving modern Scilab code tools, like a lexer, a profiler, a debugger, a cross-compiler, a code differentiator, and so on.
What follows are a few personal deductions, for reference.
There are two sorts of functions: compiled (type 13) and compiled with provisions for profiling (type 13 as well).
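For instance, both sorts are indistinguishable by their type code; a minimal sketch (function name made up for illustration):

```scilab
// Minimal sketch: a plain compiled function reports type 13
deff('y = f(x)', 'y = 2*x');   // define and compile a one-line function
disp(type(f))                  // 13 (compiled function)
disp(typeof(f))                // "function"
```

A profilable function, defined as shown further below, reports the same type 13, so the difference shows up only in the size reported by who.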
The basis of the storage seems to be a header, detailing the function type, the input and output arguments, and the size, followed by the function body stored as pseudocode (type 13). Some of this information can be conjectured from help save_format, as it is likely that, for economy, the binary Scilab files are essentially a dump of the stack structures. Historically, function variables were once stripped of the comment text appearing in the function definition; now comments are preserved and stored along with the code.
In all cases, the function body seems to be organized in elementary chunks corresponding to individual code lines. Both breakpointing and profiling operate at this granularity.
Functions compiled for profiling apparently differ from those "just compiled" in that two extra words are added per function line (this is roughly deducible from the function size reported by who). I surmise that one of these words stores a cumulative call count, while the other stores the cumulative time spent. Besides that, profilable functions are compatible with breakpointing (correctly, the time spent waiting for user input at a breakpoint is *not* added to the timing); the performance impact of making a function profilable seems altogether negligible.
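A sketch of how the per-line counters surface in practice, assuming the older deff flag 'p' and the profile function (the exact shape of profile's output varies across versions):

```scilab
// Sketch (older Scilab): compile with profiling provisions via the 'p' flag
deff('y = sq(x)', 'y = x.^2', 'p');
for k = 1:100, sq(k); end
// profile() returns the per-line data: call counts and cumulative time
disp(profile(sq))
```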
There are a few Scilab functions providing some degree of access to function variables, and the possibility to manipulate them. These are:
Interfacing with files:
exec: within its general purpose of loading, parsing and executing Scilab code, it can be seen as a function which reads function text from a file and stores it as a plain compiled function (type 13, no option for profiling). This is the most robust of these functions (fewest odd bugs reported).
getd: is based on calling exec in a loop over all files of a given directory.
genlib: also loops over all *.sci files in a directory, in order to create a library object.
save: dumps stack objects into a binary file, probably without any data translation. It is thus a last-resort way to access the pseudocode function data.
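The four file-based entry points above might be exercised as follows (paths and names are hypothetical; the variable-based save syntax is that of older Scilab versions):

```scilab
// Hypothetical paths; one call per entry point listed above
exec('myfun.sci');                     // parse + load one file
getd('/home/me/macros');               // exec every file in a directory
genlib('mylib', '/home/me/macros');    // build a library from *.sci files
deff('y = f(x)', 'y = 2*x');
save('f.dat', f);                      // binary dump of the stack object
```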
Operating on functions, or producing functions:
deff: stores function text, given as a string or a string vector, into a function variable of any of the available sorts (in older Scilab versions, an optional third argument selects uncompiled, compiled, or profiled). It was once thought of as an "inconvenient but shorthand way of defining inline functions". Inconvenient because typing strings (with escaped double quotes, etc.) at the command prompt looked tedious; however, considering that this function provides a very direct string→function conversion, the perspective is different. In the past it also seemed that functions defined by deff were of a sort incompatible with breakpointing (bug 884, since then evaporated). A minor limitation is that the function text needs to be stripped of the keywords function - endfunction before being fed to deff, but this is easily worked around by string operations. A more limiting bug is that deff doesn't support continuation dots (bug 2419), though this too can be circumvented.
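As a minimal sketch of the direct string→function conversion (identifiers made up for illustration):

```scilab
// Body text only: no function/endfunction keywords
body = ['s = a + b'; 's = 2*s'];     // string vector, one code line each
deff('s = addtwice(a, b)', body);    // heading passed separately
disp(addtwice(1, 2))                 // 6
```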
string: applied to a function, returns the input and output arguments, and the function text.
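For a function variable f, string returns the three pieces separately (a sketch, following the documented [out,in,text]=string(f) calling form):

```scilab
deff('y = f(x)', 'y = 2*x');
[outs, ins, txt] = string(f);   // outs: output argument names
disp(outs)                      //  ins: input argument names
disp(ins)                       //  txt: body text, one string per line
disp(txt)
```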
For the m2sci suite:
macr2tree and tree2code: are dual functions, most likely introduced with the Matlab→Scilab translator. Probably m2sci handles "program" tlists too (CHECK), and macr2tree was thus introduced in order to have a parallel syntactical check.
macr2tree: is a Scilab primitive which transforms a function into an undocumented tlist of type "program".
tree2code: transforms "programs" into function text, stored in string arrays.
The format of a tlist of type "program" is:
tlist ( [ "program","name","outputs","inputs","statements","nblines" ] , ...)
"name" is the function name
"outputs" is the tlist of the output variables
"inputs" is the tlist of the input variables
"statements" is the list of instructions found in the body of the function
"nblines" is the total number of lines constituting the function
The couple tree2code(macr2tree(...)) used to be at odds with particular syntax constructs, which have gradually been sorted out.