Scilab function variables: representation, manipulation.
Functions in Scilab are stored as variables on the stack. Apparently, functions are generated by parsing Scilab source text through a process called "compilation", which seems rather to be a translation into a condensed, tokenized internal representation (improperly called pseudocode).
There is no published description either of the parsing of the function text that produces the function pseudocode, or of the storage conventions and their implications. It is not clear to me whether this reflects an unstated intention of keeping the innermost proprietary details of Scilab deliberately cryptic, or is just a result of the development history of Scilab.
Nevertheless, a better understanding of the workings of the parser, and of its way of storing and perusing code data, would be beneficial for any attempt at designing or improving modern Scilab code tools - like a lexer, a profiler, a debugger, a cross-compiler, a code differentiator, and so on.
What follows are a few personal deductions, for reference.
There are three sorts of functions: uncompiled (type 11), compiled (type 13) and compiled with provisions for profiling (type 13 as well).
The basis for the storage seems to be a header, detailing function type, input and output arguments and size, and a function body, stored either as program text (type 11) or pseudocode (type 13). Some of this information can be conjectured from help save_format, as it is likely that, for economy, the binary Scilab files are essentially a dump of the stack structures. In the past, function variables were stripped of the comment text in the function definition; now comments are preserved and stored along with the code.
In all cases, the function body seems to be organized in elementary chunks corresponding to individual code lines. Both breakpointing and profiling operate with such a granularity.
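The three sorts can be produced and told apart with a few lines. This is only a sketch, assuming a Scilab 4.x session; the names f_n, f_c and f_p are hypothetical, and the "n", "c" and "p" options of deff are the ones discussed further down this page.

```scilab
// Assumes Scilab 4.x; f_n, f_c and f_p are illustrative names only.
deff("y=f_n(x)","y=x+1","n");   // body kept as program text
deff("y=f_c(x)","y=x+1","c");   // body translated to pseudocode
deff("y=f_p(x)","y=x+1","p");   // pseudocode with provisions for profiling
// According to the description above, this should report types 11, 13 and 13
mprintf("%d %d %d\n",type(f_n),type(f_c),type(f_p));
```

Note that type() alone cannot distinguish a plain compiled function from a profilable one; both are reported as type 13.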
Functions compiled for profiling apparently differ from those "just compiled" in that two extra words are added per function line (this is roughly deducible from the function size reported by who). I surmise that one of these words stores a cumulative call count, and the second the cumulative time spent. Besides that, profilable functions are compatible with breakpointing (the time spent waiting for user input at the breakpoint is, correctly, *not* even accumulated in the timing); the impact of making a function profilable on performance seems entirely negligible.
There are a few Scilab functions that provide some degree of access to the function types, and the possibility of manipulating them. These are:
Interfacing with files:
exec: within its general purpose of loading, parsing and executing Scilab code, it can be seen as a function which gets function text from a file and stores it as a plain compiled function (type 13, no option of profiling). This is the most robust function (fewer odd bugs reported).
getf: gets function text from a file and stores it in function variables, in any of the three forms. getf was once the only way to load function text from a file, at a time when a file could define a single function only, and nested function definitions were not on the horizon. Successive attempts at extending it left it with obsolescence bugs. Up to Scilab 4.1.1, getf was at odds with files containing functions and zero-level code (bug 1968), was confused by the fragment "function" appearing in the code text (which is legal, bug 2253), improperly accepted "FUNCTION" (bug 2434), and digested nested functions only if they were found on a single line (bug 1564). Only lately has getf finally been declared obsolete, meaning that further attempts to repair it will only address backward compatibility with the old script format (no mandatory endfunction, no nested functions) and no other extension, whereas the recommended command is exec.
getd: is based on calling getf in a loop for all files of a given directory, and therefore suffers from getf's limitations (bug 2019, bug 2130).
genlib: also loops on all *.sci files in a directory, in order to create a library object. It was once based on getf, now on exec.
save: dumps stack objects in a binary file, probably without any data translation. It is thus a last resort way to access the bytecode function data.
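As an illustration of the recommended route via exec, here is a minimal sketch, assuming Scilab 4.x. The file name and the function foo_from_file are hypothetical; the file is written to TMPDIR only to keep the example self-contained.

```scilab
// Assumes Scilab 4.x; file name and function name are illustrative only.
fname=TMPDIR+"/foo_from_file.sci";
mputl(["function y=foo_from_file(x)";"  y=x+1";"endfunction"],fname);
exec(fname,-1);               // -1: execute silently
disp(type(foo_from_file));    // a plain compiled function is expected here
disp(foo_from_file(1));
```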
Operating on functions, or producing functions:
deff: stores function text defined in a string or in a string vector into a function variable, of any of the three sorts. It was once thought to be an "unconvenient but shorthanded way of defining online functions". Inconvenient, because typing strings (with escaped double quotes, etc.) at the command prompt looked tedious; however, considering that this function provides a very direct string→function conversion, the perspective is different. In the past it also seemed that functions defined by deff were of a sort incompatible with breakpointing (bug 884, which has since evaporated). A minor limitation is that the function text needs to be stripped of the keywords function - endfunction in order to be fed to deff, but this can easily be worked around with string operations. A more limiting bug is that deff doesn't support continuation dots (bug 2419), though the limitation can be circumvented.
comp: transforms *in place* an uncompiled function (type 11) into a compiled, non-profilable one (type 13). comp accepts a second optional argument, which the help says is 0 or 1. Serge said that it is completely ignored (bug 2117). However, if this argument is 2, the uncompiled function is transformed into a compiled profilable one, which turns out to be most useful.
string: on functions, returns the input and output arguments, and for uncompiled functions also the function text (see bug 1370).
macr2lst: Scilab primitive which derives, from function variables of any of the three kinds, a list variable of undocumented format (it could perhaps be reverse-engineered...). This is a very old primitive, apparently introduced in connection with the abortive and now obsolete Scilab→Fortran translator. I'm not sure whether macr2lst is at odds with nested functions. fun2string(macr2lst()) chews nested functions, but for profilable functions the list generated by macr2lst doesn't seem to include the correct line separators [25 x y] for subfunction lines (bug 2413). The test lst(5)(1)=="25" (where lst is the output of macr2lst) is taken by Serge as sufficient for classifying a function variable as profilable.
fun2string: translates a function of any of the three kinds into function text, in a string matrix. It is based on macr2lst and a host of other ancillary functions defining the reverse compilation and formatting details. It suffers from some minor bugs causing small differences in line numbering and formatting in the presence of comments and nested function definitions (bugs 731, 1469, 1819, 2347). fun2string can also skip the initial call to macr2lst and accept as input argument a list form of the function; this is not mentioned in the help, either.
profile: extracts the timing statistics from a profilable function, via macr2lst. profile is at odds with nested functions because it relies on macr2lst, which then provides a wrong line count (bug 2413). Analyzing the output of macr2lst() and the workings of get_profile() [ancillary of profile()], I see that in the list form of the function, when the function is profilable, a list element [25 x y] appears for every line, with x the call count and y the time spent. get_profile just collects this data.
showprofile and plotprofile: display the profile information in visual form, relying on profile's output (and using fun2string to reproduce the corresponding function text).
bytecode: an easter egg primitive, surprisingly committed by Serge on BUILD_4 only (svn commit 16382 on matsys.f, 16/05/07). Its apparent purpose is to convert a function variable to a numeric array containing its pseudocode, and possibly vice versa. Probably unfinished. To activate it (on a BUILD_4 snapshot later than 16/05/2007), issue newfun("bytecode",1367). The primitive, as it is now, complains if its argument is not a function, and causes an EXCEPTION_ACCESS_VIOLATION, with no result.
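The functions of this group can be chained as in the following sketch, which assumes Scilab 4.x. The function foo is hypothetical; the value 2 passed to comp is the undocumented option described above, and the profilability test is the one attributed to Serge.

```scilab
// Assumes Scilab 4.x; foo is an illustrative function.
deff("y=foo(n)",["y=0";"for k=1:n, y=y+k; end"],"n");   // type 11
comp(foo,2);                    // in place: type 11 -> type 13, profilable
foo(10); foo(20);               // two calls, so that counters are nonzero
lst=macr2lst(foo);              // undocumented list form of the function
isprofilable=and(lst(5)(1)=="25");
txt=fun2string(lst,"foo");      // list form accepted directly as input
stats=profile(foo);             // per-line statistics collected via macr2lst
disp(isprofilable); disp(txt); disp(stats);
```

stats should contain one row per function line; given the bugs mentioned above, the line count is only to be trusted for functions without nested definitions.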
For the m2sci suite:
macr2tree and tree2code: are dual functions introduced most likely with the Matlab→Scilab translator. Probably m2sci handles "program" tlists too (CHECK), and thus macr2tree was introduced in order to have a parallel syntactical check.
macr2tree: is a Scilab primitive which transforms a function into an undocumented tlist of type "program"
tree2code: transforms "programs" into function text, stored in string arrays.
The format of a tlist of type "program" is:
tlist ( [ "program","name","outputs","inputs","statements","nblines" ] , ...)
"name" is the function name
"outputs" is the tlist of the output variables
"inputs" is the tlist of the input variables
"statements" is the list of instructions found in the body of the function
"nblines" is the total number of lines constituting the function
The couple tree2code(macr2tree) used to be at odds with particular syntax constructs, which are gradually being sorted out. Does it support profilable functions (bug 1619)?
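A round trip through the "program" tlist can be sketched as follows, assuming Scilab 4.x and a plain compiled function; foo is a hypothetical name, and the fields accessed are those listed in the format above.

```scilab
// Assumes Scilab 4.x; foo is an illustrative function.
deff("y=foo(x)","y=x+1","c");
t=macr2tree(foo);        // tlist of type "program"
disp(typeof(t));
disp(t.name);            // function name
disp(t.nblines);         // total number of lines
txt=tree2code(t);        // back to function text, as a string array
disp(txt);
```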
Format of lst=macr2lst(foo)
All members of lst are either string vectors or lists (nested, of the same sort)
lst(1)= function name
lst(2)= output arguments names
lst(3)= input argument names
lst(4)= "15" (end of line 1, corresponding to the function header)
lst(5)= ["25" "x" "y"] for profilable functions, beginning of the pseudocode for plain compiled functions
if clauses appear as nested lists (a list for the condition, one for the true case, one for the false case)
//comment lines appear in lst as ["31" "comment_text_stripped_of_the_trailing_slashes"]
lines of code which are not compiled immediately (e.g., nested function definitions) are reported as embedded text strings, i.e. ["3" "string"] (this is what happens for the subfunction name) or ["26" nlines 1 "string1" "string2" ... "string_nlines"]. The problem is, whitespace in the code can give odd results (see again the discussion in bug 2413). A subfunction definition is closed by ["20" "deff" z1 z2].
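The layout just described can be inspected by walking the top level of the list. The following is a sketch only, assuming Scilab 4.x; foo is a hypothetical function, and members that are not string vectors are simply reported as nested lists without descending into them.

```scilab
// Assumes Scilab 4.x; foo is an illustrative function.
deff("y=foo(x)",["y=0";"if x>0 then y=1; else y=-1; end"],"c");
lst=macr2lst(foo);
mprintf("name: %s\n",lst(1));
for k=4:length(lst)          // members 1 to 3 are name, outputs, inputs
  e=lst(k);
  if type(e)==10 then        // string vector: e(1) is the opcode
    mprintf("lst(%d): opcode %s\n",k,e(1));
  else                       // control structures show up as nested lists
    mprintf("lst(%d): nested list with %d members\n",k,length(e));
  end
end
```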
A partial opcode list
- decrypted from:
functions cod2sci and exp2sci (ancillaries of fun2string). op(1) is the first (numeric) entry of the string vector, list element of the result of macr2lst
the C function GetInstruction (ancillary of intmacr2tree)
op(1) | meaning
0 | deleted operation
1 | stackp (i.e. stack put, retained for compatibility with 2.7 and earlier version)
2 | stackg (i.e. stack get)
3 | string
4 | empty matrix
5 | allops (i.e. operations)
6 | number
7 | "for-end" control instruction
8 | "if-then-else" control instruction
9 | "while-end" control instruction
10 | "select-case-end" control instruction
11 | "try-catch-end" control instruction
12 | pause
13 | break
14 | abort
15 | EOL
16 | set line number
17 | quit
18 | named variable
19 | mkindx (make recursive index list: start of a new opcode index structure)
20 | functions
21 | beginning of rhs
22 | set print mode
23 | create variable from name
24 | create object with type 0
25 | profiling information
26 | vector of strings
27 | funptr variable
28 | continue
29 | affectation (assignment)
30 | expression evaluation short circuiting
31 | comment "in multiline matrix definition a=[..." (?)
99 | return
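For reading macr2lst output, the table can be turned into a small lookup helper. The function opname below is hypothetical (it is not part of Scilab) and covers only a handful of the opcodes listed above; it is a sketch rather than a complete decoder.

```scilab
// opname is a hypothetical helper; only part of the table is included.
function s=opname(op)
  // op: opcode as a string, as found in the first entry of a macr2lst member
  tab=["2"  "stackg";
       "3"  "string";
       "5"  "allops";
       "6"  "number";
       "15" "EOL";
       "20" "functions";
       "25" "profiling information";
       "26" "vector of strings";
       "29" "affectation";
       "31" "comment";
       "99" "return"];
  k=find(tab(:,1)==op);
  if isempty(k) then
    s="opcode "+op;      // not in this partial table
  else
    s=tab(k,2);
  end
endfunction
```

For instance, opname(lst(4)(1)) applied to the output of macr2lst should name the end-of-line marker that closes the function header.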
A fast Scilab function for listing all the function variables in a namespace, together with their kind:
function [flist,compiled,profilable,called]=listfunctions()
  nam=who("get")'
  called=uint32(zeros(nam)); afun=(called==1); pfun=afun; cfun=pfun;
  for i=1:size(nam,2)
    clear rvar lst;
    // rvar is cleared to avoid function redefinition warning
    // lst (topmost, variable size) is cleared to speed up garbage collection
    execstr("rvar="+nam(i));
    if type(rvar)==11 then afun(i)=%t; end
    if type(rvar)==13 then
      afun(i)=%t; cfun(i)=%t;
      lst=macr2lst(rvar)
      pfun(i)=and(lst(5)(1)=="25")
      if pfun(i) then execstr("called(i)="+lst(5)(2)); end
    end
  end
  flist=nam(afun)
  compiled=cfun(afun)
  profilable=pfun(afun)
  called=called(afun)
endfunction
Tricks for converting functions of one kind into another:
uncompile a function of type 13 (both kinds) into one of type 11:
funtext=fun2string(foo,"foo")
deff(strsubst(funtext(1),"function ",""),funtext(2:$-1),"n")
type 11 to type 13 (non profilable):
comp(foo)
type 11 to type 13 (profilable):
comp(foo, 2)
type 13 (profilable) to type 13 (not profilable) and vice versa: proceed as for the conversion to type 11, but call deff(...,"c") or deff(...,"p") instead.
All together in nice function form:
function recompilefunction(funname,kind,force)
  if ~exists("force","local") then force=%f; end
  if ~exists("kind","local") then kind="c"; end
  if ~exists(funname) then
    error("No variable named: "+funname)
  end
  clear fvar funtext tempfun
  execstr("fvar="+funname)
  if ~or(type(fvar)==[11 13]) then
    error(funname+" must be the name of a scilab function variable")
  end
  if type(fvar)==11 & ~force then
    oldkind="n"
    if kind=="n" then
      warning(funname+" is already noncompiled, nothing to do!")
      return
    end
    //can't avoid "Warning: redefining function: fvar", sorry
    // if kind=="c" then comp(fvar); end
    // if kind=="p" then comp(fvar,2); end
    // execstr(funname+"=resume(fvar)")
    //or:
    [out,in,funtext]=string(fvar);
    deff("["+strcat(out,",")+"]=tempfun("+strcat(in,",")+")",..
         funtext,kind)
    execstr(funname+"=resume(tempfun)")
  elseif type(fvar)==13 then
    lst=macr2lst(fvar)
    if lst(5)(1)=="25" then oldkind="p"; else oldkind="c"; end
    if kind=="c" & oldkind=="c" & ~force then
      warning(funname+" is already compiled, nothing to do!")
      return
    end
    if kind=="p" & oldkind=="p" & ~force then
      warning(funname+" is already compiled for profiling, nothing to do!")
      return
    end
    funtext=fun2string(lst,"tempfun")
    deff(strsubst(funtext(1),"function ",""),funtext(2:$-1),kind)
    execstr(funname+"=resume(tempfun)")
  end
endfunction
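A possible usage, assuming Scilab 4.x and that recompilefunction as listed above has already been loaded in the session (foo is a hypothetical function):

```scilab
// Assumes Scilab 4.x and recompilefunction already defined; foo is illustrative.
deff("a=foo(n)",["a=0";"for i=1:n, a=a+i; end"],"c");   // plain compiled
recompilefunction("foo","p");    // foo is replaced by a profilable version
foo(100);                        // run once to accumulate statistics
lst=macr2lst(foo);
disp(and(lst(5)(1)=="25"));      // profilability test described earlier
```

Since the function passes its result back with resume, the call is meant to be issued from the level where foo lives; a "redefining function" warning may be printed along the way.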