The Scipad debugger - Some sort of a white paper

Abstract

This white paper explains how the Scipad debugger works in Scilab 4.x. It describes the internals of the debugger engine, which is written in Tcl/Tk.

The purpose is mainly twofold:

This page is not, however, meant to cover every fine detail of the debugger's inner workings. Those details are in the source code, which is believed to be fully documented. Only the big picture is given here.

The code and picture snapshots in this wiki page were taken from Scipad 6.129.BP2 running with Scilab-4.1.2.


User-level help

See Scipad help file from Scilab-4.1.2.


Typical workflow with the debugger

Suppose the following is the script to debug:

// this is only for the sake of exemplifying
function inner_beauties(par1,par2)
  a = 7
  b = 10
  c = par1 + par2
  disp(a)
  for i=1:5
    d = rand() + myancillary(c)
  end
  disp(d)
endfunction
function out = myancillary(in)
  a = rand()
  out = in + 2*a
endfunction

inner_beauties_1.PNG

inner_beauties_2.PNG

inner_beauties_3.PNG

inner_beauties_4.PNG

inner_beauties_5.PNG


How the debugger works behind the scene

Debugger state machine

The debugger uses a state machine having exactly three states: NoDebug, ReadyForDebug and DebugInProgress.

There is indeed an order to respect when debugging.

The initial state is NoDebug. In this state no function has been configured for debugging.

Once a function has been successfully configured, the debugger state switches to ReadyForDebug. In this state the debugger is ready to launch the debug session and will accept debug commands.

Once the user has issued a debug command, the debugger is in the DebugInProgress state. It stays there until the function being debugged finishes execution (the debugger goes back to ReadyForDebug), or the user cancels the debug (the debugger then switches to NoDebug).

The full debug state machine is shown in the following diagram (debugstatemachine.dot).

debugstatemachine.png

At any time, in any state, the user is able to set or remove breakpoints in his code.
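The transitions described in the text above can be sketched as a small table-driven model. This is only an illustrative Python toy, not Scipad code: the action names (configure, debug_command, function_finished, cancel) are hypothetical labels chosen here, and only the transitions mentioned in the prose are included, not the full diagram.

```python
# Toy model of the three debugger states described in the text.
# Action names are hypothetical; only transitions stated in the prose are listed.
TRANSITIONS = {
    ("NoDebug", "configure"): "ReadyForDebug",
    ("ReadyForDebug", "debug_command"): "DebugInProgress",
    ("DebugInProgress", "debug_command"): "DebugInProgress",
    ("DebugInProgress", "function_finished"): "ReadyForDebug",
    ("DebugInProgress", "cancel"): "NoDebug",
}


class DebuggerModel:
    def __init__(self):
        self.state = "NoDebug"      # initial state
        self.breakpoints = set()

    def fire(self, action):
        """Apply an action, refusing it if it is not allowed in the current state."""
        key = (self.state, action)
        if key not in TRANSITIONS:
            raise ValueError(f"{action!r} is not allowed in state {self.state}")
        self.state = TRANSITIONS[key]
        return self.state

    def toggle_breakpoint(self, funname, line):
        """Set or remove a breakpoint; allowed in any state, never changes state."""
        self.breakpoints ^= {(funname, line)}
```

In this sketch, issuing a debug command while still in NoDebug raises an error, which mirrors the "order to respect" mentioned above.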

The debug actions are the following:

Basic working principle

Setting or removing breakpoints does not require any communication with Scilab at the time the breakpoints are modified. The breakpoints are sent to Scilab later.

After a function has been configured for debug, the user issues a debug command. This first debug action has a number of effects, in the following order:

ScilabEval exec(nonlevelzerocode.sce) sync seq

ScilabEval setbpt("foo",2);foo(); seq

ScilabEval TCL_EvalStr("set callstackcontent """+FormatWhereForWatch(3)+"""","scipad"); seq
ScilabEval TCL_EvalStr("updatewatch_bp","scipad"); seq

ScilabEval
    [db_l,db_m]=where();
    if size(db_l,1)>=3 then
      TCL_EvalStr("updateactbreakpointtag "+string(db_l(3))+" {"+string(db_m(3))+"} ","scipad");
    else
      TCL_EvalStr("updateactbreakpointtag 0 """" ","scipad");
    end;
  seq

ScilabEval
    [db_l,db_m]=where();
    if size(db_l,1)==1 then
      TCL_EvalStr("ScilabEval ""$removecomm"" ""seq"" ","scipad");
      TCL_EvalStr("setdbstate ""ReadyForDebug"" ","scipad");
      TCL_EvalStr("scedebugcleanup_bp","scipad");
      TCL_EvalStr("checkexecutionerror_bp","scipad");
      TCL_EvalStr("updatewatchvars;unsetdebuggerbusycursor","scipad");
      TCL_SetVar("prevdbpauselevel",$initprevdbpauselevel,"scipad");
    else
      if $steppedininsteadofover then
        TCL_EvalStr("ScilabEval TCL_EvalStr(""closecurifopenedbyuabpt"",""scipad"") seq","scipad");
        TCL_EvalStr("ScilabEval {TCL_EvalStr(""set afilewasopenedbyuabpt false"",""scipad"")} seq","scipad");
        TCL_EvalStr("stepbystepout_bp 0 0","scipad");
      elseif $didntwentout then
        TCL_EvalStr("ScilabEval {TCL_EvalStr(""set afilewasopenedbyuabpt false"",""scipad"")} seq","scipad");
        TCL_EvalStr("stepbystepout_bp 0 0","scipad");
      else
        TCL_EvalStr("ScilabEval {TCL_EvalStr(""set afilewasopenedbyuabpt false"",""scipad"")} seq","scipad");
        TCL_EvalStr("ScilabEval ""$skipline""  ""seq"" ","scipad");
        TCL_SetVar("prevdbpauselevel",size(db_l,1),"scipad");
      end
      TCL_EvalStr("ScilabEval ""$cmd"" ""seq"" ","scipad");
      TCL_EvalStr("ScilabEval {TCL_EvalStr(""resetbreakhit_bp"",""scipad"")} seq","scipad");
    end;
  seq

TCL_EvalStr("updatewatchvars;unsetdebuggerbusycursor","scipad");

set stoppedonarealbpt "TCL_EvalStr(\"lsearch \[getreallybptedlines \" + db_m(3) + \"\] \" + string(db_l(3)-1) + \"\",\"scipad\") <> string(-1)"
set breakwashit "TCL_EvalStr(\"isbreakhit_bp\",\"scipad\") == \"true\""
switch -- $stepmode {
    "nostep"   { set steppedininsteadofover "%f" }
    "into"     { set steppedininsteadofover "%f" }
    "over"     { set steppedininsteadofover "(size(db_l,1) > $prevdbpauselevel) & ~($stoppedonarealbpt) & ~($breakwashit)" }
    "out"      { set steppedininsteadofover "%f" }
    "runtocur" { set steppedininsteadofover "%f" }
    "runtoret" { set steppedininsteadofover "%f" }
}

When Scilab encounters a breakpoint, execution stops in the Scilab shell, and the first pause level prompt -1-> is displayed. The queued ScilabEvals now get executed, performing the actions they contain and that are described above.

At this point, the user can examine watched variables, and add them to or remove them from the watch window. For instance, adding variable "a" as a watched variable results in the following code being sent to Scilab:

ScilabEval
    if ext_exists("a"),
      [db_svar,db_tysi,db_edit]=FormatStringsForWatch(a);
      TCL_EvalStr("set watchvarsprops(a,value) """+db_svar+"""","scipad");
      TCL_EvalStr("set watchvarsprops(a,tysi) """+db_tysi+"""","scipad");
      TCL_EvalStr("set watchvarsprops(a,editable) """+db_edit+"""","scipad");
    else
      TCL_EvalStr("set watchvarsprops(a,value) ""<?>""","scipad");
      TCL_EvalStr("set watchvarsprops(a,tysi) ""<?>""","scipad");
      TCL_EvalStr("set watchvarsprops(a,editable) true","scipad");
    end;
  seq
ScilabEval TCL_EvalStr("updatewatch_bp","scipad"); seq

These are again ScilabEval calls with the seq option, used once more to guarantee the execution order of the code sent to Scilab. Watch variables are retrieved by FormatStringsForWatch, which provides their content, type and size, plus an editability flag. All of this is displayed in the watch window.

Then the user can launch further debug commands, which all result in a number of further ScilabEvals, with the same purpose as above:

ScilabEval [...]=resume(...) seq

This process is repeated until the debug session ends (i.e. execution stops on an error, the user cancels, or the debugged function runs to completion).

This is the big picture for simple debug commands. For step by step, run to cursor, or run to return point, it is even more complicated, although the basic principle remains the same. The break command however is a bit special: it launches a

ScilabEval setbpt(allfuns,alllinenumbers) sync seq

while Scilab is already running, i.e. during execution of the above

ScilabEval setbpt("foo",2); foo(); seq

This is another example of a reentrant call, which must obviously be executed immediately by Scilab, and not when the previous instruction is finished.

Checking from Scipad whether Scilab is busy

In Scilab 4, a special Tcl variable named sciprompt is set to:

For instance, at the main --> prompt the value of sciprompt is 0, and at the -2-> prompt its value is 2.

Updating sciprompt is done by the tksynchro routine, which is called by the main parsing routine of the Scilab shell (parse.c).

This feature makes it very simple to check from any Tcl script, and in particular from Scipad, whether Scilab is busy. The Scipad debugger needs this information for a number of reasons, the most basic one being to decide whether the user may issue debug commands. When Scilab is busy, debug commands are of course not allowed (with the exception of the Break command).


Use cases of the ScilabEval options for the Scipad debugger

Why the "sync" option is useful in Scipad

In Scipad the usual flow is Tcl code only but sometimes I need to have some code executed in Scilab and the execution results to be returned to the Tcl space for use by Scipad.

For this I use:

ScilabEval {TCL_SetVar("myTclvar",here_Scilab_code(args),"scipad")}

Most of the time, I need to get the result of such an evaluation (i.e. I need the new content of myTclvar) immediately, that is ScilabEval must not return before evaluation is finished in Scilab. The Tcl instruction right after the ScilabEval will use myTclvar that ScilabEval has just set, for instance:

ScilabEval {TCL_SetVar("myTclvar",here_Scilab_code(args),"scipad")}
dosomething $myTclvar

where dosomething is a Tcl procedure.

The above code does not work as is, not even in Scilab 4.x, because ScilabEval without any option just queues its argument (here: TCL_SetVar("myTclvar",...) ), which will be executed at some uncontrolled point in time after ScilabEval returns.

Then dosomething, at the moment it is executed, will not use the new content of myTclvar but some old content (if there was any; otherwise you get a Tcl error: can't read "myTclvar": no such variable).

So most of the time (there are other, much less frequent, use cases) I use the sync option because I need code evaluation by Scilab to be performed before ScilabEval returns, and I do this because I need the result of this evaluation immediately in the Tcl space.

This is called synchronous execution and is definitely needed.
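The difference between queued and synchronous evaluation can be illustrated with a toy model. The sketch below is plain Python, not the Scilab/Tcl interface: ToyScilab, scilab_eval, run_pending and set_var are hypothetical names invented for the illustration, and the Tcl variable space is modelled as a simple dictionary.

```python
from collections import deque


class ToyScilab:
    """Toy stand-in for the Scilab command queue (not the real interface)."""

    def __init__(self):
        self.queue = deque()
        self.tcl_vars = {}          # models the Tcl variable space

    def scilab_eval(self, callback, sync=False):
        if sync:
            callback(self.tcl_vars)      # evaluated before scilab_eval returns
        else:
            self.queue.append(callback)  # only queued, evaluated later

    def run_pending(self):
        """Models Scilab getting around to the queued callbacks at some later time."""
        while self.queue:
            self.queue.popleft()(self.tcl_vars)


def set_var(tcl_vars):
    # Models TCL_SetVar("myTclvar", ...) executed on the Scilab side.
    tcl_vars["myTclvar"] = "42"


# Without sync: the variable is not yet set when scilab_eval returns.
scilab = ToyScilab()
scilab.scilab_eval(set_var)
queued_result = scilab.tcl_vars.get("myTclvar")
scilab.run_pending()

# With sync: the variable is available immediately.
scilab2 = ToyScilab()
scilab2.scilab_eval(set_var, sync=True)
sync_result = scilab2.tcl_vars.get("myTclvar")
```

In this model queued_result is still empty when the "next Tcl instruction" would run, while sync_result already holds the value, which is exactly the property the debugger relies on.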

Why the "seq" option is useful in Scipad

The seq option of ScilabEval is used in Scipad to guarantee that a given series of ScilabEval calls execute in the order they were coded in Scipad. Execution is sequential: Scilab code queued by a ScilabEval must not be interrupted by Scilab starting to evaluate a callback queued later, because in Scipad this second callback will often use the results of the first one.

Let's give an example not related to Scipad, but that reflects how it is used in Scipad pretty well.

For instance consider this piece of code stored in a Tcl file named seq_expl.tcl:

unset -nocomplain myTclVar
ScilabEval {sleep(rand()*100);a=1}
ScilabEval {sleep(rand()*100);a=2}
ScilabEval {TCL_SetVar("myTclvar",string(a))}
ScilabEval "disp($myTclvar)"

As it is above, when you TCL_EvalFile this, you get (in Scilab 4):

-->clear a;TCL_EvalFile seq_expl.tcl

-->TCL_SetVar("myTclvar",string(a))
                               !--error 4
undefined variable : a
while executing a callback
while executing a callback
while executing a callback

But when the seq options are added after each ScilabEval you get the correct result:

-->clear a;TCL_EvalFile seq_expl.tcl

-->
    2.

The seq option is needed in Scipad for the same reason. Queued callbacks need the execution results of previously queued callbacks. Non-interrupted sequential execution is therefore mandatory.
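The failure mode in the example above can be mimicked with a small Python model. It rests on an assumption made only for this sketch: a non-seq callback that starts with an interruptible operation (the sleep in the example) lets the next queued callback run first, nested inside it. That rule is consistent with the repeated "while executing a callback" messages, but it is not taken from the actual Scilab implementation, and all names are hypothetical.

```python
def run_queue(callbacks, seq):
    """Run (sleeps_first, action) pairs against a shared workspace.

    Assumption of this toy model: without seq, a callback that starts with
    an interruptible sleep lets the next queued callback run first (nested).
    """
    pending = list(callbacks)
    workspace, log = {}, []

    def run_next():
        sleeps_first, action = pending.pop(0)
        if sleeps_first and not seq and pending:
            run_next()              # a later callback interrupts this one
        action(workspace, log)

    while pending:
        run_next()
    return workspace, log


def set_a(value):
    def action(workspace, log):
        workspace["a"] = value
    return action


def read_a(workspace, log):
    # Models TCL_SetVar("myTclvar", string(a)): fails if a does not exist yet.
    if "a" in workspace:
        log.append(str(workspace["a"]))
    else:
        log.append("undefined variable : a")


CALLBACKS = [(True, set_a(1)), (True, set_a(2)), (False, read_a)]
```

Calling run_queue(CALLBACKS, seq=False) logs the undefined-variable message because the third callback runs before either assignment, whereas run_queue(CALLBACKS, seq=True) runs the three callbacks in order and logs the value 2.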

This explanation for the seq option then brings another question: why not always use "sync" then?

Why the "flush" option is useless for Scipad

IIRC, the flush option of ScilabEval has been developed as a patch trying to work around limitations of sync, seq and sync/seq options at times when these options were not working correctly, or to handle special cases such as ScilabEval {abort}.

It dequeues the callbacks one by one and forces synchronous execution of each of them, keeping the seq flag as it was when the corresponding ScilabEval was issued. When everything has been executed synchronously, flush returns. In short, "flush" empties the queue by forcing immediate (but possibly interrupted) evaluation of its content.

I'm not sure it ever really worked correctly. I think it should now work OK but I didn't try for a long time. In Scipad I didn't find a use case of it, again because in the debugger most of the time we're in pause mode launched by a seq (and then, sync thus flush cannot be used). I think the flush feature has been developed as a patch to cure semi-broken implementations of sync, which are now pretty much tackled as far as I know, at least in Scilab 4.x environments.

I don't use flush in Scipad at all. No idea if anybody else uses it however, be it in Scilab source or in toolboxes or user scripts we've never heard of. The flush option should probably be maintained, at least for the sake of backwards compatibility.

What is needed for Scipad

What is needed is after all quite simple:

ScilabEval "..." [options] must behave just like described in its help file, namely:

(see also comment #3 in bug 1086)

All these complicated options were in fact needed because in the early days the ScilabEval implementation used the queue mechanism, which in turn brought asynchronous and non-sequential execution.

Now sync and seq are just patches on this queuing mechanism because the normally expected behavior of a command is in fact sync seq:

The only command that has a default behavior opposite to this standard expectation is precisely ScilabEval.

What is needed in the Scipad debugger is a way to control the order of execution of Scilab code interlaced with Tcl code.

In Scilab 4.x I achieved this using ScilabEval(TCL_EvalStr(ScilabEval ...) ) contraptions.

The fact that this does not work anymore in Scilab 5 is perhaps a good opportunity to reset all the mess and restart a better design.


Situation in the current Scilab 5 trunk

The new Tcl interface has removed the possibility to reenter the Tcl interpreter while it is already busy. This has broken the debugger, which is entirely built on ScilabEval(TCL_EvalStr(ScilabEval ...) ) constructs.

See inner details of the new Tcl thread here.

Moreover, Scipad is no longer aware of when Scilab is busy or not. The sciprompt synchronisation mechanism between Scilab and Tcl has been lost.

Besides, the operational team of the Scilab consortium reported recently that the breakpointing system (pertaining to the Scilab parser) is also somewhat at fault in the new code architecture of Scilab 5. Their statement is:


Now how to make the debugger work in Scilab 5, given the loss of reentrancy and Scilab loop lock compared to Scilab 4?

Observations

Observation 1

Despite what is stated at the bottom of Tcl Thread, this reentrant call works well in Scilab 5:

TCL_EvalStr("ScilabEval {TCL_EvalStr(""set h hello"")} ""seq"" ")

Why? Two possibilities I see:

Even this is working:

TCL_EvalStr("ScilabEval {TCL_EvalStr(""ScilabEval {TCL_EvalStr(""""set h hello"""")} """"seq"""" "") } ""seq"" ")

Note that recent hints from commits and information from the opteam suggest that TCL_EvalStr now works while a TCL_EvalFile instruction is being executed. I am not sure that this answers observation #1 though.

Observation 2

Why are there a number of TODO comments in ScilabEval.c, such as:

// TODO : Scilab is supposed to be busy there. Add mutex lock...

Is the work described in Tcl Thread really finished?

Observation 3

Perhaps Tcl Thread could explain why a mutex commandQueueSingleAccess was needed in StoreCommandWithFlag and in GetCommand, and what the LaunchScilab signal is (see in dynamic_menus.c).

Observation 4

<TODO>


Proposals for a solution

To ensure strict sequentiality of the interleaved Tcl - Scilab - Tcl commands, the pattern (in Tcl) could be:

ScilabLock ;# only Tcl can push commands to the interpreter
TCLcriticalcommand1
ScilabEval scilabinstruction seq ;# ScilabEval returns only after completion
                                 ;# *OR* when execution reaches a pause or a breakpoint
TCLcriticalcommand2
ScilabUnlock ;# now the user can interact again with the shell

This might avoid some round trips to the Scilab interpreter, just for the sake of knowing how far the execution is. The Tcl thread does not need to change but only to read these variables, which could ease access locking issues.

Say, ScilabGetVar and ScilabSetVar, dual of TCL_GetVar and TCL_SetVar.

This would also avoid a ScilabEval(TCL_SetVar...) round trip, leading at least to a simplification in some writeups.

This would allow replacing some of the recursive constructs, e.g.:

ScilabEval "TCL_SetVar(\"errline\", msprintf(\" %d\",db_l), \"scipad\");" "sync" "seq"

would become:

set errline [ScilabEval "msprintf(\" %d\",db_l)" "sync" "seq"]

This could at least be implemented when the result of ScilabEval evaluation is a string.

Put the sync callback instruction in the queue, but in the "first to use" position (instead of the "last to use" position that is used for non-sync callbacks). This way, as soon as Scilab is in a state compatible with callback execution, it will start the stored sync callback before any other.
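The "first to use position" idea can be sketched with a double-ended queue. This is only a Python illustration of the proposal, not existing Scilab code; CallbackQueue, push and pop are hypothetical names.

```python
from collections import deque


class CallbackQueue:
    """Toy queue where sync callbacks jump ahead of everything already queued."""

    def __init__(self):
        self._items = deque()

    def push(self, command, sync=False):
        if sync:
            self._items.appendleft(command)   # "first to use" position
        else:
            self._items.append(command)       # "last to use" position

    def pop(self):
        """Return the next command Scilab would execute."""
        return self._items.popleft()

    def __len__(self):
        return len(self._items)
```

One consequence of this simple scheme is that two sync callbacks pushed in a row would be executed in reverse order of submission, so a real design would have to decide how several pending sync callbacks are ordered among themselves.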

Further brilliant as well as stupid ideas are welcome, since I currently have no others.

public: Scipad debugger inner beauties (last edited 2011-03-30 16:18:34 by localhost)