1. The Scipad debugger - Some sort of a white paper

1.1. Abstract

This white paper explains how the Scipad debugger works in Scilab 4.x. It describes the inner workings of the debugger engine, which is written in Tcl/Tk.

The purpose is mainly twofold:

To document the debugger code at a higher level than the source code, thus helping to maintain this feature in Scipad
To describe clearly how it works and why it has been coded the way it is in Scilab 4.x. The goal is to have an unambiguous starting point for finding a way to make it work in Scilab 5.0. Indeed, the debugger is currently broken in Scilab 5. This is reported as bug 2789. The root cause is probably a mixture of the code reorganization that happened in Scilab 5 and the rewriting of the Tcl interface, along with the loss of reentrancy of the Tcl calls, which is described in the Tcl Thread page of this wiki. Public discussion has started on the Scilab development list about how to fix this problem for Scilab 5 (see the three related threads on that list). Private discussions also took place in January 2009, and some material from them was borrowed to complete this page.

This page is, however, not intended to provide every subtle detail of the inner workings of the debugger. Such information is available in the source code, which is believed to be fully documented. Only the big picture is given here.

The code and picture snapshots in this wiki page were taken from Scipad 6.129.BP2 running with Scilab-4.1.2.

1.2. User-level help

See the Scipad help file in Scilab-4.1.2.

1.3. Typical workflow with the debugger

Suppose the following is the script to debug:

// this is only for the sake of exemplifying
function inner_beauties(par1,par2)
  a = 7
  b = 10
  c = par1 + par2
  disp(a)
  for i=1:5
    d = rand() + myancillary(c)
  end
  disp(d)
endfunction
function out = myancillary(in)
  a = rand()
  out = in + 2*a
endfunction

Here is how it looks in Scipad before the debug is started.

Configure the function to debug

Optionally set breakpoints

Run the debug with one of the debug commands, say "Go to next breakpoint"

Use the watch window to check variable contents, modify them, watch expressions, check the call stack or the reported execution error, and add or remove breakpoints on the fly

Issue further debug commands
End of debug cycle: start again until the script works

1.4. How the debugger works behind the scene

1.4.1. Debugger state machine

The debugger uses a state machine with exactly three states:

NoDebug
ReadyForDebug
DebugInProgress

Debugging must indeed follow a fixed order.

The initial state is NoDebug. In this state no function has been configured for debug.

Once a function has been successfully configured for debug, the debugger switches to ReadyForDebug. In this state the debugger is ready to launch the debug and accepts debug commands.

Once the user has issued debug commands, the debugger is in the DebugInProgress state. It stays there until the debugged function finishes execution (the debugger goes back to ReadyForDebug), or until the user cancels the debug (the debugger then switches to NoDebug).

The full debug state machine is shown in the following diagram (debugstatemachine.dot).

At any time, in any state, the user can set or remove breakpoints in the code.
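To make the allowed transitions explicit, here is a minimal Python model of the three-state machine. It is an illustration only (Scipad implements this in Tcl); the event names "configure", "debug_command", "function_finished" and "cancel" are hypothetical labels, and any transition the text above does not describe is simply rejected by assumption.

```python
from enum import Enum


class DbState(Enum):
    NO_DEBUG = "NoDebug"
    READY_FOR_DEBUG = "ReadyForDebug"
    DEBUG_IN_PROGRESS = "DebugInProgress"


# (current state, event) -> next state. Event names are hypothetical labels.
TRANSITIONS = {
    (DbState.NO_DEBUG, "configure"): DbState.READY_FOR_DEBUG,
    (DbState.READY_FOR_DEBUG, "debug_command"): DbState.DEBUG_IN_PROGRESS,
    (DbState.DEBUG_IN_PROGRESS, "debug_command"): DbState.DEBUG_IN_PROGRESS,
    (DbState.DEBUG_IN_PROGRESS, "function_finished"): DbState.READY_FOR_DEBUG,
    (DbState.DEBUG_IN_PROGRESS, "cancel"): DbState.NO_DEBUG,
}


class DebuggerModel:
    def __init__(self):
        self.state = DbState.NO_DEBUG
        self.breakpoints = set()  # (function name, line number) pairs

    def handle(self, event):
        """Apply one event; transitions not listed above are rejected."""
        key = (self.state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"{event!r} not accepted in state {self.state.value}")
        self.state = TRANSITIONS[key]
        return self.state

    def toggle_breakpoint(self, func, line):
        # Allowed in every state and does not change the debugger state
        self.breakpoints ^= {(func, line)}
```

Keeping breakpoint editing outside the transition table mirrors the fact that it is allowed in every state.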

The debug actions are the following:

insertremove_bp: insert or remove a breakpoint
tonextbreakpoint_bp: run the debugged function, or continue execution, until a breakpoint gets hit
runtocursor_bp: run or continue execution until the cursor position in Scipad is reached
runtoreturnpoint_bp: run or continue execution until the return point of the current (sub-)function is reached
goonwo_bp: run execution to the end of the debugged script or function, as if no breakpoint were set
canceldebug_bp: cancel the debug - the script is not run to its end
stepbystepinto_bp: run or continue execution step by step, allowing to step into subfunctions
stepbystepover_bp: run or continue execution step by step, without entering subfunctions
stepbystepout_bp: step out of the current function and go to the instruction following the one that called the current subfunction
break_bp: stop execution of the script - useful to check where a long execution is perhaps stuck

1.4.2. Basic working principle

Setting or removing breakpoints does not require any communication with Scilab at the time the breakpoints are modified. The breakpoints are sent to Scilab later.

After a function has been configured for debug, the user issues a debug command. This first debug action has a number of effects, in the following order:

All non-level-zero code (i.e. code inside function definitions) from all buffers using the Scilab scheme is put into a hidden buffer and exec'ed in Scilab. In simplified pseudocode, this translates into:

ScilabEval exec(nonlevelzerocode.sce) sync seq

The breakpoints set by the user are sent to Scilab, and the configured function is launched in Scilab, for instance:

ScilabEval setbpt("foo",2);foo(); seq

The commands that update everything are queued so that:
- the watch window is updated when the debug stops on the breakpoint,

ScilabEval TCL_EvalStr("set callstackcontent """+FormatWhereForWatch(3)+"""","scipad"); seq
ScilabEval TCL_EvalStr("updatewatch_bp","scipad"); seq

- the generic expressions from the watch window are executed in the Scilab shell, and the active breakpoint position is updated in Scipad.

ScilabEval
    [db_l,db_m]=where();
    if size(db_l,1)>=3 then
      TCL_EvalStr("updateactbreakpointtag "+string(db_l(3))+" {"+string(db_m(3))+"} ","scipad");
    else
      TCL_EvalStr("updateactbreakpointtag 0 """" ","scipad");
    end;
  seq

Most importantly, further ScilabEvals are queued to check whether this command has in fact run the debugged script to completion:

ScilabEval
    [db_l,db_m]=where();
    if size(db_l,1)==1 then
      TCL_EvalStr("ScilabEval ""$removecomm"" ""seq"" ","scipad");
      TCL_EvalStr("setdbstate ""ReadyForDebug"" ","scipad");
      TCL_EvalStr("scedebugcleanup_bp","scipad");
      TCL_EvalStr("checkexecutionerror_bp","scipad");
      TCL_EvalStr("updatewatchvars;unsetdebuggerbusycursor","scipad");
      TCL_SetVar("prevdbpauselevel",$initprevdbpauselevel,"scipad");
    else
      if $steppedininsteadofover then
        TCL_EvalStr("ScilabEval TCL_EvalStr(""closecurifopenedbyuabpt"",""scipad"") seq","scipad");
        TCL_EvalStr("ScilabEval {TCL_EvalStr(""set afilewasopenedbyuabpt false"",""scipad"")} seq","scipad");
        TCL_EvalStr("stepbystepout_bp 0 0","scipad");
      elseif $didntwentout then
        TCL_EvalStr("ScilabEval {TCL_EvalStr(""set afilewasopenedbyuabpt false"",""scipad"")} seq","scipad");
        TCL_EvalStr("stepbystepout_bp 0 0","scipad");
      else
        TCL_EvalStr("ScilabEval {TCL_EvalStr(""set afilewasopenedbyuabpt false"",""scipad"")} seq","scipad");
        TCL_EvalStr("ScilabEval ""$skipline""  ""seq"" ","scipad");
        TCL_SetVar("prevdbpauselevel",size(db_l,1),"scipad");
      end
      TCL_EvalStr("ScilabEval ""$cmd"" ""seq"" ","scipad");
      TCL_EvalStr("ScilabEval {TCL_EvalStr(""resetbreakhit_bp"",""scipad"")} seq","scipad");
    end;
  seq

All ScilabEvals here need the seq parameter, and only the seq parameter, in order to guarantee execution order.

Note also that ScilabEval( TCL_EvalStr (ScilabEval ...) ) contraptions must be used in order to guarantee execution order of mixed Scilab/Tcl code.

As a matter of fact, the seq option of ScilabEval only queues Tcl instructions in order. With this alone, the execution order of the Scilab code is not guaranteed: that code would in fact execute in between the Tcl instructions queued by ScilabEval seq.

This is definitely not what is wanted, thus ScilabEval( TCL_EvalStr (ScilabEval ...) ) structures must be used.

This is apparent in the above code, but it also happens in less visible places: for instance, the single

TCL_EvalStr("updatewatchvars;unsetdebuggerbusycursor","scipad");

launches the Tcl proc updatewatchvars, which in turn launches further ScilabEval(TCL_EvalStr (conditional expression)).

The net effect of all these queued instructions is therefore to launch the debugged function in Scilab and to prepare commands that will be executed when this function hits the first breakpoint in Scilab.

Note that the last ScilabEval is built around a main if statement: depending on the content of where(), the debugger decides whether execution of the function has finished, and launches conditional actions that queue further ScilabEvals.
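The branching of that last ScilabEval can be restated as a small Python function that returns the names of the follow-up actions in queueing order. This is a readability aid, not Scipad code: the action strings are shorthand for the TCL_EvalStr calls in the block above, and `call_depth` stands for size(db_l,1), which that block treats as "finished" when it equals 1.

```python
def after_stop_actions(call_depth, stepped_in_instead_of_over, didnt_went_out):
    """Follow-up actions queued after a stop, in order (names only).

    call_depth plays the role of size(db_l,1), the number of entries
    returned by where(); the queued Scilab code treats 1 as "finished".
    """
    if call_depth == 1:
        return ["removecomm", "setdbstate ReadyForDebug", "scedebugcleanup_bp",
                "checkexecutionerror_bp", "updatewatchvars",
                "unsetdebuggerbusycursor", "reset prevdbpauselevel"]
    if stepped_in_instead_of_over:
        actions = ["closecurifopenedbyuabpt", "set afilewasopenedbyuabpt false",
                   "stepbystepout_bp 0 0"]
    elif didnt_went_out:
        actions = ["set afilewasopenedbyuabpt false", "stepbystepout_bp 0 0"]
    else:
        actions = ["set afilewasopenedbyuabpt false", "skipline",
                   f"prevdbpauselevel = {call_depth}"]
    # In every "not finished" branch, $cmd and resetbreakhit_bp are queued last
    return actions + ["cmd", "resetbreakhit_bp"]
```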

$removecomm, $initprevdbpauselevel, $steppedininsteadofover, $didntwentout, $skipline and $cmd are all Tcl variables that Tcl substitutes into the above code when it builds it; each one holds a fragment of Scilab code (a command or a condition). The point is that these fragments are built conditionally, depending on where the current breakpoints are located, on which exact debug command the user launched previously, or on other complicated conditions. For instance, here is how $steppedininsteadofover is defined in the Tcl code:

set stoppedonarealbpt "TCL_EvalStr(\"lsearch \[getreallybptedlines \" + db_m(3) + \"\] \" + string(db_l(3)-1) + \"\",\"scipad\") <> string(-1)"
set breakwashit "TCL_EvalStr(\"isbreakhit_bp\",\"scipad\") == \"true\""
switch -- $stepmode {
    "nostep"   { set steppedininsteadofover "%f" }
    "into"     { set steppedininsteadofover "%f" }
    "over"     { set steppedininsteadofover "(size(db_l,1) > $prevdbpauselevel) & ~($stoppedonarealbpt) & ~($breakwashit)" }
    "out"      { set steppedininsteadofover "%f" }
    "runtocur" { set steppedininsteadofover "%f" }
    "runtoret" { set steppedininsteadofover "%f" }
}
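The essential idea, that Tcl assembles Scilab source text whose content depends on the current step mode, can be mirrored in Python. This is a hedged translation of the switch above for illustration only; in particular it assumes, as in the Tcl code, that the value of prevdbpauselevel is frozen into the text when the string is built, not when Scilab evaluates it.

```python
STEP_MODES = {"nostep", "into", "over", "out", "runtocur", "runtoret"}


def stepped_in_instead_of_over(stepmode, prevdbpauselevel):
    """Return the Scilab condition text for $steppedininsteadofover."""
    if stepmode not in STEP_MODES:
        raise ValueError(f"unknown step mode: {stepmode!r}")
    if stepmode != "over":
        return "%f"  # Scilab's boolean false: the check is disabled
    stopped_on_a_real_bpt = (
        'TCL_EvalStr("lsearch [getreallybptedlines " + db_m(3) + "] " '
        '+ string(db_l(3)-1) + "","scipad") <> string(-1)'
    )
    break_was_hit = 'TCL_EvalStr("isbreakhit_bp","scipad") == "true"'
    # prevdbpauselevel is substituted now, when the text is built,
    # just like $prevdbpauselevel in the Tcl switch above
    return (f"(size(db_l,1) > {prevdbpauselevel})"
            f" & ~({stopped_on_a_real_bpt}) & ~({break_was_hit})")
```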

When Scilab encounters a breakpoint, execution stops in the Scilab shell and the first pause level prompt -1-> is displayed. The queued ScilabEvals are now executed, performing the actions described above.

At this point, the user can examine watched variables, and add or remove them in the watch window. For instance, adding variable "a" as a watched variable results in the following code being sent to Scilab:

ScilabEval
    if ext_exists("a"),
      [db_svar,db_tysi,db_edit]=FormatStringsForWatch(a);
      TCL_EvalStr("set watchvarsprops(a,value) """+db_svar+"""","scipad");
      TCL_EvalStr("set watchvarsprops(a,tysi) """+db_tysi+"""","scipad");
      TCL_EvalStr("set watchvarsprops(a,editable) """+db_edit+"""","scipad");
    else
      TCL_EvalStr("set watchvarsprops(a,value) ""<?>""","scipad");
      TCL_EvalStr("set watchvarsprops(a,tysi) ""<?>""","scipad");
      TCL_EvalStr("set watchvarsprops(a,editable) true","scipad");
    end;
  seq
ScilabEval TCL_EvalStr("updatewatch_bp","scipad"); seq

Once again this is a series of ScilabEval seq calls, to guarantee the execution order of the code sent. Watched variables are retrieved by FormatStringsForWatch, which provides their content, type and size, plus an editability flag. All this is displayed in the watch window.

Then the user can launch further debug commands, which all result in a number of further ScilabEvals, with the same purpose as above:

Send modified watched variables to the Scilab shell, and resume execution:

ScilabEval [...]=resume(...) seq

Update everything when the debug stops again (watched variables, generic expression execution, etc.), and check for end of execution.

This process is repeated until the debug ends (i.e. execution stops on an error, the user cancels, or the debugged function runs to completion).

This is the big picture for simple debug commands. For step by step, run to cursor, or run to return point, it is even more complicated, although the basic principle remains the same. The break command, however, is a bit special: it launches a

ScilabEval setbpt(allfuns,alllinenumbers) sync seq

while Scilab is already running, i.e. during execution of the above

ScilabEval setbpt("foo",2); foo(); seq

This is another example of a reentrant call, which must obviously be executed immediately by Scilab, and not when the previous instruction has finished.

1.4.3. Checking from Scipad whether Scilab is busy

In Scilab 4, a special Tcl variable named sciprompt is set to:

-1 when Scilab is busy executing instructions
the pause level when Scilab is idle

For instance, at the main --> prompt sciprompt is 0, and at the -2-> prompt its value is 2.

Updating sciprompt is done by the tksynchro routine, which is called by the main parsing routine of the Scilab shell (parse.c).

This makes it very simple to check from any Tcl script, and in particular from Scipad, whether Scilab is busy. The Scipad debugger needs this information for a number of reasons, the basic one being to decide whether the user may issue debug commands. When Scilab is busy, of course, he may not (with the exception of the Break command).
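A minimal sketch of how a value like sciprompt can gate the debug commands, written in Python for illustration. The function name `command_allowed` is hypothetical, and restricting break_bp to the busy case is an assumption of this sketch (the text only says Break is the exception while Scilab is busy).

```python
BUSY = -1  # value of sciprompt while Scilab is executing instructions


def command_allowed(command, sciprompt, db_state):
    """Decide whether a debug command can be issued right now.

    sciprompt is -1 while Scilab is busy, otherwise the pause level
    (0 at the main --> prompt, 2 at the -2-> prompt, ...).
    """
    if command == "insertremove_bp":
        return True  # breakpoints can be edited at any time, in any state
    if command == "break_bp":
        return sciprompt == BUSY  # assumption: break only makes sense while running
    if sciprompt == BUSY:
        return False  # Scilab is busy: every other command must wait
    return db_state in ("ReadyForDebug", "DebugInProgress")
```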

1.5. Use cases of the ScilabEval options for the Scipad debugger

1.5.1. Why the "sync" option is useful in Scipad

In Scipad the usual flow is Tcl code only, but sometimes I need some code to be executed in Scilab and the execution results returned to the Tcl space for use by Scipad.

For this I use:

ScilabEval {TCL_SetVar("myTclvar",here_Scilab_code(args),"scipad")}

Most of the time, I need the result of such an evaluation (i.e. the new content of myTclvar) immediately: ScilabEval must not return before evaluation is finished in Scilab, because the Tcl instruction right after the ScilabEval uses the myTclvar that ScilabEval has just set, for instance:

ScilabEval {TCL_SetVar("myTclvar",here_Scilab_code(args),"scipad")}
dosomething $myTclvar

Here, dosomething is a Tcl procedure.

The above code does not work as is, not even in Scilab 4.x, because ScilabEval without any option just queues its argument (here: TCL_SetVar("myTclvar",...)), which is then executed at some uncontrolled point in time after ScilabEval returns.

Consequently, when dosomething executes, it does not use the new content of myTclvar but some old content (if there was one; otherwise you get the Tcl error can't read "myTclvar": no such variable).

So most of the time (there are other, much less frequent, use cases) I use the sync option: I need Scilab to finish evaluating the code before ScilabEval returns, because I need the result of this evaluation immediately in the Tcl space.

This is called synchronous execution and is definitely needed.
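The difference between queued and synchronous evaluation can be shown with a toy Python model. `FakeScilab`, `eval_queued`, `eval_sync` and `process_queue` are hypothetical names; Scilab code is represented by Python callables, and a dict stands for the Tcl variable space.

```python
from collections import deque


class FakeScilab:
    """Toy stand-in for Scilab: "code" is a Python callable."""

    def __init__(self):
        self.queue = deque()
        self.tcl_vars = {}  # plays the role of the Tcl variable space

    def eval_queued(self, code):
        # Like ScilabEval without options: only queued, runs "later"
        self.queue.append(code)

    def eval_sync(self, code):
        # Like ScilabEval "sync": evaluated before returning to the caller
        code(self.tcl_vars)

    def process_queue(self):
        # Scilab picks up queued callbacks at some uncontrolled later point
        while self.queue:
            self.queue.popleft()(self.tcl_vars)


def set_my_tcl_var(tcl_vars):
    # Stands for TCL_SetVar("myTclvar", here_Scilab_code(args), "scipad")
    tcl_vars["myTclvar"] = "42"
```

Right after `eval_queued(set_my_tcl_var)`, myTclvar is still missing (the equivalent of `dosomething $myTclvar` failing); right after `eval_sync(set_my_tcl_var)`, it is already set.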

1.5.2. Why the "seq" option is useful in Scipad

The seq option of ScilabEval is used in Scipad to guarantee that a given series of ScilabEval calls executes in the order in which they were coded in Scipad. Execution is sequential: Scilab code queued by a ScilabEval must not be interrupted by Scilab starting to evaluate a callback queued later, because in Scipad this second callback often uses the results of the first one.

Here is an example that is not related to Scipad, but reflects pretty well how seq is used in Scipad.

Consider this piece of code stored in a Tcl file named seq_expl.tcl:

unset -nocomplain myTclVar
ScilabEval {sleep(rand()*100);a=1}
ScilabEval {sleep(rand()*100);a=2}
ScilabEval {TCL_SetVar("myTclvar",string(a))}
ScilabEval "disp($myTclvar)"

As is, when you run this file with TCL_EvalFile, you get (in Scilab 4):

-->clear a;TCL_EvalFile seq_expl.tcl

-->TCL_SetVar("myTclvar",string(a))
                               !--error 4
undefined variable : a
while executing a callback
while executing a callback
while executing a callback

But when the seq option is added to each ScilabEval, you get the correct result:

-->clear a;TCL_EvalFile seq_expl.tcl

-->
    2.

The seq option is needed in Scipad for the same reason. Queued callbacks need the execution results of previously queued callbacks. Non-interrupted sequential execution is therefore mandatory.
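The failure above can be reproduced with a deliberately simplified Python model of callback interruption. Each callback is a generator; a `yield` marks a point where Scilab processes events (as during sleep()), and a callback that was not queued with seq is assumed to be interrupted there by the next queued callback, which then runs nested inside it. Real timing is random, whereas this model always interrupts; all names are hypothetical.

```python
from collections import deque


def run_callbacks(callbacks):
    """Run queued (generator function, seq flag) pairs against one workspace.

    Each `yield` in a callback marks a point where Scilab processes events
    (e.g. during sleep()). Simplification: if the running callback was not
    queued with seq, the next queued callback always starts there, nested.
    """
    queue = deque(callbacks)
    workspace = {}

    def run_one(callback, seq):
        for _ in callback(workspace):
            if not seq and queue:
                run_one(*queue.popleft())  # interrupted by a later callback

    while queue:
        run_one(*queue.popleft())
    return workspace


def set_a_1(ws):
    yield  # sleep(rand()*100)
    ws["a"] = 1


def set_a_2(ws):
    yield  # sleep(rand()*100)
    ws["a"] = 2


def copy_a_to_tcl(ws):
    # TCL_SetVar("myTclvar",string(a)): fails if a is not defined yet
    if "a" not in ws:
        raise NameError("undefined variable : a")
    ws["myTclvar"] = str(ws["a"])
    yield from ()  # no event-processing point in this callback
```

With every callback flagged seq, the workspace ends with myTclvar equal to "2"; without seq, the third callback starts three levels deep before a exists, matching the three nested "while executing a callback" lines above.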

This explanation of the seq option raises another question: why not always use "sync" then?

First of all, "sync" is still interruptible, and this is a problem unless both "sync" and "seq" are used.
The second answer is more complicated. I can't always use "sync" (or "sync" "seq") because the sync option does not work when I'm in pause mode in the middle of a callback launched in seq mode (which is exactly how the debugger works). This is what I tried to explain just above.

1.5.3. Why the "flush" option is useless for Scipad

IIRC, the flush option of ScilabEval was developed as a patch to work around limitations of the sync, seq and sync/seq options at a time when these options were not working correctly, or to handle special cases such as ScilabEval {abort}.

What it does is dequeue the callbacks one by one and force synchronous execution of each of them, keeping the seq flag as it was when the corresponding ScilabEval was issued. When everything has been executed synchronously, flush returns. In short, "flush" empties the queue by forcing immediate (but possibly interrupted) evaluation of its contents.
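The draining behavior just described can be sketched in a few lines of Python. The `flush` function and the `execute_sync` callback are hypothetical stand-ins for the real C implementation; the only properties modeled are FIFO order, per-callback seq flags, and returning once the queue is empty.

```python
from collections import deque


def flush(queue, execute_sync):
    """Empty `queue` of (code, seq) pairs by running each one synchronously.

    execute_sync(code, seq) stands for a synchronous evaluation that keeps
    the seq flag recorded when the callback was queued. Callbacks queued
    while flushing are drained too, since the loop only stops once the
    queue is empty.
    """
    count = 0
    while queue:
        code, seq = queue.popleft()
        execute_sync(code, seq)
        count += 1
    return count
```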

I'm not sure it ever really worked correctly. I think it should now work OK, but I haven't tried it for a long time. In Scipad I didn't find a use case for it, again because in the debugger we are most of the time in pause mode launched by a seq (and then sync, and therefore flush, cannot be used). I think the flush feature was developed as a patch to cure semi-broken implementations of sync, which as far as I know are now pretty much fixed, at least in Scilab 4.x environments.

I don't use flush in Scipad at all. I have no idea whether anybody else uses it, be it in the Scilab sources, in toolboxes or in user scripts we've never heard of. The flush option should probably be kept, at least for the sake of backwards compatibility.

1.5.4. What is needed for Scipad

What is needed is after all quite simple:

ScilabEval "..." [options] must behave exactly as described in its help file, namely:

"seq" means not interruptible
"sync" means ScilabEval does not return before Scilab has finished evaluation
"sync" "seq" is a non interruptible sync

(see also comment #3 in bug 1086)

All these complicated options were in fact needed because in the early days the ScilabEval implementation relied on the queue mechanism, which in turn brought asynchronous and non-sequential execution.

Now sync and seq are just patches on this queuing mechanism because the normally expected behavior of a command is in fact sync seq:

it should not return until execution is finished
it should not be interruptible, i.e. another command should not start executing in the middle of what is currently executed

The only command that has a default behavior opposite to this standard expectation is precisely ScilabEval.
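The promised semantics can be written down as a tiny Python table, which also makes the "ordinary command = sync seq" point explicit. `EvalSemantics` and `expected_semantics` are hypothetical names, and the flush option is deliberately left out of this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalSemantics:
    waits_for_completion: bool  # "sync": ScilabEval returns after evaluation
    interruptible: bool         # no "seq": later callbacks may start in between


def expected_semantics(*options):
    """Behaviour that the help file promises for a ScilabEval option set."""
    unknown = set(options) - {"sync", "seq"}
    if unknown:
        raise ValueError(f"not covered by this sketch: {sorted(unknown)}")
    return EvalSemantics(waits_for_completion="sync" in options,
                         interruptible="seq" not in options)


# What an ordinary command is normally expected to do
ORDINARY_COMMAND = expected_semantics("sync", "seq")
```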

What is needed in the Scipad debugger is a way to control the order of execution of Scilab code interlaced with Tcl code.

In Scilab 4.x I achieved this using ScilabEval(TCL_EvalStr(ScilabEval ...) ) contraptions.

The fact that this no longer works in Scilab 5 is perhaps a good opportunity to clean up the whole mess and start over with a better design.

1.6. Situation in the current Scilab 5 trunk

The new Tcl interface has removed the possibility to reenter the Tcl interpreter while it is already busy. This has broken the debugger, which is entirely built on ScilabEval(TCL_EvalStr(ScilabEval ...) ) constructs.

See the Tcl Thread page of this wiki for the inner details of the new Tcl thread.

Moreover, Scipad no longer knows whether Scilab is busy or not: the sciprompt synchronisation mechanism between Scilab and Tcl has been lost.

Besides, the operational team of the Scilab consortium reported recently that the breakpointing system (pertaining to the Scilab parser) is also somewhat at fault in the new code architecture of Scilab 5. Their statement is:

"We thought that the problem was due to the TCL event loop reorganization, but it was not. Actually, if Scipad debugger does not work in current version, it is because of Scilab internal loop which remains blocked in some cases such as a breakpoint in the code to debug. In previous versions, a timer ensured all the commands in the "storecommand queue" were read and executed (hence make Scilab run again when in pause mode) even if Scilab was paused inside the parsing loop. In current version, when Scilab is blocked inside its parser, there is nothing that can wake it up."

1.7. Now how to make the debugger work in Scilab 5, given the loss of reentrancy and Scilab loop lock compared to Scilab 4?

1.7.1. Observations

1.7.1.1. Observation 1

Despite what is stated at the bottom of the Tcl Thread page, this reentrant call works well in Scilab 5:

TCL_EvalStr("ScilabEval {TCL_EvalStr(""set h hello"")} ""seq"" ")

Why? I see two possibilities:

This working contraption is a bug
- This wouldn't help...
Tcl Thread states things that are not correct or accurate enough
- Then this construct could lead to a solution.

Even this works:

TCL_EvalStr("ScilabEval {TCL_EvalStr(""ScilabEval {TCL_EvalStr(""""set h hello"""")} """"seq"""" "") } ""seq"" ")

Note that recent hints from commits and information from the opteam suggest that TCL_EvalStr now works while a TCL_EvalFile instruction is being executed. I am not sure this explains observation #1, though.

1.7.1.2. Observation 2

Why are there a number of TODO comments in ScilabEval.c, such as:

// TODO : Scilab is supposed to be busy there. Add mutex lock...

Is the work described in the Tcl Thread page really finished?

1.7.1.3. Observation 3

Perhaps the Tcl Thread page could explain why a mutex commandQueueSingleAccess was needed in StoreCommandWithFlag and in GetCommand, and what the LaunchScilab signal is (see dynamic_menus.c).

1.7.1.4. Observation 4

<TODO>

1.7.2. Proposals for a solution

Give ScilabEval an option that allows returning to Tcl only at completion (and not when a pause or a breakpoint is reached).
Give ScilabEval an option "interrupt" allowing Tcl to interrupt a Scilab execution in progress at an arbitrary point, putting the interpreter in pause mode.
Provide the Tcl thread with a command capable of setting a lock on the Scilab interpreter.

To ensure strict sequentiality of the interleaved Tcl - Scilab - Tcl commands, the pattern (in Tcl) could be:

ScilabLock                        ;# only Tcl can push commands to the interpreter
TCLcriticalcommand1
ScilabEval scilabinstruction seq  ;# ScilabEval returns only after completion
                                  # *OR* when execution reaches a pause or a breakpoint
TCLcriticalcommand2
ScilabUnlock                      ;# now the user can interact with the shell again
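The intended effect of the proposed ScilabLock/ScilabUnlock pair can be simulated in Python. Both commands are proposals, not existing Scilab or Tcl API, and the class below is a hypothetical toy: while the lock is held, user commands are held back instead of being interleaved with the critical Tcl/Scilab sequence, and they run in typing order after the unlock.

```python
from collections import deque


class LockableInterpreter:
    """Toy model of the proposed (hypothetical) ScilabLock/ScilabUnlock."""

    def __init__(self):
        self.locked = False
        self.held = deque()  # user commands typed while the lock is held
        self.log = []        # order in which things actually happen

    def scilab_lock(self):
        self.locked = True

    def scilab_unlock(self):
        self.locked = False
        while self.held:  # user commands run afterwards, in typing order
            self.log.append(self.held.popleft())

    def from_tcl(self, step):
        # Tcl steps (including ScilabEval ... seq) always go through
        self.log.append(step)

    def from_user(self, command):
        if self.locked:
            self.held.append(command)  # cannot cut into the critical section
        else:
            self.log.append(command)
```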

Add more Tcl variables like $sciprompt, readable at any time by the Tcl interpreter, holding for instance the information usually provided by where() and lasterror().

This might avoid some round trips to the Scilab interpreter made just to find out how far execution has progressed. The Tcl thread would not need to modify these variables, only to read them, which could ease access locking issues.

Add Tcl instructions for getting/setting a string Scilab variable from the Tcl thread.

Say, ScilabGetVar and ScilabSetVar, the duals of TCL_GetVar and TCL_SetVar.

This would also avoid a ScilabEval(TCL_SetVar(...)) round trip, leading at least to simpler code in some places.

Let ScilabEval return the result of evaluation of its argument by Scilab

This would allow replacing some of the recursive constructs, e.g.:

ScilabEval "TCL_SetVar(\"errline\", msprintf(\" %d\",db_l), \"scipad\");" "sync" "seq"

would become:

set errline [ScilabEval "msprintf(\" %d\",db_l)" "sync" "seq"]

This could at least be implemented when the result of ScilabEval evaluation is a string.
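The two styles can be contrasted in a small Python model. `ScilabBridgeModel`, `eval_and_set_tcl_var` and `eval_returning` are hypothetical; Scilab expressions are Python callables over a workspace dict, and `format_errline` is only a rough scalar stand-in for msprintf(" %d",db_l). The string-only restriction follows the suggestion above.

```python
class ScilabBridgeModel:
    """Toy model: Scilab code is a Python callable over a workspace dict."""

    def __init__(self, workspace):
        self.workspace = workspace  # stands for the Scilab variables
        self.tcl_vars = {}          # stands for the Tcl variables

    def eval_and_set_tcl_var(self, name, scilab_expr):
        # Today: ScilabEval "TCL_SetVar(name, expr, "scipad");" "sync" "seq"
        self.tcl_vars[name] = scilab_expr(self.workspace)

    def eval_returning(self, scilab_expr):
        # Proposal: set errline [ScilabEval "expr" "sync" "seq"]
        result = scilab_expr(self.workspace)
        if not isinstance(result, str):
            raise TypeError("only string results are handled in this sketch")
        return result


def format_errline(ws):
    # Rough stand-in for msprintf(" %d",db_l) with a scalar db_l
    return " %d" % ws["db_l"]
```

Both calls yield the same string; the second one simply hands it back to the caller instead of going through a Tcl variable.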

Don't use a second parser for sync evaluation.

Put the sync callback instruction in the queue BUT in the "first to use position" (instead of the "last to use position" that is used for non-sync callbacks). This way as soon as Scilab is in a state compatible with the callback execution it will start the stored sync callback before any other.

Analysis
- Actually this would not provide any improvement because in such an implementation ScilabEval sync would no longer be synchronous.
  Instead, ScilabEval sync would queue a command for later execution, but would not execute it immediately and wait for the result. The problem is I need this result right now, not later.
  In fact this proposal would convert synchronous execution into asynchronous execution. Even if a callback launched with sync were executed before anything else waiting in the queue, it would no longer be synchronous execution. And what Scipad, at least, definitely needs is synchronous execution: ScilabEval sync must not return before Scilab has finished evaluation.
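The objection in this analysis can be checked with a toy Python queue. `CallbackQueueModel` and its `first` flag are hypothetical names modeling the "first to use position" idea: the sync callback does jump ahead of older entries, but the caller still returns before any result exists.

```python
from collections import deque


class CallbackQueueModel:
    """Toy queue used to test the "first to use position" proposal."""

    def __init__(self):
        self.pending = deque()
        self.results = {}

    def enqueue(self, name, fn, first=False):
        # first=True models putting a sync callback at the head of the queue
        if first:
            self.pending.appendleft((name, fn))
        else:
            self.pending.append((name, fn))
        return self.results.get(name)  # all the caller can see on return

    def scilab_becomes_available(self):
        while self.pending:
            name, fn = self.pending.popleft()
            self.results[name] = fn()
```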

Allow ScilabEval sync to be used at any time, and especially during a paused ScilabEval seq
<To be continued...>

Further ideas, brilliant or stupid, are welcome, since I currently have no others.