Project Dana

Dana has a performance profiler built in to its runtime. This can be useful in understanding why a program might be slow and how that program could be optimised.

The profiler is activated using the -pp command-line option to the runtime, specified before the compiled main component file that you are going to run:

dana -pp MyProgram.o

When the program ends, the interpreter will print out a detailed profile of that program's overall execution time.

To examine a real profile, consider the example program:

component provides App requires io.Output out, data.StringUtil stringUtil {

   void doThing()
      {
      for (int i = 0; i < 100000; i++)
         String s = new String()
      }
	
   int App:main(AppParam params[])
      {
      doThing()
	  
      String parts[] = stringUtil.explode("r.a.k.s.e.b. .e.h.t. .n.i. .e.m. .t.i.h. .t.i.", ".")
	  
      for (int i = parts.arrayLength-1; i != INT_MAX; i--)
         out.println(parts[i].string)
      
      return 0
      }
   
   }

If we save this program in a file Test.dn, we compile it as normal and then run it as:

dana -pp Test.o

When the program finishes we get the following profile:

 -- GC total invocations --
reach scans: 0
cycle scans: 0

 -- function call totals --
Format: <name>: <total call #>, <total time> ms, <gc rc_scn>, <gc cyc_sc>
Test.o doThing: 1, 297, 0, 0
machine>output-println: 23, 63, 0, 0
StringUtil.o>explode: 1, 15, 0, 0
StringUtil.o>isAnyOf: 4, 15, 0, 0
List.o>clearList: 1, 0, 0, 0
List.o>getNext: 23, 0, 0, 0
List.o>getFirst: 1, 0, 0, 0
List.o>getLength: 2, 0, 0, 0
List.o>add: 23, 0, 0, 0

 -- function call hierarchy --
Format: <name>: <total # calls> c, time <total> ms, time <sub_calls> ms
Test.o>main: 0, 0, 375
  doThing: 1, 297, 0
  machine>output-println: 23, 63, 0
  StringUtil.o>explode: 1, 15, 15
    isAnyOf: 46, 15, 0
    List.o>getNext: 23, 0, 0
    List.o>getFirst: 1, 0, 0
    List.o>getLength: 2, 0, 0
    List.o>add: 23, 0, 0
  List.o>destroy: 0, 0, 0
    clearList: 1, 0, 0

The profile has three sections: garbage collection, total function time (flat), and function call time (nested). The different parts of this trace help you to understand where the slowest parts of your program were, and so where the most valuable areas are to work on performance improvements.

The garbage collection statistics are printed first. Dana uses an on-demand garbage collector which is invoked when an object drops a reference count to unmanaged memory, if that unmanaged memory is believed to potentially contain cycles in its reference graph. There are two levels of garbage collection statistics: "reach scans" are how many times the garbage collector needed to check if an entity is a potential candidate for cleanup by walking over its fields; "cycle scans" are how many times the garbage collector was obliged to then perform cyclical reference analysis, which is an expensive procedure. Cycle scans occur when memory is categorised as unmanaged, and when the data type(s) involved could contain circular references but have not been annotated with nocycle. Both reach and cycle scans should be minimised to improve performance; to do this the programmer can try to avoid data types that may have reference cycles being passed to non-creating objects of those data instances, and/or can mark data types with nocycle if cycles appear possible from static analysis of a data type but are known to never occur in practice.

Next, the total function time list provides a sorted, flat list of which functions took the most cumulative time when running the program. These statistics include both interface-implementing functions and local function calls within objects. This list can be useful to provide a quick sense of where to start looking for performance issues. This list also provides a breakdown of where garbage collection invocations happened: the gc count here is when the garbage collector was asked to decrement a reference to any entity, and is not itself a performance concern; the cc count is when the garbage collector was then obliged to perform a full cycle check (because the memory was categorised as unmanaged and the data type(s) involved were not annotated as nocycle).

Finally, the nested function call analysis provides the most detailed profile information and can be used to pinpoint exactly where performance problems are. This list is also sorted in order of time (high to low), and shows how much time each function took at each level of the call graph, along with how much of that time was due to sub-function calls. As an example of this we can see that the 'explode' function took 15ms, but that all 15ms of this time were spent in sub-function calls, which we can see below is entirely cause by the 'isAnyOf' function. For each call graph level in this section of the profile, Dana sums together the total time taken by all instances of that function when used at that same level of the call graph, and also provides information on how many calls have been counted.

Note that function timing statistics in general are not thread-aware: if a function is executed in parallel by multiple different threads, the total time executing that function will be a cumulative one across all such threads. In this way a function can sometimes appear to have a cumulative execution time that is longer than the total execution time of the program.