SYMBOLICDATA's Compute environment sets out to realize the following three goals:
In this section, we present the main principles of the realization of these ambitious goals. See [9] for details, further explanations, examples and complete on-line documentation.
Analyzing the general nature of benchmark computations reveals dependencies on the following parameters:
The benchmark computations of SYMBOLICDATA are facilitated by the Perl module Compute and realized using

    symbolicdata Compute [options] sd-file(s)

Parameter specifications are given either by command-line options or, often more suitably, by init-files. A benchmark run consists of the following stages:
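Since parameters may come both from command-line options and from init-files, the two sources have to be merged. A minimal Python sketch of such a merge, assuming (as this sketch does, not the paper) that command-line options take precedence over init-file settings:

```python
def merge_parameters(init_file_params: dict, cli_options: dict) -> dict:
    """Merge parameter specifications from an init-file and the command
    line.  The precedence rule (command line overrides init-file) is an
    assumption of this sketch; the actual Compute module is written in
    Perl and its merge logic may differ."""
    merged = dict(init_file_params)
    merged.update(cli_options)
    return merged
```

For example, an init-file setting of a (hypothetical) `maxtime` parameter would be overridden by a `maxtime` given on the command line, while all other init-file settings survive.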
The set-up and evaluation stages require communication between the Compute module and the Perl routines specified by the input COMP and CASCONFIG records. The input given to and the output expected from these external routines are well-defined and documented. To ease the addition of new computations and systems, as much functionality as possible is provided, first, by the Compute module; second, by the routines of the COMP record; and, third, by the routines of the CASCONFIG record. For example, the run-time error check of a CASCONFIG can be as simple as specifying a regular expression.
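A regular-expression-based error check then reduces to pattern matching on the system's output. The following Python sketch illustrates the idea; the error patterns are made up for illustration and the actual routines are Perl:

```python
import re

# Hypothetical error pattern a CASCONFIG record might specify for a
# computer algebra system; the pattern itself is illustrative, not taken
# from SYMBOLICDATA.
ERROR_PATTERN = re.compile(r"error|segmentation fault|out of memory",
                           re.IGNORECASE)

def has_runtime_error(output: str) -> bool:
    """Return True if the system's output matches the error pattern."""
    return ERROR_PATTERN.search(output) is not None
```

With such a check, adding a new system to the benchmark environment requires little more than writing down the expression that characterizes its error messages.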
Based on the input file and shell command returned by the CASCONFIG routines, the actual run of the computation is fully controlled by the routines of the Compute module. For reliability, timings are measured externally with the GNU time program. While the computation is running, the symbolicdata program ``sleeps'' until either the computation finishes or the maximal (user plus system) time allowed for a computation expires. In the latter case, the running computation is unconditionally interrupted (killed), so that a subsequent evaluation of the computation recognizes a ``maxtime violation''. Furthermore, if a run took less than the required minimal (user plus system) time, the computation is repeated until the sum of the times of all runs exceeds this bound, and the reported time is the average over all runs. Notice that the measured computation times include the time a system needs for start-up, input parsing, and output of the result. While one could argue that these operations do not really contribute to the time of the actual computation, we did not separate out these timings (at least for the time being) for the following reasons:
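The maxtime/mintime control loop described above can be sketched as follows. This is a simplified Python model, assuming wall-clock timing in place of the (user plus system) times that the Compute module obtains from GNU time:

```python
import subprocess
import time

def timed_run(cmd, maxtime, mintime):
    """Run `cmd` as the benchmark driver does: kill it once the allowed
    maximal time expires; otherwise repeat it until the accumulated time
    exceeds the minimal required time, and report the averaged time.
    Wall-clock timing stands in for GNU time's (user+system) measurement;
    function and key names are illustrative."""
    total, runs = 0.0, 0
    while runs == 0 or total < mintime:
        start = time.time()
        try:
            subprocess.run(cmd, timeout=maxtime, capture_output=True)
        except subprocess.TimeoutExpired:
            # Computation exceeded the allowed time and was killed.
            return {"status": "maxtime violation", "time": maxtime}
        total += time.time() - start
        runs += 1
    return {"status": "ok", "time": total / runs}
```

Averaging over repeated runs smooths out timer granularity and scheduling noise for very fast computations, which is exactly what the minimal-time bound is for.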
The information about a particular benchmark computation is collected into a record of type COMPREPORT, which stores all input parameters and results of the computation, i.e., error and verification status, timings, output, etc. Where applicable and requested, records of the COMPRESULT table are used to collect system-independent, verified, and ``trusted'' results of computations. These COMPRESULT records may be extracted from one or more COMPREPORTs and may be used for further verifications and for computations of invariants.
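To make the relation between the two record types concrete, here is a Python sketch. The field names and the agreement criterion for ``trusting'' a result are assumptions of this sketch, not the actual SYMBOLICDATA record layout:

```python
from dataclasses import dataclass

@dataclass
class CompReport:
    """Illustrative shape of a COMPREPORT record: input parameters and
    results of one benchmark computation.  Field names are assumptions."""
    system: str
    example: str
    error_status: str   # e.g. "ok" or "maxtime violation"
    verified: bool
    time: float         # averaged (user+system) time in seconds
    output: str = ""

def extract_trusted_result(reports):
    """Extract a system-independent COMPRESULT-style value from verified,
    error-free reports that agree on the same output.  The unanimity
    criterion is an assumption of this sketch."""
    ok = [r for r in reports if r.verified and r.error_status == "ok"]
    outputs = {r.output for r in ok}
    return outputs.pop() if len(outputs) == 1 else None
```

A result extracted this way can then serve as the reference against which further COMPREPORTs are verified.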
Running automated benchmark computations quickly produces large amounts of output data. Hence, we need mechanisms to maintain and evaluate this data effectively:
First, note that this is a classical database application. We are in the process of developing tools to translate benchmark data into SQL and to store them in a classical database. However, even as a database application, the management of benchmark data remains rather challenging, since benchmark data combines records, software, machines, algorithms, implementations, etc. into a high-dimensional ``state space'' which needs to be analyzed.
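The translation into SQL can be pictured as follows; the schema, column names, systems, and timings below are all made up for illustration and are not the SYMBOLICDATA SQL layout:

```python
import sqlite3

# Illustrative relational storage of benchmark reports (schema assumed).
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE compreport (
    system TEXT, example TEXT, status TEXT, time REAL)""")
rows = [("CoCoA", "Katsura-6", "ok", 1.42),
        ("Singular", "Katsura-6", "ok", 0.87)]
conn.executemany("INSERT INTO compreport VALUES (?, ?, ?, ?)", rows)

# Query the "state space" along one dimension: fastest system per example.
best = conn.execute("""SELECT example, system, MIN(time)
                       FROM compreport GROUP BY example""").fetchall()
```

Once the data sits in such tables, each axis of the state space (system, example, machine, algorithm) becomes a column one can group and filter by.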
Second, note that tools to analyze benchmark data alone are not enough. To effectively compare benchmark runs we need standardized and widely accepted concepts and methods to statistically evaluate this data under various aspects. The EvalComputation module provides a first attempt at a solution. Since a detailed discussion of the aspects involved would go beyond the scope (and frame) of this paper, we refer to www.SymbolicData.org/doc/EvalComputations/ as a starting point for further thoughts and discussions.
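One common statistical convention for comparing two benchmark runs over the same set of examples is the geometric mean of per-example time ratios; the sketch below illustrates that convention only, and the EvalComputation module may well use different statistics:

```python
import math

def geometric_mean_ratio(times_a, times_b):
    """Compare run A against run B over the same examples by the
    geometric mean of per-example time ratios.  A value > 1 means run A
    is slower on average.  This is one textbook convention, not
    necessarily the method used by EvalComputation."""
    ratios = [a / b for a, b in zip(times_a, times_b)]
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))
```

The geometric mean is often preferred over the arithmetic mean here because it treats a 2x speed-up and a 2x slow-down symmetrically, regardless of the absolute times of the examples involved.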