We distinguish two levels of syntactic data integration. At the lowest level is the set of basic (atomic) types supported (such as numbers, strings, identifiers, symbolic constants, etc.) and their encoding. At a higher level are the rules which govern the composition of basic types to produce structured objects.
Syntactic generality implies that the basic types are encoded using a common, well-defined format applicable to a majority of mathematical software systems, and that the rules governing the composition of structured data are relatively simple and result in dynamically typed data, i.e., the representation of an object includes sufficient type information to decode it unambiguously without any additional information. Examples of syntactically general math protocols include MathLink and ASAP.
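For concreteness, the following sketch (in C, and not the actual encoding used by MP, MathLink, or ASAP; the type tags and function names are illustrative only) shows what dynamically typed, self-describing data looks like at the byte level: every value is preceded by an explicit type tag, so a receiver can decode it without any prior agreement about the message's contents.

\begin{verbatim}
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef enum { TAG_SINT32, TAG_REAL64, TAG_STRING } tag_t;

/* Append a tagged 32-bit integer to buf; returns bytes written. */
static size_t put_sint32(unsigned char *buf, int32_t v)
{
    buf[0] = (unsigned char) TAG_SINT32;  /* explicit type tag */
    memcpy(buf + 1, &v, sizeof v);        /* the value itself  */
    return 1 + sizeof v;
}

int main(void)
{
    unsigned char buf[16];
    size_t n = put_sint32(buf, 42);
    printf("encoded %zu bytes: 1 tag byte + %zu value bytes\n", n, n - 1);
    return 0;
}
\end{verbatim}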
Syntactic efficiency implies that syntactic transformations (intermediate transformations between different representations of the same object, e.g., conversions between different encodings of numeric data such as ASCII, hexadecimal, and binary, transformations between different formats for arbitrary-precision numbers, or transformations between sparse and dense matrix or polynomial representations) and the amount of ``overhead'' data (e.g., type information) are minimal. Consequently, if the encodings of basic and structured objects are close to the internal representation of the target system(s), and if the rules governing the composition of structured data are object-specific and statically defined (e.g., in a document and hardcoded into the interface code), then efficient syntactic data integration is easily accomplished. Examples of syntactically efficient math protocols include POSSOXDR and the protocol used for communications between Macaulay2's front end and its compute engine [sg:macaulay2].
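By contrast, the following sketch illustrates a statically defined, object-specific encoding; the message layout and function names are our own illustration, not taken from any of the protocols mentioned above. Both sides agree in advance that a message is a length-prefixed array of 32-bit integers, so no per-element type information crosses the wire and the data can be copied almost directly into the receiver's internal representation.

\begin{verbatim}
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Decode a dense int32 vector: a 32-bit element count followed by
 * the raw elements, with no per-element type tags.               */
static uint32_t get_int32_vector(const unsigned char *buf,
                                 int32_t *out, uint32_t max)
{
    uint32_t n;
    memcpy(&n, buf, sizeof n);
    if (n > max)
        n = max;
    memcpy(out, buf + sizeof n, n * sizeof *out);
    return n;
}

int main(void)
{
    unsigned char msg[4 + 3 * 4];
    uint32_t count = 3;
    int32_t v[3] = { 7, 8, 9 }, w[3];

    memcpy(msg, &count, sizeof count);        /* sender side   */
    memcpy(msg + sizeof count, v, sizeof v);
    count = get_int32_vector(msg, w, 3);      /* receiver side */
    printf("decoded %u elements, first = %d\n", (unsigned) count, w[0]);
    return 0;
}
\end{verbatim}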
The discussion above suggests a tension between generality and efficiency. Among other things, generality implies that explicit type and syntactic information be provided with the object, whereas efficiency implies that the responsibility for supplying this information be shifted from the object itself to the interface code that parses it. Put simply, until now, the more explicit the type information, the more general, but less efficient, the data integration, and vice versa.
Is it important to have a protocol that provides general and efficient syntactic data integration? Yes! Often, mathematical software systems require communication facilities that are general for some applications and efficient for others. Consider the CAS SINGULAR [sg:singular2], for example. If SINGULAR is to be widely used as a component in a PSE to provide specialized, state-of-the-art standard bases computations, or if SINGULAR requires the services of another CAS, then we clearly need a general protocol: being able to easily establish basic connectivity is of utmost importance for this purpose. On the other hand, there is the desire to use SINGULAR-based parallel and distributed computations to expedite the execution of (time-)complex algorithms. So there is a real need for an efficient protocol, since for parallel, and particularly distributed parallel, computations, the protocol's efficiency affects the overall computation time as well as the grain size that can be supported. Without both, SINGULAR, as well as many other systems, would need separate interfaces for general and efficient communications, a clearly undesirable duplication of development effort.
The next question follows naturally: Is it possible to have both? As we will show, the answer is yes! The main idea behind our solution is to provide mechanisms that implement the design philosophy of ``making the common case fast''; that is, to support efficient data integration of ``common case'' data. At the lower level of basic object encodings, we realize this through negotiation of the basic data formats; at the higher level of structure encodings, we use a flexible and expressive data (type) description mechanism that supports very compact and natural encodings of a standard set of common data structures.
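As an informal illustration of ``making the common case fast'' at the structure level (a sketch of the general idea only, not MP's actual type description mechanism, which is described in section 4), a homogeneous array can be described by a small header stating the element type once, after which the elements follow untagged; the descriptive overhead is then amortized over all elements.

\begin{verbatim}
#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef enum { ELT_SINT32, ELT_REAL64 } elt_type;

typedef struct {
    uint8_t  elt;    /* element type, stated once for all elements */
    uint32_t count;  /* number of untagged elements that follow    */
} array_header;

/* Encode a homogeneous array of doubles: one header, then raw data. */
static size_t put_real64_array(unsigned char *buf,
                               const double *a, uint32_t count)
{
    array_header h = { (uint8_t) ELT_REAL64, count };
    memcpy(buf, &h, sizeof h);
    memcpy(buf + sizeof h, a, count * sizeof *a);
    return sizeof h + count * sizeof *a;
}

int main(void)
{
    unsigned char buf[64];
    double a[4] = { 1.0, 2.0, 3.0, 4.0 };
    size_t n = put_real64_array(buf, a, 4);
    printf("%zu header bytes amortized over 4 elements, %zu bytes total\n",
           sizeof(array_header), n);
    return 0;
}
\end{verbatim}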
The next section introduces MP and discusses
the problem of general and efficient data integration within
the context of MP.
Sections 3 and 4 give details of our solutions for
low-level basic object encodings and high-level structure encodings,
respectively.
Section 5 introduces a general and efficient MP parser
which is based on the mechanisms described in sections 3 and 4 and
whose goal is to facilitate the development of MP
interfaces for software systems.
Finally, the last section summarizes our work.