RobIn for IOI I/O
===== === === ===

by Tom Verhoeff (Chair IOI Scientific Committee)

August 2000, Version 0.3 (incomplete unreleased draft)
Documents RobIn Version 0.3

Text bracketed by <...> mostly indicates unfinished business!

Contents
--------
  Introduction
  Historical Intermezzo
  I/O in Turbo Pascal and Turbo C/C++
  IOI-Style Meta-Format Restrictions
  User Requirements for RobIn
  How to Use RobIn
  How to Interpret RobIn Messages
  Design and Implementation of RobIn
  Limitations of RobIn
  Other Tools
  Concluding Remarks
  Acronyms
  Sample Programs

Introduction
------------
RobIn is a software package featuring a Turbo Pascal Unit for robust
input from text files with an IOI-style format (IOI = International
Olympiad in Informatics).

In this document, I provide some motivation for RobIn and I explain
its use and design.  Just in case you want to see it immediately,
the main implementation source file is named "RobIn.pas", its usage
is explained in the section "How to Use RobIn", and "Example.pas"
and "ValidEx.pas" are programs that illustrate how to use RobIn.

RobIn simplifies the development (in Turbo Pascal) of

  *  /input validators/, viz. programs that check the validity of the
     input files produced by the organizers, and

  *  /output checkers/, viz. programs that check the format and/or
     correctness of output files produced by the competitor programs.

The description of an IOI programming task defines the format of every
input and output file involved.  The IOI organizers are responsible for
producing valid input files, and the competitors may assume that the
input files are valid, that is, their programs may crash on invalid
input without being penalized for it.  Likewise, the competitors are
responsible for delivering programs that produce correct output files.
When grading the programs, format restrictions are enforced
in a more relaxed way for output files than for input files.

In the IOI competition, the file formats are kept simple because

  *  the IOI focuses on algorithms and not on I/O;

  *  the same format must be both easy to read and write "by" all
     of the IOI programming languages, and also by the competitors
     (the latter mainly through the editors built into the IDE for
     each programming language).

The competitor programs must read the input files and must write the
output files.  We do not want competitors to waste their time on programming
input parsers, when the task is really about some complex graph algorithm.
The competitors most likely must also somehow put together various input
files for testing and they must also inspect the resulting output files.
This is often done "by hand" through an editor and dumping to the screen
or even printing on paper.

The organizers of an IOI have the responsibility to define file formats
and to verify them.  Developing an input validator or output checker
forces one to think carefully about the format involved.  This turns
out to be a great help in formulating the format definitions.


Historical Intermezzo
---------- ----------
Traditionally, the algorithms developed by the IOI competitors are
delivered as executable programs and they are graded by executing them
for a (necessarily small) set of test cases.  They are not graded by
inspecting the algorithm designs or program texts.  In the scientific
and engineering world, it has long been known that execution-based
testing is a very poor way of verifying the performance
characterictics of algorithms (such as correctness, speed, and memory
usage).

Still, there are some (compelling?) reasons for execution-based
testing at the IOI:

  *  the language barrier,

  *  lack of agreement on alternatives, and

  *  executable products need to be demonstrated by execution.

At the IOI, the competitors are high-school students and they are not
expected to be capable of adequate technical communication with
foreign organizers and evaluators, neither written nor oral.  This
is known as the language barrier.  Therefore, the tasks are translated
to the competitor's native language before the competition.  At the
IOI, a translation step in the opposite direction, from competitor
to organizer, is avoided by requiring the delivery of machine-executable
products.  (By the way, I still think the IOI should look for ways of
expanding the evaluation process to include non-execution-based
methods of verification.)

Another reason for execution-based testing at the IOI is that it
apprears hard to agree upon alternatives.  What constitutes a
design of an algorithm?  How should such a design be presented and
judged?  We have no international notation for algorithms, like they
do in mathematics.  (Though we should be able to get a long way;
after all, the delegation leaders have little trouble discussing
the tasks, and approaches and obstacles to solutions.)

Finally, a correctness proof alone is not good enough for an executable
product.  Obviously, we need to see the thing "fly" to believe it.
(The IOI hovers dangerously between theoretical computer science
and software engineering.)

For execution, the algorithms somehow need to be provided with various
input parameters, and the resulting output needs to be observed.  Since
IOI'94 in Sweden, such I/O is done through text files.  All input
parameters are offered in a text file, even if this concerns just a
single integer.  All output must be written to a text file.  The text
file interface provides a convenient abstraction from the programming
language used to express the algorithm.  Note that the IOI has always
allowed a choice among multiple programming languages (for lack of
agreement on a single language).

Alternatives to (the direct use of) text files are

   *  task-specific libraries that competitor programs must use,

   *  tasks that ask for parameterized routines instead of programs.

The organizers of an IOI can develop special-purpose libraries
to encapsulate I/O.  Such a library provides a means of I/O on
the language level: variables or parameters with native types
can be directly used; there is no need for formatting and conversions.
This use of libraries is now common practice for reactive programming
tasks.  There is still the issue of restrictions on the use of
the library (its interface definition), in particular, also
restrictions on the calling order of library routines.

Another alternative is to require that the competitors develop a
routine with input and output parameters in the programming language
of their choice.  For evaluation, the routine is embedded into an
evalution framework to produce an executable program.  You might
say that here the competitors develop a library consisting a single
routine.  This may work better in the case of source code submission
than submission of object files.

The major disadvantage of both alternative approaches is that the
details of the interface now depend on the programming language.
Even with only two olympic programming languages, this is quite
a burden on the organization.  (But this does not stop us from
using it for reactive tasks.)

It seems to me that the current situation at the IOI reflects the
current state of computer science in general quite well.  We have
come a long way, but we are not "there" yet.


I/O in Turbo Pascal and Turbo C/C++
--- -- ----- ------ --- ----- -----
Until IOI2000, the IOI programming languages are

  *  Turbo Pascal 7.0

  *  Turbo C/C++ 3.0

(After IOI2000, we move to other Pascal and C/C++ compilers.)

Let me consider some of the subtleties of text I/O in these languages,
and in particular text input.  First, some general remarks about text
files on the MS-DOS/Windows platform are necessary.

The character set used for MS-DOS/Windows text files is Extended ASCII
(256 8-bit codes).  Of these, the "Standard" ASCII codes (128 7-bit
codes from 0 through 127, also called Low ASCII) include

   *  33 control codes (0 through 31, and 127),
   *  1 blank (SP, space, code 32),
   *  94 printing characters (Latin uppercase and lowercase letters,
      digits, and various symbols, with codes 33 through 126).

The extended codes (128 through 255, also called High ASCII) cover
adorned letters (with various accents for non-English languages), some
graphics and special symbols.  Of the control codes, the following are
worth mentioning:

   BS   Backspace,            Ctrl-H, ^H, code   8
   HT   Horizontal Tab, Tab   Ctrl-I, ^I, code   9
   LF   Line Feed,            Ctrl-J, ^J, code  10
   CR   Carriage Return,      Ctrl-M, ^M, code  13
   SUB  Substitute character, Ctrl-Z, ^Z, code  26
   ESC  Escape,               Ctrl-[, ^[, code  27
   DEL  Delete, Rub out,                  code 127

On the MS-DOS/Windows platform, the two-character combination
CR/LF (codes 13, 10) encodes the transition to a new line (end-of-line
in Pascal parlance, newline in C terminology).  Note that under Unix,
this is accomplished by a single LF, whereas under the MacOS, a single
CR is used.  (On the platform-less web you can often see the disastrous
consequences of these differences.)

Under MS-DOS/Windows, the end of a text file is encoded by SUB
(Ctrl-Z, code 26).  The other control codes, with the possible
exception of the Tab (HT, Ctrl-I, code 9), nowadays have no generally
accepted role in text files.  The characters CR/LF, HT (tab), and
SP (blank) are collectively known as /whitespace/.

Here are some Turbo Pascal (TP) text I/O details.  For convenience,
we assume the following variable declarations:

  var f: Text; c, d: Char; i: Integer; s: String;

  TP01. Turbo Pascal does not implement the file window (in ISO Pascal
        denoted by f^) to look one character ahead into the input stream.
        It also lacks the get (and put) routines.  All I/O must be done
        through the (standard) eof/eoln, read/readln, and write/writeln
        routines.

  TP02. Reading a single character c by read(f,c) does NOT skip leading
        whitespace.  Skipping of leading blanks must be accomplished,
        for example, by "repeat read(f,c) until c<>' '".  A single
        leading blank can be skipped by "read(f,d,c)" and ignoring d.

  TP03. Reading a single character when eoln(f) is true, returns chr(13)
        (=CR), instead of ' ' as required by the ISO Pascal Standard.
        Furthermore, the next character read after CR will be chr(10)
        (=LF).  Note that a single call readln(f) consumes both CR and LF.

  TP04. Reading a single character when eof(f) is true, does NOT raise
        a run-time I/O error (as suggested by the ISO Pascal Standard)
        and returns chr(26).

  TP05. Reading an integer by read(f,i) will skip all leading whitespace
        (blank, tab, end-of-line); the read then consumes ALL subsequent
        non-whitespace characters until another whitespace character is
        encountered.

        A run-time I/O error (106: invalid numeric format) is raised
        when the non-whitespace characters violate the (Pascal) numeric
        format.  Trailing whitespace (incl. end-of-line) is never
        consumed, unless calling readln(f,i), which is equivalent to
        read(f,i);readln(f).

  TP06. Reading a variable-length string by read(f,s) does NOT skip
        leading whitespace.

  TP07. Reading characters for a variable-length string is stopped
        by the maximum length of the string variable or the first
        end-of-line encountered.

        The maximum length for a variable-length string is 255
        characters.  A trailing end-of-line is never consumed,
        unless calling readln(f,s).

  TP08. Readln(f) consumes upto and including the next CR, and
        subsequent LF if present.

  TP09. If the file's last line is not terminated by an end-of-line,
        then eof(f) and eoln(f) become true simultaneously, and NOT,
        as the ISO Pascal Standard requires, first eoln(f) and after
        readln(f) then eof(f).

        Under the Pascal Standard, a text file consists of a sequence
        of zero or more /lines/, terminated by end-of-file (recognized
        by function eof), and each line consists of a sequence of zero
        or more /characters/, terminated by end-of-line (recognized by
        function eoln).  Thus, the only case where an end-of-file is
        not preceded by an end-of-line occurs in an empty file.

        Also, according to the Pascal Standard, eoln should not be
        called when eof returns true.  The subtle difference between
        Turbo Pascal and the Pascal Standard is illustrated by the
        two character-and-line counting programs in the appendix
        "Sample Programs".

  TP10. Writeln(f) produces CR followed by LF.

  TP11. Closing an output file appends the ending Ctrl-Z.

Here are some details concerning the Turbo Pascal Editor (built into
the Turbo Pascal IDE; the built-in Turbo C/C++ editor is very similar):

  IDE1. It is NOT possible to add blanks at the end of a line.
        (Hit the END key to find the "real" end of a line.)

  IDE2. When opening a file, any blanks at the end of lines are
        silently discarded (and lost when the file is saved).

  IDE3. The last line (accessible by the cursor) is NOT written
        with a terminating end-of-line when saving the file.

  IDE4. When opening a file with its last line (properly) terminated
        with an end-of-line, it will show up as a trailing empty line
        (accessible by the cursor).

  IDE5. Fixed tab stops are set every 8 positions in columns 1, 9, 17,
        etc.  Hitting the TAB key will insert an "appropriate" number
        of blanks.  To enter a real Tab (TAB character) in the file,
        use the sequence ^P^I (Ctrl-P, Ctrl-I).  Tabs in a file are
        visually not directly discernable from blanks.

  IDE6. Control characters (other than HT, CR/LF) are shown as "funny"
        graphics symbols.

Now let me deal with some Turbo C/C++ (TC) peculiarities (these
are mostly in agreement with the ANSI C Standard).  For convenience,
we assume the following variable declarations:

  FILE *f; char c; int i; char s[255];

  TC01. Formatted input can easily be done through fscanf with a format
        string and a sufficient number of result arguments (or via fgets
        followed by sscanf).  It can also be done through "cin >> ...;";
        we consider fscanf in the following.

  TC02. Reading a single character by fscanf(f,"%c",&c) does NOT skip
        leading whitespace by itself.  All leading whitespace can be
        skipped by including a blank directly preceding the format
        specifier: fscanf(f," %c",&c).

  TC03. Reading a single character (format "%c") when end-of-line is
        ahead, will return LF (code 10, '\n'), and it skips both CR
        and LF.  Note that putting "\n" in the format string will
        skip ALL whitespace at that point.

  TC04. Function fscanf will NOT read the end-of-file character, instead
        it will return a count less than the number of arguments to read,
        indicating premature termination.

  TC05. Reading an integer by fscanf(f,"%d",&i) will skip all leading
        whitespace; the read consumes only characters that agree with
        the integer syntax and stops at the first character that does
        not match.

  TC06. Reading a variable-length string by fscanf(f,"%s",s) will skip
        all leading whitespace.  This is not the case when using gets(s)
        or fgets(f,s,i).  Note that s should be big enough to accommodate
        all characters read AND the terminating null '\0'.  Therefore
        it is recommended to use fscanf(f,"%255s",s) when s can
        accommodate 256 characters (255 non-nulls).

  TC07. Reading a variable-length string is stopped after reading all
        (at least one) non-whitespace characters, by the first
        whitespace character (which is not consumed).  In particular it
        will never skip trailing end-of-lines, and the returned string
        will never include blanks and will never be empty.

        Again, this is not the case for gets/fgets.  (A subtle difference
        between gets and fgets is that the latter will put the terminating
        newline '\n' into the string returned, whereas gets reads the
        newline and discards it.

  TC08. Reading with fscanf(f,"\n") skips all whitespace at that point,
        including end-of-lines, but also blanks and tabs.

  TC09. End-of-lines are faithfully reported as they appear in the file,
        the end of the file may or may not be preceded by an end-of-line.
        Using fscanf it is not possible to distinghuish between
        end-of-file, an I/O error, or a format error.

  TC10. Calling fprintf (f,"\n") produces both CR and LF.

  TC11. Closing an output file appends the ending Ctrl-Z.

Concerning output, here is an anecdote.  The output format required
a text file with a number N (1<=N<=100) on the first line, and
on the second line a list of N numbers in the range 0..9, separated
by single blanks (no blank before the first item).  This format is
in agreement with the IOI-style meta-format rules.  There are various
ways to produce the output.  Below is some Turbo Pascal code to put
on the dots (...) in the following framework:

  program NumberList;

  var N: 1..100; L: array [1..100] of 0..9;
  ...
  begin
    "produce N and L"
  ; writeln ( N )
    ...
  end.

My preferred solution:

  var i: 1..100; sep: String; { or sep: String[1] if you like }
  ...
  ; s := ''
  ; for i := 1 to N do begin
      write ( s, L[i] )
    ; sep := ' '
    end { for i }
  ; writeln

But you can also get away with

  var i: 1..100;
  ...
  ; for i := 1 to N do begin
      if i > 1 then write(' ')
    ; write ( L[i] )
    end { for i }
  ; writeln

or

  var i: 1..100;
  ...
  ; for i := 1 to N do begin
      write ( L[i] )
    ; if i < N then write(' ')
    end { for i }
  ; writeln

Here are three incorrect fragments (to be enclosed by "var i: 1..100;"
and "; writeln"):

  ; for i := 1 to N do write ( L[i] : 2 )

  ; for i := 1 to N do write ( ' ', L[i] )

  ; for i := 1 to N do write ( L[i], ' ' )

All of them are incorrect because they produce one additional blank.
The first two at the line begin, and the third one at the line end.
An additional weakness of the first fragment, is that changing the range
of the elements in the list L, say from 0..9 to -1..10, will cause
-1 and 10 to miss a preceding blank.  Note also that the output of
the third approach appears to be good when viewed in a Turbo editor
(because that editor discards blanks at the end of a line), or
when dumped to the screen or printed on paper.

At IOI'95 it happened that (at least) one competitor had opted for
the third approach.  This person apparently was bothered by the
trailing blank and changed the finishing writeln into writeln(chr(8)),
that is, included a backspace (BS) "to erase the superfluous blank".
When dumped to the screen, the output does indeed look good.  The IOI'95
evaluation software, however, did not accept this, because it rejected
control codes other than end-of-line and end-of-file.  The IOI'95
evaluation software would have ignored the trailing blank, because
the output format was not strictly enforced (knowing that competitors
make stupid I/O mistakes)!


IOI-Style Meta-Format Restrictions
--------- ----------- ------------
The restrictions on the format of text files given below are inspired
by the language peculariaties explained in the preceding section,
They are intended to make I/O easy for the built-in routines of
the IOI programming languages, and for human use.

The format of input and output text files is restricted by the
following /meta-format rules/:

  MF01. An IOI text file consists of a sequence of ONE or more
        NONempty /lines/.  The sequence is terminated by end-of-file
        (Ctrl-Z).

  MF02. Each line consists of a sequence of ONE or more /items/.
        This sequence is ALWAYS terminated by an end-of-line
        (CR/LF), in particular also the last line.

  MF03. Each item is one of the following:

          /character item/: a single printable NON-blank character
             (codes 33 through 126)

          /integer item/: an integer number in decimal notation,
             possibly negative (preceded by '-'), no leading '+',
             at most 32 bits

          /string item/: a NON-empty variable-length string of
             at most 255 printable NON-blank characters
             (variable-length here means that the length possibly
             cannot easily be determined from the format or the
             preceding text)

  MF04. Adjacent items on the same line are either /contiguous/
        (that is, without any extra characters between them), or
        /separated/ from each other by a SINGLE blank.
        The definition of the format (in the Competition Rules
        and the task description) must clearly state the item
        composition convention.

        MF05 through MF12 concern restrictions on item compostion.

  MF05. A character item must be separated by a single blank from a
        directly preceding number item on the same line.

  MF06. It is discouraged that character items are preceded by a separating
        blank.

  MF07. It is discouraged that character items appear on the same line
        as number items.  (Characters preceding integers are not
        an issue concerning points 5 and 6; only for an integer preceding
        a character are points 5 and 6 in conflict.)

  MF08. A number item is always separated by a single blank from a
        directly preceding number item on the same line.

  MF09. A variable-length string item always appears at the end of a line.

  MF10. It is discouraged that a variable-length string item appears
        with other items on the same line; that is, preferably it occurs
        on a line by itself.

  MF11. A variable-length string item must be separated by a single blank
        from a directly preceding number item on the same line.

  MF12. There are no blanks at the begin or end of a line.

  MF13. The format must be such that type of every item and the
        occurrence of every end-of-line and end-of-file can easily be
        deducted --- before they appear --- from the format definition
        or preceding text (possibly from another file).  Explicit
        end-of-line or end-of-file checking should NEVER necessary,
        not matter which IOI programming language one uses.
        Below are some examples to illustrate this rule.

  MF14. There are no tabs and other control codes (besides CR/LF and Ctrl-Z)
        in text files. There is no real/float number item (reals/floats
        could still be treated as a sequence of characters or a string,
        but the appearance of floating-point arithmetic in tasks is
        frowned upon in the IOI).

<TO BE DONE: tracing MFxx to TPxx, IDExx, TCxx>

Format definitions provided by the IOI organizers must respect the
meta-format rules stated above, and all input files must be in /strict/
agreement with the promised input format.  When checking output files
produced by the competitor programs in order to determine scores,
it is strongly recommended that the format is not enforced strictly,
but rather in a /relaxed/ manner.

Note that the Competition Rules for all IOIs up to IOI2000 have formulated
some input format restrictions, but those formulations have always
been less restrictive than the meta-format rules that I have given above.

Here are some examples to illustrate rule MF13.  The following format
definitions violate rule MF13.

  1. Input file ... contains at least one and at most 20 lines.
     Each line contains three nonnegative integers less than 5000.

  2. The first and only line of input file ... contains at least
     one and at most 1000 characters that are uppercase letters.
     There are no blanks between the characters.

  3. The first line of input file ... contains at most 100 integers
     I with 10 <= I <= 99, followed by a single character '*'.

The first and second format require end-of-file and end-of-line
checking, respectively, to be read.  In C/C++, the second format
can be handled without explicit end-of-line checking, because C/C++
does not have the 255 character limit on variable-length strings.
When using the so-called "extended syntax", Turbo Pascal can also
read null-terminated strings (but this is not Standard Pascal,
and I don't find it required IOI knowledge).

The third format has the problem that it is not known what type
of item to read after the first item (which is always an integer).
Thus, it requires looking ahead into the input stream to detect
the terminating 'A'.  If that look-ahead character turns out to
be digit than it must be combined with the following digits to
make up another integer.  In Pascal, such a look-ahead character
cannot be pushed back into the input stream.  In C/C++, this can
be accomplished (though it is not recommended), but I would not
consider that required IOI knowledge.  The total length of the
input line can exceed 255.  Hence, reading it into a string is
not an option for Turbo Pascal (in my opinion).

Here are similar format definitions that do respect rule MF13.

  1. Input file ... contains at least one and at most 21 lines.
     Each line, except the last, contains three nonnegative integers
     less than 5000.  The last line of the file contains a single -1.

  2. The first line of input file ... contains at least 1 and
     at most 1000 characters that are uppercase letters, followed
     by a single '*'.  There are no blanks between the characters.

  3. The first line of input file ... contains at most 100 integers
     I with 10 <= I <= 99, followed by a single 0.

Although these formats are acceptable, their readability can be
further improved as follows:

  1. The first line of input file ... contains one integer N with
     1 <= N <= 20.  The following N lines contain three nonnegative
     integers less than 5000.

  2. The first line of input file ... contains one integer N with
     1 <= N <= 1000.  The second line contains N characters that
     are uppercase letters.  There are not blanks between the characters.

  3. The first line of input file ... contains one integer N with
     0 <= N <= 100.  If N is positive, then there is also a second
     line, containing exactly N integers I with 10 <= I <= 99.

For the third format, now the question is raised whether the case
with N = 0 really needs to be included at all.  Furthermore, one
could argue that the human writability of the formats has deteriorated,
since the number of items needs to be counted explicitly.


User Requirements for RobIn
---- ------------ --- -----
Here is a list of requirements that I had in mind when designing RobIn:

  UR01  RobIn shall NOT CRASH on errors but report them.

  UR02  RobIn shall be EASY TO USE for developing programs that check
        text files for adherence to IOI-style formats.

  UR03  RobIn shall be FLEXIBLE enough to express all reasonable
        IOI-style formats that adhere to the Meta-Format Rules.

  UR04  RobIn shall ENFORCE the IOI-style Meta-Format Rules.

  UR05  RobIn shall have a RELAXED mode and a STRICT mode for checking
        a format.

  UR06  RobIn shall be usable for developing both INPUT VALIDATORS
        that report to the organizers, and OUTPUT CHECKERS that report
        to the contestants and their coaches.

  UR07  RobIn shall NUMBER all error messages uniquely and take
        them from a text file to simplify translation.

  UR08  RobIn shall support TASK-SPECIFIC checking and error messages.

As a development requirement I order the importance of some software
qualities as follows:

  1. First comes correctness

  2. Second comes usability (simplicity, generality)

  3. Third comes maintainability

  4. Fourth comes portability (though I have not given it much thought)

  5. Only in fifth place comes performance (speed, memory)


How to Use RobIn
--- -- --- -----
RobIn is a Turbo Pascal Unit that provides routines for reading text
files and extracting values from them, while checking

  *  that the format (especially, order and composition of items, and
     the line structure) adheres to the meta-format rules, and

   * that the file content adheres to the format (including range
     restrictions).

RobIn is "robust" and does not crash on "bad" text files, but instead
provides detailed diagnostic messages.  The messages are numbered and
taken from message files to simplify translation and use by non-English
organizers, coaches, and competitors.

RobIn is intended to be used by organizers to construct tools that,
in turn, are used by organizers (to validate input files) and
competitors and coaches (to check the format/correctness of output
files).  This section is addressed to the organizer who wishes
to construct a tool using RobIn.  Coaches and competitors may want
to skip to the section titled "How to Interpret RobIn Messages".

Make sure that you have ROBIN.TPU (obtained by compiling ROBIN.PAS
with TP).  Simply include "uses RobIn;" in your Turbo Pascal and
away you go.  Well, you need to know somewhat more.

RobIn provides the following routines (here shown without parameters):

   BeginLogging     To open the log file and load the messages
   EndLogging       To close the log file and unload the messages

   OpenRIF          To open the text file to be read and checked
   CloseRIF         To terminate reading and close the text file

   ReadCharRIF      To read a character item
   ReadIntegerRIF   To read an integer item
   ReadStringRIF    To read a variable-length string item
   ReadlnRIF        To read an end-of-line

   AssertRIF        To do a task-specific check
   IsAbortedRIF     To find out whether a format error occurred

   IntToStr         To help construct composite identifiers for items

The acronym RIF stands for "RobIn File".  The routines ReadCharRIF,
ReadIntegerRIF, and ReadStringRIF are collectively also denoted
by Read<Item>RIF.

Some restrictions apply to the order in which RobIn routines
are called.  Violating these order restrictions will not crash
RobIn, but may result in early termination, and an error message
that may not explicitly mention the out-of-order calling.

The order restrictions on calls to BeginLogging, EndLogging, and
xxxRIF can be captured by the regular expression:

  (
    1. BeginLogging
    2. ( "format described by program fragment using xxxRIF" )*
    3. EndLogging
  )*

where ( ... )* means "zero or more times the enclosed".

A /format/ is described by writing a Pascal program fragment containing
calls to OpenRIF, CloseRIF, Read<Item>RIF, and ReadlnRIF.  Typically,
the order of these calls should adhere to the regular-expression:

    1. OpenRIF
    2. (
         1. ( Read<Item>RIF )+
         2. ReadlnRIF
       )+
    3. CloseRIF

where ( ... )+ means "one or more times the enclosed".

Anywhere between OpenRIF and the corresponding CloseRIF, the format
program fragment can call AssertRIF to check a task-specific condition.
The program can call IsAbortedRIF any time after OpenRIF --- also after
CloseRIF --- to find out whether reading of the file content had been
aborted (due to a fatal error).  Note that checking of the format
itself against the meta-format rules is always continued until
CloseRIF.

No order restrictions apply to IntToStr.  It is just a "loose" function,
typically used to construct a parameter for Read<Item>RIF when reading
a sequence of items.

As an example, consider the following format definition:

  The first line of file 'EXAMPLE.IN' contains two integers N and K
  with  2 <= N <= 9  and  1 <= K <= N.  The second line contains
  N lowerercase letters ('a'..'z', contiguous).
  The following K lines each contain one nonempty string of at most
  20 uppercase letters ('A'..'Z').

This format is captured by the following program fragment (assuming
appropriate constant definitions and variable declarations; the full
program is Example.pas):

     OpenRIF ( inFile, 'EXAMPLE.IN', ... )

     { read first line }
   ; N := ReadIntegerRIF ( inFile, Separated, 'N', 2, 9 )
   ; K := ReadIntegerRIF ( inFile, Separated, 'K', 1, N )
   ; ReadlnRIF ( inFile )

     { read second line }
   ; for i := 1 to N do begin
       L[i] := ReadCharRIF ( inFile, Contiguous,
                             'L[' + IntToStr(i,1) + ']',
                             UpperCaseChars )
     end { for i }
   ; ReadlnRIF ( inFile )

     { read following K lines }
   ; for j := 1 to K do begin
       S[j] := ReadStringRIF ( inFile, Contiguous,
                               'S[' + IntToStr(j,1) + ']',
                               LowerCaseChars, 1, 20 )
     ; ReadlnRIF ( inFile )
     end { for j }

   ; CloseRIF ( inFile )

Note that Separated, Contiguous, UpperCaseChars, and LowerCaseChars
are predefined by RobIn.

Here is a corresponding fragment of Turbo Pascal program that reads the
input data from the input file (again assuming appropriate variable
declarations).  Note the great similarity with the fragment above.
A fragment of C is given further down.

     Assign ( inFile, 'EXAMPLE.IN' )
   ; Reset ( inFile )

     { read first line }
   ; Read ( inFile, N )
   ; Read ( inFile, K )
   ; Readln ( inFile )

     { read second line }
   ; for i := 1 to N do begin
       Read ( inFile, L[i] )
     end { for i }
   ; Readln ( inFile )

     { read following K lines }
   ; for j := 1 to K do begin
       Read ( inFile, S[j] )
     ; Readln ( inFile )
     end { for j }

   ; Close ( inFile )

or somewhat more compactly, as a competitor might code it:

     Assign ( inFile, 'EXAMPLE.IN' )
   ; Reset ( inFile )
   ; Readln ( inFile, N, K )
   ; for i := 1 to N do Read ( inFile, L[i] )
   ; Readln ( inFile )
   ; for j := 1 to K do Readln ( inFile, S[j] )
   ; Close ( inFile )

Here is a C version (one of many possibilities) to read the format
(assuming appropriate #include and variable declarations):

     inFile = fopen ( "EXAMPLE.IN", "r" );
     fscanf ( inFile, "%d%d\n", &N, &K ); /* \n needed (why?) */
     for ( i=1; i<=N; ++i )
       fscanf ( inFile, "%c", &L[i] );
     fscanf ( inFile, "\n" ); /* not needed (why?) */
     for ( j=1; j<=K; ++j )
       fscanf ( inFile, "%s\n", &S[j] ); /* \n not needed (why?) */
     fclose ( inFile );

<explain parameters of Read<Item>RIF routines: rif, comp, itemId,
and various bounds>

To simplify reading a list of separated items contained on a single line,
RobIn automatically adjusts the itemComp parameter to Contiguous
when reading the first item on a line.  Thus, the format

  The first line of ... contains one integer N with 1 <= N <= 20.
  The second line contains N integers X with 0 <= X <= 999.

is correctly captured by (assuming appropriate variable declarations)

    N := ReadIntegerRIF ( inFile, Separated, 'N', 1, 20 )
  ; ReadlnRIF ( inFile )
  ; for i := 1 to N do begin
      X[i] := ReadIntegerRIF ( inFile, Separated, 'X', 0, 999 )
    end { for i }
  ; ReadlnRIF ( inFile )

It is recommended that the format program captures bounds as close as
possible to the way they are given in the task description.  Consider
the following two equivalent format definitions:

  The first line of ... contains two positive integers N and K at
  most 100 and with K <= N.

  The first line of ... contains two integers N and K with
  1 <= K <= N <= 100.

Here are two corresponding format program fragments:

    N := ReadIntRIF ( inFile, 'N', 1, 100 )
  ; K := ReadIntRIF ( inFile, 'K', 1, 100 )
    { check K < N, if not report "K > N" }
  ; AssertRIF ( inFile, K <= N, AssertError, {M#}900,
                IntToStr ( K, 1 ) + ' > ' + IntToStr ( N, 1 ) )

and

    N := ReadIntRIF ( inFile, 'N', 1, 100 )
  ; K := ReadIntRIF ( inFile, 'K', 1, N )

These fragments are equivalent, except for the kind of diagnostics
they produce, and how the competitor can interpret the diagnostic
given the format definition.  Consider the input file with first
line "10 20".  The first program will give an AssertError of the
form "K > N (20 > 10)", and the second program will give a
FormatError of the form "Integer not in range (20 not in [1..10])".

Use of AssertRIF and IsAbortedRIF

<to be supplied>

The Structure of a Message Definition File

Each diagnostic message is identified by a number in the range 0..900.
The numbers in the range 0..899 are reserved for use within RobIn.
The numbers in the range 900..999 are available for task-specific
messages issued via AssertRIF by the using program and loaded
by BeginLogging.

The association between a message number and its text is recorded
in a Message Definition File (typical extension .mdf).  In a message
definition file, empty lines and lines starting with '#' are ignored.
These can be used for documentation.  Other lines should be of the
form

  <message number> <non-empty string of at most 50 characters>

Leading and trailing blanks are ignored.  The format of the message
definition file is checked when it is loaded (though in a way that
differs from the checking that RobIn does for RobIn files :-) and
warnings are generated when the message definition file is badly
formatted.

Diagnostic Categories

Every message is issued within a category.  Usually --- but not
necessarily always --- a message occurs in one category only.
However, some messages are more general and make (different) sense
in different categories (e.g. "Empty string").

The following table summarizes the "philosphy" behind the various
Error dianostics.  As with all error reporting, your mileage may vary,
since RobIn cannot know what the user's intentions were.  Compare this
to the art of interpreting compiler error messages.

  IOError         --> need to correct problem with file in OS
                      its existence, its location or name, its permissions
  InternalError   --> need to correct mistake inside RobIn
                      report this to me, with all evidence you can supply
  UsageError      --> need to correct mistake in program using RobIn
  MetaFormatError --> need to correct format definition
  FormatError     --> need to correct file content (general isolated issue)
  AssertError     --> need to correct file content (task-specific issue)

The distinction between UsageError and MetaFormatError is not always
as clearcut as one might hope.  Generally speaking, the format definition
need not change when correcting a UsageError.  Nevertheless, the format
may still turn out to be the real source of the problem.  Consider the
following format definition:

  The first line of input file ... contains two integers N and K with
  1 <= N <= 100 and 3 <= K <= N.

The straightforward format program fragment is

    N := ReadIntegerRIF ( inFile, Separated, 'N', 1, 100 )
  ; K := ReadIntegerRIF ( inFile, Separated, 'K', 3, N )

However, this may give rise to a UsageError "Empty integer range"
on the second ReadIntegerRIF call.  This is, for instance, the case when
reading was aborted (file does not exist), in which case only the
meta-format is checked.  The first ReadIntegerRIF then returns the
lower bound 1, making the second range empty.  This is clearly an
inconsistent read request.  What is the matter here?  Well, 1 is not
really the lower bound on N, instead the actual consistent minimum
value for N is 3, because of the constraint  3 <= K <= N .  It
is advisable to change the format definition in such a case.

Warnings

Warnings point to a "suspicious" situation or a discouraged practice.
Sometimes they can easily be explained and subsequently be ignored.
In other cases, it may turn out to be a "hard to catch" mistake.
Compare this again to compiler warnings.

<explain selection and limiting to tune diagnostic output
  BeginLogging: parameters diagSelection, reportingLimit>

<concurrent reading/checking of multiple files can be done

    Exception handling:

      *  We distinguish the follow exceptions:
           -  meta-format warnings, in case the format involves a
              discouraged practice
           -  meta-format errors, in case the format violates one
              of the IOI format restrictions
           -  format warnings, in case the content of the file being
              read violates the format
           -  format errors

      *  In principle, after the first error has occurred, it no
         longer makes sense to continue reading the file.
         However Meta-Format checking proceeds even if a Format Error
         has occurred.
>

< STILL INCOMPLETE >


How to Interpret RobIn Messages
--- -- --------- ----- --------
<what does competitor have to know about RobIn to understand its diagnostic
 output?>

<file being read and (intended) format begin processed need not by in sync,
 especially if eoln instead of blank or vice versa.

 coordinates:
   lineNr, columnNr are coordinates of nextChar within file,
   fmtLineNr, itemNr are coordinates of current item within format;
   nextChar looks AHEAD!

 current item may not have been completely handled by Read<Item>RIF,
 because it may still be followed by AssertRIF

 Integer item values are reported in decimal
 Character item values are reported as 'c', '''', or chr(#)
 String item values are reported as "c...c" (embedded '"' appear singly!)

 consequences of limiting diagnostics>


Design and Implementation of RobIn
------ --- -------------- -- -----
The implementation of RobIn resides in the file "RobIn.pas" and
contains comments as documentation.  Here are some very global
additional remarks.

Expressing the Format

I have chosen to express the format as a (Pascal) program fragment
that makes calls to the routines reading the elements in an order
appropriate for the format.  An alternative could have been to
introduce a new notation for expressing formats and to store the
format description in a separate file.  I think that my approach
is more flexible and easier to use directly (unless you hate
Pascal).

Type RobInFile

The file being checked is read via a variable rif: RobInFile, in
particular via the text file variable rif.f.  The variable rif
contains other fields besides f to capture the current state
of the reading and checking process, including some statistics.

The sequence of characters that "passes through" rif.nextChar is
exactly the contents of the file, with two exceptions.  The first
exception is that, under Relaxed Checking, every TAB characters is
replaced by a blank.  The second exception is that, under Relaxed
Checking again, if the last line in the file is not terminated by
an explicit end-of-line, then and EolnChar is inserted in the sequence.

The window rif.nextChar acts as a look-ahead character.  It is needed,
because Turbo Pascal lacks a (Standard Pascal) file window to look
ahead into the input stream.  The procedure StepNextChar advances the
rif.nextChar "window" one position.

The look-ahead character is always valid after opening a RobInFile.
Procedure StepNextChar gets the next character into nextChar.
When eof is encountered, nextChar is set to EofChar.
When eoln is encountered, nextChar is set to EolnChar.

RobIn parses items by reading characters.  The end of an integer item
is detected by hitting the end-of-line or by reading a non-digit.
The latter cannot be pushed back into the input stream and is stored
as look-ahead character.

Diagnostic messages

Diagnostic output is done through theLogFileP, which is a pointer
to a text file.  I use such a pointer for two reasons.  First of all,
that way the using program can "own" the file variable and still
tell only once to RobIn which file to use.  Second, it simplifies
switching on-the-fly between log files without having to open and
close them.  You can just move the pointers around.  For instance,
to get diagnostic output temporarily to standard output, simply
insert:

    theLogFileP := @ output
    { diagnostic output now goes to output }
    ...
    theLogFileP := @ logFile
    { diagnostic output now goes to logFile
      (logFile should be open for writing) }

Similarly, the selection of diagnostic messages can be changed
on-the-fly by assigning to theDiagSel.

All diagnostic messages produced by RobIn are preceded by "RobIn: ".
Subsequent lines are indented by two blanks (implementation
string constant Indent).

Diagnostic messages are formulated to state what is actually the
case, not what should have been the case.  For example, "Empty
string" (when some empty string was encountered where it should
not have been), rather than "Empty string not allowed".

Some diagnostic messages are followed by a /parenthetic note/, which
is NOT taken from a message file, but which is constructed depending
on the current situation.  For example, if a value is out of bounds,
the value and bounds are shown.  I have strived to use as little
English as possible in the parenthetic remarks, and preferably only
symbolic information, to avoid the need for translation.  Note that
only a limited set of diagnostic messages is relevant to the
competitors.

<<< There should be a complete list showing which notes can occur. >>>

Message numbers in the range 1 .. 255 are reserved for Turbo Pascal
run-time error messages.  Message numbers in the range 256 .. 899
are general messages of RobIn.

Type Compatibility Issue for Read<Item>RIF

The routines Read<Item>RIF have been implemented as functions.
They "should" have been implemented as procedures (like the predefined
read in Pascal), because they have (major) side-effects.  Here is a
motivation for my choice.

In case of read procedures, the result (the value read) would have to
be communicated via a var-parameter.  These var-parameters need to
have the "widest" type for the kind of item read, that is, Char for
characters, LongInt for integers, and String for variable-length
strings.  Turbo Pascal insists that the actual var-parameter has
exactly the same type as the formal var-parameter (and not a subtype).
There is no automatic type conversion when assigning the result value
in the procedure to the var-parameter.  (Note: The predefined read
procedures form an exception to this rule!)  Consequently, if you
have "var N: 1..100" and want to read a value using a
"procedure ReadInt ( var i: LongInt )", then "ReadInt ( N )"
is rejected, and you have to resort to the use of an auxiliary
variable: "var ri: LongInt; ... ReadInt ( ri ) ; N := ri".

On the other hand, for functions, the result type only needs to be
assign compatible --- not identical --- when used in an assignment
statement, and the value is automatically converted.  Thus,
"var N: 1..100; ... function ReadInt: LongInt; ... N := ReadInt"
is correct usage.

To simplify the use of RobIn (avoiding auxiliary variables), I have
sacrificed elegance (no side-effects in functions).

In What Order to Check Diagnostic Conditions?

The order in which routines check conditions to generate diagnostic
output poses some dilemmas.  Often, it does not make sense to continue
processing if a UsageError condition occurs, such as an empty
character range in ReadStringRIF.  In fact, it may be dangerous
or senseless to test for a UsageWarning condition if an error
is present.  For example, in ReadStringRIF, if range contains
a single character and minLength = maxLength, then I find that
a good reason to give a UsageWarning (because at this point,
the input file can contain only a single value).  As a parenthetic
remark, I produce that single value.  But what if minLength and
maxLength are negative?  In that case, the UsageError should take
precedence, because it violates the precondition (and no string
value in the file can be acceptable).  But, always first checking
for UsageError conditions and, if none occurred, then checking
for UsageWarning conditions, reduces the amount of diagnostic
information in the log file.  It may be very helpful in locating
an error to receive warnings as well, because they can point
to overlooked situations.

As a result, there are places where I first check for UsageWarning
conditions, and also places where I first check for UsageError
conditions.  It all depends on the context.  If a UsageWarning
can safely and sensibly be issued before the UsageErrors, then
I do so.

Turbo Pascal Extensions Used in RobIn

Here is a (complete?) list of Turbo extensions to Standard Pascal
that were used in the implementation of RobIn.

  LongInt
  Inc, Dec
  UpCase
  String, + on strings, Val, Str, Delete
  Exit, RunError
  Assign, Close
  {$I+/-}, IOResult
  else in a case statement

Testing RobIn

<to be written; RobIn was applied to some of the IOI2000 input files
at the moment of writing this document>

< STILL INCOMPLETE >


Limitations of RobIn
----------- -- -----
RobIn has (of course) some limitations.  Here is a brief discussion
of some of the known limitations.

Diagnostics for Out-of-Order Calls

<following needs to be updated, since I improved RobIn>
The diagnostics for out-of-order calls to xxxRIF are implicit, rather
than explicit.  In particular, the following restrictions should be
checked explicitly but are not (all calls concern the SAME variable
rif: RobInFile):

  Routine        Condition

  OpenRIF        # OpenRIF = # CloseRIF      (rif not open)
  CloseRIF       # OpenRIF = # CloseRIF + 1  (rif not closed)
  Read<Item>RIF  same as CloseRIF
  ReadlnRIF      same as CloseRIF
  AssertedRIF    same as CloseRIF
  IsAbortedRIF   # OpenRIF > 0               (rif was opened)

Note that "rif is open" does not mean the same thing as "rif.f is
open", because a call to OpenRIF may fail to open rif.f.  In that
case, format checking is aborted, but meta-format checking continues,
and a call to CloseRIF is still needed to terminate the checking,
before opening another file.

Another thing that cannot be diagnosed by RobIn is a missing
CloseRIF call.  If the using program does not make the call,
then RobIn will never know.

Similar limitations apply to the pair BeginLogging and EndLogging.

The reason that such protection is "hard" to implement in Pascal
is that no variables (except file variables) are automatically
initialized to some known value.  Therefore, RobIn (which does not
"own" the RobInFile variable that acts as parameter to xxxRIF) must
always rely on the using program to take care of initialization,
for example by calling an appropriate initialization routine
(in our case, OpenRIF).  (If only Pascal had had a single
auto-initialized type for variables; one bit would have been
enough. :-)

Concerning the second limitation (missing call to CloseRIF),
the thing is that Turbo Pascal has no way to guarantee that
control goes back to RobIn when the program exits.  EndLogging
does a check, but the call to EndLogging may also be missing.
(Delphi does have have that facility.)

< STILL INCOMPLETE >


Other Tools
----- -----

CheckMsg

CheckMsg.pas is a program to help check to check the structure of
message files used by RobIn, especially task-specific message files.
CheckMsg checks the files named by each of its command-line parameters.
If no command-line parameters are given, then it asks for a file name
and checks it.

Application Framework

<to be finished: now the general code for each validator/checker is
simply copied.  That code either handles all command-line parameters
and interprets wildcards, or it generates the file names of all files
to be tested internally (based on the root of the file name
and appending numbers for each case).  For each file it calls
a routine that handles that file (either validates the input format,
analyzes the data, checks the output format, or checks the output
correctness.  The task-specific code in the current "framework"
is kept to a minimum, the remainder is put into a task-specific
unit.  See example "ValidEx.pas".>


Concluding Remarks
---------- -------
Here are some open ends.

Porting to Linux and/or C

Not me.  But if you want to give it a try, look under "Design and
Implementation of RobIn" for a list of Turbo extensions to Standard
Pascal that were used.

ReadRealRIF or ReadFloatRIF

I will not put in a ReadRealRIF/ReadFloatRIF.  For one thing, the use
of floating point arithmetic at the IOI is controversial (for several
reasons, such as rounding, numerical stability, etc., ask Gyula Horvath).
Furthermore, the format for a real/float item must be chosen such
that it is common to all IOI programming languages.  Its definition
is more complicated than that of the items supported now.

<application to IOI2000 input validators>


Acronyms
--------
CR      Carriage Return (ASCII character 13)
HT      (Horizontal) Tab (ASCII character 9)
IDE     Integrated Development Environment
I/O     Input/Output
IOI     International Olympiad in Informatics
LF      Line Feed (ASCII character 10)
OS      Operating System
RobIn   Robust Input module (Turbo Pascal Unit)
TC      Turbo C/C++ by Borland (version 3.0)
TP      Turbo Pascal by Borland (version 7.0)


Sample Programs
------ --------
This appendix contains some programs to illustrate certain details
that I did not want to include in the main text.  The layout for
these programs has been kept compact to improve the overview, at
the expense of their maintainability.

End-of-Line and Turbo Pascal

The following two programs count the number of characters and lines
in standard input.  Under the ISO Standard Pascal definition, these
programs are equivalent, but not under Turbo Pascal.  Try the following
two possible inputs, each consisting of two lines (^Z denotes Ctrl-Z):

abc
def
^Z

abc
def^Z

Program TextCounter1 (correctly) reports 6 characters on 2 lines
in both cases.  Program TextCounter2 reports the same for the first
input, but 6 characters and 1 line for the second input, because the
outer loop terminates early.  This can be compensated by including
the line

; if eoln then Inc(lines)

after the loop in TextCounter2.  Note, however, that the resulting
program is no longer correct ISO Pascal (because eoln is inspected
when eof is true)!

program TextCounter1;

var chars, lines: longint; c: char;

begin
  writeln('Count # characters and lines in standard input.')
; writeln('Standard implementation.')
; chars := 0 ; lines := 0
; while not eof do begin
    while not eoln do begin read(c) ; Inc(chars) end
  ; readln ; Inc(lines)
  end
; writeln('# chars = ', chars:6)
; writeln('# lines = ', lines:6)
end.

program TextCounter2;

var chars, lines: longint; c: char;

begin
  writeln('Count # characters and lines in standard input.')
; writeln('Alternative implementation.')
; chars := 0 ; lines := 0
; while not eof do begin
    if eoln
    then begin readln  ; Inc(lines) end
    else begin read(c) ; Inc(chars) end
  end
  { N.B. Turbo Pascal needs "; if eoln then Inc(lines)" here,
    but this would be wrong in Standard Pascal }
; writeln('# chars = ', chars:6)
; writeln('# lines = ', lines:6)
end.

~~~~
NOT YET INCORPORATED ABOVE

  Item composition mistakes are
      *  multiple consecutive blanks
      *  blanks at begin of line or end of line
      *  empty lines
      *  missing blanks
      *  missing end-of-lines
      *  end-of-lines instead of blanks
      *  blanks instead of end-of-lines
      *  junk at end-of-file