RobIn for IOI I/O ===== === === === by Tom Verhoeff (Chair IOI Scientific Committee) August 2000, Version 0.3 (incomplete unreleased draft) Documents RobIn Version 0.3 Text bracketed by <...> mostly indicates unfinished business! Contents -------- Introduction Historical Intermezzo I/O in Turbo Pascal and Turbo C/C++ IOI-Style Meta-Format Restrictions User Requirements for RobIn How to Use RobIn How to Interpret RobIn Messages Design and Implementation of RobIn Limitations of RobIn Other Tools Concluding Remarks Acronyms Sample Programs Introduction ------------ RobIn is a software package featuring a Turbo Pascal Unit for robust input from text files with an IOI-style format (IOI = International Olympiad in Informatics). In this document, I provide some motivation for RobIn and I explain its use and design. Just in case you want to see it immediately, the main implementation source file is named "RobIn.pas", its usage is explained in the section "How to Use RobIn", and "Example.pas" and "ValidEx.pas" are programs that illustrate how to use RobIn. RobIn simplifies the development (in Turbo Pascal) of * /input validators/, viz. programs that check the validity of the input files produced by the organizers, and * /output checkers/, viz. programs that check the format and/or correctness of output files produced by the competitor programs. The description of an IOI programming task defines the format of every input and output file involved. The IOI organizers are responsible for producing valid input files, and the competitors may assume that the input files are valid, that is, their programs may crash on invalid input without being penalized for it. Likewise, the competitors are responsible for delivering programs that produce correct output files. When grading the programs, format restrictions are enforced in a more relaxed way for output files than for input files. In the IOI competition, the file formats are kept simple because * the IOI focuses on algorithms and not on I/O; * the same format must be both easy to read and write "by" all of the IOI programming languages, and also by the competitors (the latter mainly through the editors built into the IDE for each programming language). The competitor programs must read the input files and must write the output files. We do not want competitors to waste their time on programming input parsers, when the task is really about some complex graph algorithm. The competitors most likely must also somehow put together various input files for testing and they must also inspect the resulting output files. This is often done "by hand" through an editor and dumping to the screen or even printing on paper. The organizers of an IOI have the responsibility to define file formats and to verify them. Developing an input validator or output checker forces one to think carefully about the format involved. This turns out to be a great help in formulating the format definitions. Historical Intermezzo ---------- ---------- Traditionally, the algorithms developed by the IOI competitors are delivered as executable programs and they are graded by executing them for a (necessarily small) set of test cases. They are not graded by inspecting the algorithm designs or program texts. In the scientific and engineering world, it has long been known that execution-based testing is a very poor way of verifying the performance characterictics of algorithms (such as correctness, speed, and memory usage). Still, there are some (compelling?) reasons for execution-based testing at the IOI: * the language barrier, * lack of agreement on alternatives, and * executable products need to be demonstrated by execution. At the IOI, the competitors are high-school students and they are not expected to be capable of adequate technical communication with foreign organizers and evaluators, neither written nor oral. This is known as the language barrier. Therefore, the tasks are translated to the competitor's native language before the competition. At the IOI, a translation step in the opposite direction, from competitor to organizer, is avoided by requiring the delivery of machine-executable products. (By the way, I still think the IOI should look for ways of expanding the evaluation process to include non-execution-based methods of verification.) Another reason for execution-based testing at the IOI is that it apprears hard to agree upon alternatives. What constitutes a design of an algorithm? How should such a design be presented and judged? We have no international notation for algorithms, like they do in mathematics. (Though we should be able to get a long way; after all, the delegation leaders have little trouble discussing the tasks, and approaches and obstacles to solutions.) Finally, a correctness proof alone is not good enough for an executable product. Obviously, we need to see the thing "fly" to believe it. (The IOI hovers dangerously between theoretical computer science and software engineering.) For execution, the algorithms somehow need to be provided with various input parameters, and the resulting output needs to be observed. Since IOI'94 in Sweden, such I/O is done through text files. All input parameters are offered in a text file, even if this concerns just a single integer. All output must be written to a text file. The text file interface provides a convenient abstraction from the programming language used to express the algorithm. Note that the IOI has always allowed a choice among multiple programming languages (for lack of agreement on a single language). Alternatives to (the direct use of) text files are * task-specific libraries that competitor programs must use, * tasks that ask for parameterized routines instead of programs. The organizers of an IOI can develop special-purpose libraries to encapsulate I/O. Such a library provides a means of I/O on the language level: variables or parameters with native types can be directly used; there is no need for formatting and conversions. This use of libraries is now common practice for reactive programming tasks. There is still the issue of restrictions on the use of the library (its interface definition), in particular, also restrictions on the calling order of library routines. Another alternative is to require that the competitors develop a routine with input and output parameters in the programming language of their choice. For evaluation, the routine is embedded into an evalution framework to produce an executable program. You might say that here the competitors develop a library consisting a single routine. This may work better in the case of source code submission than submission of object files. The major disadvantage of both alternative approaches is that the details of the interface now depend on the programming language. Even with only two olympic programming languages, this is quite a burden on the organization. (But this does not stop us from using it for reactive tasks.) It seems to me that the current situation at the IOI reflects the current state of computer science in general quite well. We have come a long way, but we are not "there" yet. I/O in Turbo Pascal and Turbo C/C++ --- -- ----- ------ --- ----- ----- Until IOI2000, the IOI programming languages are * Turbo Pascal 7.0 * Turbo C/C++ 3.0 (After IOI2000, we move to other Pascal and C/C++ compilers.) Let me consider some of the subtleties of text I/O in these languages, and in particular text input. First, some general remarks about text files on the MS-DOS/Windows platform are necessary. The character set used for MS-DOS/Windows text files is Extended ASCII (256 8-bit codes). Of these, the "Standard" ASCII codes (128 7-bit codes from 0 through 127, also called Low ASCII) include * 33 control codes (0 through 31, and 127), * 1 blank (SP, space, code 32), * 94 printing characters (Latin uppercase and lowercase letters, digits, and various symbols, with codes 33 through 126). The extended codes (128 through 255, also called High ASCII) cover adorned letters (with various accents for non-English languages), some graphics and special symbols. Of the control codes, the following are worth mentioning: BS Backspace, Ctrl-H, ^H, code 8 HT Horizontal Tab, Tab Ctrl-I, ^I, code 9 LF Line Feed, Ctrl-J, ^J, code 10 CR Carriage Return, Ctrl-M, ^M, code 13 SUB Substitute character, Ctrl-Z, ^Z, code 26 ESC Escape, Ctrl-[, ^[, code 27 DEL Delete, Rub out, code 127 On the MS-DOS/Windows platform, the two-character combination CR/LF (codes 13, 10) encodes the transition to a new line (end-of-line in Pascal parlance, newline in C terminology). Note that under Unix, this is accomplished by a single LF, whereas under the MacOS, a single CR is used. (On the platform-less web you can often see the disastrous consequences of these differences.) Under MS-DOS/Windows, the end of a text file is encoded by SUB (Ctrl-Z, code 26). The other control codes, with the possible exception of the Tab (HT, Ctrl-I, code 9), nowadays have no generally accepted role in text files. The characters CR/LF, HT (tab), and SP (blank) are collectively known as /whitespace/. Here are some Turbo Pascal (TP) text I/O details. For convenience, we assume the following variable declarations: var f: Text; c, d: Char; i: Integer; s: String; TP01. Turbo Pascal does not implement the file window (in ISO Pascal denoted by f^) to look one character ahead into the input stream. It also lacks the get (and put) routines. All I/O must be done through the (standard) eof/eoln, read/readln, and write/writeln routines. TP02. Reading a single character c by read(f,c) does NOT skip leading whitespace. Skipping of leading blanks must be accomplished, for example, by "repeat read(f,c) until c<>' '". A single leading blank can be skipped by "read(f,d,c)" and ignoring d. TP03. Reading a single character when eoln(f) is true, returns chr(13) (=CR), instead of ' ' as required by the ISO Pascal Standard. Furthermore, the next character read after CR will be chr(10) (=LF). Note that a single call readln(f) consumes both CR and LF. TP04. Reading a single character when eof(f) is true, does NOT raise a run-time I/O error (as suggested by the ISO Pascal Standard) and returns chr(26). TP05. Reading an integer by read(f,i) will skip all leading whitespace (blank, tab, end-of-line); the read then consumes ALL subsequent non-whitespace characters until another whitespace character is encountered. A run-time I/O error (106: invalid numeric format) is raised when the non-whitespace characters violate the (Pascal) numeric format. Trailing whitespace (incl. end-of-line) is never consumed, unless calling readln(f,i), which is equivalent to read(f,i);readln(f). TP06. Reading a variable-length string by read(f,s) does NOT skip leading whitespace. TP07. Reading characters for a variable-length string is stopped by the maximum length of the string variable or the first end-of-line encountered. The maximum length for a variable-length string is 255 characters. A trailing end-of-line is never consumed, unless calling readln(f,s). TP08. Readln(f) consumes upto and including the next CR, and subsequent LF if present. TP09. If the file's last line is not terminated by an end-of-line, then eof(f) and eoln(f) become true simultaneously, and NOT, as the ISO Pascal Standard requires, first eoln(f) and after readln(f) then eof(f). Under the Pascal Standard, a text file consists of a sequence of zero or more /lines/, terminated by end-of-file (recognized by function eof), and each line consists of a sequence of zero or more /characters/, terminated by end-of-line (recognized by function eoln). Thus, the only case where an end-of-file is not preceded by an end-of-line occurs in an empty file. Also, according to the Pascal Standard, eoln should not be called when eof returns true. The subtle difference between Turbo Pascal and the Pascal Standard is illustrated by the two character-and-line counting programs in the appendix "Sample Programs". TP10. Writeln(f) produces CR followed by LF. TP11. Closing an output file appends the ending Ctrl-Z. Here are some details concerning the Turbo Pascal Editor (built into the Turbo Pascal IDE; the built-in Turbo C/C++ editor is very similar): IDE1. It is NOT possible to add blanks at the end of a line. (Hit the END key to find the "real" end of a line.) IDE2. When opening a file, any blanks at the end of lines are silently discarded (and lost when the file is saved). IDE3. The last line (accessible by the cursor) is NOT written with a terminating end-of-line when saving the file. IDE4. When opening a file with its last line (properly) terminated with an end-of-line, it will show up as a trailing empty line (accessible by the cursor). IDE5. Fixed tab stops are set every 8 positions in columns 1, 9, 17, etc. Hitting the TAB key will insert an "appropriate" number of blanks. To enter a real Tab (TAB character) in the file, use the sequence ^P^I (Ctrl-P, Ctrl-I). Tabs in a file are visually not directly discernable from blanks. IDE6. Control characters (other than HT, CR/LF) are shown as "funny" graphics symbols. Now let me deal with some Turbo C/C++ (TC) peculiarities (these are mostly in agreement with the ANSI C Standard). For convenience, we assume the following variable declarations: FILE *f; char c; int i; char s[255]; TC01. Formatted input can easily be done through fscanf with a format string and a sufficient number of result arguments (or via fgets followed by sscanf). It can also be done through "cin >> ...;"; we consider fscanf in the following. TC02. Reading a single character by fscanf(f,"%c",&c) does NOT skip leading whitespace by itself. All leading whitespace can be skipped by including a blank directly preceding the format specifier: fscanf(f," %c",&c). TC03. Reading a single character (format "%c") when end-of-line is ahead, will return LF (code 10, '\n'), and it skips both CR and LF. Note that putting "\n" in the format string will skip ALL whitespace at that point. TC04. Function fscanf will NOT read the end-of-file character, instead it will return a count less than the number of arguments to read, indicating premature termination. TC05. Reading an integer by fscanf(f,"%d",&i) will skip all leading whitespace; the read consumes only characters that agree with the integer syntax and stops at the first character that does not match. TC06. Reading a variable-length string by fscanf(f,"%s",s) will skip all leading whitespace. This is not the case when using gets(s) or fgets(f,s,i). Note that s should be big enough to accommodate all characters read AND the terminating null '\0'. Therefore it is recommended to use fscanf(f,"%255s",s) when s can accommodate 256 characters (255 non-nulls). TC07. Reading a variable-length string is stopped after reading all (at least one) non-whitespace characters, by the first whitespace character (which is not consumed). In particular it will never skip trailing end-of-lines, and the returned string will never include blanks and will never be empty. Again, this is not the case for gets/fgets. (A subtle difference between gets and fgets is that the latter will put the terminating newline '\n' into the string returned, whereas gets reads the newline and discards it. TC08. Reading with fscanf(f,"\n") skips all whitespace at that point, including end-of-lines, but also blanks and tabs. TC09. End-of-lines are faithfully reported as they appear in the file, the end of the file may or may not be preceded by an end-of-line. Using fscanf it is not possible to distinghuish between end-of-file, an I/O error, or a format error. TC10. Calling fprintf (f,"\n") produces both CR and LF. TC11. Closing an output file appends the ending Ctrl-Z. Concerning output, here is an anecdote. The output format required a text file with a number N (1<=N<=100) on the first line, and on the second line a list of N numbers in the range 0..9, separated by single blanks (no blank before the first item). This format is in agreement with the IOI-style meta-format rules. There are various ways to produce the output. Below is some Turbo Pascal code to put on the dots (...) in the following framework: program NumberList; var N: 1..100; L: array [1..100] of 0..9; ... begin "produce N and L" ; writeln ( N ) ... end. My preferred solution: var i: 1..100; sep: String; { or sep: String[1] if you like } ... ; s := '' ; for i := 1 to N do begin write ( s, L[i] ) ; sep := ' ' end { for i } ; writeln But you can also get away with var i: 1..100; ... ; for i := 1 to N do begin if i > 1 then write(' ') ; write ( L[i] ) end { for i } ; writeln or var i: 1..100; ... ; for i := 1 to N do begin write ( L[i] ) ; if i < N then write(' ') end { for i } ; writeln Here are three incorrect fragments (to be enclosed by "var i: 1..100;" and "; writeln"): ; for i := 1 to N do write ( L[i] : 2 ) ; for i := 1 to N do write ( ' ', L[i] ) ; for i := 1 to N do write ( L[i], ' ' ) All of them are incorrect because they produce one additional blank. The first two at the line begin, and the third one at the line end. An additional weakness of the first fragment, is that changing the range of the elements in the list L, say from 0..9 to -1..10, will cause -1 and 10 to miss a preceding blank. Note also that the output of the third approach appears to be good when viewed in a Turbo editor (because that editor discards blanks at the end of a line), or when dumped to the screen or printed on paper. At IOI'95 it happened that (at least) one competitor had opted for the third approach. This person apparently was bothered by the trailing blank and changed the finishing writeln into writeln(chr(8)), that is, included a backspace (BS) "to erase the superfluous blank". When dumped to the screen, the output does indeed look good. The IOI'95 evaluation software, however, did not accept this, because it rejected control codes other than end-of-line and end-of-file. The IOI'95 evaluation software would have ignored the trailing blank, because the output format was not strictly enforced (knowing that competitors make stupid I/O mistakes)! IOI-Style Meta-Format Restrictions --------- ----------- ------------ The restrictions on the format of text files given below are inspired by the language peculariaties explained in the preceding section, They are intended to make I/O easy for the built-in routines of the IOI programming languages, and for human use. The format of input and output text files is restricted by the following /meta-format rules/: MF01. An IOI text file consists of a sequence of ONE or more NONempty /lines/. The sequence is terminated by end-of-file (Ctrl-Z). MF02. Each line consists of a sequence of ONE or more /items/. This sequence is ALWAYS terminated by an end-of-line (CR/LF), in particular also the last line. MF03. Each item is one of the following: /character item/: a single printable NON-blank character (codes 33 through 126) /integer item/: an integer number in decimal notation, possibly negative (preceded by '-'), no leading '+', at most 32 bits /string item/: a NON-empty variable-length string of at most 255 printable NON-blank characters (variable-length here means that the length possibly cannot easily be determined from the format or the preceding text) MF04. Adjacent items on the same line are either /contiguous/ (that is, without any extra characters between them), or /separated/ from each other by a SINGLE blank. The definition of the format (in the Competition Rules and the task description) must clearly state the item composition convention. MF05 through MF12 concern restrictions on item compostion. MF05. A character item must be separated by a single blank from a directly preceding number item on the same line. MF06. It is discouraged that character items are preceded by a separating blank. MF07. It is discouraged that character items appear on the same line as number items. (Characters preceding integers are not an issue concerning points 5 and 6; only for an integer preceding a character are points 5 and 6 in conflict.) MF08. A number item is always separated by a single blank from a directly preceding number item on the same line. MF09. A variable-length string item always appears at the end of a line. MF10. It is discouraged that a variable-length string item appears with other items on the same line; that is, preferably it occurs on a line by itself. MF11. A variable-length string item must be separated by a single blank from a directly preceding number item on the same line. MF12. There are no blanks at the begin or end of a line. MF13. The format must be such that type of every item and the occurrence of every end-of-line and end-of-file can easily be deducted --- before they appear --- from the format definition or preceding text (possibly from another file). Explicit end-of-line or end-of-file checking should NEVER necessary, not matter which IOI programming language one uses. Below are some examples to illustrate this rule. MF14. There are no tabs and other control codes (besides CR/LF and Ctrl-Z) in text files. There is no real/float number item (reals/floats could still be treated as a sequence of characters or a string, but the appearance of floating-point arithmetic in tasks is frowned upon in the IOI). Format definitions provided by the IOI organizers must respect the meta-format rules stated above, and all input files must be in /strict/ agreement with the promised input format. When checking output files produced by the competitor programs in order to determine scores, it is strongly recommended that the format is not enforced strictly, but rather in a /relaxed/ manner. Note that the Competition Rules for all IOIs up to IOI2000 have formulated some input format restrictions, but those formulations have always been less restrictive than the meta-format rules that I have given above. Here are some examples to illustrate rule MF13. The following format definitions violate rule MF13. 1. Input file ... contains at least one and at most 20 lines. Each line contains three nonnegative integers less than 5000. 2. The first and only line of input file ... contains at least one and at most 1000 characters that are uppercase letters. There are no blanks between the characters. 3. The first line of input file ... contains at most 100 integers I with 10 <= I <= 99, followed by a single character '*'. The first and second format require end-of-file and end-of-line checking, respectively, to be read. In C/C++, the second format can be handled without explicit end-of-line checking, because C/C++ does not have the 255 character limit on variable-length strings. When using the so-called "extended syntax", Turbo Pascal can also read null-terminated strings (but this is not Standard Pascal, and I don't find it required IOI knowledge). The third format has the problem that it is not known what type of item to read after the first item (which is always an integer). Thus, it requires looking ahead into the input stream to detect the terminating 'A'. If that look-ahead character turns out to be digit than it must be combined with the following digits to make up another integer. In Pascal, such a look-ahead character cannot be pushed back into the input stream. In C/C++, this can be accomplished (though it is not recommended), but I would not consider that required IOI knowledge. The total length of the input line can exceed 255. Hence, reading it into a string is not an option for Turbo Pascal (in my opinion). Here are similar format definitions that do respect rule MF13. 1. Input file ... contains at least one and at most 21 lines. Each line, except the last, contains three nonnegative integers less than 5000. The last line of the file contains a single -1. 2. The first line of input file ... contains at least 1 and at most 1000 characters that are uppercase letters, followed by a single '*'. There are no blanks between the characters. 3. The first line of input file ... contains at most 100 integers I with 10 <= I <= 99, followed by a single 0. Although these formats are acceptable, their readability can be further improved as follows: 1. The first line of input file ... contains one integer N with 1 <= N <= 20. The following N lines contain three nonnegative integers less than 5000. 2. The first line of input file ... contains one integer N with 1 <= N <= 1000. The second line contains N characters that are uppercase letters. There are not blanks between the characters. 3. The first line of input file ... contains one integer N with 0 <= N <= 100. If N is positive, then there is also a second line, containing exactly N integers I with 10 <= I <= 99. For the third format, now the question is raised whether the case with N = 0 really needs to be included at all. Furthermore, one could argue that the human writability of the formats has deteriorated, since the number of items needs to be counted explicitly. User Requirements for RobIn ---- ------------ --- ----- Here is a list of requirements that I had in mind when designing RobIn: UR01 RobIn shall NOT CRASH on errors but report them. UR02 RobIn shall be EASY TO USE for developing programs that check text files for adherence to IOI-style formats. UR03 RobIn shall be FLEXIBLE enough to express all reasonable IOI-style formats that adhere to the Meta-Format Rules. UR04 RobIn shall ENFORCE the IOI-style Meta-Format Rules. UR05 RobIn shall have a RELAXED mode and a STRICT mode for checking a format. UR06 RobIn shall be usable for developing both INPUT VALIDATORS that report to the organizers, and OUTPUT CHECKERS that report to the contestants and their coaches. UR07 RobIn shall NUMBER all error messages uniquely and take them from a text file to simplify translation. UR08 RobIn shall support TASK-SPECIFIC checking and error messages. As a development requirement I order the importance of some software qualities as follows: 1. First comes correctness 2. Second comes usability (simplicity, generality) 3. Third comes maintainability 4. Fourth comes portability (though I have not given it much thought) 5. Only in fifth place comes performance (speed, memory) How to Use RobIn --- -- --- ----- RobIn is a Turbo Pascal Unit that provides routines for reading text files and extracting values from them, while checking * that the format (especially, order and composition of items, and the line structure) adheres to the meta-format rules, and * that the file content adheres to the format (including range restrictions). RobIn is "robust" and does not crash on "bad" text files, but instead provides detailed diagnostic messages. The messages are numbered and taken from message files to simplify translation and use by non-English organizers, coaches, and competitors. RobIn is intended to be used by organizers to construct tools that, in turn, are used by organizers (to validate input files) and competitors and coaches (to check the format/correctness of output files). This section is addressed to the organizer who wishes to construct a tool using RobIn. Coaches and competitors may want to skip to the section titled "How to Interpret RobIn Messages". Make sure that you have ROBIN.TPU (obtained by compiling ROBIN.PAS with TP). Simply include "uses RobIn;" in your Turbo Pascal and away you go. Well, you need to know somewhat more. RobIn provides the following routines (here shown without parameters): BeginLogging To open the log file and load the messages EndLogging To close the log file and unload the messages OpenRIF To open the text file to be read and checked CloseRIF To terminate reading and close the text file ReadCharRIF To read a character item ReadIntegerRIF To read an integer item ReadStringRIF To read a variable-length string item ReadlnRIF To read an end-of-line AssertRIF To do a task-specific check IsAbortedRIF To find out whether a format error occurred IntToStr To help construct composite identifiers for items The acronym RIF stands for "RobIn File". The routines ReadCharRIF, ReadIntegerRIF, and ReadStringRIF are collectively also denoted by ReadRIF. Some restrictions apply to the order in which RobIn routines are called. Violating these order restrictions will not crash RobIn, but may result in early termination, and an error message that may not explicitly mention the out-of-order calling. The order restrictions on calls to BeginLogging, EndLogging, and xxxRIF can be captured by the regular expression: ( 1. BeginLogging 2. ( "format described by program fragment using xxxRIF" )* 3. EndLogging )* where ( ... )* means "zero or more times the enclosed". A /format/ is described by writing a Pascal program fragment containing calls to OpenRIF, CloseRIF, ReadRIF, and ReadlnRIF. Typically, the order of these calls should adhere to the regular-expression: 1. OpenRIF 2. ( 1. ( ReadRIF )+ 2. ReadlnRIF )+ 3. CloseRIF where ( ... )+ means "one or more times the enclosed". Anywhere between OpenRIF and the corresponding CloseRIF, the format program fragment can call AssertRIF to check a task-specific condition. The program can call IsAbortedRIF any time after OpenRIF --- also after CloseRIF --- to find out whether reading of the file content had been aborted (due to a fatal error). Note that checking of the format itself against the meta-format rules is always continued until CloseRIF. No order restrictions apply to IntToStr. It is just a "loose" function, typically used to construct a parameter for ReadRIF when reading a sequence of items. As an example, consider the following format definition: The first line of file 'EXAMPLE.IN' contains two integers N and K with 2 <= N <= 9 and 1 <= K <= N. The second line contains N lowerercase letters ('a'..'z', contiguous). The following K lines each contain one nonempty string of at most 20 uppercase letters ('A'..'Z'). This format is captured by the following program fragment (assuming appropriate constant definitions and variable declarations; the full program is Example.pas): OpenRIF ( inFile, 'EXAMPLE.IN', ... ) { read first line } ; N := ReadIntegerRIF ( inFile, Separated, 'N', 2, 9 ) ; K := ReadIntegerRIF ( inFile, Separated, 'K', 1, N ) ; ReadlnRIF ( inFile ) { read second line } ; for i := 1 to N do begin L[i] := ReadCharRIF ( inFile, Contiguous, 'L[' + IntToStr(i,1) + ']', UpperCaseChars ) end { for i } ; ReadlnRIF ( inFile ) { read following K lines } ; for j := 1 to K do begin S[j] := ReadStringRIF ( inFile, Contiguous, 'S[' + IntToStr(j,1) + ']', LowerCaseChars, 1, 20 ) ; ReadlnRIF ( inFile ) end { for j } ; CloseRIF ( inFile ) Note that Separated, Contiguous, UpperCaseChars, and LowerCaseChars are predefined by RobIn. Here is a corresponding fragment of Turbo Pascal program that reads the input data from the input file (again assuming appropriate variable declarations). Note the great similarity with the fragment above. A fragment of C is given further down. Assign ( inFile, 'EXAMPLE.IN' ) ; Reset ( inFile ) { read first line } ; Read ( inFile, N ) ; Read ( inFile, K ) ; Readln ( inFile ) { read second line } ; for i := 1 to N do begin Read ( inFile, L[i] ) end { for i } ; Readln ( inFile ) { read following K lines } ; for j := 1 to K do begin Read ( inFile, S[j] ) ; Readln ( inFile ) end { for j } ; Close ( inFile ) or somewhat more compactly, as a competitor might code it: Assign ( inFile, 'EXAMPLE.IN' ) ; Reset ( inFile ) ; Readln ( inFile, N, K ) ; for i := 1 to N do Read ( inFile, L[i] ) ; Readln ( inFile ) ; for j := 1 to K do Readln ( inFile, S[j] ) ; Close ( inFile ) Here is a C version (one of many possibilities) to read the format (assuming appropriate #include and variable declarations): inFile = fopen ( "EXAMPLE.IN", "r" ); fscanf ( inFile, "%d%d\n", &N, &K ); /* \n needed (why?) */ for ( i=1; i<=N; ++i ) fscanf ( inFile, "%c", &L[i] ); fscanf ( inFile, "\n" ); /* not needed (why?) */ for ( j=1; j<=K; ++j ) fscanf ( inFile, "%s\n", &S[j] ); /* \n not needed (why?) */ fclose ( inFile ); RIF routines: rif, comp, itemId, and various bounds> To simplify reading a list of separated items contained on a single line, RobIn automatically adjusts the itemComp parameter to Contiguous when reading the first item on a line. Thus, the format The first line of ... contains one integer N with 1 <= N <= 20. The second line contains N integers X with 0 <= X <= 999. is correctly captured by (assuming appropriate variable declarations) N := ReadIntegerRIF ( inFile, Separated, 'N', 1, 20 ) ; ReadlnRIF ( inFile ) ; for i := 1 to N do begin X[i] := ReadIntegerRIF ( inFile, Separated, 'X', 0, 999 ) end { for i } ; ReadlnRIF ( inFile ) It is recommended that the format program captures bounds as close as possible to the way they are given in the task description. Consider the following two equivalent format definitions: The first line of ... contains two positive integers N and K at most 100 and with K <= N. The first line of ... contains two integers N and K with 1 <= K <= N <= 100. Here are two corresponding format program fragments: N := ReadIntRIF ( inFile, 'N', 1, 100 ) ; K := ReadIntRIF ( inFile, 'K', 1, 100 ) { check K < N, if not report "K > N" } ; AssertRIF ( inFile, K <= N, AssertError, {M#}900, IntToStr ( K, 1 ) + ' > ' + IntToStr ( N, 1 ) ) and N := ReadIntRIF ( inFile, 'N', 1, 100 ) ; K := ReadIntRIF ( inFile, 'K', 1, N ) These fragments are equivalent, except for the kind of diagnostics they produce, and how the competitor can interpret the diagnostic given the format definition. Consider the input file with first line "10 20". The first program will give an AssertError of the form "K > N (20 > 10)", and the second program will give a FormatError of the form "Integer not in range (20 not in [1..10])". Use of AssertRIF and IsAbortedRIF The Structure of a Message Definition File Each diagnostic message is identified by a number in the range 0..900. The numbers in the range 0..899 are reserved for use within RobIn. The numbers in the range 900..999 are available for task-specific messages issued via AssertRIF by the using program and loaded by BeginLogging. The association between a message number and its text is recorded in a Message Definition File (typical extension .mdf). In a message definition file, empty lines and lines starting with '#' are ignored. These can be used for documentation. Other lines should be of the form Leading and trailing blanks are ignored. The format of the message definition file is checked when it is loaded (though in a way that differs from the checking that RobIn does for RobIn files :-) and warnings are generated when the message definition file is badly formatted. Diagnostic Categories Every message is issued within a category. Usually --- but not necessarily always --- a message occurs in one category only. However, some messages are more general and make (different) sense in different categories (e.g. "Empty string"). The following table summarizes the "philosphy" behind the various Error dianostics. As with all error reporting, your mileage may vary, since RobIn cannot know what the user's intentions were. Compare this to the art of interpreting compiler error messages. IOError --> need to correct problem with file in OS its existence, its location or name, its permissions InternalError --> need to correct mistake inside RobIn report this to me, with all evidence you can supply UsageError --> need to correct mistake in program using RobIn MetaFormatError --> need to correct format definition FormatError --> need to correct file content (general isolated issue) AssertError --> need to correct file content (task-specific issue) The distinction between UsageError and MetaFormatError is not always as clearcut as one might hope. Generally speaking, the format definition need not change when correcting a UsageError. Nevertheless, the format may still turn out to be the real source of the problem. Consider the following format definition: The first line of input file ... contains two integers N and K with 1 <= N <= 100 and 3 <= K <= N. The straightforward format program fragment is N := ReadIntegerRIF ( inFile, Separated, 'N', 1, 100 ) ; K := ReadIntegerRIF ( inFile, Separated, 'K', 3, N ) However, this may give rise to a UsageError "Empty integer range" on the second ReadIntegerRIF call. This is, for instance, the case when reading was aborted (file does not exist), in which case only the meta-format is checked. The first ReadIntegerRIF then returns the lower bound 1, making the second range empty. This is clearly an inconsistent read request. What is the matter here? Well, 1 is not really the lower bound on N, instead the actual consistent minimum value for N is 3, because of the constraint 3 <= K <= N . It is advisable to change the format definition in such a case. Warnings Warnings point to a "suspicious" situation or a discouraged practice. Sometimes they can easily be explained and subsequently be ignored. In other cases, it may turn out to be a "hard to catch" mistake. Compare this again to compiler warnings. < STILL INCOMPLETE > How to Interpret RobIn Messages --- -- --------- ----- -------- RIF, because it may still be followed by AssertRIF Integer item values are reported in decimal Character item values are reported as 'c', '''', or chr(#) String item values are reported as "c...c" (embedded '"' appear singly!) consequences of limiting diagnostics> Design and Implementation of RobIn ------ --- -------------- -- ----- The implementation of RobIn resides in the file "RobIn.pas" and contains comments as documentation. Here are some very global additional remarks. Expressing the Format I have chosen to express the format as a (Pascal) program fragment that makes calls to the routines reading the elements in an order appropriate for the format. An alternative could have been to introduce a new notation for expressing formats and to store the format description in a separate file. I think that my approach is more flexible and easier to use directly (unless you hate Pascal). Type RobInFile The file being checked is read via a variable rif: RobInFile, in particular via the text file variable rif.f. The variable rif contains other fields besides f to capture the current state of the reading and checking process, including some statistics. The sequence of characters that "passes through" rif.nextChar is exactly the contents of the file, with two exceptions. The first exception is that, under Relaxed Checking, every TAB characters is replaced by a blank. The second exception is that, under Relaxed Checking again, if the last line in the file is not terminated by an explicit end-of-line, then and EolnChar is inserted in the sequence. The window rif.nextChar acts as a look-ahead character. It is needed, because Turbo Pascal lacks a (Standard Pascal) file window to look ahead into the input stream. The procedure StepNextChar advances the rif.nextChar "window" one position. The look-ahead character is always valid after opening a RobInFile. Procedure StepNextChar gets the next character into nextChar. When eof is encountered, nextChar is set to EofChar. When eoln is encountered, nextChar is set to EolnChar. RobIn parses items by reading characters. The end of an integer item is detected by hitting the end-of-line or by reading a non-digit. The latter cannot be pushed back into the input stream and is stored as look-ahead character. Diagnostic messages Diagnostic output is done through theLogFileP, which is a pointer to a text file. I use such a pointer for two reasons. First of all, that way the using program can "own" the file variable and still tell only once to RobIn which file to use. Second, it simplifies switching on-the-fly between log files without having to open and close them. You can just move the pointers around. For instance, to get diagnostic output temporarily to standard output, simply insert: theLogFileP := @ output { diagnostic output now goes to output } ... theLogFileP := @ logFile { diagnostic output now goes to logFile (logFile should be open for writing) } Similarly, the selection of diagnostic messages can be changed on-the-fly by assigning to theDiagSel. All diagnostic messages produced by RobIn are preceded by "RobIn: ". Subsequent lines are indented by two blanks (implementation string constant Indent). Diagnostic messages are formulated to state what is actually the case, not what should have been the case. For example, "Empty string" (when some empty string was encountered where it should not have been), rather than "Empty string not allowed". Some diagnostic messages are followed by a /parenthetic note/, which is NOT taken from a message file, but which is constructed depending on the current situation. For example, if a value is out of bounds, the value and bounds are shown. I have strived to use as little English as possible in the parenthetic remarks, and preferably only symbolic information, to avoid the need for translation. Note that only a limited set of diagnostic messages is relevant to the competitors. <<< There should be a complete list showing which notes can occur. >>> Message numbers in the range 1 .. 255 are reserved for Turbo Pascal run-time error messages. Message numbers in the range 256 .. 899 are general messages of RobIn. Type Compatibility Issue for ReadRIF The routines ReadRIF have been implemented as functions. They "should" have been implemented as procedures (like the predefined read in Pascal), because they have (major) side-effects. Here is a motivation for my choice. In case of read procedures, the result (the value read) would have to be communicated via a var-parameter. These var-parameters need to have the "widest" type for the kind of item read, that is, Char for characters, LongInt for integers, and String for variable-length strings. Turbo Pascal insists that the actual var-parameter has exactly the same type as the formal var-parameter (and not a subtype). There is no automatic type conversion when assigning the result value in the procedure to the var-parameter. (Note: The predefined read procedures form an exception to this rule!) Consequently, if you have "var N: 1..100" and want to read a value using a "procedure ReadInt ( var i: LongInt )", then "ReadInt ( N )" is rejected, and you have to resort to the use of an auxiliary variable: "var ri: LongInt; ... ReadInt ( ri ) ; N := ri". On the other hand, for functions, the result type only needs to be assign compatible --- not identical --- when used in an assignment statement, and the value is automatically converted. Thus, "var N: 1..100; ... function ReadInt: LongInt; ... N := ReadInt" is correct usage. To simplify the use of RobIn (avoiding auxiliary variables), I have sacrificed elegance (no side-effects in functions). In What Order to Check Diagnostic Conditions? The order in which routines check conditions to generate diagnostic output poses some dilemmas. Often, it does not make sense to continue processing if a UsageError condition occurs, such as an empty character range in ReadStringRIF. In fact, it may be dangerous or senseless to test for a UsageWarning condition if an error is present. For example, in ReadStringRIF, if range contains a single character and minLength = maxLength, then I find that a good reason to give a UsageWarning (because at this point, the input file can contain only a single value). As a parenthetic remark, I produce that single value. But what if minLength and maxLength are negative? In that case, the UsageError should take precedence, because it violates the precondition (and no string value in the file can be acceptable). But, always first checking for UsageError conditions and, if none occurred, then checking for UsageWarning conditions, reduces the amount of diagnostic information in the log file. It may be very helpful in locating an error to receive warnings as well, because they can point to overlooked situations. As a result, there are places where I first check for UsageWarning conditions, and also places where I first check for UsageError conditions. It all depends on the context. If a UsageWarning can safely and sensibly be issued before the UsageErrors, then I do so. Turbo Pascal Extensions Used in RobIn Here is a (complete?) list of Turbo extensions to Standard Pascal that were used in the implementation of RobIn. LongInt Inc, Dec UpCase String, + on strings, Val, Str, Delete Exit, RunError Assign, Close {$I+/-}, IOResult else in a case statement Testing RobIn < STILL INCOMPLETE > Limitations of RobIn ----------- -- ----- RobIn has (of course) some limitations. Here is a brief discussion of some of the known limitations. Diagnostics for Out-of-Order Calls The diagnostics for out-of-order calls to xxxRIF are implicit, rather than explicit. In particular, the following restrictions should be checked explicitly but are not (all calls concern the SAME variable rif: RobInFile): Routine Condition OpenRIF # OpenRIF = # CloseRIF (rif not open) CloseRIF # OpenRIF = # CloseRIF + 1 (rif not closed) ReadRIF same as CloseRIF ReadlnRIF same as CloseRIF AssertedRIF same as CloseRIF IsAbortedRIF # OpenRIF > 0 (rif was opened) Note that "rif is open" does not mean the same thing as "rif.f is open", because a call to OpenRIF may fail to open rif.f. In that case, format checking is aborted, but meta-format checking continues, and a call to CloseRIF is still needed to terminate the checking, before opening another file. Another thing that cannot be diagnosed by RobIn is a missing CloseRIF call. If the using program does not make the call, then RobIn will never know. Similar limitations apply to the pair BeginLogging and EndLogging. The reason that such protection is "hard" to implement in Pascal is that no variables (except file variables) are automatically initialized to some known value. Therefore, RobIn (which does not "own" the RobInFile variable that acts as parameter to xxxRIF) must always rely on the using program to take care of initialization, for example by calling an appropriate initialization routine (in our case, OpenRIF). (If only Pascal had had a single auto-initialized type for variables; one bit would have been enough. :-) Concerning the second limitation (missing call to CloseRIF), the thing is that Turbo Pascal has no way to guarantee that control goes back to RobIn when the program exits. EndLogging does a check, but the call to EndLogging may also be missing. (Delphi does have have that facility.) < STILL INCOMPLETE > Other Tools ----- ----- CheckMsg CheckMsg.pas is a program to help check to check the structure of message files used by RobIn, especially task-specific message files. CheckMsg checks the files named by each of its command-line parameters. If no command-line parameters are given, then it asks for a file name and checks it. Application Framework Concluding Remarks ---------- ------- Here are some open ends. Porting to Linux and/or C Not me. But if you want to give it a try, look under "Design and Implementation of RobIn" for a list of Turbo extensions to Standard Pascal that were used. ReadRealRIF or ReadFloatRIF I will not put in a ReadRealRIF/ReadFloatRIF. For one thing, the use of floating point arithmetic at the IOI is controversial (for several reasons, such as rounding, numerical stability, etc., ask Gyula Horvath). Furthermore, the format for a real/float item must be chosen such that it is common to all IOI programming languages. Its definition is more complicated than that of the items supported now. Acronyms -------- CR Carriage Return (ASCII character 13) HT (Horizontal) Tab (ASCII character 9) IDE Integrated Development Environment I/O Input/Output IOI International Olympiad in Informatics LF Line Feed (ASCII character 10) OS Operating System RobIn Robust Input module (Turbo Pascal Unit) TC Turbo C/C++ by Borland (version 3.0) TP Turbo Pascal by Borland (version 7.0) Sample Programs ------ -------- This appendix contains some programs to illustrate certain details that I did not want to include in the main text. The layout for these programs has been kept compact to improve the overview, at the expense of their maintainability. End-of-Line and Turbo Pascal The following two programs count the number of characters and lines in standard input. Under the ISO Standard Pascal definition, these programs are equivalent, but not under Turbo Pascal. Try the following two possible inputs, each consisting of two lines (^Z denotes Ctrl-Z): abc def ^Z abc def^Z Program TextCounter1 (correctly) reports 6 characters on 2 lines in both cases. Program TextCounter2 reports the same for the first input, but 6 characters and 1 line for the second input, because the outer loop terminates early. This can be compensated by including the line ; if eoln then Inc(lines) after the loop in TextCounter2. Note, however, that the resulting program is no longer correct ISO Pascal (because eoln is inspected when eof is true)! program TextCounter1; var chars, lines: longint; c: char; begin writeln('Count # characters and lines in standard input.') ; writeln('Standard implementation.') ; chars := 0 ; lines := 0 ; while not eof do begin while not eoln do begin read(c) ; Inc(chars) end ; readln ; Inc(lines) end ; writeln('# chars = ', chars:6) ; writeln('# lines = ', lines:6) end. program TextCounter2; var chars, lines: longint; c: char; begin writeln('Count # characters and lines in standard input.') ; writeln('Alternative implementation.') ; chars := 0 ; lines := 0 ; while not eof do begin if eoln then begin readln ; Inc(lines) end else begin read(c) ; Inc(chars) end end { N.B. Turbo Pascal needs "; if eoln then Inc(lines)" here, but this would be wrong in Standard Pascal } ; writeln('# chars = ', chars:6) ; writeln('# lines = ', lines:6) end. ~~~~ NOT YET INCORPORATED ABOVE Item composition mistakes are * multiple consecutive blanks * blanks at begin of line or end of line * empty lines * missing blanks * missing end-of-lines * end-of-lines instead of blanks * blanks instead of end-of-lines * junk at end-of-file