IOI2001 Questionnaire for Delegations

Summary of further comments written on 32 returned questionnaires

[Remarks in square brackets are mine, Tom Verhoeff]

=======

Tasks
-----

Idea of output-only tasks OK, but at most one. [Per day? Why?]

One output-only task per IOI is a good decision. [Why?]

Why does it take so long to prepare the final versions of the tasks? E.g. SCORE was completely changed at 01:30 (AM). [Simple: bad preparation.]

There was insufficient diversity in theme and difficulty of the tasks. [How to improve that?]

There should be several tasks (3 or 4) such that 1/2 of the competitors get 100%. The task set was such that very few competitors received 100% on any task. I think this encourages a "hacker" approach -- just start coding and incrementally try to get more cases. If the candidates knew that 3 or 4 tasks were completely solvable, they might spend more time thinking and doing analysis. By all means have a couple of open-ended tasks, but also closed deterministic ones. [What do you mean by open-ended tasks?]

The test data was far too difficult in most cases. [However, the best competitors could now be distinguished. Don't we want that? Maybe there is no need to distinguish among gold medal winners. I don't believe the IOI should strive so much for a unique winner. That is not in the spirit of the olympiads for high schools. We must keep in mind that there is a huge difference in performance between the top and bottom; not everything in the competition is aimed at every competitor.]

Time limits should distinguish complexity differences, not small constants. To this end, the limit should be at least 3 times the organizer's run-time. For this reason, I REALLY like the output-only submission. IPSC works well using exclusively output-only. One can also broaden the kind of tasks using output-only. [Time limits serve two purposes: they make the grading finite, and they help distinguish various types of algorithms. For the latter, I would prefer some other budgeting mechanism than time. In some reactive tasks, you can simply count the number of operations (cf. Wires and Switches at IOI'95; Median at IOI2000) and relate that to the instance size. This is much better. In Java, you might count specific instructions executed by the Java Virtual Machine.]
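As a rough sketch of what such an operation budget could look like, consider the following grader-side fragment. It is purely illustrative and not taken from any actual IOI library; the function name ask, the budget value, and the error handling are invented for this example.

    /* Hypothetical sketch of a reactive-task library that budgets
       operations instead of CPU time.  The competitor's program calls
       ask(); the library counts the calls and compares the count with a
       budget derived from the instance size, so the verdict does not
       depend on machine speed or compiler. */
    #include <stdio.h>
    #include <stdlib.h>

    static long used = 0;       /* operations consumed so far            */
    static long budget = 1000;  /* allowed operations for this instance  */

    int ask(int i, int j)       /* one "operation" of the task           */
    {
        if (++used > budget) {
            fprintf(stderr, "operation budget exceeded\n");
            exit(1);            /* treated like exceeding a time limit   */
        }
        /* ... answer the query from the secret test instance ... */
        return i < j;           /* placeholder answer */
    }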
The time limit for MOBILES was too tight. I think that we should not require competitors to squeeze every byte from their data structures to gain the full score. An asymptotically optimal solution, programmed in a way that does not waste resources unnecessarily, should receive full score without having to resort to tricky optimizations.

The MOBILES task is interesting. Having many different scores depending on the quality of the algorithm is great, but it seems that it was not at all easy to see that some solutions would not give 100% scores in less than 1 second. So, some competitors did not even try to find better solutions, even if they would have been able to find some.

As a GA, we cannot control national selection procedures. The SC/ISC have done as much as could be expected (maybe more) to eliminate zero scores. Nothing further need be done. [Still, it is important to find out what caused the zero scores. If it is in the preparation of the delegation, then we cannot do much more, except maybe provide some additional help before the IOI. If it is the consequence of some unfortunate decisions or events concerning the competition, then we do have an obligation to improve that.]

The difficulty between the days was not consistent. Day 2 was much easier than Day 1. [This was a deliberate choice of the Host SC "to guarantee early separation at the top". It was not unanimously supported by the ISC. Furthermore, technical difficulties influenced the selection for the first day, e.g. forcing DOUBLE to the second day.]

The test data appeared to be step-like instead of smooth. Thus, it tended to lump all O(n^2) algorithms in the same bucket, as opposed to trying to distinguish between "better" (more efficient) implementations and less efficient ones. [This was mostly on purpose. The question is whether we want to reward small optimizations and language/compiler differences. It has been the ISC's opinion that this should not be the case.]

The notion of "the" correct solution is bogus. There are many correct solutions. The SC almost certainly does not have THE optimal program, although there may not be a better algorithm in terms of big-O notation. The fact that you have an implementation that runs twice (or more) as fast as required is not an issue. The issue is how fast another program that you have runs and how many points you want it to get.

Just because of the bad formulation of task SCORE as originally presented, even we were not sure whether we understood the conditions properly...

If complex data structures are required to get "good" solutions (e.g. trees), perhaps it is unfair to include languages (like C++) that have more "built-in" support for such data structures. Or, for instance, using Unix system calls from C and not on Windows. Is there a way to "tighten" the features that are available for use? After all, the IOI should be more about algorithms and problem solving than tricky programming. [This is indeed a controversial situation, which unfortunately has no straightforward solution. The situation is further complicated by offering multiple languages for expressing algorithms (Pascal, C, C++), and this year by offering multiple development platforms. The use of "real" programming languages in full glory generally also means that you "drag in" standardized solutions for frequently occurring programming constructs. Some people have strongly objected to restricting the full languages (why limit good competitors' access to the toolbox they are familiar with?). We already impose certain limitations (some for practical reasons), such as forbidding the use of auxiliary files. In my opinion, it should be possible to impose further restrictions, if that gives rise to a good problem context (e.g. it may be interesting to see how you can solve a problem without the use of multiplication). However, explaining in sufficient detail what limitations apply for each task is cumbersome. Finally, in reactive tasks, there sometimes is a natural way to control the availability of some operations (e.g. cf. task MEDIAN at IOI2000).]

Task descriptions should not be too long, and should be informative. [How long is too long? Do you want to cut out any kind of story? Sometimes, finding the right "model" by abstracting from the story is considered part of the task.]

Tasks should be solvable by all competitors. This year 18 competitors could not get any score at all. At least include a simpler task for them. [The scores are affected by numerous variables, many of them not under the control of the organization (incl. proper training and preparation for the IOI, psychological pressure, etc.). Even inclusion of a task that only requires a program to add two numbers will still draw some zeroes.
For task DOUBLE, one of the cases was the example, for which the output file was in fact given. The only thing the competitor had to do was inspect the 10 given input cases, recognize the example among them, and submit the given example output file after editing the case number. So, we should also carefully explain to the competitors that there is something out there for which it is close to trivial to score some points. But then, a competitor failing to do so would feel even more embarrassed. At IOI2000, the competitors got 50 points per day for showing up. That way, there were also no zero scores, guaranteed. The majority of the GA was not happy with that either.]

Frankly speaking, the tasks are not at the high school level. They are for specially trained competitors only. Do we want that? [The IOI, like the other science olympiads, is aimed at TALENTED pupils. These usually perform well above the common "high school level". It is also true that there are great differences between the skill levels of the selected competitors. This is also the case in other events, such as professional sports competitions. Training is another matter. One hundred years ago, it was considered unsportsmanlike to train for the Olympic Games, which were intended for amateurs only. Because of the current predictability of IOI competition tasks, it is possible to improve the performance of competitors by special training sessions. The International Biology Olympiad requires that such training should be beneficial to a broader group than just the selected competitors. Making the tasks less predictable might reduce the effect of training. On the other hand, it also takes talent to benefit from training. You cannot expect to train an arbitrary high school student to do well at the IOI.]

The test data for DOUBLE allowed a brute-force approach to solve 8 cases out of 10. [But not without making some "intelligent" inferences.]

The test cases and grading of DEPOT allowed a competitor who just gave the simple solution to get 33 out of 100. [And still, others complain that there are so many zero scores. Apparently, it was not that simple.]

Task finalization (approval, confirmation, ...)
-----------------------------------------------

Approval procedure for questions: much improved, but it would be good if the GA got a chance to confirm that they are happy with the REVISED problems (and to confirm that they are the FINAL versions). [I agree.]

I think that knowing the whole set of tasks in advance is much better than having one at a time. On the other hand, I think that the semantic correction stage was pretty bad, mostly because the latest versions came out really late, when a lot of delegations had already finished their translations, and sometimes they were really different from earlier versions. [I agree.]

Task approval procedure: good

Task approval procedure: okay

Task approval procedure: Was there any?

It was an improvement over some procedures in the last 6 years. We think the task upgrades (incorporating minor remarks) should be communicated in a clearer way. [I agree.]

Better tracking of changes in task descriptions that were introduced while the delegations were translating. [I agree.]

This needs to be delegated totally to the GA-appointed ISC. [Not yet; for that, there need to be GA-approved guidelines as well.]

We propose that the task approval procedure be done at a time just before the competition.
We suggest following the schedule below:

    06:00 - 09:30  Translation
    10:00 - 15:00  Competition

[This was done in the past; it changed at IOI'96. For a good reason, I believe. Getting started on time early in the morning is not that easy. There is much more pressure to do things in a hurry, and this leads to mistakes. I believe the tasks, incl. the descriptions, must be prepared more carefully, so that translation can start earlier.]

Good, only the preparation of the final versions of the tasks after the tasks were approved took too much time. Also, for some tasks the differences between versions were not recorded. [I agree.]

The task approval process does streamline acceptance. However, the importance of minor changes is sometimes lost. Statements that seemed ambiguous were not changed to be completely clear. [I agree. However, this is the best we could get this year. The final preparation of the task descriptions was not up to ISC standards. There were not enough resources to get more done in less time.]

I think screening of tasks by the ISC seems to make task selection move more smoothly (than I expected). However, I was surprised at how long it took to "freeze" relatively minor changes in language. [So was I; this part was not properly prepared.]

Evaluation solely by test data requires careful choice of data. Generally, I have no complaints, but for IOIWARI I felt it was extreme to select exactly those 50% of input cases that do badly on a naive greedy strategy. A strategy that works on approx. 40-50% of input instances should get more than 12 out of 100 (6 draws).

The time is too short to fully understand all issues surrounding the tasks. What about "pick 3 out of 4"? [How to pick the 3? Simply voting may seem democratic, but it is no better guarantee that the real issues are resolved in an appropriate way.]

The task approval procedure is OK. However, we need to limit the deadline for finalizing the tasks. If we keep changing the wording, the translations will be slow and error-prone. Two hours after approval, the wording should be finalized.

We need the opportunity as a GA to discuss CONTENTIOUS minor issues (i.e. issues sparking disagreement that do not require the task to be dropped, e.g. a 0.02 s time limit) and to VOTE if necessary. [Discussion should also be limited, and in some cases it cannot be expected that the GA can resolve the issue in a way that can be implemented in the competition taking place the next day. It is easy for the GA to decide that it (or its majority) does not accept a certain aspect of a task. It is quite another matter to make sure that all aspects mesh together well enough to get a good competition. Changing one aspect in the last hours before the competition is asking for trouble. What would have happened if the time limit had been raised, by democratic vote, to 2 seconds? Who would have re-assessed the test data and possibly changed it, in a controlled way?]

When voting on major issues, we heard objections to all 3 tasks, then voted on all 3. It would be better to discuss task 1, then vote on it before discussing task 2. [That was the intention of the proposed procedure. The chairing of these meetings was not done appropriately, and it should not be done by the ISC, because of their involvement in the discussions.]

Development environment for competitors
---------------------------------------

Preference for a single development environment depends on what the majority of competitors use; both would be better. ["The competitor" is an elusive concept. We "create" them to a large extent.]
Linux preferred: Red Hat (most common variant)

Linux preferred: Debian

Linux preferred: Debian, kernel 2.4

Linux preferred, but Linux alone probably won't work for the masses.

Windows NT 4.0 preferred

If a single system, then Windows 2000 preferred. However, we prefer to have both Linux and Windows 2000.

Availability of both is preferred.

Windows 2000 preferred.

Provide backup copies of task-specific material (such as example input files, and the source code for task DOUBLE). [I agree.]

The FreePascal IDE is still very unstable, but newer versions of RHIDE support not only the GNU compilers, but also FPC. So, I propose moving to RHIDE completely, or at least providing RHIDE for FreePascal as an alternative. [RHIDE is not under active development and also has its limitations, such as problems under Win2000, etc. The FP IDE has an active development group, but they need feedback from actual users. We might provide them with an IOI environment to help them reproduce our bug reports. That way, they can better see to it that at least under the IOI environment the IDE is going to be usable. However, the key issue here is that we are dealing with open-source software, which requires an active user community to make it work well. If the IOI community is not going to participate actively in this process, then we will not have better IDEs, and we must resort to the alternative of commercial tools. The latter have major drawbacks, such as big, expensive software packages that offer way too much for what is needed in the IOI competition.]

Many competitors would like to use Emacs, but could not have it with the configuration they are used to. So, they chose other editors. Would it be possible, for the next IOI, to bring their own configuration files, or to choose between a few? [We might consider this. However, I am not in favor of increasing the technicalities at the IOI. If that is really what competitors worry about, then something is wrong with the competition tasks. The tools should be a minor concern. If they are not, then we have moved too much in the direction of a coding contest. I would much rather find a way to overcome this than introduce more technicalities.]

The website always stated that the Linux installation would sport a GNOME desktop. Instead, GNUstep was installed, a different interface with a different look-and-feel, keyboard shortcuts, etc. We invested a certain amount of effort in training our team on the SAME environment (as promised) for IOI2001, and it was worthless. [I must admit that this escaped the ISC's attention. The ISC should check more carefully that promises are indeed kept.]

Grading
-------

I think the automatic grading system was a big improvement on earlier years. One suggestion for change: it would be good if competitors could submit programs even if they do not solve the small test cases (they could still solve certain special cases). [I agree; the ISC did not press enough for that.]

After-grading support should allow download of the full test data and tasks, and our competitors' solutions. [I agree.]

Provide two copies of the grading-info handouts. [Can be done.]

Provide some statistics of the grading results, like some anonymous histograms... [I agree; also after the first day?]

A complete transcript of the communication between interactive tasks and the tester should have been available.
The competitors' home directories should be backed up after the competition and made available to the delegations for a much longer time---preferably for a few weeks after the competition via a web/ftp site (e.g. using their passwords from the competition). [I agree; also the organization/ISC might wish to inspect or investigate this material afterwards. I have asked for this on several occasions, but it was rejected.]

Web-based submission/printing/... is not very convenient (selection of files using a dialogue instead of just passing a file name, etc.). We would suggest a couple of scripts (at least under Linux), available as part of the environment, as an alternative to the web interface. Also, logging in to the system twice (once to the workstation and a second time to the competition system) should be avoided. [The various interfaces to the grading system were defined much too late. They were not ready for review in May. Of course, concerning user interfaces, there are many opinions on what is and is not convenient. Besides convenience, one should first consider the principles. We have much to learn.]

Precise timing of program execution still has its problems, and I think they weren't fully understood yet. Setting time limits of 20 ms when the system measures execution times by SAMPLING once every 10 ms does not seem reasonable. I'll try to write a more detailed technical analysis of this issue and send it by e-mail later. We were already solving these problems for our national olympiad some time ago... [Note that the ISC specifically expressed its concerns about these timing issues in its May review meeting. However, the matter was never satisfactorily resolved. Your analysis is certainly welcome.]
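To make the sampling concern concrete, here is a small illustrative calculation. The numbers are hypothetical and assume CPU time is accounted in fixed 10 ms ticks; this is not a description of the actual grading system used at IOI2001.

    /* Illustration only: with a 10 ms accounting tick, every measured
       time is a multiple of 10 ms.  A program that really uses 15 ms
       of CPU time is therefore reported as either 10 ms or 20 ms,
       so a 20 ms limit accepts or rejects it almost at random. */
    #include <stdio.h>

    int main(void)
    {
        double tick   = 0.010;                        /* resolution (s)   */
        double actual = 0.015;                        /* true CPU use (s) */
        double low    = tick * (long)(actual / tick); /* rounded down     */
        double high   = low + tick;                   /* rounded up       */
        printf("reported as %.3f s or %.3f s\n", low, high);
        return 0;
    }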
It would be worth considering doing the test compiles/test runs during the competition directly on the competitors' computers. This way the submitting should be much "smoother", although it would be possible to trick the system and submit an invalid solution. But I think this cannot do any harm. Anyway, Windows seems to be a problem here. [In its May review, the ISC specifically requested the Test facility for use by competitors developing under Windows. In that case, it would be too cumbersome to have them reboot their machine under Linux. I would be very much in favor of Linux as the only competition environment, since it would greatly simplify a number of issues. The IOI is getting carried away too much by all kinds of "irrelevant" technicalities. This is not the way that computing scientists deal with their problems. Why don't we learn from the other science olympiads, which try to do better in the scientific treatment of their competitors' work?]

There were a lot of bugs in the online system: interactive test data not being available on the first day, evaluation reports not being available for some competitors on the second day, etc. This was acceptable as "teething" problems on the first attempt (when in the past haven't we had problems with new systems?). But please ensure that all bugs are fixed before the next IOI. [First of all, many of the problems were not software bugs, but people "bugs". Second, next IOI we might well have another new system, with mostly new people driving it...]

Grading environment: BSD seems more reliable, but may not be possible politically.

The grading environment should be the same as ONE of the environments available to the competitors.

Both Linux and Windows are preferred as grading execution environments. [I do not know what to make of this.]

Home directories of all competitors should be backed up immediately after the competition, and they should be available to anyone (at least for some days) after the competition.

We want the grading to be done under Linux. And, if grading is done under Linux, it is ABSOLUTELY ESSENTIAL that Linux is also available as an option for the competitors, also at IOI2002. [I agree.]

Better access to competitors' solutions via the network after the competition. Is this possible also for output-only tasks? [Yes.]

Miscellaneous
-------------

Questionnaires are OK, so long as they continue to have an influence on the competition.

The GA computers need to be available for longer. [Fine with me.]

Please do not have non-official GA meetings. Have a proper meeting with resolutions passed and minuted. [I was surprised that the IOI feedback meeting was "not official". See the "Guidelines for IOI Competitions", where it is an official GA meeting.]

It is understandable that perhaps the Tampere Hall needed to be closed at certain times. However, the translation room should have been open for use from the morning of arrival day till the evening of departure day (except at times when Tampere Hall was closed).

It may be better to have a fixed system for medals (e.g. 1/12 gold) that does not require interaction with the GA. This removes the need for secrecy of the scores, so the scores can be made public. [Unfortunately, the situation is not that simple. We will look into this.]

Generally, a feedback session should be about feedback, not a review. [Given the limited time for the meeting, and my experience with past IOIs, I have attempted to structure the meeting and to point out a number of issues on which feedback is desirable. The questionnaires then provide the opportunity for the real feedback. We now at least have this much in writing, and I hope that a follow-up on the IOI mailing list will provide further insights. A one-hour chaotic session with a mix of ideas and complaints would not have helped much, except maybe psychologically for the various speakers, who could feel some kind of relief.]

The GA computer room really SHOULD be open also after the Closing Ceremony. We want to announce the results to our friends as soon as possible. [I agree.]