IOI2001 Questionnaire for Delegations

Summary of further comments written on 32 returned questionnaires

[Remarks in square brackets are mine, Tom Verhoeff]

=======

Tasks
-----

Idea of output-only tasks OK, but at most one. [Per day? Why?]

One output-only task per IOI is a good decision. [Why?]

Why does it take so long to prepare the final versions of the tasks? E.g. SCORE was completely changed at 01:30 (AM). [Simple: bad preparation.]

There was insufficient diversity in theme and difficulty of the tasks. [How to improve that?]

There should be several tasks (3 or 4) such that 1/2 of the competitors get 100%. The task set was such that very few competitors received 100% on any task. I think this encourages a "hacker" approach -- just start coding and incrementally try to get more cases. If the candidates knew that 3 or 4 tasks were completely solvable, they might spend more time thinking and doing analysis. By all means have a couple of open-ended tasks, but also closed deterministic ones. [What do you mean by open-ended tasks?]

The test data was far too difficult in most cases. [However, the best competitors could now be distinguished. Don't we want that? Maybe there is no need to distinguish among gold medal winners. I don't believe the IOI should strive so much for a unique winner. That is not in the spirit of the olympiads for high schools. We must keep in mind that there is a huge difference in performance between the top and bottom; not everything in the competition is aimed at every competitor.]

Time limits should distinguish complexity differences, not small constants. To this end, the limit should be at least 3 times the organizer's run-time. For this reason, I REALLY like the output-only submission. IPSC works well using exclusively output-only. One can also broaden the kind of tasks using output-only. [Time limits serve two purposes: they make the grading finite, and they help distinguish various types of algorithms. For the latter, I would prefer some other budgeting mechanism than time. In some reactive tasks, you can simply count the number of operations (cf. Wires and Switches at IOI'95; Median at IOI2000) and relate that to the instance size. This is much better. In Java, you might count specific instructions executed by the Java Virtual Machine.]
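As a rough sketch of what such an operation budget could look like, consider the following grader-side fragment. It is purely illustrative and not taken from any actual IOI library; the function name ask, the budget value, and the error handling are invented for this example.

    /* Hypothetical sketch of a reactive-task library that budgets
       operations instead of CPU time.  The competitor's program calls
       ask(); the library counts the calls and compares the count with a
       budget derived from the instance size, so the verdict does not
       depend on machine speed or compiler. */
    #include <stdio.h>
    #include <stdlib.h>

    static long used = 0;       /* operations consumed so far            */
    static long budget = 1000;  /* allowed operations for this instance  */

    int ask(int i, int j)       /* one "operation" of the task           */
    {
        if (++used > budget) {
            fprintf(stderr, "operation budget exceeded\n");
            exit(1);            /* treated like exceeding a time limit   */
        }
        /* ... answer the query from the secret test instance ... */
        return i < j;           /* placeholder answer */
    }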
The time limit for MOBILES was too tight. I think that we should not require competitors to squeeze every byte from their data structures to gain the full score. An asymptotically optimal solution, programmed in a way that does not waste resources unnecessarily, should receive full score without having to resort to tricky optimizations.

The MOBILES task is interesting. Having many different scores depending on the quality of the algorithm is great, but it seems that it was not at all easy to see that some solutions would not give 100% scores in less than 1 second. So, some competitors did not even try to find better solutions, even if they would have been able to find some.

As a GA, we cannot control national selection procedures. The SC/ISC have done as much as could be expected (maybe more) to eliminate zero scores. Nothing further need be done. [Still, it is important to find out what caused the zero scores. If it is in the preparation of the delegation, then we cannot do much more, except maybe provide some additional help before the IOI. If it is the consequence of some unfortunate decisions or events concerning the competition, then we do have an obligation to improve that.]

The difficulty between the days was not consistent. Day 2 was much easier than Day 1. [This was a deliberate choice of the Host SC "to guarantee early separation at the top". It was not unanimously supported by the ISC. Furthermore, technical difficulties influenced the selection for the first day, e.g. forcing DOUBLE to the second day.]

The test data appeared to be step-like instead of smooth. Thus, it tended to lump all O(n^2) algorithms in the same bucket, as opposed to trying to distinguish between "better" (more efficient) implementations and less efficient ones. [This was mostly on purpose. The question is whether we want to reward small optimizations and language/compiler differences. It has been the ISC's opinion that this should not be the case.]

The notion of "the" correct solution is bogus. There are many correct solutions. The SC almost certainly does not have THE optimal program, although there may not be a better algorithm in terms of big-O notation. The fact that you have an implementation that runs twice (or more) as fast as required is not an issue. The issue is how fast another program that you have runs and how many points you want it to get.

Just because of the bad formulation of task SCORE as originally presented, even we were not sure whether we understood the conditions properly...

If complex data structures are required to get "good" solutions (e.g. trees), perhaps it is unfair to include languages (like C++) that have more "built-in" support for such data structures. Or, for instance, using Unix system calls from C and not on Windows. Is there a way to "tighten" the features that are available for use? After all, the IOI should be more about algorithms and problem solving than tricky programming. [This is indeed a controversial situation, which unfortunately has no straightforward solution. The situation is further complicated by offering multiple languages for expressing algorithms (Pascal, C, C++), and this year by offering multiple development platforms. The use of "real" programming languages in full glory generally also means that you "drag in" standardized solutions for frequently occurring programming constructs. Some people have strongly objected to restricting the full languages (why limit good competitors' access to the toolbox they are familiar with?). We already impose certain limitations (some for practical reasons), such as forbidding the use of auxiliary files. In my opinion, it should be possible to impose further restrictions, if that gives rise to a good problem context (e.g. it may be interesting to see how you can solve a problem without the use of multiplication). However, explaining in sufficient detail what limitations apply for each task is cumbersome. Finally, in reactive tasks, there sometimes is a natural way to control the availability of some operations (e.g. cf. task MEDIAN at IOI2000).]

Task descriptions should not be too long, and should be informative. [How long is too long? Do you want to cut out any kind of story? Sometimes, finding the right "model" by abstracting from the story is considered part of the task.]

Tasks should be solvable by all competitors. This year 18 competitors could not get any score at all. At least include a simpler task for them. [The scores are affected by numerous variables, many of them not under the control of the organization (incl. proper training and preparation for the IOI, psychological pressure, etc.). Even inclusion of a task that only requires a program to add two numbers will still draw some zeroes.
For task DOUBLE, one of the cases was the example, for which the output file was in fact given. The only thing the competitor had to do was inspect the 10 given input cases, recognize the example among them, and submit the given example output file after editing the case number. So, we should also carefully explain to the competitors that there is something out there for which it is close to trivial to score some points. But then, a competitor failing to do so would feel even more embarrassed. At IOI2000, the competitors got 50 points per day for showing up. That way, there were also no zero scores, guaranteed. The majority of the GA was not happy with that either.]

Frankly speaking, the tasks are not at the high school level. They are for specially trained competitors only. Do we want that? [The IOI, like the other science olympiads, is aimed at TALENTED pupils. These usually perform well above the common "high school level". It is also true that there are great differences between the skill levels of the selected competitors. This is also the case in other events, such as professional sports competitions. Training is another matter. One hundred years ago, it was considered unsportsmanlike to train for the Olympic Games, which were intended for amateurs only. Because of the current predictability of IOI competition tasks, it is possible to improve the performance of competitors by special training sessions. The International Biology Olympiad requires that such training should be beneficial to a broader group than just the selected competitors. Making the tasks less predictable might reduce the effect of training. On the other hand, it also takes talent to benefit from training. You cannot expect to train an arbitrary high school student to do well at the IOI.]

The test data for DOUBLE allowed a brute-force approach to solve 8 cases out of 10. [But not without making some "intelligent" inferences.]

The test cases and grading of DEPOT allowed a competitor who just gave the simple solution to get 33 out of 100. [And still, others complain that there are so many zero scores. Apparently, it was not that simple.]

Task finalization (approval, confirmation, ...)
-----------------------------------------------

Approval procedure for questions: much improved, but it would be good if the GA got a chance to confirm that they are happy with the REVISED problems (and to confirm that they are the FINAL versions). [I agree.]

I think that knowing the whole set of tasks in advance is much better than having one at a time. On the other hand, I think that the semantic correction stage was pretty bad, mostly because the latest versions came out really late, when a lot of delegations had already finished their translations, and sometimes they were really different from earlier versions. [I agree.]

Task approval procedure: good

Task approval procedure: okay

Task approval procedure: Was there any?

It was an improvement over some procedures in the last 6 years. We think the task upgrades (incorporating minor remarks) should be communicated in a clearer way. [I agree.]

Better tracking of changes in task descriptions that were introduced while the delegations were translating. [I agree.]

This needs to be delegated totally to the GA-appointed ISC. [Not yet; for that, there need to be GA-approved guidelines as well.]

We propose that the task approval procedure be done at a time just before the competition.
We suggest following the schedule below:

    06:00 - 09:30  Translation
    10:00 - 15:00  Competition

[This was done in the past; it changed at IOI'96. For a good reason, I believe. Getting started on time early in the morning is not that easy. There is much more pressure to do things in a hurry, and this leads to mistakes. I believe the tasks, incl. the descriptions, must be prepared more carefully, so that translation can start earlier.]

Good, only the preparation of the final versions of the tasks after the tasks were approved took too much time. Also, for some tasks the differences between versions were not recorded. [I agree.]

The task approval process does streamline acceptance. However, the importance of minor changes is sometimes lost. Statements that seemed ambiguous were not changed to be completely clear. [I agree. However, this is the best we could get this year. The final preparation of the task descriptions was not up to ISC standards. There were not enough resources to get more done in less time.]

I think screening of tasks by the ISC seems to make task selection move more smoothly (than I expected). However, I was surprised at how long it took to "freeze" relatively minor changes in language. [So was I; this part was not properly prepared.]

Evaluation solely by test data requires careful choice of data. Generally, I have no complaints, but for IOIWARI I felt it was extreme to select exactly those 50% of input cases that do badly on a naive greedy strategy. A strategy that works on approx. 40-50% of input instances should get more than 12 out of 100 (6 draws).

The time is too short to fully understand all issues surrounding the tasks. What about "pick 3 out of 4"? [How to pick the 3? Simply voting may seem democratic, but it is no better guarantee that the real issues are resolved in an appropriate way.]

The task approval procedure is OK. However, we need to limit the deadline for finalizing the tasks. If we keep changing the wording, the translations will be slow and error-prone. Two hours after approval, the wording should be finalized.

We need the opportunity as a GA to discuss CONTENTIOUS minor issues (i.e. issues sparking disagreement that do not require the task to be dropped, e.g. a 0.02 s time limit) and to VOTE if necessary. [Discussion should also be limited, and in some cases it cannot be expected that the GA can resolve the issue in a way that can be implemented in the competition taking place the next day. It is easy for the GA to decide that it (or its majority) does not accept a certain aspect of a task. It is quite another matter to make sure that all aspects mesh together well enough to get a good competition. Changing one aspect in the last hours before the competition is asking for trouble. What would have happened if the time limit had been raised, by democratic vote, to 2 seconds? Who would have re-assessed the test data and possibly changed it, in a controlled way?]

When voting on major issues, we heard objections to all 3 tasks, then voted on all 3. It would be better to discuss task 1, then vote on it before discussing task 2. [That was the intention of the proposed procedure. The chairing of these meetings was not done appropriately, and it should not be done by the ISC, because of their involvement in the discussions.]

Development environment for competitors
---------------------------------------

Preference for a single development environment depends on what the majority of competitors use; both would be better. ["The competitor" is an elusive concept. We "create" them to a large extent.]
Linux preferred: Red Hat (most common variant)

Linux preferred: Debian

Linux preferred: Debian, kernel 2.4

Linux preferred, but Linux alone probably won't work for the masses.

Windows NT 4.0 preferred

If a single system, then Windows 2000 preferred. However, we prefer to have both Linux and Windows 2000.

Availability of both is preferred.

Windows 2000 preferred.

Provide backup copies of task-specific material (such as example input files, and the source code for task DOUBLE). [I agree.]

The FreePascal IDE is still very unstable, but newer versions of RHIDE support not only the GNU compilers, but also FPC. So, I propose moving to RHIDE completely, or at least providing RHIDE for FreePascal as an alternative. [RHIDE is not under active development and also has its limitations, such as problems under Win2000, etc. The FP IDE has an active development group, but they need feedback from actual users. We might provide them with an IOI environment to help them reproduce our bug reports. That way, they can better see to it that at least under the IOI environment the IDE is going to be usable. However, the key issue here is that we are dealing with open-source software, which requires an active user community to make it work well. If the IOI community is not going to participate actively in this process, then we will not have better IDEs, and we must resort to the alternative of commercial tools. The latter have major drawbacks, such as big, expensive software packages that offer way too much for what is needed in the IOI competition.]

Many competitors would like to use Emacs, but could not have it with the configuration they are used to. So, they chose other editors. Would it be possible, for the next IOI, to bring their own configuration files, or to choose between a few? [We might consider this. However, I am not in favor of increasing the technicalities at the IOI. If that is really what competitors worry about, then something is wrong with the competition tasks. The tools should be a minor concern. If they are not, then we have moved too much in the direction of a coding contest. I would much rather find a way to overcome this than introduce more technicalities.]

The website always stated that the Linux installation would sport a GNOME desktop. Instead, GNUstep was installed, a different interface with a different look-and-feel, keyboard shortcuts, etc. We invested a certain amount of effort in training our team on the SAME environment (as promised) for IOI2001, and it was worthless. [I must admit that this escaped the ISC's attention. The ISC should check more carefully that promises are indeed kept.]

Grading
-------

I think the automatic grading system was a big improvement on earlier years. One suggestion for change: it would be good if competitors could submit programs even if they do not solve the small test cases (they could still solve certain special cases). [I agree; the ISC did not press enough for that.]

After-grading support should allow download of the full test data and tasks, and our competitors' solutions. [I agree.]

Provide two copies of the grading-info handouts. [Can be done.]

Provide some statistics of the grading results, like some anonymous histograms... [I agree; also after the first day?]

A complete transcript of the communication between interactive tasks and the tester should have been available.
The competitors' home directories should be backed up after the competition and made available to the delegations for a much longer time---preferably for a few weeks after the competition via a web/ftp site (e.g. using their passwords from the competition). [I agree; also the organization/ISC might wish to inspect or investigate this material afterwards. I have asked for this on several occasions, but it was rejected.]

Web-based submission/printing/... is not very convenient (selection of files using a dialogue instead of just passing a file name, etc.). We would suggest a couple of scripts (at least under Linux), available as part of the environment, as an alternative to the web interface. Also, logging in to the system twice (once to the workstation and a second time to the competition system) should be avoided. [The various interfaces to the grading system were defined much too late. They were not ready for review in May. Of course, concerning user interfaces, there are many opinions on what is and is not convenient. Besides convenience, one should first consider the principles. We have much to learn.]

Precise timing of program execution still has its problems, and I think they weren't fully understood yet. Setting time limits of 20 ms when the system measures execution times by SAMPLING once every 10 ms does not seem reasonable. I'll try to write a more detailed technical analysis of this issue and send it by e-mail later. We were already solving these problems for our national olympiad some time ago... [Note that the ISC specifically expressed its concerns about these timing issues in its May review meeting. However, the matter was never satisfactorily resolved. Your analysis is certainly welcome.]
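To make the sampling concern concrete, here is a small illustrative calculation. The numbers are hypothetical and assume CPU time is accounted in fixed 10 ms ticks; this is not a description of the actual grading system used at IOI2001.

    /* Illustration only: with a 10 ms accounting tick, every measured
       time is a multiple of 10 ms.  A program that really uses 15 ms
       of CPU time is therefore reported as either 10 ms or 20 ms,
       so a 20 ms limit accepts or rejects it almost at random. */
    #include <stdio.h>

    int main(void)
    {
        double tick   = 0.010;                        /* resolution (s)   */
        double actual = 0.015;                        /* true CPU use (s) */
        double low    = tick * (long)(actual / tick); /* rounded down     */
        double high   = low + tick;                   /* rounded up       */
        printf("reported as %.3f s or %.3f s\n", low, high);
        return 0;
    }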
It would be worth considering doing the test compiles/test runs during the competition directly on the competitors' computers. This way the submitting should be much "smoother", although it would be possible to trick the system and submit an invalid solution. But I think this cannot do any harm. Anyway, Windows seems to be a problem here. [In its May review, the ISC specifically requested the Test facility for use by competitors developing under Windows. In that case, it would be too cumbersome to have them reboot their machine under Linux. I would be very much in favor of Linux as the only competition environment, since it would greatly simplify a number of issues. The IOI is getting carried away too much by all kinds of "irrelevant" technicalities. This is not the way that computing scientists deal with their problems. Why don't we learn from the other science olympiads, which try to do better in the scientific treatment of their competitors' work?]

There were a lot of bugs in the online system: interactive test data not being available on the first day, evaluation reports not being available for some competitors on the second day, etc. This was acceptable as "teething" problems on the first attempt (when in the past haven't we had problems with new systems?). But please ensure that all bugs are fixed before the next IOI. [First of all, many of the problems were not software bugs, but people "bugs". Second, next IOI we might well have another new system, with mostly new people driving it...]

Grading environment: BSD seems more reliable, but may not be possible politically.

The grading environment should be the same as ONE of the environments available to the competitors.

Both Linux and Windows are preferred as grading execution environments. [I do not know what to make of this.]

Home directories of all competitors should be backed up immediately after the competition, and they should be available to anyone (at least for some days) after the competition.

We want the grading to be done under Linux. And, if grading is done under Linux, it is ABSOLUTELY ESSENTIAL that Linux is also available as an option for the competitors, also at IOI2002. [I agree.]

Better access to competitors' solutions via the network after the competition. Is this possible also for output-only tasks? [Yes.]

Miscellaneous
-------------

Questionnaires are OK, so long as they continue to have an influence on the competition.

The GA computers need to be available for longer. [Fine with me.]

Please do not have non-official GA meetings. Have a proper meeting with resolutions passed and minuted. [I was surprised that the IOI feedback meeting was "not official". See the "Guidelines for IOI Competitions", where it is an official GA meeting.]

It is understandable that perhaps the Tampere Hall needed to be closed at certain times. However, the translation room should have been open for use from the morning of arrival day till the evening of departure day (except at times when Tampere Hall was closed).

It may be better to have a fixed system for medals (e.g. 1/12 gold) that does not require interaction with the GA. This removes the need for secrecy of the scores, so the scores can be made public. [Unfortunately, the situation is not that simple. We will look into this.]

Generally, a feedback session should be about feedback, not a review. [Given the limited time for the meeting, and my experience with past IOIs, I have attempted to structure the meeting and to point out a number of issues on which feedback is desirable. The questionnaires then provide the opportunity for the real feedback. We now at least have this much in writing, and I hope that a follow-up on the IOI mailing list will provide further insights. A one-hour chaotic session with a mix of ideas and complaints would not have helped much, except maybe psychologically for the various speakers, who could feel some kind of relief.]

The GA computer room really SHOULD be open also after the Closing Ceremony. We want to announce the results to our friends as soon as possible. [I agree.]