The Pilot Stress Test Project

May 23 2026

Air France’s pilot school was down to one working station; if it broke, a critical evaluation tool would be lost. They contracted my employer to build a replacement, and we only had a couple of months. The test was used to determine how pilot candidates performed under heavy mental pressure.

The project started with a visit to Air France’s training facility to discuss the requirements. At the end of the visit I tried the test for 15 minutes, and I was mentally exhausted for the rest of the day.

The original test station consisted of a computer connected to a CRT screen, a joystick, two pedals, and a tape player with speakers. The candidate had to keep a pointer centered with the joystick and use the pedals to turn off lights on either side of the screen while performing mental calculations, all simultaneously.

Originally there were four simulation stations, but they broke down over time. Over the years the Air France team used parts from the broken-down stations to keep the others operational. The hardware and the software were early-1990s technology; each computer ran on an i386 processor with 4 MB of RAM. The joystick and pedals were connected to its serial ports, and the tape player was attached to another serial port. The tape gave the oral instructions for the calculations, like “12, minus 5, plus 3, remember the result”. The candidate had to remember 3 numbers and give them to the examiner after the test was completed.

The customer’s team included a psychologist, who would be the principal user of the new assessment tool we were developing. It was clear that re-creating the original stations with updated hardware and software wasn’t practical. Keeping the test as it was would be costly and time-consuming.

With the original 4 stations, it took almost half a day to process each batch of candidates. Now with only a single station left, it took them a full day. Each run involved seating a candidate at the simulation station, giving explanations, and then running the test for 15 minutes. When the test was over, the staff had to transfer the results to a floppy disk. In total, each run took between 20 and 25 minutes per candidate. This was too slow and expensive.

We had to come up with a replacement, and we had less than two months to do it. Next to the room where the stations were, they had a bigger room with about 20 regular PCs. They used it for the knowledge test: multiple-choice questionnaires. We decided to create a brand-new test based on the original; the plan was to run it on the PCs in the computer room with a regular game controller for input. This would be faster and cheaper.

The in-house psychologist wasn’t thrilled about losing the existing simulation; he had accumulated 20 years of data based on the old test. We won him over by allowing him to customize the test and write different scenarios. He would be able to get richer data, and since the new test could be run in parallel, he could gather more data for every candidate by testing them with different scenarios.

We decided to write the new test in Python and to use SDL. All of Air France’s computers ran Windows XP, and I was developing under Linux. It made sense to use SDL for portability, since it handled graphics, sound, and controllers. Reproducing the test was relatively easy: it took me about a week to get the graphics and controller part working. It was effectively a mini video game.

One of the fun parts of the project was selecting the controller: we bought 3 different models to evaluate. One was a big arcade controller with a joystick and 6 fat buttons. The other two were gamepad-style controllers: the Xbox 360 gamepad and a generic USB gamepad.

After a couple of hours of testing with our early prototype, it was clear that the arcade controller wouldn’t work. The joystick wasn’t truly analog; it was a stick mounted over 8 buttons, giving it discrete input instead of the gradual response of a real joystick. The Xbox controller was good, but it required a driver to work with Windows XP and Linux. Finally, the cheap USB gamepad worked as well as the Xbox controller, didn’t require a driver, and cost less than half as much. This meant Air France could stock up and replace them cheaply.

Within a couple of weeks we had a working prototype; we gave the Air France team a demo and they were happy that we had developed something functional so quickly.

The next step was working on the scenarios. The test started off relatively easy, and gradually rose in difficulty to the point of being almost impossible. The psychologist wanted to write different scenarios to vary the level of difficulty and gather more specialized data about the candidates.

To do this I designed a custom language and wrote a parser for it. Each line was a command; the first item was a timestamp, then function name, and the rest of the line had the parameters for the function. At specific moments it could adjust pointer sensitivity, flash lights, or prompt mental calculations. We had about twenty commands the psychologist could mix and match to build custom tests. We wrote a couple of generic scenarios ourselves that matched the original simulation.
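The original code isn’t reproduced here, but a minimal sketch of a parser for this kind of line-based format could look like the following. The command names (`sensitivity`, `light`, `calculation`), the `#` comment syntax, timestamps in seconds, and the helper `due_commands` are all assumptions made for illustration, not the actual scenario language.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    timestamp: float  # seconds since the start of the test (assumed unit)
    name: str         # command name, e.g. "light" (hypothetical)
    params: tuple     # remaining fields on the line, kept as strings


def parse_scenario(text):
    """Parse '<timestamp> <command> [params...]' lines into sorted Commands."""
    commands = []
    for line_no, raw in enumerate(text.splitlines(), start=1):
        # '#' comments are an assumption, not part of the original format.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) < 2:
            raise ValueError(f"line {line_no}: expected '<timestamp> <command>'")
        try:
            timestamp = float(fields[0])
        except ValueError:
            raise ValueError(
                f"line {line_no}: invalid timestamp {fields[0]!r}"
            ) from None
        commands.append(Command(timestamp, fields[1], tuple(fields[2:])))
    # Sort by time so the test loop can walk the list in order.
    commands.sort(key=lambda command: command.timestamp)
    return commands


def due_commands(commands, previous_time, current_time):
    """Return commands scheduled in the half-open window [previous, current)."""
    return [c for c in commands if previous_time <= c.timestamp < current_time]


# A hypothetical scenario: the difficulty ramps up as the test progresses.
SAMPLE = """\
# time  command      parameters
0.0     sensitivity  1.0
5.0     light        left
12.5    calculation  12 -5 +3
30.0    sensitivity  1.8
"""

if __name__ == "__main__":
    for command in parse_scenario(SAMPLE):
        print(command)
```

In a design like this, the main loop would call `due_commands` once per frame with the previous and current elapsed time, then dispatch each returned command to the matching function.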

Another piece of the puzzle was exporting the data after the test. All the computers were hooked up to a common shared drive so we wrote the raw CSV data there. We created a tool to generate a PDF report from the data, so the school could easily get the details of how well each candidate did during the test.
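As a rough illustration of the export step, here is a sketch that writes one CSV file per candidate into a shared directory using Python’s standard `csv` module. The column names in `FIELDS`, the one-file-per-candidate layout, and the function name `export_results` are assumptions; the post does not describe the actual file format.

```python
import csv
from pathlib import Path

# Hypothetical columns; the real export format is not described in the post.
FIELDS = ["time_s", "pointer_error", "lights_on"]


def export_results(samples, candidate_id, output_dir):
    """Write the raw samples for one candidate to '<output_dir>/<id>.csv'."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    path = output_dir / f"{candidate_id}.csv"
    # newline="" lets the csv module control line endings on every platform.
    with path.open("w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(samples)
    return path
```

Keeping the raw samples in a plain CSV file means a separate reporting tool can read them later without depending on the test program itself.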

Halfway through, the client grew worried. We had shown the demo a couple of weeks into the project, and after that we were busy finishing the scenario interpreter. We didn’t show them anything new for two weeks, and they felt that the project was stalling. We were ahead of schedule, but the progress wasn’t visible. We hadn’t set clear checkpoints for the customer to verify progress, and we hadn’t explained clearly that the interpreter was all or nothing: we had to complete it before we could demonstrate it.

We addressed the situation by setting up a workshop to teach them how to write custom scenarios and show we were still making progress. It was a success on both fronts. They learned how to use the system and felt much more confident that the project would be completed on time.

After the workshop, the Air France team spent a full day writing, testing, and analyzing their first scenario. A full day was much longer than the few hours they expected it to take. This was mostly due to their inexperience; they didn’t realize that translating an idea into a script was not as easy as they thought. Despite their difficulties, they were excited about the new system and the analytical power it gave them.

The delivery and installation also took longer than expected. It took their IT team almost a month to get the project set up and working. Their IT setup was rigid and bureaucratic, turning simple deployments into month‑long ordeals. Delays like this were typical, and a major reason the school hired an external contractor for the development.

I left the company before the first batch of prospective students tried the new test. A friend who worked on the project told me the first session was a success: they were able to get all the candidates to take the test twice, thereby gathering more and better data in less than half the time the old system took.

I was the principal developer on this project, and it was a ton of fun. I was still green, with only a couple of years of professional experience. It was the first major project I led, and this gave me confidence in my abilities.

That confidence helped me be more ambitious with my career: I could take on complicated projects with tight deadlines.

Sometimes I wonder if the code I wrote almost 20 years ago still runs in the Air France pilot school.