The Guardian has a story about how statisticians correctly estimated the production of German tanks during World War II, despite having only limited intelligence. Can any mathematician shed light on why the formula works?
The statisticians had one key piece of information, which was the serial numbers on captured Mark V tanks. The statisticians believed that the Germans, being Germans, had logically numbered their tanks in the order in which they were produced. And this deduction turned out to be right. It was enough to enable them to make an estimate of the total number of tanks that had been produced up to any given moment.
The basic idea was that the highest serial number among the captured tanks could be used to calculate the overall total. The German tanks were numbered as follows: 1, 2, 3, …, N, where N was the unknown total number of tanks produced, the figure the statisticians wanted. Imagine that they had captured five tanks, with serial numbers 20, 31, 43, 78 and 92. They now had a sample of five, with a maximum serial number of 92. Call the sample size S and the maximum serial number M. After some experimentation with other series, the statisticians reckoned that a good estimator of the number of tanks would probably be provided by the simple equation (M-1)(S+1)/S. In the example given, this translates to (92-1)(5+1)/5, which is equal to 109.2. Therefore the estimate of tanks produced at that time would be 109.
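One way to get a feel for why the formula works is to try it on made-up data. The Python sketch below is only an illustration, not the statisticians' actual procedure: the function names, the true total of 250, the number of trials and the random seed are all my own assumptions. It first reproduces the 109.2 from the example, then repeatedly "captures" five tanks at random from a known total and averages the estimates. The intuition is that the largest of S serial numbers drawn at random from 1 to N lands, on average, at S(N+1)/(S+1), a little short of N, so scaling the maximum up by (S+1)/S roughly corrects for that gap.

```python
import random


def estimate_total(serials):
    """Apply the quoted formula (M - 1)(S + 1) / S to a list of serial numbers."""
    s = len(serials)   # S: sample size
    m = max(serials)   # M: highest serial number seen
    return (m - 1) * (s + 1) / s


def average_estimate(true_total, sample_size, trials=20000, seed=0):
    """Average the estimate over many simulated captures (hypothetical setup)."""
    rng = random.Random(seed)
    population = range(1, true_total + 1)  # tanks numbered 1..N
    total = 0.0
    for _ in range(trials):
        # Capture `sample_size` distinct tanks uniformly at random.
        captured = rng.sample(population, sample_size)
        total += estimate_total(captured)
    return total / trials


if __name__ == "__main__":
    # The worked example from the article.
    print(estimate_total([20, 31, 43, 78, 92]))  # 109.2

    # Assumed scenario: 250 tanks really exist and 5 are captured each time.
    print(average_estimate(true_total=250, sample_size=5))
```

Under these assumptions the long-run average should come out very close to the true total, even though any single estimate can be well off. That is the sense in which the formula "works": it is right on average, provided the captured tanks are roughly a random sample of those produced.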