Lessons from the German Tank Problem
During World War II the German army used tanks to devastating advantage. The Allies needed accurate estimates of German tank production and deployment. They used two approaches to find these values: spies and statistics. This note describes the statistical approach. Assuming the tanks are labeled consecutively starting at 1, if we observe k serial numbers from an unknown number N of tanks, with the maximum observed value m, then the best estimate for N is m(1 + 1/k) - 1. This is now known as the German Tank Problem, and it is a terrific example of the applicability of mathematics and statistics in the real world. The first part of the paper reproduces known results, specifically deriving this estimate and comparing its effectiveness to that of the spies. The second part presents a result we have not found in print elsewhere: the generalization to the case where the smallest value is not necessarily 1. We emphasize in detail why we are able to obtain such clean, closed-form expressions for the estimates, and conclude with an appendix highlighting how to use this problem to teach regression and how statistics can help us find functional relationships.
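As a quick illustration of the estimate m(1 + 1/k) - 1 stated above, here is a minimal Python sketch. The function names `estimate_total` and `simulate`, the sample serial numbers, and the simulation parameters are all hypothetical choices made for this example and are not taken from the paper. The simulation assumes serial numbers are drawn uniformly without replacement from 1 to N, which is the standard modeling assumption for this problem.

```python
import random


def estimate_total(serials):
    """Estimate the total N from observed serial numbers.

    Uses the formula m * (1 + 1/k) - 1, where m is the largest observed
    serial number and k is the number of observations. Assumes serials
    are numbered consecutively starting at 1.
    """
    k = len(serials)
    if k == 0:
        raise ValueError("need at least one observed serial number")
    m = max(serials)
    return m * (1 + 1 / k) - 1


def simulate(n_tanks, k, trials, seed=0):
    """Average the estimate over many simulated samples (illustrative only).

    Each trial draws k distinct serial numbers uniformly from 1..n_tanks.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(range(1, n_tanks + 1), k)
        total += estimate_total(sample)
    return total / trials


# Hypothetical observation: four captured serial numbers, largest is 60.
print(estimate_total([19, 40, 42, 60]))  # 60 * (1 + 1/4) - 1 = 74.0

# Under the sampling assumption above, the average estimate should land
# close to the true N when the number of trials is large.
print(simulate(n_tanks=300, k=5, trials=20000))
```

With the four hypothetical serial numbers the estimate is 74, noticeably larger than the observed maximum of 60: the formula pushes the maximum upward by roughly the average gap between observations.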