“Age Heaping,” Numeracy, and Human Capital

Important economic and managerial concepts like social capital and human capital are latent variables, which makes them difficult to discern in aggregate or historical data. My very clever friend (and occasional O&M commenter) Brian A’Hearn suggests that “age heaping” (rounding one’s self-reported age up or down to the nearest number ending in five or zero) may be a good proxy for human capital. From Brian’s paper with Jörg Baten and Dorothee Crayen, “Quantifying Quantitative Literacy: Age Heaping and the History of Human Capital”:

As signature ability can proxy for literacy, so accuracy of age awareness can proxy for numeracy, and for human capital more generally. A society in which individuals know their age only approximately is a society in which life is governed not by the calendar and the clock but by the seasonal cycle, in which birth dates are not recorded by families or authorities, in which numerical age is not a criterion for access to privileges (e.g. voting, office-holding, marriage, holy orders) or for the imposition of responsibilities (such as military service or taxation), in which individuals who know their birth year have difficulty accurately calculating their age from the current year. Within a society, the least educated and those with the least interaction with state, religious, or other administrative bureaucracies will be least likely to know their age accurately. Age awareness thus tells us something about both the individual and the society he or she inhabits. Approximation in age awareness manifests itself in the phenomenon of age heaping in self-reported age data. Individuals lacking certain knowledge of their age rarely state this openly, but choose instead a figure they deem plausible. They do not choose randomly, but have a systematic tendency to prefer “attractive” numbers, such as those ending in 5 or 0, or even numbers, or in some societies numbers with other specific terminal digits. Age heaping can be assessed from any sufficiently numerous source of age data: census returns, tombstones, necrologies, muster lists, legal records, or tax data, for example. While care must be exercised in ascertaining possible biases, such data are in principle available much more widely than signature rates and other proxies for human capital.

Brian and his coauthors use age-heaping data to generate estimates of human capital in Europe over a long period of time, finding substantial increases in human capital just before the Industrial Revolution.
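
To make the idea concrete, here is a minimal sketch of one common way to summarize heaping on ages ending in 0 or 5: the Whipple index. The excerpt above does not say which measure the paper uses, so treat this as an illustration of the general approach, not as the authors' method. The function name and the sample ages are made up for the example. The index looks at reported ages from 23 to 62, a window in which exactly one age in five ends in 0 or 5, and compares the observed share of such ages with that one-in-five benchmark. A value of 100 means no heaping on those digits; 500 means every reported age ends in 0 or 5.

```python
def whipple_index(ages, low=23, high=62):
    """Whipple index for ages reported in the range [low, high].

    100 = no preference for ages ending in 0 or 5;
    500 = every reported age ends in 0 or 5.
    """
    in_range = [a for a in ages if low <= a <= high]
    if not in_range:
        raise ValueError("no reported ages fall in the requested range")
    # Count reported ages that are multiples of five (terminal digit 0 or 5).
    heaped = sum(1 for a in in_range if a % 5 == 0)
    # Without heaping, one fifth of ages in this window would be multiples
    # of five, so scale the observed share by 5 and express it as a percentage.
    return 100.0 * 5 * heaped / len(in_range)


# Hypothetical data: one person at every age from 23 to 62 (no heaping).
uniform_ages = list(range(23, 63))

# Hypothetical data: most respondents report a "round" age.
heaped_ages = [30, 30, 35, 40, 40, 40, 45, 50, 31, 47]

print(whipple_index(uniform_ages))  # 100.0
print(whipple_index(heaped_ages))   # 400.0
```

Note that this sketch only captures preference for terminal digits 0 and 5. The quoted passage also mentions preferences for even numbers and other digits, which would need a different measure.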

  • 1. Chihmao Hsieh  |  24 January 2007 at 1:39 am

    This is very interesting.

    The notion of using inaccuracies in self-reports of biographical or demographic information to proxy for a knowledge-based variable is something I’ve tried in my research as well. Recently, I have used a particular dataset in which tens of thousands of individuals trained in science or engineering were surveyed about their work history and educational backgrounds. Believe it or not, a small percentage of individuals report a set of majors and degrees in one year, and then two years later omit one of them (or report a different major or degree in its place). My argument was that this indicated deficiencies in respondents’ long-term memory retrieval capabilities, since the list of categories of majors never changed from one survey administration to the next.

    I found evidence suggesting that my proxy for long-term memory capabilities was reasonably accurate, but more than a few dear colleagues have found it difficult to accept. When I tried to explain to them that some individuals trained in science or engineering appeared to have trouble recalling their major over the years, they looked at me quizzically (“That’s impossible!”)… even when I showed that the coding scheme used to enter the survey data made data-entry error highly unlikely.

    I would guess that these same good colleagues of mine would have a difficult time accepting that people can’t reliably report their own age!

