Perhaps the most frequently asked question is "Why did the National Research Council choose to use such a bizarre method?"
The answer is simply that the NRC conducted the count in the only way it could, given the available data.
First, no comprehensive database exists that incorporates, for example, chapters in multiauthor books, and electronic databases that list books (e.g. WorldCat) do not include any information that would help the NRC distinguish one "J. Smith" from another. The ISI database of journal articles was (and is) the only one it was practical to use, so only journal articles could be counted.
Second, the NRC report was intended specifically to address research, not all scholarship, so review articles, editorial material, book reviews, and other items not part of the primary research literature were excluded.
Third, the problem of ascribing every entry in the ISI database to an individual author was a massive one, and the use of zip codes was really the only practical solution: the database includes many authors named "J. C. Smith" or "T. M. Roberts" but only one "J. C. Smith at 32306" and one "T. M. Roberts at 32306."
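The zip-code approach can be illustrated with a minimal sketch. It is not the NRC's actual procedure; the row layout, the paper identifiers, and the second zip code (90210) are hypothetical, chosen only to show how pairing a name with a zip code separates authors who share a name.

```python
from collections import defaultdict

# Hypothetical rows: (author name, institution zip code, paper id).
# Only the names and the zip code 32306 come from the text above.
rows = [
    ("J. C. Smith", "32306", "paper-A"),
    ("J. C. Smith", "32306", "paper-B"),
    ("J. C. Smith", "90210", "paper-C"),   # a different J. C. Smith
    ("T. M. Roberts", "32306", "paper-D"),
]


def papers_by_author_at_zip(rows):
    """Group paper ids under a (name, zip code) key."""
    grouped = defaultdict(list)
    for name, zip_code, paper_id in rows:
        grouped[(name, zip_code)].append(paper_id)
    return dict(grouped)


grouped = papers_by_author_at_zip(rows)
for key, papers in sorted(grouped.items()):
    print(key, papers)
```

Under this keying, the two papers by "J. C. Smith at 32306" stay together, and the paper by the J. C. Smith at the other zip code is kept apart.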
Finally, the structure of the data as ISI delivered it to the NRC imposed the restriction that the only citations that could be counted were those made within the time window to papers published within the time window. The database consisted of lines of tabular data. Each line corresponded to one author-publication combination. (That is, a one-author paper would appear as a single line of data; a three-author paper would appear three times, once for each author.) Each line consisted of fields containing information on author's identity, paper's identity, journal name, number and order of authors, year of publication, publication type, language of paper, paper's total number of citations, number of times the paper was cited in each of the years of NRC's time window, and identity of author's institution.
Only papers published during NRC's time window were included in the database--that still yielded a staggering number of entries--and these items were the only data available from which NRC could count publications or citations.
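The one-line-per-author layout described above might be modeled roughly as follows. This is only a sketch: the class name, field names, and sample values are assumptions, not ISI's actual column names or data.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class AuthorPaperRecord:
    """One line of the table: a single author-publication combination."""
    author: str
    institution: str              # identity of the author's institution
    paper_id: str
    journal: str
    n_authors: int
    author_position: int          # order of this author in the byline
    year: int
    pub_type: str
    language: str
    total_citations: int
    citations_by_year: Dict[int, int]


def lines_for_paper(paper_id: str, journal: str, year: int, pub_type: str,
                    language: str, authors: List[Tuple[str, str]],
                    total_citations: int,
                    citations_by_year: Dict[int, int]) -> List[AuthorPaperRecord]:
    """Return one record per author; `authors` is (name, institution) in byline order."""
    return [
        AuthorPaperRecord(
            author=name, institution=institution, paper_id=paper_id,
            journal=journal, n_authors=len(authors), author_position=position,
            year=year, pub_type=pub_type, language=language,
            total_citations=total_citations,
            citations_by_year=dict(citations_by_year),
        )
        for position, (name, institution) in enumerate(authors, start=1)
    ]


# Hypothetical three-author paper: it appears as three lines.
lines = lines_for_paper(
    "paper-X", "Journal of Examples", 1991, "article", "English",
    [("J. C. Smith", "32306"), ("T. M. Roberts", "32306"), ("A. N. Other", "32306")],
    total_citations=7, citations_by_year={1991: 2, 1992: 5},
)
print(len(lines))
```

Every line for the same paper repeats the paper-level fields, including the citation counts; only the author, institution, and author position differ.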
Because the only data on citations included in the database were the numbers of citations received by the publications in the database, the NRC received no data on citations of papers published outside the time window. Because the database, in final form, was produced on the last day of 1992 (the end of the NRC time window), the numbers of citations in it cannot include citations that occurred after that date. Because papers cannot be cited before they are published [even as "in press"?], the numbers of citations in the database cannot include any citations that occurred before the time window (that is, earlier than the earliest papers published in the time window). Therefore, the database contained no information on citations (a) of papers published outside the time window, (b) after the time window of papers published within the time window, or (c) before the time window of any papers at all. The only citations available to be counted were citations within the time window of papers published within the time window.
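The three exclusions above reduce to a simple rule, sketched here under stated assumptions. The end of the window (1992) comes from the text; the start year used below (1988) and the function and variable names are assumptions for illustration only.

```python
WINDOW_START = 1988  # assumption: the text does not state the first year
WINDOW_END = 1992    # from the text: the database was finalized at the end of 1992


def countable_citations(pub_year, citations_by_year,
                        window=(WINDOW_START, WINDOW_END)):
    """Sum the citations that fall inside the window for a paper published inside it."""
    start, end = window
    if not (start <= pub_year <= end):
        return 0  # case (a): papers outside the window were not in the database
    # Cases (b) and (c): drop years after the window, and years before the
    # paper existed (which also covers every year before the window).
    first_year = max(start, pub_year)
    return sum(n for year, n in citations_by_year.items()
               if first_year <= year <= end)


# Hypothetical paper published in 1990; the 1993 citations could not be counted.
print(countable_citations(1990, {1990: 1, 1991: 4, 1992: 6, 1993: 9}))  # 11
```

In the real database the out-of-window entries never existed in the first place; the filter here simply makes the resulting restriction explicit.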
In addition, because the database contained no information on the identity of any citing paper, just the number of citations received, no distinction could be made between self-citations and citations by others. All were counted.
Sample data from the NRC database
If you have questions or comments, please contact Anne B. Thistle.