Hypothesis 3

Changes in the use of population terminology in biomedical abstracts

While our first set of analyses established that “diversity” is increasing in the biomedical literature, we also wanted to account for how terminology related to population testing has changed over time. For example, Panofsky and Bliss (2017) found that racial and ethnic terms (including those within the US Census and OMB Directive 15 labeling schema) have been replaced by more general geographic terms such as continental, national, and directional terminology. Below, we test how these sets of terms vary over time using computational text analysis. We started by posing this hypothesis:

H3: The use of population terminology has increased in biomedical abstracts since 1990.

To carry out this analysis, we created a dictionary with sets of terms drawn from the existing literature. This dictionary includes a comprehensive, though not necessarily exhaustive, list of around 2,000 continental, subcontinental, national, directional, ancestry, and OMB/US Census terms. We also compiled a further ~4,500 ethnic, tribal, and caste terms that we group under the umbrella category of “subnational.” In addition to an interactive tree diagram that visualizes these term categories, we have included a searchable table to explore which terms belong to each category set. While future work will need to explore how these categories overlap and intertwine, the analyses that follow simply demonstrate how these categories vary over time in our sample.
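The dictionary approach described above can be illustrated with a minimal sketch. It is not the pipeline used for these analyses: the three categories, their handful of terms, and the names `TERM_DICTIONARY`, `compile_patterns`, and `count_mentions` are hypothetical stand-ins, and the matching rule (case-insensitive, whole-word) is an assumption rather than a documented choice.

```python
import re
from collections import Counter

# Hypothetical miniature dictionary (category -> terms).
# The dictionary described in the text holds 6,600+ terms.
TERM_DICTIONARY = {
    "continental": ["african", "european", "asian"],
    "national": ["japanese", "brazilian", "finnish"],
    "directional": ["western", "eastern"],
}


def compile_patterns(dictionary):
    """Build one case-insensitive, whole-word regex per category."""
    patterns = {}
    for category, terms in dictionary.items():
        # Longest terms first, so multi-word terms win over their prefixes.
        ordered = sorted(terms, key=len, reverse=True)
        alternation = "|".join(re.escape(term) for term in ordered)
        patterns[category] = re.compile(rf"\b(?:{alternation})\b", re.IGNORECASE)
    return patterns


def count_mentions(abstract, patterns):
    """Count how many times each category's terms appear in one abstract."""
    counts = Counter()
    for category, pattern in patterns.items():
        counts[category] = len(pattern.findall(abstract))
    return counts


patterns = compile_patterns(TERM_DICTIONARY)
example = (
    "We compared European and Japanese cohorts with a second "
    "European sample from Western Finland."
)
mentions = count_mentions(example, patterns)
```

Summing such per-abstract counts by publication year would give raw yearly totals per category. Whole-word matching keeps a term like "asian" from matching inside "Caucasian", though a real dictionary would also need to handle plurals and ambiguous terms.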

In Figure 3A, we see that the raw use of population terminology has grown tremendously over time in the PubMed/MEDLINE sample. The top red line shows that, when all 6,600+ population terms are combined into one set, mentions rise from around 8,000 in 1990 to more than 95,000 in 2020. As the orange and yellow lines show, most of this population terminology comes from national, continental, and subnational terms being used more often. While subcontinental, directional, racial/ethnic, and OMB/US Census terms do rise, none of these categories exceeds 10,000 instances in a given year.

Looking at these trends as proportions, Figure 3B again demonstrates that the overall growth of all population terms (12.4% in 1990 to 26.0% in 2020) is mostly the result of growth in national and continental terms, which increase from 5.5% to 19.0% and from 1.8% to 5.7%, respectively. Most of the subcontinental, subnational, directional, racial/ethnic, and OMB/US Census categories have exhibited more subtle increases over time. These results suggest that the use of population terms is increasing over time, and that this is mostly due to national and continental terms. This is consistent with Panofsky and Bliss (2017), who, in a much smaller sample of publications from Nature Genetics, also find that national and continental terms are becoming the vernacular of choice among leading biomedical scholars when conducting population difference testing.
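To make the proportional view concrete, the sketch below computes, for each year, the percentage of abstracts that contain at least one term from a category. This is one plausible reading of the percentages reported above; the text does not state the denominator, so treating it as "abstracts published that year" is an assumption. The term list, the toy records, and the function name `yearly_share` are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for one category of the dictionary.
NATIONAL_TERMS = ["japanese", "brazilian", "finnish"]
NATIONAL_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(term) for term in NATIONAL_TERMS) + r")\b",
    re.IGNORECASE,
)


def yearly_share(records, pattern):
    """Return {year: percent of that year's abstracts with at least one match}."""
    totals = defaultdict(int)
    hits = defaultdict(int)
    for year, abstract in records:
        totals[year] += 1
        if pattern.search(abstract):
            hits[year] += 1
    return {year: 100.0 * hits[year] / totals[year] for year in sorted(totals)}


# Toy corpus of (publication year, abstract text) pairs, invented for illustration.
records = [
    (1990, "A trial of aspirin in adults."),
    (1990, "Outcomes in a Finnish birth cohort."),
    (2020, "Japanese and Brazilian patients were enrolled."),
    (2020, "Variant frequencies in a Finnish sample."),
]
shares = yearly_share(records, NATIONAL_PATTERN)
```

Normalizing by the number of abstracts per year separates genuine shifts in terminology from the overall growth of the literature, which raw counts alone cannot do.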


Here is a list of the 6,600+ distinct terms that were collapsed into the “all population terms” category in these analyses. You can use the search box to find a specific term you are interested in. Please note that this dictionary is still in progress and that its categories and subcategories are both fluid and imperfect.