NHANES 2015-2016: Demographic Variables and Sample Weights Data Documentation, Codebook, and Frequencies (2024)

2015-2016 Data Documentation, Codebook, and Frequencies

Demographic Variables and Sample Weights (DEMO_I)

Data File: DEMO_I.xpt

First Published: September 2017
Last Revised: NA

Component Description

The demographics file provides individual, family, and household-level information on the following topics:

  • Survey participant’s household interview and examination status;
  • Interview and examination sample weights;
  • Masked variance units;
  • Language of questionnaires used for the interviews conducted in the household and in the mobile examination center;
  • Use of proxy or interpreter during the interviews;
  • The six-month time period when the examination was performed;
  • Pregnancy status;
  • Household and family income;
  • Household and family sizes;
  • Household composition: the number of children (aged 5 years or younger and 6-17 years old), and adults aged 60 years or older, in the household;
  • Demographic information about the household reference person; and
  • Other selected demographic information, such as gender, age, race/Hispanic origin, education, marital status, military service status, country of birth, citizenship, and years of U.S. residence.

The format and coding for all the variables included in the 2015-2016 NHANES demographics file are identical to those released for the 2013-2014 survey cycle.

As in the 2011-2014 cycles, the sample design for NHANES 2015-2016 includes an oversample of Asian Americans (Table 1). The variable RIDRETH3 is included to describe the participant’s race and Hispanic origin.

Table 1. Unweighted sample size and percentage by race/Hispanic origin, from NHANES 2011-2014 and 2015-2016 for examined participants
Race/Hispanic origin                                  2011-2014, n (%)    2015-2016, n (%)
Hispanic: Mexican American                            3,001 (15.7)        1,837 (19.3)
Hispanic: Other Hispanic                              1,941 (10.1)        1,232 (12.9)
Non-Hispanic: White, single race                      6,379 (33.3)        2,948 (30.9)
Non-Hispanic: Black, single race                      4,780 (25.0)        2,052 (21.5)
Non-Hispanic: Asian, single race                      2,234 (11.7)        986 (10.3)
Non-Hispanic: Other, including multiracial persons    816 (4.3)           489 (5.1)
Total                                                 19,151 (100.0)      9,544 (100.0)

Similar to previously released cycles, the 2015-2016 demographics file includes a variable for age in years at screening (RIDAGEYR) for all participants. Age in months at screening (RIDAGEMN) is reported for participants aged 0 to 24 months, and age in months at examination (RIDEXAGM) is reported for participants aged 0 to 19 years only. Due to increasing concerns about potential disclosure risks, information on age in months at screening and at examination for participants in other age groups is no longer included in the public release file but is available through the NCHS Research Data Center (RDC).

Eligible Sample

The target age groups for demographic variables in this file vary by the topic. Please review the codebook carefully.

Interview Setting and Mode of Administration

The family and sample person demographics questionnaires were administered in the home by trained interviewers using the Computer-Assisted Personal Interview (CAPI) system. The respondent selected the language of interview (English or Spanish) or requested that an interpreter be used. Hand cards, showing response choices or information that survey participants needed to answer the questions, were used for some questions. The hand cards were printed in English, Spanish, Mandarin Chinese (both traditional and simplified), Korean, and Vietnamese. The interviewer directed the respondent to the appropriate hand card during the interview. When necessary, the interviewer further assisted the respondent by reading the response choices listed on the hand cards.

Persons 16 years and older and emancipated minors were interviewed directly. A proxy provided information for survey participants who were under 16 and for participants who could not answer the questions themselves.

The NHANES 2015-2016 demographics questionnaires are available on the NHANES website at: https://wwwn.cdc.gov/nchs/nhanes/continuousnhanes/questionnaires.aspx?BeginYear=2015.

Quality Assurance & Quality Control

The CAPI system is programmed with built-in consistency checks to reduce data entry errors. CAPI also uses online help screens to assist interviewers in defining key terms used in the questionnaire.

After collection, interview data were reviewed by the NHANES field office staff for accuracy and completeness of selected items. The interviewers were required to record interviews periodically and the recorded interviews were reviewed by NCHS staff and interviewer supervisors.

Data Processing and Editing

Frequency counts were checked, “skip” patterns were verified, and the reasonableness of question responses was reviewed. Edits were made to some variables to ensure the completeness, consistency, and analytic usefulness of the data. Edits were also made, when necessary, to address data disclosure concerns.

SDDSRVYR: This variable represents the two-year data release cycle number. A value of “9” denotes NHANES 2015–2016.

RIDSTATR: This status code is used to identify whether a participant was both interviewed at home and examined in the mobile examination center (MEC) or was only interviewed in the home but never went through the examination.

RIDAGEYR: Age in years, at the time of the screening interview, is reported for survey participants between the ages of 1 and 79 years of age. All responses of participants aged 80 years and older are coded as ‘80.’ The reporting of age in single years for adults 80 years and older was determined to be a disclosure risk. In NHANES 2015-2016, the weighted mean age for participants 80 years and older is 85 years.

RIDAGEYR was calculated based on the participant’s date of birth. In rare cases where the actual date of birth was missing but the participant’s age in years was provided, the reported age was used.
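The top-coding rule for RIDAGEYR can be sketched as follows. This is a minimal illustration that assumes the age in years has already been computed; the function name `topcode_age` is hypothetical and is not part of any NHANES tooling.

```python
def topcode_age(age_years: int, cap: int = 80) -> int:
    """Return the released age: every age at or above `cap` collapses to `cap`."""
    if age_years < 0:
        raise ValueError("age must be non-negative")
    return min(age_years, cap)
```

Because a released value of 80 stands for "80 years and older", an average of RIDAGEYR understates the true mean age in that group, which the documentation reports as a weighted mean of 85 years.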

RIDAGEMN: The age in months, at the time of the screening interview, is provided for participants who were less than 25 months of age at the time of examination (RIDEXAGM < 25). If the exact date of birth was not provided by the respondent, the age in months was calculated based on the imputed age in years at the time of the screening interview.

RIDEXAGM: The age in months, at the time of examination, is provided for participants who were less than 240 months of age at the time of examination (RIDEXAGM < 240).

RIDEXMON: This variable indicates the six-month time period when the examination was performed. A value of “1” indicates November 1st through April 30th; a value of “2” indicates May 1st through October 31st.
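As a small sketch of the RIDEXMON coding, the function below maps a calendar month to the six-month period code. The function name `exam_period` is hypothetical, and the exact examination date is not part of the public file, so this only illustrates how the two codes partition the year.

```python
def exam_period(month: int) -> int:
    """Map a calendar month (1-12) to the RIDEXMON six-month period code."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    # November 1 through April 30 -> 1; May 1 through October 31 -> 2.
    return 1 if month in (11, 12, 1, 2, 3, 4) else 2
```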

RIDRETH3: This is the race-ethnicity variable included in the demographics file since the 2011-2012 survey cycle to accommodate the oversample of Asian Americans. It was derived from responses to the survey questions on race and Hispanic origin. Respondents who self-identified as “Mexican American” were coded as such (i.e., RIDRETH3=1) regardless of their other race-ethnicity identities. Otherwise, self-identified “Hispanic” ethnicity would result in code “2, Other Hispanic” in the RIDRETH3 variable. All other non-Hispanic participants would then be categorized based on their self-reported races: non-Hispanic white (RIDRETH3=3), non-Hispanic black (RIDRETH3=4), non-Hispanic Asian (RIDRETH3=6), and other non-Hispanic races including non-Hispanic multiracial (RIDRETH3=7). Code “5” was not used in RIDRETH3.
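The order of precedence described for RIDRETH3 can be sketched as below. The input representation (two boolean flags and a lower-case race label) is a hypothetical simplification of the questionnaire responses, and `code_ridreth3` is not an NCHS function; only the output codes come from the text above.

```python
def code_ridreth3(mexican_american: bool, hispanic: bool, race: str) -> int:
    """Assign a RIDRETH3 code following the precedence described in the text."""
    if mexican_american:
        return 1  # Mexican American, regardless of other identities
    if hispanic:
        return 2  # Other Hispanic
    # Non-Hispanic participants: single-race labels map to 3, 4, or 6.
    single_race_codes = {"white": 3, "black": 4, "asian": 6}
    # Anything else, including multiracial, falls into code 7. Code 5 is unused.
    return single_race_codes.get(race, 7)
```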

RIDRETH1: This is the race-ethnicity variable that can be linked to the previous NHANES race-ethnicity variable in 1999-2010. Non-Hispanic Asian participants are grouped with other non-Hispanic races in code “5” (other non-Hispanic race including non-Hispanic multiracial) in RIDRETH1. Codes “6” and “7” were not used in RIDRETH1. The coding procedure for the other categories in RIDRETH1 was consistent with that used for RIDRETH3.
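The relationship between the two race-ethnicity variables can be illustrated with a lookup table. This sketch assumes that codes 1 through 4 carry the same meaning in both variables, which the text suggests but which should be confirmed against the codebook. The names `RIDRETH3_TO_RIDRETH1` and `collapse_to_ridreth1` are hypothetical. In practice RIDRETH1 is already provided in the file, so the mapping is shown only to make the grouping explicit.

```python
# RIDRETH3 code -> RIDRETH1 code; RIDRETH3 never uses 5, RIDRETH1 never uses 6 or 7.
RIDRETH3_TO_RIDRETH1 = {1: 1, 2: 2, 3: 3, 4: 4, 6: 5, 7: 5}


def collapse_to_ridreth1(ridreth3: int) -> int:
    """Group non-Hispanic Asian (6) with other non-Hispanic races (7) under code 5."""
    if ridreth3 not in RIDRETH3_TO_RIDRETH1:
        raise ValueError(f"unexpected RIDRETH3 code: {ridreth3}")
    return RIDRETH3_TO_RIDRETH1[ridreth3]
```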

DMDBORN4: Due to disclosure risk concerns, starting in 2011, country of birth was recoded into two categories: 1) Born in 50 U.S. states or Washington, DC; and 2) Born in other countries, including U.S. territories.

DMDCITZN: Citizenship status is reported using two codes: 1) Citizen by birth or naturalization; or 2) Not a citizen of the U.S. Persons who were born in the U.S. or U.S. territories who acquired citizenship at birth were coded as U.S. citizens.

DMDMARTL: The marital status question was asked of persons 14 years of age and older. Due to disclosure risks, marital status is only released for persons 20 years of age and older.

RIDEXPRG: Pregnancy status at the time of the health examination was ascertained for females 8-59 years of age. Due to disclosure risks, pregnancy status is only released for women 20-44 years of age. The information used to code RIDEXPRG values included self-reported pregnancy status and urine pregnancy test results. Persons who reported they were pregnant at the time of exam were assumed to be pregnant (RIDEXPRG=1). Those who reported they were not pregnant or did not know their pregnancy status were further classified based on the results of the urine pregnancy test. If the respondent reported “no” or “don’t know” and the urine test result was positive, the respondent was coded as pregnant (RIDEXPRG=1). If the respondent reported “no” and the urine test was negative, the respondent was coded as not pregnant (RIDEXPRG=2). If the respondent reported that she did not know her pregnancy status and the urine test was negative, the respondent was coded as “could not be determined” (RIDEXPRG=3). Persons who were interviewed but not examined also have an RIDEXPRG value of 3 (could not be determined).
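The RIDEXPRG decision rules can be written out as a small function. This is a sketch under stated assumptions: the input encoding (`"yes"`, `"no"`, `"dont_know"`, and an optional boolean urine result) and the name `code_ridexprg` are hypothetical. Combinations that the text does not describe, such as a missing urine result for an examined participant, raise an error here rather than guess at a code.

```python
from typing import Optional

PREGNANT, NOT_PREGNANT, UNDETERMINED = 1, 2, 3


def code_ridexprg(examined: bool, self_report: str,
                  urine_positive: Optional[bool]) -> int:
    """Apply the documented RIDEXPRG rules to one participant."""
    if not examined:
        return UNDETERMINED  # interviewed but not examined
    if self_report == "yes":
        return PREGNANT  # self-reported pregnancy is taken at face value
    if self_report in ("no", "dont_know") and urine_positive is not None:
        if urine_positive:
            return PREGNANT
        # Negative test: "no" -> not pregnant, "don't know" -> undetermined.
        return NOT_PREGNANT if self_report == "no" else UNDETERMINED
    raise ValueError("combination not described in the documentation")
```

The released variable is further restricted to women aged 20-44 years, which this sketch does not model.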

DMDYRSUS: This variable is the number of years the participant has lived in the United States. Participants who were born outside the U.S. were asked the month and year when they came to the U.S. to live (DMQ.160). A small number of records were imputed because the participant did not report the month of their arrival. A month value of 7 (July) was used to impute DMDYRSUS for these respondents. The responses to the question were recoded into 9 categories ranging from less than one year to 50 years or more.

DMDEDUC3: This variable provides information on the highest grade or level of education completed by participants 6-19 years of age. The responses were re-coded as follows: single years of education (grades 1-12), high school graduate/GED, and post-high school. Codes “55” (less than 5th grade) and “66” (less than 9th grade) were used to categorize older youth who had very low education levels.

DMDEDUC2: This variable is the highest grade or level of education completed by adults 20 years and older. The response categories are: less than 9th grade education, 9-11th grade education (includes 12th grade and no diploma), High school graduate/GED, some college or associates (AA) degree, and college graduate or higher.

DMQMILIZ: This is a variable included in the demographics file since the 2011-2012 survey cycle to provide information on whether the participant has ever served on active duty in the U.S. Armed Forces, military Reserves, or National Guard. Active duty does not include training for the Reserves or National Guard, but does include activation, for service in the U.S. or in a foreign country, in support of military or humanitarian operations.

Prior to 2011, the veteran status information (released in the variable DMQMILIT in the demographics file in 1999-2010) was collected in the survey using a question with different wording that asked if the participant had served in the U.S. Armed Forces.

DMQADFC: For participants who reported having served on active duty in the U.S. Armed Forces, this variable denotes whether the participant has ever served in a foreign country during a time of armed conflict or on a humanitarian or peacekeeping mission. This would include National Guard, Reserve, or active duty personnel monitoring or conducting peacekeeping operations in Bosnia and Kosovo, in the Sinai between Egypt and Israel, or serving in response to the 2004 tsunami or in Haiti in 2010.

Similar to the 2013-2014 survey cycle, there is more detailed information on veterans collected in NHANES. Additional information for these veterans is available through the NCHS RDC.

SIALANG: This variable indicates the language (English or Spanish) used during the sample person questionnaire interview conducted at the participant’s home.

SIAPROXY: This variable denotes whether a proxy respondent was used during the sample person questionnaire interview.

SIAINTRP: This variable denotes whether an interpreter was used during the sample person questionnaire interview. The language spoken by the respondent is only available through the NCHS RDC.

FIALANG: This variable indicates the language used during the family questionnaire interview conducted at the participant’s home.

FIAPROXY: This variable denotes whether a proxy respondent was used to complete the family questionnaire interview.

FIAINTRP: This variable denotes whether an interpreter was used to complete the family questionnaire interview. The language spoken by the respondent is only available through the NCHS RDC.

MIALANG: This variable indicates the language (English or Spanish) used for the CAPI portion of the MEC interview.

MIAPROXY: This variable denotes whether a proxy respondent was used during the CAPI portion of the MEC interview.

MIAINTRP: This variable denotes whether an interpreter was used during the CAPI portion of the MEC interview. The language spoken by the respondent is only available through the NCHS RDC.

AIALANGZ: This variable indicates the language used for the audio-computer-assisted self-interviewing (ACASI) portion of the MEC interview. Starting in 2011, the ACASI portion was translated into Chinese (traditional/Mandarin, simplified/Mandarin, and traditional/Cantonese), Korean, and Vietnamese to accommodate the Asian oversampling. The three categories reported are: 1) English, 2) Spanish, and 3) Asian languages.

INDFMIN2: This variable indicates the total annual family income or annual individual income (for households with one person or households comprised of unrelated individuals). A family is defined as a group of two people or more (one of whom is the householder) related by birth, marriage, or adoption and residing together.

During the household interview, the respondent was asked to report total income for the entire family (or individual) in the last calendar year in dollars. The reported dollar amount was re-coded into range values.

If the respondent was not willing or able to provide an exact dollar figure, the interviewer asked an additional question to determine whether the income was < $20,000 or ≥ $20,000. Based on the respondent’s answer to this question, he/she was asked to select a category of income from a list on a hand card. For respondents who selected a category of income, their family incomes were set as the midpoints of the selected ranges. If the respondent was unable to report greater detail than < $20,000 or ≥ $20,000, then these two categories were used to report the family (or individual) income.

INDFMPIR: This variable is the ratio of family income to poverty. The Department of Health and Human Services (HHS) poverty guidelines were used as the poverty measure to calculate this ratio. These guidelines are issued each year, in the Federal Register, for determining financial eligibility for certain federal programs, such as Head Start, Supplemental Nutrition Assistance Program (SNAP), Special Supplemental Nutrition Program for Women, Infants, and Children (WIC), and the National School Lunch Program. The poverty guidelines vary by family size and geographic location (with different guidelines for the 48 contiguous states and the District of Columbia; Alaska; and Hawaii).

INDFMPIR was calculated by dividing family (or individual) income by the poverty guidelines specific to the survey year. If family income was reported as a more detailed category, the midpoint of the range was used to compute the ratio. The value was not computed if the income data were missing or if the respondent only reported income as < $20,000 or ≥ $20,000. Values at or above 5.00 were coded as 5.00 or more because of disclosure concerns.
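The INDFMPIR calculation can be sketched as a ratio with a top-code. In this illustration the function name `poverty_income_ratio` is hypothetical, the caller is assumed to supply the poverty guideline that applies to the family's size, location, and survey year, and `None` stands for income that is missing or known only as < $20,000 or ≥ $20,000. Any rounding applied to the released values is not described in the text and is not reproduced here.

```python
from typing import Optional


def poverty_income_ratio(family_income: Optional[float],
                         guideline: float,
                         cap: float = 5.0) -> Optional[float]:
    """Ratio of family income to the poverty guideline, top-coded at `cap`."""
    if family_income is None:
        return None  # ratio is not computed without a usable income value
    if guideline <= 0:
        raise ValueError("guideline must be positive")
    return min(family_income / guideline, cap)
```

For example, with a made-up guideline of 25,000 dollars (not an actual HHS figure), a range midpoint of 40,000 dollars gives a ratio of 1.6, and an income of 200,000 dollars is reported as 5.0.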

INDHHIN2: This variable indicates the total annual household income in dollar ranges. If a household was comprised of a single family or individual, the reported family income was used as household income as well. When more than one family, or one or more unrelated individuals, or a combination of a family and unrelated individuals resided in the household, the total household income was calculated by the sum of all reported family and/or individual income values. Please see above notes on variable INDFMIN2 for details on how the amounts of family income were determined.

When more than one family, or one or more unrelated individuals, or a combination of a family and unrelated individuals resided in the same household, they were asked to provide a total income estimate for the entire household, using questions similar to those used for family income. This estimated household income value was only used when: 1) the family income value was missing for one or more families in the household; and 2) the estimated value was equal to or greater than the sum of all known family incomes from the household. If different respondents in the household provided different estimates, the largest value was used. If none of the respondents provided a valid household income estimate, but the sum of known family and/or individual incomes was at least $100,000, then INDHHIN2 was categorized as “$100,000 and over.”

Similar to the family income category coding, the “$20,000 and over” and “under $20,000” categories were only used when no other valid value estimates were provided.
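The household income rules for INDHHIN2 can be summarized in code before the final recode into dollar ranges. This is a sketch under several assumptions: `household_income` is a hypothetical name, each family or unrelated individual contributes one dollar value or `None` if missing, and cases the text does not cover (for example, an estimate smaller than the sum of known incomes) return `None` here. The recode into range categories is not reproduced.

```python
from typing import Optional, Sequence, Union


def household_income(unit_incomes: Sequence[Optional[float]],
                     estimates: Sequence[float] = ()) -> Union[float, str, None]:
    """Combine family/individual incomes into one household value."""
    if not unit_incomes:
        raise ValueError("a household needs at least one family or individual")
    known_sum = sum(x for x in unit_incomes if x is not None)
    if all(x is not None for x in unit_incomes):
        return known_sum  # single unit, or every unit reported an income
    if estimates:
        largest = max(estimates)  # the largest respondent estimate is used
        # The estimate counts only if it is at least the sum of known incomes.
        return largest if largest >= known_sum else None  # None: assumption
    if known_sum >= 100_000:
        return "$100,000 and over"
    return None  # assumption: left undetermined when nothing else applies
```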

DMDFMSIZ: This variable is the number of people in the participant’s family. A family is defined as a group of people related by birth, marriage, or adoption and residing together. Due to disclosure concerns, families that are comprised of 7 or more people are included in the category that is labeled ‘7 or more’.

DMDHHSIZ: This variable is the number of people in the participant’s household. The values for this variable range from 1 to 7. Due to disclosure concerns, households that are comprised of 7 or more people are included in the category that is labeled ‘7 or more’.

DMDHHSZA: This variable is the number of children aged 5 years or younger living in the participant’s household. The values for this variable range from 0 to 3. Due to disclosure concerns, households that are comprised of 3 or more children aged 5 years or younger are included in the category that is labeled ‘3 or more’.

DMDHHSZB: This variable is the number of children aged 6-17 years old living in the participant’s household. The values for this variable range from 0 to 4. Due to disclosure concerns, households that are comprised of 4 or more children aged 6-17 years are included in the category that is labeled ‘4 or more’.

DMDHHSZE: This variable is the number of adults aged 60 years or older living in the participant’s household. The values for this variable range from 0 to 3. Due to disclosure concerns, households that are comprised of 3 or more adults aged 60 years or older are included in the category that is labeled ‘3 or more’.
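When tabulating the family and household count variables, the highest value of each is a grouped category, not an exact count. The sketch below collects the top-codes stated above into one table. The names `TOP_CODES` and `describe_count` are hypothetical, and the lower bound of 0 is a simplification (family and household sizes start at 1).

```python
# Highest released value for each count variable, as described in the text.
TOP_CODES = {
    "DMDFMSIZ": 7,  # people in the family
    "DMDHHSIZ": 7,  # people in the household
    "DMDHHSZA": 3,  # children aged 5 years or younger
    "DMDHHSZB": 4,  # children aged 6-17 years
    "DMDHHSZE": 3,  # adults aged 60 years or older
}


def describe_count(variable: str, value: int) -> str:
    """Return a display label, marking the top-coded category as 'N or more'."""
    cap = TOP_CODES[variable]
    if not 0 <= value <= cap:
        raise ValueError(f"{value} is outside the released range for {variable}")
    return f"{cap} or more" if value == cap else str(value)
```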

Household Reference Person: The household reference person is defined as the first household member 18 years of age or older listed on the household member roster, who owns or rents the residence where members of the household reside. The household reference person is comparable to “family reference person” in NHANES programs prior to 1999. Analysts frequently use information about the reference persons to characterize the socioeconomic status of the households where survey participants reside. The demographics file includes information on the household reference person’s gender (DMDHRGND), age (DMDHRAGE), country of birth (DMDHRBR4), education level (DMDHREDU), and marital status (DMDHRMAR). Additionally, information on the education level of the household reference person’s spouse is included (DMDHSEDU).

Analytic Notes

As mentioned above, the sample design for NHANES 2011-2016 includes an oversample of Asian Americans. For more details on sample design and related analytic issues, please refer to the NHANES Analytic Guidelines available at: https://wwwn.cdc.gov/nchs/nhanes/analyticguidelines.aspx.

Age at screening: Age at screening was used to determine eligibility for an examination component and should be used for most analyses. However, when analyzing anthropometric data on children and youth from birth through 19 years, age in months at MEC examination was often the recommended age variable for analyses. To further facilitate these analyses, a variable, BMDBMIC, has been created as part of the Body Measures Exam file to provide analysts pre-computed BMI categories for children and adolescents aged 2 to 19 years at examination. For further details refer to the Body Measures Data File and Documentation.

DMDMARTL: Marital status is only released for persons 20 years of age and older because of potential disclosure risks. Prior to 2007, marital status was released for participants aged 14 and older. In NHANES 2015-2016, the percentage of married persons aged 14-19 is less than 2%.

RIDEXPRG: Because of possible disclosure risks, pregnancy status is only released for women aged 20-44 years. The percentage of pregnant women/girls aged 8-19 or 45-59 years is less than 1% in the 2015-2016 dataset.

Masked Variance Units (MVUs): Fifteen masked variance strata and 30 masked primary sampling units (PSUs) are included in the 2015-2016 NHANES demographics file. Each stratum has 2 PSUs. These MVUs are a collection of secondary sampling units that are aggregated into groups for the purpose of variance estimation. The variance estimates that are produced, using the MVUs, closely approximate the variances that would have been estimated using the “true” sample design variance units that are based on the actual survey sample strata and primary sampling units. MVUs are used to protect the confidentiality of information provided by survey participants and to reduce disclosure risks. The use of MVUs is described in the NHANES Analytic Guidelines. Analysts should review the Guidelines carefully prior to analyzing the survey data.
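To make the role of the masked strata and PSUs concrete, the sketch below estimates a weighted mean and its standard error from PSU-level totals within strata. It uses Taylor series linearization with a with-replacement assumption, a common approach for designs of this kind; the text above does not prescribe a method, so this is an illustration and not the NCHS procedure. The function name and record layout are hypothetical. In the released file the masked stratum and PSU are understood to be SDMVSTRA and SDMVPSU, names that do not appear in the text above. Established survey software is preferable for real analyses.

```python
from collections import defaultdict
from math import sqrt
from typing import Iterable, Tuple

Record = Tuple[int, int, float, float]  # (stratum, psu, weight, value)


def weighted_mean_and_se(records: Iterable[Record]) -> Tuple[float, float]:
    """Weighted mean with a linearization-based standard error."""
    rows = list(records)
    total_w = sum(w for _, _, w, _ in rows)
    mean = sum(w * y for _, _, w, y in rows) / total_w
    # Sum each record's linearized contribution within its PSU.
    psu_totals = defaultdict(lambda: defaultdict(float))
    for stratum, psu, w, y in rows:
        psu_totals[stratum][psu] += w * (y - mean) / total_w
    variance = 0.0
    for psus in psu_totals.values():
        n = len(psus)
        if n < 2:
            raise ValueError("each stratum needs at least two PSUs")
        centre = sum(psus.values()) / n
        # Between-PSU variation within the stratum.
        variance += n / (n - 1) * sum((z - centre) ** 2 for z in psus.values())
    return mean, sqrt(variance)
```

With two PSUs per stratum, as in this file, each stratum's contribution reduces to the squared difference between its two PSU totals.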

Sample Weights: The 2-year sample weights (WTINT2YR, WTMEC2YR) should be used for all NHANES 2015-2016 analyses. Detailed instructions for combining datasets from previous NHANES cycles are provided in the NHANES Analytic Guidelines.

Please also refer to the NHANES Analytic Guidelines and the on-line NHANES Tutorial for further details on the use of sample weights and other analytic issues.
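As a minimal sketch of how the 2-year weights enter a point estimate, the function below computes a weighted proportion from records held as dictionaries. The function name and the record layout are hypothetical, and the example values are made up. The usual guidance is to use WTMEC2YR when an analysis involves examination data and WTINT2YR when it uses interview data only; that choice should be confirmed against the Analytic Guidelines.

```python
from typing import Callable, Mapping, Sequence


def weighted_proportion(rows: Sequence[Mapping[str, float]],
                        condition: Callable[[Mapping[str, float]], bool],
                        weight: str = "WTMEC2YR") -> float:
    """Share of the weighted population whose record satisfies `condition`."""
    total = sum(r[weight] for r in rows)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    matched = sum(r[weight] for r in rows if condition(r))
    return matched / total


# Usage with made-up records: weighted share coded non-Hispanic Asian.
example = [
    {"RIDRETH3": 6, "WTMEC2YR": 100.0},
    {"RIDRETH3": 3, "WTMEC2YR": 300.0},
]
share = weighted_proportion(example, lambda r: r["RIDRETH3"] == 6)
```

An unweighted count would give one half for these two records, while the weighted share is one quarter, which is why the sample weights matter for an oversampled design.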

Disclosure risks and issues pertaining to confidentiality protection prevent NCHS from releasing all of the NHANES demographic variables publicly. Additional information may be accessed through the NCHS RDC. Instructions for requesting use of these data are available from https://www.cdc.gov/rdc/.
