Setup

Example 1: institution locations

For this first request, we’ll search for institution coordinates. Since we don’t know the exact variable names, we can use ipeds_dict(). By default, ipeds_dict() will search variable names, descriptions, and filenames. We’ll start with latitude:

Search dictionary

ipeds_dict("latitude")
#> 
#> ======================================================================
#> VARIABLE: pclatitude
#> ======================================================================
#> 
#> ::::::::::::::::::::          DESCRIPTION         ::::::::::::::::::::
#> 
#> Latitude location of institution
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ IC2021_CAMPUSES*
#>  |__ IC2022_CAMPUSES*
#>  |__ IC2023_CAMPUSES*
#> 
#>  * Denotes a long file in which institutions may have more than one 
#>  record (UNITID values repeated across multiple rows).
#> 
#> ======================================================================
#> VARIABLE: latitude
#> ======================================================================
#> 
#> ::::::::::::::::::::          DESCRIPTION         ::::::::::::::::::::
#> 
#> Latitude location of institution
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ HD2009
#>  |__ HD2010
#>  |__ HD2011
#>  |__ HD2012
#>  |__ HD2013
#>  |__ HD2014
#>  |__ HD2015
#>  |__ HD2016
#>  |__ HD2017
#>  |__ HD2018
#>  |__ HD2019
#>  |__ HD2020
#>  |__ HD2021
#>  |__ HD2022
#>  |__ HD2023
#> 
#> ======================================================================
#> Printed information for 2 of out 2 variables.

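The dictionary output shows two variables, pclatitude and latitude. Both have the same description, but latitude is available for more years (HD2009 through HD2023, compared with 2021 through 2023 for pclatitude) and comes from files that are not flagged as long. Now we’ll search for longitude: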

ipeds_dict("longitude")
#> 
#> ======================================================================
#> VARIABLE: longitud
#> ======================================================================
#> 
#> ::::::::::::::::::::          DESCRIPTION         ::::::::::::::::::::
#> 
#> Longitude location of institution
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ HD2009
#>  |__ HD2010
#>  |__ HD2011
#>  |__ HD2012
#>  |__ HD2013
#>  |__ HD2014
#>  |__ HD2015
#>  |__ HD2016
#>  |__ HD2017
#>  |__ HD2018
#>  |__ HD2019
#>  |__ HD2020
#>  |__ HD2021
#>  |__ HD2022
#>  |__ HD2023
#> 
#> ======================================================================
#> VARIABLE: pclongitud
#> ======================================================================
#> 
#> ::::::::::::::::::::          DESCRIPTION         ::::::::::::::::::::
#> 
#> Longitude location of institution
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ IC2021_CAMPUSES*
#>  |__ IC2022_CAMPUSES*
#>  |__ IC2023_CAMPUSES*
#> 
#>  * Denotes a long file in which institutions may have more than one 
#>  record (UNITID values repeated across multiple rows).
#> 
#> ======================================================================
#> Printed information for 2 of out 2 variables.

The output is similar, but note that the longitude variable drops the final e: longitud. With latitude and longitud, both found in the HD files, we have our variables.

Pull most recent year

To begin, we pull only the most recent year, 2023, which we can read from the ../FILES listing in the dictionary output (HD2023). We’ll run the default chain and add a call to dplyr::as_tibble() at the end to convert the data.frame output to a tibble for nicer viewing:

df <- ipeds_init() |>
  ipeds_select(latitude, longitud) |>
  ipeds_year(2023) |> 
  ipeds_get() |>
  as_tibble()
df
#> # A tibble: 6,163 × 4
#>    unitid latitude longitud  year
#>     <int>    <dbl>    <dbl> <int>
#>  1 100654     34.8    -86.6  2023
#>  2 100663     33.5    -86.8  2023
#>  3 100690     32.4    -86.2  2023
#>  4 100706     34.7    -86.6  2023
#>  5 100724     32.4    -86.3  2023
#>  6 100733     33.2    -87.5  2023
#>  7 100751     33.2    -87.5  2023
#>  8 100760     32.9    -85.9  2023
#>  9 100812     34.8    -87.0  2023
#> 10 100830     32.4    -86.2  2023
#> # ℹ 6,153 more rows

The output contains the requested variables, plus the unique institution identifier (unitid) and the year associated with the request.

Pull past 10 years

Modifying the request slightly, we can request the past 10 years of data. Most institutions don’t move, but some do and that might be interesting!

df <- ipeds_init() |>
  ipeds_select(latitude, longitud) |>
  ipeds_year(2014:2023) |> 
  ipeds_get() |>
  as_tibble() |>
  arrange(unitid, year)
df
#> # A tibble: 68,572 × 4
#>    unitid latitude longitud  year
#>     <int>    <dbl>    <dbl> <int>
#>  1 100636     32.4    -86.2  2014
#>  2 100654     34.8    -86.6  2014
#>  3 100654     34.8    -86.6  2015
#>  4 100654     34.8    -86.6  2016
#>  5 100654     34.8    -86.6  2017
#>  6 100654     34.8    -86.6  2018
#>  7 100654     34.8    -86.6  2019
#>  8 100654     34.8    -86.6  2020
#>  9 100654     34.8    -86.6  2021
#> 10 100654     34.8    -86.6  2022
#> # ℹ 68,562 more rows
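
One way to follow up on that curiosity is to flag institutions whose coordinates change across years. The sketch below uses base R on a small hand-made panel; the rows are illustrative rather than IPEDS values, and panel, n_locations, and moved are names introduced here. With real data, the same steps could be run on the df returned above.

```r
# Illustrative panel: unitid 1 stays put, unitid 2 changes location in 2016.
# These rows are made up for the example; they are not IPEDS records.
panel <- data.frame(
  unitid   = c(1L, 1L, 1L, 2L, 2L, 2L),
  year     = c(2014L, 2015L, 2016L, 2014L, 2015L, 2016L),
  latitude = c(34.8, 34.8, 34.8, 32.4, 32.4, 33.5),
  longitud = c(-86.6, -86.6, -86.6, -86.2, -86.2, -86.8)
)

# Count distinct (latitude, longitud) pairs per institution
n_locations <- tapply(
  paste(panel$latitude, panel$longitud),
  panel$unitid,
  function(x) length(unique(x))
)

# Institutions with more than one distinct location moved at some point
moved <- as.integer(names(n_locations)[n_locations > 1])
moved
```

Small coordinate differences may reflect re-geocoding rather than a physical move, so rounding the coordinates before comparing may be sensible.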

Add other useful variables as well as a filter

In the next step, we add more variables: instnm (institution name) and sector, which combines level (less-than-two-year, two-year, and four-year) with control (public, private not-for-profit, and private for-profit). We also limit the results to institutions in Kentucky with the filter stabbr == "KY". Because the additional variables and the filtering variable all come from files that are already downloaded and in memory, this request should be faster. Note that the filtering variable, stabbr, is returned alongside the selected variables.

df <- ipeds_init() |>
  ipeds_select(instnm, sector, latitude, longitud) |>
  ipeds_year(2014:2023) |>
  ipeds_filter(stabbr == "KY") |> 
  ipeds_get() |>
  as_tibble()
df
#> # A tibble: 979 × 7
#>    unitid  year instnm                           sector latitude stabbr longitud
#>     <int> <int> <chr>                             <int>    <dbl> <chr>     <dbl>
#>  1 156189  2023 Alice Lloyd College                   2     37.3 KY        -82.9
#>  2 156213  2023 Asbury University                     2     37.9 KY        -84.7
#>  3 156222  2023 Asbury Theological Seminary           2     37.9 KY        -84.7
#>  4 156231  2023 Ashland Community and Technical…      4     38.5 KY        -82.7
#>  5 156286  2023 Bellarmine University                 2     38.2 KY        -85.7
#>  6 156295  2023 Berea College                         2     37.6 KY        -84.3
#>  7 156310  2023 PJ's College of Cosmetology-Bow…      9     37.0 KY        -86.5
#>  8 156338  2023 Southcentral Kentucky Community…      4     37.0 KY        -86.5
#>  9 156356  2023 Brescia University                    2     37.8 KY        -87.1
#> 10 156365  2023 Campbellsville University             2     37.3 KY        -85.3
#> # ℹ 969 more rows
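
The sector column is returned as integer codes. A minimal sketch of attaching readable labels follows. The code-to-label mapping reflects the IPEDS HD dictionary as we understand it and should be checked against the dictionary file for the years requested; sector_labels and ky are names introduced here for illustration.

```r
# ASSUMED mapping of IPEDS sector codes to labels; verify against the
# HD dictionary file for the years you request.
sector_labels <- c(
  "0"  = "Administrative unit",
  "1"  = "Public, 4-year or above",
  "2"  = "Private not-for-profit, 4-year or above",
  "3"  = "Private for-profit, 4-year or above",
  "4"  = "Public, 2-year",
  "5"  = "Private not-for-profit, 2-year",
  "6"  = "Private for-profit, 2-year",
  "7"  = "Public, less-than 2-year",
  "8"  = "Private not-for-profit, less-than 2-year",
  "9"  = "Private for-profit, less-than 2-year",
  "99" = "Sector unknown (not active)"
)

# Three rows copied from the Kentucky output above
ky <- data.frame(
  unitid = c(156189L, 156231L, 156310L),
  sector = c(2L, 4L, 9L)
)

# Look up each code by name; codes missing from the mapping become NA
ky$sector_label <- unname(sector_labels[as.character(ky$sector)])
ky
```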

Example 2: exclusively distance undergraduate students

For this example, we’ll search for the number of exclusively distance education undergraduate students.

Searching

We’ll begin by searching for “distance”:

ipeds_dict("distance")
#> 
#> ======================================================================
#> VARIABLE: pdocppde
#> ======================================================================
#> 
#> NOTE: This variable has (2) unique descriptions.
#> 
#> :::::::::::::::::::         DESCRIPTION (1)        :::::::::::::::::::
#> 
#> Doctor's degree-professional practice - all programs in a CIP code
#>  can be completed entirely via distance education
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ C2020DEP*
#>  |__ C2021DEP*
#>  |__ C2022DEP*
#>  |__ C2023DEP*
#> 
#>  * Denotes a long file in which institutions may have more than one 
#>  record (UNITID values repeated across multiple rows).
#> 
#> :::::::::::::::::::         DESCRIPTION (2)        :::::::::::::::::::
#> 
#> Number of Doctor's degree-professional practice programs offered via
#>  distance education
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ C2013DEP*
#>  |__ C2014DEP*
#>  |__ C2015DEP*
#>  |__ C2016DEP*
#>  |__ C2017DEP*
#>  |__ C2018DEP*
#>  |__ C2019DEP*
#> 
#>  * Denotes a long file in which institutions may have more than one 
#>  record (UNITID values repeated across multiple rows).
#> 
#> ======================================================================
#> VARIABLE: distcrs
#> ======================================================================
#> 
#> ::::::::::::::::::::          DESCRIPTION         ::::::::::::::::::::
#> 
#> Distance education courses offered
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ IC2016
#>  |__ IC2017
#>  |__ IC2018
#>  |__ IC2019
#>  |__ IC2020
#>  |__ IC2021
#>  |__ IC2022
#>  |__ IC2023
#> 
#> ======================================================================
#> VARIABLE: slo3
#> ======================================================================
#> 
#> ::::::::::::::::::::          DESCRIPTION         ::::::::::::::::::::
#> 
#> Distance learning opportunities
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ IC2001
#>  |__ IC2002
#>  |__ IC2003
#>  |__ IC2004
#>  |__ IC2005
#>  |__ IC2006
#>  |__ IC2007
#>  |__ IC2008
#>  |__ IC2009
#>  |__ IC2010
#>  |__ IC2011
#> 
#> ======================================================================
#> VARIABLE: pcert2de
#> ======================================================================
#> 
#> NOTE: This variable has (2) unique descriptions.
#> 
#> :::::::::::::::::::         DESCRIPTION (1)        :::::::::::::::::::
#> 
#> Number of 1-year, but less than 2-year certificate programs offered
#>  via distance education
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ C2013DEP*
#>  |__ C2014DEP*
#>  |__ C2015DEP*
#>  |__ C2016DEP*
#>  |__ C2017DEP*
#>  |__ C2018DEP*
#>  |__ C2019DEP*
#> 
#>  * Denotes a long file in which institutions may have more than one 
#>  record (UNITID values repeated across multiple rows).
#> 
#> :::::::::::::::::::         DESCRIPTION (2)        :::::::::::::::::::
#> 
#> Certificates of 1 year, but less than 2 years - all programs in a CIP
#>  code can be completed entirely via distance education
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ C2020DEP*
#>  |__ C2021DEP*
#>  |__ C2022DEP*
#>  |__ C2023DEP*
#> 
#>  * Denotes a long file in which institutions may have more than one 
#>  record (UNITID values repeated across multiple rows).
#> 
#> ======================================================================
#> VARIABLE: dstngc
#> ======================================================================
#> 
#> ::::::::::::::::::::          DESCRIPTION         ::::::::::::::::::::
#> 
#> Graduate level distance education courses offered
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ IC2017
#>  |__ IC2018
#>  |__ IC2019
#>  |__ IC2020
#>  |__ IC2021
#>  |__ IC2022
#>  |__ IC2023
#> 
#> ======================================================================
#> VARIABLE: pcert1bde
#> ======================================================================
#> 
#> ::::::::::::::::::::          DESCRIPTION         ::::::::::::::::::::
#> 
#> Certificates of at least 12 weeks, but less than 1 year - all
#>  programs in a CIP code can be completed entirely via distance
#>  education
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ C2020DEP*
#>  |__ C2021DEP*
#>  |__ C2022DEP*
#>  |__ C2023DEP*
#> 
#>  * Denotes a long file in which institutions may have more than one 
#>  record (UNITID values repeated across multiple rows).
#> 
#> ======================================================================
#> VARIABLE: pbachlde
#> ======================================================================
#> 
#> NOTE: This variable has (2) unique descriptions.
#> 
#> :::::::::::::::::::         DESCRIPTION (1)        :::::::::::::::::::
#> 
#> Number of Bachelor's degree programs offered via distance education
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ C2013DEP*
#>  |__ C2014DEP*
#>  |__ C2015DEP*
#>  |__ C2016DEP*
#>  |__ C2017DEP*
#>  |__ C2018DEP*
#>  |__ C2019DEP*
#> 
#>  * Denotes a long file in which institutions may have more than one 
#>  record (UNITID values repeated across multiple rows).
#> 
#> :::::::::::::::::::         DESCRIPTION (2)        :::::::::::::::::::
#> 
#> Bachelor's degree - all programs in a CIP code can be completed
#>  entirely via distance education
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ C2020DEP*
#>  |__ C2021DEP*
#>  |__ C2022DEP*
#>  |__ C2023DEP*
#> 
#>  * Denotes a long file in which institutions may have more than one 
#>  record (UNITID values repeated across multiple rows).
#> 
#> ======================================================================
#> VARIABLE: pdocrsdes
#> ======================================================================
#> 
#> ::::::::::::::::::::          DESCRIPTION         ::::::::::::::::::::
#> 
#> Doctor's degree-research/scholarship - some programs in a CIP code
#>  can be completed entirely via distance education
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ C2020DEP*
#>  |__ C2021DEP*
#>  |__ C2022DEP*
#>  |__ C2023DEP*
#> 
#>  * Denotes a long file in which institutions may have more than one 
#>  record (UNITID values repeated across multiple rows).
#> 
#> ======================================================================
#> VARIABLE: pctdesom
#> ======================================================================
#> 
#> ::::::::::::::::::::          DESCRIPTION         ::::::::::::::::::::
#> 
#> Percent of students enrolled in some but not all distance education
#>  courses
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ DRVEF2021
#>  |__ DRVEF2022
#>  |__ DRVEF2023
#> 
#> ======================================================================
#> VARIABLE: pcgdesom
#> ======================================================================
#> 
#> ::::::::::::::::::::          DESCRIPTION         ::::::::::::::::::::
#> 
#> Percent of graduate students enrolled in some but not all distance
#>  education courses
#> 
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#> 
#> ../FILES 
#> 
#>  |__ DRVEF2021
#>  |__ DRVEF2022
#>  |__ DRVEF2023
#> 
#> ======================================================================
#> Printed information for 10 of out 70 variables.
#> Increase limit to see more variables.

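This yields quite a few results (we’re only shown 10 of 70). The variable we want is not among the first 10, but after raising the limit, for example with ipeds_dict("distance", limit = Inf), and looking through the full list, we find that efdeexc represents “Students enrolled exclusively in distance education courses” and covers 2011 to 2022. We’ll use this variable.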

df <- ipeds_init() |>
  ipeds_select(efdeexc) |>
  ipeds_year(2022) |> 
  ipeds_get() |>
  as_tibble()
df
#> # A tibble: 22,247 × 3
#>    unitid efdeexc  year
#>     <int>   <int> <int>
#>  1 100654     429  2022
#>  2 100654     249  2022
#>  3 100654     246  2022
#>  4 100654       3  2022
#>  5 100654     180  2022
#>  6 100663    5984  2022
#>  7 100663    2436  2022
#>  8 100663    2316  2022
#>  9 100663     120  2022
#> 10 100663    3548  2022
#> # ℹ 22,237 more rows

Looking at our output, we find an issue: unitid values are not unique to each row. Instead, each unitid has multiple values of efdeexc. Looking back at the output from ipeds_dict(), we can see that the file from which these values come, EF2022A_DIST, is long. That means we need to find the variable that uniquely identifies the values in efdeexc.

There are two ways to do this:

  1. Search the filename with the ripeds dictionary and look through the list of possible variables: ipeds_dict("EF2022A_DIST", search_col = "filename", limit = Inf)
  2. Download the dictionary file associated with the complete data file provided by IPEDS: ipeds_download_to_disk(".", "EF2022A_DIST", type = "dictionary")

The advantage of the first is that you don’t have to download extra files (which will unzip into either a Microsoft Excel workbook or HTML file). The benefit of the second is that IPEDS dictionary files contain more information than is presented by ipeds_dict().
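
Whichever route is chosen, the symptom itself can be confirmed by counting rows per unitid. This is a small sketch in base R using the ten rows printed in the request above; long_df and rows_per_id are names introduced here for illustration. Any count above one means the file is long.

```r
# The ten rows printed in the 2022 efdeexc request above
long_df <- data.frame(
  unitid  = rep(c(100654L, 100663L), each = 5),
  efdeexc = c(429L, 249L, 246L, 3L, 180L, 5984L, 2436L, 2316L, 120L, 3548L),
  year    = 2022L
)

# Number of records per institution
rows_per_id <- table(long_df$unitid)
rows_per_id

# TRUE when at least one unitid appears on more than one row
any(rows_per_id > 1)
```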

Check the complete data file dictionary and filter

For this example, we’ll download the IPEDS dictionary file.

ipeds_download_to_disk(".", "EF2022A_DIST", type = "dictionary")

Looking in the Frequencies tab of the unzipped file, we can see that efdelev values map to different student levels and combinations of levels. To select only undergraduates, we need to set efdelev == 2.

varnumber  varname  codevalue  valuelabel                                       frequency  percent
24816      EFDELEV  1          All students total                               5,978      26.88
24816      EFDELEV  2          Undergraduate total                              5,706      25.66
24816      EFDELEV  3          Undergraduate, degree/certificate-seeking total  5,703      25.64
24816      EFDELEV  11         Undergraduate, non-degree/certificate-seeking    2,803      12.60
24816      EFDELEV  12         Graduate                                         2,049      9.21

Note that ripeds converts all input to lowercase, so choosing either EFDELEV or efdelev will work. Adding the filter, we successfully return the number of exclusively distance education undergraduate students in 2022.

df <- ipeds_init() |>
  ipeds_select(efdeexc) |>
  ipeds_filter(efdelev == 2) |> 
  ipeds_year(2022) |> 
  ipeds_get() |>
  as_tibble()
df
#> # A tibble: 5,706 × 5
#>    unitid  year efdeexc efdelev file        
#>     <int> <int>   <int>   <int> <chr>       
#>  1 100654  2022     249       2 EF2022A_DIST
#>  2 100663  2022    2436       2 EF2022A_DIST
#>  3 100690  2022     228       2 EF2022A_DIST
#>  4 100706  2022     481       2 EF2022A_DIST
#>  5 100724  2022     133       2 EF2022A_DIST
#>  6 100751  2022    3306       2 EF2022A_DIST
#>  7 100760  2022     392       2 EF2022A_DIST
#>  8 100812  2022    1460       2 EF2022A_DIST
#>  9 100830  2022     582       2 EF2022A_DIST
#> 10 100858  2022    1094       2 EF2022A_DIST
#> # ℹ 5,696 more rows
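
Filtering keeps one level at a time. If several levels are needed side by side, one option is to spread the long records into one column per efdelev code. The sketch below does this in base R with tapply(). The efdeexc values are the rows printed in the earlier unfiltered request, but their pairing with efdelev codes is an assumption, inferred from the filtered output (249 and 2436 at efdelev == 2) and from the way the values sum, so treat levels_df as illustrative.

```r
# Long records for two institutions. efdeexc values come from the unfiltered
# output above; the efdelev code on each row is ASSUMED, since that request
# did not return efdelev.
levels_df <- data.frame(
  unitid  = rep(c(100654L, 100663L), each = 5),
  efdelev = rep(c(1L, 2L, 3L, 11L, 12L), times = 2),
  efdeexc = c(429L, 249L, 246L, 3L, 180L, 5984L, 2436L, 2316L, 120L, 3548L)
)

# One row per unitid, one column per efdelev code
wide <- tapply(levels_df$efdeexc, list(levels_df$unitid, levels_df$efdelev), sum)
wide

# Undergraduate total (2) plus graduate (12) should equal all students (1)
all(wide[, "2"] + wide[, "12"] == wide[, "1"])
```

The final check doubles as a sanity test on the level codes: if the totals did not add up, the assumed pairing of rows to codes would be wrong.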