Example 1: institution locations
For this first request, we’ll search for institution coordinates. Since we don’t know the exact variable names, we can use ipeds_dict(). By default, ipeds_dict() will search variable names, descriptions, and filenames. We’ll start with latitude:
Search dictionary
ipeds_dict("latitude")
#>
#> ======================================================================
#> VARIABLE: pclatitude
#> ======================================================================
#>
#> :::::::::::::::::::: DESCRIPTION ::::::::::::::::::::
#>
#> Latitude location of institution
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ IC2021_CAMPUSES*
#> |__ IC2022_CAMPUSES*
#> |__ IC2023_CAMPUSES*
#>
#> * Denotes a long file in which institutions may have more than one
#> record (UNITID values repeated across multiple rows).
#>
#> ======================================================================
#> VARIABLE: latitude
#> ======================================================================
#>
#> :::::::::::::::::::: DESCRIPTION ::::::::::::::::::::
#>
#> Latitude location of institution
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ HD2009
#> |__ HD2010
#> |__ HD2011
#> |__ HD2012
#> |__ HD2013
#> |__ HD2014
#> |__ HD2015
#> |__ HD2016
#> |__ HD2017
#> |__ HD2018
#> |__ HD2019
#> |__ HD2020
#> |__ HD2021
#> |__ HD2022
#> |__ HD2023
#>
#> ======================================================================
#> Printed information for 2 of out 2 variables.
The dictionary output shows two variables, pclatitude and latitude. Both have similar descriptions, but latitude has a longer stretch of years available. Now we’ll search for longitude:
ipeds_dict("longitude")
#>
#> ======================================================================
#> VARIABLE: longitud
#> ======================================================================
#>
#> :::::::::::::::::::: DESCRIPTION ::::::::::::::::::::
#>
#> Longitude location of institution
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ HD2009
#> |__ HD2010
#> |__ HD2011
#> |__ HD2012
#> |__ HD2013
#> |__ HD2014
#> |__ HD2015
#> |__ HD2016
#> |__ HD2017
#> |__ HD2018
#> |__ HD2019
#> |__ HD2020
#> |__ HD2021
#> |__ HD2022
#> |__ HD2023
#>
#> ======================================================================
#> VARIABLE: pclongitud
#> ======================================================================
#>
#> :::::::::::::::::::: DESCRIPTION ::::::::::::::::::::
#>
#> Longitude location of institution
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ IC2021_CAMPUSES*
#> |__ IC2022_CAMPUSES*
#> |__ IC2023_CAMPUSES*
#>
#> * Denotes a long file in which institutions may have more than one
#> record (UNITID values repeated across multiple rows).
#>
#> ======================================================================
#> Printed information for 2 of out 2 variables.
Similar output, but we note that longitude is represented by a variable that drops the final e: longitud. We have our variables.
Pull most recent year
To begin, we pull only the most recent year, which we can see from the output of the data dictionary under ../FILES: 2023. We’ll run the default chain and add a call to dplyr::as_tibble() at the end to convert the data.frame output to a tibble for nicer viewing:
df <- ipeds_init() |>
  ipeds_select(latitude, longitud) |>
  ipeds_year(2023) |>
  ipeds_get() |>
  as_tibble()
df
#> # A tibble: 6,163 × 4
#> unitid latitude longitud year
#> <int> <dbl> <dbl> <int>
#> 1 100654 34.8 -86.6 2023
#> 2 100663 33.5 -86.8 2023
#> 3 100690 32.4 -86.2 2023
#> 4 100706 34.7 -86.6 2023
#> 5 100724 32.4 -86.3 2023
#> 6 100733 33.2 -87.5 2023
#> 7 100751 33.2 -87.5 2023
#> 8 100760 32.9 -85.9 2023
#> 9 100812 34.8 -87.0 2023
#> 10 100830 32.4 -86.2 2023
#> # ℹ 6,153 more rows
The output contains the requested variables, plus the institutional identifier unitid and the year associated with the request.
Pull past 10 years
Modifying the request slightly, we can request the past 10 years of data. Most institutions don’t move, but some do and that might be interesting!
df <- ipeds_init() |>
  ipeds_select(latitude, longitud) |>
  ipeds_year(2014:2023) |>
  ipeds_get() |>
  as_tibble() |>
  arrange(unitid, year)
df
#> # A tibble: 68,572 × 4
#> unitid latitude longitud year
#> <int> <dbl> <dbl> <int>
#> 1 100636 32.4 -86.2 2014
#> 2 100654 34.8 -86.6 2014
#> 3 100654 34.8 -86.6 2015
#> 4 100654 34.8 -86.6 2016
#> 5 100654 34.8 -86.6 2017
#> 6 100654 34.8 -86.6 2018
#> 7 100654 34.8 -86.6 2019
#> 8 100654 34.8 -86.6 2020
#> 9 100654 34.8 -86.6 2021
#> 10 100654 34.8 -86.6 2022
#> # ℹ 68,562 more rows
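To find institutions whose coordinates change across years, one option is to count the distinct coordinate pairs per unitid. The following is a minimal sketch in base R on a small made-up data frame: df_toy and unitid 999999 are invented for illustration and are not IPEDS data. With real data, a small shift may reflect updated geocoding rather than a physical move, so a tolerance may be worth adding.

```r
# Toy data with the same columns as the ipeds_get() output above.
# The rows for unitid 999999 are invented for illustration only.
df_toy <- data.frame(
  unitid   = c(100654L, 100654L, 100654L, 999999L, 999999L, 999999L),
  latitude = c(34.8, 34.8, 34.8, 38.0, 38.0, 38.2),
  longitud = c(-86.6, -86.6, -86.6, -84.5, -84.5, -84.7),
  year     = c(2021L, 2022L, 2023L, 2021L, 2022L, 2023L)
)

# Keep one row per distinct (unitid, latitude, longitud) combination
coords <- unique(df_toy[, c("unitid", "latitude", "longitud")])

# Count distinct locations per institution
n_locations <- table(coords$unitid)

# Institutions with more than one distinct location across years
moved <- as.integer(names(n_locations[n_locations > 1]))
moved
```

The same steps should apply to the df returned by the request above, since it has the same four columns.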
Add other useful variables as well as a filter
In the next step, we can add more variables, including instnm (institution name) and sector, which gives the level (sub two-year, two-year, and four-year) by control (public, private not-for-profit, and private for-profit). We can also limit to institutions in Kentucky by using the filter stabbr == "KY". Because the additional variables and the filtering variable come from the same files that are already downloaded and in memory, this request should be faster.
df <- ipeds_init() |>
  ipeds_select(instnm, sector, latitude, longitud) |>
  ipeds_year(2014:2023) |>
  ipeds_filter(stabbr == "KY") |>
  ipeds_get() |>
  as_tibble()
df
#> # A tibble: 979 × 7
#> unitid year instnm sector latitude stabbr longitud
#> <int> <int> <chr> <int> <dbl> <chr> <dbl>
#> 1 156189 2023 Alice Lloyd College 2 37.3 KY -82.9
#> 2 156213 2023 Asbury University 2 37.9 KY -84.7
#> 3 156222 2023 Asbury Theological Seminary 2 37.9 KY -84.7
#> 4 156231 2023 Ashland Community and Technical… 4 38.5 KY -82.7
#> 5 156286 2023 Bellarmine University 2 38.2 KY -85.7
#> 6 156295 2023 Berea College 2 37.6 KY -84.3
#> 7 156310 2023 PJ's College of Cosmetology-Bow… 9 37.0 KY -86.5
#> 8 156338 2023 Southcentral Kentucky Community… 4 37.0 KY -86.5
#> 9 156356 2023 Brescia University 2 37.8 KY -87.1
#> 10 156365 2023 Campbellsville University 2 37.3 KY -85.3
#> # ℹ 969 more rows
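The sector column is returned as integer codes. A small lookup vector can turn these into readable labels. The sketch below uses the standard IPEDS sector codes as I understand them; treat the label text as an assumption and confirm it against the HD dictionary file before relying on it. The names sector_labels and sector_codes are illustrative.

```r
# Lookup from IPEDS sector code to label (level by control).
# Label wording is an assumption; verify against the HD dictionary file.
sector_labels <- c(
  "0"  = "Administrative unit",
  "1"  = "Public, 4-year or above",
  "2"  = "Private not-for-profit, 4-year or above",
  "3"  = "Private for-profit, 4-year or above",
  "4"  = "Public, 2-year",
  "5"  = "Private not-for-profit, 2-year",
  "6"  = "Private for-profit, 2-year",
  "7"  = "Public, less-than 2-year",
  "8"  = "Private not-for-profit, less-than 2-year",
  "9"  = "Private for-profit, less-than 2-year",
  "99" = "Sector unknown (not active)"
)

# The three codes that appear in the first ten rows of the output above
sector_codes <- c(2L, 4L, 9L)

# Look up by name, so codes must be converted to character first
labels <- unname(sector_labels[as.character(sector_codes)])
labels
```

Indexing by character name rather than by position matters here, because the codes are not consecutive (99 follows 9).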
Example 2: exclusively distance undergraduate students
For this example, we’ll search for the number of exclusively distance education undergraduate students.
Searching
We’ll begin by searching for “distance”:
ipeds_dict("distance")
#>
#> ======================================================================
#> VARIABLE: pdocppde
#> ======================================================================
#>
#> NOTE: This variable has (2) unique descriptions.
#>
#> ::::::::::::::::::: DESCRIPTION (1) :::::::::::::::::::
#>
#> Doctor's degree-professional practice - all programs in a CIP code
#> can be completed entirely via distance education
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ C2020DEP*
#> |__ C2021DEP*
#> |__ C2022DEP*
#> |__ C2023DEP*
#>
#> * Denotes a long file in which institutions may have more than one
#> record (UNITID values repeated across multiple rows).
#>
#> ::::::::::::::::::: DESCRIPTION (2) :::::::::::::::::::
#>
#> Number of Doctor's degree-professional practice programs offered via
#> distance education
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ C2013DEP*
#> |__ C2014DEP*
#> |__ C2015DEP*
#> |__ C2016DEP*
#> |__ C2017DEP*
#> |__ C2018DEP*
#> |__ C2019DEP*
#>
#> * Denotes a long file in which institutions may have more than one
#> record (UNITID values repeated across multiple rows).
#>
#> ======================================================================
#> VARIABLE: distcrs
#> ======================================================================
#>
#> :::::::::::::::::::: DESCRIPTION ::::::::::::::::::::
#>
#> Distance education courses offered
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ IC2016
#> |__ IC2017
#> |__ IC2018
#> |__ IC2019
#> |__ IC2020
#> |__ IC2021
#> |__ IC2022
#> |__ IC2023
#>
#> ======================================================================
#> VARIABLE: slo3
#> ======================================================================
#>
#> :::::::::::::::::::: DESCRIPTION ::::::::::::::::::::
#>
#> Distance learning opportunities
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ IC2001
#> |__ IC2002
#> |__ IC2003
#> |__ IC2004
#> |__ IC2005
#> |__ IC2006
#> |__ IC2007
#> |__ IC2008
#> |__ IC2009
#> |__ IC2010
#> |__ IC2011
#>
#> ======================================================================
#> VARIABLE: pcert2de
#> ======================================================================
#>
#> NOTE: This variable has (2) unique descriptions.
#>
#> ::::::::::::::::::: DESCRIPTION (1) :::::::::::::::::::
#>
#> Number of 1-year, but less than 2-year certificate programs offered
#> via distance education
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ C2013DEP*
#> |__ C2014DEP*
#> |__ C2015DEP*
#> |__ C2016DEP*
#> |__ C2017DEP*
#> |__ C2018DEP*
#> |__ C2019DEP*
#>
#> * Denotes a long file in which institutions may have more than one
#> record (UNITID values repeated across multiple rows).
#>
#> ::::::::::::::::::: DESCRIPTION (2) :::::::::::::::::::
#>
#> Certificates of 1 year, but less than 2 years - all programs in a CIP
#> code can be completed entirely via distance education
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ C2020DEP*
#> |__ C2021DEP*
#> |__ C2022DEP*
#> |__ C2023DEP*
#>
#> * Denotes a long file in which institutions may have more than one
#> record (UNITID values repeated across multiple rows).
#>
#> ======================================================================
#> VARIABLE: dstngc
#> ======================================================================
#>
#> :::::::::::::::::::: DESCRIPTION ::::::::::::::::::::
#>
#> Graduate level distance education courses offered
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ IC2017
#> |__ IC2018
#> |__ IC2019
#> |__ IC2020
#> |__ IC2021
#> |__ IC2022
#> |__ IC2023
#>
#> ======================================================================
#> VARIABLE: pcert1bde
#> ======================================================================
#>
#> :::::::::::::::::::: DESCRIPTION ::::::::::::::::::::
#>
#> Certificates of at least 12 weeks, but less than 1 year - all
#> programs in a CIP code can be completed entirely via distance
#> education
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ C2020DEP*
#> |__ C2021DEP*
#> |__ C2022DEP*
#> |__ C2023DEP*
#>
#> * Denotes a long file in which institutions may have more than one
#> record (UNITID values repeated across multiple rows).
#>
#> ======================================================================
#> VARIABLE: pbachlde
#> ======================================================================
#>
#> NOTE: This variable has (2) unique descriptions.
#>
#> ::::::::::::::::::: DESCRIPTION (1) :::::::::::::::::::
#>
#> Number of Bachelor's degree programs offered via distance education
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ C2013DEP*
#> |__ C2014DEP*
#> |__ C2015DEP*
#> |__ C2016DEP*
#> |__ C2017DEP*
#> |__ C2018DEP*
#> |__ C2019DEP*
#>
#> * Denotes a long file in which institutions may have more than one
#> record (UNITID values repeated across multiple rows).
#>
#> ::::::::::::::::::: DESCRIPTION (2) :::::::::::::::::::
#>
#> Bachelor's degree - all programs in a CIP code can be completed
#> entirely via distance education
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ C2020DEP*
#> |__ C2021DEP*
#> |__ C2022DEP*
#> |__ C2023DEP*
#>
#> * Denotes a long file in which institutions may have more than one
#> record (UNITID values repeated across multiple rows).
#>
#> ======================================================================
#> VARIABLE: pdocrsdes
#> ======================================================================
#>
#> :::::::::::::::::::: DESCRIPTION ::::::::::::::::::::
#>
#> Doctor's degree-research/scholarship - some programs in a CIP code
#> can be completed entirely via distance education
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ C2020DEP*
#> |__ C2021DEP*
#> |__ C2022DEP*
#> |__ C2023DEP*
#>
#> * Denotes a long file in which institutions may have more than one
#> record (UNITID values repeated across multiple rows).
#>
#> ======================================================================
#> VARIABLE: pctdesom
#> ======================================================================
#>
#> :::::::::::::::::::: DESCRIPTION ::::::::::::::::::::
#>
#> Percent of students enrolled in some but not all distance education
#> courses
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ DRVEF2021
#> |__ DRVEF2022
#> |__ DRVEF2023
#>
#> ======================================================================
#> VARIABLE: pcgdesom
#> ======================================================================
#>
#> :::::::::::::::::::: DESCRIPTION ::::::::::::::::::::
#>
#> Percent of graduate students enrolled in some but not all distance
#> education courses
#>
#> ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
#>
#> ../FILES
#>
#> |__ DRVEF2021
#> |__ DRVEF2022
#> |__ DRVEF2023
#>
#> ======================================================================
#> Printed information for 10 of out 70 variables.
#> Increase limit to see more variables.
This yields quite a few results (we’re only shown 10 of 70). However, looking through the full list (by increasing limit), we find that efdeexc represents “Students enrolled exclusively in distance education courses” and covers 2011 to 2022. We’ll use this variable.
df <- ipeds_init() |>
  ipeds_select(efdeexc) |>
  ipeds_year(2022) |>
  ipeds_get() |>
  as_tibble()
df
#> # A tibble: 22,247 × 3
#> unitid efdeexc year
#> <int> <int> <int>
#> 1 100654 429 2022
#> 2 100654 249 2022
#> 3 100654 246 2022
#> 4 100654 3 2022
#> 5 100654 180 2022
#> 6 100663 5984 2022
#> 7 100663 2436 2022
#> 8 100663 2316 2022
#> 9 100663 120 2022
#> 10 100663 3548 2022
#> # ℹ 22,237 more rows
Looking at our output, we find an issue: unitid values are not unique to each row. Instead, each unitid has multiple values of efdeexc. Looking back at the output from ipeds_dict(), we can see that the file from which these values come, EF2022A_DIST, is long. That means we need to find the variable that uniquely identifies the values in efdeexc.
There are two ways to do this:
- Search the filename with the ripeds dictionary and look through the list of possible variables:
ipeds_dict("EF2022A_DIST", search_col = "filename", limit = Inf)
- Download the dictionary file associated with the complete data file provided by IPEDS:
ipeds_download_to_disk(".", "EF2022A_DIST", type = "dictionary")
The advantage of the first is that you don’t have to download extra files (which will unzip into either a Microsoft Excel workbook or HTML file). The benefit of the second is that IPEDS dictionaries contain more information than is presented by ipeds_dict().
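Whichever route is used, a quick duplicate check confirms that a returned data frame is long before going further. The sketch below is base R only and rebuilds the first ten rows of the efdeexc output shown earlier as df_toy (an illustrative name); it does not call ripeds.

```r
# First ten rows of the efdeexc request shown earlier, retyped by hand
df_toy <- data.frame(
  unitid  = c(rep(100654L, 5), rep(100663L, 5)),
  efdeexc = c(429L, 249L, 246L, 3L, 180L,
              5984L, 2436L, 2316L, 120L, 3548L),
  year    = 2022L
)

# TRUE if any (unitid, year) pair appears on more than one row
has_dups <- anyDuplicated(df_toy[, c("unitid", "year")]) > 0
has_dups

# Number of rows per institution; values above 1 indicate a long file
rows_per_inst <- table(df_toy$unitid)
rows_per_inst
```

Five rows per institution suggests that some other variable with five levels distinguishes the rows, which is what the dictionary lookup needs to identify.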
Check the complete dictionary file and filter
For this example, we’ll download the IPEDS dictionary file.
ipeds_download_to_disk(".", "EF2022A_DIST", type = "dictionary")
Looking in the Frequencies tab of the unzipped file, we can see that efdelev values map to different student levels and combinations of levels. To select only undergraduates, we need to set efdelev == 2.
varnumber | varname | codevalue | valuelabel | frequency | percent |
---|---|---|---|---|---|
24816 | EFDELEV | 1 | All students total | 5,978 | 26.88 |
24816 | EFDELEV | 2 | Undergraduate total | 5,706 | 25.66 |
24816 | EFDELEV | 3 | Undergraduate, degree/certificate-seeking total | 5,703 | 25.64 |
24816 | EFDELEV | 11 | Undergraduate, non-degree/certificate-seeking | 2,803 | 12.60 |
24816 | EFDELEV | 12 | Graduate | 2,049 | 9.21 |
Note that ripeds converts all input to lowercase, so choosing either EFDELEV or efdelev will work.
Adding the filter, we successfully return the number of exclusively
distance education undergraduate students in 2022.
df <- ipeds_init() |>
  ipeds_select(efdeexc) |>
  ipeds_filter(efdelev == 2) |>
  ipeds_year(2022) |>
  ipeds_get() |>
  as_tibble()
df
#> # A tibble: 5,706 × 5
#> unitid year efdeexc efdelev file
#> <int> <int> <int> <int> <chr>
#> 1 100654 2022 249 2 EF2022A_DIST
#> 2 100663 2022 2436 2 EF2022A_DIST
#> 3 100690 2022 228 2 EF2022A_DIST
#> 4 100706 2022 481 2 EF2022A_DIST
#> 5 100724 2022 133 2 EF2022A_DIST
#> 6 100751 2022 3306 2 EF2022A_DIST
#> 7 100760 2022 392 2 EF2022A_DIST
#> 8 100812 2022 1460 2 EF2022A_DIST
#> 9 100830 2022 582 2 EF2022A_DIST
#> 10 100858 2022 1094 2 EF2022A_DIST
#> # ℹ 5,696 more rows
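As a final sanity check, it can be worth confirming that the filter left exactly one row per institution. The following base R sketch retypes the ten rows shown above into df_toy (an illustrative name) and tests unitid for duplicates; the same two lines should work on the full df.

```r
# The ten filtered rows shown above, retyped by hand
df_toy <- data.frame(
  unitid  = c(100654L, 100663L, 100690L, 100706L, 100724L,
              100751L, 100760L, 100812L, 100830L, 100858L),
  year    = 2022L,
  efdeexc = c(249L, 2436L, 228L, 481L, 133L,
              3306L, 392L, 1460L, 582L, 1094L),
  efdelev = 2L
)

# anyDuplicated() returns 0 when every unitid appears only once
is_unique <- anyDuplicated(df_toy$unitid) == 0
is_unique

# Only the undergraduate-total level should remain after the filter
only_undergrad <- all(df_toy$efdelev == 2L)
only_undergrad
```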