MORPC works regularly with census data, including but not limited to ACS 5-year and 1-year estimates, the Decennial Census, PEP, and geographies. The following module is useful for gathering and organizing census data for processes in various workflows. Those workflows are linked where appropriate.
import morpc
API functions and variables¶
api_get() is a low-level wrapper for Census API requests that returns the results as a pandas dataframe. If necessary, it splits the request into several smaller requests to bypass the 50-variable limit imposed by the API.
The resulting dataframe is indexed by GEOID (regardless of whether it was requested) and omits other fields that were not requested but that the API returns automatically with each request (e.g. "state", "county").
url = 'https://api.census.gov/data/2022/acs/acs1'
params = {
    "get": "GEO_ID,NAME,B01001_001E",
    "for": "county:049,041",
    "in": "state:39"
}
api = morpc.census.api_get(url, params)
api
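The request splitting described above can be sketched in isolation. This is a minimal illustration and not the actual implementation of api_get(): the helper name `split_get_param`, the made-up variable names, and the choice to repeat GEO_ID in every chunk (so partial results can be joined) are all assumptions for the example.

```python
def split_get_param(variables, limit=50, key="GEO_ID"):
    """Split variable names into "get" strings of at most `limit` names each.

    Hypothetical helper, not part of morpc. The key field is placed first
    in every chunk so the partial results can later be joined on it.
    """
    others = [v for v in variables if v != key]
    size = limit - 1  # reserve one slot per request for the key field
    chunks = [others[i:i + size] for i in range(0, len(others), size)]
    return [",".join([key] + chunk) for chunk in chunks]


# 120 made-up variable names, for illustration only
names = [f"VAR_{i:03d}E" for i in range(120)]
get_strings = split_get_param(names)
print(len(get_strings))
```

Each string in `get_strings` could then be sent as the "get" parameter of a separate request, with the resulting tables joined on GEO_ID.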
American Community Survey (ACS) Data Class¶
When using ACS data, we will generally be digesting data produced by the morpc Census ACS Fetch script, which leverages the acs_data class from morpc.census.
Create an initial object that represents a variable group in the ACS data API¶
The class takes 3 arguments:
- variable group number
- the year
- the type of survey (1 or 5 year estimates)
import morpc
acs = morpc.census.acs_data('B25010', '2023', '5')
The initial call queries the Census API for the variable definitions and returns a dictionary of the available variables in the group. See acs.VARS.
acs.VARS
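As a rough illustration of how the three arguments could map onto Census API endpoints, the sketch below builds URLs from a group, year, and survey. The function name `acs_endpoints` is hypothetical, and whether acs_data uses these exact endpoints internally is an assumption; the data and variables.json patterns match the URLs used elsewhere in this document.

```python
def acs_endpoints(group, year, survey):
    """Build Census API URLs for an ACS variable group (hypothetical helper)."""
    base = f"https://api.census.gov/data/{year}/acs/acs{survey}"
    return {
        "data": base,                            # endpoint for data queries
        "variables": f"{base}/variables.json",   # definitions of all variables
        "group": f"{base}/groups/{group}.json",  # definitions for one group
    }


urls = acs_endpoints("B25010", "2023", "5")
print(urls["group"])
```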
The initial call also fetches a list of dimensions from a cached JSON file in ./morpc/census/acs_variable_group.json, which is stored in morpc.census.ACS_VAR_GROUPS.
Manual verification of variable dimension names¶
The list of dimensions is created automatically from the Census variable labels and needs to be verified before being used. If the dimension names have not been verified, they will not be stored. Navigate to the JSON file and check that it contains the correct number of dimensions and that they are in the correct order. Then change the verification field to true.
acs.DIMENSIONS
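Part of the manual check can be pre-screened with a small script. The sketch below compares the number of dimension names against the deepest level found in the variable labels, assuming the "!!" separator used in Census labels and an entry shaped like those produced by the JSON-building code at the end of this document. The function name and the sample entry are illustrative only, and a passing check does not replace reviewing the names and their order by hand.

```python
def dimension_count_matches(entry):
    """Return True if the number of dimension names equals the deepest label level.

    Hypothetical helper. Labels look like "Estimate!!Total:!!Male:"; the
    leading variable type ("Estimate") is not counted as a dimension.
    """
    depths = [len(label.split("!!")) - 1 for label in entry["variables"].values()]
    return max(depths) == len(entry["dimensions"])


# Illustrative excerpt of one group entry
entry = {
    "concept": "Sex by Age",
    "dimensions": ["TOTAL", "Sex", "Age"],
    "dimensions_verified": False,
    "variables": {
        "B01001_001E": "Estimate!!Total:",
        "B01001_002E": "Estimate!!Total:!!Male:",
        "B01001_003E": "Estimate!!Total:!!Male:!!Under 5 years",
    },
}
print(dimension_count_matches(entry))
```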
Query the API for the desired variables and geography¶
The .query() method queries the API and caches the data in memory under acs.DATA. At the same time, it creates a frictionless schema that corresponds to the data.
scope:¶
These are predefined sumlevels and scopes for commonly queried geographies. See morpc.census.SCOPES.
morpc.census.SCOPES
acs = acs.query(scope='region15-tracts')
data = acs.DATA
data.head()
For custom queries, use the for and in parameters to pass to the API query¶
for_param:¶
(Optional) The geographies for which to run the query. "state:*" represents all states. "state:39" represents Ohio.
in_param:¶
(Optional) A filter for the for parameter. In combination, the two parameters allow you to request small geographies inside larger ones.
Examples: for_param="county:*", in_param="state:39" gets all counties in Ohio. for_param="tract:*", in_param="state:39,county:041,049" gets all census tracts in Delaware and Franklin Counties.
Filter the variables using the get parameter¶
get_param:¶
(Optional) If you want to return a subset of variables, they can be passed here as a list.
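To show how the three parameters relate to the params dictionary passed to api_get() earlier, here is a minimal sketch. The helper `build_params` is hypothetical, and how acs_data actually assembles its request is an assumption; the resulting dictionary simply mirrors the api_get() example above.

```python
def build_params(get_param, for_param, in_param=None):
    """Assemble a Census API params dict (hypothetical helper, not part of morpc)."""
    params = {"get": ",".join(get_param), "for": for_param}
    if in_param is not None:
        # The "in" filter is optional, e.g. it is not needed for "state:*"
        params["in"] = in_param
    return params


params = build_params(["GEO_ID", "NAME", "B01001_001E"], "county:049,041", "state:39")
print(params)
```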
Dimension Tables¶
When the query is called, the class builds a table that includes the dimensions, which can be used to summarize the data.
This can be used to get quick queries for summaries.
acs.DIM_TABLE.LONG
acs.DIM_TABLE.WIDE.T[('Average household size --', 'Total')].min()
acs.DIM_TABLE.PERCENT
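The idea behind a dimension table can be illustrated by splitting a Census variable label into one value per dimension. This is only a sketch under assumptions: the function `label_to_row` is hypothetical, the dimension names are supplied by hand, and the real DIM_TABLE may be constructed quite differently.

```python
def label_to_row(label, dimensions):
    """Split a label such as "Estimate!!Total:!!Male:" into named dimension values.

    Hypothetical helper. Dimensions deeper than the label reaches are set to None.
    """
    parts = [part.rstrip(":") for part in label.split("!!")]
    variable_type, values = parts[0], parts[1:]
    row = {"Variable type": variable_type}
    for i, name in enumerate(dimensions):
        row[name] = values[i] if i < len(values) else None
    return row


dims = ["TOTAL", "Sex", "Age group"]
row = label_to_row("Estimate!!Total:!!Male:!!Under 5 years", dims)
print(row)
```

Rows built this way for every variable would give a long-form table that can be grouped or pivoted by any dimension.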
Save raw data (not dim table) as a frictionless resource with schema¶
After querying the data, save the data as a frictionless resource with reasonable descriptors.
acs.save(output_dir='./temp_data/')
acs.SCHEMA
acs.RESOURCE
Load data from cached file¶
import morpc
acs = morpc.census.acs_data('B25010', '2023', '5').load(scope='region15-tracts', dirname='./temp_data/')
morpc.frictionless.load_data | INFO | Loading Frictionless Resource file at location morpc-acs5-2023-region15-tracts-b25010.resource.yaml
morpc.frictionless.load_data | INFO | Using schema path specified in resource file.
morpc.frictionless.load_data | INFO | Loading data, resource file, and schema (if applicable) from their source locations
morpc.frictionless.load_data | INFO | --> Data file: morpc-acs5-2023-region15-tracts-b25010.csv
morpc.frictionless.load_data | INFO | --> Resource file: morpc-acs5-2023-region15-tracts-b25010.resource.yaml
morpc.frictionless.load_data | INFO | --> Schema file: morpc-acs5-2023-region15-tracts-b25010.schema.yaml
morpc.frictionless.load_data | INFO | Loading data.
morpc.frictionless.load_data | INFO | Loading Frictionless Resource file at location ..\..\morpc-geos-collect\output_data\morpc-geos.resource.yaml
morpc.frictionless.load_data | INFO | Ignoring schema as directed by useSchema parameter.
morpc.frictionless.load_data | INFO | Loading data, resource file, and schema (if applicable) from their source locations
morpc.frictionless.load_data | INFO | --> Data file: ..\..\morpc-geos-collect\output_data\morpc-geos.gpkg
morpc.frictionless.load_data | INFO | --> Resource file: ..\..\morpc-geos-collect\output_data\morpc-geos.resource.yaml
morpc.frictionless.load_data | INFO | --> Schema file: Not available. Ignoring schema.
morpc.frictionless.load_data | INFO | Loading data.
morpc.frictionless.load_data | INFO | Skipping casting of field types since we are ignoring schema.
C:\Users\jinskeep\morpc_venv\Lib\site-packages\pyogrio\raw.py:198: RuntimeWarning: driver GPKG does not support open option DRIVER
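The file names in the log output above appear to follow a common pattern. The sketch below reproduces that pattern; it is inferred only from the log, the helper name `resource_filenames` is hypothetical, and the lowercase group ID is an assumption (a later example in this document uses an uppercase group ID in the file name).

```python
def resource_filenames(survey, year, scope, group):
    """Guess the file names written for one query (pattern inferred from the log)."""
    stem = f"morpc-acs{survey}-{year}-{scope}-{group.lower()}"
    return {
        "data": f"{stem}.csv",
        "resource": f"{stem}.resource.yaml",
        "schema": f"{stem}.schema.yaml",
    }


files = resource_filenames("5", "2023", "region15-tracts", "B25010")
print(files["resource"])
```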
Georeference the data to map¶
Add geometries by joining GEOS to DATA.
acs.GEOS
import geopandas as gpd
acs.DATA = gpd.GeoDataFrame(acs.DATA.join(acs.GEOS), geometry='geometry')
acs.DATA.explore()
Use the built-in .explore() method to view a map of all the columns in the data¶
acs.explore()
The functions below should still work, but we hope to fold them into the ACS class¶
Load the data using frictionless.load_data()¶
data, resource, schema = morpc.frictionless.load_data('./temp_data/morpc-acs5-2023-state-B01001.resource.yaml', verbose=False)
Use ACS_ID_FIELDS to get the field IDs¶
morpc.census.acs_generate_universe_table(data.set_index("GEO_ID"), "B01001_001")
Create a dimension table with the data and the dimension names¶
dim_table = morpc.census.acs_generate_dimension_table(data.set_index("GEO_ID"), schema, idFields=idFields, dimensionNames=["Sex", "Age group"])
dim_table.loc[dim_table['Variable type'] == 'Estimate'].head()
Build ACS Variable Group JSON for Dimension names¶
import requests
r = requests.get('https://api.census.gov/data/2023/acs/acs5/variables.json')
varjson = r.json()
groups = {}
for variable in varjson['variables']:
    # Skip predicate and geography entries that do not belong to a table group
    if variable not in ['for', 'in', 'ucgid', 'GEO_ID', 'AIANHH', 'AIHHTL', 'AIRES', 'ANRC']:
        group = varjson['variables'][variable]['group']
        # Skip groups whose ID ends in a letter
        if not group[-1].isalpha():
            if group not in groups:
                groups[group] = {}
                groups[group]['concept'] = varjson['variables'][variable]['concept'].replace('Year and Over','Year & Over').replace('Indian and Alaska','Indian & Alaska').replace('Hawaiian and Other','Hawaiian & Other')
                # Derive candidate dimension names from the concept; these need manual verification
                groups[group]['dimensions'] = ['TOTAL'] + varjson['variables'][variable]['concept'].replace(' by ',':').replace('Year and Over','Year & Over').replace('Indian and Alaska','Indian & Alaska').replace('Hawaiian and Other','Hawaiian & Other').replace(' and ',':').split(':')
                groups[group]['dimensions_verified'] = False
                # Collect the labels of every variable in this group, sorted by variable name
                variables = {}
                for member in varjson['variables']:
                    if varjson['variables'][member]['group'] == group:
                        variables[member] = varjson['variables'][member]['label']
                variables = {k: v for k, v in sorted(variables.items(), key=lambda item: item[0])}
                groups[group]['variables'] = variables
# Sort the groups by group ID
groups = {k: v for k, v in sorted(groups.items(), key=lambda item: item[0])}
# import json
# with open('../morpc/census/acs_variable_groups.json', 'w') as file:
# json.dump(groups, file, indent=3)
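The chain of string replacements used above to derive dimension names is easier to follow on a single concept string. The sketch below isolates that step in a function; the name `concept_to_dimensions` and the second input string are illustrative only, and the logic is intended to mirror the replacements in the code above rather than to extend them.

```python
def concept_to_dimensions(concept):
    """Derive candidate dimension names from a concept string (illustrative helper)."""
    # Protect phrases whose " and " should not be treated as a separator
    protected = (
        concept.replace("Year and Over", "Year & Over")
        .replace("Indian and Alaska", "Indian & Alaska")
        .replace("Hawaiian and Other", "Hawaiian & Other")
    )
    # Treat " by " and any remaining " and " as dimension separators
    return ["TOTAL"] + protected.replace(" by ", ":").replace(" and ", ":").split(":")


print(concept_to_dimensions("Sex by Age"))
print(concept_to_dimensions("American Indian and Alaska Native Alone by Sex and Age"))
```

Because the split is purely textual, the output is only a starting point, which is why the dimensions are written with dimensions_verified set to False until someone checks them.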