Load spatial data¶
Often we want to make a copy of some input data and work with the copy, for example to protect the original data or to create an archival copy so that we can replicate the process later. With tabular data this is simple, but with spatial data it can be tricky. A Shapefile actually consists of up to six files, so it is necessary to copy them all, and a Geodatabase may contain many layers in addition to the one we care about. The load_spatial_data()
function simplifies the process of reading the data and (optionally) making an archival copy. It has five parameters:
sourcePath
- The path to the geospatial data. It may be a file path or a URL. In the case of a Shapefile, this should point to the .shp file or to a zipped file that contains all of the Shapefile components. You can point to other zipped contents as well, but see the caveats below.
layerName
(required for GPKG and GDB, optional for SHP) - The name of the layer that you wish to extract from a GeoPackage or File Geodatabase. Not required for Shapefiles, but may be specified for use in the archival copy (see below).
driverName
(required for zipped data or data with a non-standard file extension) - Which GDAL driver to use to read the file. The function will attempt to infer this from the file extension, but you must specify it if the data is zipped, if the file extension is non-standard, or if the extension cannot be determined from the path (e.g. if the path is an API query).
archiveDir
(optional) - The path to the directory where a copy of the data should be archived. If this is specified, the data will be archived in this location as a GeoPackage. The function will determine the file name and layer name from the specified parameters, using generic values if necessary.
archiveFileName
(optional) - If archiveDir
is specified, you may use this to specify the name of the archival GeoPackage. Omit the extension. If this is unspecified, the function will assign the file name automatically, using a generic value if necessary.
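To illustrate the driver inference described above, here is a minimal sketch of how a driver might be guessed from a file extension. This is a hypothetical helper for illustration only, not the actual morpc implementation, and the mapping covers only a few common formats:

```python
import os

# Hypothetical sketch of extension-to-driver inference (NOT morpc's actual code).
EXTENSION_DRIVERS = {
    ".shp": "ESRI Shapefile",
    ".gpkg": "GPKG",
    ".geojson": "GeoJSON",
}

def infer_driver(sourcePath):
    """Guess the GDAL driver from the file extension; return None when it
    cannot be determined (zipped data, API queries, etc.), in which case
    the caller must pass driverName explicitly."""
    ext = os.path.splitext(sourcePath)[1].lower()
    return EXTENSION_DRIVERS.get(ext)

print(infer_driver("tl_2024_us_metdiv.shp"))   # ESRI Shapefile
print(infer_driver("data?format=shp&f=json"))  # None
```

This is why an API query URL like the one in the second example below requires an explicit driverName: there is no file extension to inspect.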
The following examples load data from the Census TIGER/Line FTP site and from the MORPC Mid-Ohio Open Data website; however, you can also load data from a local path or network drive.
import geopandas as gpd
import morpc
import os
url = 'https://www2.census.gov/geo/tiger/TIGER2024/METDIV/tl_2024_us_metdiv.zip'
gdf = morpc.load_spatial_data(url, archiveDir='./temp_data')
morpc.load_spatial_data | INFO | Loading spatial data from location: https://www2.census.gov/geo/tiger/TIGER2024/METDIV/tl_2024_us_metdiv.zip
morpc.load_spatial_data | INFO | Attempting to load data from Census FTP site. Using wget to archive file.
morpc.load_spatial_data | WARNING | Data from Census FTP must be temp saved. Using ./temp_data.
morpc.load_spatial_data | INFO | Using driver Census Shapefile as specified by user.
morpc.load_spatial_data | INFO | Reading spatial data...
morpc.load_spatial_data | INFO | File name is unspecified. Will infer file name from source path.
morpc.load_spatial_data | INFO | Using automatically-selected file name: tl_2024_us_metdiv
morpc.load_spatial_data | INFO | Layer name is unspecified. Using automatically-selected layer name: tl_2024_us_metdiv
morpc.load_spatial_data | INFO | Creating archival copy of geospatial layer at ./temp_data\tl_2024_us_metdiv.gpkg, layer tl_2024_us_metdiv
C:\Users\jinskeep\morpc_venv\Lib\site-packages\pyogrio\raw.py:198: RuntimeWarning: driver ESRI Shapefile does not support open option DRIVER
# Create a directory to store the archival data (for demonstration purposes only)
if not os.path.exists("./temp_data"):
os.makedirs("./temp_data")
# Load the data and create an archival copy
gdf = morpc.load_spatial_data(
sourcePath="https://opendata.arcgis.com/api/v3/datasets/e42b50fbd17a47739c2a7695778c498e_17/downloads/data?format=shp&spatialRefId=3735&where=1%3D1",
layerName="MORPC MPO Boundary",
driverName="ESRI Shapefile",
archiveDir="./temp_data"
)
Let’s take a look at the data and make sure it loaded correctly.
gdf.drop(columns="Updated").explore() ## avoid datetime column JSON error
Now let’s read the archival copy and make sure it looks the same. We’ll use the load_spatial_data()
function again, but this time we won’t make an archival copy.
gdfArchive = morpc.load_spatial_data("./temp_data/MORPC MPO Boundary.gpkg", layerName="MORPC MPO Boundary")
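One way to confirm the round trip worked is to compare the two frames directly. The sketch below uses small hypothetical pandas stand-ins so it is self-contained; in practice you would run the same checks on the GeoDataFrames returned by load_spatial_data() before and after archiving:

```python
import pandas as pd

# Hypothetical stand-ins for the original and archival frames; in practice
# these would be the GeoDataFrames returned by morpc.load_spatial_data().
gdf = pd.DataFrame({"NAME": ["MORPC MPO Boundary"], "ACRES": [4321.0]})
gdfArchive = gdf.copy()

# Round-trip sanity check: same row count, same columns, same values.
assert len(gdf) == len(gdfArchive)
assert list(gdf.columns) == list(gdfArchive.columns)
assert gdf.equals(gdfArchive)
print("archival copy matches the original")
```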
Assign geographic identifiers¶
Sometimes we have a set of locations and we would like to know which geography (county, zip code, etc.) they fall in. The assign_geo_identifiers()
function takes a set of georeferenced points and a list of geography levels and determines, for each level, which area each point falls in. The function takes two parameters:
points
- A GeoPandas GeoDataFrame consisting of the points of interest.
geographies
- A Python list of one or more strings in which each element corresponds to a geography level. You can specify as many levels as you want from the following list; however, note that the function must download the polygons and perform the analysis for each level, so specifying many levels may take a long time.
- “county” - County (Census TIGER)
- “tract” - Not currently implemented
- “blockgroup” - Not currently implemented
- “block” - Not currently implemented
- “zcta” - Not currently implemented
- “place” - Census place (Census TIGER)
- “placecombo” - Not currently implemented
- “juris” - Not currently implemented
- “region15County” - Not currently implemented
- “region10County” - Not currently implemented
- “regionCORPO” - Not currently implemented
- “regionMPO” - Not currently implemented
NOTE: Many of the geography levels are not currently implemented. They are being implemented as they are needed. If you need one that has not yet been implemented, please contact Adam Porr (or implement it yourself).
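Conceptually, the function performs a point-in-polygon test for each requested level. The sketch below illustrates the idea in pure Python, using hypothetical bounding boxes in place of the real TIGER polygons and made-up coordinates; it is not the morpc implementation:

```python
# Hypothetical sketch: label each (x, y) point with the identifier of the
# area containing it, one geography level at a time. Simple bounding boxes
# stand in for the real county polygons.
COUNTY_BOXES = {
    "39049": (-83.26, 39.81, -82.77, 40.16),  # Franklin County (approximate)
    "39041": (-83.25, 40.13, -82.75, 40.50),  # Delaware County (approximate)
}

def assign_level(points, boxes):
    """Return the identifier of the first box containing each point,
    or None if the point falls in no box."""
    labels = []
    for x, y in points:
        match = None
        for geoid, (minx, miny, maxx, maxy) in boxes.items():
            if minx <= x <= maxx and miny <= y <= maxy:
                match = geoid
                break
        labels.append(match)
    return labels

points = [(-82.99, 39.96), (-83.00, 40.30)]
print(assign_level(points, COUNTY_BOXES))  # ['39049', '39041']
```

The real function does this with true polygon geometries (via a spatial join) rather than bounding boxes, and writes one new identifier column per requested level.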
In the following example, we will assign labels for the “county” and “place” geography levels to libraries in MORPC’s Points of Interest layer. First we’ll download just the library locations from Mid-Ohio Open Data using the ArcGIS REST API.
url = "https://services1.arcgis.com/EjjnBtwS9ivTGI8x/arcgis/rest/services/Points_of_Interest/FeatureServer/0/query?outFields=*&where=%22type%22=%27Library%27&f=geojson"
librariesRaw = gpd.read_file(url)
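The query string above can also be assembled programmatically, which avoids hand-escaping the quotes. A sketch using the standard library (the endpoint and field names are taken from the URL above):

```python
from urllib.parse import urlencode

# Sketch: build the ArcGIS REST API query string used above from its parts.
base = ("https://services1.arcgis.com/EjjnBtwS9ivTGI8x/arcgis/rest/services/"
        "Points_of_Interest/FeatureServer/0/query")
params = {
    "outFields": "*",               # return all attribute fields
    "where": "\"type\"='Library'",  # filter to library points only
    "f": "geojson",                 # request GeoJSON output
}
url = base + "?" + urlencode(params)
print(url)
```

urlencode() percent-encodes the quotes and equals sign in the where clause for you, producing a URL equivalent to the one written out above.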
The data includes a number of fields that we don’t need. For clarity, extract only the relevant fields.
libraries = librariesRaw.copy().filter(items=['NAME', 'ADDRESS','geometry'], axis="columns")
libraries.head()
Let’s take a look at the library locations.
libraries.explore(style_kwds={"radius":4})
Use the assign_geo_identifiers()
function to iterate through the requested geography levels (in this case “county” and “place”), labeling each point with the identifier of the geography in each level where the point is located.
NOTE: This function is currently broken due to changes at the Census Bureau that prevent loading TIGER/Line files from the FTP site.
librariesEnriched = morpc.assign_geo_identifiers(libraries, ["county","place"])
Note that two columns have been added to the dataframe, one that contains the identifier for the county the library is located in and one that contains the identifier for the place.
librariesEnriched.head()
Let’s take a look at libraries, symbolizing each according to the county where it is located.
librariesEnriched.explore(column="id_county", style_kwds={"radius":4})
Let’s take another look, this time symbolizing each library according to the place where it is located. The legend has been suppressed because there are too many unique values, but you can hover over each point to see the place identifier that has been assigned to it.
librariesEnriched.explore(column="id_place", style_kwds={"radius":4}, legend=False)