Geocoding Funnelback results
Background
Funnelback includes capabilities for geospatial search. However, geospatial search only functions if the data set is geocoded. This article provides advice on retrofitting geocoding to a data set.
Details
Funnelback can perform basic geospatial searches on geocoded data sets.
The data set must include a meta field that holds the geo-coordinate in the correct format for geospatial search to function correctly.
This coordinate is ideally provided in the source data. Where this is not possible, Funnelback can in some circumstances be configured to geocode the data set as it is downloaded.
Geocoding using Funnelback should generally be avoided as it is a resource-intensive process. Sometimes the geo-coordinates are supplied, but not in the correct format.
The following are the options for geocoding source data sets:
-
Decimal lat/long is included in item metadata
-
Geocoding based off a postcode (or suburb)
-
Geocoding based off a street address
Decimal lat/long is included in item metadata
This is the preferred option and the one that should always be explored first.
The data set is modified to include a single decimal latlong for each item (whether it is in a metadata field in an HTML page, external metadata or a field in XML or CSV).
The latlong must be in the format Funnelback expects: latitude and longitude in a single field, separated by a semicolon, e.g. <latlong>-24.26785;34.5552</latlong>
or <meta name="latlong" content="-24.26785;34.5552" />
Note: the field name is unimportant.
This field is mapped to a geospatial metadata class in Funnelback (or type 2 for pre-Funnelback 15.14 metadata).
If the X and Y values are specified separately, filtering is required to assemble them into a single field. This is the second-best option, as the filtering required is minimal.
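The assembly step described above amounts to joining two numbers with a semicolon. The following is a minimal Python sketch of that logic only; it is not Funnelback filter code, and the function name `to_latlong` is a hypothetical helper introduced for illustration. It assumes the latitude comes first, as in the example values earlier in this article.

```python
def to_latlong(lat, lon):
    """Join separate latitude and longitude values into a single 'lat;long' string.

    Hypothetical helper: the name and the range checks are assumptions,
    not part of any Funnelback API.
    """
    lat_f = float(lat)  # raises ValueError if the value is not numeric
    lon_f = float(lon)
    if not -90.0 <= lat_f <= 90.0:
        raise ValueError(f"latitude out of range: {lat!r}")
    if not -180.0 <= lon_f <= 180.0:
        raise ValueError(f"longitude out of range: {lon!r}")
    return f"{lat_f};{lon_f}"


print(to_latlong("-24.26785", "34.5552"))  # -24.26785;34.5552
```

Validating the ranges at this point catches swapped or malformed values before they reach the index, where they would be harder to spot.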
Geocoding based off a postcode (or suburb)
If there is postcode metadata then it is possible to geocode the data based on this value.
Geocoding using a postcode is imprecise because postcode areas vary widely in size (some are very large, some are very small).
The preferred approach in this scenario is to generate a mapping file that maps each postcode to a geocoordinate. A filter is then written that reads the postcode and writes the corresponding geocoordinate during the update.
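As a rough illustration of the mapping-file approach, the Python sketch below loads a postcode to geocoordinate mapping and adds a latlong value to a record. The CSV layout (`postcode,lat,long`), the function names and the `latlong` field name are all assumptions made for this example, and the coordinates are approximate, illustrative values. A real Funnelback filter would apply the same logic using the filter framework.

```python
import csv
import io

# Hypothetical mapping file contents. The column names are an assumption and
# the coordinates are approximate, illustrative values only.
SAMPLE_MAPPING = """postcode,lat,long
0800,-12.46,130.84
2600,-35.3,149.12
"""


def load_mapping(handle):
    """Read postcode -> 'lat;long' pairs from a CSV file handle."""
    mapping = {}
    for row in csv.DictReader(handle):
        # Postcodes stay as strings so that leading zeros are preserved.
        postcode = row["postcode"].strip()
        mapping[postcode] = f"{float(row['lat'])};{float(row['long'])}"
    return mapping


def add_latlong(record, mapping):
    """Return a copy of the record with a 'latlong' field if the postcode is known."""
    enriched = dict(record)
    latlong = mapping.get(str(record.get("postcode", "")).strip())
    if latlong is not None:
        enriched["latlong"] = latlong
    return enriched


mapping = load_mapping(io.StringIO(SAMPLE_MAPPING))
print(add_latlong({"title": "Example office", "postcode": "0800"}, mapping))
```

Records with an unknown or missing postcode are returned unchanged, so they remain searchable but are left out of geospatial results.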
If query time geocoding from a postcode is required then the mappings should be loaded into the data model’s custom metadata.
The postcode-geocoordinate mapping file can be sourced in a number of ways:
-
The file can be provided and maintained along with the data, and fetched using the update workflow.
-
The Australia Post postcode data file can be licensed.
-
A script can be written to generate these mappings using one of the available geocoding services. Common choices are the OpenStreetMap geocoding service and the Google Maps geocoding service (note: there are limits on the number of requests that can be made against these services). This approach is also resource intensive and can increase the time required to crawl, because one or more HTTP requests are made to the geocoding service for each data record processed. Consider caching postcode to geocoordinate lookups to reduce future network traffic and API requests; this also makes future updates quicker.
Postcode to geocoordinate mapping resources
-
Australia: http://auspost.com.au/business-solutions/postcode-data.html
-
For lots of countries: https://www.aggdata.com/freedata-category/postal-codes
Geocoding based off a street address
If there is address metadata available then a geocoding service can be used to translate this to a geocoordinate.
A filter is required to perform the lookup and write out the geospatial metadata.
Any of the geocoding services mentioned above can be used, but beware of their request limits and the impact on update time.
Consider caching the lookup values (e.g. once you’ve looked up a given address, cache the value so you don’t need to look it up again).
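One possible shape for such a cache is sketched below in Python. The class name `CachedGeocoder`, the JSON cache file and the `fake_lookup` stand-in are all hypothetical. No real geocoding service is called here; in practice the lookup function would wrap whichever service is chosen.

```python
import json
import os
import tempfile


class CachedGeocoder:
    """Wrap a geocoding function with a JSON file cache keyed by normalised address."""

    def __init__(self, lookup, cache_path):
        self.lookup = lookup  # callable: address -> (lat, lon) tuple, or None
        self.cache_path = cache_path
        self.cache = {}
        if os.path.exists(cache_path):
            with open(cache_path, encoding="utf-8") as handle:
                self.cache = json.load(handle)

    @staticmethod
    def _key(address):
        # Lower-case and collapse whitespace so trivial variants share one entry.
        return " ".join(address.lower().split())

    def geocode(self, address):
        key = self._key(address)
        if key not in self.cache:
            result = self.lookup(address)
            # Failed lookups are cached as None so they are not retried every update.
            self.cache[key] = None if result is None else f"{result[0]};{result[1]}"
            with open(self.cache_path, "w", encoding="utf-8") as handle:
                json.dump(self.cache, handle, indent=2, sort_keys=True)
        return self.cache[key]


calls = []


def fake_lookup(address):
    # Stand-in for a real geocoding service call; returns a fixed placeholder.
    calls.append(address)
    return (-24.26785, 34.5552)


cache_file = os.path.join(tempfile.mkdtemp(), "geocode-cache.json")
geocoder = CachedGeocoder(fake_lookup, cache_file)
print(geocoder.geocode("1 Example Street, Sampletown"))
print(geocoder.geocode("1  example street,  SAMPLETOWN"))  # served from the cache
print(len(calls))  # 1
```

Because the cache is written to disk, later updates only call the service for addresses that have not been seen before, which also helps to stay within request limits. Caching failed lookups is a design choice: it avoids repeated requests for bad addresses, but those entries need to be cleared if the source address is later corrected.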