Extracting geographic information from textual data sources, known as geoparsing, is a key task in text processing and central to subsequent spatial analysis. Several geoparsers are available for this task, each with its own (often limited or specialized) gazetteer and its own approaches to toponym detection (finding place mentions) and toponym resolution (linking place mentions to unique spatial references).
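The two steps named above can be illustrated with a deliberately naive sketch. Everything in it is hypothetical and not taken from HeidelPlace: the class `GeoparsingSketch`, the in-memory map standing in for a gazetteer, detection by exact single-word lookup, and resolution by picking the most populous candidate (a common baseline heuristic, not necessarily the one HeidelPlace uses). The coordinates and populations are toy values, not GeoNames data.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Naive, hypothetical illustration of toponym detection and resolution (not HeidelPlace code). */
public class GeoparsingSketch {

    /** A gazetteer entry: unique ID plus a point-based spatial reference and a population. */
    static final class Place {
        final long id;
        final String name;
        final double lat;
        final double lon;
        final long population;

        Place(long id, String name, double lat, double lon, long population) {
            this.id = id;
            this.name = name;
            this.lat = lat;
            this.lon = lon;
            this.population = population;
        }
    }

    /** Toy gazetteer: each name maps to every place that carries it (names can be ambiguous). */
    private final Map<String, List<Place>> gazetteer = new HashMap<>();

    void add(Place p) {
        gazetteer.computeIfAbsent(p.name, k -> new ArrayList<>()).add(p);
    }

    /** Detection: return every token that exactly matches a gazetteer name (single-word names only). */
    List<String> detect(String text) {
        List<String> mentions = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (gazetteer.containsKey(token)) {
                mentions.add(token);
            }
        }
        return mentions;
    }

    /** Resolution: link a mention to its most populous candidate, or null if there is none. */
    Place resolve(String mention) {
        List<Place> candidates = gazetteer.getOrDefault(mention, Collections.<Place>emptyList());
        return candidates.stream()
                .max(Comparator.comparingLong((Place p) -> p.population))
                .orElse(null);
    }

    public static void main(String[] args) {
        GeoparsingSketch parser = new GeoparsingSketch();
        // Toy entries: two places share the name "Heidelberg".
        parser.add(new Place(1, "Heidelberg", 49.41, 8.69, 160000));
        parser.add(new Place(2, "Heidelberg", -37.75, 145.07, 8000));
        for (String mention : parser.detect("The workshop took place in Heidelberg.")) {
            Place p = parser.resolve(mention);
            System.out.println(mention + " -> place " + p.id + " (" + p.lat + ", " + p.lon + ")");
        }
    }
}
```

Real detectors also handle multi-word names and use context for disambiguation; the sketch only shows the split between finding mentions and linking them to unique references.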
With HeidelPlace, we developed an extensible geoparsing framework that includes a generic gazetteer model and an implementation of the entire geoparsing process. The generic gazetteer model supports the integration and management of heterogeneous place information from different knowledge bases. Modularization and an annotation-based processing pipeline are the central design principles: they make the framework extensible and keep the implementation of the entire geoparsing process transparent. By reconfiguring and reusing existing modules and integrating newly developed ones, users can experiment with new geoparsing approaches and application domains. To further increase the efficiency of developing new geoparsing methods, we include user-friendly GUIs that make the gazetteer model and the geoparsing process transparent to the user.
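The following sketch shows what an annotation-based, modular pipeline can look like in Java. It is not HeidelPlace's actual API: `Document`, `Annotation`, `Module`, `ListBasedDetector`, `LookupResolver`, the annotation type names, and the place IDs are all hypothetical. Each module only reads and adds stand-off annotations on a shared document, so modules can be reordered, reused, or replaced independently.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

/** Hypothetical annotation-based pipeline; class, type, and ID names are illustrative only. */
public class AnnotationPipelineSketch {

    /** Stand-off annotation on the document text. */
    static final class Annotation {
        final String type;    // e.g. "Toponym" or "Resolution" (hypothetical type names)
        final int begin, end; // character span [begin, end)
        final String value;   // optional payload, e.g. a place ID

        Annotation(String type, int begin, int end, String value) {
            this.type = type; this.begin = begin; this.end = end; this.value = value;
        }
    }

    /** Shared document: the text plus all annotations the modules have added so far. */
    static final class Document {
        final String text;
        final List<Annotation> annotations = new ArrayList<>();

        Document(String text) { this.text = text; }

        /** Returns a new list with all annotations of the given type, in insertion order. */
        List<Annotation> select(String type) {
            return annotations.stream().filter(a -> a.type.equals(type)).collect(Collectors.toList());
        }
    }

    /** Common interface of all processing steps, so steps can be swapped or reordered. */
    interface Module {
        void process(Document doc);
    }

    /** Detection step: annotates every (plain substring) occurrence of a listed name as "Toponym". */
    static final class ListBasedDetector implements Module {
        private final List<String> names;
        ListBasedDetector(List<String> names) { this.names = names; }

        @Override
        public void process(Document doc) {
            for (String name : names) {
                int idx = doc.text.indexOf(name);
                while (idx >= 0) {
                    doc.annotations.add(new Annotation("Toponym", idx, idx + name.length(), null));
                    idx = doc.text.indexOf(name, idx + name.length());
                }
            }
        }
    }

    /** Resolution step: adds a "Resolution" annotation with a place ID for each known toponym. */
    static final class LookupResolver implements Module {
        private final Map<String, String> placeIds;
        LookupResolver(Map<String, String> placeIds) { this.placeIds = placeIds; }

        @Override
        public void process(Document doc) {
            // select() returns a copy, so adding annotations while iterating is safe.
            for (Annotation t : doc.select("Toponym")) {
                String id = placeIds.get(doc.text.substring(t.begin, t.end));
                if (id != null) {
                    doc.annotations.add(new Annotation("Resolution", t.begin, t.end, id));
                }
            }
        }
    }

    /** The pipeline itself just runs the configured modules in order on a fresh document. */
    static Document run(List<Module> modules, String text) {
        Document doc = new Document(text);
        for (Module m : modules) {
            m.process(doc);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> ids = new HashMap<>();
        ids.put("Heidelberg", "place:1"); // hypothetical place ID
        List<Module> pipeline = Arrays.<Module>asList(
                new ListBasedDetector(Arrays.asList("Heidelberg", "Mannheim")),
                new LookupResolver(ids));
        Document doc = run(pipeline, "From Mannheim to Heidelberg.");
        for (Annotation a : doc.annotations) {
            System.out.println(a.type + " [" + a.begin + ", " + a.end + ") " + a.value);
        }
    }
}
```

Replacing `LookupResolver` with another `Module` implementation (for example, one that uses surrounding context to disambiguate) would not require changes to the detector or to the pipeline runner, which is the kind of reconfiguration described above.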
The current source code of HeidelPlace can be downloaded here: HeidelPlace_1.0.0 (publication on GitHub is planned for the near future). It is licensed under the GNU 3 license. For demonstration purposes, HeidelPlace includes code to create a gazetteer based on GeoNames. We provide a GeoNames dump downloaded on September 7, 2016: Geonames_7.9.2016
Requirements
- Java 8 (update 121 or later is recommended to avoid JavaFX-related issues if you want to use the GUIs)
- PostgreSQL 9.4+ (a different database management system may be used, but code adjustments are required)
- Storage: approx. 6 GB for the provided GeoNames dump and 13 GB for the generated gazetteer including several indices (approx. 20 GB in total).
Installing, Configuring, and Running HeidelPlace
- Download the source code and unpack it
- Download the provided GeoNames dump from Geonames_7.9.2016 (or a newer one from the GeoNames website, if you prefer)
- Follow the GeoNames installation instructions to load GeoNames into PostgreSQL
- Follow the installation instructions in Getting Started with HeidelPlace to install HeidelPlace using an Eclipse development environment.
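If you want to inspect the GeoNames data before or after loading it into PostgreSQL, the sketch below parses individual records. It assumes the dump uses the standard GeoNames `allCountries.txt` layout (UTF-8, no header, 19 tab-separated columns: geonameid, name, asciiname, alternatenames, latitude, longitude, feature class, feature code, country code, cc2, admin1 to admin4 codes, population, elevation, dem, timezone, modification date). The class `GeoNamesLineParser` is hypothetical and not part of HeidelPlace's own import code.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical parser for records of a GeoNames dump in the allCountries.txt layout. */
public class GeoNamesLineParser {

    /** A subset of the 19 GeoNames columns, enough to build a simple gazetteer entry. */
    static final class GeoName {
        final long geonameId;
        final String name;
        final double latitude;
        final double longitude;
        final String featureClass;
        final String countryCode;
        final long population;

        GeoName(long geonameId, String name, double latitude, double longitude,
                String featureClass, String countryCode, long population) {
            this.geonameId = geonameId;
            this.name = name;
            this.latitude = latitude;
            this.longitude = longitude;
            this.featureClass = featureClass;
            this.countryCode = countryCode;
            this.population = population;
        }
    }

    static GeoName parse(String line) {
        // A limit of -1 keeps trailing empty columns, so every valid record yields 19 fields.
        String[] f = line.split("\t", -1);
        if (f.length != 19) {
            throw new IllegalArgumentException("expected 19 columns but found " + f.length);
        }
        // Column indices: 0 geonameid, 1 name, 4 latitude, 5 longitude,
        // 6 feature class, 8 country code, 14 population.
        long population = f[14].isEmpty() ? 0L : Long.parseLong(f[14]);
        return new GeoName(Long.parseLong(f[0]), f[1], Double.parseDouble(f[4]),
                Double.parseDouble(f[5]), f[6], f[8], population);
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java GeoNamesLineParser <path to allCountries.txt>");
            return;
        }
        long count = 0;
        try (BufferedReader in = Files.newBufferedReader(Paths.get(args[0]), StandardCharsets.UTF_8)) {
            String line;
            while ((line = in.readLine()) != null) {
                parse(line); // throws on malformed records, so this doubles as a format check
                count++;
            }
        }
        System.out.println(count + " records parsed");
    }
}
```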
To get a feel for how the gazetteer and the geoparser work, we provide screenshots of the GUIs "Gazetteer Viewer" and "Geoparser Viewer":