Identifying Geographic Terms within Natural Language Text
Master's thesis
Permanent link
http://hdl.handle.net/11250/137025
Publication date
2008
Abstract
The huge amount of textual data available in digital form today increases
the need for methods that make this data easy to access and navigate. Automatic
extraction of keywords from text bodies is one promising approach. However,
the relevance of a keyword is context-dependent, and extracting relevant
keywords often requires semantic analysis, simply because words may have different
meanings in different contexts. Resolving such word sense ambiguity automatically
is well known to be very challenging. When the topic of interest
is geographic information, the important keywords are geographic terms
such as countries, cities, counties and states.
This thesis presents a probabilistic method for automatically identifying geographic
terms within natural language text. The method uses a database of geographic
terms to identify candidate geographic entities. In contrast to the state of
the art, we resolve semantic ambiguity using a Bayesian classifier that takes
the context of ambiguous words into account. In our empirical results, we report a
geographic term identification accuracy of 90%. We therefore believe that the approach
can be useful to those working in text analysis and data mining
whenever accurate identification of geographic terms is required.
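The two-step idea described above (database lookup to find candidates, then a Bayesian classifier over the surrounding context to resolve ambiguity) can be illustrated with a minimal sketch. The abstract does not specify the features or the exact classifier, so this sketch assumes a multinomial naive Bayes model over a bag-of-words context window with add-one smoothing. The gazetteer entries, training sentences, window size and the `geo`/`other` labels are all hypothetical and invented for illustration; they are not the thesis's database, data or results.

```python
import math
import re
from collections import Counter, defaultdict

# Hypothetical toy gazetteer (surface form -> geographic type); a real system
# would query a full database of geographic terms instead.
GAZETTEER = {"paris": "city", "georgia": "state", "jordan": "country", "agder": "county"}

# Hypothetical hand-labelled context snippets, invented for illustration only.
TRAIN = [
    ("we flew to the capital of the country", "geo"),
    ("a city in the north near the border", "geo"),
    ("travelled across the region to the coast", "geo"),
    ("he scored in the game with his team", "other"),
    ("she signed a contract with her agent", "other"),
    ("his friend played basketball for the team", "other"),
]


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def candidates(tokens):
    """Step 1: positions of tokens that match an entry in the gazetteer."""
    return [i for i, tok in enumerate(tokens) if tok in GAZETTEER]


def context(tokens, i, window=3):
    """Bag of up to `window` tokens on each side of position i."""
    return tokens[max(0, i - window):i] + tokens[i + 1:i + 1 + window]


class NaiveBayes:
    """Multinomial naive Bayes over context words, with add-one smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()

    def train(self, examples):
        for text, label in examples:
            words = tokenize(text)
            self.class_counts[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def log_score(self, words, label):
        """Unnormalised log posterior: log P(label) + sum of log P(word | label)."""
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for w in words:
            # Add-one smoothing keeps unseen context words from zeroing the score.
            score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def classify(self, words):
        return max(self.class_counts, key=lambda c: self.log_score(words, c))


def identify(text, model, window=3):
    """Step 2: keep only candidates whose context the classifier labels 'geo'."""
    tokens = tokenize(text)
    return [(tokens[i], GAZETTEER[tokens[i]]) for i in candidates(tokens)
            if model.classify(context(tokens, i, window)) == "geo"]


if __name__ == "__main__":
    model = NaiveBayes()
    model.train(TRAIN)
    print(identify("We flew to Paris near the coast", model))  # [('paris', 'city')]
    print(identify("Jordan scored for his team", model))       # []
```

In this toy setup, "Jordan" is found by the gazetteer lookup but rejected because its context words ("scored", "for", "his") are far more likely under the `other` class. This shows why a lookup alone is not enough for ambiguous names.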
Description
Master's thesis in Information and Communication Technology, 2008 – University of Agder, Grimstad