Ontology guided financial knowledge extraction from semistructured information sources

Bjoraa, Eivind

dc.contributor.author	Bjoraa, Eivind
dc.date.accessioned	2007-07-09T10:25:52Z
dc.date.issued	2003
dc.identifier.uri	http://hdl.handle.net/11250/137543
dc.description	Master's thesis in information and communication technology 2003 - Høgskolen i Agder, Grimstad	en
dc.description.abstract	Intermedium has an agent that searches the Web for financial articles matching certain criteria, for instance an industrial domain of interest. A portal service for reading and searching these articles is available to customers. The sources searched are secondary sources, such as online newspapers. Secondary sources publish information more frequently than annual reports and similar sources, and include information not found there, such as predictions. Finding financial figures in the articles is often time-consuming, and the figures are hard to compare with each other. Presenting the financial figures, and what they apply to, in an application where the information can be easily reviewed and compared would supply valuable information to decision makers in larger companies. Web documents are usually semi-structured and therefore almost impossible to query for information. Because computers do not understand the content, only keyword searches are supported. More advanced information extraction is therefore needed. This thesis evaluates an ontology-guided approach for extracting financial information from semi-structured information sources. A financial ontology has been constructed based on an investigation of 50 articles gathered by Intermedium’s agent. Instances with synonyms (the words to extract from the text) and relations between the instances have been defined. RDF has been chosen as the ontology language and is used throughout the thesis. A prototype application has been developed to perform the extraction process. Articles are loaded from XML files; the words to extract from the text are found by querying the ontology with the query language RDQL; NLP and NLTK are used to perform the extraction based on the words found in the ontology; Velocity templates are used to give the output files, RDF and an XBRL instance document, the proper structure.
The ontology provides the application with knowledge during the extraction process. When a synonym of one instance is found, a query for references to other instances is performed, and synonyms of those instances are searched for in the text. If a text does not contain any interesting information, the application does not waste time trying to match every word in the ontology against the text. The result is presented with semantic tagging in RDF syntax. Part of the extracted information is also rendered in the financial standard XBRL as an example. The advantage of XBRL is that it can be used directly by supporting tools, whereas RDF has to be processed by a more intelligent application. In both formats, the financial information has been enriched with computer-processable semantic tagging.	en
dc.format.extent	1118922 bytes
dc.format.mimetype	application/pdf
dc.language.iso	eng	en
dc.publisher	Høgskolen i Agder
dc.publisher	Agder University College
dc.subject.classification	IKT590
dc.title	Ontology guided financial knowledge extraction from semistructured information sources	en
dc.type	Master thesis	en
dc.subject.nsi	VDP::Mathematics and natural science: 400::Information and communication science: 420::Algorithms and computability theory: 422
dc.subject.nsi	VDP::Mathematics and natural science: 400::Information and communication science: 420::Knowledge based systems: 425
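
The extraction strategy described in the abstract (find a synonym of one instance, then follow that instance's relations and look for synonyms of the related instances) can be illustrated with a minimal sketch. This is not the thesis's implementation: the thesis stores the ontology in RDF, queries it with RDQL, and uses NLTK, whereas the sketch below swaps in a plain Python dictionary and regular expressions. The instance names, the synonyms, the `SEEDS` list, and the functions `find_synonyms` and `extract` are all hypothetical and chosen only for illustration.

```python
import re

# Hypothetical toy ontology (assumption, not taken from the thesis):
# each instance has synonyms to look for and references to other instances.
ONTOLOGY = {
    "Revenue": {"synonyms": ["revenue", "turnover", "sales"],
                "related": ["Period", "Currency"]},
    "Profit": {"synonyms": ["profit", "net income"],
               "related": ["Period", "Currency"]},
    "Period": {"synonyms": ["first quarter", "second quarter"],
               "related": []},
    "Currency": {"synonyms": ["NOK", "USD"],
                 "related": []},
}

# Assumption: only these instances are searched for first; related
# instances are looked up only after one of them has matched.
SEEDS = ["Revenue", "Profit"]


def find_synonyms(instance, text):
    """Return the synonyms of `instance` that occur in `text` as whole words."""
    hits = []
    for synonym in ONTOLOGY[instance]["synonyms"]:
        pattern = r"\b" + re.escape(synonym) + r"\b"
        if re.search(pattern, text, re.IGNORECASE):
            hits.append(synonym)
    return hits


def extract(text):
    """Map each matched instance to the synonyms found in `text`."""
    results = {}
    for seed in SEEDS:
        hits = find_synonyms(seed, text)
        if not hits:
            # No synonym of this instance: skip its related instances entirely.
            continue
        results[seed] = hits
        # Follow the references to other instances and search for their synonyms.
        for related in ONTOLOGY[seed]["related"]:
            related_hits = find_synonyms(related, text)
            if related_hits:
                results[related] = related_hits
    return results


if __name__ == "__main__":
    print(extract("Turnover rose to 120 million NOK in the second quarter."))
```

Under these assumptions, a text with no seed match costs only the seed lookups, which mirrors the abstract's point that uninteresting texts are not matched against every word in the ontology.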

Files in this item

Name: master_ikt_2003_bjoraa.pdf
Size: 1.067Mb
Format: PDF

This item appears in the following Collection(s)

Master's theses in Information and Communication Technology [505]
MM500, IKT590, IKT591
