Date of Award
Ph.D. in Business Administration
Management Information Systems
CAINES (Content Analysis and INformation Extraction System) employs an information extraction (IE) methodology to extract unstructured text from the Web. It can also create an ontology and a Semantic Web. This research differs from traditional IE systems in that CAINES examines the syntactic and semantic relationships within the unstructured text of online business reports. CAINES provides more relevant results than manual searching or standard keyword searching. More so than most extraction systems, CAINES makes extensive use of information extraction from natural language, Key Words in Context (KWIC), and semantic analysis. A total of 21 online business reports, averaging about 100 pages each, were used in this study. Based on the opinions of financial experts, extraction rules were created to extract information, an ontology, and a Semantic Web of data from financial reports. Using CAINES, one can extract information about global and domestic market conditions, the impacts of market conditions, and the business outlook. A Semantic Web of 107,533 rows of data was created from Merrill Lynch reports; it displays information regarding mergers, acquisitions, and business segment news between 2007 and 2009. User testing of CAINES yielded a recall of 85.91%, a precision of 87.16%, and an F-measure of 86.46%. Extraction with CAINES was also faster than manual extraction. Users agreed that CAINES quickly and easily extracts unstructured information from financial reports on the EDGAR database.
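For readers unfamiliar with Key Words in Context (KWIC), the technique can be illustrated with a minimal sketch. This is not the CAINES implementation, which the abstract does not describe; the function name `kwic`, the `window` parameter, and the sample sentence are all hypothetical and chosen only for illustration.

```python
import re

def kwic(text, keyword, window=3):
    """Return (left context, matched token, right context) for each hit.

    Hypothetical helper: a generic KWIC concordance, not CAINES itself.
    """
    tokens = re.findall(r"\w+", text)  # crude word tokenizer
    hits = []
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

# Made-up sample text, not taken from any report in the study.
sample = "The merger closed in 2008. Analysts expect the merger to lift revenue."
for left, word, right in kwic(sample, "merger"):
    print(f"{left:>25} [{word}] {right}")
```

Each hit shows the keyword with a few words of surrounding text, which is what lets a reader or an extraction rule judge the sense in which the term is used.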
Simmons, Lakisha L., "Extraction of ontology and semantic web information from online business reports" (2011). Electronic Theses and Dissertations. 1360.