Electronic Theses and Dissertations

Date of Award


Document Type


Degree Name

Ph.D. in Business Administration


Management Information Systems

First Advisor

Sumali Conlon

Second Advisor

Walter Davis

Third Advisor

Tony Ammeter

Relational Format



CAINES, Content Analysis and INformation Extraction System, employs an information extraction (IE) methodology to extract unstructured text from the Web. It can create an ontology and a Semantic Web. This research is different from traditional IE systems in that CAINES examines the syntactic and semantic relationships within unstructured text of online business reports. Using CAINES provides more relevant results than manual searching or standard keyword searching. Over most extraction systems, CAINES extensively uses information extraction from natural language, Key Words in Context (KWIC), and semantic analysis. A total of 21 online business reports, averaging about 100 pages long, were used in this study. Based on financial expert opinions, extraction rules were created to extract information, an ontology, and a Semantic Web of data from financial reports. Using CAINES, one can extract information about global and domestic market conditions, market condition impacts, and information about the business outlook. A Semantic Web was created from Merrill Lynch reports, 107,533 rows of data, and displays information regarding mergers, acquisitions, and business segment news between 2007 and 2009. User testing of CAINES resulted in recall of 85.91%, precision of 87.16%, and an F-measure of 86.46%. Speed with CAINES was also greater than manually extracting information. Users agree that CAINES quickly and easily extracts unstructured information from financial reports on the EDGAR database.



To view the content in your browser, please download Adobe Reader or, alternately,
you may Download the file to your hard drive.

NOTE: The latest versions of Adobe Reader do not support viewing PDF files within Firefox on Mac OS and if you are using a modern (Intel) Mac, there is no official plugin for viewing PDF files within the browser window.