Enterprise Search for Pharmacometric Documents: A Feature and Performance Evaluation

This is a thesis for an advanced-level professional degree from Uppsala universitet/Institutionen för biologisk grundutbildning

Abstract: Information retrieval within a company can be referred to as enterprise search. With enterprise search, employees can find the information they need in the company's internal data. A business that takes advantage of the knowledge within its organization can save time and effort, and that knowledge can become a source of innovation and development within the company. In this project, two open source search engines, Recoll and Apache Solr, are selected, set up, and evaluated against the requirements and needs of the pharmacometric consulting company Pharmetheus AB. A requirement analysis is performed to collect system requirements at the company. Through a literature survey, two candidate search engines are selected. Lastly, a Proof of Concept is performed to demonstrate the feasibility of the search engines at the company. The search tools are evaluated on criteria including indexing performance, search functionality, and configurability. This thesis presents assessment questions to be used when evaluating a search tool. It is shown that the indexing time for both Recoll and Apache Solr appears to scale linearly for fewer than one hundred thousand PDF documents. The benefit of an index is demonstrated by the search times: both search engines greatly outperform the Linux command-line tools grep and find. It is also explained how the strict folder structure and naming conventions at the company can be used in Recoll to index only specific documents and sub-parts of a file share. Furthermore, I demonstrate how the Recoll web GUI can be modified to include functionality for filtering on document type. The results show that Recoll meets most of the company's system requirements and could therefore serve as an enterprise search engine at the company. However, the search engine lacks support for authentication, which has to be further investigated and implemented before the system can be put into production.
