Knowledge Discovery Engine: A Full-Stack Scientific Search System
Senior Manager, Data Science and Machine Learning, Unilever
The availability of open-source natural language processing (NLP) tools, together with widespread access to GPT-3.5/4, has made it possible for organizations to build customized search engines and knowledge processing pipelines that suit their goals. We'll detail the architecture of one such system, built by Unilever to process and collate a large volume of scientific documents relevant to the consumer goods industry. Open-source tools were used to build a semantic search engine and a knowledge graph that supports concept navigation and scientific discovery. Large language models are used to guide and augment user queries and to improve open-source named entity recognition and relation extraction tools.
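To make the two core components concrete, the following is a minimal sketch, not the system described in this talk. It assumes a toy bag-of-words vector as a stand-in for a learned text embedding (so it matches on shared words rather than meaning), and a knowledge graph stored as a plain adjacency map built from (subject, relation, object) triples of the kind a relation extraction tool might emit. The function names, documents, and triples are all invented for illustration.

```python
import math
from collections import Counter, defaultdict


def embed(text):
    """Toy stand-in for a learned embedding: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query, docs, top_k=2):
    """Rank documents by similarity to the query; drop zero-score matches."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), doc_id) for doc_id, text in docs.items()]
    scored.sort(reverse=True)
    return [doc_id for score, doc_id in scored[:top_k] if score > 0]


def build_graph(triples):
    """Index (subject, relation, object) triples by subject for navigation."""
    graph = defaultdict(set)
    for subj, rel, obj in triples:
        graph[subj].add((rel, obj))
    return graph


# Hypothetical documents and extracted triples, invented for this sketch.
DOCS = {
    "d1": "surfactant stability in liquid detergent formulations",
    "d2": "enzyme activity in cold water laundry detergent",
    "d3": "skin microbiome response to moisturiser ingredients",
}
TRIPLES = [
    ("protease", "is_a", "enzyme"),
    ("protease", "used_in", "laundry detergent"),
    ("enzyme", "active_in", "cold water"),
]

results = search("laundry detergent enzyme", DOCS)
graph = build_graph(TRIPLES)
```

In this sketch, `results` lists the document identifiers ranked for the query, and `graph["protease"]` returns the relations leading out of that concept, which is the basic operation behind concept navigation. A real pipeline would swap `embed` for a sentence-embedding model and populate the triples from named entity recognition and relation extraction output.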