Improving Scientific Research Worldwide
The Center for Open Science (COS; http://cos.io) is a non-profit technology organization dedicated to improving the alignment between scientific values and scientific practices. Open source software developed by COS – such as the Open Science Framework (OSF; http://osf.io) – is free to all scientists worldwide with the goal of supporting new discoveries that can change the world.
Currently 6,000 scientists utilize OSF, which is part network of research materials, part version control system, and part collaboration software. It is a Web service that integrates with the scientist’s daily workflow, helps document and archive materials and data, facilitates data sharing, and enables transparency. With the OSF, users can create and manage scientific research projects, collaborate with other researchers, and make project data publicly accessible. In just over one year of operation, users have downloaded 215,000 available documents.
"Solr was used in production as the search engine for OSF before implementing Elasticsearch," recalled Fabian von Feilitzsch, a summer intern (and soon to be full-time developer) at COS. "We are rapidly prototyping many new features and products. But Solr added a lot of complexity and overhead that prevented us from integrating search into our prototypes. It wasn’t worth the time."
Delivering 5x Search Performance
COS replaced Solr with Elasticsearch as the primary search engine for all internal and external content on OSF. All registered users and components of projects are indexed by Elasticsearch and searchable through the search bar on osf.io, and through OSF's API.
"We have an increasing amount of content in the system," said von Feilitzsch. "In addition, scientists are specialists that know exactly what type of content they want. Elasticsearch helps them find content on OSF much faster." von Feilitzsch points out that OSF searches in Solr took about 250 milliseconds, while Elasticsearch takes only 50 milliseconds – a 5x improvement in query response time.
"We tested Elasticsearch with 50,000 to 100,000 documents and did not see any slow down of performance," he added. "The key factor for us is speed of delivery to end-users. Elasticsearch delivers this value to us."
Additionally, von Feilitzsch noted that easy scalability is a major advantage of Elasticsearch. COS can quickly spin up additional Elasticsearch nodes to scale horizontally as the volume of content grows.
Enabling Plug-and-Play Flexibility
OSF is a highly modular system: COS develops all tools and features with a plug-and-play design. As OSF evolves, the COS development team adapts by continuously adding and removing components, which requires easy interaction with the search engine. Elasticsearch meets this need for flexibility.
"Elasticsearch allows us to focus on what the data should look like, and not worry about whether the data is compatible with the search engine," said von Feilitzsch. "Elasticsearch removes any worries about the search engine as a consideration. There is very little configuration needed. It just works."
One example is a quick filter on document types, which COS added to OSF search with a small change to a single Elasticsearch query. OSF can now filter by any document type in 50 milliseconds.
"It took us only 20 minutes to add the functionality in Elasticsearch," said von Feilitzsch. "This new feature adds more power to user searches – power we just couldn’t offer before on OSF."
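As a rough illustration of how small such a change can be, the sketch below builds a search request body with an optional document-type filter. It is not COS's actual code: the function name `build_search_query` and the field name `category` are hypothetical, and the body uses the `bool`/`filter` form of the current Elasticsearch query DSL, which may differ from the syntax OSF used at the time.

```python
def build_search_query(text, doc_type=None):
    """Return an Elasticsearch request body (a plain dict) for a text search.

    `doc_type` is optional; when given, results are restricted to documents
    whose (hypothetical) `category` field matches it exactly.
    """
    bool_query = {
        # Full-text part of the search: scored by relevance.
        "must": [{"query_string": {"query": text}}],
    }
    if doc_type is not None:
        # The "quick filter": one extra clause. Filter clauses do not
        # affect scoring; they only include or exclude documents.
        bool_query["filter"] = [{"term": {"category": doc_type}}]
    return {"query": {"bool": bool_query}}


# Unfiltered search versus a search limited to one document type.
unfiltered = build_search_query("replication")
projects_only = build_search_query("replication", doc_type="project")
```

The resulting dict could be passed as the request body to a search endpoint. Because the filter is a single optional clause, the unfiltered query is unchanged when no document type is selected.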
Enhancing the User Experience
"Elasticsearch is a means of delivering on the promise of providing better, more professional tools to scientists," said Andrew Sallans, Partnerships, Collaborations, and Funding at COS. "We are building this environment where people can store, manage, and share all of this content, but in order for them to want to use OSF, it must be a desirable environment to work in. Elasticsearch helps us improve the user experience for both the scientists producing the research and the users consuming the data."
Because features are easy to add, COS gains development productivity while also enhancing the user experience. Faster search queries contribute as well, and Elasticsearch capabilities such as filtering and boosting make search results more relevant.
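To make the idea of boosting concrete, here is a minimal sketch of a query that weights matches in one field more heavily than matches in another. The field names `title` and `description`, the function name `build_boosted_query`, and the boost value are assumptions chosen for illustration; the source does not say which fields or weights OSF uses.

```python
def build_boosted_query(text, title_boost=3):
    """Return a request body that searches two fields, favoring the title.

    The `field^N` notation in a multi_match query multiplies the score
    contribution of matches in that field by N.
    """
    return {
        "query": {
            "multi_match": {
                "query": text,
                # Hypothetical fields: a title match counts `title_boost`
                # times as much as a description match.
                "fields": [f"title^{title_boost}", "description"],
            }
        }
    }


boosted = build_boosted_query("open data")
```

With this body, a document whose title matches the search text would generally rank above one that only mentions it in the description.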
"We could have stayed with Solr, but it made our lives harder, and the user experience was not as good," von Feilitzsch concluded. "Elasticsearch raised the bar on both sides, bringing value to us, our users, and of course to our mission as a result."