The Center for Open Science (COS; http://cos.io) is a non-profit technology organization dedicated to improving the alignment between scientific values and scientific practices. Open source software developed by COS – such as the Open Science Framework (OSF; http://osf.io) – is free to all scientists worldwide with the goal of supporting new discoveries that can change the world.
Improve Search Performance
- Gain 5x faster query response time
- Handle searches across 215,000 documents
- Boost relevance of search results
Increase Developer Productivity
- Easily scale to meet any demand
- Add new search features in minutes
- Add search to prototypes in under one hour
Currently, 6,000 scientists use OSF, which is part network of research materials, part version control system, and part collaboration software. It is a Web service that integrates with the scientist’s daily workflow, helps document and archive materials and data, facilitates data sharing, and enables transparency. With the OSF, users can create and manage scientific research projects, collaborate with other researchers, and make project data publicly accessible. In just over one year of operation, users have downloaded 215,000 available documents.
"Solr was used in production as the search engine for OSF before implementing Elasticsearch," recalled Fabian von Feilitzsch, a summer intern (and soon to be full-time developer) at COS. "We are rapidly prototyping many new features and products. But Solr added a lot of complexity and overhead that prevented us from integrating search into our prototypes. It wasn’t worth the time."
COS replaced Solr with Elasticsearch as the primary search engine for all internal and external content on OSF. All registered users and project components are indexed by Elasticsearch and searchable through both the search bar on osf.io and OSF's API.
"We have an increasing amount of content in the system," said von Feilitzsch. "In addition, scientists are specialists that know exactly what type of content they want. Elasticsearch helps them find content on OSF much faster." von Feilitzsch points out that OSF searches in Solr took about 250 milliseconds, while Elasticsearch takes only 50 milliseconds – a 5x improvement in query response time.
"We tested Elasticsearch with 50,000 to 100,000 documents and did not see any slow down of performance," he added. "The key factor for us is speed of delivery to end-users. Elasticsearch delivers this value to us."Additionally, von Feilitzsch noted that easy scalability is a major advantage of Elasticsearch. COS can quickly spin up several Elasticsearch nodes to deliver infinite horizontal scalability.
OSF is a highly modular system, and COS develops all tools and features with a plug-and-play design. As OSF evolves, the COS development team adapts by continuously adding and removing components, which requires easy interaction with the search engine. Elasticsearch meets this need for flexibility.
"With Elasticsearch, it is very easy to integrate search into any prototype. We recently prototyped a new service and integrated Elasticsearch in less than one hour. It is absurdly easy. Because search is so much simpler with Elasticsearch, it is easier to add additional features to the search engine and to keep up-to-date with changes happening on the backend of OSF. This was much more difficult with Solr, when we had to manually define everything that we were changing."
"Elasticsearch is a means of delivering on the promise of providing better, more professional tools to scientists," said Andrew Sallans, Partnerships, Collaborations, and Funding at COS. "We are building this environment where people can store, manage, and share all of this content, but in order for them to want to use OSF, it must be a desirable environment to work in. Elasticsearch helps us improve the user experience for both the scientists producing the research and the users consuming the data."
Adding features more easily raises developer productivity at COS and improves the user experience, as do faster search queries. In addition, Elasticsearch capabilities such as filtering and boosting increase the relevance of search results.
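To make filtering and boosting concrete, here is a minimal sketch of a search request body that combines the two. It is not COS's actual query: the field names `title`, `description`, and `category` and the helper `build_search_body` are hypothetical, and the `bool` query form shown follows current Elasticsearch query DSL, which may differ from the version COS ran.

```python
import json


def build_search_body(text, category=None):
    """Build an Elasticsearch request body (hypothetical field names)."""
    bool_query = {
        "must": [
            {
                "multi_match": {
                    "query": text,
                    # Boosting: a match in "title" weighs three times
                    # as much as a match in "description".
                    "fields": ["title^3", "description"],
                }
            }
        ]
    }
    if category is not None:
        # Filtering: restrict hits to one category. Filter clauses
        # narrow the result set without affecting relevance scores.
        bool_query["filter"] = [{"term": {"category": category}}]
    return {"query": {"bool": bool_query}}


body = build_search_body("replication study", category="project")
print(json.dumps(body, indent=2))
```

The resulting dictionary could be sent as the JSON body of a search request. Keeping the filter separate from the scored `must` clause means the category restriction does not change how the remaining results are ranked.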
"We could have stayed with Solr, but it made our lives harder, and the user experience was not as good," von Feilitzsch concluded. "Elasticsearch raised the bar on both sides, bringing value to us, our users, and of course to our mission as a result."