Choosing an LLM: The 2024 getting started guide to open-source LLMs

It would be an absolute understatement to say that AI took off in 2023. Thousands of new AI tools were launched, AI features were added to existing apps, and Hollywood screeched to a halt with concerns over the tech. There’s even an AI tool that evaluates how well you sing like Freddie Mercury, because of course there is!

But behind every AI tool or feature, there’s a large language model (LLM) doing all the heavy lifting, and many of these models are open-source. An LLM is a deep learning algorithm capable of consuming huge amounts of data to understand and generate language. LLMs are built on a neural network architecture, which allows them to be trained to perform a variety of natural language processing (NLP) tasks such as content generation, translation, and categorization. This, combined with the availability of open-source LLMs, makes it much easier to automate key business tasks across multiple industries, such as developing customer support chatbots, detecting fraud, or aiding R&D like vaccine development. LLMs can also play a crucial role in improving cloud security, search, and observability by expanding how we process and analyze data.

As with any new technology, the use of LLMs also comes with challenges that need to be considered and addressed. The quality of a model’s output depends heavily on the quality of the data it’s been trained on. Many LLMs are trained on large public repositories of data and have a tendency to "hallucinate" or give inaccurate responses when they haven't been trained on domain-specific data. There are also privacy and copyright concerns around the collection, storage, and retention of personal information and user-generated content.

Check out our page on "What is a large language model?" to learn more about LLMs.

What is an open-source LLM?

An open-source LLM is an LLM that’s available for free and can be modified and customized by anyone.

With an open-source LLM, any person or business can use it for their own purposes without having to pay licensing fees. This includes deploying the LLM to their own infrastructure and fine-tuning it to fit their own needs.

This is the opposite of a closed-source LLM, which is a proprietary model owned by a single person or organization, with weights and source code that aren’t available to the public. The most famous example of this is OpenAI’s GPT series of models.

What are the best LLM use cases?

There are endless potential use cases for LLMs, but here are a few key capabilities to showcase the variety of what they can do:

  • Sentiment analysis: LLMs can be used to identify and classify subjective opinions collected from feedback, social media, etc.

  • Content creation: Several LLMs can generate contextually relevant content like articles, marketing copy, and product descriptions.

  • Chatbots: You can fine-tune LLMs to power chatbots that help or engage with your customers.

  • Translations: Using multilingual text data, LLMs can be used to translate human languages to aid communication.

  • Research: LLMs can make light work of research, being able to consume and process huge amounts of data and return the most relevant information.
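
To make the first of these concrete, here’s a minimal sketch of how a sentiment analysis task can be framed as a prompt. It assumes nothing about any particular model: `generate` is a hypothetical stand-in for whatever LLM call you use, `fake_generate` returns a canned reply so the sketch runs on its own, and the label set and prompt wording are illustrative choices.

```python
from typing import Callable

LABELS = ("positive", "negative", "neutral")  # illustrative label set


def build_prompt(feedback: str) -> str:
    """Frame sentiment classification as a text-completion task."""
    return (
        "Classify the sentiment of the customer feedback as one of: "
        f"{', '.join(LABELS)}.\n"
        f"Feedback: {feedback}\n"
        "Sentiment:"
    )


def parse_label(reply: str) -> str:
    """Map a free-text model reply onto one of the known labels."""
    words = reply.strip().lower().split()
    first = words[0].strip(".,!") if words else ""
    return first if first in LABELS else "unknown"


def classify(feedback: str, generate: Callable[[str], str]) -> str:
    # `generate` is a hypothetical placeholder for a real model call.
    return parse_label(generate(build_prompt(feedback)))


def fake_generate(prompt: str) -> str:
    """Canned reply so the sketch runs without a model."""
    return " Positive. The customer sounds happy."


print(classify("The new dashboard is great!", fake_generate))
```

In a real setup you would swap `fake_generate` for a call to your deployed model and keep the parsing step, since models often reply with more than just the label.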

1. GPT-NeoX-20B

Developed by EleutherAI, GPT-NeoX-20B is an autoregressive language model designed to architecturally resemble GPT-3. It’s been trained using the GPT-NeoX library with data from The Pile, an 800GB open-source data set hosted by The Eye.

GPT-NeoX-20B was primarily developed for research purposes and has 20 billion parameters you can use and customize.

Who is it for?
GPT-NeoX-20B is ideal for medium/large businesses that need advanced content generation, such as marketing agencies and media companies. These companies will need to have both skilled personnel and the computational power required to run a larger LLM.

Who is it not for?
This LLM isn’t suitable for small businesses or individuals without the financial and technical resources to manage the computational requirements. 

Complexity of use
As it’s not intended for deployment as-is, you will need the technical expertise to both deploy and fine-tune GPT-NeoX-20B for your specific tasks and needs.

2. GPT-J-6b

Also developed by EleutherAI, GPT-J-6b is a generative pre-trained transformer model designed to produce human-like text from a prompt. It’s a GPT-J model with 6 billion trainable parameters (hence the name).

It was trained on an English-language-only data set, which makes it unsuitable for translations or generating text in non-English languages.

Who is it for?
With its ease of use and relatively small size, GPT-J-6b is a good fit for startups and medium-sized businesses looking for a balance between performance and resource consumption.

Who is it not for?
This LLM may not be the best choice for enterprises requiring more advanced model performance and customization. It’s also not a good fit for companies that need multi-language support.

Complexity of use
GPT-J-6b is a moderately user-friendly LLM that benefits from having a supportive community, making it accessible for businesses with middling technical know-how.

3. Llama 2

Llama 2, Meta’s answer to Google and OpenAI’s popular LLMs, is trained on publicly available online data sources and is designed to create AI-driven experiences. It can be fine-tuned for specific tasks and is free for research and commercial use under Meta’s community license.

Building on Meta’s work on LLaMA, Llama 2 comes in three model sizes — 7 billion, 13 billion, and 70 billion parameters — making it a dynamic and scalable option.

Who is it for?
Because of the model size options, Llama 2 is a great option for researchers and educational developers who want to leverage extensive language models. It can even run on consumer-grade computers, making it a good option for hobbyists.

Who is it not for?
Llama 2 isn’t a good fit for higher-risk or more niche applications as it’s not intended for highly specialized tasks, and there are some concerns about the reliability of its output.

Complexity of use
It’s a relatively easy-to-use LLM with a focus on educational applications, but it will likely require customization for optimal results.

4. BLOOM

BLOOM is a decoder-only transformer language model that boasts a massive 176 billion parameters. It’s designed to generate text from a prompt and can be fine-tuned to carry out specific tasks such as text generation, summarization, embeddings, classification, and semantic search.

It was trained on a data set comprising hundreds of sources in 46 different languages, which also makes it a great option for language translation and multilingual output.

Who is it for?
BLOOM is great for larger businesses that target a global audience and require multilingual support. Due to the model’s size, businesses will also need to have ample available resources to run it.

Who is it not for?
Companies that operate solely in English-speaking markets may find its multilingual capabilities superfluous, especially with the considerable resources needed to customize and train such a large model.

Complexity of use
With the need for understanding language nuances and deployment in different linguistic contexts, BLOOM has a moderate to high complexity.

5. Falcon

Falcon is an LLM that looked at BLOOM and said “Pfft, only 176 billion parameters?”

Okay, they didn’t actually say that, but their open-source language model does come in three impressive sizes — 7 billion, 40 billion, and 180 billion.

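Falcon is an autoregressive LLM designed to generate text from a prompt, and it’s trained primarily on the high-quality RefinedWeb data set. The 7 billion and 40 billion models are licensed under the Apache License 2.0, while the 180 billion model is released under its own license terms.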

Who is it for?
Because of its excellent performance and scalability, Falcon is ideal for larger companies that are interested in multilingual solutions like website and marketing creation, investment analysis, and cybersecurity.

Who is it not for?
Although there is the 7 billion option, this still isn’t the best fit for businesses looking for a simple plug-and-play solution for content generation. The cost of customizing and training the model would still be too high for these types of tasks.

Complexity of use
Despite the huge size of the biggest model, Falcon is relatively easy to use compared to some other LLMs. But you still need to know the nuances of your specific tasks to get the best out of it.

6. CodeGen

This LLM from Salesforce is different from any other in this list because instead of outputting text answers or content, it outputs computer code. CodeGen is short for “code generation,” and that’s exactly what it does. It’s been trained to output code based on either existing code or natural language prompts.

Available in sizes ranging from 350 million to 16 billion parameters, CodeGen was designed to streamline software development.

Who is it for?
CodeGen is for tech companies and software development teams looking to automate coding tasks and improve developer productivity.

Who is it not for?
If your company doesn’t write or work with computer code, this LLM isn’t for you!

Complexity of use
CodeGen can be complex to integrate into existing development workflows, and it requires a solid background in software engineering.

7. BERT

One of the first modern LLMs, BERT is an encoder-only transformer architecture created by Google back in 2018. It’s designed to understand and represent human language rather than to generate long passages of text.

BERT has been used by Google itself to improve query understanding in its search, and it has also been effective in other tasks like question answering and sentiment analysis.

Who is it for?
Considering it’s a key part of Google’s own search, BERT is the best option for SEO specialists and content creators who want to optimize sites and content for search engines and improve content relevance.

Who is it not for?
Outside of SEO, BERT probably won’t be the best option in many situations because of its age, which makes it redundant compared to the bigger and newer alternatives.

Complexity of use
BERT is fairly straightforward for those familiar with SEO and content optimization, but it may require fine-tuning to keep up with changes in Google’s more recent SEO recommendations.

8. T5

T5 (short for the catchy Text-to-Text Transfer Transformer) is a transformer-based architecture that uses a text-to-text approach. It converts NLP problems into a format where the input and output are always text strings, which allows T5 to be used for a variety of tasks like translation, question answering, and classification. It’s available in five different sizes that range from 60 million parameters up to 11 billion.
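
As a rough illustration of the text-to-text idea, the sketch below casts three different tasks as plain strings by prepending a task prefix. The prefixes follow the style used with the original T5 checkpoints, but treat the exact strings, the `to_text_to_text` helper name, and the task keys as assumptions for illustration. No model is loaded here.

```python
# Task prefixes in the style of the original T5 setup (assumed here for illustration).
TASK_PREFIXES = {
    "translate_en_de": "translate English to German: ",
    "summarize": "summarize: ",
    "acceptability": "cola sentence: ",
}


def to_text_to_text(task: str, text: str) -> str:
    """Turn a (task, input) pair into the single input string a text-to-text model reads."""
    if task not in TASK_PREFIXES:
        raise ValueError(f"unknown task: {task}")
    return TASK_PREFIXES[task] + text.strip()


examples = [
    ("translate_en_de", "The house is wonderful."),
    ("summarize", "Open-source LLMs can be deployed on your own infrastructure."),
    ("acceptability", "The course is jumping well."),
]

for task, text in examples:
    print(to_text_to_text(task, text))
```

Whatever the task, the model’s answer also comes back as a string (a translation, a summary, or a label written out as text), which is what lets one model cover so many jobs.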

Who is it for?
T5 is great for companies that require a versatile tool for a variety of text-to-text processing tasks, such as summarization, translation, and classification.

Who is it not for?
Despite T5’s relative flexibility, it’s unsuitable for tasks that require any sort of non-text output. 

Complexity of use
T5 is generally considered easy to use compared to other LLMs, with a range of pre-trained models available. But it may still require some expertise to adapt to more niche or specific tasks.

Disclaimer: All parameters and model sizes are correct at the time of publication but may have changed since.
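
To get a feel for what those parameter counts mean in practice, here’s a back-of-the-envelope estimate of the memory needed just to hold a model’s weights. It assumes 2 bytes per parameter (16-bit weights) by default and ignores activations, caches, and any runtime overhead, so real requirements will be higher. The helper name and the selection of models are illustrative.

```python
def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory for model weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


# Parameter counts as quoted in this guide.
models = {
    "GPT-J-6b": 6e9,
    "GPT-NeoX-20B": 20e9,
    "Llama 2 (70 billion)": 70e9,
    "BLOOM": 176e9,
}

for name, params in models.items():
    fp16 = weight_memory_gb(params)     # 16-bit weights
    int8 = weight_memory_gb(params, 1)  # 8-bit quantized weights (an assumption)
    print(f"{name}: ~{fp16:.0f} GB at 16-bit, ~{int8:.0f} GB at 8-bit")
```

Even this crude arithmetic shows why the smaller models can fit on a single well-equipped machine while the largest ones need multiple accelerators.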

Choosing the right LLM for your business

There are several key criteria you need to consider as you decide which open-source LLM to use:

  • Cost: As these LLMs are open-source, you don’t need to pay for the models themselves. But you do need to think about the cost of hosting, training, resources, etc. The bigger and more complex an LLM, the more it’ll likely cost you. This is because a bigger LLM requires more data storage, more processing power, bigger infrastructure, and more maintenance.

  • Accuracy: Evaluating the accuracy of your options is essential. You need to compare how accurately different LLMs can carry out the types of tasks you need. For example, some models will be domain-specific, and some can be improved with fine-tuning or retrieval augmented generation (RAG).

  • Performance: The performance of an LLM is measured with things like language fluency, coherence, and context comprehension. The better the LLM is at these things, the better it will perform. This will improve the user experience and task effectiveness and give you a competitive advantage. 

  • Data security: The security of your data is another key consideration. It is especially important if you’re handling sensitive data or personally identifiable information (PII). This is another area where RAG could be useful, as you can control access to data using document-level security and restrict security permissions to particular data.

  • Task-specific vs. general-purpose: Consider whether you need an LLM that solves more specific use cases or one that covers a broader spectrum of tasks. Because some models are domain-specific, you need to be careful to either select one within your domain or find one with a wider-reaching scope. 

  • Quality of training data: If the quality of the data isn’t good, the results won’t be either. Assess the data each LLM uses and pick one you have confidence in. RAG will also help you with this, as you can use custom data, which can be prepared and fine-tuned to directly improve the quality of the output.

  • Skillset: Another big factor to consider is the existing skillset you have within your project team. Experience in things like data science, MLOps, and NLP is a must. The more complex the LLM, the deeper the skillset your team will need to have. If you’re more limited on this front, it’s worth focusing on the simpler LLMs, or even looking to bring in more expertise.
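
Several of the criteria above lean on RAG, so here’s a toy sketch of the idea with a document-level permission check applied before retrieval. Everything in it is an illustrative assumption: the `Document` fields, the role names, and the word-overlap scoring, which stands in for a real search engine or vector store. The point is only that documents a user isn’t allowed to see never reach the prompt.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    text: str
    allowed_roles: frozenset  # roles permitted to read this document


def retrieve(query: str, docs: list, user_roles: set, k: int = 2) -> list:
    """Return up to k permitted documents ranked by naive word overlap."""
    query_words = set(query.lower().split())
    visible = [d for d in docs if d.allowed_roles & user_roles]  # security filter first
    scored = [(len(query_words & set(d.text.lower().split())), d) for d in visible]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)
    return [d for _, d in ranked[:k]]


def build_prompt(query: str, context: list) -> str:
    """Combine retrieved context with the question for the LLM."""
    context_block = "\n".join(f"- {d.text}" for d in context)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"


docs = [
    Document("Refund policy: refunds are issued within 14 days.", frozenset({"support", "finance"})),
    Document("Payroll data: salaries are paid on the 25th.", frozenset({"finance"})),
]

context = retrieve("when are refunds issued", docs, user_roles={"support"})
print(build_prompt("when are refunds issued", context))
```

Filtering before ranking, rather than after generation, means the model can’t leak restricted content because it never sees it.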

Using these criteria, you should be able to decide which of the LLMs we’ve covered is the best fit for your unique circumstances.
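
One simple way to apply these criteria is a weighted scoring matrix. The sketch below is purely illustrative: the weights, the 1-to-5 scores, and the candidate names are made-up placeholders that you’d replace with your own assessment, not ratings of any real model.

```python
# Relative importance of each criterion (placeholder values that sum to 1.0).
WEIGHTS = {"cost": 0.3, "accuracy": 0.3, "data_security": 0.2, "skillset_fit": 0.2}

# Hypothetical 1-5 scores from your own evaluation (higher is better).
candidates = {
    "model_a": {"cost": 4, "accuracy": 3, "data_security": 5, "skillset_fit": 4},
    "model_b": {"cost": 2, "accuracy": 5, "data_security": 4, "skillset_fit": 2},
}


def weighted_score(scores: dict, weights: dict = WEIGHTS) -> float:
    """Sum of score times weight over all criteria."""
    return sum(weights[criterion] * scores[criterion] for criterion in weights)


ranking = sorted(candidates, key=lambda name: weighted_score(candidates[name]), reverse=True)
for name in ranking:
    print(f"{name}: {weighted_score(candidates[name]):.2f}")
```

The numbers matter less than the exercise: writing down weights forces your team to agree on which criteria actually drive the decision.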

The best approach is to take your time, look at the options listed, and evaluate them based on how they can best help you solve your problems. All of these open-source LLMs are hugely powerful and can be transformative if utilized effectively.

The release and timing of any features or functionality described in this post remain at Elastic's sole discretion. Any features or functionality not currently available may not be delivered on time or at all.

In this blog post, we may have used or referred to third party generative AI tools, which are owned and operated by their respective owners. Elastic does not have any control over the third party tools and we have no responsibility or liability for their content, operation or use, nor for any loss or damage that may arise from your use of such tools. Please exercise caution when using AI tools with personal, sensitive or confidential information. Any data you submit may be used for AI training or other purposes. There is no guarantee that information you provide will be kept secure or confidential. You should familiarize yourself with the privacy practices and terms of use of any generative AI tools prior to use. 

Elastic, Elasticsearch, ESRE, Elasticsearch Relevance Engine and associated marks are trademarks, logos or registered trademarks of Elasticsearch N.V. in the United States and other countries. All other company and product names are trademarks, logos or registered trademarks of their respective owners.