From off-the-shelf to tailor-made: deploying LLMs

Generative AI has captured the imagination, with accessible large language models (LLMs) among its most visible applications. However, businesses must not assume they can deploy a generic LLM and reap instant rewards. Training, data privacy and the need for evaluation are just some of the topics enterprises need to consider.

Artificial intelligence has existed for many years, but the recent explosion in interest can be traced back to the release of several free-to-access LLMs: ChatGPT, followed closely by Bard and Copilot.

Yet businesses should be careful not to mistake prominence for a solution. Many enterprises believe a single LLM will solve their problems and provide a route to AI-optimized growth.

And while LLMs can be powerful tools that transform operations, they are by their very nature generic. LLMs cannot simply be used off the shelf; they require significant time to be shaped to specific business needs.

The importance of use cases

What does that mean for businesses looking to deploy them? Before looking at the best way to turn an LLM from off-the-shelf into made-to-measure, it is important to consider why enterprises use them.

These models have many potential deployments, from summarizing documents and providing rough drafts of content to generating images, analyzing contracts and developing code.

But before they begin, businesses need to be specific about their use cases. As we discuss in this new article, any AI investment begins with identifying the right use case, and LLMs are no different.

Do you prompt, or do you fine-tune?

Defining the use case shapes how the LLM will be tailored. Enterprises then have a choice: fine-tune the model or use prompt engineering.

Each has its pros and cons.

Fine-tuning involves taking a standard LLM and training it further on your own data. This is particularly useful when you have hyper-specific use cases, as the result will be relevant to your exact needs. It also only needs to be completed once. However, it does mean you are sharing private or potentially sensitive data with the model.
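To make this concrete, here is a minimal sketch of what fine-tuning might look like with the open-source Hugging Face transformers library. The base model, the training file (company_corpus.txt) and the hyperparameters are illustrative placeholders, not recommendations; your chosen model and platform will dictate the details.

```python
# Minimal fine-tuning sketch using Hugging Face transformers.
# Model name, file path and hyperparameters are illustrative placeholders.
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # stand-in for whichever base LLM you license
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Your proprietary examples, one text record per line (hypothetical file).
dataset = load_dataset("text", data_files={"train": "company_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset["train"].map(tokenize, batched=True,
                                 remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetuned-model",
                           num_train_epochs=1,
                           per_device_train_batch_size=4),
    train_dataset=tokenized,
    # The collator builds labels for causal LM training (mlm=False).
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()  # your data is now baked into the model weights
```

Note the trade-off in that last comment: once training completes, the private data has effectively been shared with, and encoded into, the model.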

Prompt engineering involves directing the model by supplying the context and background information it needs at query time.

Within prompt engineering, you can implement Retrieval-Augmented Generation (RAG). Here, you can take several approaches to prompting the model. At a very basic level, you provide everything the LLM needs directly in the prompt; alternatively, you deploy a more advanced method whereby you split your source material into sections and have the model focus only on those most relevant to the question.
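As an illustration of that more advanced approach, the sketch below scores stored passages against a question and builds a prompt from only the top matches. The bag-of-words similarity is a deliberately simple stand-in; a production RAG pipeline would use a real embedding model and a vector store.

```python
# Minimal RAG sketch: rank stored passages against the question, then
# assemble a prompt from the top matches only.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Toy term-frequency "embedding"; swap in a real embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[word] for word, count in a.items())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def build_prompt(question: str, passages: list[str], k: int = 3) -> str:
    """Keep only the k most relevant passages instead of sending everything."""
    q_vec = embed(question)
    top = sorted(passages, key=lambda p: cosine(embed(p), q_vec),
                 reverse=True)[:k]
    context = "\n\n".join(top)
    return ("Answer using only the context below. If the context is "
            f"insufficient, say so.\n\nContext:\n{context}\n\n"
            f"Question: {question}")

# The returned string is what you would send to your chosen LLM.
print(build_prompt("What is our refund policy?",
                   ["Refunds are issued within 14 days of purchase.",
                    "Our offices are closed on public holidays.",
                    "Shipping takes 3-5 business days."], k=1))
```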

Prompt engineering exposes less of your data to the LLM, but it is an ongoing process: each response needs to be reviewed and, where necessary, re-prompted. This means you will need subject matter experts to work on the prompts and check outputs.

The approach you choose will depend on several factors, including your use case, the sensitivity of the data needed to tailor the LLM (if fine-tuning), and whether you have the internal resources to create prompts (for RAG).

Other LLM considerations

However, shaping your LLM to your requirements is just the first step in successfully implementing it into your operations.

There are several other areas you must consider when planning your deployment:

1. Combatting hallucinations

What is the primary function of LLMs? The clue is in the name given to this branch of AI: generative. LLMs are designed to generate outputs. They learn from data, whether that's the initial information they were trained on or the more specific inputs you provide when tailoring them to your needs, but above all else, LLMs have to create. That means a model will still give you something even if it doesn't know the answer. This is what's known as hallucination, and businesses that do not address the issue will have a serious problem.

First, you must be aware that no matter how well you train your model, hallucinations remain a possibility. To combat this, deploy guardrails with a clear framework that embeds a process of reviewing outputs for quality and accuracy. This is necessary even with fine-tuning; ultimately, LLMs are built with billions of parameters, so there is a strong likelihood a model will face a scenario that hasn't been accounted for in your tailoring.

It is good practice to have this evaluation framework in place even at the proof-of-concept (PoC) stage. That way, checking how faithful answers are becomes another part of the overall process, with any learnings built into subsequent iterations.
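As one simple illustration, a guardrail might flag answers that share too little vocabulary with the context they were generated from and route them to a human reviewer. The sketch below does exactly that; the word-overlap heuristic and the 0.6 threshold are illustrative assumptions, and a real evaluation framework would combine several such checks.

```python
# One guardrail among many: flag answers poorly grounded in their source.
def grounding_score(answer: str, context: str) -> float:
    """Fraction of answer words that also appear in the source context."""
    answer_words = set(answer.lower().split())
    context_words = set(context.lower().split())
    if not answer_words:
        return 0.0
    return len(answer_words & context_words) / len(answer_words)

def review_needed(answer: str, context: str, threshold: float = 0.6) -> bool:
    # Low overlap suggests invented content: route to a human reviewer
    # instead of returning the answer directly.
    return grounding_score(answer, context) < threshold

context = "Refunds are issued within 14 days of purchase."
print(review_needed("Refunds are issued within 14 days.", context))  # False
print(review_needed("Refunds take 90 days and need a notarized form.",
                    context))  # True: flag for human review
```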

2. What are the data privacy implications?

You have to remember that LLMs are third-party services. As such, any data you give them, whether through fine-tuning or as part of RAG background information, is going to a third party. Therefore, you must review whether that exposes you to regulatory obligations and what the compliance implications are. Depending on the use case, you may wish to anonymize data. For instance, when working on a contract, you could strip out sensitive information, run the LLM, and then reinstate the sensitive information once the work is complete. However, the data will necessarily be less precise, limiting the efficacy of the outputs.
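That strip-and-reinstate workflow can be sketched as follows. The regex patterns and placeholder format are illustrative assumptions; production redaction would typically rely on dedicated PII-detection tooling and legal review.

```python
# Minimal redact-process-restore sketch. Patterns are illustrative only.
import re

PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace sensitive values with placeholders, keeping a reverse map."""
    mapping: dict[str, str] = {}
    for label, pattern in PATTERNS.items():
        for i, match in enumerate(pattern.findall(text)):
            placeholder = f"[{label}_{i}]"
            mapping[placeholder] = match
            text = text.replace(match, placeholder)
    return text, mapping

def restore(text: str, mapping: dict[str, str]) -> str:
    """Swap the placeholders back in after the LLM has done its work."""
    for placeholder, original in mapping.items():
        text = text.replace(placeholder, original)
    return text

safe_text, mapping = redact(
    "Contact jane.doe@example.com or +33 1 23 45 67 89.")
# ... send safe_text to the LLM and receive its output here ...
print(restore(safe_text, mapping))
```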

Again, a clear framework can help balance quality output and compliance. This might involve having a governance team review information before it is given to the LLM.

3. Are your LLMs sustainable?

LLMs are sophisticated feats of software engineering, and they're only getting more complex. New iterations offer larger context windows and longer answers. This is great for improving the potential quality of outputs, but it also demands a huge amount of compute power. That means greater energy consumption, which could undermine your sustainability efforts.

One way of combatting this is to interrogate the implications of your prompts. Beyond a certain point, how detailed do your RAG prompts really need to be, and will they generate long answers of limited value? Another consideration is how repetitive your requests are. It may well be worth creating a library of vetted and audited prompts and responses, updated as required, rather than having multiple people ask only slight variations of the same question.
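Such a library could start as simply as the sketch below, where near-duplicate questions reuse an audited answer instead of triggering a fresh model call. The normalization step shown is an assumption kept deliberately simple.

```python
# Sketch of a vetted prompt-and-response library: repeat questions reuse
# an audited answer rather than paying for another inference run.
class PromptLibrary:
    def __init__(self):
        self._entries: dict[str, str] = {}

    @staticmethod
    def _normalize(prompt: str) -> str:
        # Collapse trivial variations (case, spacing) so "What is X?"
        # and "what is  x?" hit the same entry.
        return " ".join(prompt.lower().split())

    def add(self, prompt: str, vetted_response: str) -> None:
        self._entries[self._normalize(prompt)] = vetted_response

    def lookup(self, prompt: str) -> str | None:
        return self._entries.get(self._normalize(prompt))

library = PromptLibrary()
library.add("What is our refund policy?",
            "Refunds are issued within 14 days.")

answer = library.lookup("what is our  refund policy?")
if answer is None:
    pass  # only now pay the compute cost of calling the LLM
print(answer)  # reused, audited response; no extra inference required
```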

Training LLMs for your needs

LLMs have immense potential and can drive untold benefits for organizations. But they should not be seen as a silver bullet, a one-time deployment that can be left to run by itself. These generic models become relevant to your business cases only through significant training, fine-tuning, or RAG. It's quite simple: if you want to reap the rewards of integrating LLMs into your operations, you must invest time and resources into tailoring them to your needs.

LLMs may be one of the more visible aspects of the AI revolution, but are you aware of what they mean for security, data or your network? Our new eBook dives into each area, helping you understand what investing in AI means for your organization.

Yasser Marey

Yasser Marey is a veteran software engineer with a master's degree in computer science, specializing in machine learning. He has extensive experience in envisioning and developing software systems. Yasser's journey in the field of AI began in 1996, when he crafted his first neural network. His latest achievement is the successful launch of Labeeb, a generative AI system at Orange Business, which is currently used by over 500 users across the company.