
How to deploy LLMs Part 1: Where do your models live?

Written by Laer AI | 6/17/24 5:55 PM

In the early days of first-generation AI models – such as logistic regression or support vector machines – the industry did not frequently encounter the question “Where do my models live?”  

The assumption had always been that the models live where the data lives, and so if you had an on-prem Relativity Assisted Review license, for example, the models lived “locally” in your data center or in your own cloud instance. 

Things are different now with Large Language Models: clients face a new kind of technology with tremendous capabilities but also a new set of challenges. This blog post is about the questions you should ask your provider and, more importantly, what their answers mean for you: the quality and performance you will get from their solution and, above all, the security and privacy implications. 

Why ask now?

“Where does your model live?” has suddenly become a prevalent question because these models, specifically Large Language Models, have outgrown the computational footprint available to most Fortune 100 companies, let alone solution providers themselves. As a result, very few companies have the resources to build and train these large language models. Companies such as OpenAI (GPT), Anthropic (Claude), and Google (Gemini) have taken on those massive investments and made their models available via APIs, which has, in the past couple of years, spawned thousands of service providers who package that API service as part of their own solution. In legal services, we often see those offerings in document review, document summarization, and contract review. 
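
To make this concrete, here is a minimal sketch of what a provider-side integration with a third-party LLM API might look like. It uses the OpenAI Python client; the model name, prompt, and function are placeholders for illustration, not any particular provider's implementation:

    # A minimal illustration of wrapping a third-party LLM API inside a provider's solution.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    def summarize_document(text: str) -> str:
        # The document text leaves your environment and is sent to the third-party API.
        response = client.chat.completions.create(
            model="gpt-4o",  # placeholder model name
            messages=[
                {"role": "system", "content": "You summarize legal documents."},
                {"role": "user", "content": text},
            ],
        )
        return response.choices[0].message.content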

The wide availability of third-party services brings significant advantages, but it also raises many new questions that we never previously encountered:

  • What information am I sharing with third-party LLMs?
  • Is my information stored and used to improve their models?
  • Do I have control over those models and if not, who does?

The list of questions only gets longer as open-source LLMs of all shapes and sizes become available, which, with the right infrastructure, make it possible to build deployments that do not rely on third-party providers at all. In that case, it becomes important to understand what you would gain and sacrifice by avoiding solutions that rely on third-party service providers like OpenAI.

How are LLMs typically deployed?

There are various deployment options for an LLM service, some of which we may already be familiar with and others that have evolved out of necessity. 

In classical first-gen AI applications, such as contract review or document review, which were primarily classification tasks, the AI models were small enough to live together with your data, whether in your own data center or in a managed cloud environment, separate from other clients:

As model sizes grew exponentially into what we call Large Language Models today, there are now effectively two places where these models can live relative to your data:

  1. A shared third-party model hosted by a company such as OpenAI, where the Large Language Model is external to both the solution provider's environment and your own: 

  2. A Large Language Model hosted and deployed directly by your solution provider, in their own cloud environment: 

(Service providers incur significant compute and infrastructure costs in hosting an LLM and are therefore likely to deploy the LLM as a service shared by their clients – not too dissimilar to how a public LLM service like OpenAI gets shared among all its users globally, but more on that later.) 
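
To illustrate the difference between the two architectures, the sketch below points the same client code either at an external third-party API or at an OpenAI-compatible inference server (such as vLLM) running inside the solution provider's own cloud; the base URL and environment variable names are hypothetical:

    import os
    from openai import OpenAI

    # Option 1: shared third-party model -- requests leave both your environment
    # and the solution provider's environment.
    external_llm = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    # Option 2: provider-hosted model -- an OpenAI-compatible inference server
    # running inside the solution provider's own cloud. URL and key name are hypothetical.
    provider_llm = OpenAI(
        base_url="https://llm.internal.provider.example/v1",
        api_key=os.environ["PROVIDER_LLM_KEY"],
    )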

Given this primer on the variety of deployment architectures that exist today, it is natural to ask the following questions: 

Is there any risk in a solution provider relying on an external API service like OpenAI? 

As with everything, the most general answer is “it depends” – consider three factors: 

  1. The extent to which the third party, such as OpenAI, retains the data sent to its API. 
  2. The nature and quantity of the data sent. 
  3. The third party's Terms of Service governing what they can do with your data. 

Considering each in turn:

The extent to which the third party (e.g., OpenAI) retains the data sent to its API.

With respect to the first point, solution providers will have different policies on how long the data sent to their API is retained, and those policies are likely to flow through from the terms of service with their third-party LLM API provider. For example, Microsoft has a default 30-day retention policy for all data sent to its GPT API; however, solution providers may negotiate an exemption to that policy, and many have obtained a zero-day retention policy. We recommend that you inquire directly with your solution provider about the retention policy they have in place with their third-party API service provider. It is important to distinguish here between retention and use of your data: retention does not imply that the third party can use your data to train its models (or otherwise). Under most service agreements (a question we will come back to below), your data cannot be used by the third party.

The nature and quantity of the data sent.

The nature and quantity of the data is entirely application specific. For example, a solution that only uses an external API on public data, such as public court filings, poses a different risk profile than a solution that sends a client's discovery data to the API. Drilling down further, within discovery applications, different solution providers may send drastically different volumes of sensitive data to the API. For example, solutions that perform targeted question answering over the data would typically send at most a dozen snippets of documents (out of millions) retrieved through a search engine to the external API, while solutions that rely on the third-party API to perform document classification would send millions of documents.
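
The contrast in exposure can be sketched in a few lines of illustrative code; the search index, LLM client, and prompts below are hypothetical stand-ins rather than any particular product's API:

    def answer_question(question, search_index, llm, k=10):
        # Targeted question answering: only the top-k retrieved snippets
        # (out of potentially millions of documents) are sent to the external API.
        snippets = search_index.search(question, top_k=k)
        excerpts = "\n".join(snippets)
        prompt = f"Answer the question using these excerpts:\n{excerpts}\n\nQuestion: {question}"
        return llm.complete(prompt)

    def classify_corpus(documents, llm):
        # API-based classification: every document in the corpus is sent to the
        # external API, one request per document.
        return [llm.complete(f"Label this document as responsive or not:\n{doc}")
                for doc in documents]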

The third party's Terms of Service governing what they can do with your data.

Finally, it’s important to understand that as the number of third-party API providers for LLMs grows, not all of them are created equal when it comes to what they can do with the data that’s sent to them. As an example, the GPT-4 API provided directly by OpenAI can explicitly use your data to improve its models, while the same GPT-4 model provided through Microsoft Azure cannot. It is important to ask your solution provider about the terms of service of their external API service and what, specifically, they can and cannot do with your data. 
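
As a concrete illustration, the same family of GPT-4 models can be reached through two different services, each governed by its own terms of service; the endpoint, key names, and API version below are placeholders:

    import os
    from openai import OpenAI, AzureOpenAI

    # Same underlying model family, different contractual terms depending on the service.
    openai_direct = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    azure_hosted = AzureOpenAI(
        azure_endpoint="https://my-resource.openai.azure.com",  # placeholder endpoint
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )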

What about solution providers that deploy their own LLMs? Are there risks there? 

When the solution provider deploys an LLM (typically based on one of the open-source models available, such as Llama-3), they have direct control over data retention and use. As mentioned earlier, an LLM deployed by a solution provider often requires heavy computational resources – at significant financial expenditure – and therefore, very likely, a shared architecture: 

A “Dedicated LLM Deployment” requires a solution provider to provision compute (CPU/GPU/memory) resources for only one client. This implies renting sufficient GPU resources from a cloud provider and thus results in higher operating costs. The advantage of that approach is the solution provider's flexibility to customize the LLM for each client by using that client's data to improve the model, without mixing it with data from other clients.  
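
For a sense of what self-hosting involves, here is a minimal sketch that loads an open-source model such as Llama-3 with Hugging Face transformers; a production deployment would sit behind a dedicated inference server, and the model shown is gated behind Meta's license:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # requires accepting the model license
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, torch_dtype=torch.bfloat16, device_map="auto"
    )

    messages = [{"role": "user", "content": "Summarize the key obligations in this clause: ..."}]
    inputs = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    outputs = model.generate(inputs, max_new_tokens=256)
    print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))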

The shared LLM model, however, raises issues similar to those of the third-party LLM service outlined earlier. While it may seem concerning at first glance, as long as the data from different clients is not used to update the underlying LLM, the architecture is very similar to that of a third-party API service like OpenAI and (at least theoretically) has the same risk profile.