How to deploy LLMs Part 2: Public or Private


In Part 1 of our series, "How to Deploy LLMs," we discussed the risks associated with different deployment options. In this installment, we delve deeper into the pros and cons of each deployment architecture and examine how the options stack up in terms of performance, cost, and robustness, so you can choose the deployment strategy that best aligns with your objectives and operational requirements.

 

  • External Third-Party LLM API (e.g., OpenAI) 
    • Pro – Potentially lower cost: It is impossible to make a general assessment of the relative cost savings of using a third-party LLM API, but several factors drive the comparison, and compute utilization is chief among them. External APIs, such as OpenAI’s GPT-4, may be more expensive on a per-token basis than open-source models; however, an open-source model also incurs the infrastructure costs of hosting it, which can accrue whether the model is in use or not. Dedicated deployments, therefore, may often be more expensive to operate than the third-party API option. An open-source model shared among clients would be less expensive than a dedicated deployment, though the savings depend on how many clients share the service provider’s hosted LLM. (The back-of-envelope cost sketch after this list illustrates the trade-off.)
    • Pro – Potentially higher performance: The performance of any end-user product depends on more than just the LLM; the model is only one component in a larger architecture, so there are many other considerations and limiting factors. Benchmarks such as open-source leaderboards and published studies are worth consulting but are certainly not conclusive. Different models perform differently relative to one another on various tasks (e.g., summarization or question answering), and in general GPT-4, and likely its future iterations, will outperform open-source models; often, however, a combination of prompting, tuning, and other architectural design supporting a smaller model will outpace a larger model deployed in a less sophisticated architecture. To evaluate performance, it is wiser to benchmark, end to end, the solutions in which your technology provider is deploying or integrating these LLMs; performance benchmarks for the LLM alone can be misleading.
    • Con – Limited tuning (customization): This point follows from the previous one regarding performance. Tuning (and fine-tuning), which involves modifying a portion of the original model to perform better at a specific task, is much more limited with external models like OpenAI GPT.
    • Con – Potential data retention and use policies: As discussed in Part 1 of this series, external LLM API providers may have policies in place regarding data retention and use, described in their services agreement with your solution provider. Be certain to inquire about these terms early on.
    • Con – No control over versioning (less robust): Solution providers relying on a third-party LLM API are at the mercy of that provider regarding any changes or updates to the underlying model. Third-party LLM providers typically update and improve their models continuously. That sounds like a good thing, but it may lead to sudden and unexpected changes in performance on any downstream task that leveraged the outputs of a prior version of the model. These changes may not be noticeable or particularly critical to most users, but in the legal services industry, where consistency and repose are critical to defensibility (for example, in document review), an update could, and often does, result in a change to a document classification or designation, which could throw the defensibility and quality of a prior production into serious question. (Some providers let you pin a dated model snapshot, which softens this risk; see the pinning sketch after this list.)
  • Custom LLM provided by solution provider 
    • Pro – Full control over data retention: The solution provider has full control over the model and complete flexibility to define the data retention and use policies, giving clients the ability to set the standard, or at least negotiate it, with their service provider.
    • Pro – Full control over versioning (more robust): As with data retention and use, a solution provider hosting its own LLM can implement version and update controls for clients, giving them full control over the versions leveraged across their matters in a way that preserves each client’s posture with respect to the work already performed with model outputs. For document review, a service provider leveraging these controls can provide assurances that document classifications or scores will remain consistent and unchanged unless the client takes some deliberate action themselves.
    • Pro – Higher tuning (customization) flexibility: Service providers hosting their own LLM will also have full control over the LLM’s parameters, which gives those providers and their clients tremendous optionality to tune and customize the LLM to their specific tasks or data, so long as the service provider has the technical savvy to make that capability available. (The fine-tuning sketch after this list shows what this can look like in practice.)
    • Con – Potentially higher cost: For many reasons, models deployed by solution providers are likely to be based on open-source foundation models, such as Llama-3. These models are cheaper to run on a per-token basis than OpenAI’s GPT-4; however, as mentioned above, total cost of ownership can often be significantly higher once you factor in the infrastructure costs of hosting the dedicated environment and GPU resources, regardless of utilization.
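
To make the cost trade-off concrete, here is a minimal back-of-envelope sketch in Python. Every figure in it (the per-token API price, the GPU instance rate, the throughput assumptions) is an illustrative placeholder, not a quote from any vendor; substitute your own numbers.

```python
# Illustrative cost comparison. All prices and usage figures below are
# hypothetical placeholders, not vendor quotes.

API_COST_PER_1K_TOKENS = 0.03   # assumed third-party API price (USD)
GPU_HOST_COST_PER_HOUR = 4.00   # assumed dedicated GPU instance rate (USD)
HOURS_PER_MONTH = 730           # an always-on dedicated deployment

def monthly_api_cost(tokens_per_month: float) -> float:
    """Pay-per-token: cost scales directly with usage."""
    return tokens_per_month / 1_000 * API_COST_PER_1K_TOKENS

def monthly_dedicated_cost() -> float:
    """Dedicated hosting: cost accrues whether the model is used or not."""
    return HOURS_PER_MONTH * GPU_HOST_COST_PER_HOUR

for tokens in (1_000_000, 50_000_000, 500_000_000):
    print(f"{tokens:>12,} tokens/month: "
          f"API ${monthly_api_cost(tokens):>10,.2f} vs. "
          f"dedicated ${monthly_dedicated_cost():>10,.2f}")
```

At low volumes the pay-per-token API wins easily; the dedicated deployment only becomes competitive once utilization is high enough to amortize the always-on infrastructure, which is exactly the dynamic described in the bullets above.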
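
On versioning: some third-party APIs partially mitigate the update problem by exposing dated model snapshots alongside a floating alias. Here is a minimal sketch using OpenAI’s Python SDK; the snapshot name is illustrative, and the provider still decides when snapshots are retired, so this softens but does not eliminate the concern.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    # Pinning a dated snapshot (name illustrative) rather than the floating
    # "gpt-4" alias keeps outputs tied to one model version, at least until
    # the provider retires that snapshot, which remains outside your control.
    model="gpt-4-0613",
    messages=[{"role": "user", "content": "Classify this document: ..."}],
)
print(response.choices[0].message.content)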
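
On tuning: with a self-hosted open-source model, parameter-efficient fine-tuning is one common approach. Below is a minimal sketch using the Hugging Face transformers and peft libraries to attach LoRA adapters to a Llama-style model; the model name is an assumption (Llama weights are gated and require approved access), and the training loop and data preparation are omitted.

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Assumed base model; Llama 3 weights are gated and require approved access.
base = AutoModelForCausalLM.from_pretrained("meta-llama/Meta-Llama-3-8B")

# LoRA trains small adapter matrices instead of all model weights, so a
# provider can customize behavior per task without retraining the full model.
config = LoraConfig(
    r=8,                                  # adapter rank
    lora_alpha=16,                        # adapter scaling factor
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, config)
model.print_trainable_parameters()  # typically well under 1% of total weights

# ... fine-tune `model` on task-specific data with a standard training loop,
# then save just the adapter weights (directory name hypothetical):
# model.save_pretrained("review-classifier-adapter")
```

Because only the small adapter matrices are trained, a provider can maintain separate adapters per client or per matter on top of one shared base model, the kind of optionality an external API does not expose.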

Both external third-party LLM APIs and custom LLMs offered by solution providers come with their own sets of benefits and challenges. External APIs like OpenAI’s provide lower upfront costs and potentially higher performance, but they carry limitations in customization, data policies, and version control. Custom LLMs offer more flexibility and control but tend to be more expensive to operate.

Understanding these trade-offs is key to choosing the deployment strategy that best meets your specific needs and goals. 


 
