Feature

A Look at the Large Language Model Landscape

6 minute read
By David Barry
Enterprises have dozens of LLMs to choose from, whether proprietary or open source. Here are seven worth considering.

When Microsoft invested billions in OpenAI in January 2023, it set off a frenzy of investment in the space that shows no signs of slowing down. The rush to capitalize on the interest in large language models has resulted in a flood of LLMs entering the market which roughly fall into two categories: 

  1. Proprietary models: On offer from companies like OpenAI, Microsoft or AWS, these models are generally offered as-a-service. Here, vendors provide a public API to a model such as OpenAI’s GPT-4, or one of the LLMs hosted on Microsoft Azure. These are relatively easy to deploy and give organizations access to some of the bigger models on the market for a range of use cases.  
  2. Open source models: The wide range of open source models available makes it relatively easy for organizations to identify one that fits their needs. Working with these models is also one way companies can ensure their sensitive data doesn't leave the company to train someone else's model. (A short sketch contrasting the two approaches follows this list.)  
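
To make the distinction concrete, here is a minimal sketch, assuming the OpenAI Python SDK (v1+) and Hugging Face's transformers library; the API key is a placeholder, and the small GPT-2 checkpoint simply stands in for whichever open model an enterprise would actually deploy.

```python
# Hypothetical sketch: a proprietary model behind a vendor API vs. an open model run locally.

# 1. Proprietary, as-a-service: send the prompt to the vendor's hosted endpoint.
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder key
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize our Q3 sales notes."}],
)
print(reply.choices[0].message.content)

# 2. Open source, self-hosted: download the weights and run them in your own
#    environment, so prompts and data never leave the company's infrastructure.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in for a larger open model
print(generator("Summarize our Q3 sales notes.", max_new_tokens=60)[0]["generated_text"])
```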

Both approaches have advantages and drawbacks, so enterprises need to identify their exact needs and goals for the LLM and determine which model and architecture fits their environment. 

There's some debate over which model works better in the enterprise setting given the need to protect intellectual property and personally identifiable information. Open source allows an enterprise to customize and fine-tune the LLM on its own data. This approach also provides more transparency, as the organization has access to the source code in use, which isn't the case with proprietary LLMs that operate as black boxes. Both of these factors tip the scales toward open source models for privacy needs. 

However, some argue that should regulations be introduced in the field, open source models would find it harder to comply. OpenAI, for one, is also wooing the enterprise crowd with its Enterprise Edition, launched in late August, which seeks to address many of the privacy concerns large businesses have. The field is clearly changing quickly. 

One thing to note is that while enterprises may save money by not paying licenses for proprietary models, costs can quickly add up for open source models depending on where they are hosted. 

So let's take a look at seven of the better-known models enterprises can currently choose from. They represent both open source and proprietary models and are presented in alphabetical order.  

1. Anthropic's Claude 2 — Proprietary

San Francisco-based Anthropic released Claude 2 in July 2023 as the successor to the earlier Claude 1.3 chatbot. Claude 2 uses a conversational model trained on Anthropic’s own dataset, allowing it to engage in open-ended conversations.

Anthropic claims Claude 2 has a larger context window than ChatGPT, which in practical terms means users can provide longer, more comprehensive prompts, which in turn enables more contextually relevant responses to questions. 

It also says that Claude 2 does not simply generate responses statistically; instead, it uses self-supervised learning to develop a contextual understanding of the conversation. Claude 2 can generate many types of written text, assist with research and summarize hundreds of pages of documents in minutes.
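
As a rough illustration of how that larger context window gets used, here is a minimal sketch that sends a long document to Claude 2 through Anthropic's Python SDK; the API key and file name are placeholders, and the exact client interface depends on the SDK version.

```python
import anthropic

client = anthropic.Anthropic(api_key="YOUR_ANTHROPIC_KEY")  # placeholder key
long_document = open("quarterly_report.txt").read()  # hypothetical long document

completion = client.completions.create(
    model="claude-2",
    max_tokens_to_sample=500,
    prompt=f"{anthropic.HUMAN_PROMPT} Summarize the key points of this report:\n\n"
           f"{long_document}{anthropic.AI_PROMPT}",
)
print(completion.completion)
```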

2. Databricks' Dolly — Open Source

At the beginning of April, San Francisco-based Databricks released Dolly, a large language model (LLM) which, the company said, it had trained for less than $30 to offer ChatGPT-like human interactivity, notably the ability to follow instructions.

Two weeks later, it released Dolly 2.0, which the company billed as the first open source, instruction-following LLM fine-tuned on a human-generated instruction dataset licensed for research and commercial use.

Dolly 2.0 is a 12-billion-parameter language model based on EleutherAI's Pythia model family, a suite of models built for analyzing LLMs across training and scaling. Dolly 2.0 was fine-tuned on a new human-generated instruction dataset crowdsourced among Databricks employees.

Dolly 2.0 can be used for a variety of tasks, including summarization, content generation, question answering and automation.
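
Here is a minimal sketch of running Dolly 2.0 locally with Hugging Face's transformers library, roughly following the usage shown on the model card; the hardware settings are illustrative and the 12B checkpoint needs a sizeable GPU.

```python
import torch
from transformers import pipeline

# Dolly ships a custom instruction-following pipeline, hence trust_remote_code=True.
generate = pipeline(
    model="databricks/dolly-v2-12b",
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

print(generate("Write a short summary of our onboarding policy for new hires."))
```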

3. Google’s Bard — Proprietary

Mountain View, Calif.-based Google's conversational AI chat service Bard was designed as an alternative to ChatGPT, the notable difference being that it draws its information from the web. 

Upon its release in February 2023, CEO Sundar Pichai explained in a statement that it would be powered by the company's Language Model for Dialogue Applications, or LaMDA for short. 

The initial version of Bard was a lightweight version of LaMDA, which required significantly less computing power. This let Google scale to more users, which in turn produced more feedback and more training data. 

However, the LaMDA-powered Bard performed poorly compared to GPT-3.5. So Google switched to the more advanced PaLM 2 for all its AI products, including Bard.

Google introduced PaLM 2 at Google I/O in May 2023, as the next iteration of PaLM (which stands for Pathways Language Model), first launched in April 2022. 

PaLM's differentiator is that it was trained on collections of scientific papers and web pages that contain mathematical content, making it potent in logical reasoning and maths. It currently understands over 100 languages and comes pre-trained on a number of coding languages, including Python and JavaScript. 

Bard is good for interpreting prompts and generating accurate answers. Its conversational capabilities make it a useful tool for responding to a wide range of queries, from general information to more specific requests.


4. TII's Falcon — Open Source

Falcon is an open-source LLM built by the Technology Innovation Institute (TII) in the United Arab Emirates. It offers a wide range of pre-trained models, giving organizations access to capabilities such as sentiment analysis and entity recognition, as well as tools for fine-tuning these models on custom datasets.  


Use cases include chatbots, social media monitoring and extracting and understanding information from unstructured text. TII is also encouraging developers and researchers to submit ideas for new workplace scenarios.
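
As an example of that kind of information extraction, here is a minimal sketch using the smaller Falcon-7B-Instruct checkpoint with Hugging Face's transformers library; the prompt and settings are illustrative, and the larger Falcon variants need substantially more memory.

```python
import torch
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="tiiuae/falcon-7b-instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

prompt = ("Extract the company names mentioned in this text: "
          "'Acme signed a partnership with Globex last week.'")
print(generator(prompt, max_new_tokens=50)[0]["generated_text"])
```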

Falcon 180B, released in September 2023, has 180 billion parameters and builds on previous versions. For a sense of scale, that is roughly two and a half times the parameter count of Meta's largest LLaMA 2 model, which has 70 billion parameters and was trained on two trillion tokens.   

5. Meta's LLaMA 2 — Open Source

LLaMA (Large Language Model Meta AI) is a family of large language models released by Menlo Park, Calif.-based Meta AI starting in February 2023. The first version of LLaMA came in four model sizes, trained with 7, 13, 33 and 65 billion parameters. 

LLaMA 2 was released in the second half of July in partnership with Microsoft Azure and added support for Windows. It is freely available for almost anyone to use for research and commercial purposes. LLaMA 2 doesn't yet offer a demo application like ChatGPT or Google Bard; for now, the best way to try it out is through Hugging Face.   

LLaMA 2 was trained with 2 trillion "tokens" from publicly available sources. Among those sources are Wikipedia and Common Crawl.

A token represents a word or semantic fragment. This allows the model to understand text and its context so it can reasonably predict the next part of a text. LLaMA 2 is a pre-trained model that answers queries in human-sounding language. It can be used in comparable ways to ChatGPT or Google Bard, including building chatbots, virtual assistants and customer support systems that engage users in natural, fluid conversations.
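
To see what those tokens look like in practice, here is a minimal sketch using the LLaMA 2 tokenizer from Hugging Face; the meta-llama checkpoints are gated, so this assumes you have accepted Meta's license and logged in with an access token.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

text = "Enterprises have dozens of LLMs to choose from."
token_ids = tokenizer.encode(text)
print(token_ids)                                   # integer ids the model actually sees
print(tokenizer.convert_ids_to_tokens(token_ids))  # the word and sub-word pieces behind them
```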


6. OpenAI's GPT — Proprietary

Probably the best-known LLM of all, GPT from San Francisco-based OpenAI, now in version 4, is a multimodal LLM that accepts both text and images as input. It has demonstrated complex reasoning and understanding, advanced coding ability and human-level performance on a range of academic exams.

It also hallucinates considerably less than GPT-3.5 and produces improved responses. Version four is reported to have more than one trillion parameters and supports a maximum context length of 32,768 tokens.  

To provide better control over GPT-4, OpenAI introduced the "system message," a directive in natural language given to GPT-4 with which users can specify the tone and task of an instruction.  
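
Here is a minimal sketch of steering GPT-4 with a system message through the OpenAI Python SDK (v1+); the key, tone and task below are placeholders.

```python
from openai import OpenAI

client = OpenAI(api_key="YOUR_API_KEY")  # placeholder key
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        # The system message sets the tone and task before the user's request is handled.
        {"role": "system", "content": "You are a terse help-desk assistant. Answer in two sentences or fewer."},
        {"role": "user", "content": "How do I reset my VPN password?"},
    ],
)
print(response.choices[0].message.content)
```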

GPT-4 is also integrated into other websites and products that use the technology to respond to users. End users can access it through platforms like Microsoft Bing, ChatGPT or Duolingo, while developers can reach it through OpenAI's API. 

7. LMSYS Org's Vicuna-13B — Open Source

The open-source chatbot Vicuna-13B was created by a team of researchers from UC Berkeley, CMU, Stanford and UC San Diego, working together as the Large Model Systems Organization (LMSYS Org).  

The team used Meta AI’s LLaMA model as its base and then fine-tuned it on user-shared ChatGPT conversations.

According to a blog post on lmsys.org, in an informal study Vicuna attained 90% of the quality of OpenAI's ChatGPT and Google Bard, and outperformed other models like LLaMA and Stanford Alpaca in more than 90% of cases.  

The model's modular architecture makes it flexible, allowing users to customize and adapt it to their specific needs.  

Its primary use is research on large language models and chatbots, and its intended users are researchers and practitioners in natural language processing, machine learning and artificial intelligence.

About the Author
David Barry

David is a European-based journalist of 35 years who has spent the last 15 following the development of workplace technologies, from the early days of document management, enterprise content management and content services. Now, with the development of new remote and hybrid work models, he covers the evolution of technologies that enable collaboration, communications and work and has recently spent a great deal of time exploring the far reaches of AI, generative AI and General AI.

Main image: Miti | unsplash