Large language model

A large language model (LLM) is a type of neural network language model with a very large number of "parameters" – the adjustable weights of a big neural network, numbering from tens of millions up into the billions, hence the 'large' in the name. An LLM is at the core of generative AI systems like ChatGPT and its competitors. Enormous quantities of text, e.g. major sites such as Wikipedia, collections of books and articles, and large portions of the web, can be used to create an LLM.

Essentially, an LLM is a big, fuzzy text database, which stores how probable it is that some things follow other things in text – the text on which the LLM was "trained", i.e. built. The text is stored in tokenized form, meaning that each unique combination of letters or symbols treated as a word is assigned a number, and these numbers are what the LLM deals with, rather than words as we see them. The output produced by a generative LLM is in turn translated back from such numbers into text such as we are familiar with.
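As a loose illustration of that "what follows what" idea, here is a minimal sketch in Python – a toy counting model, not how any real LLM is built (real models use learned token vocabularies and neural networks rather than raw counts): each word is assigned a number, and the model records how often one token follows another.

```python
from collections import defaultdict

corpus = "the cat sat on the mat . the cat ate the fish .".split()

# Assign each unique word a number ("token id"), as a stand-in for real tokenization.
vocab = {word: i for i, word in enumerate(dict.fromkeys(corpus))}
ids = [vocab[word] for word in corpus]

# Count how often each token follows each other token in the training text.
follows = defaultdict(lambda: defaultdict(int))
for prev, nxt in zip(ids, ids[1:]):
    follows[prev][nxt] += 1

def most_likely_next(word):
    """Return the word most often seen after `word` in the toy corpus."""
    counts = follows[vocab[word]]
    best_id = max(counts, key=counts.get)
    id_to_word = {i: w for w, i in vocab.items()}
    return id_to_word[best_id]

print(most_likely_next("the"))  # "cat" – the most frequent follower of "the"
```

A real LLM generalizes far beyond such literal counts, but the basic currency is the same: numbers standing in for pieces of text, and probabilities of what comes next.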

Producing "mathematically plausible" responses, LLMs have a superhuman ability to imitate style and always come up with an answer (right or wrong), without ever dealing with the distinction between style and substance. An LLM neither thinks nor perceives in human terms, and apart from the product of training it on data, the only memory it has is the current input used to produce output, which may e.g. be added to as a person chats with it until the session ends and nothing remains.

LLMs used for imitating human communication and works are easy to anthropomorphize; whenever the training data is filled with human expressiveness, such expressiveness is parroted back, and furthermore, humans read meaning into the output in the same way as into the works of human authors, doing much of the job of being convincing for the AI system.

Stochastic parrots
A stochastic parrot is an LLM good at generating convincing human language. Coined by linguist Emily M. Bender, the term was introduced in a 2021 paper by her and other researchers, named "On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜". The term conveys the sense of a skilled probabilistic imitator working without any understanding, much like a parrot can imitate the sound of human speech without understanding it, and the associated paper is critical of how LLMs can be misused and misunderstood, and of basic flaws in the technology.

The paper brings up how LLMs regurgitate biases and prominent errors included in their training data in ways which can't be reliably controlled for, and how LLMs are inscrutable and can stitch together 'dangerously wrong' results. It mentions how people tend to see meaning and coherence where it does not exist, and that both the general public and researchers may fool themselves into seeing more than exists when interacting with LLMs or reading what they produce. Furthermore, the training (i.e. building) of LLMs is financially and environmentally costly due to the large amounts of computation required.

In late 2020, Google tried to pressure Timnit Gebru, a co-author of the paper and one of the leaders of Google's Ethical AI Team, into either retracting the paper or removing the names of the authors who were Google employees. She refused to do so and abruptly lost her job. (Other co-authors at Google were also pressured into removing their names, and largely complied.) Google's maneuvering backfired, the incident becoming infamous and the paper very widely read. As of July 2023, the paper has been cited in 1,858 publications. In early 2021, Margaret Mitchell, another co-author of the paper and the other Ethical AI Team lead at Google, was fired after digging into the matter of how Gebru had been treated.

The paper was never controversial from an academic perspective. Google justified its attempted censorship with vague insinuations that the paper failed to take recent research findings into account, while refusing to clarify to Gebru what the problem was and how it might be remedied, so Google's version is not very credible. In relation to Google's commercial activities, the paper was somewhat at odds with efforts and possible future plans to hype LLM technology. However, Gebru has claimed that the abrupt loss of her job came at least in part as a reaction against her advocacy for diversity at Google, and her expressions of dissatisfaction with the measures used back then.

Some who professionally hype AI technology have taken digs at the paper and its idea of the stochastic parrot. OpenAI's CEO Sam Altman tweeted, not long after the launch of ChatGPT, "i am a stochastic parrot, and so r u". It's not obvious whether he truly believes that, though there are those, including at least one ex-Google engineer, who do.

Hallucination
When AIs get facts wrong and make stuff up, claiming things that were not included in the training data set, this is called hallucination, by analogy with errors in human perception. However, this term – often used in connection with LLM chatbots that produce falsehoods – is sometimes criticized for anthropomorphizing AIs and for being a misnomer, for example by statistician Gary N. Smith.

There is no essential difference in the quality of what is produced when it is found acceptable and when it isn't; LLMs don't deal with concepts of truth or falsehood or any such evaluation, and are much like BS artists who sometimes fail to be convincing. LLMs can thus be viewed as hallucinating all of the time, it being a matter of statistics that these hallucinations often coincide with what is wanted (and are then usually not viewed as hallucinations), but not always.
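The point can be sketched with a toy example (hypothetical probabilities, no particular model assumed): output is produced by sampling a next token from a probability distribution, and nothing in that procedure checks whether the result is true – only whether it is likely.

```python
import random

# Hypothetical next-token probabilities after the prompt "The capital of Australia is".
# The model only knows what tends to follow the prompt in its training data,
# not which continuation is factually correct.
next_token_probs = {
    "Canberra": 0.55,   # correct, and often likely
    "Sydney": 0.35,     # wrong, but common enough in text to be sampled sometimes
    "Melbourne": 0.10,  # also wrong
}

def sample_next_token(probs):
    """Sample one token; the procedure is identical whether the answer is right or wrong."""
    tokens = list(probs)
    weights = list(probs.values())
    return random.choices(tokens, weights=weights, k=1)[0]

print(sample_next_token(next_token_probs))
```

Whether the printed answer happens to be correct depends entirely on the learned probabilities, which is the sense in which "hallucination" is not a separate failure mode.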

Emergence or mirage from metrics?
As language models have grown larger, according to some metrics they have suddenly gained new skills – apparently unexpected "emergent abilities", as first described by a team of researchers in 2022. Examples include the ability to deal in some ways with arithmetic, to solve simple tasks involving the individual letters in a word, to disambiguate words, etc., as well as new ways of using an LLM. However, research by Schaeffer et al. argues that such abilities do not unpredictably pop up out of nowhere, but that if studies are made using different and more carefully chosen metrics – linear instead of nonlinear, continuous instead of discontinuous – those abilities can be seen to gradually grow into prominence, instead of there being any thresholds and sudden leaps involved. Thus, they argue, the 'emergence' is a mirage, a byproduct of the choice of metrics.
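A minimal sketch of the metric argument (toy numbers, not data from the paper): suppose per-token accuracy on some task improves smoothly as models grow. An all-or-nothing exact-match score over a multi-token answer then appears to jump suddenly, even though nothing discontinuous happened underneath.

```python
# Toy illustration: smooth per-token accuracy vs. the same ability scored
# with a harsher, all-or-nothing exact-match metric.
answer_length = 10  # number of tokens that all have to be right for "exact match"

for per_token_accuracy in [0.50, 0.60, 0.70, 0.80, 0.90, 0.95, 0.99]:
    # Probability that every one of the 10 tokens is correct (assuming independence).
    exact_match = per_token_accuracy ** answer_length
    print(f"per-token accuracy {per_token_accuracy:.2f}  ->  exact match {exact_match:.3f}")

# Per-token accuracy climbs steadily, while exact match stays near zero until
# late and then shoots up – which can be misread as an ability "emerging".
```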

The idea of "emergent abilities" has become tied to hype, hopes, and fears in the world of AI vendors and "AI safety". Research and development has focused on increasing model sizes in part in order to hunt for new abilities which may suddenly (it seems) pop up. However, calling 'emergence' into question also suggests that smaller models may be able to do the same tasks as bigger ones, only a bit more roughly (or very roughly if too small), which may sometimes suffice while being computationally cheaper. The 'mystery' surrounding emergence of 'intelligent' skills has also been tied to dreams and nightmares about strong AI; what if the model size increases further, and the LLM then suddenly grows superpowers and takes over the world? Realistically, no, but the general philosophy of "AI doomerism" prominent with leading AI vendors encourages such thinking.

False hopes for strong AI
"Please do not conflate word form and meaning. Mind your own credulity."

The LLM AI boom which began with the success of ChatGPT has seen much hype for the potential of, hopes for, and fear of near-future strong AI, also called Artificial General Intelligence (AGI) – an AI that is truly conscious, intelligent, and has agency of its own – though arguably there's no credible research suggesting that LLM development may lead to such. The debate has been lively, with a number of economists, computer scientists, and business leaders having pushed such hype, often in accordance with financial interests. As of July 2023, opposition is gradually growing, including from cognitive scientists who argue there's no basis for LLM-based systems having a mind to speak of.

A 2023 paper by Microsoft researchers titled "Sparks of Artificial General Intelligence: Early experiments with GPT-4" exemplifies the contentious, non-peer-reviewed corporate research which skeptics of the AGI-from-LLM hype deem pseudoscientific. With such papers, Microsoft and their business partner OpenAI do not provide others with the training data or information needed to independently create systems that perform as claimed, or to experiment with anything beyond using a black-box product on offer, and so withhold the means of replication except at a superficial level. With the "sparks" paper, an extraordinary claim is basically made in such a way as to be unfalsifiable. Other players, e.g. Google, play similar games with some of the research they publish, withholding training data for their models while showcasing the models' capabilities, effectively publishing PR masquerading as science. This is a continuation of an older trend: a wider replication crisis in AI research had already been described back in 2018, the result of businesses treating the means of replication as trade secrets.

It could be that the researchers who see general intelligence in their LLM AIs have fallen victim to the same basic phenomenon as psychics who come to believe that their own performances are real. Even if sincere in their work, they may have reinvented the persuasive power of the mentalist's con game, and subjected themselves to a feedback loop of subjective validation of what they wish to see. (Comparisons of chatbot AIs to the magician's craft are not new, and have long been used by skeptics who find the Turing test inappropriate as a way to gauge the intelligence of machines, for the same reason that the persuasiveness of a magician's performance is not a good indicator of the genuine presence of supernatural powers. In a nutshell, the problem is that the main thing tested is the discernment of the audience.)

BERT
BERT (Bidirectional Encoder Representations from Transformers) is a family of LLMs introduced in 2018 by researchers at Google. In a little over a year, BERT became a baseline for natural language processing experiments. BERT models are generally smaller and faster, but also less capable, than GPTs. Google made a set of BERT models, developed for research purposes, freely available along with the associated software.
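The freely released checkpoints can be tried out with a few lines of code; the sketch below assumes the third-party Hugging Face transformers library (not part of Google's original release) and the publicly available "bert-base-uncased" model.

```python
# Requires: pip install transformers torch
from transformers import pipeline

# Load a freely available pretrained BERT checkpoint.
unmasker = pipeline("fill-mask", model="bert-base-uncased")

# BERT is bidirectional: it predicts the masked word from context on both sides,
# rather than generating text left to right the way a GPT does.
for prediction in unmasker("The capital of France is [MASK]."):
    print(prediction["token_str"], round(prediction["score"], 3))
```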

GPT
GPT (Generative pre-trained transformer) is a type of LLM first developed by OpenAI and introduced in 2018. While OpenAI has developed a series of GPT versions, the name is also used for some basically similar LLMs developed by others, GPT being a prominent architecture. Some OpenAI GPT versions are the basis for ChatGPT.
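For contrast with BERT's fill-in-the-blank style above, here is a similar sketch of left-to-right generation, assuming the same third-party Hugging Face transformers library and the openly released GPT-2 model (an early, small OpenAI GPT).

```python
# Requires: pip install transformers torch
from transformers import pipeline

# GPT-2 is an older, openly released OpenAI GPT model, small enough to run locally.
generator = pipeline("text-generation", model="gpt2")

# A GPT generates text left to right, repeatedly predicting the next token.
result = generator("Large language models are", max_new_tokens=25, do_sample=True)
print(result[0]["generated_text"])
```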

LaMDA
LaMDA (Language Model for Dialogue Applications) is a family of conversational LLMs developed by Google, introduced in 2021 (but also earlier, in 2020, under the name Meena). It is most known for the bogus June 2022 claims of Google engineer Blake Lemoine that it had become sentient (claims rejected both by Google, which ultimately fired him, and by the scientific community). LaMDA is also the basis for earlier versions of Google's Bard chatbot. The Lemoine incident led to more widespread criticism of the suitability of the Turing test for gauging intelligence (not to mention sentience).

ChatGPT
Launched by OpenAI in November 2022, ChatGPT (a system based on GPT-3.5 and later GPT-4) went viral and led to a boom in the commercial development and use of LLMs. The chatbot proved usable for many things, from entertainment to generating computer program code. Google feared that it might become a "Google killer" and scrambled to create the Bard chatbot in response, while Microsoft decided to partner with OpenAI. The mainstream use of the technology sparked widespread fear of AI-generated plagiarism, cheating, and disinformation, alongside hopes of new kinds of automation and productivity gains in the times ahead.

LLaMA
LLaMA (Large Language Model Meta AI) is a family of LLMs by Meta Platforms, first released in February 2023. Compared to GPT, LLaMA accomplishes more with less – a 13 billion parameter version reportedly outperforming the 175 billion parameter GPT-3 on most natural language processing benchmarks. Meta shared the LLaMA model weights with researchers under a non-commercial use license, following which they soon leaked and became available to the general public.

As of 2023, LLaMA is the only LLM with capabilities roughly on par with GPT-3 that runs at decent speed on consumer-grade hardware, meaning it can be run locally, e.g. on laptops and smartphones, rather than relying on an Internet connection to an AI vendor's server.
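One common way to do this is through the community llama.cpp project and its Python bindings; the sketch below assumes those bindings are installed and that a quantized LLaMA-family weights file has already been obtained (the file path is a placeholder, not something shipped with the library).

```python
# Requires: pip install llama-cpp-python, plus a locally obtained, quantized
# LLaMA-family weights file (the path below is a placeholder, not a real file).
from llama_cpp import Llama

llm = Llama(model_path="./models/llama-13b.Q4_K_M.gguf", n_ctx=2048)

# Inference happens entirely on the local machine: no vendor server is contacted.
output = llm("Q: What is a large language model? A:", max_tokens=64, stop=["Q:"])
print(output["choices"][0]["text"])
```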

Plagiarism and cheating
After ChatGPT was launched in November 2022, it took less than two months before some students were caught using it to cheat on exams, and fears of a new, difficult-to-counter kind of plagiarism began to spread in academia. At the same time came fears of such LLM AIs furthering the spread of disinformation. Tools for detecting LLM AI-generated texts entered use within half a year of ChatGPT being released, but they are unreliable. Such tools can have 10% or more false positives, they often fail to catch some types of AI-generated text, and they are easy to defeat by paraphrasing the AI-generated text, whether by hand or with another tool. Paraphrasing also defeats suggested countermeasures such as an AI vendor voluntarily watermarking AI-generated texts for easy detection.

Popular examples of false positives include the United States Constitution and portions of the Bible, which are deemed wholly AI-generated by various AI-detection tools, for the simple reason that they're among the texts which LLMs are trained on to the point of imitating them. In new human writing, some legalistic, academic, and other formal writing styles are especially likely to be falsely judged AI-generated. Further, LLMs newer and more refined than GPT-3.5 generate text that is statistically more human-like, and thus more difficult to catch. Much like the text-generating AIs, the plagiarism-catching AIs turn out to be over-hyped, sometimes trusted when they shouldn't be, or even sold with false promises.

ChatGPT and GPT-4 have passed various exams that largely depend on rote memorization and that humans generally need to study intensely to pass – of course without understanding any of the subject matter. Essentially simulating rote memorization combined with guessing and verbal agility, these AI versions often perform passably, though not excellently. The pattern of failures for the AIs differs from that of humans; they can, for example, unexpectedly fail at simple arithmetic on a business exam. While humans can spot some such cheating, most instances of cheating cannot be reliably caught.