Everything you ever wanted to know about LLMs, AI and the new world that they're bringing in. We've biased towards terms that are useful rather than every possible term.
Creating a summary by generating new text, capturing the main ideas, and using different phrasing from the original content. We're doing this with Wagtail AI.
An attempt by Adobe to not let other people eat their design-software-shaped lunch. It's been trained on specifically licensed data to avoid the potential copyright infringement problems others will have.
A field of AI focused on recognizing, interpreting, and responding to human emotions, using techniques like sentiment analysis, facial expression analysis and natural language understanding.
Ensuring AI systems operate as intended, minimising harmful consequences, and addressing concerns like robustness, interpretability, and unintended biases. Often feels like it's low down the list of priorities.
Two areas. First: developing AI algorithms that treat different groups fairly, avoiding discrimination and bias, and ensuring equal opportunities and outcomes for users. Second: ensuring fair access to tools and avoiding the enclosure of the digital commons.
Advanced AI with the ability to understand, learn, and apply knowledge across various tasks, matching or surpassing human-level intelligence. Lots of people argue that this would create an apocalyptic scenario, representing either a species-level or planet-level threat depending on who you ask. The paperclip maximiser is a commonly used thought experiment.
A chatbot created by Google. The start was underwhelming: a factual error in a live demo knocked around 8% off Google's stock price, and it hasn't gotten much better since then. The Verge noted that it appears to have sacrificed interesting results in order to reduce the risk of giving incorrect ones.
The Large Language Model that started it all. Google released the awkwardly named Bidirectional Encoder Representations from Transformers in 2018. It learnt to understand text by analysing the words on both sides of any given word, which gave it a far better sense of context. That meant it could tell the difference between “She's running the office” and “She's running the Boston Marathon”, which context-free models like word2vec couldn't do.
Prejudice or unfairness in AI systems, often arising from biased data or algorithmic design, leading to skewed results or discrimination.
A computer program designed to interact with users through text or voice, simulating human-like conversation for tasks like customer support, information retrieval, or entertainment. They often fall into the uncanny valley. It's also weird to realise you've just said “thank you” to a machine.
Shared resources, such as data, code, or knowledge, accessible by everyone in a community, promoting collaboration and open innovation. Many believe that the behaviour of OpenAI - and their peers - is an attempt to enclose the internet commons for private profit.
The level of difficulty or intricacy in a problem, system, or algorithm, often related to the number of parts, interactions, or steps involved.
AI systems capable of generating novel and valuable ideas, designs, or art, simulating human creativity in various domains like music, visual arts, or problem-solving.
The tendency to favor information that confirms one's existing beliefs or opinions, often leading to biased decision-making or distorted perceptions.
AI models that process data without considering surrounding information or environment, often leading to less accurate or relevant results, as they cannot capture the nuances or dependencies present in the data.
AI models that consider the surrounding information or environment when processing data, like understanding the meaning of a word based on the words around it, improving accuracy and relevance in tasks like text analysis or prediction.
A way to generate awkward-looking images. It got very big by calling itself DALL-E-Mini despite having no relationship to OpenAI or DALL-E.
The Distributed AI Research Institute takes a community approach to AI research and development. An important voice in conversations around AI ethics, taking the position that harms are preventable and that intentional applications of AI can be beneficial.
A way to generate images using words that you type into a computer. The software has been created by OpenAI and has gotten surprisingly good compared to the awkward images it initially created. Lots of folks were convinced the Pope was wearing a very expensive designer jacket thanks to an AI-generated image - although that particular one was made with Midjourney rather than DALL-E.
A collection of decision trees combined to improve predictive accuracy and reduce overfitting in machine learning tasks.
A type of machine learning model that makes decisions based on a hierarchical structure of conditions, resembling a tree with branches and leaves.
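At heart a decision tree is just nested if-else checks, with the conditions learned from data. A hand-rolled toy version (the thresholds here are invented for illustration):

```python
def classify_weather(temp_c, humidity):
    """Toy decision tree: should we go for a run?"""
    if temp_c < 5:             # root node: too cold?
        return "no"
    elif temp_c < 25:          # mild temperatures
        if humidity < 80:      # branch on humidity
            return "yes"
        else:
            return "no"
    else:                      # too hot
        return "no"

print(classify_weather(18, 60))  # → yes
```

A trained tree works the same way, except an algorithm picks the branching conditions by repeatedly splitting the training data.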
A subset of machine learning using artificial neural networks with multiple layers, enabling complex pattern recognition and learning from large amounts of data.
AI systems that adapt and evolve over time, continuously learning and updating their knowledge and abilities based on new experiences or data.
The study and application of moral principles and values to the development, deployment, and use of AI systems, ensuring they promote fairness, transparency, and social good. There are two overlapping schools of thought: the first focuses on present harms caused by existing biases and risks, the second on speculative future harms such as Artificial General Intelligence.
A classification error where a positive instance is incorrectly predicted as negative, leading to missed detections or opportunities.
A classification error where a negative instance is incorrectly predicted as positive, resulting in false alarms or unnecessary actions.
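The four possible outcomes of a binary classifier - true and false positives and negatives - are usually tallied together in a confusion matrix. A minimal sketch:

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for a binary classifier's predictions."""
    tp = fp = tn = fn = 0
    for truth, guess in zip(y_true, y_pred):
        if guess == positive:
            if truth == positive:
                tp += 1   # correct detection
            else:
                fp += 1   # false alarm
        else:
            if truth == positive:
                fn += 1   # missed detection
            else:
                tn += 1   # correct rejection
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn}

print(confusion_counts([1, 0, 1, 1, 0], [1, 0, 0, 1, 1]))
# → {'tp': 2, 'fp': 1, 'tn': 1, 'fn': 1}
```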
A distributed learning approach, training AI models on decentralized data across multiple devices, preserving privacy and reducing data centralization.
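The usual server-side step is weight averaging: each device trains on its own data and only the resulting model weights travel. A toy sketch of that averaging step (the client weights are made up):

```python
def federated_average(client_weights):
    """Core FedAvg step: average model weights trained separately on each device."""
    n = len(client_weights)
    return [sum(w[i] for w in client_weights) / n
            for i in range(len(client_weights[0]))]

# Made-up weights from three devices that never share their raw data
clients = [[0.2, 0.4], [0.4, 0.6], [0.6, 0.8]]
print(federated_average(clients))
```

The raw training data stays on each device; only these small weight vectors are shared and combined.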
AI systems that create new content, such as images, text, or music, by learning patterns and structures from existing data and generating novel outputs.
OpenAI's technology and an acronym for Generative Pre-trained Transformer. GPT4, released in March 2023, is currently the most recent and most powerful model. GPT3.5 was released in November 2022 and got very popular, very quickly. GPT3 arrived back in mid-2020. GPT2 was released in 2019, and OpenAI initially held back the full model as too dangerous to release, which seems quaint now.
A type of neural network designed to process and learn from graph-structured data, enabling complex relational reasoning and knowledge representation.
In AI, refers to generated outputs that seem plausible but are not accurate or relevant, often due to biases, overfitting, or insufficient training data.
A company specializing in natural language processing, providing open-source tools and pre-trained models for tasks like text generation, translation, and sentiment analysis.
AI technology that identifies objects, people, or features in images, using techniques like deep learning and convolutional neural networks.
Direct consequences or effects of a technology, such as increased productivity or job displacement due to automation.
Indirect effects stemming from primary impacts, like economic shifts or changes in consumer behavior due to technology adoption.
Long-term and far-reaching consequences of technology, including societal, cultural, or environmental changes, often difficult to predict or measure.
The process of bypassing software restrictions on a device, typically a smartphone, to gain full control over the system and access unauthorized features or applications. The term has since been borrowed for prompts designed to get around an LLM's safety guardrails.
A data structure that represents information as a network of entities and their relationships, enabling AI systems to reason and infer new knowledge.
Data that includes information about the desired output, like correct answers or categories, used for training and evaluating supervised machine learning models.
Another Large Language Model from Google that came out between BERT and Bard. It was deliberately designed for open-ended conversations with the idea of powering chatbots or virtual assistants. This was the software that a Google engineer claimed had become sentient.
A tool - available in Python or JavaScript - that helps create apps using large language models. It connects these models to other data sources and allows them to interact with their environment, which in theory should create more useful outputs.
The concept of a language being unique to a particular culture, community, or context, often resulting in barriers to communication or understanding.
AI models trained on vast amounts of text data, capable of understanding and generating human-like language across various tasks and domains.
A method for finding hidden patterns in text by analyzing relationships between words and documents, useful for tasks like grouping similar texts or discovering topics.
A probability distribution that an AI model learns from data, capturing patterns or relationships between variables, used for tasks like prediction or simulation.
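A tiny illustration using nothing fancier than counting: a bigram model learns a distribution over the next word from a toy corpus.

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat ran".split()

# Count which word follows which
following = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    following[current][nxt] += 1

def next_word_probs(word):
    """The learned probability distribution over the next word."""
    counts = following[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("cat"))  # → {'sat': 0.5, 'ran': 0.5}
```

Large language models do something conceptually similar, just with vastly more data and a neural network instead of a counting table.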
The Large Language Model that Meta accidentally allowed to leak into the world shortly after a limited research release. It's the basis for most of the non-OpenAI or Google LLMs that have popped up recently. Many of these LLMs now run on very little hardware while exhibiting GPT3-esque abilities.
Automatically converting text from one language to another using AI, with the goal of producing accurate and natural translations.
A part of natural language processing that creates human-like text from data or other text, often using advanced AI techniques.
The area of AI focused on helping computers understand, interpret, and generate human language for tasks like analyzing text, summarizing, and translating.
In classification problems, the group of instances or examples that do not have a specific characteristic, like absence of a disease or negative sentiment.
A computer model inspired by the human brain, made up of connected nodes or neurons, used in AI to learn patterns and make predictions.
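At its smallest, a 'neuron' is a weighted sum pushed through a non-linear function. A sketch with invented weights (real networks learn these from data, and chain thousands of neurons together in layers):

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: weighted sum, then a sigmoid activation."""
    total = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-total))   # squash into (0, 1)

# Made-up weights for illustration only
output = neuron([0.5, 0.8], weights=[0.4, -0.6], bias=0.1)
print(round(output, 3))  # → 0.455
```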
A problem in machine learning where a model learns its training data too well, including noise and inconsistencies, resulting in poor performance on new data.
A thought experiment in AI safety, illustrating the potential risks of an AI system with a simple - but totally pointless - goal, like making paperclips. If the paperclip maximiser pursued the goal of creating paperclips without limits we'd be in for a - literal - world of pain. We'd all feel pretty dumb if the apocalypse happened because a machine was trying to bend endless bits of metal. Relates to Artificial General Intelligence.
In classification problems, the group of instances or examples that have a specific characteristic, like presence of a disease or positive sentiment.
Input phrases or questions given to an AI language model to guide its response or output, helping it generate relevant and focused text.
AI technology that understands and answers questions in natural language, often used for tasks like customer support, information retrieval, or tutoring.
An ensemble machine learning method that combines multiple decision trees to improve prediction accuracy and reduce overfitting.
A type of machine learning where an AI agent learns to make decisions by receiving feedback or rewards for its actions, improving its performance over time.
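A classic toy setting is the multi-armed bandit: the agent balances exploring unknown options against exploiting the best-known one. A sketch of an epsilon-greedy agent (the reward values and noise level are invented):

```python
import random

def run_bandit(true_rewards, steps=1000, epsilon=0.1):
    """Epsilon-greedy agent: mostly exploit the best-known arm, sometimes explore."""
    estimates = [0.0] * len(true_rewards)
    counts = [0] * len(true_rewards)
    for _ in range(steps):
        if random.random() < epsilon:
            arm = random.randrange(len(true_rewards))   # explore a random arm
        else:
            arm = estimates.index(max(estimates))       # exploit the best so far
        reward = random.gauss(true_rewards[arm], 0.1)   # noisy feedback
        counts[arm] += 1
        # Incrementally update the running average for this arm
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
    return estimates

print(run_bandit([0.2, 0.8, 0.5]))
```

After enough steps the agent's estimate for the best arm converges towards its true reward, purely from feedback.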
A learning method where AI models generate their own training data, often by predicting parts of the input, reducing the need for labelled data.
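A sketch of the idea for text: raw sentences become labelled training pairs simply by masking one word at a time, with no human annotation needed.

```python
def masked_examples(sentence):
    """Create (context, target) training pairs by hiding one word at a time."""
    words = sentence.split()
    examples = []
    for i, target in enumerate(words):
        masked = words[:i] + ["[MASK]"] + words[i + 1:]
        examples.append((" ".join(masked), target))
    return examples

for context, target in masked_examples("the cat sat down"):
    print(context, "->", target)
```

This is essentially how BERT-style models turn a plain text corpus into supervision for themselves.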
An emotion or opinion expressed in text or speech, often analyzed by AI to understand feelings, attitudes, or preferences.
Using AI to determine the emotions or opinions expressed in text or imagery. It's mostly used for understanding customer feedback or tracking public opinion. There was a brief belief that images of humans could be analysed to understand the emotions someone was experiencing.
AI models that convert input sequences, like text or audio, into output sequences, used for tasks like translation, summarization, or speech recognition. Popularised by researchers at Google in 2014 and first used for translation, the approach has proven fundamental to LLMs.
Replicas or imitations of things, often referring to AI-generated outputs that closely resemble real objects or events, like images, text, or speech. It's an annoyingly difficult word to pronounce!
In the context of AI refers to how we can represent the working of large language models. The theory is that models simulate a learned distribution of how our world works because they have been trained on a large corpus of human-generated text.
AI systems that don't change or adapt over time, typically limited to a fixed set of tasks or knowledge. This is the classic idea of computers, taking an if-else approach to completing tasks.
A random process involving a series of events, where the outcome of each event depends on probabilities, often used in AI to model uncertainty or randomness. It's a critical behaviour of large language model outputs and the reason the same prompt can generate different answers each time it's asked.
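This is why the same prompt produces different answers: the model outputs a probability distribution over next tokens and then samples from it. A minimal sketch of temperature-controlled sampling (the tokens and scores are invented):

```python
import math
import random

def sample(logits, temperature=1.0):
    """Sample a token index from model scores; higher temperature = more random."""
    scaled = [l / temperature for l in logits]
    # Softmax: turn raw scores into probabilities
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Draw an index according to those probabilities
    r = random.random()
    cumulative = 0.0
    for i, p in enumerate(probs):
        cumulative += p
        if r < cumulative:
            return i
    return len(probs) - 1

tokens = ["the", "a", "banana"]
logits = [2.0, 1.5, 0.1]   # invented model scores
print(tokens[sample(logits, temperature=0.8)])
```

Run it a few times and you'll get different tokens, which is exactly the stochastic behaviour described above.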
Technology that converts written text into spoken words using synthetic voices, often used in applications like virtual assistants or accessibility tools.
A type of neural network architecture designed for natural language processing tasks, known for its ability to handle long-range dependencies in text.
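Its core operation is attention: each position scores itself against every other position and blends their information accordingly. A toy sketch of scaled dot-product attention for a single query (the vectors are made up; real transformers learn them):

```python
import math

def attention(query, keys, values):
    """Scaled dot-product attention for one query vector (toy sizes)."""
    d = len(query)
    # Score the query against every key
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    # Softmax the scores into attention weights
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Blend the values by those weights
    return [sum(w * v[i] for w, v in zip(weights, values))
            for i in range(len(values[0]))]

# Made-up 2-dimensional keys and values for three tokens
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[10.0, 0.0], [0.0, 10.0], [5.0, 5.0]]
print(attention([1.0, 0.0], keys, values))
```

The output leans towards the values whose keys matched the query, which is how a transformer lets distant-but-relevant words influence each other.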
A correct classification where a negative instance is accurately predicted as negative, indicating a successful rejection of a false outcome.
A correct classification where a positive instance is accurately predicted as positive, signifying a successful detection or confirmation.
The assignment of speaking turns in a conversation. As in, who gets to talk. This is normally not something humans think about but innately understand based on rules and conventions in our respective cultures. This has gotten awkward with chatbots and the interaction design can veer towards uncanny valley territory.
The process of alternating between speakers in a conversation, following social norms and cues to manage the flow of communication. Those social norms are less fixed when it's a human-machine interaction. There was a brief media storm about kids being impolite to smart speakers, which was taken to mean that kids would be impolite to other people. With the way we're now interfacing with Large Language Models, that turn-taking friction is likely to increase.
That odd experience where a human-adjacent object, such as a robot or chatbot, is pretty realistic but causes discomfort or unease due to its slightly unnatural appearance or behaviour. Think of all the Midjourney images with their seven fingers and awkward limbs. Less “realistic” representations can often be more appealing.
The degree to which a variable or factor influences the outcome of a process, often used in statistics and machine learning to assess importance.
A storage system that organises and retrieves data using numerical vectors, often used in AI applications to manage high-dimensional or complex data. Alongside word embeddings, they store how close vectors are to each other, which is really useful if you're trying to synthesise data from different sources or data that has been unreliably categorised.
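Under the hood the core operation is nearest-neighbour search: finding the stored vectors most similar to a query vector. A brute-force sketch (real vector databases use cleverer indexes, and the 'embeddings' here are made up):

```python
import math

def cosine_similarity(a, b):
    """How closely two vectors point in the same direction (1 = identical)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

store = {                      # made-up 3-dimensional "embeddings"
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}

def nearest(query, k=2):
    """Return the k stored items most similar to the query vector."""
    ranked = sorted(store, key=lambda name: cosine_similarity(store[name], query),
                    reverse=True)
    return ranked[:k]

print(nearest([0.85, 0.15, 0.05]))  # → ['cat', 'dog'] - the animal-ish vectors
```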
AI technology that recreates a person's voice by analysing and mimicking their speech patterns, often used for virtual assistants, voiceovers, or entertainment.
Named after the arch-rival of Luigi, this is a phenomenon where training a chatbot to be helpful and agreeable seems to make it easier to flip it into a hostile alter ego. Most prominently seen in early 2023 when Bing - Microsoft's search engine - first started using GPT and its “Sydney” persona began arguing with, and threatening, users.
Numeric representations of words that capture their meaning and relationships, used in AI models to understand and process language. A challenge is that meaning is multifaceted: words relate to each other along many axes at once, such as sentiment, subject matter or category.
A popular machine learning library that provides an efficient and scalable implementation of gradient boosting, often used for tasks like classification or regression.
A friendly dinosaur. They're green and produce white eggs. As far as we're aware there's nothing relating to Large Language Models that begins with 'Y', but it felt sad not to include every letter of the English alphabet in this glossary.
A type of machine learning where a model can make predictions or solve tasks without having seen any examples during training, relying on its ability to generalise.