Glossary & Index
Definitions of every term used in the paper that may be unfamiliar to readers from a policy or creator-economy background, plus an index of every named person and organization referenced, with the most useful public links and contact details for each.
Glossary of terms
- AIISP
- The Artificial Intelligence Inference Standards Protocol — the open wire-format and on-chain settlement standard proposed in the paper and published in draft as AIISP-1. github.com/97115104/aiisp-spec ↗
- Attestation
- A signed, time-stamped statement by an identifiable party that a specific piece of content was produced by them in a specific role at a specific moment. Used as the primitive that binds compensation and authorship to a given data contribution.
- Base
- An Ethereum Layer 2 network maintained by Coinbase that provides low-cost, high-throughput settlement while inheriting the security properties of the underlying Ethereum main chain. Selected as the settlement venue for AIISP contracts. base.org ↗
- Blockchain
- A distributed database in which records are grouped into cryptographically linked blocks and validated by a network of independent participants, providing the data structure underneath a public ledger.
- ERC-20
- Ethereum Request for Comments 20, the canonical fungible-token standard on Ethereum and its Layer 2 networks. Defines the minimum interface (transfer, balance, approval) that any fungible token contract must implement so that wallets, exchanges, and settlement contracts can interact with it without bespoke integration. Used as the contract type each frontier provider mints to identify itself on Base.
- GPU
- Graphics Processing Unit — originally a co-processor for rendering graphics and now the dominant hardware for training and serving large neural networks. 5090-class GPU refers to the 2025-generation NVIDIA RTX 5090 consumer card and equivalents.
- Hallucination
- Output from a language model that is fluent and internally plausible but factually incorrect or unverifiable against any external source — including fabricated citations, events, statistics, and quotations. A property of the generation process, not a deliberate act by the model.
- HDC
- The Human Data Collective — the decentralized network of human creators, peer reviewers, and on-chain settlement contracts proposed in the paper as the reference implementation of AIISP. humandatacollective.org ↗
- Model weights
- The learned numerical parameters of a neural network, typically stored as large arrays of floating-point numbers that total hundreds of billions of entries for a frontier language model. Together they encode what the model has learned from its training data. They are the artifact that is copied, served, and, in proprietary systems, withheld from external inspection.
- Public ledger
- An append-only record of transactions maintained by a blockchain network and readable by anyone, in which each state change is cryptographically linked to those before it so that the history cannot be quietly rewritten and every inflow, outflow, and retirement event is independently auditable.
- RLHF
- Reinforcement Learning from Human Feedback — the post-training technique in which a base language model is fine-tuned against a reward model trained on human preference rankings of its outputs. Introduced as the dominant alignment method for frontier consumer models, and used in the paper to refer to the broader supply chain of human raters whose labor that technique requires.
- Safe-harbor provision
- A clause in a statute or regulation that protects a party from liability if specified conditions are met. In the paper this means legal recognition that on-chain settlement against an audited registry constitutes valid and sufficient evidence of environmental remediation.
- Smart contract
- A small program deployed to a blockchain that executes automatically when its conditions are met and whose execution is recorded on the public ledger. Used in the paper to handle settlement, attribution, and credit retirement without a human intermediary.
- TPU
- Tensor Processing Unit — Google's custom-designed application-specific integrated circuit for accelerating machine-learning workloads, available to external developers through Google Cloud. The canonical example of a non-GPU accelerator at warehouse scale. cloud.google.com/tpu ↗
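The Blockchain and Public ledger entries above both rest on one idea: each block commits to the hash of the block before it, so a quiet edit to history becomes detectable. The sketch below is a minimal illustration of that idea only. It is not the AIISP-1 format or Base's actual data structure; the function names (`block_hash`, `append_block`, `verify_chain`) and the record fields are hypothetical, and a real network adds signatures and consensus among independent validators on top of this.

```python
import hashlib
import json


def block_hash(index, prev_hash, records):
    """Hash a block's contents together with the previous block's hash."""
    payload = json.dumps(
        {"index": index, "prev_hash": prev_hash, "records": records},
        sort_keys=True,  # stable key order so the same block always hashes the same
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, records):
    """Append a new block that commits to the current tip of the chain."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64  # placeholder link for the first block
    index = len(chain)
    block = {
        "index": index,
        "prev_hash": prev_hash,
        "records": records,
        "hash": block_hash(index, prev_hash, records),
    }
    chain.append(block)
    return block


def verify_chain(chain):
    """Return True only if every block's hash and back-link are intact."""
    prev_hash = "0" * 64
    for index, block in enumerate(chain):
        if block["index"] != index or block["prev_hash"] != prev_hash:
            return False
        if block["hash"] != block_hash(index, prev_hash, block["records"]):
            return False
        prev_hash = block["hash"]
    return True


# Hypothetical records, for illustration only.
ledger = []
append_block(ledger, [{"event": "inflow", "amount": 100}])
append_block(ledger, [{"event": "retirement", "amount": 40}])
print(verify_chain(ledger))  # True: the untouched chain verifies

# Quietly rewriting an old record breaks the stored hash of that block.
ledger[0]["records"][0]["amount"] = 1
print(verify_chain(ledger))  # False: the edit is detected
```

Because anyone holding a copy of the chain can rerun `verify_chain`, auditing does not depend on trusting whoever stores the data, which is the property the Public ledger entry describes as independently auditable.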
Named persons
- Sam Altman Co-founder and CEO of OpenAI. Cited for his May 2023 U.S. Senate testimony advocating an international oversight body for AI. blog.samaltman.com ↗ @sama ↗
- Leopold Aschenbrenner Former member of OpenAI's Superalignment team and author of Situational Awareness. Quoted in §Murky Data on the capability impact of RLHF. situational-awareness.ai ↗ @leopoldasch ↗
- Emily M. Bender Professor of linguistics at the University of Washington and co-author of Stochastic Parrots. faculty.washington.edu/ebender ↗ @emilymbender (Mastodon) ↗
- Timnit Gebru Founder and executive director of the Distributed AI Research Institute and co-author of Stochastic Parrots. dair-institute.org ↗ @timnitgebru (Bluesky) ↗ @timnitGebru ↗
- Karen Hao Author of Empire of AI: Dreams and Nightmares in Sam Altman's OpenAI; primary on-the-ground source for the environmental, labor, and corporate-history material in the paper. karendhao.com/empire ↗ LinkedIn ↗ Bluesky ↗ @_KarenHao ↗
- Austin Harshberger Author of this paper, founder of Happy Stack Calculus, and editor of the AIISP-1 specification. links.97115104.com ↗ blog.97115104.com ↗ [email protected]
- Demis Hassabis Co-founder and CEO of Google DeepMind; 2024 Nobel laureate in Chemistry (with John Jumper) for AlphaFold. Quoted in §Murky Data on data curation as the field's emerging bottleneck. deepmind.google ↗ @demishassabis ↗
- Jared Kaplan Co-founder and chief science officer of Anthropic; lead author of the foundational scaling-laws paper for neural language models. Quoted in §Murky Data. LinkedIn ↗
- Jaron Lanier Computer scientist, author, and Microsoft Research interdisciplinary scientist; co-author with E. Glen Weyl of the data-dignity argument cited in Appendix A. jaronlanier.com ↗
- Gavin Leech Co-author with Dwarkesh Patel of The Scaling Era. Google Scholar ↗
- Angelina McMillan-Major Computational linguist and co-author of Stochastic Parrots. Google Scholar ↗
- Margaret Mitchell Computer scientist and co-author of Stochastic Parrots (writing as Shmargaret Shmitchell at the time of original publication); chief ethics scientist at Hugging Face. m-mitchell.com ↗ @mmitchell_ai ↗
- Dwarkesh Patel Independent technology podcaster whose long-form interviews with leading AI researchers form the basis of The Scaling Era. dwarkesh.com ↗ @dwarkesh_sp ↗
- Sonia Ramos Lickanantay water defender from the Atacama region of northern Chile. Her IIED interview on territorial injustice in lithium and data-center expansion is the source of the quotation in §Environmental and Social Costs. IIED interview ↗
- Tania Rodríguez Organizer with MOSACAT in Cerrillos, Chile; credited in Karen Hao's Empire of AI for leading the community campaign that obtained Google's environmental filing for the proposed Cerrillos data center. Instagram ↗
- Nathan Schneider Associate professor of media studies at the University of Colorado Boulder and author of Everything for Everyone. nathanschneider.info ↗ @ntnsndr ↗
- Trebor Scholz Founder of the Platform Cooperativism Consortium at The New School and author of Platform Cooperativism. platform.coop ↗ The New School ↗
- Ilia Shumailov Researcher whose 2024 Nature paper on model collapse under recursive training on synthetic data is cited throughout. iliaishacked.github.io ↗
- Emma Strubell Assistant professor at Carnegie Mellon University whose 2019 ACL paper on the energy and policy implications of deep learning in NLP is cited in §Environmental and Social Costs. strubell.github.io ↗ @strubell ↗
- E. Glen Weyl Founder of RadicalxChange and researcher at Microsoft Research; co-author with Jaron Lanier of the data-dignity argument cited in Appendix A. glenweyl.com ↗
Named organizations
- Alphabet / Google Parent company of Google, Google DeepMind, and Google Cloud. Cited in §Brief on LLMs as one of the largest training-data holders and in §Environmental and Social Costs for the Cerrillos data-center proposal. abc.xyz ↗ google.com ↗
- Anthropic Frontier AI laboratory and developer of the Claude family of models, organized as a public benefit corporation. anthropic.com ↗ @AnthropicAI ↗
- Appen Australian-headquartered data-annotation and RLHF vendor; cited in §Murky Data and §For the Artificial Intelligence Industry. appen.com ↗
- Audius Decentralized music-streaming protocol cited in Appendix A as a precedent for on-chain creator-share contracts. audius.co ↗
- Base Ethereum Layer 2 network maintained by Coinbase; the proposed settlement venue for AIISP-1. base.org ↗ @base ↗
- Bonneville Environmental Foundation U.S. non-profit and the issuer of Water Restoration Certificates referenced throughout §Proposed Solution. b-e-f.org ↗
- Coalition for Content Provenance and Authenticity (C2PA) Standards body whose technical specification underpins the attestation primitive used in this paper. c2pa.org ↗
- Coinbase U.S. cryptocurrency exchange and the maintainer of the Base Layer 2 network. coinbase.com ↗ @coinbase ↗
- Distributed AI Research Institute (DAIR) Independent research institute founded by Timnit Gebru, cited in §For Contributors. dair-institute.org ↗
- Fairwork Oxford Internet Institute project that benchmarks platform working conditions against five fair-work principles. Referenced in §For Contributors. fair.work ↗
- GitHub Microsoft-owned source-code hosting platform and developer of GitHub Copilot. Cited throughout §Current Centralization and Pricing. github.com ↗
- Gold Standard Carbon-credit certification body referenced in §Proposed Solution as a permitted registry for HDC Carbon Fund retirements. goldstandard.org ↗
- Google DeepMind AI research subsidiary of Alphabet, cited in the Acknowledgements and §Murky Data. deepmind.google ↗
- Greenlining Institute California-based racial-and-economic-equity policy organization named in Appendix A as a candidate co-sponsor for a California disclosure bill. greenlining.org ↗
- Happy Stack Calculus The author's company and the entity under which AIISP-1 is being developed. links.97115104.com ↗ [email protected]
- International Atomic Energy Agency (IAEA) UN agency cited in §Brief on LLMs as the model for proposed international AI oversight bodies. iaea.org ↗
- International Institute for Environment and Development (IIED) London-based policy research institute and publisher of the Sonia Ramos interview cited in §Environmental and Social Costs. iied.org ↗
- Meta Parent company of Facebook, Instagram, WhatsApp, and the Llama family of open-weight language models. about.meta.com ↗
- Microsoft Cloud and software vendor and principal investor in OpenAI; the developer, with GitHub, of GitHub Copilot. microsoft.com ↗
- MOSACAT Movimiento Socioambiental Comunitario por el Agua y el Territorio in Cerrillos, Chile; the community organization that successfully blocked the proposed Google data center. mosacatchile.cl ↗
- National Institute of Standards and Technology (NIST) U.S. federal agency named in Appendix C as the proposed publisher of reference per-region energy benchmarks. nist.gov ↗
- OpenAI Frontier AI laboratory and developer of the GPT family of models and ChatGPT. openai.com ↗
- Optimism Ethereum Layer 2 ecosystem and operator of the Retroactive Public Goods Funding rounds referenced in Appendix B. optimism.io ↗
- Penguin Publishing Group Publisher of Karen Hao's Empire of AI. penguin.com ↗
- Royal On-chain music-royalty platform cited in Appendix A as a precedent for fractional creator-share contracts. royal.io ↗
- S&P Global / Markit Financial-information provider whose registry is used as the system of record for Water Restoration Certificate retirements. spglobal.com ↗
- Sama Data-annotation vendor whose Kenyan content-moderation contract for OpenAI is the central case study in §Murky Data. sama.com ↗
- Scale AI U.S. data-annotation and RLHF vendor; cited in §Murky Data and §For the Artificial Intelligence Industry. scale.com ↗
- Stanford Institute for Human-Centered AI (HAI) Publisher of the annual AI Index Report cited in §Environmental and Social Costs. hai.stanford.edu ↗
- Stocksy United Photographer-owned stock-imagery cooperative referenced in Appendix A for its 75% creator share. stocksy.com ↗
- Stripe Press Publisher of Dwarkesh Patel and Gavin Leech's The Scaling Era. press.stripe.com ↗
- UCLA Institute for Technology, Law and Policy Academic policy center named in Appendix A as a candidate California co-sponsor. itlp.law.ucla.edu ↗
- Verra (Verified Carbon Standard) Carbon-credit certification body referenced in §Proposed Solution as a permitted registry for HDC Carbon Fund retirements. verra.org ↗
- The White House Issuer of Executive Order 14110, cited in §Brief on LLMs. whitehouse.gov ↗
- World Wide Web Consortium (W3C) Standards body that publishes the PROV-O provenance ontology referenced in Appendix A. w3.org ↗
- X (formerly Twitter) Social network owned by xAI and cited in §Brief on LLMs as one of the largest first-party training-data holders. x.com ↗
- xAI Frontier AI laboratory founded by Elon Musk and the operator of X. x.ai ↗
Spotted a missing person, organization, or term? Open an issue at github.com/97115104/aiisp-spec ↗ or email [email protected].