The Generative Information Packet

A proposal for structural separation and embodied provenance in the post-Anthropogenic archive.

Stephen Mucher, Ph.D. · Founder, Sondage · April 2026

The Generative Information Packet (GIP) is a proposed archival unit for records produced by large language models and other generative systems. It parallels the Archival Information Package (AIP) defined in the Open Archival Information System (OAIS) reference model, but it is structurally separated from records of human origin.

§

In February of this year, the Library of Congress joined the Coalition for Content Provenance and Authenticity in issuing a call to the libraries, archives, and museums community. The central question is whether the archive of the twenty-first century can still tell what a human being made from what a model generated. The white paper names the stakes clearly. It does not yet name the instrument the field will need.

The response so far, across institutional archival practice, has been labeling. The IPTC has defined a Digital Source Type value for synthetic media. The C2PA has built cryptographic assertions embeddable in image and video files. Major library systems now add provenance subfields to records enriched with AI-generated metadata, and a growing number of archives append parenthetical disclosures to descriptions produced by language models. These interventions are useful. For a life-history archive, none of them is sufficient.

The stake is not only scholarly. Anyone who intends to leave a record behind has an interest in whether that record survives as evidence or dissolves into plausible inference. Family offices managing multigenerational estates, alumni offices preserving institutional memory, and the individuals whose lives those institutions exist to honor are all exposed to the same risk. A label inside a file is not an inheritance. The architecture under that file is.

This essay proposes an instrument, names it, and argues that the instrument by itself does not complete the archival response. The separation between authenticated and synthetic content, once drawn, must be drawn by a human practitioner who can be held to account for the claim. Labeling can be automated. Attestation cannot.

What is wrong with labeling

A label is a claim embedded inside a record. The record containing it still sits in the archive's catalog as one object, returned in one search result, stored in one folder. The label is searchable for a reader who knows to check the field. For a reader who does not, the label is invisible.

This works well enough for annotating AI-generated content in collections where the stakes are descriptive. It fails where the category of the object carries evidentiary weight. In a primary source archive, future historians need to distinguish the human record from its synthetic description without relying on the color of a field or a parenthetical aside. That distinction has to sit in the architecture of the archive, not in the annotation of the record.

Archival practice has relied on respect des fonds for more than a century and a half. The records of a given creator are kept separate from the records of other creators, not merely labeled as such, because the context of creation is part of what the archive preserves. The principle is spatial. What respect des fonds has not yet confronted is a creator categorically different from any the archive has previously received. A generative model leaves no biographical residue, no correspondence, no institutional history. It produces records at scale, formally indistinguishable from records produced by human hands. The principle that separates the records of one human creator from another now needs to separate records produced by human creators from records produced by models. The traditional instrument is extended here, not bypassed.

What is a Generative Information Packet

A Generative Information Packet, or GIP, is an archival container that holds all AI-produced description, summary, indexing, and derivative material associated with a record. It is accessioned alongside the record, never within it. In a Sondage-Certified Archive, the GIP occupies its own subdirectory, distinct from the human-sourced audio, distinct from the human-generated metadata, visible as a separate object to any person or system that opens the accession.

The separation does three things.

It preserves the primary source as primary. The authenticated recording, the scholar's documented methodology, the Senior Fellow's editorial decisions, and the Curator's accession notes remain categorically distinct from anything a model has produced about them. A historian in 2075 opening the accession finds human-sourced material and machine-generated description as different objects, stored in different places, answering to different provenance claims.

It permits AI-assisted tools to do useful work without contaminating the record. Machines can sort large collections, generate searchable summaries, extract named entities, transcribe audio for indexing, and produce finding aids that accelerate discovery. This work is real, and the GIP is where the products of this work belong. The container describes, summarizes, and indexes the human material. It cannot become that material.

It makes the archive legible to future source criticism. A historian trained to read sources asks, of any document, who made it and under what conditions. The GIP makes the answer to that question available at the level of the directory tree, not buried in a metadata subfield. This is the archival response the labeling approaches cannot provide.

The Sondage Standard requires the GIP for every certified accession. Separation is enforced at the point of ingestion and carried forward to the Senior Fellow's sovereign archive. We adopted the instrument because the commingling of authenticated and synthetic records, left structurally unaddressed, produces an archive the historian of the future cannot trust. The instrument seems inevitable. What the field ends up calling it, and who codifies it, remains open.
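
A minimal sketch may help make separation at ingestion concrete. It assumes a hypothetical layout in which every machine-produced item lives under a single `gip/` subdirectory and every human-sourced item lives anywhere else in the accession. The directory name, the `origin` field, and the function name are illustrative assumptions, not the Sondage Standard's actual layout, which this essay does not specify.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath

# Hypothetical name for the GIP subdirectory inside an accession.
GIP_DIR = "gip"


@dataclass(frozen=True)
class Item:
    path: str    # location relative to the accession root
    origin: str  # "human" or "generative", as declared at ingestion


def separation_violations(items):
    """Return a list of problems; an empty list means the layout is separated."""
    problems = []
    for item in items:
        parts = PurePosixPath(item.path).parts
        top = parts[0] if parts else ""
        if item.origin not in ("human", "generative"):
            problems.append(f"{item.path}: unknown origin {item.origin!r}")
        elif item.origin == "generative" and top != GIP_DIR:
            # Machine-produced material must never sit beside the record.
            problems.append(f"{item.path}: generative material outside {GIP_DIR}/")
        elif item.origin == "human" and top == GIP_DIR:
            # Human-sourced material must never be filed inside the GIP.
            problems.append(f"{item.path}: human material inside {GIP_DIR}/")
    return problems
```

A check of this kind only verifies that declared origins match locations. It does not decide what is human and what is synthetic; that declaration is the part the essay assigns to a practitioner.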

What is embodied provenance

Structural separation solves half of the problem. The other half is a question the labeling approaches also fail to answer, and the GIP does not answer on its own. Who draws the line?

The C2PA specification assumes the line can be drawn cryptographically. The IPTC assumes the line can be drawn by the system that generated the content. Library normalization rules assume the line can be drawn by a software workflow. In each case, the distinguishing act is performed by the same category of system it is meant to classify. A classifier certifying its own output's provenance is the archival equivalent of a notary notarizing their own signature.

The second commitment behind the GIP is that the separation it enforces must be performed by a qualified human practitioner exercising disciplinary judgment. Sondage calls this embodied provenance.

The phrase needs care, because it does not mean what it might appear to mean. Embodied provenance does not require that a practitioner have been physically present at the moment a record was sourced. A scholar authenticating a 1963 postcard, tracing its route through three generations of a family, documenting what is known and acknowledging what cannot be reconstructed, is performing embodied provenance. A curator receiving a donated collection and certifying how it entered the archive is performing embodied provenance. A scholar conducting a Season in real time and documenting the conditions of the encounter is performing embodied provenance. What unites these acts is not simultaneity. It is accountability.

Embodied provenance, as Sondage uses the term, has three components. The first is a qualified human practitioner whose training and professional standing give the attestation weight. The second is the exercise of disciplinary judgment, meaning the practitioner reads the evidence, weighs the gaps, and decides what can be claimed with confidence. The third is the principle that the certifying act itself cannot be delegated to a system. Tools assist. Only a human signs.

The lineage is older than it looks. In The Archive and the Repertoire, Diana Taylor counterposed the archive of inert documents to the repertoire of embodied transmission, the song, the ceremony, the memory carried in bodies across generations. Taylor's insight was that a record is not legible without a living chain of people willing to vouch for it. The archive's own legibility depends on the repertoire of attested human acts that brought its holdings into it. The archival tradition itself made this commitment a century ago. Hilary Jenkinson's archivist-as-custodian assumed, without needing to argue it, that the authenticity of the holdings rested on the accountable standing of the professional who received and kept them.

Under the Sondage Standard, the Guild of Certified Legacy Scholars, Legacy Sound Producers, and Legacy Collection Curators performs this work. Each certification carries the practitioner's name, disciplinary affiliation, and professional accountability forward into the accession. The GIP separates the human from the synthetic at the level of architecture. The Guild draws the line at the level of attestation. Neither instrument works without the other.
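
The attestation described above can be pictured as a small record that travels with the accession. The sketch below is an illustration under stated assumptions: the field names, the `signer_kind` values, and the singular role names derived from the Guild's title are hypothetical, and no actual Sondage schema is implied.

```python
from dataclasses import dataclass
from typing import Tuple

# Role names read off the Guild's title; the exact wording is an assumption.
GUILD_ROLES = {
    "Legacy Scholar",
    "Legacy Sound Producer",
    "Legacy Collection Curator",
}


@dataclass(frozen=True)
class Attestation:
    practitioner_name: str
    discipline: str
    role: str
    signer_kind: str                   # "human" or "system"
    assisted_by: Tuple[str, ...] = ()  # tools may assist, but never sign


def attestation_problems(att):
    """Return a list of reasons the attestation cannot stand; empty if none."""
    problems = []
    if att.signer_kind != "human":
        problems.append("certifying act must be signed by a human practitioner")
    if not att.practitioner_name.strip():
        problems.append("practitioner name is missing")
    if not att.discipline.strip():
        problems.append("disciplinary affiliation is missing")
    if att.role not in GUILD_ROLES:
        problems.append(f"unrecognized Guild role: {att.role!r}")
    return problems
```

The record lists assisting tools separately from the signer, which mirrors the distinction the essay draws: tools can appear in the file, but only a named practitioner can carry the claim.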

What this asks of the field

The commingling of authenticated and synthetic records will not be repaired by labels, watermarks, or cryptographic assertions acting alone. The repair asks for an architecture that holds authenticated and synthetic records in distinct structural locations. It also asks for the distinction itself to be drawn, accession by accession, by a human practitioner who takes responsibility for the claim.

Sondage has adopted the GIP because the life-history archive cannot wait for the field to converge on a standard. Other archives, including research universities, national libraries, community oral history projects, and the family-office collections that increasingly underwrite private preservation work, will need instruments of their own scale and character. If the GIP is taken up, that is a good outcome. If the instrument prompts something better by another name, that is also a good outcome. What must not happen is the drift by which labeling is accepted as sufficient and the structural question is quietly left to the systems that produced the problem.

The archive we leave the historians of 2075 is being built now. They will ask what we separated, and who was willing to sign the separation.

§

This essay appears in The Sondage Review, the publication of Sondage. The Review publishes invited writing on the Authentication Horizon, Embodied Provenance, the Input Gap, the Human Standard, Non-Custodialism, and the Modern Elder in the synthetic age.

Stephen Mucher, Ph.D., is the founder of Sondage, a governance platform for scholar-guided life history recording, human-authored curation, and archival accession. He was formerly Dean and Director of the Osher Lifelong Learning Institute at UCLA, an administrator at UC Berkeley, and a faculty member at Bard College. In 2025 he walked the 2,655-mile Pacific Crest Trail from Mexico to Canada, conducting one hundred audio interviews under the trail name Verbatim. He has recorded voices in more than twenty-five countries.

§

Disclosure. The Generative Information Packet is implemented in the Sondage Standard. The arguments here reflect the author's own views.

Photo. Dymo Mite handheld embossing tapewriter, circa 1958–1961, manufactured by Dymo Industries, Inc., Berkeley, California. Among the twentieth century's most ubiquitous tools for labeling and making information findable in ordinary life. National Wildlife Research Center collection, via Digital Public Library of America. Public domain.

Stephen Mucher, Ph.D.

Founder and Principal Strategist, Sondage Standard

https://sondagestandard.com