What an AI vendor contract should say about your data

By Christopher Moye, Esq.

When a company adopts a third-party AI tool, it usually signs the vendor's standard terms without changing a word. Those terms are drafted for the vendor, and they often claim broad rights over the data the company feeds in. The data clauses are where the real exposure sits, and they are the part most worth reading slowly.

An AI vendor's contract is a commercial agreement like any other, but the subject matter is unusual: the company is handing over its documents, its questions, and sometimes its customers' information so that a system somewhere else can process them. The agreement decides who owns what comes back, whether the vendor may keep and reuse what went in, and what survives once the relationship ends. Most of those answers are set by clauses a hurried reader skips, and the default text rarely favors the customer.

This note is the narrow version of a longer discussion. It walks only the data clauses — the provisions that govern inputs, outputs, training, confidentiality, sub-processing, retention, and deletion — and what a careful customer should look to strike or add. The fuller treatment of an AI vendor agreement, including the intellectual-property and indemnity terms that sit alongside these, is taken up separately in the article on negotiating an AI vendor agreement. What follows assumes the reader has the standard form in hand and wants to know where the data risk lives.

Inputs, outputs, and the right to train

Start with ownership. A well-drafted clause confirms that the customer keeps all rights in the data it submits — the inputs, often called prompts or customer content — and that the customer owns or is licensed to use whatever the tool generates in response, the outputs. Standard vendor terms are frequently vaguer than that. They may grant the vendor a broad license to the inputs, decline to say who owns the outputs, or reserve rights in generated material that a customer expects to own outright. The fix is to state plainly that inputs remain the customer's and that outputs are assigned or licensed to the customer for its own use.

The clause that deserves the most attention is the one on model training. Many default terms permit the vendor to use customer inputs and outputs to train, improve, or develop its models, and the permission is often buried in a definitions section or a usage policy rather than flagged. For a company feeding in confidential or regulated material, that is a meaningful concession: its data becomes part of a system other customers may use. A careful customer looks for an explicit statement that the vendor will not train on its data, or, where training is permitted by default, for a clear and operative opt-out — one that takes effect by contract rather than by a setting that can change.

Read the two points together, because they interact. A grant of broad rights in inputs can function as a training permission even where the word training never appears. The question is not only whether a clause is labeled correctly but what the license actually allows the vendor to do with the data over time. The aim is a contract under which the customer's material is processed to deliver the service and for nothing else, unless the customer has agreed otherwise in terms it can see and by a consent it can withdraw.

Confidentiality, sub-processors, and security

Inputs to an AI tool are often confidential — draft contracts, financial figures, personal data about employees or customers — yet a standard confidentiality clause may not clearly cover what a user types into a prompt. The customer should confirm that prompts and the data within them are treated as confidential information under the agreement, that the vendor's obligations of confidence apply to them, and that any human review of inputs, where it occurs, is disclosed and constrained. A confidentiality clause that protects the agreement's terms but says nothing about the data flowing through the tool leaves the most sensitive material exposed.

Behind most AI services sit other vendors (cloud hosts, model providers, support tools), and these sub-processors handle the customer's data too. The contract should identify them, or at least commit the vendor to maintaining a current list. It should also require notice before new ones are added and bind each to obligations no weaker than the vendor's own. Where personal data is involved, this belongs in a data-processing addendum that also addresses data residency, the security measures the vendor commits to maintain, and breach notification. The presence of a real addendum, rather than a sentence promising reasonable security, is often the clearest signal of how seriously a vendor treats the data it receives.

Security commitments are only as good as their specificity. A clause that promises industry-standard measures without naming any is hard to enforce and easy to satisfy. A customer with regulatory obligations of its own — under privacy laws, sector rules, or its own client commitments — should make sure the vendor's commitments are concrete enough to support them, and that the contract does not quietly push compliance responsibility onto the customer for data the vendor controls. How far to press on these points is a function of what data the company is putting in, which is a governance question as much as a contract one.

Retention, deletion, and audit on exit

Every engagement ends, and the contract should say what happens to the data when it does. Default terms are often silent on retention or reserve the vendor's right to keep customer data for unspecified periods after termination. A customer should look for defined retention limits during the term and a clear deletion obligation on exit: that the vendor will return or delete the customer's inputs and outputs within a stated period, will extend the deletion to copies held by sub-processors, and will confirm in writing that it has done so. Deletion that does not reach backups and derived data is incomplete, and the clause should reach them where it can.

Training complicates deletion in a way worth naming. If a vendor has already used customer data to train a model, deleting the stored inputs does not remove their influence from the model itself, and most contracts cannot promise that it does. This is one more reason the training clause in the first section matters: the cleanest protection is to keep the data out of training in the first place, because deletion on exit cannot fully undo it afterward. A customer should understand what deletion does and does not accomplish before relying on it as a safeguard.

Finally, the contract should give the customer some ability to verify what the vendor is doing. That can mean audit rights, a right to receive third-party security reports, or transparency commitments about how data is processed and where. The level of assurance a company needs depends on what it is entrusting to the tool, and these contract terms are one layer of a broader practice; the governance side — deciding which tools to adopt, what data may go into them, and who signs off — is set out in the checklist for companies deploying AI. The contract records the protections; the governance program decides which ones the company actually requires.

With composed counsel,

Christopher Moye

ATTORNEY · ADMITTED IN NEW YORK
