Google’s reward standards for reporting bugs in AI merchandise

0
441


Category

Attack Scenario

Guidance

Prompt Attacks: Crafting adversarial prompts that enable an adversary to affect the habits of the mannequin, and therefore the output in ways in which weren’t supposed by the applying.

Prompt injections which can be invisible to victims and alter the state of the sufferer’s account or or any of their property.

In Scope

Prompt injections into any instruments by which the response is used to make choices that straight have an effect on sufferer customers.

In Scope

Prompt or preamble extraction by which a person is ready to extract the preliminary immediate used to prime the mannequin solely when delicate data is current within the extracted preamble.

In Scope

Using a product to generate violative, deceptive, or factually incorrect content material in your personal session: e.g. ‘jailbreaks’. This contains ‘hallucinations’ and factually inaccurate responses. Google’s generative AI merchandise have already got a devoted reporting channel for these kind of content material points.

Out of Scope

Training Data Extraction: Attacks which can be capable of efficiently reconstruct verbatim coaching examples that comprise delicate data. Also referred to as membership inference.

Training information extraction that reconstructs objects used within the coaching information set that leak delicate, personal data.

In Scope

Extraction that reconstructs nonsensitive/public data.

Out of Scope

Manipulating Models: An attacker capable of covertly change the habits of a mannequin such that they’ll set off pre-defined adversarial behaviors.

Adversarial output or habits that an attacker can reliably set off by way of particular enter in a mannequin owned and operated by Google (“backdoors”). Only in-scope when a mannequin’s output is used to alter the state of a sufferer’s account or information. 

In Scope

Attacks by which an attacker manipulates the coaching information of the mannequin to affect the mannequin’s output in a sufferer’s session in line with the attacker’s choice. Only in-scope when a mannequin’s output is used to alter the state of a sufferer’s account or information. 

In Scope

Adversarial Perturbation: Inputs which can be supplied to a mannequin that ends in a deterministic, however extremely surprising output from the mannequin.

Contexts by which an adversary can reliably set off a misclassification in a safety management that may be abused for malicious use or adversarial acquire. 

In Scope

Contexts by which a mannequin’s incorrect output or classification doesn’t pose a compelling assault state of affairs or possible path to Google or person hurt.

Out of Scope

Model Theft / Exfiltration: AI fashions usually embody delicate mental property, so we place a excessive precedence on defending these property. Exfiltration assaults enable attackers to steal particulars a couple of mannequin comparable to its structure or weights.

Attacks by which the precise structure or weights of a confidential/proprietary mannequin are extracted.

In Scope

Attacks by which the structure and weights aren’t extracted exactly, or once they’re extracted from a non-confidential mannequin.

Out of Scope

If you discover a flaw in an AI-powered instrument apart from what’s listed above, you’ll be able to nonetheless submit, supplied that it meets the {qualifications} listed on our program web page.

A bug or habits that clearly meets our {qualifications} for a legitimate safety or abuse concern.

In Scope

Using an AI product to do one thing probably dangerous that’s already attainable with different instruments. For instance, discovering a vulnerability in open supply software program (already attainable utilizing publicly-available static evaluation instruments) and producing the reply to a dangerous query when the reply is already out there on-line.

Out of Scope

As in line with our program, points that we already find out about aren’t eligible for reward.

Out of Scope

Potential copyright points: findings by which merchandise return content material showing to be copyright-protected. Google’s generative AI merchandise have already got a devoted reporting channel for these kind of content material points.

Out of Scope

LEAVE A REPLY

Please enter your comment!
Please enter your name here