Open
Convert "First Party Sets" to "GDPR Validated Sets" #86

Issue Opened
Jwrosewell Opened Issue On Mar 15th 2022, 6:15

The name has been changed from First Party Sets to GDPR Validated Sets.

In summary this PR revises the proposal to align to GDPR, introduce user consent, introduce two flexible methods to validate ownership, and provide evidence to sanction bad actors should harm occur.

  1. Both Extended Validation (EV) SSL certificates and notaries verify the ownership of registrable domains by legal entities, removing the need for centralized administration of ownership.
  2. Aligns to GDPR to incorporate both joint and sole controllers of data.
  3. Provides users the option to consent to common use policies applicable across the domains in a Set rather than on a per-domain basis, reducing friction for parties to a Set that wish to do so.
  4. Enables user agents to verify that users’ choices are being respected and to record evidence for use should harm occur.

This PR expands the Site Groups concepts first proposed by Paul Bannister in issue #22 and this proposer in #23.

Changes in this PR include:

  1. Define a Set as a group of data controllers and processors that share a common use policy. Set is now a capitalised term.
  2. Align the proposal to GDPR by removing the concept of “First Party” and “Third Party”. This change thus builds on the direction proposed by one of the group chairs in the note “Parties and browsers”.
  3. Each domain in the Set must confirm the common use policy and advertise conformance via well-known end points which are publicly inspectable.
  4. The relationship between domain and legal entity is established via notaries or EV SSL certificates (Validated Domains or VD).
  5. A valid Set is one where the same common use policy is advertised by VDs and that policy is accepted by the user.
  6. Guidance concerning what is or is not a good privacy notice, and reporting invalid Sets has been removed. These are already covered under GDPR and relevant regional guidance from Data Protection Authorities.
  7. Includes joint controller use cases.
  8. Moved the note about examples to the introduction.
  9. Domains can appear in multiple Sets.
  10. Removes sections that are no longer relevant.
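
For illustration only, the validity rule in item 5 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the PR's specification: the names `ValidatedDomain` and `is_valid_set`, and the idea that each domain advertises its common use policy as a simple identifier string, are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass(frozen=True)
class ValidatedDomain:
    """Hypothetical record for one member of a Set."""
    domain: str
    owner_verified: bool    # assumed outcome of an EV certificate or notary check
    advertised_policy: str  # assumed identifier of the policy at the well-known end point


def is_valid_set(members: Iterable[ValidatedDomain], accepted_policies: Set[str]) -> bool:
    """Return True when every member is validated, all members advertise the
    same common use policy, and the user has accepted that policy."""
    members = list(members)
    if not members:
        return False
    if not all(m.owner_verified for m in members):
        return False
    policies = {m.advertised_policy for m in members}
    if len(policies) != 1:
        return False
    return next(iter(policies)) in accepted_policies
```

Under these assumptions a Set fails the check as soon as one member advertises a different policy, lacks verified ownership, or the user has not accepted the policy.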

Reviewers Note

Google employees will already be familiar with the 4th February 2022 agreement with the CMA, the training requirements in Annex 3, and the CMA note paragraph 4.119, “Google has committed to instruct its staff and agents not to make claims to other market players that contradict the commitments, and to provide training to its relevant staff and agents to ensure that they are aware of the requirements of the Final Commitments”.

Other reviewers are not bound by these commitments but might wish to familiarise themselves with the CMA’s role in approving changes to Google’s web browser. This PR and proposal are intended to steer the proposal and discussion towards a solution that the CMA will find acceptable whilst retaining all the features of the original First Party Sets proposal. Paragraph 21 of the commitments states “During the standstill period, the CMA may notify Google that competition law concerns remain such that the Purpose of the Commitments will not be achieved. Google will work with the CMA without delay to seek to resolve concerns raised and address comments made by the CMA with a view to achieving the Purpose of the Commitments. Google will inform the CMA of how it has responded to those comments”. First Party Sets are explicitly covered by the commitments. Spending time discussing or developing the original First Party Sets proposal will either a) fragment the web as Google will not be allowed by the CMA to implement the proposal; or b) merely waste effort and time delaying the realisation of the benefits envisaged.

As the proposer is not a party to the agreement between Google and the CMA, and the CMA are not represented in the Privacy CG, Google may wish to validate the direction of this proposal with the CMA and ICO prior to commencing review within Privacy CG.

For those wishing to better understand the commitments an explanation has been produced by Movement for an Open Web (MOW).

For those that have followed the discussion in PATCG concerning the role of consent in proposals MOW has similarly produced an explanation of consent in light of the commitments.

Explanatory Note

Google and the UK CMA agreed on 4th February 2022 to align Alphabet-originated proposals and changes exclusively to GDPR, worldwide. It is probable that other W3C and Privacy CG members support this direction.

GDPR has no concept of first and third party or domain names. This PR commences the process of aligning this proposal to GDPR by establishing an upgraded privacy boundary for data sharing within the user agent based on GDPR. Proposers of First Party Sets already understand the need for such a change to the privacy boundary and that is not repeated in this PR note.

Legal Ownership

A prerequisite for such a solution is the need to establish the legal ownership of domains. The proposal now incorporates SSL EV certificates and the use of notaries to verify ownership.

Expeditious

Accepting these changes takes nothing away from the original First Party Sets proposal whilst expanding the participants to align with GDPR and thus addressing competition concerns associated with the original proposal.

Should the concepts be adopted then the complete retirement of unrestricted third-party cookies might be expedited.

Decentralized

These changes further avoid the need for a central authority to determine whether a Set is or is not valid, and remove ambiguity associated with reliance on brand trust or prior recognition. Because a common use policy is identified for data shared within the Set, users can either be prompted to accept the common use policy once for all validated members or provide approval for each member. There is no option to reject the common use policy, in line with established practice for services that require sign-in: one cannot access the service without registering and signing in.

Consent Fatigue

Users will now have the option to avoid repeatedly being asked to make choices for each website they interact with. The net result is an opportunity to improve the web experience for people without any centralized authority.

Innovation

By enabling controlled data sharing the broadest number of web participants can drive innovation and improvements to the web. Such innovation will not be restricted to web browser implementors.

Policy Wording

These changes do not provide guidance concerning the construction of common use policies. A common use policy is represented by a well-known end point. GDPR guidance is used.

Privacy Issues

The original version of the proposal contained privacy issues documented in the repository concerning the validity of a set and the potential for abuse by bad actors. These changes improve the privacy of the proposal by surfacing the common use policy associated with the Set. These changes do not attempt to fully resolve all privacy issues raised against the original proposal exclusively via engineering. The changed proposal achieves improved privacy for users by identifying the common use policies used by all parties and enabling existing law enforcement and sanctions to be applied. As such the proposer of this change requests that reviewers consider the role of methods other than engineering alone in mitigating the risk of harm on the web, and how the proposed changes align to GDPR.

Comment
AramZS commented on 3 months ago

> Other reviewers are not bound by these commitments but might wish to familiarise themselves with the CMA’s role in approving changes to Google’s web browser. This PR and proposal are intended to steer the proposal and discussion towards a solution that the CMA will find acceptable whilst retaining all the features of the original First Party Sets proposal. Paragraph 21 of the commitments states “During the standstill period, the CMA may notify Google that competition law concerns remain such that the Purpose of the Commitments will not be achieved. Google will work with the CMA without delay to seek to resolve concerns raised and address comments made by the CMA with a view to achieving the Purpose of the Commitments. Google will inform the CMA of how it has responded to those comments”. First Party Sets are explicitly covered by the commitments. Spending time discussing or developing the original First Party Sets proposal will either a) fragment the web as Google will not be allowed by the CMA to implement the proposal; or b) merely waste effort and time delaying the realisation of the benefits envisaged.

@jwrosewell Has the CMA stated that First Party Sets is not acceptable to them? Have they notified Google in a public document about First Party Sets specifically? Has the CMA addressed First Party Sets in any of their comments? Has the CMA stated that they believe that First Party Sets will fragment the web and that they will not allow Google to implement it? I am very concerned with this claim that there are specific changes that need to occur because the CMA will require specific behavior or changes when I do not see that claim from the CMA.

Comment
AramZS commented on 3 months ago

> Guidance concerning what is or is not a good privacy notice, and reporting invalid Sets has been removed. These are already covered under GDPR and relevant regional guidance from Data Protection Authorities.

So judging by your changes and this statement are you proposing that this technology only be usable in states with GDPR?

> Domains can appear in multiple Sets.

This would seem to entirely negate any theoretical privacy promise that First Party Sets could preserve? It's pretty core to the First Party Sets proposal, as I understand it, that sets be limited so that any given domain is only in one set. Without that guarantee we would essentially create a situation where any site could add another site to their Set and unlock whatever privileges that browser theoretically would allow them. This would almost certainly have downstream effects of ad tech providers and other tracking mechanisms exerting pressure on sites to be added to their Set. The biggest companies could exert the largest amount of pressure in that case and therefore be the most likely to be adopted into a Set. Wouldn't that make this proposed change... more anti-competitive then? I only mention this question because you are specifically citing the CMA in making this change, but your analysis states a primary concern is:

> Create unequal access to functionality associated with user tracking, and hence distort ad tech markets and those buying online ad inventory by restricting the functionality associated with user tracking for third parties while retaining this functionality for Google;

and a proposal which allows a party to be in multiple Sets would surely do that, as Google would come in with the largest leverage to have their domains included in Sets, right?

Comment
Jwrosewell commented on 3 months ago

Re: CMA. MOW have provided a summary of the commitments and how they relate to each Google PS proposal at a high level. I do not believe that First Party Sets (FPS) as presented will meet those commitments. However it is for the CMA to provide a decision. Rather than waste people's time discussing FPS I'm suggesting that Google ask the CMA explicitly before advancing the proposal. That will be quick and simple for them to do. I make the same request of Google in all Privacy Sandbox (PS) proposals.

To try and advance the concepts in the meantime I have proposed GDPR Validated Sets (GVS), which achieves the same benefits for sole data controllers (FPS owners) AND joint data controllers without requiring a central Independent Enforcement Entity (IEE). Should GVS advance, the need for many other PS proposals and their associated complexity would likely reduce, providing certainty to the digital market and people.

Re: GDPR. Once there is a solution that Google, the CMA and others agree would meet GDPR requirements, that solution could be adapted for other rules. Google have committed to achieving GDPR level privacy globally so at least three entities see GDPR as the right starting point.

Re: Domains can appear in multiple Sets. The information is held within the Set. A domain that was in multiple Sets could read and write from those multiple Sets. The use purpose under which they do so will be defined for each Set. If they break that purpose and process people's data in a way that they have not been authorized for then they can be sanctioned under existing laws.
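
One way to picture "the information is held within the Set" is storage partitioned by Set, where a domain can only touch the partitions of Sets it belongs to. The following is a minimal sketch under that reading; the `SetStore` class and its methods are hypothetical names for illustration and are not defined by the proposal.

```python
from typing import Any, Dict, Set


class SetStore:
    """Hypothetical user-agent storage partitioned by Set identifier."""

    def __init__(self, memberships: Dict[str, Set[str]]) -> None:
        # memberships maps a Set identifier to the domains that belong to it
        self._memberships = memberships
        self._data: Dict[str, Dict[str, Any]] = {set_id: {} for set_id in memberships}

    def _check(self, domain: str, set_id: str) -> None:
        # A domain may only access partitions of Sets it is a member of
        if domain not in self._memberships.get(set_id, set()):
            raise PermissionError(f"{domain} is not a member of {set_id}")

    def write(self, domain: str, set_id: str, key: str, value: Any) -> None:
        self._check(domain, set_id)
        self._data[set_id][key] = value

    def read(self, domain: str, set_id: str, key: str) -> Any:
        self._check(domain, set_id)
        return self._data[set_id].get(key)
```

In this sketch a domain that is a member of two Sets can read and write both partitions, while a domain in only one of them is refused access to the other. Nothing in the code stops the shared member from copying data between partitions, so keeping to the use purpose of each Set still relies on the policy and existing law rather than on the storage layer.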

Re: "exerting pressure on sites to be added to their Set" & "Google would come in with the largest leverage to have their domains included in Sets, right?". That is a market issue that might result from the choices GVS makes available to participants on the web, should Google (or any other dominant party) adopt such a strategy. It is not our role to address such market issues or such bad acts and actors. We do have to be mindful of them and ensure we do not create unintended market issues.

I'm glad you raise the impact on competition as it is a subject we need to openly debate at the W3C. Your example is exactly what is happening with the myriad of changes and proposals being advanced by user agent vendors that mandate others "do as they say" without any option to take a different route. It is exactly what is happening when data asymmetries exist in a market. I look to the CMA and others to intervene in such matters and not the W3C, IAB TL, IETF, or other standards setting bodies. We just need to be mindful to avoid hosting anti-competitive proposals or "tilting the scales".

Ultimately the GVS amendment provides much more information to people, their agents, and society than the status quo without limiting competition.

Comment
Dmarti commented on 3 months ago

@jwrosewell You mention that your proposed changes to First Party Sets would not require an Independent Enforcement Entity (IEE) but it is still not clear from this PR how the IEE's responsibilities would be handled.

Changing FPS to assert that the IEE is not needed, without explaining who would take on the tasks that are handled by the IEE in the original version, makes this PR incomplete.

Is any web user supposed to be able to check a notarized document from any site? It seems like a lot of knowledge would be required to browse a site outside your own jurisdiction. (Personally, I just got a document notarized in California, but I would not know how to check for the validity of a notary seal from Nevada or Oregon.)

Comment
AramZS commented on 3 months ago

> Re: "exerting pressure on sites to be added to their Set" & "Google would come in with the largest leverage to have their domains included in Sets, right?". That is a market issue that might result from the choices GVS makes available to participants on the web, should Google (or any other dominant party) adopt such a strategy. It is not our role to address such market issues or such bad acts and actors. We do have to be mindful of them and ensure we do not create unintended market issues.

I would normally agree with you, and I wouldn't have brought it up here. But considering the metric by which you are identifying FPS as a proposal applicable for censure by the CMA, and given that the CMA's goal is to deal with Google potentially being anti-competitive, shouldn't your proposal deal with the ways it might make Google uniquely and specifically anti-competitive?

That's the whole reason we're having this discussion, right? I'm not a lawyer, so I am unclear on how the CMA applies its logic, though you seem to be stating that you are. So I would be interested in the alterations for this proposal being more specific in stating reasons why the proposal (which is being made at least somewhat on the basis that the CMA would find FPS anti-competitive, right?) specifically is less anti-competitive. Because if you are claiming:

> This PR and proposal are intended to steer the proposal and discussion towards a solution that the CMA will find acceptable

And the CMA has made no specific statements about this specific proposal (right?), then the only basis on which you are steering this proposal is specifically to make it less anti-competitive? I mean, if--as it looks to me--your changes would make it more anti-competitive then I would assume it would not accomplish your stated goal of "steer[ing] the proposal and discussion towards a solution that the CMA will find acceptable".

> It is not our role to address such market issues or such bad acts and actors.

I generally agree with this in some respects (I think making standards that are not easily exploitable by bad actors is in our role), so I'm not so interested in a general policy of discussing how proposals fare with regard to market competition; I think that's a different conversation entirely, for a different place.

But since you have stated that this specific issue is what you are seeking specifically to address, I think your PR needs additional text, either here or in the body of the proposal, explaining how it specifically addresses that, which is unclear to me at this time.

Sorry for this long-winded response, but I wanted to make my position very clear here. The TL;DR is:

If your PR is intended to accomplish this:

> steer the proposal and discussion towards a solution that the CMA will find acceptable

And the basis on which they theoretically wouldn't find it acceptable is that it is anti-competitive (right?); then I do not think the PR has sufficiently met its self-stated goals.

Comment
AramZS commented on 3 months ago

> Re: GDPR. Once there is a solution that Google, the CMA and others agree would meet GDPR requirements, that solution could be adapted for other rules. Google have committed to achieving GDPR level privacy globally so at least three entities see GDPR as the right starting point.

I don't disagree with that but then why remove the notice process? Surely a notice process is in line with GDPR and without it we are limiting the applicability of this proposal because we are acknowledging that

  1. GDPR requires a notice process
  2. A notice process is needed for a Sets proposal
  3. Some level of enforcement process is required

Those are big issues, and if our only way of dealing with them is to delegate those processes to GDPR, the solution would be unfit to be rolled out anywhere else, right?

> Re: Domains can appear in multiple Sets. The information is held within the Set. A domain that was in multiple Sets could read and write from those multiple Sets. The use purpose under which they do so will be defined for each Set. If they break that purpose and process people's data in a way that they have not been authorized for then they can be sanctioned under existing laws.

This seems unlikely to work as a solution. As we've already seen with numerous reported cases of consent fraud, parties regularly do not comply with their contractual process for user data, and the nature of the blackbox system in which these actions occur makes it very, very difficult to catch them in the act.

Comment
Jwrosewell commented on 3 months ago

Re: how the IEE's responsibilities would be handled?

The proposed changes would provide people or their agents information about ownership and agreed use purposes if they wished to inspect them in a consistent manner. The role of the IEE in identifying and eliminating bad acts and actors would be fulfilled by the community of users and their agents. The proposal is therefore decentralized and has a low cost to operate and police.

Re: What are the issues with original FPS? How does this address the CMA concern?

FPS favours large sole data controllers that can operate many services themselves. CMA want a level playing field based on GDPR. Google agreed to this. GDPR includes the concept of joint data controllers. GVS expands FPS to support joint data controllers so that smaller organisations that must band together to compete with larger organisations can achieve the same benefits. GVS also provides people more control by enabling them to consent to use purposes once rather than to individual B2B organisations. In the next iteration of the document I’ll make this clearer in the body as @AramZS suggests. Thanks.

Re: And the CMA has made no specific statements about this specific proposal (right?)

No. First Party Sets are covered. See para 32 here.

Re: why remove the notice process?

This proposal enables consent to a use purpose (a contract that can be adhered to by multiple parties) which forms the notice under GDPR. It does not remove the notice. It is true that the proposal takes a different approach to notice and consent and in doing so addresses the criticisms of current models for consent. Just as with Open Source licences, industry will likely gravitate towards a small number of use purpose notices that will become best practice for specific purposes.

> As we've already seen with numerous reported cases of consent fraud, parties regularly do not comply with their contractual process for user data, and the nature of the blackbox system in which these actions occur makes it very, very difficult to catch them in the act.

Whilst there is free will in the world there will always be bad actors and bad acts. GVS addresses potential bad acts by improving information so that existing law enforcement mechanisms can sanction bad actors when such acts occur. It does so without requiring a central IEE and all the problems that such a central operation entails.

Comment
Dmarti commented on 3 months ago

@jwrosewell Thank you for the answer. So I'm not clear where "validated" comes in in the proposal title. Sites can make claims about compliance with a particular law. It doesn't seem like the browser would need to behave differently just because a site, or group of sites, had made some unverified claim.

(The original First-Party Sets is a trade: the user gives sites permission to use additional browser functionality in exchange for the enforcement services of the IEE. It can be a good deal for the user if the impact of the extra browser functionality is balanced with the value of the protection that the IEE provides.)

Comment
Jwrosewell commented on 3 months ago

@dmarti Validated refers to the ownership of the domains either via a) the Extended Validation SSL certificate process; or b) notaries. Once there is a method of establishing ownership proof AND consented use purpose agreements, a validated set can be formed without requiring a centralized enforcement authority (aka Independent Enforcement Entity). Thus browsers would enable data sharing among the domains that formed the set and be able to present proof to users concerning that data sharing.

The major difference in this regard is the removal of the single IEE to create a proposal that is truly decentralized and utilizes existing established processes for validating ownership and transparency concerning data use.

Comment
Dmarti commented on 3 months ago

@jwrosewell Who is actually validating, though? I understand that this proposal allows for domains to make claims about privacy policies and other issues, but from the user point of view they might be claims about business relationships I don't understand in a language I can't read. (I can recognize a few kinds of notarized legal documents in the jurisdiction I live in, but if a document comes from some places I couldn't tell if it's a GDPR policy, the web developer's high school diploma, or a Chuck E. Cheese gift certificate.) Unless every user becomes an expert in every kind of notarized document everywhere, someone has to validate them.

Comment
Eligrey commented on 3 months ago

EV certificate metadata may not be sufficient to distinguish similarly-named-but-different entities. In order to take advantage of this proposal, EV certificates would likely be required to include additional data.

See https://arstechnica.com/information-technology/2017/12/nope-this-isnt-the-https-validated-stripe-website-you-think-it-is/ for more information on this threat model.

Comment
Jwrosewell commented on 3 months ago

@eligrey Thank you. I linked to the criticisms of EV in the PR as well as the process for EV. As a minimum it should help the group understand how complex solving data processor / controller identity is. If the EV process didn't get it 100% right after all this time then it is probably unrealistic that a new IEE is going to get it 100% right, and in any case it would duplicate a lot of work that has already been done and is in place. My intention is to enable a path forward which recognizes that nothing is 100% perfect but that this is a lot better than what we have today. In parallel, improvements to EV could be made.

@dmarti As we have no validation of domain ownership today beyond branding and the SSL certificate, providing notarized proof is an improvement. It is not perfect, just like EV, or indeed an IEE. Like many aspects of the web (fake news, phishing scams, etc.) we can work as a community to identify and eliminate bad actors. By using only documents produced by registered notaries the user can be certain a qualified professional verified the ownership. Identifying registered notaries is a simpler problem than identifying legal owners of all web sites. If cryptographic proof of the notary's involvement is desirable then that might be a modification to add in due course. However the first thing is to establish that all solutions are imperfect and determine the concepts that can be taken forward. I require decentralization (no single IEE) in any solution, and don't want to "reinvent the wheel" when there are at least two solutions that are acceptable in other fields.

Comment
Brownwolf1355 commented on 1 month ago

> EV certificate metadata may not be sufficient to distinguish similarly-named-but-different entities. In order to take advantage of this proposal, EV certificates would likely be required to include additional data.
>
> See https://arstechnica.com/information-technology/2017/12/nope-this-isnt-the-https-validated-stripe-website-you-think-it-is/ for more information on this threat model.

I agree with @eligrey on this point. Is the proposal that a single EV certificate would be used across all sites in a GDPR Validated Set? If so, this seems to widen the threat model rather than narrow it.

From my experience in operating one of the largest PKIs (the one for cable modems), one of the greatest challenges in operating any PKI is the governance of it. Who decides the process for validating the participants (deciding who gets to participate and who doesn't) and determines when certificates need to be revoked (e.g., compromised certificates or certificate authorities, or entities falling out of compliance with the processes and procedures)? Without proper governance certificates might as well just be self-signed. With cable modems, where the scope was much more clearly defined, it was easier, but still quite a challenge. In the context of the web, it is much, much more challenging.

About Repository

https://github.com/WICG/first-party-sets
Last updated: 28 Jun 2022