Permanently delete all data from SponsorLink's database that has been collected during builds that included Moq (notably any version 4.20.*) #1395
Comments
GDPR aside, I don't think it's even legal in Argentina. |
You wouldn't have to store the email addresses, would you? You could just store the hash values and compare the hashes to authenticate. It's not perfect but it's probably good enough. A hash is not identifiable information. Regarding GDPR, the GDPR considers data anonymized if there is no “reasonably likely” means to re-identify the data subject. So a list of hashes isn't going to be used to identify persons generally. It would literally be easier to scrape GitHub and other sites for public information on people. What exactly is the concern here with your email being hashed? If devs on GitHub stop hashing my email, am I going to be protected from the thousands of spam and scam emails I get every year? Is my credit card going to be protected from hackers? Many companies and developers, at least 500 or more, already have my email. Is this the straw that breaks the camel's back? I already disregard email as a safe means of communication. I certainly don't trust anything I receive via email off the bat. |
@Gavin-Williams, SponsorLink already doesn't store email addresses, but hashes derived from them. The problem here is that hashing doesn't sufficiently anonymize the email addresses, because it turns out that the hashing can be reversed in this case: email addresses are often publicly known, so if you have a list of email addresses, you can just hash them the same way SponsorLink hashes them, and build a lookup table; if you then get hold of the SponsorLink database and the hashes stored therein, you can map them back to email addresses. See devlooped/SponsorLink#31. |
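The lookup-table attack described in the comment above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hash function (SHA-256 over the trimmed, lowercased address), the helper names, and the example addresses are all hypothetical and are not taken from SponsorLink's actual implementation.

```python
import hashlib


def hash_email(email: str) -> str:
    """Hypothetical scheme: unsalted SHA-256 of the trimmed, lowercased address."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_lookup(known_emails):
    """Hash every publicly known address once and index the address by its hash."""
    return {hash_email(e): e for e in known_emails}


def reidentify(leaked_hashes, lookup):
    """Map each leaked hash back to an address when it appears in the lookup table."""
    return {h: lookup[h] for h in leaked_hashes if h in lookup}


# Addresses an attacker could collect from public sources (made-up examples).
public_emails = ["alice@example.com", "bob@example.com", "carol@example.com"]

# Pretend these hashes came from a leaked database.
leaked = [hash_email("bob@example.com"), hash_email("unknown@example.org")]

recovered = reidentify(leaked, build_lookup(public_emails))
print(sorted(recovered.values()))
```

Only the address that is already on the attacker's list is recovered; the hash of an address the attacker has never seen stays opaque. That is why the size and availability of the list of public addresses matters more than the choice of hash function.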
That won't protect them; it would take two seconds to get a bunch of GitHub commit emails and compare hashes. What stakx said. |
Ironically, making it open-source means you know the hashing algorithm. Or you don't and the irony is that a non-OSS project is used for sponsoring OSS projects. Catch-22. |
If you choose a sufficient algorithm, the knowledge of which one you chose doesn't matter. |
Hashing is not magic; it's a one-way function. "How do I figure out who owns "1234"?" you may ask. "Aha, but I can add a machine-specific thing to the hash, to add an unknown element, so it can't be reversed," you might say. That also won't work, because when you make a commit, it has a timestamp. You can use the timestamp to figure out who owns it, because building generally happens right after, or right before, the commit. You can also find projects that specifically use SponsorLink (directly or indirectly), so you can narrow down the targets a lot. |
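For readers unfamiliar with the "machine-specific thing" idea mentioned above, here is a minimal Python sketch of one way it could look: a keyed hash (HMAC) with a secret that never leaves the machine. The secret, the function names, and the addresses are assumptions made for illustration. The sketch only shows that a table precomputed without the secret no longer matches; it does nothing about the timestamp correlation the comment describes.

```python
import hashlib
import hmac
import secrets


def plain_hash(email: str) -> str:
    """Unsalted SHA-256, as an attacker without the secret would compute it."""
    return hashlib.sha256(email.strip().lower().encode("utf-8")).hexdigest()


def keyed_hash(email: str, machine_secret: bytes) -> str:
    """HMAC-SHA256 keyed with a hypothetical per-machine secret."""
    message = email.strip().lower().encode("utf-8")
    return hmac.new(machine_secret, message, hashlib.sha256).hexdigest()


# Hypothetical secret generated once per machine and never transmitted.
machine_secret = secrets.token_bytes(32)

# Value that would be stored or sent instead of the plain hash.
stored = keyed_hash("bob@example.com", machine_secret)

# The attacker's table is built without knowledge of the secret.
attacker_table = {plain_hash(e): e for e in ["alice@example.com", "bob@example.com"]}
found = attacker_table.get(stored)
print(found)
```

The lookup comes back empty because the attacker cannot reproduce the keyed value. As the comment points out, though, other signals such as commit and build times may still narrow down who a record belongs to.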
Thank you @kzu for following up on this! I'm relieved that this has been taken care of. |
So, let's say I'm a sponsor of moq and someone knows the hashing algorithm. They have my email and they hash it. Then they get their hands on the moq sponsors hash list. They work out that I'm a sponsor of moq because they see that the hash of my email is on the moq sponsors list. They could just look at my profile page to see I'm a sponsor of moq. For all that effort they have gained absolutely nothing. I still don't see the point of the outrage. Is this just a European thing? Or a pickle for all anonymity activists? I should note that I'm opposed to anonymity and push for compulsory identification of communications online, which might explain why I just can't understand the outrage. There must be some specific issue I can't see, though, to explain all the twisted knickers. |
@Gavin-Williams, just FYI, your hypothetical scenario is no longer relevant. It's my understanding that the "hash list" has just been deleted (see devlooped/SponsorLink#49, referenced above). kzu also stated a while ago that email hashing is gone and won't happen again (https://github.com/moq/moq/issues/1374#issuecomment-1671240325), and that upcoming versions of SponsorLink won't process any PII (https://github.com/moq/moq/issues/1374#issuecomment-1671866096). (And yes, data privacy law and "anonymity by default" is a pretty hot topic in Europe. I'm not knee-deep into what exactly is going on everywhere, but it's my impression that while GDPR and related legislations aren't brand new anymore, the dust hasn't settled yet and people are still figuring out how to fully conform to it. So when faced with uncertainty, it's in one's own best interest to err on the side of caution, since the penalties for breaking those laws can be quite substantial.) |
@stakx Also noteworthy is that, while I don't assume it to not have been the case here, it is actually impossible to prove that the data was actually deleted after it has been collected (e.g. without it having been copied or distributed). Again, I'm not saying this is the case here, but just to put in context that there is nothing anyone can do to validate that the data was actually deleted, other than trusting the author. So while we're trusting him about deleting the data and we're not talking about intentional malice, for future actions it's important to note that their effects might never be able to be truly undone to any extent or at least that it's not possible to prove that their effect was undone and to which extent. (And, per my understanding of GDPR, any personally identifiable information falls under its jurisdiction. Since a hash is a deterministic function, being provided with the same email address yields the same output. Therefore, the hashed output is inextricably tied to a person and it is therefore itself personally identifiable information and subject to GDPR. Having said that, the email itself, while PII under GDPR, is not in itself sensitive enough to be likely to lead to actual real-world harm, in my opinion (although I could think of some fringe scenarios), so it's more a matter of principle and law than actual risk in this particular case.) |
@apacurariu, not being able to personally verify whether a service provider really deleted your data like they told you isn't new... so I am not especially worried about that theoretical possibility only now, in this particular instance. On the contrary, kzu has been very transparent about the deletion process and documented it in detail; why would he have fabricated all of that? (People don't seem to notice, but IIRC he has actually followed up on all demands except for a single one.) I for one see no reason to doubt his honesty here, and I probably couldn't have asked for a better outcome for this request. |
@stakx I didn't imply that he didn't delete the data. On the contrary, I said I trust that he did but this is only based on trust. However, while I stand behind the theoretical and practical impossibility of demonstrating data deletion, not just in one's inability to personally verify this, I don't want to deviate the thread further especially since this is a purely theoretical consideration at this point. |
@apacurariu, I understand, and your point is well taken. Also, don't worry too much about deviating the thread, the issue is resolved and closed anyway, and I'm probably going to step away from this whole issue tracker for a while. The whole situation is rather frustrating and I really need to take a break. |
@stakx I admit, I also need to take a step back from this. I'll try to find something constructive to add or contribute. |
Not to mention many hash algorithms can be defeated with rainbow tables. |
Using closed source with open source is not ironic. There is a group of people who think open source is an ideology that must be everywhere and can't be mixed. But many people see open source as simply a tool, particularly for software that is under-developed and probably under-resourced, so that its behavior can be understood and fixes and features can be provided by users. Mixing open source and closed source isn't an issue at all. |
Sometimes there is simply no option but to mix them. Sometimes a single person is working on the open source part and might need help, so they ask a company that wants to use the open source code in its projects and needs additional functionality to program that functionality as extensions or plugins. After all, even single developers have limits to what they can do. Example:
|
In the interest of all those devs who had their personally identifiable information (PII) processed and sent off to SponsorLink's data storage (regardless of whether it was hashed/anonymized or not) because they ran a build including Moq, I would like to request that all of this data be permanently purged from SponsorLink's and all other storage systems, without making any further use of it. Those users had no way to opt in to (or out of) that process, so the Moq and SponsorLink projects should err on the side of caution and assume that the affected users did not give their consent. Also, IANAL, but the data exfiltration may well have been in violation of certain data privacy laws (GDPR being the most prominent), so again, Moq and SponsorLink should play it safe and assume that they did in fact violate the law. The damage cannot be fully undone, but the deletion of all collected data would be a demonstration of goodwill that the two projects want to conform to the law and respect users' wishes regarding the processing of their PII. Let's try to regain at least some of the trust that has been lost.
(Btw. if you've already deleted the data in question, and I've simply missed your notification about it buried deep within the other SponsorLink issues, then my apologies... in that case, could you please just restate that the data has been deleted? Thanks!)