DOJ Readying Google Antitrust Investigation

A quick glance at the first page shows a lot of enforcement against Fortune 50 US companies: Caterpillar, UPS, Visa and Mastercard. Although, from randomly clicking on a few, it looks like a lot of the time the commission took no action. I didn’t care to go through all 655, but in the first 50 examples I did not see big European companies getting fined.

What I’m looking for is billion-dollar fines levied against European companies.
I’m not saying they don’t exist, but I couldn’t find them.

I’d like to read more about this trivial de-anonymization that you speak of.
(In reality, I do run uBlock Origin and periodically just pay Tom to make up for the ads, so I’m not really worried about that, but I’m curious to read about this.)

Dude… who cares?

Seriously, giving them information from a captcha entry? Who gives even one shit about this? What kind of privacy invasion is that?

That’s not “my data”.

It’s extremely trivial. All it takes is you visiting any website that has a tracker with fingerprinting enabled (see https://amiunique.org/) and any site with a Google Analytics-type system that has a session or user ID in it. Now I can correlate your user information with all the fingerprinting information, and if you log in on another machine or browser with slightly different fingerprints, I now have multiple sets of fingerprints I can use to identify you on any other site where I see the same fingerprinting. If that session ID is ever logged into Google, Facebook, or any site that gives personal information to the tracking system (because in some cases you are intentionally logging into the tracking system), then voilà, I can now track almost every move you make across the internet.
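To make the correlation step concrete, here is a minimal Python sketch, assuming a tracker that hashes a handful of browser attributes into a fingerprint and occasionally sees a login on the same page. The attribute names, the `Tracker` class, and the example sites are all hypothetical; real fingerprinting scripts use many more signals and fuzzy matching rather than an exact hash.

```python
from __future__ import annotations

import hashlib
from collections import defaultdict


def fingerprint(attrs: dict) -> str:
    """Hash browser attributes (hypothetical fields) into a short, stable ID."""
    canonical = "|".join(f"{key}={attrs[key]}" for key in sorted(attrs))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


class Tracker:
    """Toy third-party tracker embedded on many sites (illustrative only)."""

    def __init__(self) -> None:
        self.visits = defaultdict(list)          # fingerprint -> sites seen
        self.fps_by_identity = defaultdict(set)  # login -> fingerprints seen with it

    def observe(self, attrs: dict, site: str, login: str | None = None) -> str:
        fp = fingerprint(attrs)
        self.visits[fp].append(site)
        if login is not None:
            # One login ties this fingerprint (past and future visits) to a person.
            self.fps_by_identity[login].add(fp)
        return fp

    def history_of(self, login: str) -> list[str]:
        """Every site seen under any fingerprint ever linked to this login."""
        return sorted(site for fp in self.fps_by_identity[login] for site in self.visits[fp])


if __name__ == "__main__":
    laptop = {"ua": "Firefox/115", "screen": "1920x1080", "tz": "UTC-5", "fonts": "Arial,Calibri"}
    desktop = {"ua": "Chrome/120", "screen": "2560x1440", "tz": "UTC-5", "fonts": "Arial,Segoe UI"}
    t = Tracker()
    t.observe(laptop, "news.example")                               # anonymous visit
    t.observe(laptop, "shop.example", login="alice@example.com")    # login links the laptop
    t.observe(desktop, "forum.example", login="alice@example.com")  # login links the desktop
    t.observe(desktop, "health.example")                            # anonymous, but already linked
    print(t.history_of("alice@example.com"))
    # ['forum.example', 'health.example', 'news.example', 'shop.example']
```

Note that the anonymous visit to `news.example` is attributed retroactively once a login shows up under the same fingerprint, which is the point being made above.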

The real profit in tracking right now is finding ways to correlate online info with offline info. Google and Facebook have put a lot of money into this association, and from all indications it’s working.

Um, you should care because it’s not JUST captcha information you are giving them. You are giving them information about who you are (see above), what your interests are (by the overall website you are visiting AND the exact pages you are visiting), what purchases you are making (because once you are fingerprinted via the captcha, your info is easily tracked), what computers you use, what cities you visit (via IP geolocation), etc…

Captcha is just a mechanism to force you to provide the tracking and cookie information to them; it’s a conduit that isn’t blocked by ad blockers (because otherwise a lot of websites wouldn’t be usable).

I use Google services because I recognize I have no way to opt out of all of this tracking, so not using their services is pointless. But again, the amount of information these companies have been able to collect about you is enormous, and no amount of regulation is going to change it. It’s a lost cause, unfortunately.

Oh, I forgot to include links. A great example: Netflix used to run a contest where people could try to build better recommendations using an anonymized data set. They got sued because researchers were able to de-anonymize the data.

Edit: BTW, if you read the paper, Netflix released only a tenth of the data they have on people, and de-anonymization was still possible.

Another example is the identification of real people via anonymized AOL search logs.

Edit 2: The moral of the story is that there are a ton of ways to figure out not only who you are, but a lot of personal information about you, from seemingly trivial pieces of information such as movie ratings, search queries, or which QT3 threads you actively read and participate in. Multiply that by 1000 for what most data brokers actually have on you.

Ah… So here, you require that someone use one of Google’s products in order to correlate your browsing data with other aspects of your data.

And this came a few years ago with a change to Google’s privacy policy. Even then, though, you were able to just turn that off by checking a box in your activity controls.

Now, maybe there’s more to it that I’m missing. That’s why I was hoping you’d provide a more in-depth article about it. But if you’re talking about associating your DoubleClick info with your account, that’s a thing they specifically don’t do if you don’t want them to.

No no no… that’s all different stuff. You were complaining specifically about crap like captcha, because it was… feeding their machine learning systems? It was a silly complaint.

In terms of the information I actually provide to them, no, I control that, and do so pretty easily.

Now, if they’re lying and they are doing that correlation anyway, then that’s potentially a problem.

Eh, this was a pretty weak example. The researchers were basically just able to compare patterns to those made by people who DIDN’T anonymize themselves… they basically made the same reviews on other sites, using their real name.

This is likely dramatically easier than anything you could do with anonymized web traffic data. Maybe not, but when you look at the actual paper, you see that their research is specifically focused on datasets which have high dimensionality, and sparsity. So, lots of statistical features, and statistical rarity in the patterns of users. I’m not sure that you’d necessarily have the same sparsity, but maybe. Further, you need to have a non-anonymous dataset to link to. In Netflix’s case, they were only able to de-anonymize it because they had a non-anonymous dataset with essentially the exact same data in it… a non-anonymous IMDB movie rating database.
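The sparsity point can be illustrated with a heavily simplified Python sketch of rarity-weighted matching, loosely in the spirit of the scoring in the Netflix paper (which also uses rating values, dates, and a more careful statistical test; all of that is omitted here). The data, titles, and function names are hypothetical. As the post says, it only works because a non-anonymous auxiliary profile exists to match against: overlapping on a popular title says little, while overlapping on titles almost nobody rated singles out one record.

```python
import math
from collections import Counter


def rarity_weights(records: dict) -> dict:
    """Weight each item by 1/log(1 + support) so rarely rated items count for more."""
    support = Counter(item for items in records.values() for item in items)
    return {item: 1 / math.log(1 + n) for item, n in support.items()}


def best_match(aux_items: set, records: dict) -> tuple:
    """Return (best record id, score gap to the runner-up) for a public auxiliary profile."""
    if len(records) < 2:
        raise ValueError("need at least two candidate records")
    weights = rarity_weights(records)
    scores = {
        rid: sum(weights[item] for item in aux_items & items)
        for rid, items in records.items()
    }
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    (top_id, top_score), (_, runner_up) = ranked[0], ranked[1]
    # A large gap means the match stands out rather than being a coin flip.
    return top_id, top_score - runner_up


if __name__ == "__main__":
    anonymized = {  # hypothetical "anonymized" rating histories
        "u1": {"Titanic", "Avatar", "Obscure Doc A"},
        "u2": {"Titanic", "Avatar", "Star Wars"},
        "u3": {"Titanic", "Obscure Doc A", "Obscure Film B"},
    }
    public_profile = {"Titanic", "Obscure Doc A", "Obscure Film B"}  # e.g. reviews under a real name
    print(best_match(public_profile, anonymized))
```

Here `u3` comes out on top, and most of its lead over `u1` comes from the one title nobody else rated. Whether real web-traffic data is sparse enough for this to work is exactly the open question raised above.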

Also, ultimately… you don’t actually know how successful they were. And they say this… because they can’t ACTUALLY de-anonymize the data. Ultimately, what they found was that there were 2 users in the entire Netflix dataset who showed a statistically significant similarity to ratings provided by IMDB users.

In terms of the AOL searches, ya, I remember when that came out. Some of those search histories were amusing. And ultimately, sure, if someone’s tracking stuff you do, they probably shouldn’t release that info. Certainly Google isn’t the only one doing it, as your traffic is going through any number of other folks who can track it.

Ultimately, if you’re going on the internet, folks are able to see what you do, to varying degrees. This is not remotely limited to Google. Indeed, there are a number of other folks, like your ISP, who can potentially know a lot more. And there are ways that you can choose to hide such things if you want to.

But really, the thing is, the internet isn’t private. It’s not your house. I think maybe folks need to keep that in mind. I don’t think it’s ever going to become that.

Uh, I’m not sure you understand how a lot of these ad tracking systems work, nor how analytics systems work. The latter specifically are tailored to understanding exactly how each individual user interacts with your website, which is needed to provide heat maps and the other functionality they offer. Therefore they can clearly see you typing a specific email address or username into a login form, or see the username in a specific HTML tag that’s pretty common on WordPress and other websites. They have literally full access to pull your account information and tie it to your fingerprint without you (as a viewer of a website) using Google services at all.

They have full access to the exact html of the websites you are visiting and all your actions, and that’s intentional (otherwise the analytical aspects would be worthless).

The machine learning was tangential, but it was still a brilliant move by Google to convince websites to have people help train their computer vision and ML systems for free.

That’s my point! Unless you have maintained tight control over all information you have ever put out, you will leak personal information, and these tracking systems are so invasive that they can then be used to correlate other data sets with known data sets (whether their own, bought, or leaked through hacks and sold on the black market). The whole point of the research paper is that it does not take much information to de-anonymize your data once any source of non-anonymized information becomes available. Once that de-anonymization occurs, they have free rein to dig down into personal insights about you.

Like you said at the end:

This is completely contradictory to what you just wrote. Since people are able to see what you do to varying degrees that means someone somewhere will capture your RL information, and once that RL information is out there it can then be correlated with “anonymized” data to de-anonymize it all from you.

I’d actually say it’s impossible to choose to hide anything. You’d have to never log into any service, use a VPN, use ad-blocking, and constantly perform actions which would cause your fingerprint to always be changing in such a way to prevent correlations (which is pretty much impossible).

To tie up these arguments so we don’t go in circles forever:

  1. There is no way to opt out of data tracking. It is happening even if you try to use a VPN and ad-blocking tools, and even if you try to refuse that company’s services. Those make it harder, but these companies (including Google) have come up with ingenious ways to still gain insights into you, and the vast majority of internet users don’t use any mitigations at all. Furthermore, Google is actively working to weaken ad-blocking capabilities in Chrome (I’m not sure whether the changes are in Chromium or only in Chrome), which undercuts what little mitigation most people do have.

  2. You are not anonymous. Many “analytical” systems track not just that you visited a page, but how you interacted with it and what HTML the page rendered. I know this for a fact from working with Google Analytics (heatmaps + renderings), and because we used logrocket.io at my startup (to make it easier to find bugs in our Angular apps), which recorded every interaction, dev console message, and everything else the user typed in. This means it’s trivial for these systems to look for your email, real name, username, etc. in the HTML markup with high accuracy, then use fingerprinting to correlate that with other tracking records they hold on you.

  3. These differing points of information can be used to gain real personal insights into you, especially since people have no qualms about searching the internet for deeply personal things, not realizing the consequences. Even with tracking disabled, this data is still kept and can still be correlated with non-anonymized information to de-anonymize you.

  4. There is no technical nor governmental solution to this problem. GDPR is not enough, ad-blocking is not enough, etc…

  5. Google and all these other companies make their money from streamlining these data collection and aggregation systems. They are heavily incentivized to find new ways to learn more about you online and offline so they can better target you for advertising (I just saw an article earlier saying targeted ads average 2.6x the price of untargeted ads). A lot of services they have pushed forward while marketing them as “good for the internet” (such as reCAPTCHA, “I am not a robot”, and Chrome) have been heavily skewed toward helping their bottom line. As long as the incentives are aligned this way, they will always prioritize gathering data on you and learning about you in any way possible over your own benefit. Your benefit from their services is just a bonus.
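Point 2 claims that once an analytics or session-replay script has the rendered page, pulling identifiers out of it is easy. A minimal sketch of that step using Python’s standard-library `html.parser`, assuming the recorder serializes form values into the captured HTML and that the site marks the logged-in user’s name with a class such as `display-name` or `username` (both hypothetical; real sites and real recorders vary). The email regex is deliberately loose.

```python
import re
from html.parser import HTMLParser

# Loose email pattern; good enough for a sketch, not RFC 5322 complete.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class IdentityScraper(HTMLParser):
    """Pull emails and marked-up display names out of captured page HTML."""

    def __init__(self, marker_classes=("display-name", "username")) -> None:
        super().__init__()
        self.marker_classes = set(marker_classes)  # hypothetical class names
        self.capture_next = False
        self.names: list = []
        self.emails: set = set()

    def handle_starttag(self, tag, attrs):
        for _, value in attrs:
            if value:
                # Attribute values include things like <input value="...">.
                self.emails.update(EMAIL_RE.findall(value))
        classes = (dict(attrs).get("class") or "").split()
        if self.marker_classes.intersection(classes):
            self.capture_next = True  # grab the next non-blank text as a name

    def handle_endtag(self, tag):
        self.capture_next = False

    def handle_data(self, data):
        self.emails.update(EMAIL_RE.findall(data))
        if self.capture_next and data.strip():
            self.names.append(data.strip())
            self.capture_next = False


def scrape(html: str):
    parser = IdentityScraper()
    parser.feed(html)
    parser.close()
    return parser.names, parser.emails


if __name__ == "__main__":
    page = (
        '<header><span class="display-name">Jane Q. Doe</span></header>'
        '<form><input type="email" name="login" value="jane.doe@example.com"></form>'
        "<p>Contact: support@example.org</p>"
    )
    names, emails = scrape(page)
    print(names)           # ['Jane Q. Doe']
    print(sorted(emails))  # ['jane.doe@example.com', 'support@example.org']
```

Anything found this way could then be keyed to whatever fingerprint or session ID the same script already holds, which is the correlation step point 2 describes.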

These are the largest cartel fines. They’re almost all EU companies, and none are American.

| Year | Undertaking | Case | Amount in €* |
|---|---|---|---|
| 2016 | Daimler | Trucks | 1 008 766 000 |
| 2017 | Scania | Trucks | 880 523 000 |
| 2016 | DAF | Trucks | 752 679 000 |
| 2008 | Saint Gobain | Carglass | 715 000 000 |
| 2012 | Philips | TV and computer monitor tubes | 705 296 000, of which 391 940 000 jointly and severally with LG Electronics |
| 2012 | LG Electronics | TV and computer monitor tubes | 687 537 000, of which 391 940 000 jointly and severally with Philips |
| 2016 | Volvo/Renault Trucks | Trucks | 670 448 000 |
| 2016 | Iveco | Trucks | 494 606 000 |
| 2013 | Deutsche Bank | Euro interest rate derivatives (EIRD) | 465 861 000 |
| 2001 | F. Hoffmann-La Roche | Vitamins | 462 000 000 |

Monopoly fines will be different, but that’s in large part because the EU doesn’t really have any giant monopolistic tech companies. Even so, there are several EU companies with abuse of dominant position fines in the hundreds of millions. For instance:
http://europa.eu/rapid/press-release_IP-14-799_en.htm

I guess the point in my mind is that the stuff you are talking about at this point isn’t unique to Google. Hell, the one major tracking issue you are talking about is from a company that wasn’t part of Google until a few years ago.

Ultimately, what Google themselves are doing, is anonymously tracking data. Unless you allow them to, they aren’t linking your browsing data to your actual accounts. Systems know that there is “a user” that does certain things, but not that it’s necessarily you.

Now, certainly, as in the case of AOL’s example, it turns out that sometimes you might do things that give you away as an individual. But I’m not sure there’s anything anyone can do against Google to prevent any of that. On some level, we want systems to track us to some extent, because it makes the systems work better for us. It makes the systems adapt to us.

Hell, even in the direct case that we are talking about, directing ads, it benefits me to see ads that I’m actually interested in. Even that is beneficial to me as a consumer.

I think that perhaps the more important issue, since this stuff is going to be tracked by someone anyway, is that the people who track things protect that data. And in this regard, Google has a pretty good track record compared to… pretty much everyone else. I think the only significant data breach they ever had was related to their Google+ network, and it was fairly minor.

Not fairly minor. They shut down Google+ early because it was so serious.

It was fairly minor because virtually no one actually had any significant data on Google+.

For instance, you had a Google+ profile. What was on it? Probably nothing at all.

Also, it’s worth noting that the case you are talking about didn’t actually involve a leak. It involved a POTENTIAL leak, due to a temporary security flaw. But there’s no evidence that anyone actually exploited it during that time.

I guess part of my perspective here is that numerous other organizations, including the government’s own OPM and Equifax, have leaked far more information about me.

So we have the EU doing antitrust as non-tariff trade barriers, and the DOJ doing antitrust as a way to attack companies that Trump doesn’t like.

EU opens antitrust investigation into Broadcom

EU takes on Big Kawaii.

Nooooo not hello kitty :-(

This is some seriously shady shit.

Looping in legal counsel for guidance on potentially sensitive matters is pretty standard across all tech.

I think the accusation is that they were cc’ing legal on all sorts of discussions that didn’t need legal, but then they could withhold any such emails from discovery (claiming attorney privilege) in case of any lawsuits.

The lawyers I’ve dealt with really, really don’t like it when people do that.