Jay Taylor's notes

The Web never forgets: Persistent tracking mechanisms in the wild

Original source (securehomes.esat.kuleuven.be)
Tags: cookies persistent-cookies securehomes.esat.kuleuven.be
Clipped on: 2016-06-11

"The Web never forgets: Persistent tracking mechanisms in the wild" is the first large-scale study of three advanced web tracking mechanisms: canvas fingerprinting, evercookies, and the use of "cookie syncing" in conjunction with evercookies.

Read the paper »

About

The study is a collaboration between researchers Gunes Acar (1), Christian Eubank (2), Steven Englehardt (2), Marc Juarez (1), Arvind Narayanan (2), and Claudia Diaz (1).
(1) KU Leuven, ESAT/COSIC and iMinds, Leuven, Belgium {gunes.acar, marc.juarez, claudia.diaz}@esat.kuleuven.be
(2) Princeton University {cge,ste,arvindn}@cs.princeton.edu

Reference: G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, C. Diaz. The Web never forgets: Persistent tracking mechanisms in the wild. In Proceedings of CCS 2014, Nov. 2014. (Forthcoming)

Results

Canvas Fingerprinting

Background

Canvas fingerprinting is a browser or device fingerprinting technique first presented by Mowery and Shacham in 2012. The authors found that the Canvas API of modern browsers can be used to exploit subtle differences in how the same text is rendered on different machines. The result is a consistent fingerprint that can be obtained in a fraction of a second without the user's awareness.
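
The last step of the technique, turning rendered pixels into an identifier, can be sketched outside the browser. The snippet below is a minimal illustration, not the code of any script found in the study: the data URL strings are made-up stand-ins for what canvas.toDataURL() would return on two machines, and hashing with SHA-256 is an assumed (though common) way to compress the pixel data.

```python
import hashlib


def canvas_fingerprint(data_url: str) -> str:
    """Reduce a canvas data URL to a short, stable identifier.

    `data_url` stands in for the string a script would obtain from
    canvas.toDataURL() after drawing text. Hashing it is an assumption
    made for this sketch, not a detail taken from the study.
    """
    return hashlib.sha256(data_url.encode("utf-8")).hexdigest()[:16]


# Hypothetical renderings of the same text on two machines: the encoded
# pixels differ slightly because fonts and anti-aliasing differ.
machine_a = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAEA"
machine_b = "data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAAAIA"

fp_a_first_visit = canvas_fingerprint(machine_a)
fp_a_second_visit = canvas_fingerprint(machine_a)
fp_b = canvas_fingerprint(machine_b)

print(fp_a_first_visit == fp_a_second_visit)  # same rendering, same ID
print(fp_a_first_visit == fp_b)               # different rendering, different ID
```

Because the identifier depends only on how the machine renders the text, it needs no stored state, which is why clearing cookies does not reset it.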

Image (Asset 2/4) alt=

Results

By crawling the homepages of the top 100,000 sites, we found that more than 5.5% of the crawled sites include canvas fingerprinting scripts. Although the overwhelming majority (95%) of the scripts belong to a single provider (addthis.com), we discovered a total of 20 canvas fingerprinting provider domains, active on 5542 of the top 100,000 sites.

The accompanying collage shows the images that the fingerprinting scripts discovered during the study drew to canvas. The images were intercepted using a modified browser that instruments the toDataURL method. Some blank space was cropped from the images to save space.
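
Once canvas calls are logged by an instrumented browser, candidate scripts can be picked out of the log. The sketch below uses a deliberately simple rule, flagging any script that both draws text and reads the canvas back. This rule, the log format, and the example.* script names are assumptions for illustration; the study's actual detection criteria are described in the paper and are not reproduced here.

```python
from collections import defaultdict

# Hypothetical call log from an instrumented browser:
# (script URL, canvas method called).
call_log = [
    ("tracker.example/fp.js", "fillText"),
    ("tracker.example/fp.js", "toDataURL"),
    ("charts.example/plot.js", "fillText"),
    ("thumbs.example/resize.js", "toDataURL"),
]


def fingerprinting_candidates(log):
    """Return scripts that both write text to a canvas and read it back."""
    methods_by_script = defaultdict(set)
    for script, method in log:
        methods_by_script[script].add(method)
    required = {"fillText", "toDataURL"}
    return sorted(
        script
        for script, methods in methods_by_script.items()
        if required <= methods
    )


candidates = fingerprinting_candidates(call_log)
print(candidates)
```

A script that only draws (a chart) or only reads back (a thumbnail generator) is not flagged, which is the point of requiring both calls.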


Canvas Fingerprinting Scripts

The table below summarizes the canvas fingerprinting scripts found on the homepages of the top 100K Alexa sites.

Full list of sites using Canvas Fingerprinting »

Fingerprinting script | Number of including sites | Text drawn into the canvas
ct1.addthis.com/static/r07/core130.js (and 17 others) | 5282 | Cwm fjordbank glyphs vext quiz
i.ligatus.com/script/fingerprint.min.js | 115 | http://valve.github.io
src.kitcode.net/fp2.js | 68 | http://valve.github.io
admicro1.vcmedia.vn/fingerprint/figp.js | 31 | http://admicro.vn/
amazonaws.com/af-bdaz/bquery.js | 26 | Centillion
*.shorte.st/js/packed/smeadvert-intermediate-ad.js | 14 | http://valve.github.io
stat.ringier.cz/js/fingerprint.min.js | 4 | http://valve.github.io
cya2.net/js/STAT/89946.js | 3 | ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
images.revtrax.com/RevTrax/js/fp/fp.min.jsp | 3 | http://valve.github.io
pof.com | 2 | http://www.plentyoffish.com
*.rackcdn.com/mongoose.fp.js | 2 | http://api.gonorthleads.com
9 others* | 9 | (Various)
TOTAL | 5559 (5542 unique [1]) |

*: Some URLs are truncated or omitted for brevity.
[1]: Some sites include canvas fingerprinting scripts from more than one domain.
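
The totals in the table can be checked with a few lines of arithmetic. The counts below are copied from the table; the shortened dictionary keys are labels chosen for this sketch. Note that the AddThis share is computed here over fingerprinting sites, which is one plausible reading of the 95% figure quoted above.

```python
# Number of including sites per script, copied from the table.
inclusions = {
    "addthis (core130.js and 17 others)": 5282,
    "ligatus": 115,
    "kitcode": 68,
    "admicro": 31,
    "amazonaws (bquery.js)": 26,
    "shorte.st": 14,
    "ringier": 4,
    "cya2": 3,
    "revtrax": 3,
    "pof": 2,
    "rackcdn (mongoose.fp.js)": 2,
    "9 others": 9,
}

total_inclusions = sum(inclusions.values())
unique_sites = 5542      # from the TOTAL row
crawled_sites = 100_000  # homepages crawled

# Inclusions beyond the unique count come from sites that load
# fingerprinting scripts from more than one domain.
duplicate_inclusions = total_inclusions - unique_sites

share_of_crawl = unique_sites / crawled_sites * 100
addthis_share = inclusions["addthis (core130.js and 17 others)"] / unique_sites * 100

print(total_inclusions, duplicate_inclusions)
print(f"{share_of_crawl:.2f}% of crawled sites")
print(f"{addthis_share:.1f}% of fingerprinting sites include an AddThis script")
```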

Evercookies & Respawning

Background

Evercookies are designed to overcome the "shortcomings" of traditional tracking mechanisms. By using multiple storage vectors that are less transparent to users and may be more difficult to clear, evercookies provide an extremely resilient tracking mechanism. They have been found on many popular sites, where they circumvent deliberate user actions such as clearing cookies [1, 2, 3].
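
The respawning behaviour can be illustrated with a toy model. The sketch below is a simulation under stated assumptions, not code from any tracker: the Browser class, the "uid" key, and the ID strings are all hypothetical, and only two storage vectors (HTTP cookies and Flash cookies) are modelled.

```python
class Browser:
    """Toy browser with two independent storage vectors."""

    def __init__(self):
        self.http_cookies = {}
        self.flash_cookies = {}

    def clear_http_cookies(self):
        # What a user does via the browser's "clear cookies" option.
        # Flash storage is untouched by this action.
        self.http_cookies.clear()


def tracker_script(browser, fresh_id):
    """Hypothetical tracker logic run on each page load.

    Reuse an ID from any storage vector that still has one, then write
    it back to every vector so that a cleared copy is restored.
    """
    uid = (
        browser.http_cookies.get("uid")
        or browser.flash_cookies.get("uid")
        or fresh_id
    )
    browser.http_cookies["uid"] = uid
    browser.flash_cookies["uid"] = uid
    return uid


browser = Browser()
first_visit_id = tracker_script(browser, fresh_id="id-1234")

browser.clear_http_cookies()  # the user tries to reset tracking

second_visit_id = tracker_script(browser, fresh_id="id-5678")
print(first_visit_id, second_visit_id)
```

The second visit is offered a fresh ID but ends up with the old one, because the Flash copy survives and is written back into the HTTP cookie.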

Results

We detected respawning by Flash cookies on 10 of the 200 most popular sites and found that 33 different Flash cookies were used to respawn over 175 HTTP cookies on 107 of the top 10,000 sites. The table below lists the 10 top-ranked websites found to include respawning based on Flash cookies.
Country: The country where the website is based.
3rd*: Domains that differ from the first party but are registered to the same company in the WHOIS database.

Global rank | Site | Country | Respawning (Flash) domain | Flash cookie name | 1st/3rd party
16 | sina.com.cn | China | simg.sinajs.cn | stonecc_suppercookie.sol | 3rd*
17 | yandex.ru | Russia | kiks.yandex.ru | fuid01.sol | 1st
27 | weibo.com | China | simg.sinajs.cn | stonecc_suppercookie.sol | 3rd*
41 | hao123.com | China | ar.hao123.com | $hao123$.sol | 1st
52 | sohu.com | China | tv.sohu.com | vmsuser.sol | 1st
64 | ifeng.com | Hong Kong | y3.ifengimg.com | www.ifeng.com.sol | 3rd*
69 | youku.com | China | irs01.net | mt_adtracker.sol | 3rd
178 | 56.com | China | irs01.net | mt_adtracker.sol | 3rd
196 | letv.com | China | irs01.net | mt_adtracker.sol | 3rd
197 | tudou.com | China | irs01.net | mt_adtracker.sol | 3rd
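
One way to look for respawning in crawl data is to compare two crawls: a first crawl that collects Flash cookies, and a second crawl that starts with those Flash cookies loaded but with no HTTP cookies. An HTTP cookie in the second crawl whose value matches a Flash value from the first is a respawning candidate. The sketch below assumes this two-crawl setup; the cookie names and values are made up, and the minimum-length filter is an assumption to skip short, non-identifying values.

```python
def find_respawn_candidates(flash_before, http_after, min_length=8):
    """Match Flash cookie values from crawl 1 against HTTP cookies from crawl 2.

    flash_before: {flash object name: value} collected in the first crawl.
    http_after:   {HTTP cookie name: value} seen in the second crawl, which
                  started without HTTP cookies but with Flash data loaded.
    Returns (HTTP cookie name, Flash object name, value) triples.
    """
    matches = []
    for flash_name, flash_value in flash_before.items():
        if len(flash_value) < min_length:
            continue  # too short to be a plausible unique ID
        for cookie_name, cookie_value in http_after.items():
            if cookie_value == flash_value:
                matches.append((cookie_name, flash_name, flash_value))
    return matches


# Hypothetical data for illustration only.
flash_before = {"example_tracker.sol": "a1b2c3d4e5f6", "prefs.sol": "en"}
http_after = {"uid": "a1b2c3d4e5f6", "lang": "en", "session": "zzz999yyy888"}

respawned = find_respawn_candidates(flash_before, http_after)
print(respawned)
```

A real analysis would also need a control crawl without the Flash data, to rule out values that reappear for other reasons.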

Cookie Syncing

Background

Cookie synchronization, or cookie syncing, is the practice of tracker domains passing pseudonymous IDs associated with a given user, typically stored in cookies, among one another.
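
In traffic logs, a sync typically shows up as one domain's ID value appearing in a request URL sent to a different domain. The sketch below searches for exactly that. It is a simplified illustration: the example.* domains, ID strings, and URLs are hypothetical, hostnames are compared exactly (no handling of subdomains), and IDs are matched as plain substrings.

```python
from urllib.parse import urlparse


def find_syncs(id_cookies, request_urls):
    """Find IDs that leak from their owner domain to another domain.

    id_cookies:   {owner domain: ID value stored in that domain's cookie}
    request_urls: URLs requested during a page visit.
    Returns sorted (owner, receiver, ID) triples.
    """
    syncs = set()
    for url in request_urls:
        receiver = urlparse(url).hostname
        if receiver is None:
            continue
        for owner, id_value in id_cookies.items():
            if receiver != owner and id_value in url:
                syncs.add((owner, receiver, id_value))
    return sorted(syncs)


# Hypothetical data for illustration only.
id_cookies = {
    "tracker-a.example": "u7f3k9q2x1",
    "tracker-b.example": "m4n8p2r6t0",
}
request_urls = [
    "https://tracker-b.example/sync?partner=a&uid=u7f3k9q2x1",
    "https://tracker-a.example/pixel?uid=u7f3k9q2x1",
    "https://cdn.example/lib.js",
]

syncs = find_syncs(id_cookies, request_urls)
print(syncs)
```

Only the first request counts as a sync: the second sends tracker-a's ID back to tracker-a itself, and the third carries no ID.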

Read the blog post that explains cookie syncing and our findings with animated diagrams: The hidden perils of cookie syncing (Freedom to Tinker)

Results

The table below shows the number of IDs known by the top 10 parties involved in cookie syncing, both when all cookies are allowed and when third-party cookies are blocked.

Full list of domains involved in Cookie Syncing »

Domain (all cookies allowed) | # IDs | Domain (no 3P cookies) | # IDs
gemius.pl | 33 | gemius.pl | 36
doubleclick.net | 32 | 2o7.net | 27
2o7.net | 27 | omtrdc.net | 27
rubiconproject.com | 25 | cbsi.com | 26
omtrdc.net | 24 | parsely.com | 16
cbsi.com | 24 | marinsm.com | 14
adnxs.com | 22 | gravity.com | 14
openx.net | 19 | cxense.com | 13
cloudfront.net | 18 | cloudfront.net | 10
rlcdn.com | 17 | doubleclick.net | 10

The table below compares high-level cookie syncing statistics with third-party cookies allowed and blocked (top 3,000 Alexa domains).

Statistic | Third-party cookies allowed | Third-party cookies blocked
# IDs | 1308 | 938
# ID cookies | 1482 | 953
# IDs in sync | 435 | 347
# ID cookies in sync | 596 | 353
# (First*) Parties in sync | (407) 730 | (321) 450
# IDs known per party | 1 / 2.0 / 1 / 33 | 1 / 1.8 / 1 / 36
# Parties knowing an ID | 2 / 3.4 / 2 / 43 | 2 / 2.3 / 2 / 22

The format of the bottom two rows is minimum / mean / median / maximum.
*Here we define a first party as a site that was visited in the first-party context at any point in the crawl.
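
The minimum / mean / median / maximum format can be reproduced from a mapping of parties to the IDs they know. The sketch below uses a small made-up mapping (the party and ID names are hypothetical) and shows how a row such as "# IDs known per party" would be formatted.

```python
from statistics import mean, median


def summarize(counts):
    """Format counts as 'minimum / mean / median / maximum'."""
    return f"{min(counts)} / {mean(counts):.1f} / {median(counts):g} / {max(counts)}"


# Hypothetical mapping: party -> set of IDs that party has seen.
ids_known_by_party = {
    "party-a.example": {"id1", "id2", "id3"},
    "party-b.example": {"id1"},
    "party-c.example": {"id2"},
    "party-d.example": {"id1"},
}

ids_per_party = [len(ids) for ids in ids_known_by_party.values()]
summary = summarize(ids_per_party)
print(summary)
```

Inverting the mapping (ID to the set of parties that know it) and applying the same function would give the "# Parties knowing an ID" row.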

Data

Due to the size of the files, the data is available by request. Please email your requests to web-never-forgets [AT] lists.cs.princeton.edu. In the meantime, you can download a sample database.

Databases available for download


(DO = Digital Ocean, EC2 = Amazon EC2)

Name | Size | Machine # - Location (Provider) | # of sites | Flash enabled? | Cookie setting | Data from previous crawls (Exp. #) - data loaded | Continuous profile | Comments
P01_alexa10k_05012014_fresh | 114M | 1 - N. Virginia (EC2) | 10K | yes | Allow all | no | yes | fresh profile
P04_alexa10k_05032014_fresh | 306M | 1 - N. Virginia (EC2) | 10K | yes | Allow all | no | yes | fresh profile
P06_alexa3k_05062014_fresh | 84M | 1 - N. Virginia (EC2) | 3k | yes | Allow all | no | yes |
P08_alexa3k_05062014_fresh | 84M | 2 - N. Virginia (EC2) | 3k | yes | Allow all | no | yes |
P09_alexa3k_05072014_flash | 84M | 2 - N. California (EC2) | 3k | yes | Allow all | (P6) - Flash | yes | loaded Flash from P6
P10_alexa3k_05072014_localStorage | 77M | 3 - N. Virginia (EC2) | 3k | yes | Allow all | (P6) - localStorage | yes | loaded localStorage from P6
P11_alexa3k_05072014_HTTP_cookies | 90M | 4 - N. Virginia (EC2) | 3k | yes | Allow all | (P6) - HTTP Cookies | yes | loaded cookies.sqlite from P6
P14_alexa3k_05122014_DNT | 76M | 1 - N. Virginia (EC2) | 3k | yes | Allow all | no | yes | DNT enabled
P15_alexa3k_05122014_DNT | 81M | 2 - N. California (EC2) | 3k | yes | Allow all | no | yes | DNT enabled
P16_alexa3k_05122014_no3Pcookies | 55M | 4 - N. Virginia (EC2) | 3k | yes | Allow 1st party | no | yes | Block third-party cookies
P17_alexa3k_05122014_no3Pcookies | 55M | 3 - N. Virginia (EC2) | 3k | yes | Allow 1st party | no | yes | Block third-party cookies
P21_alexa3k_06132014_opt-out | 60M | 5 - N. Virginia (EC2) | 3k | yes | Allow all | no | yes | Loaded opt-out from: NAI, DAA, EDAA
P22_alexa3k_06132014_opt-out | 64M | 6 - N. California (EC2) | 3k | yes | Allow all | no | yes | Loaded opt-out from: NAI, DAA, EDAA
L03_alexa10k_05032014_flash | 295M | 7 - New York (DO) | 10K | yes | Allow all | (P1) - Flash | no | Flash loaded from P1
L04_alexa10k_05042014_flash | 295M | 7 - New York (DO) | 10K | yes | Allow all | (P1) - Flash | no | Flash loaded from P1
L05_alexa10k_05042014_fresh | 289M | 8 - New York (DO) | 10K | yes | Allow all | no | no | fresh profile
L06_alexa100k_flash_no3Pcookies | 2.1G | 9 - Leuven (local machine) | 100K | yes | Allow 1st party | Flash, from pilot crawls | no | Flash from pilot crawls, everything else cleared, no POST data, isolated with chroot.

Code

The code developed during the study is available on GitHub. It includes the crawling infrastructure and modules for analysing browser profile data and crawl databases.

Contact

Gunes Acar gunes.acar@esat.kuleuven.be
Christian Eubank cge@cs.princeton.edu
Steven Englehardt ste@cs.princeton.edu
Marc Juarez marc.juarez@esat.kuleuven.be
Arvind Narayanan arvindn@cs.princeton.edu
Claudia Diaz claudia.diaz@esat.kuleuven.be