The Web never forgets: Persistent tracking mechanisms in the wild
The Web never forgets: Persistent tracking mechanisms in the wild is the first large-scale study of three advanced web tracking mechanisms: canvas fingerprinting, evercookies, and the use of "cookie syncing" in conjunction with evercookies.
About
The study is a collaboration between researchers Gunes Acar (1), Christian Eubank (2), Steven Englehardt (2), Marc Juarez (1), Arvind Narayanan (2), and Claudia Diaz (1).
(1) KU Leuven, ESAT/COSIC and iMinds, Leuven, Belgium {gunes.acar, marc.juarez, claudia.diaz}@esat.kuleuven.be
(2) Princeton University {cge,ste,arvindn}@cs.princeton.edu
Reference: G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, C. Diaz. The Web never forgets: Persistent tracking mechanisms in the wild. In Proceedings of CCS 2014, Nov. 2014. (Forthcoming)
Results
Canvas Fingerprinting
Background
Canvas fingerprinting is a browser or device fingerprinting technique that was first presented by Mowery and Shacham in 2012. The authors found that, by using the Canvas API of modern browsers, one can exploit subtle differences in how the same text is rendered to extract a consistent fingerprint. The fingerprint can be obtained in a fraction of a second without the user's awareness.
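The idea can be illustrated with a minimal Python sketch. It is not the code used by any script in the study: the rendering step is simulated with hypothetical pixel buffers, and `to_data_url` and `canvas_fingerprint` are illustrative names standing in for the browser's canvas serialisation and for whatever hash a script applies to it.

```python
import base64
import hashlib


def to_data_url(pixels: bytes) -> str:
    """Stand-in for the canvas serialisation step.

    A real browser would PNG-encode the canvas; here the raw bytes are
    simply base64-encoded, which is enough to show the principle.
    """
    return "data:image/png;base64," + base64.b64encode(pixels).decode("ascii")


def canvas_fingerprint(pixels: bytes) -> str:
    """Hash the serialised canvas contents into a short identifier."""
    digest = hashlib.sha256(to_data_url(pixels).encode("ascii")).hexdigest()
    return digest[:16]


# Hypothetical pixel buffers: the same text rendered on two machines,
# differing in a single anti-aliased pixel.
device_a = bytes([255, 255, 255, 255, 0, 0, 0, 255, 128, 128, 128, 255])
device_b = bytes([255, 255, 255, 255, 0, 0, 0, 255, 127, 127, 127, 255])

fp_a = canvas_fingerprint(device_a)
fp_b = canvas_fingerprint(device_b)

print(fp_a == canvas_fingerprint(device_a))  # same rendering, same fingerprint
print(fp_a != fp_b)                          # tiny rendering difference, new fingerprint
```

The sketch shows why the technique is stable per device yet distinguishes devices: identical pixels always hash to the same value, while a one-pixel rendering difference yields a different one.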
Results
By crawling the homepages of the top 100,000 sites we found that more than 5.5% of the crawled sites include canvas fingerprinting scripts. Although the overwhelming majority (95%) of the scripts belong to a single provider (addthis.com), we discovered a total of 20 canvas fingerprinting provider domains, active on 5542 of the top 100,000 sites.
Figure: collage of the images drawn to canvas by the fingerprinting scripts discovered during the study. The images were intercepted using a modified browser (by instrumenting the toDataURL method). Some blank space was cropped from the images to save space.
Canvas Fingerprinting Scripts
The table below summarizes the canvas fingerprinting scripts found on the homepages of the top 100K Alexa sites.
Full list of sites using Canvas Fingerprinting »
Fingerprinting script | Number of including sites | Text drawn into the canvas |
---|---|---|
ct1.addthis.com/static/r07/core130.js (and 17 others) | 5282 | Cwm fjordbank glyphs vext quiz |
i.ligatus.com/script/fingerprint.min.js | 115 | http://valve.github.io |
src.kitcode.net/fp2.js | 68 | http://valve.github.io |
admicro1.vcmedia.vn/fingerprint/figp.js | 31 | http://admicro.vn/ |
amazonaws.com/af-bdaz/bquery.js | 26 | Centillion |
*.shorte.st/js/packed/smeadvert-intermediate-ad.js | 14 | http://valve.github.io |
stat.ringier.cz/js/fingerprint.min.js | 4 | http://valve.github.io |
cya2.net/js/STAT/89946.js | 3 | ABCDEFGHIJKLMNOPQRSTUVWXYZ abcdefghijklmnopqrstuvwxyz0123456789+/ |
images.revtrax.com/RevTrax/js/fp/fp.min.jsp | 3 | http://valve.github.io |
pof.com | 2 | http://www.plentyoffish.com |
*.rackcdn.com/mongoose.fp.js | 2 | http://api.gonorthleads.com |
9 others* | 9 | (Various) |
TOTAL | 5559 (5542 unique†) | |
*: Some URLs are truncated or omitted for brevity.
†: Some sites include canvas fingerprinting scripts from more than one domain.
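As a quick consistency check, the figures quoted in the text can be recomputed from the table. The sketch below copies the per-script counts from the table; the dictionary keys are shortened labels chosen here for readability, not identifiers from the study.

```python
# Per-script site counts copied from the table above (labels shortened).
counts = {
    "addthis.com": 5282,
    "ligatus.com": 115,
    "kitcode.net": 68,
    "vcmedia.vn": 31,
    "amazonaws.com/af-bdaz": 26,
    "shorte.st": 14,
    "ringier.cz": 4,
    "cya2.net": 3,
    "revtrax.com": 3,
    "pof.com": 2,
    "rackcdn.com": 2,
    "9 others": 9,
}

total_inclusions = sum(counts.values())   # site/script pairs
unique_sites = 5542                       # from the TOTAL row
crawled_sites = 100_000

addthis_share = counts["addthis.com"] / total_inclusions
prevalence = unique_sites / crawled_sites
extra_inclusions = total_inclusions - unique_sites

print(total_inclusions)            # 5559
print(f"{addthis_share:.1%}")      # 95.0%
print(f"{prevalence:.2%}")         # 5.54%
print(extra_inclusions)            # 17
```

The sum matches the TOTAL row, the addthis.com share is consistent with the 95% figure, and 5542 of 100,000 sites is consistent with "more than 5.5%". The 17 extra inclusions are accounted for by sites that load scripts from more than one domain.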
Evercookies & Respawning
Background
Evercookies are designed to overcome the "shortcomings" of the traditional tracking mechanisms. By utilizing multiple storage vectors that are less transparent to users and may be more difficult to clear, evercookies provide an extremely resilient tracking mechanism, and have been found to be used by many popular sites to circumvent deliberate user actions [1, 2, 3].
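The respawning behaviour can be sketched with a toy model in Python. This is an illustration under stated assumptions, not the implementation of any site in the study: the `Browser` class, the three storage vector names, and `evercookie_visit` are all hypothetical.

```python
from typing import Dict


class Browser:
    """Toy browser with three independent storage vectors (illustrative names)."""

    def __init__(self) -> None:
        self.stores: Dict[str, Dict[str, str]] = {
            "http_cookie": {},
            "flash_lso": {},
            "local_storage": {},
        }

    def clear(self, vector: str) -> None:
        """Model the user clearing a single storage vector."""
        self.stores[vector].clear()


def evercookie_visit(browser: Browser, key: str, fresh_id: str) -> str:
    """Reuse any surviving copy of the ID, then rewrite it to every vector."""
    surviving = [store[key] for store in browser.stores.values() if key in store]
    uid = surviving[0] if surviving else fresh_id
    for store in browser.stores.values():
        store[key] = uid  # respawn the ID wherever it was deleted
    return uid


browser = Browser()
first = evercookie_visit(browser, "uid", "id-123")   # first visit sets the ID
browser.clear("http_cookie")                         # user clears HTTP cookies
second = evercookie_visit(browser, "uid", "id-456")  # ID comes back from Flash

print(first, second)  # id-123 id-123
```

Clearing one vector is not enough: as long as a single copy survives, the next visit restores the original identifier everywhere and the freshly generated ID is discarded.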
Results
We detected respawning by Flash cookies on 10 of the 200 most popular sites and found that 33 different Flash cookies were used to respawn over 175 HTTP cookies on 107 of the top 10,000 sites.
The table below lists the 10 top-ranked websites found to include respawning based on Flash cookies.
Country: The country where the website is based.
3rd*: The domains that are different from the first-party but registered for the same company in the WHOIS database.
Global rank | Site | Country | Respawning (Flash) domain | Flash cookie name | 1st/3rd Party |
---|---|---|---|---|---|
16 | sina.com.cn | China | simg.sinajs.cn | stonecc_suppercookie.sol | 3rd* |
17 | yandex.ru | Russia | kiks.yandex.ru | fuid01.sol | 1st |
27 | weibo.com | China | simg.sinajs.cn | stonecc_suppercookie.sol | 3rd* |
41 | hao123.com | China | ar.hao123.com | $hao123$.sol | 1st |
52 | sohu.com | China | tv.sohu.com | vmsuser.sol | 1st |
64 | ifeng.com | Hong Kong | y3.ifengimg.com | www.ifeng.com.sol | 3rd* |
69 | youku.com | China | irs01.net | mt_adtracker.sol | 3rd |
178 | 56.com | China | irs01.net | mt_adtracker.sol | 3rd |
196 | letv.com | China | irs01.net | mt_adtracker.sol | 3rd |
197 | tudou.com | China | irs01.net | mt_adtracker.sol | 3rd |
Cookie Syncing
Background
Cookie synchronization or cookie syncing is the practice of tracker domains passing pseudonymous IDs associated with a given user, typically stored in cookies, amongst each other.
Read the blog post that explains cookie syncing and our findings with animated diagrams: The hidden perils of cookie syncing (Freedom to Tinker)
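A single sync step can be sketched as follows. This is a simplified illustration, not a description of any particular tracker's protocol: the `/sync` path, the `partner` and `partner_uid` parameter names, and the `.example` domains are assumptions made for the example.

```python
from typing import Dict
from urllib.parse import parse_qs, urlencode, urlparse


def sync_url(sender_domain: str, sender_id: str, receiver_domain: str) -> str:
    """Build the request a sender makes the browser issue to a partner.

    The sender's pseudonymous ID travels in the URL (hypothetical parameters).
    """
    query = urlencode({"partner": sender_domain, "partner_uid": sender_id})
    return f"https://{receiver_domain}/sync?{query}"


def handle_sync(
    url: str,
    receiver_cookie_id: str,
    match_table: Dict[str, Dict[str, str]],
) -> None:
    """Receiver side: link its own cookie ID to the partner's ID from the URL."""
    params = parse_qs(urlparse(url).query)
    partner = params["partner"][0]
    partner_uid = params["partner_uid"][0]
    match_table.setdefault(receiver_cookie_id, {})[partner] = partner_uid


# Tracker A knows the user as "A-111"; tracker B knows the same browser as "B-999".
match_table: Dict[str, Dict[str, str]] = {}
url = sync_url("tracker-a.example", "A-111", "tracker-b.example")
handle_sync(url, "B-999", match_table)

print(url)
print(match_table)  # {'B-999': {'tracker-a.example': 'A-111'}}
```

After the request, tracker B can join its own records about `B-999` with anything tracker A shares about `A-111`, even though neither party can read the other's cookies directly.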
Results
The table below shows the number of IDs known by the top 10 parties involved in cookie syncing, both when all cookies are allowed and when third-party cookies are blocked.
Full list of domains involved in Cookie Syncing »
Domain (all cookies allowed) | # IDs | Domain (no 3P cookies) | # IDs |
---|---|---|---|
gemius.pl | 33 | gemius.pl | 36 |
doubleclick.net | 32 | 2o7.net | 27 |
2o7.net | 27 | omtrdc.net | 27 |
rubiconproject.com | 25 | cbsi.com | 26 |
omtrdc.net | 24 | parsely.com | 16 |
cbsi.com | 24 | marinsm.com | 14 |
adnxs.com | 22 | gravity.com | 14 |
openx.net | 19 | cxense.com | 13 |
cloudfront.net | 18 | cloudfront.net | 10 |
rlcdn.com | 17 | doubleclick.net | 10 |
The table below compares high-level cookie syncing statistics when allowing and blocking third-party cookies (top 3,000 Alexa domains).
Statistic | Allow third-party cookies | Block third-party cookies |
---|---|---|
# IDs | 1308 | 938 |
# ID cookies | 1482 | 953 |
# IDs in sync | 435 | 347 |
# ID cookies in sync | 596 | 353 |
# (First*) Parties in sync | (407) 730 | (321) 450 |
# IDs known per party | 1 / 2.0 / 1 / 33 | 1 / 1.8 / 1 / 36 |
# Parties knowing an ID | 2 / 3.4 / 2 / 43 | 2 / 2.3 / 2 / 22 |
The bottom two rows are given as minimum / mean / median / maximum.
*Here we define a first party as a site which was visited in the first-party context at any point in the crawl.
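To make the four-number format concrete, the sketch below computes both summary rows from a small, made-up mapping of parties to the IDs they know. The data and the `summarize` helper are hypothetical and are not taken from the study's databases.

```python
from statistics import mean, median
from typing import Dict, List, Set, Tuple


def summarize(values: List[int]) -> Tuple[int, float, float, int]:
    """Return (minimum, mean rounded to one decimal, median, maximum)."""
    return (min(values), round(mean(values), 1), median(values), max(values))


# Hypothetical sync data: party -> set of IDs it has learned.
ids_known: Dict[str, Set[str]] = {
    "a.example": {"id1", "id2", "id3"},
    "b.example": {"id1", "id3"},
    "c.example": {"id2"},
    "d.example": {"id1"},
    "e.example": {"id3"},
}

# "# IDs known per party": one count per party.
ids_per_party = [len(ids) for ids in ids_known.values()]

# "# Parties knowing an ID": invert the mapping, one count per ID.
parties_per_id: Dict[str, int] = {}
for ids in ids_known.values():
    for uid in ids:
        parties_per_id[uid] = parties_per_id.get(uid, 0) + 1

print(summarize(ids_per_party))                  # (1, 1.6, 1, 3)
print(summarize(list(parties_per_id.values())))  # (2, 2.7, 3, 3)
```

In this toy data every ID is known by at least two parties, which mirrors the minimum of 2 in the "# Parties knowing an ID" row: an ID only counts as synced once more than one party holds it.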
Data
Databases available for download
(DO = Digital Ocean, EC2 = Amazon EC2)
Name | Size | Machine # - Location (Provider) | # of sites | Flash enabled? | Cookie setting | Data from previous crawls (Exp. #) - Data loaded | Continuous profile | Comments |
---|---|---|---|---|---|---|---|---|
P01_alexa10k_05012014_fresh | 114M | 1 - N. Virginia (EC2) | 10K | yes | Allow all | no | yes | fresh profile |
P04_alexa10k_05032014_fresh | 306M | 1 - N. Virginia (EC2) | 10K | yes | Allow all | no | yes | fresh profile |
P06_alexa3k_05062014_fresh | 84M | 1 - N. Virginia (EC2) | 3k | yes | Allow all | No | yes | |
P08_alexa3k_05062014_fresh | 84M | 2 - N. Virginia (EC2) | 3k | yes | Allow all | No | yes | |
P09_alexa3k_05072014_flash | 84M | 2 - N. California (EC2) | 3k | yes | Allow all | (P6) - Flash | yes | loaded Flash from P6 |
P10_alexa3k_05072014_localStorage | 77M | 3 - N. Virginia (EC2) | 3k | yes | Allow all | (P6) - localStorage | yes | loaded localStorage from P6 |
P11_alexa3k_05072014_HTTP_cookies | 90M | 4 - N. Virginia (EC2) | 3k | yes | Allow all | (P6) - HTTP Cookies | yes | loaded cookies.sqlite from P6 |
P14_alexa3k_05122014_DNT | 76M | 1 - N. Virginia (EC2) | 3k | yes | Allow all | No | yes | DNT Enabled |
P15_alexa3k_05122014_DNT | 81M | 2 - N. California (EC2) | 3k | yes | Allow all | No | yes | DNT Enabled |
P16_alexa3k_05122014_no3Pcookies | 55M | 4 - N. Virginia (EC2) | 3k | yes | Allow 1st party | No | yes | Block third-party cookies |
P17_alexa3k_05122014_no3Pcookies | 55M | 3 - N. Virginia (EC2) | 3k | yes | Allow 1st party | No | yes | Block third-party cookies |
P21_alexa3k_06132014_opt-out | 60M | 5 - N. Virginia (EC2) | 3k | yes | Allow all | No | yes | Loaded Opt-out from: NAI, DAA, EDAA |
P22_alexa3k_06132014_opt-out | 64M | 6 - N. California (EC2) | 3k | yes | Allow all | No | yes | Loaded Opt-out from: NAI, DAA, EDAA |
L03_alexa10k_05032014_flash | 295M | 7 - New York (DO) | 10K | yes | Allow all | (P1) - Flash | no | Flash loaded from P1 |
L04_alexa10k_05042014_flash | 295M | 7 - New York (DO) | 10K | yes | Allow all | (P1) - Flash | no | Flash loaded from P1 |
L05_alexa10k_05042014_fresh | 289M | 8 - New York (DO) | 10K | yes | Allow all | no | no | fresh profile |
L06_alexa100k_flash_no3Pcookies | 2.1G | 9- Leuven (local machine) | 100K | yes | Allow 1st party | Flash, from pilot crawls | no | Flash from pilot crawls, everything else cleared, no POST data, isolated with chroot. |
Code
Press
- Meet the Online Tracking Device That is Virtually Impossible to Block (ProPublica)
- Browser 'fingerprints' help track users (BBC)
- Canvas Fingerprinting: Neue Methode zum Online-Tracking macht Verstecken fast unmöglich (Der Spiegel)
- Publicité : une nouvelle technique pour pister les internautes (Le Monde)
- Stealthy new online tracking software puts your privacy at risk (The Globe and Mail)
- Deze online tracker volgt je in het geniep en is niet uit te schakelen (De Correspondent)
- White House Website Includes Unique Non-Cookie Tracker, Conflicts With Privacy Policy (EFF)
- Web Trackers Paint a Fresh Picture of You (Boing Boing)
- Researchers reveal 3 devious ways online trackers shatter your privacy and follow your digital footsteps (PCWorld)
Contact
Gunes Acar | gunes.acar@esat.kuleuven.be |
Christian Eubank | cge@cs.princeton.edu |
Steven Englehardt | ste@cs.princeton.edu |
Marc Juarez | marc.juarez@esat.kuleuven.be |
Arvind Narayanan | arvindn@cs.princeton.edu |
Claudia Diaz | claudia.diaz@esat.kuleuven.be |