Ting-Fang Yen, Yinglian Xie, Fang Yu, Roger Peng Yu and Martin Abadi
Many web services aim to track clients as a basis for analyzing their behavior and providing personalized services. Despite much debate regarding the collection of client information, there have been few quantitative studies that analyze the effectiveness of host-tracking and the associated privacy risks.
In this paper, we perform a large-scale study to quantify the amount of information revealed by common host identifiers. We analyze month-long anonymized datasets collected by the Hotmail web-mail service and the Bing search engine, which include millions of hosts across the global IP address space. In this setting, we compare the use of multiple identifiers, including browser information, IP addresses, cookies, and user login IDs.
We further demonstrate the privacy and security implications of host-tracking in two contexts. In the first, we study the causes of cookie churn in web services, and show that many returning users can still be tracked even if they clear cookies or utilize private browsing. In the second, we show that host-tracking can be leveraged to improve security. Specifically, by aggregating information across hosts, we uncover a stealthy malicious attack associated with over 75,000 bot accounts that forward cookies to distributed locations.