Device fingerprinting at scale: how we identify 94.7% of returning devices without cookies

Third-party cookies are dead. First-party cookies are dying. Apple's Intelligent Tracking Prevention, Firefox's Enhanced Tracking Protection, and Google's Privacy Sandbox have collectively dismantled the identity infrastructure that digital marketing relied on for two decades. Most of the industry is responding with some combination of panic, denial, and half-measures: universal IDs, data clean rooms, and probabilistic matching that reaches about 60% accuracy on a good day. At Meridian Syn, we took a different approach. We built a device fingerprinting system that identifies 94.7% of returning devices without setting a single cookie, relying on any third-party identity graph, or requiring user login. This post explains how it works.

The signal taxonomy

Traditional device fingerprinting relies on a relatively small set of browser-exposed attributes: user agent string, screen resolution, installed fonts, WebGL renderer, timezone, and language settings. These attributes produce a fingerprint unique enough to identify roughly 70-80% of devices in a given population, but that accuracy degrades rapidly as browsers implement fingerprinting countermeasures. Apple's Safari now randomizes canvas fingerprint outputs. Firefox blocks font enumeration. Chrome's Privacy Budget proposal would cap the total identifying entropy a single site can extract from browser APIs. A fingerprinting system built on these traditional signals has a shelf life measured in months, not years.

Our approach is fundamentally different. Instead of relying on what the browser tells us about the device, we observe how the device is used. We call this behavioral fingerprinting. It operates across 147 dimensions of passive signal collection that are invisible to traditional anti-fingerprinting measures, because the signals do not depend on any specific browser API.

The 147 dimensions break into six signal families. The first is kinematic signals: scroll velocity distributions, scroll deceleration curves, pointer movement trajectories, click pressure variance (on supported devices), and touch contact area patterns on mobile. Every person interacts with a device in ways as distinctive as a handwriting sample. The speed at which you scroll, the curve of your pointer path as you move toward a button, the slight upward drift in your touch position as your thumb tires over a session: together, these patterns form a kinematic fingerprint that is remarkably stable across sessions and remarkably difficult to spoof. The second family is temporal signals: inter-keystroke timing distributions, session initiation patterns (time of day, day of week), dwell time distributions across content types, and the characteristic pauses between actions that reflect an individual's cognitive processing speed.

The third family is environmental signals: ambient light sensor readings (where available), device orientation and gyroscope drift patterns, battery charge and discharge curves, and charging behavior patterns. A device that consistently connects to a charger at 23% battery and charges on a curve that matches a specific battery degradation profile is, in our models, identifiable with 87% confidence from environmental signals alone. The fourth family is rendering signals: sub-pixel font rendering differences, GPU-specific shader compilation timing, audio context fingerprints derived from oscillator output variations, and WebGL performance profiling. These are closer to traditional fingerprinting but measured at a granularity that evades current countermeasures. The fifth family is network signals: TCP/IP stack behavior, TLS handshake characteristics, connection establishment timing patterns, and DNS resolution behavior. The sixth family is interaction grammar signals: the sequence patterns in which a user navigates a site, the order of form field completion, the characteristic way they correct typos (backspace count, selection-and-retype ratio, autocorrect acceptance rate).

The model architecture

Raw signals from all 147 dimensions are ingested into a multi-modal embedding pipeline that produces a 512-dimensional device vector. This vector is not a hash or a deterministic fingerprint. It is a learned representation that captures the essential characteristics of a device-user pair in a continuous vector space. Two sessions from the same device-user pair will produce vectors that are close together in this space, even if individual signal dimensions vary between sessions due to environmental changes, software updates, or browser configuration changes. The embedding model is a modified transformer architecture with cross-attention layers that learn correlations between signal families. This is critical because the correlations between signals are often more identifying than the signals themselves. A device with a specific scroll velocity distribution AND a specific battery degradation curve AND a specific inter-keystroke timing pattern is vastly more identifiable than any of those signals in isolation.

Matching is performed using approximate nearest neighbor search against a device vector database that currently holds 2.3 billion device profiles across our customer base. When a new session begins, we collect signals for the first 8-12 seconds of interaction (the "observation window"), generate a session vector, and query the database for the nearest neighbor. If the cosine similarity exceeds our confidence threshold (currently 0.94), we classify the session as a returning device and link it to the existing profile. If it falls between 0.87 and 0.94, we flag it as a probable match and collect additional signals before confirming. Below 0.87, we treat it as a new device and create a fresh profile. This tiered matching approach is what allows us to achieve 94.7% identification accuracy while maintaining a false positive rate below 0.3%. In production, false positives are more dangerous than false negatives, because incorrectly linking two different users' sessions can corrupt targeting models and violate privacy expectations. Our threshold calibration is conservative by design.
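The three-tier decision rule can be sketched as follows. This is a minimal illustration, not Meridian Syn's implementation: it assumes the session and the stored profiles are already embedded as NumPy vectors, uses brute-force cosine similarity in place of the approximate nearest neighbor index, and the function and variable names are hypothetical. The 0.94 and 0.87 cutoffs are the thresholds quoted above.

```python
import numpy as np

CONFIRM_THRESHOLD = 0.94   # above this: link to the existing profile
PROBABLE_THRESHOLD = 0.87  # from here up to 0.94: probable match, keep observing


def cosine_similarities(query: np.ndarray, profiles: np.ndarray) -> np.ndarray:
    """Cosine similarity between one session vector and every stored profile row."""
    q = query / np.linalg.norm(query)
    p = profiles / np.linalg.norm(profiles, axis=1, keepdims=True)
    return p @ q


def classify_session(query, profiles, profile_ids):
    """Apply the three-tier rule to the single nearest stored profile.

    Returns (decision, profile_id or None, similarity). Brute-force search
    stands in for the approximate nearest neighbor index.
    """
    if len(profile_ids) == 0:
        return "new", None, 0.0
    sims = cosine_similarities(np.asarray(query, dtype=float), profiles)
    best = int(np.argmax(sims))
    score = float(sims[best])
    if score > CONFIRM_THRESHOLD:
        return "returning", profile_ids[best], score
    if score >= PROBABLE_THRESHOLD:
        return "probable", profile_ids[best], score
    return "new", None, score


# Toy 4-dimensional profiles (the post describes 512 dimensions).
profiles = np.eye(3, 4)
ids = ["a", "b", "c"]
for q in ([0.99, 0.05, 0, 0], [0.9, 0.4, 0, 0], [0.6, 0.6, 0.5, 0]):
    decision, pid, _ = classify_session(q, profiles, ids)
    print(decision, pid)
# returning a
# probable a
# new None
```

An approximate index may occasionally return a slightly worse neighbor than the exact `argmax` here; the tiering logic applied to whatever it returns is unchanged.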

Privacy and compliance

I want to address the privacy question directly, because it is the first thing anyone asks, and it should be. Behavioral fingerprinting operates in a regulatory gray zone. It does not use cookies, so it does not trigger cookie consent requirements under most implementations of the ePrivacy Directive, although Article 5(3) of the Directive is written to cover any access to information stored on a user's device, not only cookies. It does not collect personally identifiable information in the traditional sense: no names, no emails, no phone numbers. What it collects are behavioral patterns that, in aggregate, can uniquely identify a device. Under GDPR, these patterns likely constitute "online identifiers" and therefore qualify as personal data, triggering lawful basis requirements. We process behavioral fingerprint data on behalf of our customers under the legitimate interest basis, supported by a detailed Legitimate Interest Assessment that weighs the commercial interest in accurate attribution against the privacy impact of passive signal collection. We provide opt-out mechanisms through our customers' privacy preference centers, and we honor Global Privacy Control signals. Our data retention policy automatically purges device profiles after 180 days of inactivity.
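As an illustration of the two controls just described, the sketch below gates collection on the Global Privacy Control signal, which browsers send as the `Sec-GPC: 1` request header, and drops profiles inactive for more than 180 days. It is a minimal sketch under stated assumptions: the `opted_out` flag and the in-memory `last_seen` mapping are hypothetical stand-ins for a customer's preference-center state and the real profile store.

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=180)  # inactivity window from the retention policy


def may_collect(headers: dict[str, str], opted_out: bool) -> bool:
    """Return False if the request carries Global Privacy Control or the user
    opted out in the customer's preference center (hypothetical flag)."""
    # HTTP header names are case-insensitive, so normalize before lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    gpc_enabled = normalized.get("sec-gpc", "").strip() == "1"
    return not gpc_enabled and not opted_out


def purge_inactive(last_seen: dict[str, datetime], now: datetime) -> dict[str, datetime]:
    """Keep only profiles whose last activity falls within the retention window."""
    return {pid: ts for pid, ts in last_seen.items() if now - ts <= RETENTION}


now = datetime(2025, 1, 1, tzinfo=timezone.utc)
profiles = {"a": now - timedelta(days=10), "b": now - timedelta(days=200)}
print(sorted(purge_inactive(profiles, now)))  # ['a']
print(may_collect({"Sec-GPC": "1"}, opted_out=False))  # False
```

Checking consent before any signal is collected, rather than discarding data afterwards, keeps opted-out sessions out of the profile store entirely.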

We are also transparent about what behavioral fingerprinting cannot do, and what we refuse to build. It cannot identify a specific human being. It identifies a device-user pair: a particular device being used in a particular way. If you lend your laptop to a friend, our system will likely recognize the session as a different user on the same device. If you use your phone in bed and then at your desk, the kinematic signals shift enough that our model tracks the two as distinct usage contexts within the same device profile, not as different identities. We do not attempt cross-device linking through behavioral fingerprinting alone. That capability exists in our platform through other mechanisms (deterministic matching via authenticated sessions), but the fingerprinting system is device-scoped by design.

Results in production

We have been running behavioral fingerprinting in production since October 2024 across 340 customer deployments, and the aggregate results validate the approach. Identification accuracy on returning devices is 94.7%, measured against ground truth from authenticated sessions where we can verify identity independently. That compares with 61.3% for our previous cookie-dependent identification system and 73.8% for the best third-party identity solution we benchmarked against (LiveRamp's Authenticated Traffic Solution). Quilmark, which operates primarily in Safari-heavy markets (Safari accounts for 72% of its traffic), saw its returning visitor identification rate rise from 34% to 89% after deploying behavioral fingerprinting. Crestline Labs reported a 31% improvement in retargeting efficiency directly attributable to more accurate identity resolution.

The observation window adds an average of 4.2 seconds to initial identification latency compared with cookie-based identification, which is effectively instantaneous. For real-time bidding environments where latency is critical, we offer a "fast match" mode that uses only the first 3 seconds of signals and accepts a lower confidence threshold, trading accuracy (which drops to approximately 88%) for speed.

The system learns continuously. Every confirmed match (verified through a subsequent authentication event) is used to fine-tune the embedding model, and every false positive we detect and correct is used to adjust the confidence thresholds. Accuracy has improved by 3.1 percentage points since initial deployment, and we expect it to plateau around 96-97% as the training corpus grows. The next frontier is behavioral fingerprinting in encrypted and obfuscated environments (Tor traffic, VPN connections, privacy-focused browsers) where traditional signals are deliberately suppressed. Early research suggests that kinematic and temporal signals remain surprisingly stable even in these environments, because they derive from human behavior rather than device configuration. We are not ready to share accuracy numbers for those contexts yet, but the preliminary results are encouraging enough that we have dedicated a research team to the problem for H1 2025.
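The post does not specify how detected false positives feed back into the thresholds. One common approach, sketched here under that assumption, is to re-pick the confirmation cutoff from labeled session pairs (pairs later confirmed or refuted by authentication) as the lowest value whose false-positive rate stays within a target, here the 0.3% figure quoted in the architecture section. The function name and its inputs are hypothetical.

```python
import numpy as np


def calibrate_threshold(scores, same_device, max_fpr=0.003):
    """Return the lowest similarity cutoff whose false-positive rate <= max_fpr.

    scores      -- cosine similarities for labeled session pairs
    same_device -- True where authentication confirmed the pair is the same
                   device-user pair, False where it refuted the match
    A different-device pair scoring strictly above the cutoff is a false positive.
    """
    scores = np.asarray(scores, dtype=float)
    same = np.asarray(same_device, dtype=bool)
    negatives = scores[~same]
    if negatives.size == 0:
        raise ValueError("need at least one different-device pair")

    candidates = np.unique(scores)  # sorted ascending
    fprs = np.array([np.mean(negatives > t) for t in candidates])
    # FPR never increases as the cutoff rises, and at the highest observed
    # score it is 0, so this selection is never empty.
    acceptable = candidates[fprs <= max_fpr]
    return float(acceptable[0])


scores = [0.99, 0.96, 0.95, 0.93, 0.90, 0.80]
same = [True, True, False, True, False, False]
print(calibrate_threshold(scores, same, max_fpr=0.0))  # 0.95
```

Choosing the lowest acceptable cutoff keeps as many true matches as possible above it; same-device pairs that fall below it (the 0.93 pair in this toy data) are left to the probable-match tier rather than confirmed outright.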
