Authentication at the cost of privacy: a non-technical overview

Our daily lives are full of interactions with technology in the form of smart devices, computers, cars, the Internet of Things (IoT), and other sensor-equipped devices. Technology is more beneficial than ever; it lets users conveniently access services and products worldwide.

Identity providers (IdPs), e.g., Google, Facebook, Microsoft, Apple, LinkedIn, and many more, provide digital identities to users. They act as trusted third parties (TTPs) and authenticate users when they request access to services or products offered by other service providers. Usernames and passwords are the most common means of authentication, but biometrics are also used. The IdPs collect users' personal data, including location data, device information, IP addresses, cookies, etc. The collected data serve two purposes: 1) to enable continuous authentication as a second factor by recognizing user data passively and seamlessly, which mitigates the risk if somebody steals a password; and 2) to use such data commercially, e.g., for advertisements [1, 2, 3]. For example, Google collects data about the apps installed on users' devices, cookies, the videos users watch, biometric information (voice, audio), location information, IP addresses, WiFi access points, cell towers, Bluetooth-enabled devices, etc.

Different types of authentication can be used to achieve continuous authentication. Specifically, continuous authentication is achieved through physiological biometric techniques (face recognition, iris recognition), behavioral biometric techniques (keystroke dynamics, motion dynamics, touch dynamics, stylometry, signature recognition, voice recognition, etc.), and context-aware modes (IP addresses, MAC addresses, GPS data, cookies, etc.) [4].

Each continuous authentication mode has pros and cons. Physiological biometric modes have good performance in terms of accuracy, but when it comes to continuous authentication, some modes require user attention during the authentication process and other modes require users to perform explicit actions, for instance, facing the camera constantly. Thus, such modes are neither continuous, passive, nor seamless. That is why physiological biometric modes are not ideal for continuous authentication.

Context-aware authentication modes are consistent with the continuous, passive, and seamless properties, but they rely on information about the device and other contextual parameters rather than on the users themselves. For instance, they can detect when someone attempts to log in from a different location or a different device; in such cases, users are typically asked to authenticate themselves through a second-factor mechanism. However, when a device is compromised in an expected location, these modes cannot tell whether the legitimate user or an imposter is using it. Such problems arise when users leave a device unlocked and an imposter uses it in their absence.
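A minimal sketch can make this limitation concrete. The names below (LoginContext, needs_second_factor) and the choice of device ID plus country as the context are illustrative assumptions, not how any particular identity provider works:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LoginContext:
    """Hypothetical context of a session: which device, from where."""
    device_id: str
    country: str


def needs_second_factor(known: set, attempt: LoginContext) -> bool:
    """Ask for a second factor only when the context has not been seen before."""
    return attempt not in known


# Contexts previously observed for the legitimate user (made-up values).
known_contexts = {LoginContext("laptop-1", "NO")}

# A new device triggers a second-factor prompt.
new_device = needs_second_factor(known_contexts, LoginContext("phone-9", "NO"))

# An imposter on the user's unlocked laptop presents the very same context,
# so the check passes silently.
unlocked_laptop = needs_second_factor(known_contexts, LoginContext("laptop-1", "NO"))
```

Here new_device is True while unlocked_laptop is False: the check only knows about the device and the location, not about who is actually sitting in front of the screen.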

Behavioral biometrics aims to identify users by analyzing and recognizing behavioral patterns, such as typing behavior, touch gestures, stylometry, mouse movements, walking style, etc. These patterns can be collected passively and continuously; users do not need to perform any specific action for authentication. The users' legitimacy is therefore determined from the activities they usually perform while using their devices. However, a potential problem with behavioral biometrics is its comparatively low accuracy. External factors influence performance, for instance, sudden changes in a user's emotional state and other psychological factors. The risk of identity theft remains even when one of the continuous authentication mechanisms is applied. These problems can be mitigated by combining two or more modalities (multimodal authentication), e.g., enabling both context-aware modes and behavioral biometrics as a second factor.
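One simple way to picture a multimodal combination is score-level fusion, where each modality produces a confidence score between 0 and 1 and the scores are merged before a decision is made. The weighted average, the function names, and the threshold of 0.6 below are assumptions chosen for illustration; real systems may combine modalities quite differently:

```python
def fuse_scores(behavioral: float, contextual: float, weight: float = 0.5) -> float:
    """Weighted average of two confidence scores, each in the range [0, 1].

    `weight` is the share given to the behavioral score (an assumed parameter).
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must be between 0 and 1")
    return weight * behavioral + (1.0 - weight) * contextual


def is_legitimate(behavioral: float, contextual: float,
                  threshold: float = 0.6, weight: float = 0.5) -> bool:
    """Accept the session when the fused score reaches the threshold."""
    return fuse_scores(behavioral, contextual, weight) >= threshold


# A noisy behavioral score (e.g., the user is stressed) is compensated
# by a strong contextual match.
stressed_user = is_legitimate(behavioral=0.4, contextual=0.9)

# Weak evidence from both modalities is rejected.
suspicious_session = is_legitimate(behavioral=0.2, contextual=0.5)
```

In this sketch stressed_user is accepted and suspicious_session is rejected, which shows how a second modality can smooth over the fluctuations of a single one.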

Value of consent

Consent is given before data collection, during the sign-up phase, but consent by itself cannot preserve privacy; most people do not read the conditions stated in the consent form. Consent only makes users aware of what data are collected; it does not provide any privacy solution.

According to GDPR Art. 4, data containing identifiable information that can directly or indirectly identify individuals are considered personal data. Personal data can be classified as sensitive data and non-sensitive data.

Sensitive data include physiological and behavioral biometric data. GPS location data and other identifiable data, including IP addresses, cookies, etc., also fall into the category of personal data.

Privacy challenges 

The data are outsourced to cloud servers, where the processing is done, and most users do not trust the server. Personal data are collected to authenticate us continuously. The data used for behavioral biometric-based continuous authentication contain information about our behavior and activities, and can additionally reveal demographic information and emotional states.

Context-aware data reveal our contextual information, device information, the applications installed on the device, calendar data, cookies, and much more, all of which are considered personal data (see this blog). Whether personal data are used for continuous authentication or for recommendation systems, they should be processed and stored in a privacy-preserving manner.

As a result, security is strengthened at the expense of our privacy. This trade-off between security and privacy affects the user experience when personal data are involved. In other words, it can lead users to avoid the technology altogether.

Besides authentication for security, contextual information and other personal data are also collected and used commercially, such as for recommendations and business advertisements. These data tell providers about a user's personality and choices, so that they can target advertisements and recommend services according to the individual's preferences and context [5]. Another trade-off is between performance and the privacy level of contextual data. Stronger privacy-preserving solutions require more computational resources; without them, performance or service quality can suffer. For instance, efficient privacy-preserving information retrieval solutions need devices with high computational capabilities to answer queries quickly.

Current research mainly focuses on the privacy aspects of physiological data. However, very little attention has been paid to behavioral and other context-aware data. In conclusion, these data must be stored and processed in a privacy-preserving manner, as stated in Art. 9 of the GDPR.

This blogpost was written by Ahmed Fraz Baig.