- Chief Technical Officer, Approov
More details of the UK's controversial NHSX contact tracing app are being released as the app starts a wider scale trial on the Isle of Wight this week. NHSX is a digital transformation group associated with the UK National Health Service.
Why controversial? There are many reasons, some to do with how the app development was initially procured, but also specifically from a technical perspective as the UK has opted for a centralised contact tracing approach rather than the decentralised model being championed by Apple and Google amongst others (including ourselves).
The centralised model means that the checking for proximity contacts is done on a central server, rather than in a decentralised fashion within the privacy of end user apps. When a user of the app becomes ill with Covid and uploads their data to the central server, the IDs of all their contacts will also be uploaded.
It does appear at least that the NHSX app signup process only requires a regional postcode which typically maps to many thousands of households. A unique random app instance ID is also generated automatically. So this information does not directly identify the app user.
From a privacy perspective this is actually an improvement over the approach adopted by Singapore’s TraceTogether app (from which Australia's COVIDSafe app is derived). These apps require a phone number during the signup process so that contact tracers can get in contact if proximity to an infectious person is suspected. Identifying the individual from their phone number is trivial in most cases. NHSX have adopted the approach of using a push notification to individual app instances to inform them of a proximity event. They won’t know the identity of the user they are sending the notification to.
The app user’s phone will be constantly transmitting identifiers that include (amongst other details) an encrypted form of the app instance ID. Anyone with access to the NHSX database, and thus the decryption key, will be able to recover the instance ID of the device. Even though the database doesn’t contain the identity of the user, it only takes one event in the real world (say through a personal contact, a modified Point of Sale device, or face recognition systems) to associate the instance ID with the actual person’s identity. Once this association is made it is then possible to track the presence of that individual, both in the past since they installed the app and in the future until they disable it. Neither the NHSX app nor any other contact tracing app can tell whether another device using the app’s Bluetooth protocol is really a valid version of the app running on somebody else’s phone, or some other app or hardware device that is simply spoofing it. Users of the app can therefore be subject to tracking if there is a network of devices invisibly collecting the encrypted IDs and storing them centrally. Thus installation of the app opens up the user to some heightened risk of tracking. We just have to trust that the database of private keys is indeed heavily secured and only accessible appropriately.
There is a great analysis in a technical paper, with an excellent blog introducing it, prepared by the National Cyber Security Centre. I’m pleased that some thought has been given to the sort of replay and relay attack scenarios we covered in our previous blog on Privacy vs Security.
On first reading, though, the most striking design decision is the following:
“The privacy and security design is there to support the epidemiological model and the needs of clinicians who are managing the virus in the UK. There are balances and trades to be made. An obvious one is the period of randomisation of the encrypted blob sent out over BLE. Some people reckon 15 minutes is right. Initially, the NHS app randomises every day because it allows us to give users feedback on how they’re doing, for example ‘Today, you were close to 10 people and really close to 2 people'. But that’s a policy balance and the period can be changed if necessary.”
I believe this design decision is a significant misstep, and the discussion of policy balance is a strong indication that they already know this to be the case. The implication of this choice is that a phone continues to respond with exactly the same Broadcast Value of encrypted data for the entire day. This provides a unique identifier allowing an individual device to be potentially traced, for the whole day, through any locations where Bluetooth tracking software is operating. All that is required is an event at some point during the day that can be used to associate the Broadcast Value with a particular individual. This significant potential privacy loss is simply a side effect of this design decision. No access to the NHSX database is required to undertake such tracking.
No other contact tracing solution is using such an elongated period before rotating the identifiers. Other solutions such as Apple/Google, TraceTogether and COVIDSafe are all using something like the 15 minutes suggested. There is a reason for this: it is typically the period before the Bluetooth hardware MAC address is randomized. There is little point in making the period shorter, but anything longer reintroduces the significant degradation to privacy that randomization sought to prevent.
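To make the difference concrete, here is a minimal sketch of how a passive network of Bluetooth listeners could link sightings of one device. Everything in it is an assumption for illustration: `broadcast_value` is a hypothetical stand-in that hashes a device secret with the current rotation epoch (it is not the NHSX encryption scheme), and the route, place names and helper functions are invented.

```python
import hashlib
from collections import defaultdict

DAY = 24 * 60 * 60
FIFTEEN_MINUTES = 15 * 60


def broadcast_value(device_secret, t_seconds, period_seconds):
    """Hypothetical rotating identifier: constant within one rotation period."""
    epoch = t_seconds // period_seconds
    digest = hashlib.sha256(f"{device_secret}:{epoch}".encode()).hexdigest()
    return digest[:16]


def observe(device_secret, route, period_seconds):
    """What listeners along the route would record: (value, time, place)."""
    return [
        (broadcast_value(device_secret, t, period_seconds), t, place)
        for t, place in route
    ]


def link_sightings(sightings):
    """Group sightings that carry the same broadcast value into one trail."""
    trails = defaultdict(list)
    for value, t_seconds, place in sightings:
        trails[value].append((t_seconds, place))
    return dict(trails)


# One device seen at four places during a single day (times in seconds).
route = [
    (8 * 3600, "station"),
    (9 * 3600, "office"),
    (13 * 3600, "cafe"),
    (18 * 3600, "gym"),
]

daily_trails = link_sightings(observe("alice-device", route, DAY))
rotating_trails = link_sightings(observe("alice-device", route, FIFTEEN_MINUTES))

print(len(daily_trails), len(rotating_trails))  # prints: 1 4
```

With daily rotation the four sightings collapse into a single trail, so one identifying event anywhere in the day labels the whole route. With 15 minute rotation each sighting carries a different value, and the listeners have nothing with which to join them.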
If there is an expectation that a large percentage of the population should install this app, then a significant duty of care is required with respect to individual privacy. Andy Warhol and the developers of other contact tracing apps were right. Fifteen minutes should be the limit of our individual broadcast fame.
It’s understandable that NHSX would like a feature to show users the number of their contacts in a day. But the tradeoff in privacy to enable this feature simply isn’t justified. What’s more, it’s not even clear it is necessary! If NHSX really wanted this feature then they could provide an additional portion of the Broadcast Value that changes only once a day, whilst the rest of the value changes at the more sensible pace of every 15 minutes. This Daily ID could be simply randomized each day, and be short in length: say 7 bits, providing only 128 different random values. This would be sufficient to estimate the number of close contacts in a day with reasonable accuracy, but not enough to easily track anybody. With only 128 different values, about 0.8% of the app user population (1 in 128) would be allocated the same Daily ID. So as soon as an individual disappears into the crowd it wouldn’t be practical to track them further.
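The arithmetic behind this suggestion can be sketched as follows. This is only an illustration of the idea, not a proposed implementation: the function names are hypothetical, and the contact estimate simply counts distinct Daily IDs, which slightly undercounts when two contacts happen to share a value.

```python
import secrets

DAILY_ID_BITS = 7                      # as suggested above
DAILY_ID_VALUES = 1 << DAILY_ID_BITS   # 128 possible values


def new_daily_id():
    """Pick a fresh random Daily ID once per day (hypothetical helper)."""
    return secrets.randbelow(DAILY_ID_VALUES)


def estimate_contacts(observed_daily_ids):
    """Estimate the day's contacts as the number of distinct Daily IDs seen."""
    return len(set(observed_daily_ids))


def share_with_same_daily_id():
    """Fraction of all app users expected to hold any given Daily ID."""
    return 1 / DAILY_ID_VALUES


def expected_distinct_ids(true_contacts):
    """Expected distinct Daily IDs seen when meeting `true_contacts` people."""
    return DAILY_ID_VALUES * (1 - (1 - 1 / DAILY_ID_VALUES) ** true_contacts)


print(f"{share_with_same_daily_id():.2%}")     # prints: 0.78%
print(estimate_contacts([5, 5, 17, 90, 17]))   # prints: 3
print(round(expected_distinct_ids(10), 1))
```

For a user who meets around ten people in a day, the expected count of distinct Daily IDs stays close to ten, so the feedback feature still works. The estimate degrades as the number of contacts approaches 128, which is the price paid for making the Daily ID useless as a tracking handle.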
No doubt more will become clear in the coming days, once the app is available and the first reverse engineering analyses are published. We also hope that the promise of source code disclosure will be fulfilled quickly. I’ve no doubt that the app code will only collect the information it describes, but of course this does need to be verified in case there are any unexpected leaks of data that might reveal identities more easily than intended.
In a centralised system, however, analysis of the app alone is insufficient. It is not possible to know what code is running on the servers, who has access to the data stored or the types of analysis that are being applied to it. In a decentralised system the server only acts as an information conduit, so the inability to know for sure what it is doing is of much less relevance. As the proverb says: “Trust, but verify”. Unfortunately in this case we appear to be left with just trust, something a little eroded in this project to date.