Yesterday morning security forums reported news that an AI researcher had published a dataset of 40,000 photos scraped from the dating app Tinder. The purpose was simply to build a real-world data set for training Convolutional Neural Networks (CNNs) to tell the difference between men and women. This seems innocent enough, although the author's choice of variable naming caused a bit of a stir. He changed the variable name "hoe" to "subject" soon after the story broke. Apparently the original naming was inherited from the Tinder Auto-Liker code.
This isn't the first time this has happened. Tinder has a long history of API scraping abuses:
The supposedly private Tinder API has been reverse engineered and fully documented here. This sort of knowledge makes it easy to build open source API clients. For instance, this one and this one both use Python. It is easy for anyone to download these and extend them for whatever purpose they see fit.
Back in February 2015 a software developer from Vancouver automated his Tinder experience. "The dating app, like so many popular apps, has seen its internal, private API reverse engineered and employed by third parties. Unauthorized users of Tinder's API commonly use it to create Tinderbots that interact with the service and other users, but Justin Long's Tinderbot looks to be one of the most ambitious Tinderbot creations." This bot can even start messaging conversations and try to work out whether the sentiment of the replies looks promising.
Swipebuster is a paid service that lets you find out if somebody you know (and maybe love) is using Tinder (and perhaps you don't think they should be).
All that is needed to access the Tinder API is a single access token. That is pretty shocking. To get one of those, as explained here, you just need to sign up as a Tinder user. That is a pretty low barrier to entry and effectively anonymous. The Python code provides a user-agent string of "Tinder Android Version 3.2.0". It isn't that, of course; it's a script running on a PC. User-agent strings provide absolutely no assurance of caller identity. Not even an API key is required. As we at CriticalBlue have discussed before, an API key is not necessarily a very big barrier for an attacker, but at least it is a start and forces the Tinder app to be reverse engineered to extract the key. There are many more advanced techniques that we cover extensively in our mobile API security techniques series. Beyond that, our Approov product provides full software attestation to specifically protect against this type of automated mobile API scraping.
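To illustrate why a user-agent string proves nothing about the caller, here is a minimal sketch using only the Python standard library. The URL and the helper name `build_request` are hypothetical placeholders, not part of any real API, and the request is only constructed, never sent.

```python
import urllib.request

# Hypothetical endpoint, used only to build a request object; nothing is sent.
EXAMPLE_URL = "https://api.example.com/profile"


def build_request(user_agent: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying a caller-chosen User-Agent."""
    return urllib.request.Request(EXAMPLE_URL, headers={"User-Agent": user_agent})


# The string quoted above; any script on any machine can claim it.
req = build_request("Tinder Android Version 3.2.0")
print(req.get_header("User-agent"))  # urllib stores header names capitalized
```

The header is just a value the client chooses, so a server that trusts it is trusting the caller's own description of itself.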
Rate limiting might be in place in the API implementation. It is difficult to tell without abusing it. However, if it is, it is pretty ineffective. The face scraper code just seems to add some small random delays (which presumably give the interaction a more human-like characteristic) after downloading the photos of each subject before effectively swiping left. The point about swiping left is that there is no daily limit, and I suspect some real users swipe left at a prodigious rate. It must be hard to set a swipe left limit that doesn't curtail the rate of disdain some users need to demonstrate to their potential matches. The posted code amply demonstrates how far this automation can be taken. It can apparently extract 40,000 images using the same user ID from the same IP address. From looking at the code it seems a new image can be extracted every few seconds on average, so this takes less than a day to do. This must beat even the greatest power dislikers on the platform.
Ultimately rate limiting can't solve the problem. All it can do is slow down and complicate the scripts. You can always create enough fake users distributed over enough IP addresses to fly under the radar of any rate limiting system. What is needed is a concerted attempt to lock down access to the API to only the app or other approved software clients. Sure, attempts could be made to automate those too, but that is considerably more difficult to achieve and easier to detect.
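The limits of per-identity rate limiting can be sketched with a toy sliding-window limiter. Everything here is an assumption for illustration: the class and function names, the limit of 100 requests per hour, and the traffic pattern are made up and say nothing about how Tinder's backend actually works.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window` seconds for each key."""

    def __init__(self, limit: int, window: float) -> None:
        self.limit = limit
        self.window = window
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key: str, now: float) -> bool:
        hits = self._hits[key]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True


def accepted(limiter, keys, requests, interval):
    """Count accepted requests when calls are spread round-robin over `keys`."""
    count = 0
    for i in range(requests):
        if limiter.allow(keys[i % len(keys)], now=i * interval):
            count += 1
    return count


# One identity sending 1000 requests, one per second, against 100 per hour.
single = accepted(SlidingWindowLimiter(100, 3600.0), ["user-0"], 1000, 1.0)
# The same traffic spread over 10 hypothetical fake identities.
fake_users = [f"user-{n}" for n in range(10)]
spread = accepted(SlidingWindowLimiter(100, 3600.0), fake_users, 1000, 1.0)
print(single, spread)  # 100 1000
```

In this sketch the single identity is cut off after 100 requests, while the same volume split across ten identities passes untouched. A real limiter would likely key on a combination of user ID and IP address, but the same arithmetic applies: the attacker only needs enough keys.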
Given the extensive history of abuses of the Tinder API, at least some of these countermeasures should be in place for bot mitigation. Perhaps most users don't care about these things, but it seems only a matter of time before such mass profile data scraping and republishing turns into a much bigger and uglier story. That could really damage the brand and make would-be customers think twice before signing up and letting their personal data be swiped.