A new empirical study from Wake Forest University, "Mind your key: An Empirical Study of LLM API Credential Leakage in iOS Apps," is the first systematic look at how large language model credentials leak from iOS applications. It deserves attention beyond the iOS community, because the mechanism it documents is not specific to iOS, not specific to LLMs, and not solved by any of the things developers usually reach for first.
The researchers analyzed 444 iOS apps with working LLM features. Of these, 282 (64%) exposed exploitable credentials or backend access in their network traffic. Of those 282, 146 were fully exploitable: an attacker could send arbitrary inference requests billed to the developer's account and, in many cases, extract the proprietary system prompts that encode the app's business logic. The most popular affected app had over 2.3 million ratings. Ninety days after responsible disclosure, 72% were still exploitable.
What the Study Actually Measured — and Why the Method Matters
The most important design decision in the paper is methodological. iOS apps from the App Store are encrypted with FairPlay DRM, so static binary scanning — the standard approach on Android, where APKs decompile readily — doesn't work without a jailbroken device and decryption. So the researchers didn't look inside the binaries at all. They built a tool called LLMKeyLens that intercepts the app's HTTPS traffic at runtime using a man-in-the-middle proxy, with a VPN-based transparent capture as a fallback for apps that try to bypass the system proxy. They then replayed each captured credential against the real backend to confirm it actually worked.
That choice has a consequence worth sitting with: everything they found was observable in transit, no matter how it was stored, obfuscated, or constructed inside the app. Whether the key was hardcoded in plaintext, assembled at runtime, hidden behind native code, or fetched from a server, it still had to travel over the network to reach an LLM endpoint — and at that moment it was visible. The study is, in effect, a large-scale demonstration that on-device storage hardening and code obfuscation do not protect a secret that still crosses an interceptable channel.
Three Leakage Patterns, Three Different Failures
The study sorts the leaks into three patterns. Each one tells a different story about where developers' mental models break down.
| Leakage Pattern | App Count (Share of the 282 Exposed Apps) | Core Failure Mode |
|---|---|---|
| Plaintext API keys | 54 apps (19%) | Hardcoded provider keys sent directly in headers or query strings; often exposed system prompts. |
| Unauthenticated backend proxies | 92 apps (33%) | Keys moved to backend cloud functions, but the proxy itself lacked authentication, acting as an open relay. |
| JWT / bearer token leakage | 136 apps (48%) | Client tokens intercepted; characterized by infinite lifetimes or a complete lack of server-side validation. |
A clear hierarchy of misunderstanding runs through these three. The first group didn't separate the secret from the client. The second separated it but forgot to guard the new front door. The third built the front door and the lock but wired the lock backwards. All three were caught by the same passive interception.
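The counts above can be cross-checked with a few lines of arithmetic. This is a sketch that uses only the figures quoted in this article; the variable names are illustrative, and the observation that the 146 fully exploitable apps equal the sum of the first two patterns is an inference from the numbers, not a statement taken from the paper.

```python
# Figures as quoted in this article. The relationships checked below are
# this sketch's reading of them, not claims made by the paper itself.
ANALYZED = 444           # iOS apps with working LLM features
EXPOSED = 282            # apps exposing credentials or backend access
FULLY_EXPLOITABLE = 146  # apps where arbitrary inference requests were possible

PATTERNS = {
    "plaintext_api_keys": 54,
    "unauthenticated_proxies": 92,
    "jwt_bearer_leakage": 136,
}


def pct(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


# Share of all analyzed apps that leaked something.
exposure_rate = pct(EXPOSED, ANALYZED)

# Share of the exposed apps that fall into each pattern.
pattern_shares = {name: pct(count, EXPOSED) for name, count in PATTERNS.items()}

# The three patterns partition the exposed set.
patterns_cover_exposed = sum(PATTERNS.values()) == EXPOSED

# Inference: the "fully exploitable" count matches the first two patterns combined.
first_two = PATTERNS["plaintext_api_keys"] + PATTERNS["unauthenticated_proxies"]
first_two_match_fully_exploitable = first_two == FULLY_EXPLOITABLE
```

Read this way, the per-pattern percentages are shares of the 282 exposed apps rather than of all 444 analyzed apps.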
The Defenses Developers Tried — and Why They Mostly Failed
The study also looked at what apps did to resist interception. Only 32% deployed any anti-interception mechanism at all. The single most popular one, HTTP proxy bypass (the app ignoring the system proxy setting), was itself defeated 81% of the time by the researchers' VPN-layer capture: it stops a naïve proxy and nothing more. Defenses only became effective when layered: combinations of proxy bypass with custom payload encryption or WebSocket transport drove the bypass rate to near zero, a roughly 7.9x improvement. But only about 10% of apps used layered defenses.
This is the empirical core of the transport-security argument. A single client-side defense is brittle; an attacker who controls the device routes around it. And critically, even certificate pinning — the standard transport defense — is defeated on a compromised device by hooking frameworks like Frida that nullify the pinning check at runtime. Pinning raises the bar against a passive network attacker. It does not, by itself, stop an attacker who owns the endpoint.
That matters because the baseline is low to begin with. A separate large measurement study (Pradeep et al., IMC 2022) found that only roughly 0.9–8% of Android apps and 2.5–11% of iOS apps pin certificates at runtime at all. The majority of apps have no transport-layer defense in the first place — which is precisely why the Wake Forest interception was so easy.
The Remediation Result Is the Part Executives Should Read Twice
Three months after the researchers disclosed every finding through the apps' official channels, they re-tested. Only 28% had remediated; 72% were still exploitable.
The reasons are the important part. The persistent cases fell into two groups. The first was unauthenticated backend proxies — there was no credential to revoke, because the vulnerability was the absence of authentication, an architectural gap that disclosure alone doesn't close. The second was broken token logic: static tokens, missing expiry claims, servers accepting expired tokens. In both groups, the failure was structural, not a matter of rotating a leaked string.
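The token failures listed above (static tokens, missing expiry claims, expired tokens still accepted) can be expressed as a small audit over a token's decoded claims. The following is a minimal sketch, not the study's tooling: the function name `audit_claims`, the finding strings, and the 15-minute lifetime policy are all assumptions chosen for illustration. It uses the standard JWT claim names `exp` and `iat`.

```python
import time
from typing import List, Optional

# Assumed policy for this sketch: tokens should live at most 15 minutes.
MAX_LIFETIME_SECONDS = 15 * 60


def audit_claims(claims: dict, now: Optional[float] = None) -> List[str]:
    """Return a list of problems found in already-decoded token claims.

    An empty list means the claims pass this sketch's checks. Signature
    verification is a separate step and is deliberately not covered here.
    """
    now = time.time() if now is None else now
    findings: List[str] = []

    exp = claims.get("exp")
    if exp is None:
        # A token without an expiry is effectively a static credential.
        findings.append("missing exp: token never expires")
        return findings

    if exp <= now:
        # A server that still accepts this token has broken validation.
        findings.append("expired: server must reject this token")

    iat = claims.get("iat")
    if iat is not None and exp - iat > MAX_LIFETIME_SECONDS:
        findings.append("lifetime exceeds policy")

    return findings
```

A check like this only helps if it runs on the server for every request; the persistent cases described above are the ones where no such check existed or its result was ignored.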
This is the finding with the most direct bearing on how a security program should think about the problem. If your only lever is "find the leaked secret and rotate it," you cannot fix an app whose problem is that it has no authentication boundary, or whose token validation is fundamentally wrong. The leaked-string model of credential security simply doesn't describe what's broken here.
This Is Not an iOS Problem
It would be easy to read this as an Apple-ecosystem story. It isn't. The same failure modes are already documented across the rest of the mobile landscape — the iOS study is simply the first to measure the LLM-specific version of it on Apple's platform.
- Android, with Google services: The LM-Scout study (Ibrahim et al., 2025) examined LLM integration in Android apps and bypassed usage restrictions in 127 of 181 manually analyzed apps. Its automated tool then generated working attack scripts — fully automatically, no human in the loop — for 120 apps out of 2,950 scanned from the Play Store. The authors' diagnosis is almost identical to the iOS study's: there is no secure, dedicated SDK for LLM integration, so developers enforce restrictions on the client, where they can be removed. A separate 2025 study, SecretLoc, independently confirms that hardcoded secrets remain pervasive in Android binaries, which decompile far more easily than their iOS counterparts.
- Android, without Google services: This is a coverage gap teams routinely miss. Google's Play Integrity API — the platform's app and device attestation mechanism — depends on Google Mobile Services and may simply refuse to run on AOSP and other non-GMS builds. It also reports only coarse verdicts (basic / device / strong integrity) defined by Google rather than by the developer, and community projects exist specifically to spoof those verdicts on rooted devices. So apps that lean on Play Integrity inherit two problems at once: they don't cover the growing population of non-GMS Android devices, and the verdicts they do get are bypassable at scale on compromised hardware.
- Both Android and iOS, at scale: Schmidt et al.'s Leaky Apps (CCS 2025) analyzed secrets across both platforms and found iOS apps more likely to expose secrets than Android apps. Echoing the Wake Forest result, it also found that developers who removed embedded credentials in later app versions frequently forgot to revoke the old ones, leaving them live. Two affected apps had over a billion installs each.
- HarmonyOS and HarmonyOS NEXT: No one has yet published the LLM-key-leakage study for Huawei's ecosystem, but every precondition is in place, and there is no reason to expect a different outcome. Since the 2024 NEXT release, HarmonyOS is fully Android-free: apps ship as HAP packages (which are ZIP archives), and ArkTS source compiles to Ark/Panda bytecode (.abc) that is disassemblable and decompilable with publicly available tooling. Huawei provides ArkGuard for obfuscation, but security researchers analyzing the platform are explicit that ArkGuard raises the cost of reverse engineering and "should not be treated as a security boundary on its own." Huawei's own attestation primitive, SysIntegrity (part of Safety Detect / HMS Core), requires the Huawei TEE and only works inside the Huawei ecosystem — the same single-ecosystem limitation that Play Integrity has on Android and App Attest has on iOS.
Put those three platform attestation services side by side and the structural problem is obvious: Play Integrity covers GMS Android, App Attest covers iOS, SysIntegrity covers HarmonyOS, and none of them covers the others. A team shipping a cross-platform app — especially one built in Flutter or React Native — has to stitch together three different, ecosystem-locked, individually bypassable mechanisms, and still has no coverage for non-GMS Android at all. Meanwhile the credential being protected is identical on every platform, because it's the same LLM provider key.
What Actually Closes the Gap
The study points, without naming products, at a specific set of capabilities. Reading its three leakage patterns and its remediation failure together, the requirements fall out almost mechanically.
1. Get the secret off the device, and deliver it just in time
The plaintext-key pattern exists because the credential lives in or passes through the client. The durable fix is for the provider key never to be present in the app package or in client-controlled traffic at all — delivered from a cloud service only at the moment an API call is made, and only after the requesting app instance has been verified. Runtime secrets protection is the term for this: secrets held centrally, injected just-in-time, and — critically given the remediation findings — rotatable instantly across the entire installed base without shipping an app update. The Wake Forest result that developers couldn't or didn't rotate leaked keys is precisely the problem centralized, release-independent rotation removes.
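The idea of centrally held, release-independent secrets can be sketched in a few lines. This is an illustrative model only, assuming a hypothetical `SecretBroker` class that lives in a cloud service; it does not describe any particular product's API. The `caller_verified` flag stands in for the outcome of whatever app verification step precedes secret release.

```python
from typing import Dict, Optional, Tuple


class SecretBroker:
    """Hypothetical server-side store that releases provider keys on demand.

    Keys never ship in the app package. Rotating a key here takes effect for
    the whole installed base on the next request, with no app update.
    """

    def __init__(self) -> None:
        # Maps a secret name to (version, value).
        self._current: Dict[str, Tuple[int, str]] = {}

    def rotate(self, name: str, value: str) -> int:
        """Install a new value for the named secret and return its version."""
        version = self._current.get(name, (0, ""))[0] + 1
        self._current[name] = (version, value)
        return version

    def release(self, name: str, caller_verified: bool) -> Optional[Tuple[int, str]]:
        """Return (version, value) only if the caller passed verification."""
        if not caller_verified:
            return None
        return self._current.get(name)
```

For example, after `broker.rotate("llm_provider_key", "new-value")`, every subsequent verified `release` call returns the new version, while unverified callers keep receiving `None`.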
2. Make the backend demand proof of a genuine app, not just a token
The unauthenticated-proxy and broken-JWT patterns are both failures of "who is calling." App and device attestation answers that by having an off-device service verify that the request comes from an untampered build of your app, running in an uncompromised environment, before the backend will respond. The verification decision is made off the device, so it can't be reverse-engineered or patched out the way a client-side check can. A short-lived attestation token, validated server-side, is what the 92 open-relay apps were missing and what the 136 broken-token apps thought they had. Because the decision is centralized rather than dependent on a platform's native service, the same mechanism can cover iOS, GMS Android, non-GMS Android, and HarmonyOS uniformly — closing the coverage gap that Play Integrity, App Attest, and SysIntegrity individually leave open.
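What "a short-lived token, validated server-side" means in practice can be shown with a compact example. The sketch below is not a real attestation protocol and not a full JWT implementation: it assumes an off-device service and the backend share an HMAC secret, and the function names `mint_token` and `verify_token`, the payload layout, and the 300-second default lifetime are all illustrative choices. The step that decides whether an app instance is genuine is out of scope here; the code covers only the backend check that the open-relay and broken-token apps lacked.

```python
import base64
import hashlib
import hmac
import json
from typing import Optional


def _b64(data: bytes) -> str:
    """URL-safe base64 without padding."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _unb64(text: str) -> bytes:
    """Inverse of _b64; restores the padding before decoding."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(secret: bytes, payload: str) -> str:
    return _b64(hmac.new(secret, payload.encode(), hashlib.sha256).digest())


def mint_token(secret: bytes, app_id: str, now: float, ttl: int = 300) -> str:
    """Issued by the off-device service after it has verified the app instance."""
    claims = {"app": app_id, "exp": int(now) + ttl}
    payload = _b64(json.dumps(claims).encode())
    return f"{payload}.{_sign(secret, payload)}"


def verify_token(secret: bytes, token: str, now: float) -> Optional[dict]:
    """Run by the backend on every request; returns the claims or None."""
    try:
        payload, signature = token.split(".")
        expected = _sign(secret, payload)
        # Constant-time comparison avoids leaking how much of the signature matched.
        if not hmac.compare_digest(signature.encode(), expected.encode()):
            return None
        claims = json.loads(_unb64(payload))
    except ValueError:
        # Covers malformed tokens, bad base64, bad JSON, and encoding errors.
        return None
    if not isinstance(claims, dict):
        return None
    exp = claims.get("exp")
    if not isinstance(exp, int) or exp <= now:
        return None  # missing or past expiry: reject
    return claims
```

The backend forwards a request to the LLM provider only when `verify_token` returns claims. A missing token, a bad signature, and an expired token all produce the same refusal.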
3. Treat transport protection as a layered control, not a checkbox
The study's defense analysis is unambiguous: single mechanisms fail, layered ones work. Certificate pinning is necessary but, on its own, defeated by hooking on a compromised device. The effective posture pairs pinning with detection of the very tools used to bypass it — so that when a Frida or similar framework is present, the app is never issued a valid attestation token or secret, and the backend refuses to respond regardless of whether pinning was nominally "bypassed." Two further practical notes from the data: pins should be managed centrally and rotatable over the air, because the study repeatedly shows developers failing to push security changes that require an app release; and the trust anchor should not be the device's own certificate store, which an attacker with device access can populate with their own root.
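The two practical notes above, centrally managed pins and a refusal to issue secrets when hooking is detected, can be combined in one small decision function. This is a simplified sketch under stated assumptions: `PinSet`, `pin_for`, and `may_release_secret` are hypothetical names, the `hooking_detected` flag stands in for a separate detection step, and the pin is a SHA-256 hash of the whole DER-encoded certificate for brevity, whereas real deployments commonly pin the certificate's public key instead.

```python
import base64
import hashlib
from typing import Iterable


def pin_for(der_bytes: bytes) -> str:
    """Base64 SHA-256 of a DER-encoded certificate.

    In Python, DER bytes for a live connection can be obtained with
    ssl.SSLSocket.getpeercert(binary_form=True).
    """
    return base64.b64encode(hashlib.sha256(der_bytes).digest()).decode("ascii")


class PinSet:
    """Pins held outside the app release cycle so they can be rotated centrally."""

    def __init__(self, pins: Iterable[str]) -> None:
        self._pins = set(pins)

    def replace(self, pins: Iterable[str]) -> None:
        """Swap in a new pin set, for example after a certificate rotation."""
        self._pins = set(pins)

    def allows(self, der_bytes: bytes) -> bool:
        # Trust comes from the pin set, not from the device certificate store.
        return pin_for(der_bytes) in self._pins


def may_release_secret(pins: PinSet, peer_der: bytes, hooking_detected: bool) -> bool:
    """Layered decision: both the pin check and the environment check must pass."""
    return (not hooking_detected) and pins.allows(peer_der)
```

The design point is that neither check is trusted alone: a matching pin does not help if a hooking framework is present, and a clean environment does not help if the peer certificate is not one of the pinned ones.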
None of these three is sufficient alone. The plaintext-key apps needed secret externalization; the open-relay apps needed attestation; the weak-pinning apps needed layered transport defense plus anti-hooking. The 64% leakage rate is what you get when all three are absent, which for most of the studied apps they were.
The Takeaway for Developers and CISOs
The Wake Forest study is valuable precisely because it doesn't depend on novel attacks. It uses a man-in-the-middle proxy — a tool that has existed for years — and finds that nearly two-thirds of LLM-integrated iOS apps hand over working credentials to anyone willing to look at their traffic. The LLM angle makes the cost concrete and immediate, because a leaked inference key is a metered, billable resource an attacker can drain from day one, alongside the system prompts that represent the app's actual product. But the underlying weakness — secrets and trust decisions placed on a client the developer doesn't control — predates LLMs and spans every mobile platform.
For developers, the actionable lesson is that "move it to a backend proxy" is necessary but not sufficient; the proxy has to authenticate the caller, and the caller has to be something an attacker can't trivially impersonate. For CISOs, the lesson is in the remediation numbers: 72% are still exploitable after disclosure, because the broken thing was the architecture, not a string in a config file. A credential-security program built only around scanning for and rotating leaked secrets will not move that number. One built around externalized secrets, off-device attestation, and layered transport integrity — applied uniformly across iOS, Android with and without Google services, and HarmonyOS — addresses the failure where it actually lives.
The credential isn't the vulnerability. The channel is. And the channel looks the same on every platform.
Ted Miracco
CEO of Approov
Ted’s high-technology experience spans 30 years in cybersecurity, electronic design automation (EDA), RF/microwave circuit design, semiconductors, and defense electronics.
