Generative AI is no longer a futuristic concept; it's a core feature for user interaction. From hyper-personalized assistants in retail apps to intelligent code completion tools, mobile applications are becoming smarter, more interactive, and more powerful, thanks to Large Language Models (LLMs). But with this new power comes a new attack surface that bypasses traditional security measures.
Breaches and issues with AI in mobile apps are starting to be in the news. Here is an example, where call recordings and transcripts were exposed by a mobile app.
Securing a GenAI-powered application isn't just about protecting your servers. It's about securing the entire channel—from the user's fingertips to the LLM and back. Before you even begin to analyze a user's prompt for malicious intent, you must answer a fundamental question: Is this request coming from my genuine mobile app on a safe device, or from a bot, a script, or a tampered version of my app?
Let's start by taking a look at the table stakes: the things you should already be worrying about even before injecting AI into your app.
The Starting Point: Basic App and API Security
First, even without AI, your app accesses your own and third-party APIs to get the job done. You should already be applying these best practices:
- The client is untrusted. Reverse-engineering, repackaging, rooted/jailbroken devices, MitM, scripts, emulators — all apply.
- The backend must authenticate the app instance, not just the user. API calls should be gated with short-lived, server-verified tokens (e.g., Approov JWTs), keep secrets out of the app, and pin TLS to prevent MitM attacks
- Rate limiting and abuse controls are still required (per user, per device/installation, per IP).
- You must be able to rotate certificates and API keys for any API immediately, when needed without reinstalling apps
Now let's look at why the risks presented by the use of AI in mobile apps are different from “traditional” API access.
What’s Different When Mobile Apps Use AI?
When you use AI in your app, the stakes are even higher. Here are 6 reasons why:
- Cost and Impact of DoS Attacks: LLM calls are expensive and expected response times vary widely for AI requests. LLM calls consume expensive GPU/CPU time and a few abusive devices can burn real money very quickly. Attackers can burn budget (“prompt-bombing”) far faster than with typical REST calls. In addition, scraped mobile API keys fuel scripted attacks, and emulators can be used to set up hundreds of “devices” to send long, expensive AI requests (big prompts/contexts) at the same time, multiplying compute cost and load.
- Untrusted free-form content in API calls: In a well-architected flow, the app never forwards a raw prompt; the server composes the final prompt, prepends system instructions, and filters inputs. Still, because LLMs interpret unstructured text/images, hidden instructions can influence behavior or try to trigger tool calls if user content isn’t strictly isolated from the system prompt and tools aren’t tightly allowlisted/validated.You can keep the impact bounded by using server-side prompt construction with guardrails, schema/parameter checks, and user confirmation for side-effects
- Outputs from LLMs must be treated as untrusted data, not executable logic: In a well-designed flow the server never “runs” model text; it validates or constrains it first. Variability in answers typically comes from sampling/decoding choices (e.g., temperature), and even with deterministic decoding, models can produce plausible but wrong results. This is primarily a robustness/quality concern, and it only becomes a security issue when unvalidated outputs can trigger privileged actions (e.g., calling arbitrary URLs or tools). Best practices to avoid issues include use of schema-bound function calling, strict server-side validation/allowlists for URLs and commands, and authorization checks.
- Data handling & privacy risk: In a well-architected flow, the server composes prompts (with prefix/suffix guardrails) and filters inputs/outputs, and real personal data should never be used to train models. A bigger risk surfaces with Retrieval-Augmented Generation (RAG) where an LLM answers questions using facts it just retrieved from a new knowledge source (e.g. something the user provided): if retrieval isn’t tightly scoped, context can bleed across users or sessions. You must constrain RAG to the current user/session/tenant, and treat prompts/outputs as high-risk payloads—minimize/mask PII, avoid logging it, encrypt any on-device caches, and use vendor no-retention and regional routing. Guardrails reduce exposure; careful context scoping prevents cross-session leakage.
- On-device AI models: You may think you can solve the RAG/privacy problem mentioned above If you use models designed to run on the device. There are lightweight ML runtimes from both Google (TFLite) and Apple (Core ML) designed to do image and language processing locally on your phone. There is also GGUF (GPT Graph Universal Format) , which is designed to run LLMs efficiently on local devices and LoRAs (Low-Rank Adaptations) which are specialized, constrained small footprint versions of larger models. App developers are already adopting and embedding all of these in apps. The problem is that now these models and local vector stores become assets to steal (and targets for memory dumps). A competitor could extract your model (IP theft), or an attacker could inject a trojanized model into your app, or create a fake app to impersonate your app and call the ML-powered API. Finally, your “code” now includes prompts, models and vector stores. These will need to be updated over the air, elevating the risk of “supply-chain” poisoning attacks.
- Streaming protocol risks. Long-lived AI streams keep auth valid longer; token theft/reuse or MitM attacks could leak entire conversations. Proper channel protection is even more critical with AI.
How Approov helps protect AI Enabled Apps and APIs
These are all major risks but the good news is that there are ways to address them all by putting in place robust run time protection. Approov allows you to use AI in your apps, while mitigating the risks, maintaining privacy, and maintaining customer experience.
Let's break down how Approov addresses AI related threats:
- Proves the caller (not just the user). Remote attestation verifies the request is from your genuine app on a safe device—blocking emulators, repacks, rooted/debugged environments, and scripted clients before they touch AI endpoints.
- Enforces trust at the edge. Approov issues short-lived, standard JWTs that your WAF/gateway can verify. Approov scopes/claims, which are fine-grained permissions baked into the Approov JWT, can be checked at the edge or backend before allowing an AI action(e.g., tool/action/region/tenant) to allow only intended AI operations and routes.
- Controls cost and abuse. A per-installation identity enables rate limits, quotas, and concurrency caps (context size, output tokens, stream count) per install—throttling prompt-bombing and making mass abuse uneconomic.
- Keeps secrets out of the app. Runtime secrets delivery ensures AI vendor keys, model keys, and backend credentials are never hardcoded—removing the primary fuel for scripted abuse.
- Protects the channel. Dynamic TLS pinning (managed OTA) resists MitM—even on modified devices—and secures long-lived AI SSE/WebSocket streams; tokens are short-lived to shrink replay windows.
- Hardens on-device AI. Just-in-time keys are delivered only to attested apps. Memory-dump detection (Android) and dozens of other indications of device manipulation are used to deny tokens — reducing the risk of model/LoRA/vector-store theft.
- Secures updates & supply chain. Centrally managed, signed OTA config (pins, policies, key sets) plus optional artifact signing controls let you react fast to poisoning or compromise—no app-store release needed.
- Balances UX with security. Fast resumption and token caching keep “time-to-first protected action” fast while still enforcing attestation and edge checks.
In summary, Approov makes AI endpoints accessible only to a verified mobile app instance over a pinned, short-lived, scoped, and observable path—cutting off the major AI-specific abuse vectors (cost fraud, injection-driven tool misuse, data leakage, model theft, and stream hijacking) without sacrificing user experience.
Final Thoughts: A Security-First Mindset for AI in Mobile Apps
Applying a Security-First mindset to AI implementations means treating both the client and the model as untrusted until proven otherwise. The admission ticket for every AI action must be “a genuine app on a safe device, validated with a scoped, short-lived token” and then make sure you constrain tools, minimize data, pin the channel, sign what you ship, and watch cost/abuse in real time.
Do those things, and AI features become fast, private, and durable instead of fragile and expensive. Approov provides the guardrails—remote attestation, scoped JWTs, runtime secrets, dynamic pinning, and OTA controls—so you can ship ambitious AI experiences with confidence, not risk.
By implementing a zero-trust model that begins with client-side attestation, you can confidently block bad actors at the door, allowing you to focus on building the safe, innovative, and intelligent mobile apps of the future.
Approov are experts in mobile app and API security.

George McGregor
VP Marketing, Approov
George is based in the Bay Area and has an extensive background in cyber-security, cloud services and communications software. Before joining Approov he held leadership positions in Imperva, Citrix, Juniper Networks and HP.