What's New in Privacy AI
Stay up to date with the latest features and improvements
Version 1.1.23
- llama.cpp Upgrade – Updated from b5950 to b6131 for better GPU performance on Apple devices, more efficient KV cache handling, and support for new models: GLM-4.5, SmallThinker, Qwen3-Embedding, and Hunyuan Dense.
- Faster Large Model Imports – Refactored the GGUF file processor to handle 4B+ models with the latest llama.cpp engine, importing quickly without memory overflows.
- Perplexity API Update – Added support for the latest models: sonar, sonar-pro, sonar-deep-research, sonar-reasoning, and sonar-reasoning-pro. Model names can now be edited in Perplexity API settings. (Tip: If you already use the Perplexity API, remove it from the API list and restart the app to apply changes.)
- TTS Settings Enhancement – The TTS settings view now displays your device’s CPU core count to help you choose the optimal thread count.
- Clearer X.com API Key Guide – Added more detailed instructions in Settings for using your X.com API key.
- Tool Description Improvements – Updated the search_contact local tool description so AI knows when and how to use it. Related tools like send_email and send_sms now rely on it for recipient lookup. For example: “Send my wife an SMS and tell her I love her” will first trigger search_contact to find the recipient, then create the message.
- Better HuggingFace Downloads – Model downloads now run in the background and can resume after interruptions. A prompt will notify you when the download is complete—ideal for large models like Qwen3-4B-Thinking/Instruct-2507.
- Tool Search Bar – Added a search bar for local and MCP tools to quickly find what you need as the tool list grows.
- UI Tips Added – Added quick usage tips for API Key Management, Tools, and Remote Services to help new users get started faster.
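The recipient lookup described in the Tool Description Improvements item can be pictured roughly as follows. This is a minimal Python sketch, not the app's implementation: the in-memory contact list, the function signatures, and the returned draft shape are all hypothetical and only illustrate send_sms relying on search_contact before composing a message.

```python
# Hypothetical in-memory address book; the real app reads the device contacts.
CONTACTS = [
    {"name": "Alice Example", "relation": "wife", "phone": "+15550100"},
    {"name": "Bob Example", "relation": "brother", "phone": "+15550101"},
]


def search_contact(query):
    """Return contacts whose name or relation contains the query (case-insensitive)."""
    q = query.lower()
    return [
        c for c in CONTACTS
        if q in c["name"].lower() or q in c["relation"].lower()
    ]


def send_sms(recipient_query, body):
    """Resolve the recipient via search_contact first, then build the message draft."""
    matches = search_contact(recipient_query)
    if not matches:
        raise LookupError(f"no contact matches {recipient_query!r}")
    # Assumption: take the first match; a real tool might ask the user to confirm.
    return {"to": matches[0]["phone"], "body": body}


draft = send_sms("wife", "I love you")
```

Keeping the lookup in one tool means send_email could reuse the same search_contact step instead of duplicating recipient matching.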
Bug Fixes
- Fixed: Search bar in Remote Services now filters models correctly.
- Fixed: Removed the “Add” menu from Local Tools, since adding external tools is no longer supported.
- Fixed: Resolved a crash when sending SMS with the send_sms tool.
Version 1.1.21
- Offline Text-to-Speech
An offline TTS model, Kokoros-82M (53 distinct voice styles), is now bundled directly in the app. Any AI reply or Reader article can be spoken aloud entirely on-device, so nothing is sent to the cloud and there are no per-character fees. You can also export the generated audio (M4A, WAV, AIFF) to Files, AirDrop, or any media player for later listening.
- API Server Templates and Cloning
You can duplicate an existing server profile, such as the HuggingFace profile, to create custom endpoints in seconds. All models, headers, tokens, and endpoint settings are copied automatically, so you only adjust what differs (base URL, model path, etc.). This is ideal when a provider exposes multiple endpoints under a single API key or when you run several private vLLM clusters.
- Built-in GitHub Provider
GitHub has been added to the list of built-in API providers.
- Flexible Model Selection for Forked or Cloned Chats
When you fork an existing conversation or clone it into a new thread, you can now change the underlying model—local or remote—before the next message is generated. The original chat remains intact, and the new branch inherits the full context and tool settings while letting you compare answers or continue with a faster or cheaper engine.
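The server-profile cloning described under API Server Templates and Cloning amounts to "copy everything, override what differs". A minimal Python sketch under stated assumptions: the ServerProfile fields, the clone_profile helper, and the placeholder URLs and token are hypothetical and do not reflect the app's actual data model.

```python
from copy import deepcopy
from dataclasses import dataclass, field, replace


@dataclass
class ServerProfile:
    # Hypothetical fields, loosely mirroring what the release note says is copied.
    name: str
    base_url: str
    token: str
    headers: dict = field(default_factory=dict)
    models: list = field(default_factory=list)


def clone_profile(source, **overrides):
    """Deep-copy a profile, then apply only the fields that differ."""
    return replace(deepcopy(source), **overrides)


template = ServerProfile(
    name="HuggingFace",
    base_url="https://example.invalid/v1",  # placeholder URL
    token="placeholder-token",
    headers={"X-Example": "1"},
    models=["model-a"],
)

clone = clone_profile(
    template,
    name="Private vLLM",
    base_url="https://vllm.example.invalid/v1",  # placeholder URL
)
clone.models.append("model-b")  # the deep copy leaves the template untouched
```

The deep copy matters here: without it the clone and the template would share the same headers and models objects, and editing one would silently change the other.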
Bug Fixes
- The file-import button that disappeared inside chats is back.
- Manually entered model names now save correctly in the API Key configuration screen.
Version 1.1.17
Built-in API Access to z.ai
We’ve embedded native support for https://api.z.ai/api/paas/v4/. You can now call Z.AI services directly from Privacy AI with your token.
Perplexity Model Deprecation
The outdated r1-1776 model from Perplexity has been removed. Please switch to sonar-reasoning or sonar-reasoning-pro for continued access via OpenRouter.
Expanded OpenRouter Protocol Compatibility
Improved protocol handling ensures better performance and compatibility with the latest OpenRouter models and backends.
Major Siri Integration Upgrade
Local Models Now Work with Siri: You can now trigger local models using Siri voice commands. This is made possible by enhanced performance of llama.cpp-based models.
Faster AI Replies for Siri: Adjusted prompt logic helps AI respond within Siri’s strict time limits (typically under 8 seconds). For best results, use fast-response models—avoid long-thinking agents.
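To see why fast-response models matter under a fixed time budget, consider the sketch below. It is not how the app meets Siri's limit (the notes above credit adjusted prompt logic); it only illustrates a generic deadline-with-fallback pattern. The reply_within helper, the fallback text, and the two stand-in models are hypothetical; the 8-second default mirrors the figure quoted above.

```python
import time
from concurrent.futures import ThreadPoolExecutor, TimeoutError as FutureTimeout


def reply_within(generate, prompt, deadline_s=8.0, fallback="Sorry, that took too long."):
    """Run generate(prompt) in a worker thread; return fallback if the deadline passes."""
    pool = ThreadPoolExecutor(max_workers=1)
    future = pool.submit(generate, prompt)
    try:
        return future.result(timeout=deadline_s)
    except FutureTimeout:
        return fallback
    finally:
        # Do not block the caller on a worker that is still running.
        pool.shutdown(wait=False)


def fast_model(prompt):
    return f"echo: {prompt}"


def slow_model(prompt):
    time.sleep(0.5)  # stands in for a long-thinking model
    return "late answer"
```

With a short deadline, reply_within(slow_model, "hi", deadline_s=0.05) returns the fallback text, while the fast model's answer comes back unchanged.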
Improved Thinking Text View
When thinking models produce long outputs, the text now scrolls smoothly with a visible scrollbar—no more UI lockups on lengthy thoughts.
Search Tool Now Has Speed/Balance/Quality Modes
Choose your preferred mode for the searching_tool. “Speed” mode delivers up to 3× faster results, perfect for quick lookups.
Code Block UI Stability
Markdown rendering has been optimized to remain responsive even when AI outputs include very large code blocks (1000+ lines). No more freezing or slowdowns in the chat UI.
Version 1.1.16
Fixed API Endpoint Save Bug
Resolved an issue where custom API base URLs were not being saved properly in the API settings screen.
Improved URL Input Experience
Disabled automatic capitalization for URL fields to prevent input errors when entering API endpoints.
Streamlined API Server Creation
You can now create a new API server directly from the API Key detail view—making it faster to configure your remote models.
Enhanced Launch Feedback
Added detailed progress indicators during app startup to show what’s being synced or initialized.
Version 1.1.6
- Upgraded to llama.cpp b5950 with expanded local model support. Added support for the following new local GGUF models:
  - Menlo_Lucy: https://huggingface.co/Menlo/Lucy
  - SmolLM3: https://huggingface.co/blog/smollm3
  - OpenReasoning-Nemotron-1.5B: https://huggingface.co/nvidia/OpenReasoning-Nemotron-1.5B
- Comprehensive rebuild and optimization of llama.cpp for iOS
  - Rebuilt the entire iOS build system for llama.cpp, generating a smaller and faster xcframework optimized for Apple devices.
  - The Swift wrapper has been completely rewritten to improve memory handling and inference throughput.
  - Benchmark results show a ~30% performance boost in local model prediction on Apple chips. Technical details: https://privacyai.acmeup.com/docs/article_30_localmodel_optimization.html
- Improved YouTube subtitle handling – The YouTube caption downloader now automatically falls back to the default subtitle track when the English caption is unavailable, improving compatibility with non-English content.
- Enhanced Markdown conversion for specific blog websites in Reader – Reader now better supports various blog layouts and formats, producing cleaner, more accurate Markdown output for improved content processing and analysis.
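The YouTube caption fallback in this release can be sketched as a simple selection rule. This is an illustrative Python sketch only: the track dictionaries, the is_default flag, and the pick_caption_track helper are hypothetical, and the final "first available track" step is an extra assumption that the release note does not mention.

```python
def pick_caption_track(tracks, preferred_lang="en"):
    """Return the preferred-language track, else the default track, else the first one."""
    if not tracks:
        return None
    # 1. Prefer the requested language (English in the release note).
    for track in tracks:
        if track["lang"] == preferred_lang:
            return track
    # 2. Fall back to the track marked as default.
    for track in tracks:
        if track.get("is_default"):
            return track
    # 3. Assumption: if nothing is marked default, use the first track.
    return tracks[0]


german_video = [
    {"lang": "fr"},
    {"lang": "de", "is_default": True},
]
mixed_video = [
    {"lang": "de", "is_default": True},
    {"lang": "en"},
]
```

For german_video the rule selects the German default track, while mixed_video still yields the English track because the preferred language wins when it exists.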
Version 1.1.5
- Groq Cloud Support – You can now connect directly to the Groq API inference service.
- Chat Prompt Sync with iCloud – Each chat now syncs its prompt properly across devices via iCloud.
- Faster First-Time Setup on New Devices – We’ve optimized iCloud sync performance when launching the app for the first time on a new device. Chats, settings, and models now load faster and more reliably.
- Improved Database Readiness for iCloud Devices – All critical data, including API keys and remote server configs, is now synced before the app starts. We’ve also added a “Refresh” button to manually trigger sync if needed.
Version 1.1.4
We’ve just released Privacy AI v1.1.10 (Build 47) with a few important improvements:
1. Fixed a bug that prevented users from switching between text-only and text-to-image chat modes without creating a new session.
2. Resolved a serious performance issue on iPad that caused scrolling to drop below 10 FPS in some cases.
3. Added a new cache management system that significantly improves app launch speed.
Version 1.1.3
Here’s what’s new in this build, coming to the App Store globally this weekend:
- Scanned PDF OCR – Extract text from scanned PDFs fully offline, with no cloud involved.
- Moonshot API Integration – Natively supports the Moonshot API, including models such as Kimi-K2.
- Parallel Conversations – Run 8 AI chats at once on iPhone and up to 12 on iPad for seamless multitasking.
- iPad Split View Enhanced – Smarter layout adaptation when multitasking on iPad.
- iCloud Key Fix – Resolved API key sync issues across your Apple devices.
- Chat Launch Boosted – Chats open faster than ever.
- LiquidAI Ready – Upgraded llama.cpp to support Liquid series models for offline use.
Version 1.1.2
- Qwen Model Optimization – Switched from 1.7B to 0.6B for better performance on older devices, with strong summarization and tool execution still intact
- YouTube Caption Summarization – Quickly summarize videos using available captions
- Improved Reading – Remembers your last reading position and features a smoother outline view
- Photo Sharing Fixes – Sharing images into Privacy AI now works reliably across apps
- llama.cpp Upgrade – Updated to b5846, with support for Baidu’s ERNIE-4.5 models
Version 1.1.1
This update brings powerful new capabilities—built for professionals, researchers, and AI power users:
- HuggingFace Integration – Connect to any Inference Endpoint with your token
- Polymarket Tool – Analyze real-time prediction markets for research and strategy
- Statistical Toolkit – Run advanced Bayesian and frequentist analysis with any tool-capable model
- MCP Upgrade – Now supports Authorization headers for secure remote access
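For the MCP Upgrade item, attaching an Authorization header to a remote request looks roughly like this. A hedged Python sketch using only the standard library: the endpoint URL, the token, the build_mcp_request helper, and the choice of the Bearer scheme are assumptions for illustration; the release note only states that Authorization headers are supported. The request is built but never sent.

```python
import urllib.request


def build_mcp_request(url, token, payload):
    """Build (but do not send) a POST request carrying an Authorization header."""
    return urllib.request.Request(
        url,
        data=payload,
        method="POST",
        headers={
            # Assumption: Bearer scheme; a server may expect a different one.
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


req = build_mcp_request(
    "https://mcp.example.invalid/mcp",  # placeholder endpoint
    "secret-token",                     # placeholder credential
    b"{}",
)
```

Passing the token in a header rather than in the URL keeps it out of query strings, which tend to end up in server logs.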
Version 1.1.0
- Upgraded OpenAI Protocol Support – Compatibility and responsiveness with services like Perplexity, Gemini, Anthropic, Mistral, and xAI have been significantly improved by updating to the latest OpenAI-compatible protocol version.
- Faster Web Search Tool – The search_web tool has been completely rewritten, resulting in a 60% boost in search speed and responsiveness.
- llama.cpp Core Upgrade – Updated to b5760, this release brings full support for the latest Gemma 3n open-source model, enabling faster and smarter on-device AI performance.
- Fork a Chat Anytime – You can now instantly fork any conversation into a new thread, preserving all previous messages and tool settings for seamless exploration.
- Improved Academic Search Accuracy – The search_arxiv tool has been fine-tuned for more accurate academic paper search, delivering better results from the arXiv database.