Apple Intelligence on the command line

Apple Intelligence isn’t just a suite of system-wide writing tools or a smarter Siri. For developers, it’s an on-device Foundation Models framework that we can hook into using Python.

The python-apple-fm-sdk 1 provides a bridge to the ~3B-parameter language model running on your Mac’s Neural Engine. Inference is entirely offline and requires no API keys.

What it can be used for

Because the model is local, it excels at high-volume, low-latency tasks where sending data to the cloud is either too slow, too expensive, or a privacy risk.

  • Email Classification: Sorting through thousands of subject lines to flag promotions for deletion (as seen in my previous post).
  • Utility Content Detection: Scanning your Photos library to filter out screenshots, documents, and blurry shots before cloud processing 2.
  • Constrained Extraction: Pulling dates, amounts, or names from unstructured text without paying per-token fees.
  • On-Device Search: Building semantic search indices for personal notes or files that never leave your machine.
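
The constrained-extraction pattern in particular benefits from a strict output contract, since a small model's free-form replies are hard to parse. Below is a minimal sketch; the prompt wording, field names, and helper functions are my own illustrations, not part of the SDK:

```python
# Sketch of the constrained-extraction pattern: pin the model to a
# line-per-field reply that is trivial to parse. Prompt wording and
# helper names here are illustrative, not SDK API.
def extraction_prompt(text: str, fields: list[str]) -> str:
    field_list = ", ".join(fields)
    return (
        f"Extract the following fields from the text: {field_list}.\n"
        "Reply with one 'field: value' pair per line and nothing else.\n"
        f"Text: {text}"
    )

def parse_reply(reply: str) -> dict[str, str]:
    # Split the model's reply back into a dict; ignore malformed lines.
    pairs: dict[str, str] = {}
    for line in reply.strip().splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            pairs[key.strip()] = value.strip()
    return pairs

# A reply shaped like the contract above parses cleanly:
print(parse_reply("date: 2024-05-01\namount: $42.00"))
# → {'date': '2024-05-01', 'amount': '$42.00'}
```

Because there are no per-token fees, it costs nothing to retry with a reworded prompt when a reply fails to parse.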

How to use it

The SDK is currently in beta and requires an Apple Silicon Mac running macOS 15.0+ with Xcode 16 installed 3.
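
A quick preflight check can save a confusing install failure later. This sketch uses only the standard library; the function name and messages are mine, not the SDK's (it doesn't cover the Xcode 16 requirement):

```python
import platform

# Hypothetical preflight helper: takes the platform facts as arguments so
# the logic is easy to test on any machine.
def fm_preflight(system: str, machine: str, macos_major: int) -> list[str]:
    """Return the reasons (if any) the Foundation Models SDK can't run."""
    problems = []
    if system != "Darwin":
        problems.append("not macOS")
    elif machine != "arm64":
        problems.append("not Apple Silicon")
    elif macos_major < 15:
        problems.append(f"macOS {macos_major} is older than 15.0")
    return problems

# mac_ver() returns an empty version string on non-Mac systems.
major = int((platform.mac_ver()[0] or "0").split(".")[0])
print(fm_preflight(platform.system(), platform.machine(), major))
```

An empty list means the basic hardware and OS requirements are met.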

Implementation is centered on the LanguageModelSession, and the API is strictly asynchronous:

import apple_fm_sdk as fm
import asyncio

async def classify_text(text):
    # The system model can be unavailable (still downloading, or Apple
    # Intelligence switched off in System Settings), so check first.
    model = fm.SystemLanguageModel()
    if not model.is_available():
        return "Unavailable"

    session = fm.LanguageModelSession()
    prompt = f"Is this SPAM or HAM? {text}"

    response = await session.respond(prompt)
    return str(response).strip()

if __name__ == "__main__":
    # asyncio.run bridges the async API into a one-shot CLI invocation.
    print(asyncio.run(classify_text("Congratulations, you won a cruise!")))
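
To work through a whole mailbox, await each subject line in turn. The sketch below substitutes a stand-in for the SDK's session.respond so the control flow runs on any machine; on a Mac with the SDK you would await the real call instead:

```python
import asyncio

# fake_respond is a hypothetical stand-in for session.respond, so this
# loop can run (and be tested) without the SDK installed.
async def fake_respond(prompt: str) -> str:
    await asyncio.sleep(0)  # yield to the loop, as a real inference call would
    return "SPAM" if "free" in prompt.lower() else "HAM"

async def classify_batch(subjects: list[str]) -> dict[str, str]:
    results: dict[str, str] = {}
    for subject in subjects:  # one request at a time
        reply = await fake_respond(f"Is this SPAM or HAM? {subject}")
        results[subject] = reply.strip()
    return results

print(asyncio.run(classify_batch(["FREE iPhone!!!", "Meeting at 3pm"])))
# → {'FREE iPhone!!!': 'SPAM', 'Meeting at 3pm': 'HAM'}
```

The plain for-loop is deliberate: as noted in the gotchas below, firing requests concurrently doesn't help on this hardware.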

Common gotchas

Working with on-device models is different from calling the OpenAI or Claude APIs. Here is what I’ve encountered:

  • Availability Check: You must always check model.is_available(). It can fail if the model hasn’t finished downloading or if Apple Intelligence is disabled in System Settings.
  • Serial Execution: While the Python code is async, the Apple Neural Engine (ANE) typically processes one inference request at a time. Parallelizing calls won’t make the hardware go faster.
  • Context Management: The model is stateless, and the SDK does not trim context for you; you are responsible for keeping each LanguageModelSession within its token limit.
  • Async Overhead: Integrating it into synchronous CLI scripts often requires a bit of asyncio boilerplate or a dedicated event loop runner.
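
For the context-management point, one workable approach is to drop the oldest turns before each request. A minimal sketch, using a rough character budget as a proxy for the real token limit (the helper and budget are my own, not the SDK's):

```python
# Hypothetical transcript trimmer: keep the most recent turns that fit
# within a character budget, dropping older turns first.
def trim_history(turns: list[str], budget_chars: int = 4000) -> list[str]:
    kept: list[str] = []
    used = 0
    for turn in reversed(turns):  # walk from newest to oldest
        if used + len(turn) > budget_chars:
            break
        kept.append(turn)
        used += len(turn)
    return list(reversed(kept))  # restore chronological order

history = ["old " * 1500, "recent question", "model answer"]
print(trim_history(history, budget_chars=100))
# → ['recent question', 'model answer']
```

A real token counter would be more accurate, but for filtering-style prompts a character heuristic is usually enough to stay under the limit.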

Value

On-device AI changes the economics of software. When inference is free and private, you can apply “intelligent” logic to mundane tasks—like cleaning a mailbox or deduplicating photos—that were previously too small to justify a cloud API bill.

  1. python-apple-fm-sdk - The official Python interface for Apple’s Foundation Models. 

  2. On-Device Scene Analysis - Apple’s research into using local models for utility content detection. 

  3. Apple Intelligence for Developers - Documentation on hardware requirements and framework adoption.