Custom Dictation Action

The "Custom Dictation" action elevates voice input by combining advanced speech-to-text technology with the power of Large Language Models (LLMs), offering a highly customizable and context-aware dictation experience.

Sophisticated Dictation Workflow:

The dictation process runs in four steps:

  1. Recording: Starts when you activate dictation and captures your voice until you stop.
  2. Transcription: Once recording stops, your audio is transcribed using your chosen speech recognition model (either on-device for privacy or cloud-based for potentially higher accuracy).
  3. Word Replacements: Predefined substitutions are applied to correct commonly misrecognized words or phrases.
  4. AI-Powered Refinement: The transcribed text is then processed by an LLM (local or cloud-based, using your own API keys). You define a system prompt (e.g., "Correct any spelling or grammar mistakes") and a user prompt (which includes the transcribed text) to guide the AI. This step can correct nuanced errors, improve phrasing, or transform the text according to your specifications.
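The four steps above can be sketched as a small pipeline. This is an illustrative sketch, not the app's actual implementation: the function names, the `transcribe` and `call_llm` callables, and the replacement table are all assumptions for demonstration.

```python
def apply_word_replacements(text: str, replacements: dict[str, str]) -> str:
    """Step 3: substitute commonly misrecognized words or phrases."""
    for wrong, right in replacements.items():
        text = text.replace(wrong, right)
    return text


def refine_with_llm(text: str, system_prompt: str, call_llm) -> str:
    """Step 4: hand the corrected transcript to an LLM (local or cloud)."""
    user_prompt = f"Transcribed text:\n{text}"
    return call_llm(system_prompt, user_prompt)


def run_dictation(audio, transcribe, replacements, system_prompt, call_llm) -> str:
    """Run steps 2-4 on a finished recording (step 1) and return the final text."""
    raw = transcribe(audio)                             # Step 2: speech-to-text
    fixed = apply_word_replacements(raw, replacements)  # Step 3: word replacements
    return refine_with_llm(fixed, system_prompt, call_llm)  # Step 4: AI refinement
```

Because the speech model and LLM are passed in as callables, the same pipeline works with an on-device or cloud transcription model and a local or API-backed LLM.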

Key Features:

1. Context-Aware AI Transformations:

Custom Dictation can intelligently adapt its behavior based on your current application.

  • Example (Coding): If you're dictating in a programming IDE, a specific prompt can instruct the AI to format variable or function names into snake_case or camelCase.
  • Example (Email): When dictating in an email client, a different prompt might guide the AI to adopt a more formal tone or incorporate business-specific terminology.
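Context-aware behavior amounts to selecting a different system prompt per frontmost application. A minimal sketch, in which the application names and prompt texts are assumptions for demonstration:

```python
# Map frontmost-application names to system prompts (illustrative values).
PROMPTS_BY_APP = {
    "Xcode": "Format any variable or function names you output in camelCase.",
    "Mail": "Rewrite the dictated text in a formal business tone.",
}
DEFAULT_PROMPT = "Correct any spelling or grammar mistakes."


def prompt_for_app(app_name: str) -> str:
    """Return the system prompt for the current app, falling back to a default."""
    return PROMPTS_BY_APP.get(app_name, DEFAULT_PROMPT)
```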

2. Multiple Custom Commands:

You're not limited to a single dictation setup. Create multiple "Custom Dictation" commands, each tailored with different prompts or settings for various tasks or contexts, even within the same application.

3. Placeholders in Prompts:

Utilize placeholders within your system or user prompts for dynamic content insertion. This allows the AI to incorporate information like the current date, clipboard content, or selected text into its processing.
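Placeholder expansion can be sketched as simple token substitution performed just before the prompt is sent to the LLM. The `{date}`, `{clipboard}`, and `{selected_text}` token names here are illustrative, not the app's actual placeholder syntax:

```python
import datetime


def expand_placeholders(prompt: str, clipboard: str, selected_text: str) -> str:
    """Replace placeholder tokens in a prompt with dynamic content."""
    return prompt.format(
        date=datetime.date.today().isoformat(),
        clipboard=clipboard,
        selected_text=selected_text,
    )
```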

4. Screenshot Integration:

Provide visual context to the AI by including a screenshot with your dictation.

  • Example: If you dictate, "Please correct the names of the people mentioned, based on the attendees list," you can include a screenshot of that list. The AI can then use the visual information to ensure names are spelled correctly, even if the initial transcription had errors.
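Under the hood, attaching a screenshot means sending the image alongside the transcript to a vision-capable model. A sketch assuming an OpenAI-style multimodal chat message format (the message schema is an assumption; adapt it to your provider):

```python
import base64


def build_vision_message(dictated_text: str, screenshot_png: bytes) -> list[dict]:
    """Pair the dictated transcript with a screenshot in one chat message.

    The screenshot is embedded as a base64 data URL, a common convention
    for OpenAI-compatible vision endpoints.
    """
    image_b64 = base64.b64encode(screenshot_png).decode("ascii")
    return [{
        "role": "user",
        "content": [
            {"type": "text", "text": dictated_text},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
        ],
    }]
```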

5. Dictation as an AI Command Interface:

Transform dictation into a powerful command tool for your AI.

  • Example (Code Refactoring):
    1. Set up a Custom Dictation command where the system prompt defines the AI's role (e.g., "You are an expert Swift developer. You will be given a piece of code and instructions to modify it.").
    2. The user prompt can be minimal, primarily passing the transcribed dictation.
    3. Select a piece of code in your IDE.
    4. Activate dictation and say, "Refactor this to be more efficient and add inline documentation."
    5. The AI processes your spoken instructions together with the selected code (if your prompt uses the selected-text placeholder) and outputs the modified code. The result can then be pasted back automatically, replacing the original selection.
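Assembling such a command boils down to combining the fixed system prompt with the spoken instruction and the selected code. A minimal sketch, reusing the system prompt from the example above (the prompt layout is illustrative):

```python
SYSTEM_PROMPT = ("You are an expert Swift developer. You will be given a piece "
                 "of code and instructions to modify it.")


def build_refactor_request(dictation: str, selected_code: str) -> tuple[str, str]:
    """Combine the spoken instruction and the selected code into the
    (system prompt, user prompt) pair sent to the LLM."""
    user_prompt = f"{dictation}\n\nCode:\n{selected_code}"
    return SYSTEM_PROMPT, user_prompt
```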

The "Custom Dictation" action offers unparalleled flexibility, allowing you to tailor voice input to your specific needs and harness AI to transform spoken words into precisely formatted and contextually accurate text or even executable commands.