Skip to content

🎙️ Dictation

While many actions can be automated, once you have automated a significant portion of your repetitive tasks on your computer using your keyboard or scripts, you may realize that the most time-consuming task is actually typing extensive amounts of text throughout the day.

That's why we included a dictation feature in our app. We are currently using OpenAI's best-in-class whisper transcription model that runs entirely on your Mac. Your voice and your transcription never leaves your device.

Tip

Disconnect from the internet and test out the dictation featue. It will still work.

How to use

To begin dictation, you have the option to click on the microphone icon in the toolbar or menu bar, or alternatively, you can use a customized keyboard shortcut for more efficient access. To stop dictation, simply click on the microphone icon once again or use the designated keyboard shortcut.

During the transcription process, utilizing the keyboard shortcut will promptly abort the ongoing transcription. Additionally, the microphone icon will transform into a progress view, allowing you to easily halt the process by clicking on it.

Keyboard Shortcut Setting

Typing or Pasting

The transcribed text can be inserted in two ways: either by simulating a paste action, which preserves your current clipboard, or by simulating keystrokes as if you were manually typing the text with your physical keyboard, but at a significantly faster speed.

Typing or Pasting

Initial Prompt

Whisper distinguishes itself from traditional voice-to-text models by incorporating characteristics of an LLM model. To enhance accuracy, particularly for specific words that you possess or believe may not be recognized by Whisper, you have the option to enter them here. This will improve accuracy and allow for customized output.

Keep it short

It's important to keep in mind that Whisper only considers the first 244 tokens of the prompt.

Read more about the initial prompt

OpenAI's API documentation the initial prompt

You can use a prompt to improve the quality of the transcripts generated by the Whisper API. The model will try to match the style of the prompt, so it will be more likely to use capitalization and punctuation if the prompt does too. However, the current prompting system is much more limited than our other language models and only provides limited control over the generated audio. Here are some examples of how prompting can help in different scenarios:

  1. Prompts can be very helpful for correcting specific words or acronyms that the model often misrecognizes in the audio. For example, the following prompt improves the transcription of the words DALL·E and GPT-3, which were previously written as "GDP 3" and "DALI": "The transcript is about OpenAI which makes technology like DALL·E, GPT-3, and ChatGPT with the hope of one day building an AGI system that benefits all of humanity"

  2. To preserve the context of a file that was split into segments, you can prompt the model with the transcript of the preceding segment. This will make the transcript more accurate, as the model will use the relevant information from the previous audio. The model will only consider the final 224 tokens of the prompt and ignore anything earlier. For multilingual inputs, Whisper uses a custom tokenizer. For English only inputs, it uses the standard GPT-2 tokenizer which are both accessible through the open source Whisper Python package.

  3. Sometimes the model might skip punctuation in the transcript. You can avoid this by using a simple prompt that includes punctuation: "Hello, welcome to my lecture."

  4. The model may also leave out common filler words in the audio. If you want to keep the filler words in your transcript, you can use a prompt that contains them: "Umm, let me think like, hmm... Okay, here's what I'm, like, thinking."

  5. Some languages can be written in different ways, such as simplified or traditional Chinese. The model might not always use the writing style that you want for your transcript by default. You can improve this by using a prompt in your preferred writing style.

Replacements

You have the ability to replace frequently inaccurately transcribed words with the correct ones. This feature is particularly useful for words that you believe the model may not recognize or when you wish to customize the output.

For example you can replace chat GPT with ChatGPT.

You can also use regular expressions here. Because Whisper produces a lot of explanatory text in brackets, such as [BLANK AUDIO] or [ Silence ], you can use regular expressions to remove them all at once from the output, instead of adding them one by one as you encounter them.

The following regular expression will remove all the text wrapped inside square brackets.

\[.*\]

Replace square bracket text

Choose a Whisper Model

The whisper model consumes a relatively high amount of memory and disk space. There are smaller models with lower accuracy that consume less resources. In the Preferences window, you can choose which model you want to use for your future dictations. First, you need to download the model by clicking on the download button next to each model's name. After the download is complete, it will show you a checkbox that you can click to activate the model. We suggest starting with the smaller ones and switching to a bigger one if you feel the accuracy is not good enough for you.

Whisper Model

ChatGPT Post Process

To enhance the precision and quality of the transcript, an alternative approach is to employ ChatGPT. This can effectively add punctuation, rectifies grammar errors, and substitutes misidentified words based on the provided context.

Simply input the desired prompt and select the preferred ChatGPT model. Take into consideration the cost and time required for generating the output when making your model selection.

Warning

If you enable this feature, the text after the replacements are applied will be sent directly to OpenAI. You will use your own API key and you will be subject to the privacy policy that you agreed when you started using OpenAI.

You can get your API key from https://platform.openai.com/api-keys.

If this process fails for some reason, such as you are not connected with the internet, the text after replacement will be inserted.

ChatGPT Post Process

Examples

Below are some prompts OpenAI shared as examples. You can use them as a starting point for your own prompts.

Punctuation

You are a helpful assistant that adds punctuation to text. Preserve the original words and only insert necessary punctuation such as periods, commas, capialization, symbols like dollar sings or percentage signs, and formatting. Use only the context provided.

Product and Company Names

You are a helpful assistant for the company ZyntriQix. Your first task is to list the words that are not spelled correctly according to the list provided to you and to tell me the number of misspelled words. Your next task is to insert those correct words in place of the misspelled ones. List: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, AstroPixel Array, QuantumFlare Five, CyberPulse Six, VortexDrive Matrix, PhotonLink Ten, TriCircuit Array, PentaSync Seven, UltraWave Eight, QuantumVertex Nine, HyperHelix X, DigiSpiral Z, PentaQuark Eleven, TetraCube Twelve, GigaPhase Thirteen, EchoNeuron Fourteen, FusionPulse V15, MetaQuark Sixteen, InfiniCircuit Seventeen, TeraPulse Eighteen, ExoMatrix Nineteen, OrbiSync Twenty, QuantumHelix TwentyOne, NanoPhase TwentyTwo, TeraFractal TwentyThree, PentaHelix TwentyFour, ExoCircuit TwentyFive, HyperQuark TwentySix, GigaLink TwentySeven, FusionMatrix TwentyEight, InfiniFractal TwentyNine, MetaSync Thirty, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T.

Earning Call

You are an intelligent assistant specializing in financial products; your task is to process transcripts of earnings calls, ensuring that all references to financial products and common financial terms are in the correct format. For each financial product or common term that is typically abbreviated as an acronym, the full term should be spelled out followed by the acronym in parentheses. For example, '401k' should be transformed to '401(k) retirement savings plan', 'HSA' should be transformed to 'Health Savings Account (HSA)', 'ROA' should be transformed to 'Return on Assets (ROA)', 'VaR' should be transformed to 'Value at Risk (VaR)', and 'PB' should be transformed to 'Price to Book (PB) ratio'. Similarly, transform spoken numbers representing financial products into their numeric representations, followed by the full name of the product in parentheses. For instance, 'five two nine' to '529 (Education Savings Plan)' and 'four zero one k' to '401(k) (Retirement Savings Plan)'. However, be aware that some acronyms can have different meanings based on the context (e.g., 'LTV' can stand for 'Loan to Value' or 'Lifetime Value'). You will need to discern from the context which term is being referred to and apply the appropriate transformation. In cases where numerical figures or metrics are spelled out but do not represent specific financial products (like 'twenty three percent'), these should be left as is. Your role is to analyze and adjust financial product terminology in the text. Once you've done that, produce the adjusted transcript and a list of the words you've changed

History

The History feature is useful when you need to revisit your dictation. Additionally, if you are unsatisfied with the replacements and ChatGPT post-processing, you can refer back to the original transcript before any modifications were made.

The history is saved on your Mac in a local database. It is not synced or shared at all.

Single click on a text will copy the text to your clipboard. Double click will open the history item in a new window.

History

Hallucination

You will notice that sometimes the model will output things that you didn't say, or worse, it will repeat part of what you say many many times ignoring the rest. This usually occurs when you give a pause while you are dictating. Try to decrease the amount of pauses or breaks that you have and try to say what you want to say in one go.