WhatsApp Voice & Transcription Glossary
Last updated June 12, 2026
WhatsApp voice messaging and transcription use a small set of recurring terms, and most of them get mixed up. This glossary defines each one plainly so you can tell a voice note from a voice message, native transcripts from a third-party bot, and on-device processing from cloud processing.
Voice message
A voice message is an audio recording sent inside a chat app instead of typed text. In WhatsApp you record it by holding the microphone button and releasing it to send. The recipient sees a waveform and a play button rather than words. Voice messages are convenient to record but slow to consume, since the listener has to play the full clip in real time to know what was said.
Voice note
A voice note is the same thing as a voice message: a short spoken audio clip sent in a chat. The two terms are used interchangeably across WhatsApp, Telegram, and Instagram. Some people reserve "voice note" for casual, off-the-cuff recordings and "voice message" for the formal product name, but functionally there is no difference. TxtPlease auto-transcribes both received and sent voice notes.
Voice Message Transcripts (WhatsApp feature)
Voice Message Transcripts is WhatsApp's built-in feature that turns a voice message into readable text on your phone. You enable it under Settings > Chats, then tap an individual voice message to generate its transcript. The transcript is generated on-device, so the audio is not sent to a server for processing and the message stays end-to-end encrypted. It is manual and per-message, not automatic, and language support is limited.
Speech-to-text
Speech-to-text (STT) is the technology that converts spoken audio into written text. A speech-to-text system listens to an audio signal, recognizes the words, and outputs a transcript. It powers dictation, captions, voice assistants, and voice-note transcription. Quality depends on audio clarity, accent, background noise, and the language model behind it. Modern STT uses neural networks trained on large multilingual audio datasets.
Transcription
Transcription is the process of writing down what is spoken in an audio or video recording. It can be done by a human typist or by automated speech-to-text software. The output, called a transcript, is a text version of the spoken content. In the WhatsApp context, transcription means converting a voice note into text you can read, search, and skim instead of listening to the audio.
Automatic transcription
Automatic transcription means a voice recording is converted to text by software without any manual step from the user. The text appears on its own as soon as the audio arrives. This contrasts with manual transcription, where you tap each message or forward it somewhere. TxtPlease uses automatic transcription: every incoming WhatsApp voice note is turned into text and posted back into the chat with no action required from you.
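The automatic path can be sketched as an event handler that runs when a message arrives. This is only an illustration: the `IncomingMessage` shape and the `transcribe` and `post_text` callables are hypothetical names supplied by the caller, not part of WhatsApp or TxtPlease.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class IncomingMessage:
    """Hypothetical shape of a received chat message (not a real WhatsApp type)."""
    chat_id: str
    kind: str                      # e.g. "text" or "voice"
    audio: Optional[bytes] = None  # raw audio bytes for voice notes


def handle_incoming(
    msg: IncomingMessage,
    transcribe: Callable[[bytes], str],
    post_text: Callable[[str, str], None],
) -> bool:
    """Transcribe a voice note on arrival and post the text to the same chat.

    Returns True if a transcript was posted, False if the message was skipped.
    """
    if msg.kind != "voice" or not msg.audio:
        return False  # only voice notes that carry audio are transcribed
    text = transcribe(msg.audio)
    post_text(msg.chat_id, text)  # the reply goes to the original conversation
    return True
```

The user never calls anything here: the handler fires on arrival, which is the difference from a manual flow where each message is tapped or forwarded.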
Whisper (model)
Whisper is an open-source automatic speech recognition model released by OpenAI in 2022. Trained on a large multilingual dataset, it transcribes speech across dozens of languages, can translate it into English, and handles accents, background noise, and technical vocabulary reasonably well. Many voice-note transcription tools build on Whisper or similar neural models because they offer strong accuracy without per-language tuning. It can run on servers or, in smaller variants, on local hardware.
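A minimal sketch of running Whisper locally, assuming the open-source `openai-whisper` Python package and `ffmpeg` are installed. The `format_segments` helper is an illustrative addition and the model size "base" is just one of the smaller variants; nothing here describes how any particular product uses the model.

```python
from typing import Any, Dict, List


def transcribe_file(path: str, model_name: str = "base") -> Dict[str, Any]:
    """Run Whisper on one audio file and return its result dictionary."""
    import whisper  # imported lazily so format_segments works without it

    model = whisper.load_model(model_name)
    return model.transcribe(path)


def format_segments(segments: List[Dict[str, Any]]) -> List[str]:
    """Render Whisper-style segments as "[start-end] text" lines."""
    lines = []
    for seg in segments:
        start, end = seg["start"], seg["end"]
        lines.append(f"[{start:.1f}s-{end:.1f}s] {seg['text'].strip()}")
    return lines
```

With the package installed, `transcribe_file("voice_note.ogg")` (a hypothetical file name) returns a dictionary whose "text" key holds the full transcript and whose "segments" key holds the timed pieces that `format_segments` expects.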
WhatsApp multi-device
WhatsApp multi-device is the architecture that lets one WhatsApp account run on several devices at once without the phone staying online. Each linked device holds its own encryption keys and connects independently to WhatsApp's servers. It is what allows WhatsApp Web, desktop apps, and linked services to send and receive messages. TxtPlease connects through this same multi-device system using a one-time QR link.
WhatsApp Web
WhatsApp Web is the browser-based version of WhatsApp that mirrors your account on a computer. You open web.whatsapp.com, scan a QR code with your phone, and your chats appear in the browser. It runs on the multi-device protocol, so recent versions keep working even when your phone is offline. It uses your regular personal number and requires no separate account or Business API.
WhatsApp Business API
The WhatsApp Business API (now offered as the Cloud API) is Meta's paid platform for companies to send and receive WhatsApp messages programmatically at scale. It requires a verified business account, a dedicated number, and usually a provider. Billing was per conversation until mid-2025, when Meta moved to per-message pricing. It is built for customer support, notifications, and chatbots. It is distinct from a regular personal WhatsApp account. TxtPlease does not use the Business API and works on your normal number.
Linked device
A linked device is any computer or service connected to your WhatsApp account through the multi-device feature, in addition to your primary phone. You see all linked devices in Settings under Linked Devices, where each one is listed with its last activity and can be removed at any time. A linked device can send and receive messages on your behalf. TxtPlease appears here as one linked device.
QR code linking
QR code linking is the method WhatsApp uses to connect a new device to your account. The new device displays a QR code, you scan it once from your phone using the Linked Devices screen, and an encrypted session is established. After the one-time scan no password is exchanged and the link persists until you remove it. TxtPlease is set up with a single QR scan, the same way WhatsApp Web works.
Read receipts (blue ticks)
Read receipts, shown as blue ticks (checkmarks) in WhatsApp, indicate that a message has been read by the recipient. One grey tick means sent, two grey ticks mean delivered, and two blue ticks mean read. For voice messages the ticks turn blue once the recipient opens the chat, separate from whether they actually played the audio. Read receipts can be turned off in privacy settings.
Played receipt
A played receipt is the indicator that tells the sender their voice message has been listened to. In WhatsApp the voice-message microphone icon turns blue once the recipient plays the audio, distinct from the blue ticks that only signal the message was read. Unlike read receipts, played receipts cannot be switched off: WhatsApp's help documentation notes that turning read receipts off does not disable play receipts for voice messages.
On-device transcription
On-device transcription means the speech-to-text conversion happens locally on your phone or computer, not on a remote server. The audio never leaves the device, which is stronger for privacy but limited by the device's processing power, storage, and the language packs installed. WhatsApp's native Voice Message Transcripts feature works on-device. Server-based tools can offer broader language coverage and more compute at the cost of sending audio off-device.
Diarization
Diarization, or speaker diarization, is the process of detecting who spoke when in an audio recording with multiple voices. A diarized transcript labels segments by speaker, for example "Speaker 1" and "Speaker 2," so a conversation reads as a structured dialogue. It is common in meeting and interview transcription. For one-to-one WhatsApp voice notes, which usually have a single speaker, diarization is rarely needed.
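The labelling step can be sketched as follows, assuming speaker turns and transcript segments are already available as timed tuples from some diarization and speech-to-text tools. The input shapes and the "largest overlap wins" rule are illustrative assumptions, not the method of any specific tool.

```python
from typing import List, Tuple

Turn = Tuple[float, float, str]     # (start_s, end_s, speaker label)
Segment = Tuple[float, float, str]  # (start_s, end_s, transcribed text)


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two time intervals."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def label_segments(turns: List[Turn], segments: List[Segment]) -> List[str]:
    """Prefix each segment with the speaker whose turn overlaps it most."""
    lines = []
    for s_start, s_end, text in segments:
        best = max(
            turns,
            key=lambda t: overlap(s_start, s_end, t[0], t[1]),
            default=None,
        )
        if best is None or overlap(s_start, s_end, best[0], best[1]) == 0.0:
            speaker = "Unknown"  # no turn covers this segment
        else:
            speaker = best[2]
        lines.append(f"{speaker}: {text}")
    return lines
```

For a single-speaker voice note every segment would get the same label, which is why the step adds little there.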
Code-switching
Code-switching is the act of alternating between two or more languages within a single conversation or even a single sentence. It is common in multilingual households and regions, for example mixing German and Turkish or English and Hindi. Code-switching is hard for speech-to-text systems that expect one fixed language. Models trained on large multilingual data, such as Whisper-class models, handle mixed-language voice notes far better than single-language transcribers.
Opus (audio codec)
Opus is the audio compression format WhatsApp uses to encode voice messages. It is an open, royalty-free codec that handles both speech and music and performs well at low bitrates, which keeps voice-note file sizes small while preserving clarity. WhatsApp voice notes are typically Opus audio in an Ogg container, saved with an .opus or .ogg extension. A transcription tool must decode Opus audio before it can run speech-to-text on the recording.
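A minimal sketch of the decode step, assuming the voice note is an Ogg Opus file and that the `ffmpeg` command-line tool is available. The function names are illustrative, and the 16 kHz mono target is an assumption about what a speech model expects rather than a WhatsApp requirement.

```python
from typing import List

OGG_MAGIC = b"OggS"       # every Ogg page starts with this capture pattern
OPUS_MAGIC = b"OpusHead"  # first packet of an Ogg Opus stream (RFC 7845)


def is_ogg_opus(data: bytes) -> bool:
    """Check whether the bytes look like the start of an Ogg Opus stream."""
    if len(data) < 27 or data[:4] != OGG_MAGIC:
        return False
    segment_count = data[26]            # entries in the page's segment table
    payload_start = 27 + segment_count  # fixed 27-byte header + segment table
    return data[payload_start:payload_start + 8] == OPUS_MAGIC


def ffmpeg_decode_command(src: str, dst: str, sample_rate: int = 16000) -> List[str]:
    """Build an ffmpeg command that decodes src to a mono WAV at sample_rate."""
    return ["ffmpeg", "-y", "-i", src, "-ac", "1", "-ar", str(sample_rate), dst]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building it separately keeps the sketch testable on machines without ffmpeg.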
End-to-end encryption
End-to-end encryption (E2EE) is a security method where only the sender and the intended recipient can read a message; not even the platform's servers can decrypt it. WhatsApp applies E2EE to all messages, including voice notes, using keys held on the devices. Linked devices each hold their own keys under the multi-device model, so this protection extends to WhatsApp Web and to services you connect yourself.
Forward-to-bot transcription
Forward-to-bot transcription is the method where you manually forward a voice message to a third-party bot account, which replies with the text. The transcript lands in the bot's chat thread, not in the original conversation, and the audio is sent to that provider's server to be processed. It works but adds a manual step per message and moves your audio off WhatsApp's direct path. TxtPlease instead posts the text back into the original chat automatically.
Voice note fatigue
Voice note fatigue is the frustration of receiving long voice messages you have to stop and listen to in real time, often when you cannot play audio out loud or want to skim quickly. Unlike text, a voice note cannot be scanned, searched, or read silently in a meeting. It is the core problem automatic transcription solves: turning unskimmable audio into text you can read at a glance.