Terminal-based applications write two kinds of data to the terminal emulator: printable text that is displayed to the user, and control codes, which modify the terminal emulator’s state. Control codes are either single bytes in the C0 character set (bytes 0x00 through 0x1f) or sequences of bytes that begin with the escape character (0x1b). These sequences are most commonly referred to as “escape sequences”, and it is these sequences that do the bulk of the heavy lifting in terminal applications.
Most control codes from the C0 character set are not used today, but regardless of experience with terminals or terminal applications, most developers are likely familiar with control codes such as \r (carriage return), which moves the cursor to the beginning of the current line, and \n (line feed), which moves the cursor to the next line.
Escape sequences are varied and numerous, but the vast majority used in practice fall into one of three categories: Control Sequence Introducer (CSI), Device Control String (DCS), and Operating System Command (OSC).
CSI sequences are those which begin with the prefix ESC [ (0x1b 0x5b). Escape sequences in this category are those which reposition the cursor, change the cursor style, clear the screen, set foreground and background colors, and more.
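As a rough illustration of the CSI format, the sketch below builds two common sequences as plain strings. The helper names (`csi`, `move_cursor`, `clear_screen`) are made up for this example; the final bytes `H` (cursor position) and `J` (erase in display) are the standard ones, but this is a minimal sketch rather than a complete encoder.

```python
ESC = "\x1b"
CSI = ESC + "["  # Control Sequence Introducer: 0x1b 0x5b


def csi(final: str, *params: int) -> str:
    """Build a CSI sequence: ESC [ <params joined by ';'> <final byte>."""
    return CSI + ";".join(str(p) for p in params) + final


def move_cursor(row: int, col: int) -> str:
    # Cursor Position (final byte H); rows and columns are 1-based.
    return csi("H", row, col)


def clear_screen() -> str:
    # Erase in Display (final byte J); parameter 2 erases the whole screen.
    return csi("J", 2)
```

Writing `clear_screen() + move_cursor(1, 1)` to standard output should clear the screen and place the cursor in the top-left corner on most terminal emulators.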
OSC sequences are those which begin with the prefix ESC ] and are typically used for things that modify or interact with the user’s environment outside of the terminal emulator itself (hence the name “Operating System Command”). Examples are reading from or writing to the system clipboard, changing the title of the terminal emulator’s window, or sending desktop notifications.
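To make the OSC format concrete, here is a small sketch that builds a window-title sequence (OSC 2) and a clipboard-write sequence (OSC 52). The function names are hypothetical, and whether a given terminal emulator honors these sequences, OSC 52 in particular, depends on the emulator and its configuration.

```python
import base64

ESC = "\x1b"
OSC = ESC + "]"   # Operating System Command prefix
ST = ESC + "\\"   # String Terminator; many terminals also accept BEL (0x07)


def set_title(title: str) -> str:
    # OSC 2 ; <text> ST sets the window title.
    return f"{OSC}2;{title}{ST}"


def copy_to_clipboard(text: str) -> str:
    # OSC 52 ; c ; <base64 data> ST writes to the system clipboard ("c").
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"{OSC}52;c;{payload}{ST}"
```

The strings only take effect once they are written to the terminal, for example with `sys.stdout.write(set_title("demo"))`.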
Escape sequences are actually quite easy to use, and you can even emit them straight from your shell. Try running the following command (bash and zsh both understand \e in printf; in other shells, \033 is the more portable spelling):
printf '\e[1;32mHello \e[0;4;31mworld!\n\e[0m'
This command will print the text “Hello world!”, with “Hello” in green, bold text and “world!” in red, underlined text.
The escape sequences used here are of the form CSI <parameters> m, which is so common it has its own name: Select Graphic Rendition (SGR). An SGR escape sequence sets colors and text attributes (such as bold or underline) for all text printed after it. The first escape sequence in the example, \e[1;32m, enables the bold attribute (1) and sets the foreground color to green (32). The second escape sequence, \e[0;4;31m, first clears any existing styles (0), then enables the underline attribute (4), and finally sets the foreground color to red (31). The last escape sequence, \e[0m, resets all styles back to their defaults.
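The same output can be produced programmatically. The sketch below uses a hypothetical `sgr` helper and made-up constant names to rebuild the exact byte string from the printf example above; only the numeric codes (0, 1, 4, 31, 32) come from the text.

```python
RESET, BOLD, UNDERLINE = 0, 1, 4
FG_RED, FG_GREEN = 31, 32


def sgr(*codes: int) -> str:
    """Build an SGR sequence: ESC [ <codes joined by ';'> m."""
    return "\x1b[" + ";".join(str(c) for c in codes) + "m"


def hello_world() -> str:
    # Bold green "Hello ", then reset + underlined red "world!", then reset.
    return (
        sgr(BOLD, FG_GREEN) + "Hello "
        + sgr(RESET, UNDERLINE, FG_RED) + "world!\n"
        + sgr(RESET)
    )
```

Calling `print(hello_world(), end="")` should render the same styled text as the shell command.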
Terminal emulators and terminal applications communicate through a stream of bytes. When a user presses a key, the terminal sends the byte representation of the character associated with that key. The old video terminals only supported ASCII, so this was, generally, fairly straightforward.
Modifier keys like Ctrl and Alt complicate this situation. Alt-modified keys are encoded by prefixing the character with an Esc byte. But this has a problem: the extra Esc byte introduces ambiguity between an Alt-modified key press and two separate key presses. When an application sees Esc C, should it interpret it as Alt-C, or did the user press Esc and then press C? Applications usually solve this by measuring the amount of time between Esc and the next character. If the time is less than some defined interval, it is considered an Alt-modified key press (Vim uses the ttimeoutlen option, tmux uses the escape-time option).
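The timing heuristic can be sketched as a pure function over timestamped input bytes. Everything here is an assumption made for illustration: the `decode_keys` name, the event format, and the 50 ms default are not taken from Vim or tmux, and the sketch ignores the fact that Esc can also begin a longer escape sequence.

```python
ESC = 0x1B


def decode_keys(events, timeout=0.05):
    """Decode (time_in_seconds, byte) pairs into key names.

    An Esc followed by another byte within `timeout` seconds is treated as
    an Alt-modified key; otherwise Esc is reported as its own key press.
    """
    keys = []
    i = 0
    while i < len(events):
        t, b = events[i]
        if b == ESC and i + 1 < len(events):
            t_next, b_next = events[i + 1]
            if t_next - t < timeout:
                keys.append("Alt-" + chr(b_next))
                i += 2  # consume both the Esc and the following byte
                continue
        keys.append("Esc" if b == ESC else chr(b))
        i += 1
    return keys
```

With this sketch, the same two bytes decode differently depending only on how far apart they arrive, which is exactly why the legacy encoding is fragile over slow connections.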
Ctrl-modified keys are an even bigger problem. When Ctrl is used as a modifier, the shifted version of the key has the 7th bit (0x40) masked off (for example, C is 0x43 and after masking the 7th bit the byte becomes 0x03). This means that not only can the Shift modifier not be used in conjunction with Ctrl, but also that certain Ctrl-modified keys are completely indistinguishable from other control codes.
Note: Here, “control codes” means the C0 control codes described above.
For instance, when you press the Return key, the terminal emulator sends the byte \r (0x0d) to the application. But if you press Ctrl-M, the terminal emulator also sends the byte 0x0d to the application (M is 0x4d in ASCII, so when the 7th bit is masked out, it becomes 0x0d). From the application’s perspective, there is literally no way to distinguish these two events.
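The masking arithmetic is easy to check directly. The helper below, whose name `legacy_ctrl` is invented for this sketch and which only handles ASCII letters, computes the byte a terminal would send for a Ctrl-modified letter under the legacy encoding described above.

```python
def legacy_ctrl(key: str) -> int:
    """Return the byte sent for Ctrl+<letter> in the legacy encoding."""
    shifted = ord(key.upper())  # Ctrl operates on the shifted (uppercase) form
    return shifted & ~0x40      # mask off the 7th bit (value 0x40)


# Ctrl-M collides with Return, Ctrl-I with Tab, and Ctrl-J with line feed.
collisions = {
    "m": legacy_ctrl("m") == ord("\r"),
    "i": legacy_ctrl("i") == ord("\t"),
    "j": legacy_ctrl("j") == ord("\n"),
}
```

Because the letter is uppercased before masking, `legacy_ctrl("m")` and `legacy_ctrl("M")` yield the same byte, which illustrates why Shift cannot be combined with Ctrl in this scheme.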
For a long time this meant that certain modified keys like Ctrl-I, Ctrl-J, and Ctrl-M could not be used in terminal applications like Vim. There have been a few attempts to solve this problem: the first came from Xterm in 2006 through the modifyOtherKeys option. Paul Evans (author of libvterm and libtickit) introduced an alternate key encoding using the CSI u escape sequence in an essay which is sometimes colloquially referred to as “fixterms”. The CSI u encoding proposed by Evans was extended by Kovid Goyal, the author of the kitty terminal emulator, in what has become known as the kitty keyboard protocol.
What all of these solutions have in common is that key presses are sent to the terminal application encoded as escape sequences. This eliminates any ambiguity for modified keys and enables certain modifier combinations (such as Ctrl + Shift) that are not possible using “legacy” encoding. The CSI u encoding proposed by Evans and adapted by kitty encodes a modified key press like Ctrl-M as \e[109;5u. The encoding of unmodified key presses like Return depends on which “level” of the kitty keyboard protocol is enabled. Applications can opt in to different levels to ease adoption (for instance, Neovim uses only the first level, “Disambiguate escape keys”). See the kitty documentation for more details.
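The \e[109;5u example can be unpacked as follows: 109 is the code point of the lowercase letter m, and the second parameter is 1 plus a bitfield of modifiers (Shift = 1, Alt = 2, Ctrl = 4), so Ctrl alone gives 5. The sketch below encodes and decodes only this basic form; the function names are hypothetical, and the additional fields defined by the kitty keyboard protocol (alternate keys, event types, associated text) are deliberately left out.

```python
import re

SHIFT, ALT, CTRL = 1, 2, 4  # modifier bits; the wire value is 1 + bitfield


def encode_csi_u(key: str, mods: int = 0) -> str:
    """Encode a single-character key press as CSI <codepoint>[;<mods>] u."""
    codepoint = ord(key.lower())  # the unshifted code point identifies the key
    if mods == 0:
        return f"\x1b[{codepoint}u"
    return f"\x1b[{codepoint};{1 + mods}u"


_CSI_U = re.compile(r"\x1b\[(\d+)(?:;(\d+))?u")


def decode_csi_u(seq: str):
    """Return (character, modifier_bits), or None if seq is not basic CSI u."""
    m = _CSI_U.fullmatch(seq)
    if m is None:
        return None
    codepoint = int(m.group(1))
    mods = int(m.group(2) or 1) - 1  # an absent field means "no modifiers"
    return chr(codepoint), mods
```

Unlike the legacy encoding, Ctrl-M and Ctrl-Shift-M now produce different sequences, and neither can be confused with the single byte 0x0d.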
Terminal emulators do not all support the same features. In some cases, the same feature is implemented in different ways. Terminal applications need some way to know which features the terminal they’re running in supports and how to properly use those features.
Today this is primarily done using a distributed database of “terminfo” files. The terminal emulator uses the $TERM environment variable to communicate to terminal applications which terminfo file to use to look up which capabilities the terminal supports.
This has a multitude of problems, however. The terminfo database is part of the ncurses library, and different operating systems and distributions package different versions of ncurses. This was a problem for tmux users on macOS for many years because the version of ncurses packaged with macOS was so old that it did not even include the tmux-256color terminfo entry at all!