Design

Revision as of 19:24, 18 October 2020 by Admin (talk | contribs)

(draft version)

Colemak was designed on the basis of logical constraints, filling in the keyboard layout puzzle piece by piece. One of the most difficult aspects of keyboard layout design is avoiding high levels of same-finger typing. Same-finger typing is therefore used as one of the primary constraints of the layout design, rather than as an afterthought. It's easy to ignore same-finger typing when designing a keyboard layout, as it's something you only really notice at high levels of typing proficiency.

Colemak is designed to make touch typing natural, by keeping your fingers on the home position most of the time.

There are constraints for each letter and each position. When filling each piece, you go through all the possible letters and positions, and find the next piece that has one obvious candidate that satisfies the constraints.
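
The filling process described above resembles picking forced moves in a constraint puzzle. Here is a minimal Python sketch of that idea, assuming each open position already has a set of acceptable letters. The slot names and candidate sets are hypothetical toy data, not the constraints actually used for Colemak.

```python
def fill_forced(candidates):
    """Repeatedly place the letter of any slot that has exactly one candidate."""
    remaining = {slot: set(letters) for slot, letters in candidates.items()}
    placed = {}
    while remaining:
        forced = [slot for slot, letters in remaining.items() if len(letters) == 1]
        if not forced:
            break  # no obvious candidate left; manual trial and error takes over
        slot = forced[0]
        (letter,) = remaining.pop(slot)
        placed[slot] = letter
        # A placed letter is no longer available to the other slots.
        for letters in remaining.values():
            letters.discard(letter)
    return placed, remaining


# Hypothetical toy input: three open slots and the letters that would satisfy each.
toy = {
    "slot 1": {"U"},
    "slot 2": {"U", "Y"},
    "slot 3": {"U", "Y", "J"},
}
placed, unresolved = fill_forced(toy)
print(placed)      # {'slot 1': 'U', 'slot 2': 'Y', 'slot 3': 'J'}
print(unresolved)  # {}
```

Placing one letter can turn another slot into a forced move, which is why the order of the steps in the walkthrough below matters.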

Colemak attempts to find a layout that is better than Dvorak on all important metrics, while making it reasonably easy to switch from QWERTY, in a way that is pragmatic and balanced.

The following is a simplified recreation. The original process involved a lot more trial and error.

English simplified bigram frequency

Letter: | ideal 0-4 | acceptable 5-9 || too high >= 10
E: Q1 Z2 J3 | O6 U9 || X12 K18 Y21 F26 G28 I35 W36 P37 B41 A50 C60 M72 V74 T91 L94 D127 S136 N138 H237 R250
A: Q0 Z1 X1 J1 O4 | K9 || U13 F17 Y17 G22 B22 V24 P30 I36 W39 D42 C48 E50 M54 H81 L82 S85 R100 T113 N147
I: J0 Q0 Z1 Y1 X2 | U7 || K10 B10 P12 G25 F26 V26 W29 E35 A36 O37 M41 C45 D45 L61 R62 H71 S101 T126 N160
O: Q0 Z0 X1 J3 A4 | E6 K6 || Y13 G14 V15 B18 D24 P32 I37 H41 W42 S45 L46 C47 M54 U73 T98 F107 R113 N130
R: J0 Q0 Z0 X0 W3 K4 V4 | L6 H6 B9 || N10 M11 C14 F16 Y16 G17 D21 P24 S25 U42 T43 I62 A100 O113 E250
U: Y0 W0 Z0 V0 K0 X0 J3 | H6 I7 F8 Q8 E9 || D12 A13 M14 P16 G16 C19 B20 N29 L31 R42 T43 S47 O73
S: Z0 J0 X0 V0 Q0 F1 B3 G3 W4 | K5 Y6 C8 D8 M9 || L10 P14 N24 R25 H27 O45 U47 T79 A85 I101 E136
T: J0 Q0 V0 K0 D0 Z0 M0 G1 B1 X2 | W5 P5 F6 || Y13 L15 C20 R43 U43 N52 S79 E91 O98 A113 I126 H274
L: J0 Q0 X0 Z0 H0 M1 V2 W2 K2 N4 | G6 R6 C8 F9 || S10 P14 B15 T15 D24 Y30 U31 O46 I61 A82 E94
N: Z0 X0 P0 B0 J0 Q0 M1 H1 V2 F3 L4 | W7 Y7 K9 || R10 C23 S24 U29 T52 G65 D95 O130 E138 A147 I160
C: J0 V0 Z0 B0 P0 G0 W0 F0 D0 M0 Q0 Y1 X2 | K8 S8 L8 || R14 U19 T20 N23 H40 I45 O47 A48 E60
H: J0 V0 Q0 Z0 F0 K0 X0 B0 L0 D0 M1 N1 Y3 P3 | U6 R6 || G21 S27 W35 C40 O41 I71 A81 E237 T274
P: Q0 V0 J0 Z0 C0 G0 F0 K0 B0 D0 W0 N0 Y1 X2 H3 | T5 || M11 I12 S14 L14 U16 R24 A30 O32 E37
Y: Q0 J0 X0 Z0 U0 F0 W0 V0 K0 G1 P1 C1 I1 H3 D3 | S6 N7 M7 || B12 T13 O13 R16 A17 E21 L30
D: Q0 X0 Z0 K0 C0 P0 T0 J0 B0 F0 W0 H0 V1 M1 G2 Y3 | S8 || U12 R21 L24 O24 A42 I45 N95 E127
G: Q0 J0 V0 X0 Z0 C0 P0 F0 B0 K0 W0 M0 Y1 T1 D2 S3 | L6 || O14 U16 R17 H21 A22 I25 E28 N65
M: J0 X0 Q0 Z0 V0 W0 K0 C0 F0 G0 T0 H1 D1 N1 L1 B4 | Y7 S9 || P11 R11 U14 I41 A54 O54 E72
B: Q0 F0 X0 Z0 C0 K0 G0 W0 P0 V0 D0 N0 H0 J1 T1 S3 M4 | R9 || I10 Y12 L15 O18 U20 A22 E41
F: Q0 V0 Z0 B0 J0 G0 P0 C0 X0 K0 W0 H0 D0 M0 Y0 S1 N3 | T6 U8 L9 || R16 A17 I26 E26 O107
W: Q0 V0 J0 X0 Z0 B0 C0 M0 P0 G0 F0 K0 U0 Y0 D0 L2 R3 S4 | T5 N7 || I29 H35 E36 A39 O42
K: Q0 J0 X0 Z0 V0 B0 T0 G0 P0 M0 D0 F0 H0 W0 Y0 U0 L2 R4 | S5 O6 C8 A9 N9 || I10 E18
V: F0 P0 Q0 W0 J0 G0 H0 C0 K0 Z0 M0 T0 S0 B0 X0 Y0 U0 D1 L2 N2 R4 | || O15 A24 I26 E74
X: Z0 K0 J0 M0 B0 G0 S0 D0 R0 W0 Q0 L0 Y0 F0 V0 H0 N0 U0 O1 A1 C2 P2 I2 T2 | || E12
Q: B0 F0 G0 J0 K0 P0 V0 W0 Y0 Z0 L0 M0 T0 D0 H0 R0 A0 X0 O0 I0 S0 C0 N0 E1 | U8 ||
J: Q0 C0 K0 V0 W0 Z0 M0 T0 L0 Y0 G0 H0 P0 X0 F0 S0 R0 D0 I0 N0 B1 A1 E3 O3 U3 | ||
Z: Q0 X0 F0 J0 K0 M0 S0 B0 P0 C0 V0 G0 D0 W0 H0 R0 Y0 L0 T0 N0 U0 O0 A1 I1 E2 | ||
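
To make the table easier to work with, a row can be parsed into a lookup and each score mapped to its tier. This is a small sketch: the thresholds come from the header line of the table, and the function names are only illustrative.

```python
def classify_bigram(score):
    """Return the tier of a simplified bigram score, using the table's thresholds."""
    if score <= 4:
        return "ideal"
    if score <= 9:
        return "acceptable"
    return "too high"


def parse_row(row):
    """Parse one table row such as 'U: Y0 W0 ... | H6 ... || D12 ...' into {letter: score}."""
    _, _, body = row.partition(":")
    pairs = {}
    for token in body.replace("|", " ").split():
        pairs[token[0]] = int(token[1:])
    return pairs


# Row for the letter U, copied from the table above.
U_ROW = "U: Y0 W0 Z0 V0 K0 X0 J3 | H6 I7 F8 Q8 E9 || D12 A13 M14 P16 G16 C19 B20 N29 L31 R42 T43 S47 O73"
u_scores = parse_row(U_ROW)
tiers = {letter: classify_bigram(score) for letter, score in u_scores.items()}
print(tiers["J"], tiers["E"], tiers["O"])  # ideal acceptable too high
```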

Finger information

      Hand:  Left hand               Right hand
    Finger:  Ltl Rng Mid Indx  Indx Mid Rng Ltl
Ideal load:   7%  8% 15%  20%   20% 15%  8%  7%
  Strength:    1   2   4    5     5   4   2   1
   Agility:    2   1   3    5     5   3   1   2
Keys typed:    3   3   3    6     6   3   3   8

It would be optimal to also balance the load on the thumbs. Unfortunately, on a normal keyboard the space bar is the only key the thumbs can hit.
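
The ideal loads can be compared with the load a layout actually puts on each finger. The sketch below sums the letter frequencies from the Letter classification table per finger. It assumes standard touch-typing finger assignments, uses the finished Colemak layout from the end of this page, and ignores punctuation, Backspace and the space bar, so the numbers are only indicative.

```python
# Letter frequencies in percent, from the Letter classification table on this page.
FREQ = {
    "E": 12.49, "T": 9.28, "A": 8.04, "O": 7.64, "I": 7.57, "N": 7.23, "S": 6.51,
    "R": 6.28, "H": 5.05, "L": 4.07, "D": 3.82, "C": 3.34, "U": 2.73, "M": 2.51,
    "F": 2.40, "P": 2.14, "G": 1.87, "W": 1.68, "Y": 1.66, "B": 1.48, "V": 1.05,
    "K": 0.54, "X": 0.23, "J": 0.16, "Q": 0.12, "Z": 0.09,
}
FINGERS = ["L little", "L ring", "L middle", "L index",
           "R index", "R middle", "R ring", "R little"]
IDEAL = [7, 8, 15, 20, 20, 15, 8, 7]

# Letters typed by each finger, in the same order as FINGERS (assumed assignment).
LAYOUTS = {
    "QWERTY": ["QAZ", "WSX", "EDC", "RFVTGB", "YUHJNM", "IK", "OL", "P"],
    "Colemak": ["QAZ", "WRX", "FSC", "PGTDVB", "JLHNKM", "UE", "YI", "O"],
}


def finger_loads(groups):
    """Sum the letter frequencies handled by each finger."""
    return [round(sum(FREQ[c] for c in letters), 2) for letters in groups]


loads = {name: finger_loads(groups) for name, groups in LAYOUTS.items()}
for i, finger in enumerate(FINGERS):
    print(f"{finger:9} ideal {IDEAL[i]:2}%  "
          f"QWERTY {loads['QWERTY'][i]:5.2f}%  Colemak {loads['Colemak'][i]:5.2f}%")
```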

QWERTY Position Classification

Each QWERTY position is scored on a scale of 1-9, based on the finger scoring and the distance.

QWERTY layout for reference:

QWERT YUIOP[]\
ASDFG HJKL;'
ZXCVB NM,./

On a standard staggered keyboard:

13563 26531111
88997 799883
11352 55421

On a matrix layout:

13563 36531111
88997 799883
12443 34421

Colemak is designed to allow comfortable typing on both staggered and matrix keyboards.
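
The two score grids can be turned into lookups to see exactly which positions are rated differently on staggered and matrix keyboards. This is a small sketch; the grids are copied from above and the variable names are illustrative.

```python
KEYS = ["QWERT YUIOP[]\\", "ASDFG HJKL;'", "ZXCVB NM,./"]
STAGGERED = ["13563 26531111", "88997 799883", "11352 55421"]
MATRIX = ["13563 36531111", "88997 799883", "12443 34421"]


def score_map(key_rows, score_rows):
    """Pair every QWERTY key with the digit at the same place in the score grid."""
    scores = {}
    for keys, digits in zip(key_rows, score_rows):
        keys, digits = keys.replace(" ", ""), digits.replace(" ", "")
        if len(keys) != len(digits):
            raise ValueError("key row and score row have different lengths")
        for key, digit in zip(keys, digits):
            scores[key] = int(digit)
    return scores


staggered = score_map(KEYS, STAGGERED)
matrix = score_map(KEYS, MATRIX)

# Positions whose score depends on the keyboard type: key -> (staggered, matrix).
differs = {k: (staggered[k], matrix[k]) for k in staggered if staggered[k] != matrix[k]}
print(differs)
# {'Y': (2, 3), 'X': (1, 2), 'C': (3, 4), 'V': (5, 4), 'B': (2, 3), 'N': (5, 3), 'M': (5, 4)}
```

The home row is scored identically in both grids. The differences are the QWERTY 'Y' position and most of the bottom row.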

Letter classification

Each letter gets a range of position classes in which it will fit.

           Freq,     Range, Colemak Dvorak  QWERTY  Ideal
E        12.49%,         9, 9       9       5 (-4)  9
T         9.28%,         9, 9       9       3 (-5)  9
A         8.04%,       8-9, 8       8       8       9
O         7.64%,       8-9, 8       8       3 (-5)  9
I         7.57%,       8-9, 8       7 (-1)  5 (-3)  8
N         7.23%,       8-9, 9       8       5 (-3)  8
S         6.51%,       8-9, 9       8       8       8
R         6.28%,       7-8, 8       3 (-4)  6 (-1)  7
H         5.05%,       7-8, 7       9       7       8
L         4.07%,       5-7, 6       1 (-4)  8 (+1)  6
D         3.82%,       5-7, 7       7       9 (+1)  7
C         3.34%,       4-6, 3 (-1)  5       3 (-1)  6
U         2.73%,       4-6, 5       9 (+3)  6       5
M         2.51%,       3-5, 5       5       5       5
F         2.40%,       3-5, 5       2 (-1)  9 (+4)  5
P         2.14%,       3-5, 6 (+1)  6 (+1)  1 (-2)  3
G         1.87%,       3-5, 3       6 (+1)  7 (+2)  5
W         1.68%,       3-5, 3       4       3       5
Y         1.66%,       3-5, 3       3       2 (-1)  4
B         1.48%,       2-4, 2       5 (+1)  2       3
V         1.05%,       2-4, 5 (+1)  2       5 (+1)  3
K         0.54%,       2-3, 4 (+1)  5 (+2)  9 (+6)  2
X         0.23%,       1-2, 1       2       1       2
J         0.16%,       1-2, 2       3 (+1)  9 (+8)  2
Q         0.12%,       1-2, 1       1       1       1
Z         0.09%,       1-2, 1       1       1       1
[BKSP]    4.00%,       3-5, 3       0 (-3)  0 (-3)
[.]       1.85%,       2-4, 2       5 (+1)
[']/["]   1.78%,       2-4, 3       2       3       3
[,]       1.56%,       2-4, 4       3
[/],[?]   0.19%,         1, 1       1       1       1
[;], [:]  0.08%,         1, 1       1       8 (+7)  1

The list above (updated 2012) has been taken from: English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU.

The historical ETAOIN SHRDLU frequency is not very accurate, as it was based on a very limited corpus. One of the challenges of developing Colemak was the difficulty of finding a high quality corpus (e.g. english-corpora.org) to get accurate letter/bigram frequency data.

There are some keys that don't fit into the ideal range (C, P, V, K and perhaps B), but they're never more than 1 point off in Colemak.
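
One way to summarize the table is a frequency-weighted average of the position class each layout gives to its letters. This is only an illustrative metric, not one used in the original design process. The sketch below covers the 26 letters and takes the class values exactly as they appear in the table.

```python
# (letter, frequency %, Colemak class, Dvorak class, QWERTY class), from the table above.
ROWS = [
    ("E", 12.49, 9, 9, 5), ("T", 9.28, 9, 9, 3), ("A", 8.04, 8, 8, 8),
    ("O", 7.64, 8, 8, 3), ("I", 7.57, 8, 7, 5), ("N", 7.23, 9, 8, 5),
    ("S", 6.51, 9, 8, 8), ("R", 6.28, 8, 3, 6), ("H", 5.05, 7, 9, 7),
    ("L", 4.07, 6, 1, 8), ("D", 3.82, 7, 7, 9), ("C", 3.34, 3, 5, 3),
    ("U", 2.73, 5, 9, 6), ("M", 2.51, 5, 5, 5), ("F", 2.40, 5, 2, 9),
    ("P", 2.14, 6, 6, 1), ("G", 1.87, 3, 6, 7), ("W", 1.68, 3, 4, 3),
    ("Y", 1.66, 3, 3, 2), ("B", 1.48, 2, 5, 2), ("V", 1.05, 5, 2, 5),
    ("K", 0.54, 4, 5, 9), ("X", 0.23, 1, 2, 1), ("J", 0.16, 2, 3, 9),
    ("Q", 0.12, 1, 1, 1), ("Z", 0.09, 1, 1, 1),
]
COLUMNS = {"Colemak": 2, "Dvorak": 3, "QWERTY": 4}


def weighted_class(column):
    """Average position class, weighting every letter by its frequency."""
    total = sum(row[1] for row in ROWS)
    return sum(row[1] * row[column] for row in ROWS) / total


averages = {name: round(weighted_class(col), 2) for name, col in COLUMNS.items()}
print(averages)
```

A higher average means frequent letters sit on better positions. It says nothing about same-finger typing, which is handled separately by the bigram table.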

Corpus Resources

When doing frequency research, it's very important to use a large corpus from a wide variety of sources.

More corpus resources:

By class

9: ETAOINS
8:  TAOINSRH
7:        RHLD
6:          LDCU
5:           LCUMWFGYP
4:            CUMWFGYPBV      [. <][' "][, >]
3:              MWFGYPBVK     [. <][' "][, >][BKSP]
2:                    BVKJX   [. <][' "][, >]
1:                       JXQZ [/ ?][; :]

Letters that are in their appropriate positions in QWERTY

The following letters are already in a good position in QWERTY, so we'll try to avoid moving them if possible.

QW??? ?U???[]\
AS??? H????'
ZX?V? ?M??/

Filling in the layout

Backspace/Caps Lock

The backspace key causes significantly more strain than any other key. In the typewriter era, the Caps Lock key was important because there was no other good way to emphasize text.

With computers, that has become obsolete. Caps Lock is a liability for touch typists, as there is no way to know the state of the Caps Lock key without looking down at the keyboard. Some keyboards don't even have an indicator. In those cases, you have to type a letter, check what appears on the screen, and press Caps Lock if it's in the wrong state. With passwords it's even worse, because you can't tell at all.

It's important to keep the existing backspace in place and not convert it to Caps Lock. In my experience Backspace is the key with the strongest muscle memory, and removing it would be extremely frustrating for those learning the layout and those switching back and forth between layouts.

Therefore Caps Lock is replaced with an extra Backspace key.

ZXCV

Colemak has been designed to preserve the Z/X/C/V shortcuts. They are very common, and they were also chosen for their position: they are easy to reach with the left hand while the right hand holds the mouse.

????? ????????
????? ??????
ZXCV? ?????

Punctuation

Colemak keeps the punctuation in the same place. Rare keys take the longest to relearn, and most of the punctuation is very rare. The comma and the dot aren't that rare, but '<' and '>' are. The only punctuation that moves is the rare ';'/':', which wastes a prime spot on the home position. Moreover, it's important to keep the punctuation in logical positions, where '['/']', '<'/'>' and '{'/'}' are next to each other. For aesthetic reasons and for compatibility with a wide range of hardware and software, it's important to keep punctuation to the side (for example, when the layout is displayed on a mobile phone).

????? ?????[]\
????? ?????'
ZXCV? ??,./

The QWERTY 'Q' position

The little finger is the weakest finger, and stretches of the little finger cause strain, so a low frequency letter needs to be placed there. Since Q is quite rare and already fits well in that position, it's a good idea to leave it there, as rare letters take the most time to relearn (they get much less practice). Also, since the Ctrl+Q shortcut can be destructive (Quit), it's better to keep it in place.

Z: Q0 X0 F0 J0 K0 M0 S0 B0 P0 C0 V0 G0 D0 W0 H0 R0 Y0 L0 T0 N0 U0 O0 A1 I1 E2 | ||
Available class 1 candidates: QJ;
Q???? ?????[]\
????? ?????'
ZXCV? ??,./

The QWERTY 'P' position

In order to have a logical layout, we keep all punctuation keys on the right little finger. The semicolon needs to be in a class-1 position, because the little finger is the weakest finger and the right little finger already needs to do a lot of movement to reach the Enter key and the other punctuation keys.

Available class 1 candidates: J;
Q???? ????;[]\
????? ?????'
ZXCV? ??,./

The letter E

E is by far the most frequent letter in English, so it must be in a class-9 position. It is excluded from the index fingers because it can't be fitted with the 5 other keys typed by an index finger. It doesn't fit in the left hand on the same finger as C. It also makes sense to move 'E' to the right hand, since most of the vowels will be there in order to increase hand alternation.

E: Q1 Z2 J3 | O6 U9 || X12 K18 Y21 F26 G28 I35 W36 P37 B41 A50 C60 M72 V74 T91 L94 D127 S136 N138 H237 R250
Q???? ????;[]\
????? ??E??'
ZXCV? ??,./

The letter A

A needs to be in a class 8-9 position. It shouldn't be on the index finger, because the index finger types 6 letters and would therefore cause a high same-finger ratio. Since A is already well placed in the home position, there's no reason to move it from there. It also fits perfectly with Q and Z, and it's very important to prevent same-finger on the weaker little finger.

A: Q0 Z1 X1 J1 O4 | K9 || U13 F17 Y17 G22 B22 V24 P30 I36 W39 D42 C48 E50 M54 H81 L82 S85 R100 T113 N147
Q???? ????;[]\
A???? ??E??'
ZXCV? ??,./

The QWERTY 'I' position

This position shares a finger with E, the most common letter in English, so not many letters would suit. This is a class 5 position. Of the unplaced letters with a low enough bigram score with E, J (class 1-2) and O (class 8-9) are unsuitable. Only U suits.

E: Q1 Z2 J3 | O6 U9 || X12 K18 Y21 F26 G28 I35 W36 P37 B41 A50 C60 M72 V74 T91 L94 D127 S136 N138 H237 R250
Q???? ??U?;[]\
A???? ??E??'
ZXCV? ??,./

The QWERTY 'W' position

The 'W' position is on the ring finger, which is the least dexterous finger, which makes high same-finger more problematic. Since W on QWERTY is already in a good position, and Ctrl+W is a destructive shortcut (Close Window), we keep W in its place.

X: Z0 K0 J0 M0 B0 G0 S0 D0 R0 W0 Q0 L0 Y0 F0 V0 H0 N0 U0 O1 A1 C2 P2 I2 T2 | || E12
QW??? ??U?;[]\
A???? ??E??'
ZXCV? ??,./

The letter I

I is a very common letter and needs to be placed in the home position (class 8-9). The only place that suits is the QWERTY 'L' position. In the other places it will cause high same-finger. In the QWERTY ';' position it will cause high same-finger with the single quote (e.g. I'm, I'll).

I: J0 Q0 Z1 Y1 X2 | U7 || K10 B10 P12 G25 F26 V26 W29 E35 A36 O37 M41 C45 D45 L61 R62 H71 S101 T126 N160
QW??? ??U?;[]\
A???? ??EI?'
ZXCV? ??,./

The QWERTY 'O' position

The only letters remaining that fit with I are J and Y. We can exclude J because it's a class 1 letter and this is a class 3 position. This leaves Y as the only option.

I: J0 Q0 Z1 Y1 X2 | U7 || K10 B10 P12 G25 F26 V26 W29 E35 A36 O37 M41 C45 D45 L61 R62 H71 S101 T126 N160
QW??? ??UY;[]\
A???? ??EI?'
ZXCV? ??,./
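
The 'I' and 'O' position steps above follow the same pattern: take the unplaced letters that have a low bigram score with the letter sharing the finger, then keep those whose class range covers the position. Here is a rough sketch of that filter. The function name and the `max_bigram=9` cutoff (the "acceptable" limit) are assumptions made for illustration.

```python
# Position-class range for each letter, from the Letter classification table.
CLASS_RANGE = {
    "E": (9, 9), "T": (9, 9), "A": (8, 9), "O": (8, 9), "I": (8, 9), "N": (8, 9),
    "S": (8, 9), "R": (7, 8), "H": (7, 8), "L": (5, 7), "D": (5, 7), "C": (4, 6),
    "U": (4, 6), "M": (3, 5), "F": (3, 5), "P": (3, 5), "G": (3, 5), "W": (3, 5),
    "Y": (3, 5), "B": (2, 4), "V": (2, 4), "K": (2, 3), "X": (1, 2), "J": (1, 2),
    "Q": (1, 2), "Z": (1, 2),
}

# Bigram rows for E and I, copied from the table at the top of the page.
E_ROW = "Q1 Z2 J3 | O6 U9 || X12 K18 Y21 F26 G28 I35 W36 P37 B41 A50 C60 M72 V74 T91 L94 D127 S136 N138 H237 R250"
I_ROW = "J0 Q0 Z1 Y1 X2 | U7 || K10 B10 P12 G25 F26 V26 W29 E35 A36 O37 M41 C45 D45 L61 R62 H71 S101 T126 N160"


def candidates(neighbour_row, position_class, placed, max_bigram=9):
    """Unplaced letters that fit the position class and stay within max_bigram."""
    scores = {tok[0]: int(tok[1:]) for tok in neighbour_row.replace("|", " ").split()}
    return [
        letter
        for letter, score in scores.items()
        if score <= max_bigram
        and letter not in placed
        and CLASS_RANGE[letter][0] <= position_class <= CLASS_RANGE[letter][1]
    ]


# QWERTY 'I' position: class 5, same finger as E; Q, Z, X, C, V, E and A are placed.
i_position = candidates(E_ROW, 5, placed=set("QZXCVEA"))
# QWERTY 'O' position: class 3, same finger as I; U, W and I have been placed as well.
o_position = candidates(I_ROW, 3, placed=set("QZXCVEAUWI"))
print(i_position, o_position)  # ['U'] ['Y']
```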

The letter O

O needs to be in a class 8-9 position, but not on the index finger, so the only remaining position is the QWERTY ';' position.

O: Q0 Z0 X1 J3 A4 | E6 K6 || Y13 G14 V15 B18 D24 P32 I37 H41 W42 S45 L46 C47 M54 U73 T98 F107 R113 N130
QW??? ??UY;[]\
A???? ??EIO'
ZXCV? ??,./

The letter R

R needs to be on the home row, but it can't be on the index finger, because it causes a high same-finger ratio with almost all the letters. It is often suggested to put R in the QWERTY 'D' position, so that S can keep its position, but that would significantly increase the same-finger ratio on the ring finger, which is the least dexterous finger.

R: J0 Q0 Z0 X0 W3 K4 V4 | L6 H6 B9 || N10 M11 C14 F16 Y16 G17 D21 P24 S25 U42 T43 I62 A100 O113 E250
QW??? ??UY;[]\
AR??? ??EIO'
ZXCV? ??,./

The QWERTY 'D' position

This position must hold a class-9 letter (ETAOINSHR). Of these letters, only S will not cause a very high same-finger ratio with C. S is already in a good spot on QWERTY, but it had to be moved in order to prevent high same-finger.

C: J0 V0 Z0 B0 P0 G0 W0 F0 D0 M0 Q0 Y1 X2 | K8 S8 L8 || R14 U19 T20 N23 H40 I45 O47 A48 E60
QW??? ??UY;[]\
ARS?? ??EIO'
ZXCV? ??,./

The letter H

R and H have almost the same frequency. Since R has already been placed in the home position, H won't be in the home position, but it still needs to be on the home row, in either the 'G' or the 'H' position. Moving H to the 'G' position would be very confusing, so H is left in its QWERTY position. Moreover, it makes sense not to put H in the home position, since H is quite rare in other languages. Although 'the' is the most common word in English, it's still better overall to place R in the home position instead of H.

H: J0 V0 Q0 Z0 F0 K0 X0 B0 L0 D0 M1 N1 Y3 P3 | U6 R6 || G21 S27 W35 C40 O41 I71 A81 E237 T274
QW??? ??UY;[]\
ARS?? H?EIO'
ZXCV? ??,./

The letter T

T needs to be in a class-9 position. The only remaining positions are the 'J' and 'F' positions. On the 'J' position it will cause a very high same-finger ratio with the letter H, so it must be in the 'F' position.

T: J0 Q0 V0 K0 D0 Z0 M0 G1 B1 X2 | W5 P5 F6 || Y13 L15 C20 R43 U43 N52 S79 E91 O98 A113 I126 H274
QW??? ??UY;[]\
ARST? H?EIO'
ZXCV? ??,./
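
The choice between the 'F' and 'J' positions can be checked by adding up T's bigram scores with the letters already sitting on each index finger at this point (V on the left, H on the right). This is a minimal sketch with illustrative variable names.

```python
# Bigram row for T, copied from the table at the top of the page.
T_ROW = "J0 Q0 V0 K0 D0 Z0 M0 G1 B1 X2 | W5 P5 F6 || Y13 L15 C20 R43 U43 N52 S79 E91 O98 A113 I126 H274"
t_scores = {tok[0]: int(tok[1:]) for tok in T_ROW.replace("|", " ").split()}

# Letters already placed on the finger that types each candidate position.
already_on_finger = {
    "F": "V",  # left index finger
    "J": "H",  # right index finger
}

# Same-finger cost of putting T on each candidate position.
cost = {
    position: sum(t_scores[letter] for letter in letters)
    for position, letters in already_on_finger.items()
}
best = min(cost, key=cost.get)
print(cost, best)  # {'F': 0, 'J': 274} F
```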

The letter N

N needs to be on the home position. The only remaining position is the 'J' position. Now the nine highest frequency letters in English have been placed.

N: Z0 X0 P0 B0 J0 Q0 M1 H1 V2 F3 L4 | W7 Y7 K9 || R10 C23 S24 U29 T52 G65 D95 O130 E138 A147 I160
QW??? ??UY;[]\
ARST? HNEIO'
ZXCV? ??,./

The QWERTY 'G' position

To maximize the home row frequency, a high frequency letter should be put there. The most frequent letters not yet placed are L, D and U. U would cause a high same-finger ratio there because the index finger types 6 keys. L also causes a high same-finger ratio with the letter T, so D needs to be placed there.

T: J0 Q0 V0 K0 D0 Z0 M0 G1 B1 X2 | W5 P5 F6 || Y13 L15 C20 R43 U43 N52 S79 E91 O98 A113 I126 H274
V: F0 P0 Q0 W0 J0 G0 H0 C0 K0 Z0 M0 T0 S0 B0 X0 Y0 U0 D1 L2 N2 R4 | || O15 A24 I26 E74
QW??? ??UY;[]\
ARSTD HNEIO'
ZXCV? ??,./

The letter L

Since L is rather frequent, it needs to be placed in a good position on a strong finger, in one of the 'E'/'R'/'U'/'I' positions. The 'I' position is already occupied, and putting it in the 'E'/'R' positions would cause a high same-finger ratio, so it must be in the 'U' position.

L: J0 Q0 X0 Z0 H0 M1 V2 W2 K2 N4 | G6 R6 C8 F9 || S10 P14 B15 T15 D24 Y30 U31 O46 I61 A82 E94
QW??? ?LUY;[]\
ARSTD HNEIO'
ZXCV? ??,./

The letter M

M is also rather frequent. The only places where it won't cause a high same-finger ratio are the 'Y', 'N' and 'M' positions. The 'Y' position is a long stretch, so a low frequency letter should be placed there. It makes sense to leave M in the 'M' position.

M: J0 X0 Q0 Z0 V0 W0 K0 C0 F0 G0 T0 H1 D1 N1 L1 B4 | Y7 S9 || P11 R11 U14 I41 A54 O54 E72
QW??? ?LUY;[]\
ARSTD HNEIO'
ZXCV? ?M,./

The QWERTY 'Y' position

The 'Y' position is a long stretch, so a rare letter should be placed there. Moreover, the right index finger is already doing a lot of work. The remaining rare letters are J and K. J is rarer, so we put it there.

QW??? JLUY;[]\
ARSTD HNEIO'
ZXCV? ?M,./

The QWERTY 'N' position

The last few pieces of the puzzle were somewhat less obvious, but with the problem space significantly simplified, it is possible at this stage to easily brute force the different combinations to find the best option.

(incomplete explanation) The 'N' position needs to fit with a lot of other letters. Only K fits there. B can also be considered, as it would be a more appropriate position, at the price of higher same-finger. But overall, I felt it wasn't worth moving B from its QWERTY position.

L: J0 Q0 X0 Z0 H0 M1 V2 W2 K2 N4 | G6 R6 C8 F9 || S10 P14 B15 T15 D24 Y30 U31 O46 I61 A82 E94
N: Z0 X0 P0 B0 J0 Q0 M1 H1 V2 F3 L4 | W7 Y7 K9 || R10 C23 S24 U29 T52 G65 D95 O130 E138 A147 I160
H: J0 V0 Q0 Z0 F0 K0 X0 B0 L0 D0 M1 N1 Y3 P3 | U6 R6 || G21 S27 W35 C40 O41 I71 A81 E237 T274
M: J0 X0 Q0 Z0 V0 W0 K0 C0 F0 G0 T0 H1 D1 N1 L1 B4 | Y7 S9 || P11 R11 U14 I41 A54 O54 E72
J: Q0 C0 K0 V0 W0 Z0 M0 T0 L0 Y0 G0 H0 P0 X0 F0 S0 R0 D0 I0 N0 B1 A1 E3 O3 U3 | ||
QW??? JLUY;[]\
ARSTD HNEIO'
ZXCV? KM,./

The letters P and G

It's better to put P in the 'R' position so that it won't cause a higher distance on the 'pt'/'tp' combos, which are more common than the 'gt'/'tg' combos. It's also better to leave 'G' in the same horizontal position from an ease of learning point of view, as it maintains the boundary of the left hand.

P: Q0 V0 J0 Z0 C0 G0 F0 K0 B0 D0 W0 N0 Y1 X2 H3 | T5 || M11 I12 S14 L14 U16 R24 A30 O32 E37
G: Q0 J0 V0 X0 Z0 C0 P0 F0 B0 K0 W0 M0 Y1 T1 D2 S3 | L6 || O14 U16 R17 H21 A22 I25 E28 N65
QW?PG JLUY;[]\
ARSTD HNEIO'
ZXCV? KM,./

The QWERTY 'E' position

(incomplete explanation) The only letters that don't cause high same-finger frequency with S/C are F and B.

S: Z0 J0 X0 V0 Q0 F1 B3 G3 W4 | K5 Y6 C8 D8 M9 || L10 P14 N24 R25 H27 O45 U47 T79 A85 I101 E136
C: J0 V0 Z0 B0 P0 G0 W0 F0 D0 M0 Q0 Y1 X2 | K8 S8 L8 || R14 U19 T20 N23 H40 I45 O47 A48 E60
QWFPG JLUY;[]\
ARSTD HNEIO'
ZXCV? KM,./
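
With every letter placed (B stays in its QWERTY position), the result can be checked against the simplified bigram table by adding up the scores of all letter pairs that share a finger. The sketch below does this for the finished layout and for QWERTY. It assumes standard touch-typing finger assignments and ignores punctuation and double letters, so treat the totals as a rough comparison only.

```python
from itertools import combinations

# Simplified bigram table, copied row by row from the top of this page.
TABLE = """
E: Q1 Z2 J3 | O6 U9 || X12 K18 Y21 F26 G28 I35 W36 P37 B41 A50 C60 M72 V74 T91 L94 D127 S136 N138 H237 R250
A: Q0 Z1 X1 J1 O4 | K9 || U13 F17 Y17 G22 B22 V24 P30 I36 W39 D42 C48 E50 M54 H81 L82 S85 R100 T113 N147
I: J0 Q0 Z1 Y1 X2 | U7 || K10 B10 P12 G25 F26 V26 W29 E35 A36 O37 M41 C45 D45 L61 R62 H71 S101 T126 N160
O: Q0 Z0 X1 J3 A4 | E6 K6 || Y13 G14 V15 B18 D24 P32 I37 H41 W42 S45 L46 C47 M54 U73 T98 F107 R113 N130
R: J0 Q0 Z0 W3 K4 V4 | L6 H6 B9 || N10 M11 C14 F16 Y16 G17 D21 P24 S25 U42 T43 I62 A100 O113 E250
U: Y0 W0 Z0 V0 K0 X0 J3 | H6 I7 F8 Q8 E9 || D12 A13 M14 P16 G16 C19 B20 N29 L31 R42 T43 S47 O73
S: Z0 J0 X0 V0 Q0 F1 B3 G3 W4 | K5 Y6 C8 D8 M9 || L10 P14 N24 R25 H27 O45 U47 T79 A85 I101 E136
T: J0 Q0 V0 K0 D0 Z0 M0 G1 B1 X2 | W5 P5 F6 || Y13 L15 C20 R43 U43 N52 S79 E91 O98 A113 I126 H274
L: J0 Q0 X0 Z0 H0 M1 V2 W2 K2 N4 | G6 R6 C8 F9 || S10 P14 B15 T15 D24 Y30 U31 O46 I61 A82 E94
N: Z0 X0 P0 B0 J0 Q0 M1 H1 V2 F3 L4 | W7 Y7 K9 || R10 C23 S24 U29 T52 G65 D95 O130 E138 A147 I160
C: J0 V0 Z0 B0 P0 G0 W0 F0 D0 M0 Q0 Y1 X2 | K8 S8 L8 || R14 U19 T20 N23 H40 I45 O47 A48 E60
H: J0 V0 Q0 Z0 F0 K0 X0 B0 L0 D0 M1 N1 Y3 P3 | U6 R6 || G21 S27 W35 C40 O41 I71 A81 E237 T274
P: Q0 V0 J0 Z0 C0 G0 F0 K0 B0 D0 W0 N0 Y1 X2 H3 | T5 || M11 I12 S14 L14 U16 R24 A30 O32 E37
Y: Q0 J0 X0 Z0 U0 F0 W0 V0 K0 G1 P1 C1 I1 H3 D3 | S6 N7 M7 || B12 T13 O13 R16 A17 E21 L30
D: Q0 X0 Z0 K0 C0 P0 T0 J0 B0 F0 W0 H0 V1 M1 G2 Y3 | S8 || U12 R21 L24 O24 A42 I45 N95 E127
G: Q0 J0 V0 X0 Z0 C0 P0 F0 B0 K0 W0 M0 Y1 T1 D2 S3 | L6 || O14 U16 R17 H21 A22 I25 E28 N65
M: J0 X0 Q0 Z0 V0 W0 K0 C0 F0 G0 T0 H1 D1 N1 L1 B4 | Y7 S9 || P11 R11 U14 I41 A54 O54 E72
B: Q0 F0 X0 Z0 C0 K0 G0 W0 P0 V0 D0 N0 H0 J1 T1 S3 M4 | R9 || I10 Y12 L15 O18 U20 A22 E41
F: Q0 V0 Z0 B0 J0 G0 P0 C0 X0 K0 W0 H0 D0 M0 Y0 S1 N3 | T6 U8 L9 || R16 A17 I26 E26 O107
W: Q0 V0 J0 X0 Z0 B0 C0 M0 P0 G0 F0 K0 U0 Y0 D0 L2 R3 S4 | T5 N7 || I29 H35 E36 A39 O42
K: Q0 J0 X0 Z0 V0 B0 T0 G0 P0 M0 D0 F0 H0 W0 Y0 U0 L2 R4 | S5 O6 C8 A9 N9 || I10 E18
V: F0 P0 Q0 W0 J0 G0 H0 C0 K0 Z0 M0 T0 S0 B0 X0 Y0 U0 D1 L2 N2 R4 | || O15 A24 I26 E74
X: Z0 K0 J0 M0 B0 G0 S0 D0 R0 W0 Q0 L0 Y0 F0 V0 H0 N0 U0 O1 A1 C2 P2 I2 T2 | || E12
Q: B0 F0 G0 J0 K0 P0 V0 W0 Y0 Z0 L0 M0 T0 D0 H0 R0 A0 X0 O0 I0 S0 C0 N0 E1 | U8 ||
J: Q0 C0 K0 V0 W0 Z0 M0 T0 L0 Y0 G0 H0 P0 X0 F0 S0 R0 D0 I0 N0 B1 A1 E3 O3 U3 | ||
Z: Q0 X0 F0 J0 K0 M0 S0 B0 P0 C0 V0 G0 D0 W0 H0 R0 Y0 L0 T0 N0 U0 O0 A1 I1 E2 | ||
"""


def parse(table):
    """Turn the table text into {letter: {other_letter: score}}."""
    scores = {}
    for line in table.strip().splitlines():
        letter, _, body = line.partition(":")
        scores[letter] = {tok[0]: int(tok[1:]) for tok in body.replace("|", " ").split()}
    return scores


SCORES = parse(TABLE)


def pair_score(a, b):
    # If a pair is missing from one row, fall back to the other letter's row.
    return SCORES[a].get(b, SCORES[b].get(a, 0))


def same_finger_total(groups):
    """Sum the bigram scores of every pair of letters typed by the same finger."""
    return sum(pair_score(a, b) for letters in groups for a, b in combinations(letters, 2))


# Letters per finger, left little finger to right little finger (assumed assignment).
QWERTY = ["QAZ", "WSX", "EDC", "RFVTGB", "YUHJNM", "IK", "OL", "P"]
COLEMAK = ["QAZ", "WRX", "FSC", "PGTDVB", "JLHNKM", "UE", "YI", "O"]

colemak_total = same_finger_total(COLEMAK)
qwerty_total = same_finger_total(QWERTY)
print(colemak_total, qwerty_total)  # 52 417
```

In QWERTY most of the total comes from the E/D/C finger and the two index fingers, which are the pairings the steps above were designed to avoid.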

Design FAQ