Ticket #6280 (new defect)

Opened 7 years ago

Last modified 6 years ago

Keyboard creates non-standard accents

Reported by: homunq Owned by: sayamindu
Priority: high Milestone: 8.2.0 (was Update.2)
Component: keyboards Version:
Keywords: accents unicode dead key polish:8.2.0 Cc: mstone
Action Needed: review Verified: no
Deployments affected: Blocked By:


It is neat to be able to put any number of any accents on top of any letter in AbiWord. However, when you cut and paste the resulting text anywhere else, you don't get the standard accented vowels (líkë thìs), but a sequence of two separate Unicode characters for each one. This leads to some interesting but pretty irrelevant bugs (try it in the terminal).

AbiWord should replace character-accent combos with single accented Unicode characters when possible, at the very least on cut. (Actually it would be better if it happened immediately, because having to hit backspace twice is unexpected and rarely useful behaviour.)
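For illustration only, here is a minimal Python sketch of the replacement being asked for. It assumes the fix amounts to Unicode NFC normalization; `normalize_for_clipboard` is a hypothetical name, not an AbiWord function.

```python
import unicodedata

def normalize_for_clipboard(text: str) -> str:
    """Compose letter + combining accent pairs into single code points
    wherever Unicode defines a pre-composed character (NFC)."""
    return unicodedata.normalize("NFC", text)

# "líkë thìs" typed as base letter followed by a combining accent
decomposed = "li\u0301ke\u0308 thi\u0300s"
composed = normalize_for_clipboard(decomposed)

print(len(decomposed), len(composed))
```

The decomposed string carries three extra code points (one per accent), which is also why backspace has to be hit twice on each accented letter.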

Change History

  Changed 7 years ago by marco

  • milestone changed from Never Assigned to Future Release

  Changed 7 years ago by homunq

  • priority changed from normal to high
  • summary changed from Abiword creates non-standard accents to Keyboard creates non-standard accents

I realized that this is not a problem with AbiWord; it is a problem with the keyboard. This makes text created by this keyboard incompatible with standard Unicode text. For instance, if I want to look up an article on Spanish Wikipedia, I can write something that looks identical to "más" but is really four Unicode characters long, and so I won't find my article. I haven't tested whether Google has the smarts to ignore this. I wouldn't be surprised if it does, but if not, even Google becomes unusable.
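A small Python sketch of the "más" mismatch described above. The variable names and the `same_text` helper are made up for illustration; a real search engine may or may not normalize like this.

```python
import unicodedata

typed = "ma\u0301s"   # m, a, COMBINING ACUTE ACCENT, s (what the keyboard sends)
title = "m\u00e1s"    # m, á (pre-composed), s (what the article title contains)

def same_text(a: str, b: str) -> bool:
    """Compare two strings after composing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(len(typed), len(title))   # lengths differ although both render as "más"
print(typed == title)           # naive comparison fails
print(same_text(typed, title))  # comparison after normalization succeeds
```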

I'm sorry, unicode is great, but accented characters are accented characters. Just because you can write them as character plus accent, doesn't mean you don't sometimes NEED the accented character to find something in your language. To me, this is going back to the days of manual search-and-replace to fix encoding problems.

I'm bumping this up to high just to get it re-assessed. I think it's justified, but you can put it back down to normal if you think I'm wrong.

  Changed 7 years ago by AlbertCahalan

See also:

Bug #140, where user ianb wisely saw this coming

Bug #4009, where jhbuild gives pre-composed characters (Normalization Form KC, or perhaps C)

Bug #6125, where the terminal activity and Linux console can't cope

Such accents also do not work in Tux Paint.

The root of the problem is the very strange way that the XO keyboard has been configured. On a normal system you press the accent first. This sets some normally-hidden state in the keyboard driver. (idea: the XO could display this on a screen overlay if the state persists for more than 0.5 seconds) After setting that state, you press the letter. At that point, the whole thing is available in the keyboard driver. Assuming the keyboard driver is not malicious, it then produces the pre-composed form (Normalization Form KC) if possible. The XO is configured to send an initial letter immediately, which might or might not then get followed by a combining accent at some far-future time.
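The accent-first flow described above can be sketched as a tiny state machine. This is a Python model for illustration, not the actual X or XO driver code; the `DeadKeyComposer` class and its key table are assumptions.

```python
import unicodedata

# Illustrative table: dead-key name -> combining character it stands for
DEAD_KEYS = {
    "dead_acute": "\u0301",
    "dead_grave": "\u0300",
    "dead_diaeresis": "\u0308",
}

class DeadKeyComposer:
    def __init__(self):
        self.pending = None  # the normally hidden accent state

    def press(self, key: str) -> str:
        """Return the text emitted for one key press."""
        if key in DEAD_KEYS:
            self.pending = DEAD_KEYS[key]  # remember the accent, emit nothing yet
            return ""
        if self.pending is None:
            return key                     # plain letter, no accent waiting
        accent, self.pending = self.pending, None
        # Emit the pre-composed form when one exists
        return unicodedata.normalize("NFC", key + accent)

composer = DeadKeyComposer()
output = "".join(composer.press(k) for k in ["m", "dead_acute", "a", "s"])
print(output, len(output))
```

If no pre-composed character exists for the pair, the normalization step leaves the letter and the combining accent as two code points, emitted together in a single step.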

follow-up: ↓ 5   Changed 7 years ago by walter

Which keyboard? Each one is configured differently. On some keyboards we used dead keys. On others, we use Unicode combining characters. The former does not cover the full extent of characters we need for some languages. The latter doesn't automatically convert to single Unicode characters. We could add all of the missing dead keys to X, if someone wants to take that project on.

in reply to: ↑ 4   Changed 7 years ago by AlbertCahalan

Replying to walter:

Which keyboard? Each one is configured differently. On some keyboards we used dead keys. On others, we use Unicode combining characters. The former does not cover the full extent of characters we need for some languages. The latter doesn't automatically convert to single Unicode characters. We could add all of the missing dead keys to X, if someone wants to take that project on.

US International has problems.

You have two workable choices:

1. Use dead keys and/or AltGr as an extra modifier. Note that you can still emit multiple Unicode characters as required, allowing support for everything outside CJK. The u-umlaut must be one character, but a @-umlaut probably can't be.

2. Use a full input method system, which is really just a heavy-duty version of the above. (this may involve a tooltip-like pop up showing character choices, etc.)

The requirement for combining characters does not mean that the keyboard needs to transmit them as if they were wholly independent characters. They can and should be considered as an inseparable part of a whole. The requirement for combining characters on some symbols does not mean that combining characters can or should be used elsewhere.

There should not be any way (outside of hexadecimal entry) to type a loose combining character.

  Changed 7 years ago by walter

I've started working on a dead key version of the US keyboard (see http://wiki.laptop.org/go/OLPC_Keyboard_layouts#OLPC_section_of_the_XKB_symbol_file_.28dead_key_variant.29). Please try it out to see if it helps.

I don't know the dead key equivalent for the following Unicode combining character codes: combining breve below; combining ring below; combining circumflex below; combining caron below; combining diaeresis below; combining macron below; combining tilde below.

Do they exist already? Shall we define them? Shall we just stick with the Unicode in this case? Or should we teach GTK applications to do something sensible with the Unicode combining accents and "compress" them to a single character? Or all of the above?
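As a rough aid for the questions above, this Python sketch looks up the seven combining characters by their Unicode names and shows what "compressing" to a single character could mean. The `compress` helper is hypothetical; note that the Unicode name for the circumflex is "COMBINING CIRCUMFLEX ACCENT BELOW".

```python
import unicodedata

NAMES = [
    "COMBINING BREVE BELOW",
    "COMBINING RING BELOW",
    "COMBINING CIRCUMFLEX ACCENT BELOW",
    "COMBINING CARON BELOW",
    "COMBINING DIAERESIS BELOW",
    "COMBINING MACRON BELOW",
    "COMBINING TILDE BELOW",
]

# Map each name to its character using the Unicode character database
marks = {name: unicodedata.lookup(name) for name in NAMES}

def compress(base: str, mark: str) -> str:
    """Compose base + combining mark into one code point if Unicode has one."""
    return unicodedata.normalize("NFC", base + mark)

for name, ch in marks.items():
    print(f"U+{ord(ch):04X}  {name}")

ring_a = compress("a", marks["COMBINING RING BELOW"])
print(len(ring_a), unicodedata.name(ring_a[0]))
```

Whether a given letter and mark pair compresses depends on Unicode defining a pre-composed character for it; pairs without one stay as two code points.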

  Changed 7 years ago by pierre

  • keywords dead key added

This is really a *major* bug, as it prevents input entered with Sugar from being shared with other applications or more conventional computers. To the previous examples, I would add that a spelling dictionary will flag all words containing accented characters as invalid. It is also impossible to share texts written with Write with other people using MS-Word or OpenOffice, even when exporting the text only...

Walter, where is the dead keys US International variant? The previous link seems to be dead.

  Changed 7 years ago by walter

http://wiki.laptop.org/go/OLPC_Keyboard_layouts/xkb on the bottom half of the page. When I come up for air, I'll define the missing dead keys: ring below, circumflex below, caron below, diaeresis below, macron below, tilde below.

Unless someone beats me to the punch.


  Changed 7 years ago by mstone

  • owner changed from uwog to walter
  • cc mstone added
  • component changed from write-activity (abiword) to keyboards

Folks - when should I expect to see progress on this bug, who is it truly important to, and how bad will things be if it's not fixed?

  Changed 7 years ago by walter

I am working on defining sufficient dead keys so that we can use just dead keys for the US keyboard. I will not be able to complete this in time for Update.1, so we should target Update.2. I will try to get it into Joyride sooner rather than later, so that it can be well tested.

  Changed 6 years ago by homunq

  • owner changed from walter to sayamindu
  • keywords changed from accents unicode dead key to accents unicode dead key
  • next_action set to review
  • milestone changed from Future Release to 8.2.0 (was Update.2)

I just added the last missing character to http://wiki.laptop.org/go/OLPC_Keyboard_layouts/xkb. This is ready for shipping. Nominating bug for 8.2.

  Changed 6 years ago by homunq

  • keywords polish:8.2.0 added

  Changed 6 years ago by sayamindu

The current input method that we use (in all cases except for Amharic) has a list of predefined sequences; see http://svn.gnome.org/viewvc/gtk%2B/trunk/gtk/gtkimcontextsimpleseqs.h?view=markup for the list.

homunq: is something missing here? Could you test a recent joyride and tell me if this works?

  Changed 6 years ago by pierre

Walter's keyboard layout is really an improvement upon the default US, and should be made the default for Latin languages (G1G1 laptop). There are still some non-working key combinations, but I don't know if there are special rules preventing them from working. For instance, one can obtain the character å with the key sequence [AltGr]+[6],[a] but can't obtain a similar character with [e], [o] or [i].
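One possible explanation, offered as a guess rather than a confirmed diagnosis: a dead key can only yield a single character where Unicode defines a pre-composed one. The Python sketch below checks which vowels have a pre-composed form with a ring above; `has_precomposed` is a hypothetical helper.

```python
import unicodedata

RING_ABOVE = "\u030a"  # COMBINING RING ABOVE

def has_precomposed(base: str, mark: str) -> bool:
    """True if NFC composes base + mark into a single code point."""
    return len(unicodedata.normalize("NFC", base + mark)) == 1

results = {letter: has_precomposed(letter, RING_ABOVE) for letter in "aeiou"}
for letter, ok in results.items():
    print(letter, ok)
```

Under this check only a and u compose (å and ů), so a compose table limited to pre-composed results would have no entry for e, o or i with a ring.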

  Changed 6 years ago by pierre

I corrected a few errors on the wiki page:
- guillemontright -> guillemotright
- guillemontleft -> guillemotleft

Also, I suppose that the code for Xorg does not yet include all the dead keys definitions specific to the XO keyboard: in the linux console, I get syntax errors for undefined keysymbols:
- dead_belowbreve
- dead_belowcaron
- dead_belowring
- dead_belowcircumflex
- dead_belowdiaraesis
- dead_belowmacron
- dead_belowtilde

Last, but I suppose that's related to the previous errors, I had to explicitly add the definition for the space key <SPCE> because the generated character, though looking like a space in the terminal, was not the traditional 0x20 character.
key <SPCE> { [ space, space, space, nobreakspace ] };
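To see which character a key really produced, the pasted text can be inspected code point by code point. This Python sketch uses a hypothetical `describe` helper; the no-break space is shown only as one example of a space look-alike, since the comment above does not say which character was actually generated.

```python
import unicodedata

def describe(text: str) -> list:
    """List each character as 'U+XXXX NAME' so look-alikes can be told apart."""
    return [f"U+{ord(ch):04X} {unicodedata.name(ch, '<unnamed>')}" for ch in text]

print(describe(" "))        # the traditional 0x20 space
print(describe("\u00a0"))   # a no-break space, which renders the same
```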
