Opened 8 years ago

Closed 7 years ago

Last modified 7 years ago

#884 closed defect (fixed)

Web activity has trouble with arabic and urdu fonts

Reported by: sj Owned by: marco
Priority: blocker Milestone: Ship.2
Component: browse-activity Version:
Keywords: Cc: sj, behdad
Blocked By: Blocking:
Deployments affected: Action Needed:
Verified: yes

Description

The web activity has trouble reading a number of non-Latin fonts, including Urdu and Arabic. Look at www.wikipedia.org for some examples of broken fonts. The problem was observed in 251 and 252.

Change History (47)

comment:1 follow-ups: Changed 8 years ago by jg

  • Cc behdad@… added
  • Milestone changed from Untriaged to BTest-3
  • Owner changed from marco to jg

Behdad,

What fonts exist for Urdu?

SJ, last I knew, Khaled had given us a clean bill of health on Arabic.

Telling us to just look at www.wikipedia.org is not a helpful bug report: we need to have examples of actual pages that have problems to examine; wikipedia is millions and millions of pages (giving a nod to Sagan).

comment:2 in reply to: ↑ 1 Changed 8 years ago by krstic

Replying to jg:

Telling us to just look at www.wikipedia.org is not a helpful bug report: we need to have examples of actual pages that have problems to examine; wikipedia is millions and millions of pages (giving a nod to Sagan).

The point was that www.wikipedia.org is a portal page that shows many different languages, in their respective fonts, on one page.

comment:3 Changed 8 years ago by jg

  • Priority changed from blocker to normal

It is unrealistic for us to have a set of fonts with enough coverage to render all the scripts on that page: eastern fonts can be very large, and that page covers a large fraction of the scripts of the world. Unless you want to fill flash and RAM with fonts, that is...

If there are specific problems with specific scripts, we'd like to know about them. Fonts don't come for free, in RAM or flash....

It sounds like you have a specific complaint about an urdu font; we certainly need that packaged for OLPC.

comment:4 in reply to: ↑ 1 Changed 8 years ago by behdad

Replying to jg:

Behdad,

What fonts exist for Urdu?

KacstFarsi from the Kacst fonts should be a good start. Not sure if the coverage is enough though.

Urdu Nastaliq Unicode is a very good looking one. Maybe not the best for web use.

Two good, reasonably sized Arabic fonts with very broad coverage are the SIL Lateef and Scheherazade ones. Either of those two should be enough for Arabic and Urdu in the olpc. Scheherazade is the preferred one IMO, though your Urdu and Arabic users may have differing opinions.

There are also the FarsiWeb fonts but they lack coverage for Urdu and don't have a very clear copyright owner.

comment:5 Changed 8 years ago by cjb

  • Milestone changed from BTest-3 to Trial-1
  • Priority changed from normal to blocker

We're still not rendering Arabic or Urdu in the web browser as of build303; however, they *are* rendered in the address bar, so there are clearly some appropriate fonts on the system.

comment:6 Changed 8 years ago by cjb

  • Summary changed from Web activity has trouble with arabic and thai fonts to Web activity has trouble with arabic and urdu fonts

comment:7 Changed 8 years ago by cjb

In particular, the problem is with the on-disk wikipedia pages; just follow the Arabic or Urdu link from the main library page to see the problem.

comment:8 Changed 8 years ago by sj

  • Owner changed from jg to marco

comment:9 Changed 8 years ago by sj

For Marco.

comment:10 follow-up: Changed 8 years ago by tomeu

Looks like the Arabic and Urdu pages have the wrong attributes in the html element:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">

Changing 'en' to 'ur' or 'ar' makes those fonts appear correctly. The 'ltr' bit looks to be incorrect also, but haven't found any difference when setting to one value or another.

Please note that when you make this change, there's some trouble with some accented characters in the names of the languages in the menu on the right of the page. Any suggestion as to how to show Arabic, Urdu and West European encodings correctly in the same page?

comment:11 in reply to: ↑ 10 Changed 8 years ago by tomeu

Replying to tomeu:

Please note that when you make this change, there's some trouble with some accented characters in the names of the languages in the menu on the right of the page. Any suggestion as to how to show Arabic, Urdu and West European encodings correctly in the same page?

Using the 'lang' attribute in the individual elements that don't use the same script as the document fixes this issue:

http://tlt.psu.edu/suggestions/international/web/tips/langtag.html
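
A minimal sketch of that per-element tagging, assuming the page is generated by a script. The helper name `tag_lang` is hypothetical, and it sets both `lang` and `xml:lang` only because the root element of our pages already carries both:

```python
import html

def tag_lang(text, lang):
    """Wrap a piece of text in a span that declares its own language.

    Intended for fragments (for example, language names in a menu) whose
    language differs from the one declared on the <html> element.
    """
    safe = html.escape(text)  # escape &, <, > and quotes in the content
    return f'<span lang="{lang}" xml:lang="{lang}">{safe}</span>'

# Example: a French menu entry inside a page declared as Arabic.
print(tag_lang("Français", "fr"))
```

Whether a span or the existing menu element should carry the attribute depends on how the library pages are built; the sketch only shows the attribute placement.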

comment:12 follow-up: Changed 8 years ago by behdad

Replying to tomeu:

Looks like the Arabic and Urdu pages have the wrong attributes in the html element:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">

Changing 'en' to 'ur' or 'ar' makes those fonts appear correctly. The 'ltr' bit looks to be incorrect also, but haven't found any difference when setting to one value or another.

This means that some of the fonts you ship have crappy Arabic glyphs. We can hide them using fontconfig, but even better is to remove them.

Please note that when you make this change, there's some trouble with some accented characters in the names of the languages in the menu on the right of the page. Any suggestion as to how to show Arabic, Urdu and West European encodings correctly in the same page?

Use the xml:lang attribute (not lang). dir="rtl" is recommended too. It has effects on the text line alignment and some other stuff.

comment:13 in reply to: ↑ 12 ; follow-up: Changed 8 years ago by tomeu

Replying to behdad:

Replying to tomeu:

Looks like the Arabic and Urdu pages have the wrong attributes in the html element:

<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">

Changing 'en' to 'ur' or 'ar' makes those fonts appear correctly. The 'ltr' bit looks to be incorrect also, but haven't found any difference when setting to one value or another.

This means that some of the fonts you ship have crappy Arabic glyphs. We can hide them using fontconfig, but even better is to remove them.

Sorry, what does it mean that we have some fonts with crappy Arabic glyphs?

comment:14 in reply to: ↑ 13 Changed 8 years ago by behdad

Replying to tomeu:

Replying to behdad:

This means that some of the fonts you ship have crappy Arabic glyphs. We can hide them using fontconfig, but even better is to remove them.

Sorry, what does it mean that we have some fonts with crappy Arabic glyphs?

For Arabic to render correctly, fonts need to have OpenType tables mapping Arabic chars to the various glyphs they may take. If the font doesn't have the tables, you get the wrong, disjoint rendering.

Many Chinese fonts are known to have this problem, because Arabic script is used for the Uighur language in China, and Chinese fonts have to have glyphs for Arabic to pass some certification level to be used in China...

Bitmap fonts are also unusable for Arabic with Pango for the same reason.

It is possible to fix the problem in Pango, though that is not going to change soon, but the main problem still remains: the wrong font is chosen for Arabic.
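
To illustrate what "various glyphs" means here: each Arabic letter has up to four positional shapes (isolated, initial, medial, final). The sketch below lists them using the legacy Arabic Presentation Forms-B code points in the Unicode database. This is only an illustration of the shapes involved; real fonts select them through their OpenType tables, not through these code points, and the function name is made up for this example.

```python
import unicodedata
from collections import defaultdict

def contextual_forms():
    """Map each base Arabic letter to its positional presentation forms.

    Uses the compatibility decompositions of U+FE70..U+FEFC, which look
    like '<initial> 0628'.
    """
    forms = defaultdict(dict)
    for cp in range(0xFE70, 0xFEFD):
        parts = unicodedata.decomposition(chr(cp)).split()
        # Keep single-letter forms only; skip ligatures and unassigned points.
        if len(parts) != 2:
            continue
        kind = parts[0].strip("<>")
        base = chr(int(parts[1], 16))
        forms[base][kind] = chr(cp)
    return forms

# The letter BEH (U+0628) has all four shapes.
beh = contextual_forms()["\u0628"]
for kind in ("isolated", "initial", "medial", "final"):
    print(kind, f"U+{ord(beh[kind]):04X}")
```

A font that only carries the isolated shape, or lacks the substitution tables, produces the disjoint rendering described above.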

comment:15 follow-up: Changed 8 years ago by jg

Behdad, we need to sort this font problem out as soon as possible and get rid of the wrong fonts.

What information do you need? xlsfonts + fc-list + our fontconfig file? That would be my guess.
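
A rough sketch of collecting the fontconfig part of that information, assuming the `fc-list` tool is installed on the laptop. It asks fontconfig which families it believes cover a language; the function names are made up for this example:

```python
import shutil
import subprocess

def parse_families(fc_list_output):
    """Turn `fc-list <pattern> family` output into a sorted list of names."""
    families = set()
    for line in fc_list_output.splitlines():
        # One font per line; a font may list several comma-separated names.
        for name in line.split(","):
            if name.strip():
                families.add(name.strip())
    return sorted(families)

def families_for_lang(lang):
    """List the font families that fontconfig reports for `lang`."""
    if shutil.which("fc-list") is None:
        return []  # fontconfig command-line tools are not installed
    result = subprocess.run(
        ["fc-list", f":lang={lang}", "family"],
        capture_output=True, text=True, check=True,
    )
    return parse_families(result.stdout)

if __name__ == "__main__":
    for lang in ("ar", "ur"):
        print(lang, families_for_lang(lang))
```

Note that this only reports what fontconfig thinks a font covers; it says nothing about whether the font has usable shaping tables.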

comment:16 in reply to: ↑ 15 Changed 8 years ago by behdad

Replying to jg:

Behdad, we need to sort this font problem out as soon as possible and get rid of the wrong fonts.

What information do you need? xlsfonts + fc-list + our fontconfig file? That would be my guess.

Ok, I have the latest stable build running on my B1. I'll look into this tomorrow (it's 2:30AM here) and see what we need to do.

comment:17 Changed 8 years ago by marco

Behdad, it would be cool if you could test this with build 331. We used to explicitly set families in the mozilla configuration; we removed that in 331.

(Or you can manually edit /usr/share/sugar/gecko-prefs.js)

I think that with this change the page was rendered correctly with lang="ur", but I am still seeing problems with lang="en" (which is what our on-flash copy of wikipedia has).

comment:18 follow-ups: Changed 8 years ago by behdad

Ok, I now see the defect. Urdu and Arabic just don't show up. Weird.

I'm trying to look more into it, but suddenly half of my B1 keyboard stopped working (the QAZ and EDC columns in fact, plus tab and ctrl), and I'm trying to find a USB keyboard. I'm on vacation so it's a bit hard.

If you can try this, it may help: run gucharmap on the laptop, go to the Arabic block and see if they look right. If not, right-click on them and see which font it says is used to render them.

Thanks

comment:19 in reply to: ↑ 18 Changed 8 years ago by cjb

Replying to behdad:

I'm trying to look more into it, but suddenly half of my B1 keyboard stopped working (the QAZ and EDC columns in fact, plus tab and ctrl), and I'm trying to find a USB keyboard. I'm on vacation so it's a bit hard.

It's a known static bug, fixed in B2. You can get the missing keys back by removing all power (AC and battery) for a minute or two (or if that doesn't work, even longer).

comment:20 in reply to: ↑ 18 Changed 8 years ago by tomeu

Replying to behdad:

Ok, I now see the defect. Urdu and Arabic just don't show up. Weird.

The Arabic and Urdu wikipedia pages we have on the current builds are wrong, but you can edit them so mozilla knows which font to use:

- <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en" dir="ltr">
+ <html xmlns="http://www.w3.org/1999/xhtml" xml:lang="ar" lang="ar" dir="rtl">
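
If many pages need the same change, the diff above could be applied with a small script. This is only a sketch: the function name and the set of right-to-left language codes are assumptions, and it edits the first `<html ...>` tag with regular expressions rather than parsing the page.

```python
import re

# Assumption: language codes that should get dir="rtl".
RTL_LANGS = {"ar", "ur", "fa", "he"}

def fix_html_lang(page, lang):
    """Set xml:lang, lang and dir on the first <html ...> tag of a page."""
    direction = "rtl" if lang in RTL_LANGS else "ltr"

    def rewrite(match):
        tag = match.group(0)
        # Replace the value of both xml:lang="..." and lang="...".
        tag = re.sub(r'\b(xml:lang|lang)="[^"]*"',
                     lambda m: f'{m.group(1)}="{lang}"', tag)
        # Replace the value of dir="...".
        tag = re.sub(r'\bdir="[^"]*"', f'dir="{direction}"', tag)
        return tag

    return re.sub(r"<html\b[^>]*>", rewrite, page, count=1)

before = ('<html xmlns="http://www.w3.org/1999/xhtml" '
          'xml:lang="en" lang="en" dir="ltr">')
print(fix_html_lang(before, "ar"))
```

The sketch only rewrites attributes that are already present; it does not add a missing `dir` or `lang` attribute.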

comment:21 Changed 8 years ago by marco

sj, can we fix the library to use the same lang as the original content? See Tomeu's diff in the previous comment.

comment:22 Changed 8 years ago by marco

  • Component changed from distro to olpc library

Reassigning to the library component. I think the language should really be fixed (it's not fixed in the latest update either).

comment:23 Changed 8 years ago by jg

  • Cc sj added
  • Owner changed from marco to jg

Sounds more like an sj problem: marco, should this be reassigned?

comment:24 Changed 8 years ago by marco

  • Owner changed from jg to sj

Honestly I'm not sure if mozilla is supposed to still render fonts correctly, even with the wrong xml:lang. Behdad will probably know better.

Though there is definitely a bug in the olpc library here, and fixing that one will be enough to make stuff work.

So I'm assigning to sj. Feel free to reassign back once the library problem is solved, so we can figure out the mozilla problem (if there is one).

comment:25 Changed 8 years ago by behdad

Yes, Mozilla is supposed to render Arabic correctly regardless of the language tag. Unfortunately it seems like a bug in gecko 1.9's new text layout code that I'm not familiar with yet.

comment:26 Changed 8 years ago by marco

Ok, thanks.

For Trial1 I think we should just fix the content.

comment:27 Changed 8 years ago by jg

  • Component changed from olpc library to web browser
  • Keywords relnote added; web activity fonts scripts removed
  • Milestone changed from Trial-1 to BTest-3
  • Owner changed from sj to marco

OK, for now we fix our content. SJ is working on that. I think it is no longer a blocker for the trial, since we have a workaround. But it is a longer-term blocker that we need to get fixed, since we can't control all content, so I'm pushing the milestone. It also needs a release note.
SJ, please add a release note in http://wiki.laptop.org/go/OLPC_Software_Release_Notes. And I'm moving the component back to web browser.

comment:28 Changed 8 years ago by blizzard

  • Verified unset

I think that you guys should talk to Robert O'Callahan about this. He's the guy who owns the text code these days and I think he's in the middle of rewriting a ton of it.

comment:29 Changed 8 years ago by jg

  • Milestone changed from BTest-4 to Trial-2

Behdad, Marco, what did Robert O'Callahan say? What are we doing for Trial 2 for this issue?

comment:30 Changed 8 years ago by marco

I posted a bug with Robert cc'ed, let's see what he says.

https://bugzilla.mozilla.org/show_bug.cgi?id=385327

comment:31 Changed 7 years ago by marco

Is this really a blocker? I'd probably consider it normal priority.

From people's comments it seems non-trivial. Also, I don't think we have the expertise to fix this in the team right now. If it really is a blocker we should figure out something...

comment:32 follow-up: Changed 7 years ago by jg

  • Cc marco added; behdad@… removed
  • Owner changed from marco to behdad

It is a symptom of a much more general problem, from perusing the mozilla bug reports, that goes beyond Arabic (which is widely used in many countries). Even more fun: it is a regression.

However, I disagree with the analysis in the mozilla bug reports, from a quick look there. Fontconfig should be selecting usable fonts for a given set of languages, which it is very good at indeed.

There are two possibilities here:

1) moz is providing the wrong languages to fontconfig for lookup
2) a fontconfig bug; either in computing coverage or the orthography database for particular languages.

Behdad, do you agree?

comment:33 in reply to: ↑ 32 Changed 7 years ago by behdad

Replying to jg:

It is a symptom of a much more general problem, from perusing the mozilla bug reports, that goes beyond Arabic (which is widely used in many countries). Even more fun: it is a regression.

However, I disagree with the analysis in the mozilla bug reports, from a quick look there. Fontconfig should be selecting usable fonts for a given set of languages, which it is very good at indeed.

There are two possibilities here:

1) moz is providing the wrong languages to fontconfig for lookup
2) a fontconfig bug; either in computing coverage or the orthography database for particular languages.

Behdad, do you agree?

From my understanding what ff3 is doing is to just pass the explicit language tags to fontconfig, get the first matching font and use it. This is a far cry from what ff2 does with pango.

When using pango like ff2 does, pango guesses languages for pieces of text and passes language tags to fontconfig that make a lot more sense, and if the first found font doesn't handle a character the next font is tried, etc.

The end result is that for an untagged piece of Arabic text under an English locale, ff2+pango detects that Arabic script doesn't make sense with English language, so it tags the Arabic segment as Arabic language. This results in the right font being selected. AFAIU ff3 is not doing this, and is not doing fallback fonts either.
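
A toy model of the two behaviours described above, to make the difference concrete. It is not Pango's or fontconfig's actual algorithm: the script detection is a crude guess from Unicode character names, and the font names and coverage sets are invented for the example.

```python
import unicodedata
from itertools import groupby

# Illustrative tables only: default language per script, script per language.
SCRIPT_DEFAULT_LANG = {"ARABIC": "ar", "LATIN": "en"}
LANG_SCRIPT = {"en": "LATIN", "ar": "ARABIC", "ur": "ARABIC"}

def script_of(ch):
    """Crude script guess based on the Unicode character name."""
    name = unicodedata.name(ch, "")
    for script in SCRIPT_DEFAULT_LANG:
        if name.startswith(script):
            return script
    return None  # spaces, digits, punctuation

def itemize(text, declared_lang):
    """Split text into (language, run) pairs, retagging runs whose script
    does not fit the declared language."""
    scripts = []
    last = LANG_SCRIPT.get(declared_lang)
    for ch in text:
        last = script_of(ch) or last  # neutral chars join the current run
        scripts.append(last)
    runs, pos = [], 0
    for script, group in groupby(scripts):
        length = len(list(group))
        run = text[pos:pos + length]
        pos += length
        if LANG_SCRIPT.get(declared_lang) == script:
            lang = declared_lang
        else:
            lang = SCRIPT_DEFAULT_LANG.get(script, declared_lang)
        runs.append((lang, run))
    return runs

def pick_font(run, fonts):
    """Return the first font, in preference order, covering the whole run."""
    for name, coverage in fonts:
        if all(ch.isspace() or ch in coverage for ch in run):
            return name
    return None

# Hypothetical fonts: (name, set of covered characters).
FONTS = [
    ("LatinOnly", {chr(c) for c in range(0x20, 0x7F)}),
    ("ArabicCapable", {chr(c) for c in range(0x600, 0x700)}),
]

text = "Wikipedia \u0648\u064a\u0643\u064a\u0628\u064a\u062f\u064a\u0627"
for lang, run in itemize(text, "en"):
    print(lang, pick_font(run, FONTS))
```

In this model, taking only the first font for the declared language corresponds to always using `FONTS[0]`, which has no Arabic glyphs at all; itemizing and falling back per run is what lets the Arabic segment reach a font that covers it.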

comment:34 Changed 7 years ago by marco

Behdad comment:

"Current plan is for Vlad to come to Toronto early August so we can fix this
together."

Guess that means we should punt this from Trial-2?

comment:35 Changed 7 years ago by behdad

Actually we had an IRC meeting three nights ago and Stuart is working on a fix.

comment:36 Changed 7 years ago by jg

  • Milestone changed from Trial-2 to Trial-3

comment:37 Changed 7 years ago by behdad

Can we get a test page that doesn't render correctly on the XO? All Persian/Arabic pages that I tried render correctly (modulo font issues that are expected).

comment:38 Changed 7 years ago by marco

There are testcases on the mozilla bug report:

https://bugzilla.mozilla.org/show_bug.cgi?id=385327

comment:39 follow-up: Changed 7 years ago by marco

Does not seem to be fixed even with the recent snapshot we have in the builds. Behdad, did the pango work land, or when is it going to land? (Is there a bug about it on b.m.o?)

comment:40 in reply to: ↑ 39 Changed 7 years ago by behdad

Replying to marco:

Does not seem to be fixed even with the recent snapshot we have in the builds. Behdad, did the pango work land, or when is it going to land? (Is there a bug about it on b.m.o?)

No, it's just a patch on my laptop right now. Will try to fix the remaining bugs and submit it today.

comment:41 Changed 7 years ago by marco

  • Milestone changed from Trial-3 to Untriaged

I think we should postpone this. Monday is code freeze and it seems too risky to get it in a new mozilla snapshot (and unless we decide to switch to 24 bits we will have the rendering issues with a snapshot).

I think the only way we could get this in, is if behdad can provide a patch for alpha6 or alpha7, which he is comfortable with, this weekend.

comment:42 Changed 7 years ago by jg

  • Milestone changed from Untriaged to First Deployment, V1.0

Behdad, as soon as you have committed the patch upstream, let us know; we'll package it and put it in our experimental feed for testing. We really do need this for

I agree it's late for Trial-3, unless a miracle occurs.

comment:43 Changed 7 years ago by behdad

Ok, patch is now available here:

https://bugzilla.mozilla.org/show_bug.cgi?id=362682

The first part of the patch is a cairo patch. If you build mozilla with system cairo (which you should), the cairo patch needs to get into the cairo package.

comment:44 Changed 7 years ago by jg

  • Cc behdad added; marco removed
  • Owner changed from behdad to marco

OK, we should try this right after Trial-3 is in the can. With luck, the patch will have been applied by then; it seems no worse on memory loss than the previous state.

comment:45 Changed 7 years ago by marco

  • Resolution set to fixed
  • Status changed from new to closed

Fixed in joyride 19

comment:46 Changed 7 years ago by kimquirk

  • Keywords relnote removed
  • Verified set

comment:47 Changed 7 years ago by kimquirk

  • Milestone changed from Update.1 to Ship.2