Ticket #7733 (closed defect: fixed)

Opened 6 years ago

Last modified 6 years ago

Cannot install Wikipedia-10.xo

Reported by: tomeu Owned by: frances
Priority: blocker Milestone: 8.2.0 (was Update.2)
Component: sugar Version: Development source as of this date
Keywords: 8.2.0:? blocks:8.2.0 Cc: gregorio, jg, cjb, mstone
Action Needed: test in release Verified: no
Deployments affected: Blocked By:
Blocking: #8211

Description

After the download from http://dev.laptop.org/~cjb/eswiki/Wikipedia-10.xo finishes:

  File "/var/lib/python-support/python2.5/dbus/service.py", line 696, in _message_cb
    retval = candidate_method(self, *args, **keywords)
  File "/home/tomeu/sugar-jhbuild/install/share/sugar/service/activityregistryservice.py", line 52, in AddBundle
    return registry.add_bundle(bundle_path)
  File "/home/tomeu/sugar-jhbuild/install/share/sugar/service/bundleregistry.py", line 184, in add_bundle
    bundle = ActivityBundle(bundle_path)
  File "/home/tomeu/sugar-jhbuild/install/lib/python2.5/site-packages/sugar/bundle/activitybundle.py", line 69, in __init__
    self.read_manifest()
  File "/home/tomeu/sugar-jhbuild/install/lib/python2.5/site-packages/sugar/bundle/activitybundle.py", line 113, in read_manifest
    if not self.is_file(line):
  File "/home/tomeu/sugar-jhbuild/install/lib/python2.5/site-packages/sugar/bundle/bundle.py", line 122, in is_file
    path = os.path.join(self._path, filename)
  File "/usr/lib/python2.5/posixpath.py", line 65, in join
    path += '/' + b
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 34: ordinal not in range(128)

Attachments

0001-Filesystem-paths-should-be-byte-arrays-not-unicode.patch (2.1 kB) - added by cscott 6 years ago.
Two-line patch which should fix the problem.
0001-Trac-7733-fix-severe-performance-regression-when-c.patch (6.4 kB) - added by cscott 6 years ago.

Change History

  Changed 6 years ago by cjb

  • keywords blocks?:8.2.0 added
  • priority changed from high to blocker

This looks like a serious regression, bumping to blocker.

follow-up: ↓ 3   Changed 6 years ago by gregorio

  • cc gregorio, jg, cjb added
  • keywords blocks-:8.2.0 added; blocks?:8.2.0 removed

Hi Chris and Tomeu,

If this only affects this wikipedia slice, I can't make it a release blocker. Do you know if it affects other .xo bundles?

If Wikipedia-10.xo used to work fine in 70x, then we should fix the root issue in the OS. If activities need to be updated to resolve this, we should make sure to document what they need to do and get the word out ASAP.

Regardless, I think we should fix it if we can. Do it if you can and if you need help prioritizing something else in or out, let me know what else is on your plate.

Thanks,

Greg S

in reply to: ↑ 2   Changed 6 years ago by cjb

Hi Greg,

Replying to gregorio:

If this only affects this wikipedia slice, I can't make it a release blocker. Do you know if it affects other .xo bundles?

It looks like we would crash on any .xo with a Unicode filename in. (I disagree with not making Sugar regressions like this one release blockers. Just because something only seems to affect one activity now, it still needs to be evaluated separately for how serious it is.)

If Wikipedia-10.xo used to work fine in 70x, then we should fix the root issue in the OS. If activities need to be updated to resolve this, we should make sure to document what they need to do and get the word out ASAP.

Yeah, I think we definitely want to update the OS rather than activities here.

  Changed 6 years ago by jg

I agree with Chris here; it seems to me that we need to fix this....

  Changed 6 years ago by marco

  • owner changed from marco to tomeu

  Changed 6 years ago by kimquirk

  • keywords relnote added

Don't believe this is a regression. If not, is it not a blocker. Can someone provide information that this worked (or didn't) in a previous release.

Also, there is a workaround to remove non-ascii characters - release note.

  Changed 6 years ago by cjb

  • cc mstone added
  • keywords blocks?:8.2.0 added; blocks-:8.2.0 removed

Hi,

Don't believe this is a regression.

Why not? I asserted that it is a regression above.

Can someone provide information that this worked (or didn't) in a previous release.

Yes, I believe the Wikipedia activity was installable in 70x.

Also, there is a workaround to remove non-ascii characters - release note

I suspect it would take more than a day of work to rename Wikipedia's several thousand files to non-Unicode somehow (what about characters that don't *have* an ASCII equivalent?) and to update the HTML links to those files everywhere somehow. I'm not going to be doing that.

I'd like to return this to blocks?:8.2.0, on the grounds that the assumption that it's not a severe regression is unsupported.

  Changed 6 years ago by cscott

Well, you can workaround it by removing the MANIFEST file entirely. Sugar will complain, but the activity will work.

On the other hand, it's not true that Wikibrowse is the only activity with non-ASCII filenames: the library bundles are full of them. This used to be a big problem in olpc-update before I fixed that to handle UTF-8 correctly. I believe it has been "policy" since that time that all filenames on our system should be valid UTF-8. So this seems like a system regression, and one that affects activity authors.

  Changed 6 years ago by mstone

Cjb and I disagree about whether this should block the release. However, we'd both be happy if someone (e.g. Tomeu) looked into it briefly.

  Changed 6 years ago by cscott

I believe the regression occurred in this commit, landed on 2008-06-13: http://dev.laptop.org/git?p=sugar-toolkit;a=commitdiff;h=cfdc17d6c936f2570b98c9978f493fca18d8276f

Reverting this should be fairly safe.

  Changed 6 years ago by marco

Huh no, we cannot revert that one. We will look into the problem.

  Changed 6 years ago by cscott

The root cause seems to be that bundle_path is a unicode string in the top three elements of that traceback. This gets stored into self._path and much much later (at the bottom of the traceback) when os.path.join(u'somestring', 'someotherstring') is called (first arg a unicode, second arg a nonunicode), python implicitly tries to convert the non-unicode string to unicode using the ASCII encoding and dies horribly.

Wasn't a problem before the aforementioned commit, because we didn't actually try to do anything with the MANIFEST contents pre-commit.

I *think* the proper solution is just to declare the relevant dbus function as taking a bytestring rather than a unicode string, since paths are just sequences of bytes (by convention utf-8 encoded), *not* sequences of unicode characters.

  Changed 6 years ago by marco

Note that this code is run inside the shell service, which doesn't import gtk and hence doesn't default to UTF-8 as encoding (most of the sugar code does).

>>> import os
>>> os.path.join('a', unicode('ààà'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 0: ordinal not in  range(128)
>>> import gtk
>>> os.path.join('a', unicode('ààà'))
u'a/\xe0\xe0\xe0'

I haven't verified if this is relevant at all given your explanation, but I thought I'd better mention it.

  Changed 6 years ago by marco

If we want to go for a quick/non risky fix for 8.2.0 we can revert the bit of the patch which does manifest checking (it looks like it's only going to emit a warning anyway).

follow-up: ↓ 21   Changed 6 years ago by cscott

marco: nope, not relevant. The UTF-8 forcing is being done by dbus; see http://dbus.freedesktop.org/doc/dbus-specification.html#message-protocol-signatures which specifies only UTF-8 encoding for type STRING, and http://dbus.freedesktop.org/doc/dbus-python/doc/tutorial.html#byte-arrays-and-utf8-strings which discusses how the python bindings will decode the UTF-8 and give you a unicode by default unless you pass in the utf8_strings=True keyword option (in which case you get a UTF-8 encoded 'str' object, which is what we want). Another alternative would be to declare all sugar methods which take a filename as taking a byte array as argument (which the python binding will also return as a 'str'), but I think we can live with a UTF-8 restriction on filenames (although it's worth noting that I had to fix the olpc-update manifest format once upon a time to *not* assume valid UTF-8, because our library content was rife with misencoded filenames).

I'm working on a patch.

Changed 6 years ago by cscott

Two-line patch which should fix the problem.

  Changed 6 years ago by cscott

I thought of trying to convert the signature of 'filesystem path' arguments to 'ay', but it seems that the python dbus bindings make it difficult to call methods with arguments of this type; there is no automatic conversion from 'str' to 'ay', as far as I could tell from the documentation. So it seems we'll have to live with the 'filenames must be utf-8 encoded' limitation, which is kinda problematic: see #6799, #6777, #6606, #3498 for some previous times when encoding assumptions have bitten us (and I wasn't able to find the bug I was actually looking for in trac, which is when olpc-update was bitten by this same utf-8 assumption).

  Changed 6 years ago by cscott

Also, from my brief perusal, the datastore may have the same sorts of problems. We should really audit all our dbus services which take filenames as arguments and ensure that they decoding the utf8. The really problematic cases seem to be the ones where we are passing dictionaries through dbus with some keys whose values are filenames. These are all going to be converted to UTF-8 for passage through dbus (since they are converted to dbus type STRING, which is UTF8) which will fail if they have invalid UTF-8 characters in them. We really should be manually marshalling all filename arguments and converting them to dbus.ByteArray for passage over the dbus interface, but this would be a very invasive change.

  Changed 6 years ago by cscott

For a concrete example of the danger of assuming UTF-8 encoded filenames, see #6777. Since Spanish can be represented comfortably in ISO-8859-1, the Peru deployment created a poetry bundle with the bundle name 'poesía-a.xol', which is logical -- except they encoded 'poesía' in ISO-8859-1, which is invalid UTF-8 (the UTF-8 encoding is different for the 'í') and so "library stuff broke". If we continue to assume UTF-8 encoding of filenames sent over dbus, we will continue to have "stuff break" when naive users create files named in their native language on non-XO systems and then move those files/bundles to the XO.

All that said; that's probably a separate bug. I'm pretty sure the two-line patch above will address the problem sufficiently to prevent a regression in 8.2.0.

  Changed 6 years ago by cscott

  • keywords r? added

  Changed 6 years ago by marco

  • next_action changed from diagnose to review

in reply to: ↑ 15   Changed 6 years ago by cscott

Replying to cscott:

marco: nope, not relevant. The UTF-8 forcing is being done by dbus

Rereading (sorry I didn't read more carefully the first time), you were/are correct: the implicit UTF-8 encoding which gtk seems to force is probably responsible for masking most occurrences of this problem. So that gives us some confidence that fixing the few cases where we are *not* importing gtk (these dbus services, and probably the datastore), will suffice for now. But we really shouldn't be passing unicode strings to filename-manipulation methods...

  Changed 6 years ago by cscott

Hmm, there's a major performance problem with:

from sugar.bundle.activitybundle import ActivityBundle
ab = ActivityBundle('Wikipedia-10.xo')

This takes many minutes, even on my fast Thinkpad. On an XO, it's indistinguishable from a hang. =(

  Changed 6 years ago by cscott

Profiling the code reveals that we spend 588 seconds (on my fast Thinkpad) in sugar.bundle.bundle.is_file(). It apparently takes a few seconds to load the ZipFile in line 125, and we repeat this operation (unnecessarily) for each one of the 3,176 files in the bundle. This is unacceptable, and is about the least efficient way possible to check the (otherwise unused) MANIFEST file.

Commenting out 'self.read_manifest()' on line 69 of sugar.bundle.activitybundle.py fixes the performance problem (the code above completes in 4 seconds on an XO) and allows sugar-install-bundle of Wikipedia-10.xo to succeed. This is probably the smallest possible patch for this regression.

  Changed 6 years ago by cscott

Attached is a patch which fixes the performance problem; creating an ActivityBundle for Wikipedia-10.xo takes 5-6 seconds now on an XO.

  Changed 6 years ago by erikos

As nice as the performance win looks in the testcase:

 from sugar.bundle.activitybundle import ActivityBundle
 ab = ActivityBundle('Wikipedia-10.xo')

it has problems in real live:) You have self._unpacked in activitybundle and contentbundle for example but even making those self._packaged does not work. Would need to investigate more.

  Changed 6 years ago by cscott

Yes, I just noticed that I forgot to make corresponding changes in activitybundle and contentbundle. I'll update the patch.

  Changed 6 years ago by cscott

Updated the patch; this one actually works. =)

BTW, the way trac handles updated attachments sucks. There's no record in the history of the old version of this attachment -- which I guess is good, since my mistakes will be lost to history. =)

follow-up: ↓ 30   Changed 6 years ago by tomeu

  • keywords r+ added; r? removed

Both patches look pretty good to me, just two comments:

  • Releasing the zip file in del() will delay the actual release of the instance in case of a cycle, is this a problem in this case?
  • _packed reads like a flag to me, I'd prefer if this variable was named _zip_file instead.

Anyway, r+

  Changed 6 years ago by cscott

Pushed the two-line patch to sugar git.

in reply to: ↑ 28   Changed 6 years ago by cscott

Replying to tomeu:

- Releasing the zip file in del() will delay the actual release of the instance in case of a cycle, is this a problem in this case?

It's hard to imagine how we'd create a cycle with a ZipFile. I suppose we could pass it a file-like object which holds a reference to the ZipFile it was passed to, but that would be very broken. del should not be a problem.

- _packed reads like a flag to me, I'd prefer if this variable was named _zip_file instead.

I'll make that change before committing.

  Changed 6 years ago by cscott

  • next_action changed from review to package

Ok, pushed the performance regression patch, with Tomeu's _zip_file change, to sugar-toolkit git.

  Changed 6 years ago by marco

Scott, this one needs to go on the sucrose-0.82 branch.

  Changed 6 years ago by erikos

  • owner changed from tomeu to cscott

This missed the last rpm boat, scott can you land this in sucrose-0.82?

  Changed 6 years ago by cscott

Ok, cherry-picked these patches into sucrose-0.82 branch.

  Changed 6 years ago by marco

  • next_action changed from package to test in build

  Changed 6 years ago by cjb

Is this #8211?

  Changed 6 years ago by marco

Oh! Yeah I think you are probably right cjb... I didn't think about it because this one was initially about the unicode problem, then we discovered also the perf issue and Scott fixed it.

  Changed 6 years ago by cjb

  • keywords blocks:8.2.0 added; blocks?:8.2.0 relnote r+ removed

We believe this is fixed. I'll try to test 8.2-757 with it.

  Changed 6 years ago by cscott

I don't think these fixes made it into 757, but they should make it into 758.

  Changed 6 years ago by marco

  • next_action changed from test in build to add to build

Yeah they are not in 757, moving to "add to build".

  Changed 6 years ago by cscott

Can someone be more specific about what version of sugar-toolkit has this fix? I need enough information to write a proper changelog for the stable stream.

  Changed 6 years ago by cscott

  • next_action changed from add to build to add to release

  Changed 6 years ago by marco

The packages we need in 8.2 are the following (I tagged them -testing):

sugar-toolkit-0.82.5-1.olpc3 sugar-0.82.2-1.fc9

The fixes landed one in sugar-0.82.1, the other in sugar-toolkit 0.82.2

  Changed 6 years ago by cscott

  • next_action changed from add to release to test in release

Committed to stable repo; should be in build 758 and following; please test.

  Changed 6 years ago by sj

  • blocking 8211 added

  Changed 6 years ago by dsd

I'm not sure what needs testing here. The fact that Wikipedia-10 wouldn't install, or the fact it takes a while to install?

Either way, I downloaded and installed Wikipedia-10 on 8.2-759. After the download completed, it took 1-2 minutes to load the first time I started it from the Journal. It installed and started successfully.

  Changed 6 years ago by marco

The fact that it takes a while to install (failure was tested as part of the first patch landing already). I don't actually know if 1-2 minutes is what we expect here.

  Changed 6 years ago by cjb

Sounds plausible, since it's 100M split between several thousnd files in a .zip that jffs2 has further compressed. As long as we aren't freezing up hard or crashing, I'm happy.

  Changed 6 years ago by mchua

  • owner changed from cscott to frances

|Test case|

On an XO running 8.2-760 (or 761)...

  1. Open Browse and point it to http://dev.laptop.org/~cjb/eswiki/Wikipedia-10.xo
  2. Download the file and then start it (either from Browse or from the Journal)

Pass criteria: The test passes if the Activity launches and displays correctly (there may be a 1-2 min. load time delay) and the links, images, etc. within it seem to generally work.

reassigning to frances to run this test case.

  Changed 6 years ago by mchua

  • status changed from new to closed
  • resolution set to fixed

Test passes. Closing ticket.

Note: See TracTickets for help on using tickets.