Opened 4 years ago

Closed 3 years ago

#10140 closed task (worksforme)

check external device large journal display timing, 10.1.3 vs 11.2.0

Reported by: mikus Owned by: tomeu
Priority: normal Milestone:
Component: journal-activity Version: 1.5/1.0 Software Build os852 aka 10.1.2
Keywords: Cc:
Blocked By: Blocking:
Deployments affected: Action Needed: no action
Verified: no

Description

os121 I have a "permanent" SD card (with more than 50000 inodes in use on it) in my XO. Booted, went to Journal View, clicked on the SD device icon. Horizontal progress bar was displayed - it moved, but after a while it stopped. Nothing further happened. Took system dump. [Clicked on "self" icon (still in Journal View) - regular Journal content was displayed; system seemingly went back to normal operation.]

[Experienced same stall in Journal View with F11_on_XO-1 and a near-identical copy of the same SD card. Way back when, Journal View *was* able to display the content of my SD card. I have since then added files to the "permanent" SD card - so I don't know if it was the "more files", or a newer Journal version, that caused the present problem.]

Attachments (10)

Journstall.tar.bz2 (403.3 KB) - added by mikus 4 years ago.
os121 system dump while Journal stalled trying to display content of large SD card
Jour.png (24.6 KB) - added by mikus 4 years ago.
os852 xo1 Journal screenshot an hour after having clicked on icon for USB stick
logs.SHF8080225D.2010-09-03.13-16-38.tar.bz2 (334.2 KB) - added by mikus 4 years ago.
os852 xo1 System dump when Journal stalled trying to display cantent of USB stick
tmp.patch (1.7 KB) - added by Quozl 4 years ago.
change to the scan function to skip recursive symbolic links, and to avoid reporting most errors.
0001-fix-for-non-completion-of-scan-of-external-media-dev.patch (2.2 KB) - added by Quozl 4 years ago.
Mikus, please try this patch. It should cause the progress bar to be removed and the list of files should appear. The patch has to be applied to /usr/lib/python2.6/site-packages/jarabe/journal/model.py and Sugar restarted.
logs.SHF8080225D.2010-09-07.01-19-37.tar.bz2 (357.5 KB) - added by mikus 4 years ago.
os852 XO-1 - system dump while Journal trying to display volume containing bum symbolic link
model.py (18.5 KB) - added by mikus 4 years ago.
My file as patched for #10140
model.2.py (18.3 KB) - added by Quozl 4 years ago.
file patched by me
model.3.py (18.5 KB) - added by Quozl 4 years ago.
or try this one, a rewrite just posted for review to sugar-devel ... i've tried it on quite a few devices and it performs well even when some files on a disk are unreadable. see http://lists.sugarlabs.org/archive/sugar-devel/2010-September/026673.html for more detail.
10140.patch (7.4 KB) - added by erikos 4 years ago.
Rebased the patch on current head

Download all attachments as: .zip

Change History (41)

Changed 4 years ago by mikus

os121 system dump while Journal stalled trying to display content of large SD card

comment:1 Changed 4 years ago by mikus

os852 XO-1
Wanted to duplicate #10351 - plugged in USB stick, and in Journal clicked on that USB stick's icon. But "Journal View" stalled, without ever displaying the content of that USB stick (which happened to contain about 25000 files). I waited for a whole hour - Journal View was still showing the "progress bar".

Changed 4 years ago by mikus

os852 xo1 Journal screenshot an hour after having clicked on icon for USB stick

Changed 4 years ago by mikus

os852 xo1 System dump when Journal stalled trying to display cantent of USB stick

comment:2 follow-up: Changed 4 years ago by Quozl

Technical analysis; not a request for comment or a reply to the reported issue.

Reviewed the system dump. The dump is valid given the uptime 1:31; it is not post-reboot. The filesystem /media/CAVE is mounted. This system has ample swap space, so a memory depletion issue is unlikely. There was nothing in dmesg to suggest a memory depletion issue. The sugar-session process (2139) showed a VmPeak of 60328 kB, compared with a typical startup VmPeak value of 44268 kB.

sugar/shell.log shows a recursion loop may have been entered; the media contains a path, of /media/CAVE/olpc/isolation/1/uid_to_home_dir/10015/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/instance/tmp for which permission was denied.

sugar/shell.log first reference to /media/CAVE was at 1283514412, and the last timestamp in the log was at 1283519777, an offset of 89 minutes, which is entirely consistent with the uptime.

Diagnosis: the journal is continuing to read the media while exploring an infinite filesystem loop caused by symbolic links.

Interim conclusion: the journal scan logic should be taught to avoid infinite loops. If there is logic already present, then it is failing to trigger on the subject filesystem. The journal scan logic might also be taught to avoid scanning filesystems that contain more than a usual amount of files.

Reviewed the earlier system dump from four months ago. No evidence of filesystem infinite loop, but sugar/shell.log does show permission denied and no such file or directory errors reading the target media.

Alternate diagnosis: any error reading the target media might cause this symptom. This is testable.

comment:3 in reply to: ↑ 2 Changed 4 years ago by mikus

Replying to Quozl:

Diagnosis: the journal is continuing to read the media while exploring an infinite filesystem loop caused by symbolic links.

The Journal may have gotten into a loop -- but I think it really shouldn't have.

/media/CAVE is my "grab bag" USB stick, onto which a year and a half ago I happened to have saved the then content of /home/olpc (which included the '/home/olpc/isolation' directory maintained by Rainbow).

Following up on what you found in shell.log: 'ls' shows there exists in directory /media/CAVE/olpc/isolation/1/uid_to_home_dir/10015 the symbolic link "instance -> ." (which must have been created by whichever Activity was run under ID 10015, or created by Rainbow).

I'm going by the fact that 'ls -lRL' handles /media/CAVE/olpc/isolation/1/uid_to_home_dir/10015; so do 'du -L' and 'find'. If those commands aren't bothered by the bum symbolic link, Journal should be able to handle it, also.


If I remember correctly, when I went into a text console and looked at the output of 'top', I did NOT see anything that made me think that Journal (i.e., 'python') were still using CPU after the initial dozen seconds. The absence of CPU use by Journal makes me suspect that Journal completed its scan of the device, or else got an abnormal termination of that scan -- but NEVER switched its display away from the "progress bar" screen meant to indicate "working on the /media/CAVE content".

comment:4 Changed 4 years ago by Quozl

I've reproduced the problem using a USB HDD that contains an Ubuntu 10.04 for XO-1.5 filesystem. The progress bar does not go away.

Although this filesystem contained recursive symbolic links and generated errors during the scan, fixing those problems (using the attached temporary demonstration patch) did not solve the problem of the progress bar staying on screen.

So I can't say the problem is caused by the filesystem errors. There is another cause.

However, because of the errors, the scan is really really slow. My filesystem had 73386 files. With the errors being reported in the original code, the scan had not completed after half an hour. Without the errors, the scan is complete within a few seconds.

Changed 4 years ago by Quozl

change to the scan function to skip recursive symbolic links, and to avoid reporting most errors.

comment:5 follow-up: Changed 4 years ago by Quozl

Found the problem. The _pending_directories count is damaged by the errors, the ready signal is never emitted, and I've derived a fix. I'm going to work on a patch.

Changed 4 years ago by Quozl

Mikus, please try this patch. It should cause the progress bar to be removed and the list of files should appear. The patch has to be applied to /usr/lib/python2.6/site-packages/jarabe/journal/model.py and Sugar restarted.

comment:6 in reply to: ↑ 5 ; follow-up: Changed 4 years ago by mikus

Replying to Quozl:

Mikus, please try this patch.

os852 XO-1 Tried the patch with the same /media/CAVE USB stick as before. It did not work for me - Journal is still "stalled" when trying to display that device (after many minutes, the "progress" screen is still being displayed).

shows process 'Terminal' being called from time to time (associated with a redisplay of the slider on the "progress" screen ?) -- but no CPU being used by Journal (nor by 'python').?


BTW, I'm not familiar with tools to handle .patch files. Am enclosing the /usr/lib/python2.6/site-packages/jarabe/journal/model.py file that I installed.

Changed 4 years ago by mikus

os852 XO-1 - system dump while Journal trying to display volume containing bum symbolic link

Changed 4 years ago by mikus

My file as patched for #10140

comment:7 in reply to: ↑ 6 ; follow-up: Changed 4 years ago by Quozl

Replying to mikus:

BTW, I'm not familiar with tools to handle .patch files. Am enclosing the /usr/lib/python2.6/site-packages/jarabe/journal/model.py file that I installed.

That file would not have fixed the problem, the reference counter is not fixed.
Please try the file I attach next, model.py with md5sum d1da1ef89ad50f61edf80fe93defeb91.

Changed 4 years ago by Quozl

file patched by me

Changed 4 years ago by Quozl

or try this one, a rewrite just posted for review to sugar-devel ... i've tried it on quite a few devices and it performs well even when some files on a disk are unreadable. see http://lists.sugarlabs.org/archive/sugar-devel/2010-September/026673.html for more detail.

comment:8 in reply to: ↑ 7 ; follow-up: Changed 4 years ago by mikus

Replying to Quozl:
[note: model.2.py *did* display the files on my /media/CAVE USB stick]

try this one

XO-1 os852 (sugar 0.84) - model.3.py looks good - it displayed the files on my /media/CAVE USB stick - it also has a nice "progress slider" visual.

[BTW, I notice that the model.py that comes with sugar 0.88 (in Paraguay build) is different, and is different again with sugar 0.89+ (in F14 build).]

comment:9 in reply to: ↑ 8 Changed 4 years ago by Quozl

  • Milestone changed from Not Triaged to 10.1.3

Replying to mikus:

Replying to Quozl:

try this one

XO-1 os852 (sugar 0.84) - model.3.py looks good - it displayed the files on my /media/CAVE USB stick - it also has a nice "progress slider" visual.

Thanks for testing.

[BTW, I notice that the model.py that comes with sugar 0.88 (in Paraguay build) is different, and is different again with sugar 0.89+ (in F14 build).]

Yes, it was changed between 0.84 and 0.88, and then after 0.88. But for 10.1.3 the fix would have to be in 0.84 form. After the 0.84 fix is written, it can be fixed in 0.88 or 0.89+. I was told that part of the problem was fixed already in the latest development version, but that leaves other parts of the problem still there.

comment:10 Changed 4 years ago by Quozl

Further rewrites:

Just for the record, no testing requested yet.

comment:11 Changed 4 years ago by Quozl

  • Action Needed changed from never set to package
  • Version changed from Development build as of this date to 1.5/1.0 Software Build os852 aka 10.1.2

Patch for 0.84 reviewed by Sascha, proposed to Simon for merging.
http://lists.sugarlabs.org/archive/sugar-devel/2010-September/026818.html

comment:12 follow-up: Changed 4 years ago by Quozl

Separate patch for HEAD proposed.

comment:13 Changed 4 years ago by martin.langhoff

Great!

comment:14 in reply to: ↑ 12 Changed 4 years ago by erikos

Replying to Quozl:

Separate patch for HEAD proposed.

James, did this land in master already? If not, what rests to be done - 'just' a review?

comment:15 Changed 4 years ago by Quozl

Patch for master was proposed upstream but there was no interest. I've checked git, and it hasn't been merged.

comment:16 Changed 4 years ago by Quozl

  • Action Needed changed from package to add to build

I've checked git again, and the patch to master still hasn't been merged, so I've reposted it.

comment:17 Changed 4 years ago by martin.langhoff

  • Cc erikos added

Simon - should we consider this for inclusion? Or bump-out?

comment:18 Changed 4 years ago by mikus

Except for the latest 10.1.3 builds (which changed the base model.py), I've been running with Quozl's patch. It solves the problem for me of Journal View getting stuck when trying to display my (multi-GB) SD card content.

For people with smaller storage devices, the patch "prettifies" the progress indicator (rather than having it become "stuck").

I recommend inclusion.

comment:19 follow-up: Changed 4 years ago by Quozl

  • Milestone changed from 10.1.3 to 10.1.4

My patch for 0.84 was not accepted three months ago on the grounds that it was not in master. It still isn't in master, no maintainer has taken it. So until it is in master there's no point considering it for inclusion in 0.84; because we have an announced policy of not including any patch in 0.84 unless it is in master first.

However, this policy wasn't followed with patch 6876a4f which conflicts with mine, so I'm seeing mixed messages. I can't see how I can contribute to Sugar in a structured and meaningful way. It is very confusing and discouraging.

The merge was trivial, but there has not been the testing of the resultant source that there was for the patch, so I'm suggesting this be fixed in a later version. After all, who uses external media? Our target learners generally can't afford it.

comment:20 in reply to: ↑ 19 Changed 4 years ago by erikos

Replying to Quozl:

My patch for 0.84 was not accepted three months ago on the grounds that it was not in master. It still isn't in master, no maintainer has taken it. So until it is in master there's no point considering it for inclusion in 0.84; because we have an announced policy of not including any patch in 0.84 unless it is in master first.

It has been accepted in Master, both Silbe and myself reviewed it, and I asked you to push it on the mailing list and another time on IRC. So I don't agree with your arguments here.

http://lists.sugarlabs.org/archive/sugar-devel/2010-December/029114.html

comment:21 Changed 4 years ago by erikos

  • Milestone changed from 10.1.4 to 10.1.3

Changed 4 years ago by erikos

Rebased the patch on current head

comment:22 Changed 4 years ago by erikos

Pushed as: http://git.sugarlabs.org/sugar/mainline/commit/c7f625e951df6b679092058b267e86c77b9f04a8

James, please make sure to push it as well on Master.

comment:23 Changed 4 years ago by Quozl

My apologies, but there appear to be conflicting expectations.

http://wiki.sugarlabs.org/go/Development_Team/Code_Review says "After your patch has been acknowledged by a maintainer, (s)he will push your code to the repository, including any tags (Reviewed-By:, Tested-By:, etc.) your patch got." This method has worked fine for me.

http://wiki.sugarlabs.org/go/Development_Team/Source_Code says the maintainers of the sugar module are Aleksey Lim, Martin Dengler, and yourself.

I'm not a maintainer. So you are delegating to me. I'll give it a try, but I'm not sure if there's anything else I should do apart from "git push" and update Patchwork.

In future, how am I to know whether I should commit or not?

comment:24 Changed 4 years ago by Quozl

Right, I'm very unhappy ... and convinced I should not have done it.

git checkout origin/master
git checkout -b 10140-master origin/master
git am patchwork.mbox
git push origin master

Result was

Counting objects: 138, done.
Delta compression using up to 2 threads.
Compressing objects: 100% (55/55), done.
Writing objects: 100% (59/59), 9.81 KiB, done.
Total 59 (delta 26), reused 23 (delta 4)
remote: This method is deprecated and will be removed on the next release. Use 'publish' instead
remote: => Syncing Gitorious... [OK]
To gitorious@git.sugarlabs.org:sugar/mainline.git
   1d069eb..ded807d  master -> master

and somehow 20 commits were pushed instead of the one I had hoped to push!

http://git.sugarlabs.org/sugar/mainline

comment:25 Changed 4 years ago by erikos

  • Action Needed changed from add to build to test in build

Fixed another regression introduced by the merge: 7ffba033a5dc4c51cb5a9d01a96c167a3f217622

I will do another build. Will be in 359.

comment:26 follow-up: Changed 4 years ago by Quozl

I think you fixed it in fc7fab12b8f3ec5c380aa5e6b2cb5b3fc0de2f9c but with the commit message you had intended 7ffba033a5dc4c51cb5a9d01a96c167a3f217622.

Change also pushed to master.
http://git.sugarlabs.org/sugar/mainline/commit/c4faafd51c63bf19e14f23b710d310269dff4b07

For my notes, the git commands I used in previous reply were wrong, since I had patches in my master branch that were not meant to be pushed, and the correct commands were:

git checkout -b for-upstream origin/master
git am patchwork.mbox
git push origin for-upstream:master

I'm told this might also be reduced to:

git clone ...
git am patchwork.mbox
git push

comment:27 in reply to: ↑ 26 Changed 4 years ago by erikos

Replying to Quozl:

I think you fixed it in fc7fab12b8f3ec5c380aa5e6b2cb5b3fc0de2f9c but with the commit message you had intended 7ffba033a5dc4c51cb5a9d01a96c167a3f217622.

Change also pushed to master.
http://git.sugarlabs.org/sugar/mainline/commit/c4faafd51c63bf19e14f23b710d310269dff4b07

Thanks, finally :)

comment:28 Changed 3 years ago by Quozl

  • Action Needed changed from test in build to no action
  • Resolution set to fixed
  • Status changed from new to closed

Tested in os20, is fixed.

(Still the process trauma persists and I haven't tried any serious Sugar patches since. Many of my process questions remain unanswered.)

comment:29 Changed 3 years ago by mikus

I agree that in os20 (F14) the principal problem - the progress bar getting stuck - has been fixed.

But if I remember correctly, the patch that Quozl came up with allowed the Journal View of my SD card to be generated within a "reasonable" amount of time (say, less than 25 seconds). On os20, the Journal View of my SD card seems to take an "unreasonable" amount of time to generate (say, more than 50 seconds).

comment:30 Changed 3 years ago by Quozl

  • Action Needed changed from no action to test in build
  • Cc erikos removed
  • Milestone 11.2.0-M3 deleted
  • Resolution fixed deleted
  • Status changed from closed to reopened
  • Summary changed from External device large Journal does not display to check external device large journal display timing, 10.1.3 vs 11.2.0
  • Type changed from defect to task

My test was with a 160GB USB HDD containing a bootable Ubuntu build for XO-1.5, so it involved quite a lot of files. I did not test for response time regression compared to 10.1.3, and that's a good idea.

comment:31 Changed 3 years ago by Quozl

  • Action Needed changed from test in build to no action
  • Resolution set to worksforme
  • Status changed from reopened to closed

Tested.

10.1.3 os860180 sec
11.2.0 os20177 sec

No response time regression. 11.2.0 is as fast as 10.1.3 at this task. The fix was actually in 10.1.3.

Test set was the aforementioned 160GB USB HDD, 4.1GB used, 135GB free, 133581 lines from find -ls reported in 49 sec.

Mikus, either your card has changed over time (they do that), or the patch you tried was one of my earlier faulty ones.

Note: See TracTickets for help on using tickets.