Ticket #7587 (closed defect: fixed)

Opened 6 years ago

Last modified 6 years ago

Sugar should start even if NAND is full (or read-only)

Reported by: cscott
Owned by: kimquirk
Priority: blocker
Milestone: 8.2.0 (was Update.2)
Component: sugar
Version: not specified
Keywords: blocks:8.2.0 r+
Cc: gregorio, mstone, marco
Action Needed: qa signoff
Verified: no
Deployments affected:
Blocked By: #317
Blocking: #7125

Description

Currently sugar crashes when it tries to write to ~olpc/.boottime. There may be other places where sugar fails if it tries to write to a file and cannot; it should not fail.

Attachments

sugar.patch (2.4 kB) - added by cscott 6 years ago.
Patch to 'sugar' package: remove session.info; make log rotation non-fatal.
sugar-base.patch (2.7 kB) - added by cscott 6 years ago.
Patch to 'sugar-base' package: make logging non-fatal; allow env var override of log directory.
sugar-toolkit.patch (1.1 kB) - added by cscott 6 years ago.
Patch to 'sugar-toolkit' package: make creation of log file non-fatal; allow env var override of log directory
rainbow.patch (1.1 kB) - added by cscott 6 years ago.
Patch to 'rainbow' package: make creation of log file non-fatal.

Change History

Changed 6 years ago by cscott

  • blocking 7125 added

(In #7125) Here's a list of tasks associated with this general bug, and trac #s for them:

  • the initscripts should be sure to unfreeze the dcon if/when X fails to start. This ensures that the system is obviously recoverable (you can recover by rebooting with the check key held down, but this is not obvious!). (#7586)
  • sugar should, ideally, start even if flash is full. It is currently failing when writing to ~olpc/.boot_time or some such, and crashing. (#7587)
  • once sugar starts, there should be a message indicating that the NAND is critically full. (#7588)
  • trying to save new content to the journal should also give an obvious message that the NAND is full. (#7589)
  • removing content from the journal should work even if NAND is full. (#7590)
  • automatically remove content from the journal if NAND is full? (controversial) (#7591)
  • Jffs2 is slow when it fills/root should have reserved space (#5317)

Changed 6 years ago by gregorio

  • cc gregorio added
  • keywords blocks:8.2.0 added
  • priority changed from normal to blocker

Changed 6 years ago by erikg

  • owner changed from marco to erikg
  • status changed from new to assigned

Changed 6 years ago by erikg

Plan:

  • On boot, register that the flash device is nearly full.
  • Union-mount a tmpfs on top of a read-only root filesystem.
  • Instead of booting into Sugar proper, boot into a recovery-mode user interface which asks the user to pick which Activities and/or files they would like to remove. Allow reboot from this interface.

If the user doesn't remove the large files then they end up back at the removal interface on the next boot.

This pattern will not resolve the problem if the NAND fillup is related to system-level files (e.g. log files). In that case the issue is our responsibility and not the kids', and we'll just have to fix such problems as they arise.
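
The first step of the plan (registering that the flash is nearly full) could be sketched roughly as follows. This is only an illustration: the function names and the 5% threshold are assumptions, not anything from the actual boot scripts.

```python
import os


def free_fraction(path="/"):
    """Return the fraction of the filesystem holding `path` that is
    still available to unprivileged users (0.0 to 1.0)."""
    st = os.statvfs(path)
    if st.f_blocks == 0:
        # Pseudo-filesystems can report zero blocks; treat them as full.
        return 0.0
    return st.f_bavail / st.f_blocks


def is_nearly_full(path="/", threshold=0.05):
    """True if less than `threshold` of the filesystem is free.
    The 5% default is an assumed value, chosen only for illustration."""
    return free_fraction(path) < threshold
```

A boot script could call is_nearly_full("/") and choose the recovery interface instead of Sugar proper when it returns True.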

Changed 6 years ago by erikg

Plan for code-level modifications to enable boot of Sugar on ro fs:

I first removed the stanza which writes ~/.boot_time from olpc-session. Now X crashes on startup as it tries to write .Xauthority files on the jffs2 partition. These will have to be moved to a tmpfs (see #317).

Additionally, the Sugar shell logger must be modified to nullroute logs when the flash is full.
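
A minimal sketch of what "nullrouting" the logs could look like, assuming the shell uses Python's standard logging module. The helper name open_log_handler is hypothetical and this is not the code in the attached patches.

```python
import logging
import os


def open_log_handler(log_path):
    """Return a FileHandler for log_path, or a NullHandler if the log
    file (or its directory) cannot be created, e.g. on a full or
    read-only filesystem."""
    try:
        directory = os.path.dirname(log_path)
        if directory:
            os.makedirs(directory, exist_ok=True)
        return logging.FileHandler(log_path)
    except OSError:
        # Discard log records rather than crash at startup.
        return logging.NullHandler()
```

This only covers failure at the point where the log file is opened; it does not decide what to do if the flash fills up later while logging is already in progress.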

Changed 6 years ago by marco

  • next_action changed from never set to code

Changed 6 years ago by cscott

  • owner changed from erikg to cscott
  • status changed from assigned to new
  • next_action changed from code to review

Stealing this bug back.

Trac #317 has fixed the 'X won't start' problems. Now there are just a few issues causing sugar to fail. (Trac #7631 is for the larger issue of "Journal won't start with NAND full"; this bug is just for sugar and activities.)

3 patches are attached, to sugar, sugar-base, and sugar-toolkit.

Changed 6 years ago by marco

  • keywords r? added

Changed 6 years ago by cscott

Patch to 'sugar' package: remove session.info; make log rotation non-fatal.

Changed 6 years ago by cscott

Patch to 'sugar-base' package: make logging non-fatal; allow env var override of log directory.

Changed 6 years ago by cscott

Patch to 'sugar-toolkit' package: make creation of log file non-fatal; allow env var override of log directory

Changed 6 years ago by cscott

Patch to 'rainbow' package: make creation of log file non-fatal.

Changed 6 years ago by cscott

  • cc mstone, marco added

mstone, sugar folk: review, please?

Changed 6 years ago by cscott

With the above patches (and the fixes for #317), sugar starts successfully and displays activities even when the NAND is completely full. The journal doesn't start, but that's a separate bug (#7631). The functionality of sugar seems sufficient to allow journal cleanup either from the Terminal activity or from a special "clean up the journal" activity, as tomeu has suggested.

I'd prefer to make the Journal gracefully handle the "full NAND" case, but again, that's a separate bug.

Changed 6 years ago by mstone

rainbow-0.7.17 built and tagged with a variant of the patch you supplied. Please review and test.

Changed 6 years ago by cscott

I should have mentioned that the environment variable override for the log directory allows you to specify "export SUGAR_LOGS_DIR=/var/tmp/sugar-logs" in your ~olpc/.xsession for debugging failures in NAND-full situations. Normally you wouldn't want your logs in /var/tmp, because they'll be lost when you reboot -- but when you've intentionally filled your NAND for testing, logs in /var/tmp are very useful for finding out what's going wrong.
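
For readers who want the shape of the override without opening the patches, it amounts to something like the sketch below. The SUGAR_LOGS_DIR name is the one described above; the default path and the get_logs_dir helper are assumptions made for illustration.

```python
import os

# Assumed default location; the real default lives in the sugar packages.
DEFAULT_LOGS_DIR = os.path.expanduser("~/.sugar/default/logs")


def get_logs_dir(environ=os.environ):
    """Return the log directory, preferring SUGAR_LOGS_DIR when it is
    set to a non-empty value."""
    return environ.get("SUGAR_LOGS_DIR") or DEFAULT_LOGS_DIR
```

With SUGAR_LOGS_DIR=/var/tmp/sugar-logs exported in ~olpc/.xsession, get_logs_dir() returns /var/tmp/sugar-logs; otherwise it falls back to the default.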

Changed 6 years ago by cscott

  • blockedby 317 added

(In #317) In joyride-2208 (ohm 0.1.1-6.15.20080707git.olpc3, rainbow 0.7.16-1.fc9, and olpc-utils 0.81-1.olpc3). To test:

  • Open up the Terminal activity.
  • Type: "ls -a /var/tmp/olpc-auth". You should see .Xauthority, .ICEauthority, and .Xserverauth files.
  • Type: "echo $XAUTHORITY $ICEAUTHORITY $XSERVERAUTH". The locations of these files in /var/tmp should be reported.
  • Open up the Pippy activity. Type in the following short program, and press "Go":
    import os
    os.system('/bin/bash')
    
  • At the resulting shell prompt, repeat the above echo command. You should see locations for XAUTHORITY and friends which are in /home/olpc/isolation/<mumble>.
  • Type xdpyinfo. You should get some output (i.e., xdpyinfo should be able to use the XAUTHORITY setting to connect to the X server).
  • Close pippy.
  • In the Terminal activity, type: "cat /dev/urandom > /home/olpc/space-filler". This may take quite a while, but will eventually report "no space available". Congratulations, your NAND is full!
  • Reboot. X should start up and you should see the sugar home screen (beyond X startup, we've entered the territory of trac #7587).

Changed 6 years ago by marco

  • keywords r+ added; r? removed

I checked in all the sugar patches, with minor changes, thanks. Keeping the review action since rainbow.patch should be handled by Michael.

Changed 6 years ago by mstone

  • next_action changed from review to package

As I suggested above, rainbow-0.7.17 added comparable rainbow patches to joyride last night. Your turn, Marco!

Changed 6 years ago by cscott

  • next_action changed from package to test in build

Should be in joyride-2240 and following. Please test. (Test case above from #317 is applicable.)

Changed 6 years ago by erikos

  • next_action changed from test in build to qa signoff

Changed 6 years ago by mchua

  • owner changed from cscott to kimquirk

Kim wanted to test the NAND-full bugs, so reassigning to kimquirk here. Potentially helpful: http://wiki.laptop.org/go/Tests/Journal/Nand-full (Thanks for the help in understanding NAND-full, cjb and mstone!)

Changed 6 years ago by kimquirk

  • status changed from new to closed
  • resolution set to fixed

This is working very well in 8.2-767.
