Opened 7 years ago

Closed 6 years ago

Last modified 3 years ago

#5317 closed defect (wontfix)

JFFS2 should reserve space for root to prevent "the slows"

Reported by: tomeu Owned by: dsaxena
Priority: blocker Milestone: 12.1.0
Component: kernel Version:
Keywords: Cc: dwmw2, mstone, dilinger, jg, dgilmore, gnu, dsaxena, gregorio
Blocked By: Blocking: #7125
Deployments affected: Action Needed: design
Verified: no

Description

I copied a 240 MB file from a USB stick to NAND using the Journal.

After rebooting, I get these messages, followed by a stack trace:

Hello, (children of the) world!

JFFS2 error: (618) jffs2_build_inode_pass1: child dir "fcbb1d630d0ffe62571adc20e91d5ac9" (ino #231628) of dir ino #359 appears to be a hard link

JFFS2 notice: (618) ... orphans ...

Traceback (most recent call last):
  File "/init", line 124, in <module>
    lease_writer, run_init)
  File "/antitheft.py", line 30, in run
    return run_init_callback()
  File "/init", line 105, in run_init
    current = frob_symlink(boot_backup)
  File "/initutil.py", line 215, in frob_symlink
    os.symlink('pristine/'+current, '/sysroot/versions/running')
OSError: [Errno 28] No space left on device

Note that the process that filled the fs was running as the olpc user.

Attachments (1)

0001-JFFS2-add-reserved-pool-feature.patch (11.3 KB) - added by dedekind 7 years ago.
Reserved pool patch


Change History (21)

comment:1 Changed 7 years ago by cscott

  • Cc dwmw2 added
  • Component changed from distro to upgrade utility
  • Milestone changed from Never Assigned to Update.1
  • Owner changed from jg to cscott
  • Priority changed from normal to high

This seems to be a buglet in olpcrd. I think we currently remove and then recreate the 'running' symlink; I should probably just leave it alone if it is already correct. I'll have to think about how to handle the alt-boot case with a full disk.
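A minimal sketch of the "leave it alone if it is already correct" idea, assuming the link target and path seen in the traceback (`'pristine/'+current` and `/sysroot/versions/running`). The helper name `ensure_symlink` is hypothetical; this is not olpcrd's actual `frob_symlink`.

```python
import os


def ensure_symlink(target, link_path):
    """Point link_path at target, touching the filesystem only if needed.

    Hypothetical helper. Returns True if the link was (re)created and
    False if it already pointed at target.
    """
    try:
        if os.readlink(link_path) == target:
            # Already correct: no unlink/symlink happens, so nothing new is
            # written to flash and a full filesystem cannot make this fail.
            return False
        # Wrong target: remove it so it can be recreated below.
        os.unlink(link_path)
    except FileNotFoundError:
        pass  # no existing link; fall through and create it
    # Note: if link_path exists but is not a symlink, os.readlink raises
    # OSError (EINVAL); that case is deliberately left to the caller.
    os.symlink(target, link_path)
    return True
```

In the common boot path the link is already correct, so the call becomes a read-only check, e.g. `ensure_symlink('pristine/' + current, '/sysroot/versions/running')`.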

ext2 reserves some percentage of total space for the root user. Does jffs2 not do that?

comment:2 Changed 7 years ago by mstone

  • Cc mstone added

Changed 7 years ago by dedekind

Reserved pool patch

comment:3 in reply to: ↑ description Changed 7 years ago by dedekind

Replying to tomeu:

> I copied a 240 MB file from a USB stick to NAND using the Journal.
>
> After rebooting, I get these messages, followed by a stack trace:

We solved this on the N800 with the attached patch.

Just mount JFFS2 with -o rpsize=8000 and it will reserve _about_ 8 MB of space for root. If you want other UIDs/GIDs to be able to use that space, add them too (see the patch's header).

Because JFFS2 becomes dead slow when it has no free space, I recommend a reserved pool of around 8-16 MB. Note that slowness when space runs out is normal for any flash filesystem, because it has to do a lot of garbage collection under those conditions.
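A simplified model of what the reserved pool does, assuming `rpsize` is given in KiB (as the patch header quoted in comment 5 says) and that only the configured `rpuid`/`rpgid` (root by default) may dip into the pool. The function names and byte-level accounting are illustrative assumptions; the real patch works on JFFS2's internal space counters inside the kernel.

```python
KIB = 1024


def rpsize_to_bytes(rpsize_kib):
    """Convert the rpsize mount option (KiB) to bytes."""
    return rpsize_kib * KIB


def may_allocate(free_bytes, request_bytes, rpsize_kib, uid, gid,
                 rpuid=0, rpgid=0):
    """Simplified reserved-pool admission check (illustrative, not the patch).

    Users matching rpuid or rpgid (root by default) may use all free space;
    everyone else must leave rpsize KiB untouched.
    """
    if uid == rpuid or gid == rpgid:
        # Privileged: the reserved pool is available.
        return request_bytes <= free_bytes
    # Unprivileged: the reserved pool counts as already used.
    return request_bytes <= free_bytes - rpsize_to_bytes(rpsize_kib)
```

With `rpsize=8000` the pool is 8000 × 1024 = 8,192,000 bytes, i.e. 7.8125 MiB, which is at least part of why the reservation is only _about_ 8 MB.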

comment:4 Changed 7 years ago by cscott

  • Cc dilinger added

dwmw2, dilinger, could you review? Is this reasonable for Update.1? For Update.2?

comment:5 follow-up: Changed 7 years ago by dwmw2

Most of the code there is adding this:

rpsize=<size> - size of reserved pool in KiB
rpuid=<uid> - UID of the user who is allowed to use the reserved pool
              (root by default)
rpgid=<gid> - GID of the user who is allowed to use the reserved pool
              (root by default)
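To make the option semantics concrete, here is a userspace sketch that pulls those three options out of a mount option string such as `noatime,rpsize=8000`. It is not the kernel's option parser; treating a missing `rpsize` as 0 (no reserve) is an assumption, while the root defaults for `rpuid`/`rpgid` follow the list above.

```python
def parse_rp_options(optstring):
    """Extract rpsize (KiB), rpuid and rpgid from a comma-separated
    mount option string; unrelated options are ignored.

    Assumption: rpsize defaults to 0 (no reserve); rpuid/rpgid default
    to root (0), as the patch header states.
    """
    opts = {"rpsize": 0, "rpuid": 0, "rpgid": 0}
    for item in optstring.split(","):
        key, sep, value = item.partition("=")
        if sep and key in opts:
            opts[key] = int(value)
    return opts
```

For example, `parse_rp_options("noatime,rpsize=8000")` keeps the root defaults and only sets the pool size.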

I don't think we need the ability to muck with the uid/gid -- it's just the extra reserved space we want to tune. And in fact we want to tune the other, normal, thresholds too. I'd rather expose _those_ by sysfs and just add one more for root vs. non-root on ALLOC_NORMAL.
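One way to read this proposal: keep JFFS2's per-priority thresholds as tunables (modeled here on the kernel's `resv_blocks_write`/`resv_blocks_deletion` fields and `ALLOC_NORMAL`/`ALLOC_DELETION`/`ALLOC_GC` priorities) and add a single extra threshold that only non-root `ALLOC_NORMAL` requests must respect. This is a Python model under those assumptions; `resv_blocks_root` is a hypothetical name, and the real kernel checks involve more state, such as blocks still being erased.

```python
from dataclasses import dataclass

# Allocation priorities, named after JFFS2's (values are illustrative).
ALLOC_NORMAL, ALLOC_DELETION, ALLOC_GC = 0, 1, 2


@dataclass
class Thresholds:
    resv_blocks_write: int     # free erase blocks required for normal writes
    resv_blocks_deletion: int  # free erase blocks required for deletions
    resv_blocks_root: int = 0  # hypothetical: extra blocks non-root writers must leave


def may_reserve(free_blocks, prio, is_root, t):
    """Model of a per-priority space check with a root-only margin."""
    if prio == ALLOC_GC:
        # Modeled as always allowed so GC can keep reclaiming space.
        return True
    if prio == ALLOC_DELETION:
        return free_blocks >= t.resv_blocks_deletion
    needed = t.resv_blocks_write
    if not is_root:
        # The one new knob: non-root ALLOC_NORMAL must leave extra blocks free.
        needed += t.resv_blocks_root
    return free_blocks >= needed
```

Exposing the `Thresholds` fields through sysfs, as suggested, would let them be tuned without new mount options.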

Will take a look shortly.

comment:6 Changed 7 years ago by cscott

  • Priority changed from high to blocker

We're seeing this on G1G1 machines, apparently.

comment:7 Changed 7 years ago by cscott

  • Cc jg dgilmore added

If we do an interim build 654, then olpcrd 0.38 should go into it.

comment:8 Changed 7 years ago by cscott

See trac #5719 for G1G1 workaround.

olpcrd 0.39, with a more robust version of the fix in olpcrd 0.38, is now in joyride.

I'd still like to see dwmw2's jffs2 fix go in Update.1 if possible.

comment:9 Changed 7 years ago by jg

  • Component changed from upgrade utility to kernel
  • Owner changed from cscott to dilinger

I would too....

comment:11 Changed 7 years ago by jg

  • Owner changed from dilinger to dwmw2

Assigning to Dave, to vet the jffs2 patch....

comment:12 Changed 7 years ago by cscott

  • Cc gnu added

See trac #6442; the olpcrd workaround is confirmed fixed, but I think we'd still like the jffs2 patch, to fix the slowness issues if nothing else.

comment:13 Changed 6 years ago by gnu

  • Blocking 7125 added

(In #7125) X shouldn't need any disk space to start up, except for a log file, and failing to write that isn't fatal. This is probably some script or Sugar thing; what is actually happening?

Last time we had a similar issue (#5317), it was in the initrd, and it was solved by avoiding deleting and rewriting a file if its contents were exactly the same as what we were about to write (the common case). This also saved wear and tear on the flash chips.
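The "don't rewrite identical contents" approach can be sketched as follows. The helper `write_if_changed` is hypothetical; unlike a plain delete-then-rewrite, it writes new contents to a temporary file and renames it into place, so running out of space leaves the old file intact rather than half-written. That atomic-rename step is an addition beyond what the comment describes.

```python
import os


def write_if_changed(path, data):
    """Write bytes to path only if its current contents differ.

    Returns True if a write happened, False if the file was already identical.
    """
    try:
        with open(path, "rb") as f:
            if f.read() == data:
                # Identical: skip the write, saving space and flash wear.
                return False
    except FileNotFoundError:
        pass  # no existing file; create it below
    tmp = path + ".tmp"
    with open(tmp, "wb") as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())  # make sure data hits the device before the rename
    os.replace(tmp, path)     # atomically swap the new file into place
    return True
```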

Fixing the cause is much preferable to debating which random thing to delete from the filesystem!

#5317 isn't closed yet because it also included a jffs2 patch that reserves some free space for root. That patch was apparently never applied to either the kernel or the nand mount options.

comment:14 Changed 6 years ago by dsaxena

  • Cc dsaxena added

comment:15 in reply to: ↑ 5 Changed 6 years ago by dsaxena

Replying to dwmw2:

> Most of the code there is adding this:
>
> rpsize=<size> - size of reserved pool in KiB
> rpuid=<uid> - UID of the user who is allowed to use the reserved pool
>               (root by default)
> rpgid=<gid> - GID of the user who is allowed to use the reserved pool
>               (root by default)
>
> I don't think we need the ability to muck with the uid/gid -- it's just the extra reserved space we want to tune. And in fact we want to tune the other, normal, thresholds too. I'd rather expose _those_ by sysfs and just add one more for root vs. non-root on ALLOC_NORMAL.
>
> Will take a look shortly.

David,

Which specific other thresholds do you mean here?

comment:16 Changed 6 years ago by dsaxena

  • Owner changed from dwmw2 to dsaxena
  • Status changed from new to assigned

comment:17 Changed 6 years ago by cscott

  • Action Needed set to never set
  • Summary changed from machine cannot boot after filling nand to JFFS2 should reserve space for root to prevent "the slows"

Retitled bug to reflect that *this particular* instance is focused on the JFFS2 'reserved space' issue. See #7125 for an enumeration of the subtasks associated with the more general problem.

comment:18 Changed 6 years ago by dsaxena

  • Action Needed changed from never set to design

So I'd like to suggest that this is not a blocker, because reserving space for root will just make the out-of-space issue trigger sooner, as users have less storage to work with.
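A back-of-envelope check of how much a reserve costs users, assuming a hypothetical 1024 MiB filesystem and the 8-16 MB pool sizes recommended in comment 3. For comparison, mke2fs reserves 5% of an ext2 filesystem for root by default.

```python
def reserve_cost(total_mib, reserve_mib):
    """Return (MiB left for non-root users, reserve as % of the filesystem)."""
    return total_mib - reserve_mib, 100 * reserve_mib / total_mib


# Hypothetical 1024 MiB filesystem: 8 MiB, 16 MiB, and an ext2-style 5% reserve.
for reserve in (8, 16, 1024 * 5 / 100):
    left, pct = reserve_cost(1024, reserve)
    print(f"reserve {reserve:g} MiB -> {left:g} MiB usable ({pct:.2f}%)")
```

The numbers are only illustrative; the actual NAND size and usable JFFS2 capacity differ.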

comment:19 Changed 6 years ago by dsaxena

  • Cc gregorio added
  • Resolution set to wontfix
  • Status changed from assigned to closed

I believe the issues that kept us from booting on a full filesystem are now gone, and we now have methods to help clean up a full disk. I am going to mark this as wontfix because, with limited disk space, we really can't afford to make even less of it available.

comment:20 Changed 3 years ago by dsd

  • Milestone changed from 8.2.0 (was Update.2) to 12.1.0

This patch has gone upstream: http://lists.infradead.org/pipermail/linux-mtd/2012-April/040629.html

and will be shipped in the next 12.1.0 build.

Additionally, the rest of the system is now robust against a full disk: Sugar boots and you can delete things from the Journal.
