Opened 10 years ago

Closed 9 years ago

Last modified 20 months ago

#2517 closed enhancement (fixed)

Optimize verification of live incremental upgrades

Reported by: cscott
Owned by: mstone
Priority: high
Milestone:
Component: upgrade utility
Version:
Keywords:
Cc: cscott, mstone
Blocked By:
Blocking:
Deployments affected:
Action Needed:
Verified: no


Live upgrades, based on "latest version" data received from the antitheft server.

Attachments (1)

times.txt (4.2 KB) - added by mstone 10 years ago.
Profiling data on an update run


Change History (12)

comment:2 Changed 10 years ago by cscott

  • Cc cscott mstone added
  • Owner changed from cscott to mstone

Reassigning to mstone, as he's taking the lead for this bug. We're going to press really hard to get some version of this into the MP build.

comment:3 Changed 10 years ago by kimquirk

  • Priority changed from normal to blocker

This should have been a blocking bug for Trial-3 since upgrades have to work for the code that gets loaded onto 40,000 laptops.

comment:4 Changed 10 years ago by mstone

  • Milestone changed from Trial-3 to Untriaged
  • Priority changed from blocker to high
  • Summary changed from Live incremental upgrades to Optimize verification of live incremental upgrades
  • Type changed from defect to enhancement

As of build 595, the feature "live incremental upgrades" works.

Unfortunately, it works slowly and uses lots of RAM: it takes about 30 minutes to perform an update (with full manifest verification). Preliminary measurements indicate that there is substantial room for improvement in both RAM and time usage.

Therefore, I'm changing the summary, type, priority, and milestone to indicate that this is now an optimization bug.

Changed 10 years ago by mstone

  • Attachment times.txt added

Profiling data on an update run

comment:5 Changed 10 years ago by mstone

Our 30-minute run time seems to be overwhelmingly dominated by time spent in our JSON parser. Unfortunately, most other JSON parsers, including simplejson and python-cjson, are optimized with C extensions. We would like to avoid C extensions to reduce the risk of buffer overflows, which makes switching to one of these parsers slightly more costly.
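To put numbers behind the developer-time question, one could first measure parser cost in isolation. The following is an illustrative sketch only: it uses modern Python 3's standard `json`, `time`, `cProfile` and `pstats` modules rather than the parser the upgrade tool actually used, and the manifest layout produced by the hypothetical `make_manifest` helper (paths, sizes, modes, placeholder hashes) is invented. CPython's own `json` module uses an optional C accelerator when one is available, so to evaluate a pure-Python candidate you would pass its `loads`-style function as `parse`.

```python
import cProfile
import io
import json
import pstats
import time


def make_manifest(n_files):
    """Build a JSON manifest with a hypothetical layout: one record per file."""
    return json.dumps([
        {"path": "/usr/lib/file%d" % i, "size": 4096 + i, "mode": 0o100644,
         "sha256": "0" * 64, "rmd160": "0" * 40}
        for i in range(n_files)
    ])


def time_parse(text, parse=json.loads, repeat=3):
    """Return the best wall-clock time (seconds) of parse(text) over `repeat` runs."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        parse(text)
        best = min(best, time.perf_counter() - start)
    return best


def profile_parse(text, parse=json.loads, top=5):
    """Profile one parse and return the `top` entries sorted by cumulative time."""
    profiler = cProfile.Profile()
    profiler.enable()
    parse(text)
    profiler.disable()
    out = io.StringIO()
    pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(top)
    return out.getvalue()


if __name__ == "__main__":
    manifest = make_manifest(20000)
    print("parse time: %.3f s" % time_parse(manifest))
    print(profile_parse(manifest))
```

Comparing `time_parse` across candidate parsers on the same manifest gives a direct estimate of how far each option would move the 30-minute figure.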

Hence: how much developer time are we willing to spend to get a 20-minute run time, a 15-minute run time, or a 10-minute run time?

comment:6 Changed 10 years ago by jg

  • Component changed from distro to upgrade utility
  • Milestone changed from Untriaged to First Deployment, V1.0

Quite a bit. You may not be able to avoid using C extensions, unfortunately. Even 10 minutes is *insane*. If upgrades take too long, people will avoid upgrading, which is an even bigger security threat; upgrades should be fast.

comment:7 Changed 10 years ago by cscott

I reimplemented this in 545 lines of C. Speed on XO B4, checking build 602:

  • SHA+RMD: 6m6s (4m30s user)
  • SHA only: 4m46s (3m11s user)
  • RMD only: 2m54s (1m20s user)
  • no hash: 1m37s (3s user)

About one and a half minutes of each run is spent in the kernel getting data off the NAND; that is nearly all of the no-hash run. There's some busy-waiting going on here: dwmw2 indicated that we need to pause for 1ms to let the data arrive from NAND, and if the kernel yielded the CPU instead of spinning, we wouldn't get it back for 10ms.

The RIPEMD160 hash is about twice as fast as SHA256 in this implementation (libgcrypt). It's also much weaker than SHA256, but maybe we don't need the extra strength. Clearly we should disable one or the other of the algorithms; we can't afford to double our hashing time.
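The timings in comment 7 can be reduced to a per-algorithm hashing cost by subtracting the no-hash baseline. This small Python sketch does that arithmetic on the reported wall-clock numbers; it assumes, as a simplification, that NAND reads cost the same in every run and do not overlap with hashing, which the measurements do not establish. Under that assumption SHA256 adds about 2.5 times as much time as RIPEMD160, and the combined run costs roughly the sum of the two single-hash overheads, which is why running both nearly doubles hashing time. The helper names `seconds` and `hash_overhead` are hypothetical.

```python
# Wall-clock and user times reported in comment 7
# (libgcrypt, XO B4, checking build 602).
LIBGCRYPT = {
    "SHA+RMD": ("6m6s", "4m30s"),
    "SHA": ("4m46s", "3m11s"),
    "RMD": ("2m54s", "1m20s"),
    "none": ("1m37s", "3s"),
}


def seconds(duration):
    """Convert a duration such as '6m6s', '4m 1s', '2m' or '3s' to seconds."""
    duration = duration.replace(" ", "")
    minutes, sep, rest = duration.partition("m")
    if not sep:  # no minutes component, e.g. '3s'
        return int(minutes.rstrip("s"))
    return int(minutes) * 60 + int(rest.rstrip("s") or 0)


def hash_overhead(runs):
    """Wall-clock seconds each configuration spends beyond the no-hash baseline."""
    baseline = seconds(runs["none"][0])
    return {name: seconds(wall) - baseline
            for name, (wall, _user) in runs.items() if name != "none"}


if __name__ == "__main__":
    overhead = hash_overhead(LIBGCRYPT)
    print(overhead)  # {'SHA+RMD': 269, 'SHA': 189, 'RMD': 77}
    print("SHA/RMD ratio: %.1f" % (overhead["SHA"] / overhead["RMD"]))  # 2.5
    print("SHA + RMD, summed: %d s" % (overhead["SHA"] + overhead["RMD"]))  # 266 s
```

The summed single-hash overheads (266 s) land within a few seconds of the measured combined overhead (269 s), so the two hashes cost close to additively.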

We might also look for better implementations of the hash functions. Libgcrypt has almost certainly not been tuned for our architecture; I don't have a good sense of how its speed compares to (say) libtomcrypt or OpenSSL. Getting a little more performance out of RIPEMD, for example, would make manifest verification look really good.
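Before switching libraries, it is cheap to measure raw hash throughput on the target machine. A minimal Python sketch, assuming only the standard `hashlib` module: on most builds `hashlib` delegates to OpenSSL, so this measures OpenSSL's implementations rather than libgcrypt or libtomcrypt, but the same harness shape (fixed buffer, many `update()` calls, time the loop) applies to any library. Hashing a buffer of zeros is fine here because the speed of these hash functions does not depend on the data content. RIPEMD-160 is skipped if the local build does not provide it, as happens on some OpenSSL 3 builds. The function names are hypothetical.

```python
import hashlib
import time


def throughput_mb_s(algorithm, total_bytes=16 * 1024 * 1024, chunk_size=64 * 1024):
    """Hash `total_bytes` of zeros in `chunk_size` pieces and return MB/s."""
    chunk = bytes(chunk_size)
    hasher = hashlib.new(algorithm)
    start = time.perf_counter()
    for _ in range(total_bytes // chunk_size):
        hasher.update(chunk)
    hasher.digest()
    elapsed = time.perf_counter() - start
    return (total_bytes / (1024 * 1024)) / elapsed


def benchmark(candidates=("sha256", "ripemd160")):
    """Benchmark each candidate that this Python/OpenSSL build actually provides."""
    results = {}
    for name in candidates:
        if name not in hashlib.algorithms_available:
            continue
        try:
            results[name] = throughput_mb_s(name)
        except ValueError:
            # Listed but unusable, e.g. a digest disabled in the crypto backend.
            pass
    return results


if __name__ == "__main__":
    for name, rate in benchmark().items():
        print("%-10s %8.1f MB/s" % (name, rate))
```

Running the same harness against each candidate library on an XO would show whether tuning or swapping the RIPEMD implementation is worth the effort.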

comment:8 Changed 10 years ago by mstone

cscott and I have agreed that we will further control for filesystem corruption by using RIPEMD and stat() checks on every file in the new tree, but that we will run full checking only on inodes not contained in the old pristine tree.
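One reading of this agreement, sketched in Python under stated assumptions: the upgrade populates the new tree by hard-linking unchanged files from the old pristine tree, so a file whose (device, inode) pair also appears in the old tree was already verified there. The hypothetical `plan_checks` function only decides which files need full checking; every file would still get the RIPEMD and stat() checks, which are omitted here. Directories and non-regular files are skipped, since the checking for directories was left undecided.

```python
import os
import stat
import tempfile


def inode_set(tree):
    """Collect (st_dev, st_ino) for every regular file under `tree`."""
    seen = set()
    for root, _dirs, files in os.walk(tree):
        for name in files:
            st = os.lstat(os.path.join(root, name))
            if stat.S_ISREG(st.st_mode):
                seen.add((st.st_dev, st.st_ino))
    return seen


def plan_checks(new_tree, old_tree):
    """Split regular files in new_tree into (shared, new) relative paths.

    `shared` files reuse an inode from old_tree (lightweight checks only);
    `new` files have inodes not in old_tree and need full checking.
    """
    old_inodes = inode_set(old_tree)
    shared, new = [], []
    for root, _dirs, files in os.walk(new_tree):
        for name in files:
            path = os.path.join(root, name)
            st = os.lstat(path)
            if not stat.S_ISREG(st.st_mode):
                continue  # symlinks, devices, etc. are out of scope here
            rel = os.path.relpath(path, new_tree)
            if (st.st_dev, st.st_ino) in old_inodes:
                shared.append(rel)
            else:
                new.append(rel)
    return sorted(shared), sorted(new)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        old, new = os.path.join(tmp, "old"), os.path.join(tmp, "new")
        os.makedirs(old)
        os.makedirs(new)
        with open(os.path.join(old, "unchanged"), "w") as f:
            f.write("same\n")
        # Simulate an incremental upgrade that hard-links an unchanged file.
        os.link(os.path.join(old, "unchanged"), os.path.join(new, "unchanged"))
        with open(os.path.join(new, "updated"), "w") as f:
            f.write("new contents\n")
        print(plan_checks(new, old))  # (['unchanged'], ['updated'])
```

The design relies on inode identity rather than path names, so a file that was renamed but not rewritten would still be treated as shared.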

We did not decide what kind of checking should be performed on directories.

comment:9 Changed 10 years ago by cscott

Times with libtomcrypt, compiled with gcc-4.3 -march=geode -O9:

  • SHA+RMD: 5m17s (3m5s user)
  • SHA only: 4m1s (1m56s user)
  • RMD only: 3m7s (1m11s user)
  • no hash: 2m9s (2s user)

Note that my system times are significantly larger on this run, probably because I installed some packages (gcc, etc.) on my XO with yum, so the filesystem is larger and JFFS2 has to work harder.

Compiling with -march=geode -O2 (since it's possible that -O9 creates suboptimal code) yields:

  • SHA+RMD: 5m12s (3m9s user)

So we might as well compile with -O2.

So libtomcrypt yields a small benefit for RIPEMD-160 but a decent improvement for SHA256. It's worth using.
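Because comment 9 notes that system time grew for unrelated reasons, user time is the fairer basis for comparing the two libraries. This sketch computes the fraction of user CPU time saved by switching from libgcrypt to libtomcrypt, using the numbers from comments 7 and 9; treating user time as pure hashing cost is an assumption (the no-hash runs still used 2 to 3 seconds of user time). It works out to roughly 39% less for SHA only and 11% less for RMD only, in line with the conclusion above. The name `user_time_savings` is hypothetical.

```python
# User CPU seconds: libgcrypt (comment 7) vs libtomcrypt, -march=geode -O9 (comment 9).
USER_SECONDS = {
    #          libgcrypt     libtomcrypt
    "SHA+RMD": (4 * 60 + 30, 3 * 60 + 5),
    "SHA": (3 * 60 + 11, 1 * 60 + 56),
    "RMD": (1 * 60 + 20, 1 * 60 + 11),
}


def user_time_savings(table):
    """Fraction of user CPU time saved by switching from libgcrypt to libtomcrypt."""
    return {name: (old - new) / old for name, (old, new) in table.items()}


if __name__ == "__main__":
    for name, saved in user_time_savings(USER_SECONDS).items():
        print("%-8s %4.0f%% less user time" % (name, 100 * saved))
```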

comment:10 Changed 9 years ago by cscott

  • Resolution set to fixed
  • Status changed from new to closed

In build 607; closing the bug.

comment:11 Changed 20 months ago by Quozl

  • Milestone Update.1 deleted

