Ticket #7426 (closed defect: wontfix)

Opened 6 years ago

Last modified 4 years ago

Journal in f7 disappears after olpc-update to f9

Reported by: mikus Owned by: mchua
Priority: blocker Milestone: 9.1.0-cancelled
Component: sugar-datastore Version: Development build as of this date
Keywords: 8.2.0:? csafor8.2 cjbfor9.1.0 Cc: mikus@…, gregorio, dgilmore, mchua
Action Needed: test in release Verified: no
Deployments affected: Blocked By:
Blocking:

Description

Some days ago (G1G1, Q2D16), using a recent f7-based Joyride (2056), I issued 'olpc-update --full -v --force --usb' to update the "alternate version" to a recent f9-based Joyride (2087). Having booted into and run the f9 version, I rebooted into the alternate. The f7 version came up into Sugar, but the Home view showed only the central "me" icon, without the Journal icon (and without any other Activity icons). I went to the text console (ctrl-alt-F2), but could find NOTHING (particularly in /var/log, nor anything Journal-related in .sugar/default/logs) that would show me why the Journal had not been started. In effect, installing f9 (as the new primary version) had made f7 (now the alternate version) unusable.

[Yesterday I helped a friend go through a similar update (in his case from Joyride 2044 to Joyride 2104). Same result: when Joyride 2044 was subsequently booted, there was no Journal to be seen (nor any Activities).]

Attachments

logs.CSN74804910.2008-07-13.11-57-51.tar.bz2 (28.7 kB) - added by mikus 6 years ago.
log on 708 after booting from Joyride
logs.CSN74804910.2008-07-13.17-09-45.tar.bz2 (32.0 kB) - added by mikus 6 years ago.
log on (manually updated Joyride) after booting from Update.1

Change History

  Changed 6 years ago by tomeu

  • owner set to tomeu
  • priority changed from normal to high
  • next_action changed from never set to communicate
  • component changed from not assigned to datastore
  • milestone changed from Never Assigned to 8.2.0 (was Update.2)

Oops, looks like this is due to #6269.

In short, on the move to F9, xapian has been updated to a version where non-backward-compatible changes have been introduced. F9-based builds can read old indexes, but F7-based builds cannot read the new ones.
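The one-way compatibility described above can be modelled in a few lines. This is a toy sketch, not Xapian code: the `IndexLibrary` class and the format numbers 1 and 2 are hypothetical, and the model assumes (as the later patch discussion implies) that once the newer library writes to an old index, the index is left in the newer format.

```python
from dataclasses import dataclass


class UnsupportedFormat(Exception):
    """Raised when a library build cannot read an index's on-disk format."""


@dataclass(frozen=True)
class IndexLibrary:
    # Hypothetical model of one build of an index library; not Xapian's API.
    name: str
    write_format: int             # format this build leaves on disk after writing
    readable_formats: frozenset   # formats this build is able to open

    def can_read(self, on_disk_format: int) -> bool:
        return on_disk_format in self.readable_formats

    def write(self, on_disk_format: int) -> int:
        """Open an index, write to it, and return its on-disk format afterwards."""
        if not self.can_read(on_disk_format):
            raise UnsupportedFormat(
                f"{self.name} cannot read index format {on_disk_format}")
        # Writing with this build leaves the index in this build's format.
        return self.write_format


# Hypothetical format numbers: 1 = what F7 builds use, 2 = what F9 builds write.
F7_XAPIAN = IndexLibrary("F7 xapian", write_format=1, readable_formats=frozenset({1}))
F9_XAPIAN = IndexLibrary("F9 xapian", write_format=2, readable_formats=frozenset({1, 2}))

if __name__ == "__main__":
    fmt = F7_XAPIAN.write(1)    # Journal created on an F7 build
    fmt = F9_XAPIAN.write(fmt)  # after olpc-update, the F9 build reads and upgrades it
    print(fmt, F7_XAPIAN.can_read(fmt))  # the F7 build can no longer read it
```

The asymmetry lives entirely in `readable_formats`: the newer build lists both formats, the older one only its own.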

What should we do here? Build the old xapian packages for F9? Or just release-note that after updating to an F9-based build you will lose the Journal contents if you go back to F7?

Please leave this ticket to discuss this particular issue in F9 joyrides, and discuss the generic issue of backwards compatibility in #6269.

Changed 6 years ago by mikus

log on 708 after booting from Joyride

Changed 6 years ago by mikus

log on (manually updated Joyride) after booting from Update.1

  Changed 6 years ago by mikus

Repeated this (booting with 'O' key pressed on front panel) on my G1G1.

(1) Going Joyride 2146 -> Update.1 708:

System came up in Home view. Journal showed none of the entries that were there before (with 2146).

(2) Going Update.1 708 -> Joyride 2146:

System came up in Home view. Journal showed pane with message "Journal is empty".

  Changed 6 years ago by gregorio

  • cc gregorio added
  • keywords blocks:8.2.0 added
  • priority changed from high to blocker

  Changed 6 years ago by gregorio

Set to blocker because it prevents downgrade. GS

  Changed 6 years ago by gregorio

  • next_action changed from communicate to code

  Changed 6 years ago by tomeu

  • cc dgilmore added

Will forward port the f7 xapian to f9. Dennis, any objection to this? If not, I'll request the creation of the OLPC-3 branch for xapian.

  Changed 6 years ago by tomeu

  • keywords 8.2.0:? added

  Changed 6 years ago by dgilmore

Tomeu, I think this is a flag-day, note-it-in-the-release-notes thing. I don't think it's something we can support. As long as moving forward is OK, we are OK. Sticking with the old version will only hurt us.

  Changed 6 years ago by dgilmore

  • status changed from new to closed
  • resolution set to invalid

  Changed 6 years ago by dgilmore

  • status changed from closed to reopened
  • resolution deleted

  Changed 6 years ago by gregorio

Hi Dennis,

This bug means we can't downgrade. The software is not solid enough to be sure that every upgrade will work. We must have the ability to downgrade.

I believe that Tomeu is saying that we need the older libraries to do that.

Let me know what we have to do to keep the ability to downgrade. One way or the other we need that. IMHO it's a deal-breaker for this release!

Thanks,

Greg S

  Changed 6 years ago by dgilmore

By forcing the old libraries we can cause issues for other users of Sugar. We also greatly increase the maintenance burden, since we are unable to lean upon others: the first thing they will say is "test with the newer version."

No OS has ever supported downgrading. We just cannot support it; it's insanity to try. If in a deployment some have Update.1 and some have 8.2, we could end up with weird collaboration bugs, not to mention bugs with people using the same activity version on two vastly different code bases. Once we deploy this we have to be sure that every upgrade works as expected; having to revert is a failure on our part. People testing should regularly back up the data they want to keep on the XO. We should note the issue in the Release Notes, but we can't support what you are proposing.

  Changed 6 years ago by gregorio

Hi Dennis,

The XO downgrades now. I guess we're the first :-) Cisco IOS downgrades, and does it while still operating! So maybe we're second.

Not sure how we do it or how much NAND space it costs, but it's a valuable feature.

The question of different versions interacting, e.g. 8.1.1 collaborating with 8.2.0, is unrelated. The question of quality is N/A; it is what it is.

Is the issue with xapian libraries only? Are you saying we cannot use any older versions of any libraries from Fedora? We have to take everything in Fedora 9 as is?

I'm not opposed to anything in particular, just want to understand.

There may be another option which is much more work for us (Sugar work I think). That is to have an old version of the Journal, if you downgrade you lose any changes made on 8.2.0 but you get back what you had. Not sure about that one, need Tomeu to elaborate or correct me.

Give me choices and costs and we'll make a decision:

Option 1 - Lose revert feature. Engineering cost: 0 coders. Test cost: high.

Option 2 - Use Fedora 7 Xapian binaries. Engineering cost: big long term if we need support. Test cost: low. Other?

Option 3 - Downgrade but lose any new data. Engineering cost: medium short term, low long term. Test cost: medium.

Now that I think about it, xapian isn't the only downgrade issue is it? We can't downgrade the whole release from Fedora 9 to 7 can we?

We could before because we were on Fedora 7 and just needed to change the Sugar scripts. Let me know the whole story then we'll deal with it as best we can. Adios revert safety net, adios backward compatibility of activities, ay caramba!

Thanks,

Greg S

  Changed 6 years ago by mstone

I think that the best we can do is to attempt to provide a downgrade script in the release notes. Comments?

P.S. - We should think about creating space for new releases to provide downgrade scripts for old ones...

  Changed 6 years ago by dgilmore

The way booting into the alt image works is that we have two images installed, so we use an additional ~300 MB on the NAND. The question of different versions collaborating is not unrelated if only some machines get updated: there could be over-the-wire API changes that cause issues.

Option 1: the test cost is not high; the only cost is documenting that the Journal is not backwards compatible. (Backups are important here; likely you can restore a backup and get back the old Journal.)

Option 2: the maintenance cost is extremely high if we hit a bug that we need upstream help with. The first thing they will ask is to test with the latest version, and they will likely not help patch the older version. Going forward, the long-term cost only gets higher; by not moving forward you are setting us up to never move forward.

Option 3: booting into the alt-OS image can cause all sorts of other issues, especially if activities were updated and only work with the newer Sugar. Sugar would need to know how to do some kind of dump/restore to get the data into the alt-OS image. Likely we would need to do something to track activities, allow multiple versions of an activity to be installed, and provide some way to detect what can be used safely in a particular build.

Alt-boot really should only be an option for developers. We should not be shipping updates with issues that would make deployments want to alt-boot into the old OS. It's a great safety net if a test build fails to even boot. To give the kids the most NAND possible for user data, they should get reflashed, not updated, and then have their datastore restored from backup.

follow-up: ↓ 17   Changed 6 years ago by mstone

Actually, the two images are hardlinked together so, in most cases, much less than 300M is used. Furthermore, once the user is satisfied with the new OS, it is straightforward to delete the old one, reclaiming any used space.

in reply to: ↑ 16   Changed 6 years ago by dgilmore

Replying to mstone:

Actually, the two images are hardlinked together so, in most cases, much less than 300M is used. Furthermore, once the user is satisfied with the new OS, it is straightforward to delete the old one, reclaiming any used space.

In the case of an F7-based build and an F9-based one, there is little to nothing in common, so they use most, if not all, of the full image size. Yes, it's trivial to free up the space, but do we make sure that info goes all the way down to the deployment teams and the teachers in the schools?

  Changed 6 years ago by marco

  • next_action changed from code to design

  Changed 6 years ago by tomeu

What if we forward-port the old xapian libs to F9 and make sure that the xapian index can be rebuilt for 9.1.0? I think we can easily fall back to non-xapian in case the installed version fails to read it, and then rebuild the index.
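The fallback proposed here (use a slower non-Xapian path when the installed library cannot read the index, then rebuild it) might look roughly like the sketch below. All names (`IndexUnreadable`, `open_index`, `load_all_metadata`, `rebuild_index`) are hypothetical, and the sketch assumes entry metadata can be read independently of the index, as with the "metadata in json files" work mentioned later in this ticket.

```python
from typing import Any, Callable, Dict, List

Entry = Dict[str, str]


class IndexUnreadable(Exception):
    """Raised when the installed index library cannot open the on-disk index."""


def find_entries(
    query: str,
    open_index: Callable[[], Any],
    load_all_metadata: Callable[[], List[Entry]],
    rebuild_index: Callable[[List[Entry]], None],
) -> List[Entry]:
    """Search the Journal, falling back to a metadata scan if the index is unreadable."""
    try:
        index = open_index()
    except IndexUnreadable:
        # Slow path: scan every entry's metadata instead of using the index.
        entries = load_all_metadata()
        # Rebuild the index in the format the installed library understands,
        # so the next lookup can take the fast path again.
        rebuild_index(entries)
        needle = query.lower()
        return [e for e in entries if needle in e.get("title", "").lower()]
    # Fast path: the index opened fine, so delegate the search to it.
    return index.search(query)


if __name__ == "__main__":
    def unreadable_index():
        raise IndexUnreadable("index was written by a newer library")

    rebuilt: List[Entry] = []
    hits = find_entries(
        "paint",
        open_index=unreadable_index,
        load_all_metadata=lambda: [{"title": "Paint scribble"}, {"title": "Record"}],
        rebuild_index=rebuilt.extend,
    )
    print(hits, len(rebuilt))
```

Passing the three operations in as callables keeps the fallback policy separate from any particular index or datastore implementation.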

follow-up: ↓ 21   Changed 6 years ago by marco

Tomeu, will metadata saving to json files be in place for 8.2?

<c_scott> marcopg_: so if tomeu can write a 'downgrade-xapian' script that regenerates the indices from the metadata, and we can put that into 8.2 called from olpc-configure, then we'd be in a real good place to upgrade xapian in 9.1 and have downgradability.
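The "downgrade-xapian" idea amounts to discarding the index and regenerating it from the per-entry metadata. A minimal sketch, assuming (hypothetically) one `<uid>.json` file per Journal entry in a single directory, and using a plain in-memory inverted index as a stand-in for a real Xapian database; the actual datastore layout and Xapian indexing calls are not shown.

```python
import json
import re
import tempfile
from collections import defaultdict
from pathlib import Path
from typing import Dict, Set

TOKEN_RE = re.compile(r"\w+")


def rebuild_index(metadata_dir: Path) -> Dict[str, Set[str]]:
    """Rebuild a term -> {uid} index from hypothetical <uid>.json metadata files."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for path in sorted(metadata_dir.glob("*.json")):
        uid = path.stem
        try:
            metadata = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, ValueError):
            # Skip unreadable or corrupt files rather than abort the whole rebuild.
            continue
        if not isinstance(metadata, dict):
            continue
        # Index every metadata value's words, lower-cased, under this entry's uid.
        for value in metadata.values():
            for token in TOKEN_RE.findall(str(value).lower()):
                index[token].add(uid)
    return dict(index)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        d = Path(tmp)
        (d / "a1.json").write_text(
            json.dumps({"title": "Paint scribble", "activity": "Paint"}), encoding="utf-8")
        (d / "b2.json").write_text(
            json.dumps({"title": "Video clip", "activity": "Record"}), encoding="utf-8")
        print(sorted(rebuild_index(d)["paint"]))
```

Because the JSON files, not the index, are treated as the source of truth, whichever library version is installed can regenerate an index in its own format.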

in reply to: ↑ 20   Changed 6 years ago by tomeu

Replying to marco:

Tomeu, will metadata saving to json files be in place for 8.2?

Yup.

<c_scott> marcopg_: so if tomeu can write a 'downgrade-xapian' script that regenerates the indices from the metadata, and we can put that into 8.2 called from olpc-configure, then we'd be in a real good place to upgrade xapian in 9.1 and have downgradability.

Could be done, not sure if the best use of our resources though.

  Changed 6 years ago by kimquirk

We should provide an externally downloadable script and information in the release notes on how to run this to recover your journal.

Then it won't be blocking for 8.2.

follow-up: ↓ 24   Changed 6 years ago by cscott

My full suggestion was that we downgrade xapian in 8.2, to buy us time to consider a better fix, either in 8.2.1 or 9.1. With tomeu's "metadata in json files" patches in 8.2, writing a 'downgrade-xapian' script for 8.2 would be much easier than writing one for 8.1.

in reply to: ↑ 23   Changed 6 years ago by tomeu

Replying to cscott:

My full suggestion was that we downgrade xapian in 8.2, to buy us time to consider a better fix, either in 8.2.1 or 9.1. With tomeu's "metadata in json files" patches in 8.2, writing a 'downgrade-xapian' script for 8.2 would be much easier than writing one for 8.1.

I'm ok with this, but note that people that have been testing 8.2.0 builds till now will be unable to read their datastores.

Kim, Greg, can you give your opinion on this?

  Changed 6 years ago by marco

Tomeu, see comment 22, do you think that's possible?

follow-up: ↓ 27   Changed 6 years ago by kimquirk

What do you think is the work effort for downgrading xapian for 8.2.0 and working on the upgrade with a recovery script for a future release (I think this is scott's suggestion)?

in reply to: ↑ 26 ; follow-up: ↓ 28   Changed 6 years ago by tomeu

Replying to kimquirk:

What do you think is the work effort for downgrading xapian for 8.2.0 and working on the upgrade with a recovery script for a future release (I think this is scott's suggestion)?

Downgrading xapian for 8.2.0: I expect this to be a matter of a couple of hours. What worries me is how we are going to deal with all the people who will have updated to a testing build and see their journals emptied. May not be so bad; I don't really know.

'Repair journal' activity: a couple of days of work.

in reply to: ↑ 27   Changed 6 years ago by tomeu

Replying to tomeu:

'Repair journal' activity: a couple of days of work.

Probably won't be an activity but a script, due to Rainbow.

  Changed 6 years ago by marco

Then I think we should go for the repair journal script. It can be done after 8.2 and I'm worried that downgrading would get us a bunch of bug reports (and hungry testers).

Greg, Kim can we go down that way and make this not-a-blocker?

  Changed 6 years ago by marco

Heh I meant *angry* testers there. Downgrading the datastore will empty the journal of people that tested joyride so far.

  Changed 6 years ago by tomeu

OK, we have one more option here: Olly Betts from http://oligarchy.co.uk has offered this patch which, when applied to xapian 1.0.7, prevents databases from being upgraded to the new version.

http://oligarchy.co.uk/xapian/patches/xapian-flint-olpc-compat.patch

This means that we are able to ship a xapian that can interpret both old and new databases without making them unreadable by older releases.

Kim, any input? Would like to close this one ASAP.

  Changed 6 years ago by OllyBetts

Probably better to use:

http://oligarchy.co.uk/xapian/patches/xapian-flint-olpc-compat-v2.patch

The only difference is that this ensures that the old_version flag is initialised in the "create a new database" case.

Both versions pass Xapian's own testsuite.

Let me know if you have any questions about the patch.
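The behaviour the patch is described as providing (read both formats, but leave an existing old-format database in the old format) can be illustrated with a toy model. This is not the patch itself: `CompatDatabase`, the format numbers, and the `create_format` parameter are hypothetical, and this ticket does not say which format the real patch uses for newly created databases. The sketch also shows why the v2 fix matters: the `old_version` flag has to be set on the create path as well as the open path.

```python
OLD_FORMAT = 1  # hypothetical: the format F7-based builds can read
NEW_FORMAT = 2  # hypothetical: the format introduced by the newer library


class CompatDatabase:
    """Toy model of a library that reads both formats but never upgrades old ones."""

    def __init__(self, on_disk_format=None, create_format=NEW_FORMAT):
        if on_disk_format is None:
            # "Create a new database" path. The flag must be initialised here too;
            # in C++ a member left uninitialised would hold an indeterminate value.
            self.old_version = create_format == OLD_FORMAT
        elif on_disk_format in (OLD_FORMAT, NEW_FORMAT):
            # "Open an existing database" path: remember what was found on disk.
            self.old_version = on_disk_format == OLD_FORMAT
        else:
            raise ValueError(f"unsupported format {on_disk_format}")

    def format_after_commit(self) -> int:
        # Write back in the format that was found, instead of upgrading to
        # NEW_FORMAT, so an older library can still read the database afterwards.
        return OLD_FORMAT if self.old_version else NEW_FORMAT


if __name__ == "__main__":
    journal = CompatDatabase(on_disk_format=OLD_FORMAT)  # Journal from an F7 build
    print(journal.format_after_commit())  # still OLD_FORMAT, so F7 can read it
```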

  Changed 6 years ago by cscott

I like the idea of Olly Betts's patch. We should get it into joyride ASAP, at least.

  Changed 6 years ago by cscott

I built these packages locally in mock and put them in my joyride public_rpms. If they seem to work we should get xapian-core forked in koji.

xapian-core-1.0.7-1.fc9.1.src.rpm

  Changed 6 years ago by cscott

  • keywords csafor8.2 added
  • next_action changed from design to test in build

OK, they're in joyride. Can someone suggest a good testcase?

Absent a better one, I guess "create some journal entries in 714, upgrade to joyride-2414, create some more journal entries, alt-boot back to 714, all journal entries should still be present"?

  Changed 6 years ago by tomeu

|TestCase|

Create some journal entries in 714, upgrade to joyride-2414, create some more journal entries, alt-boot back to 714, the journal shouldn't be empty.

  Changed 6 years ago by cscott

  • next_action changed from test in build to approve for release

xapian-core-1.0.7-1.fc9.1.src.rpm should be added to next stable release?

  Changed 6 years ago by mstone

  • next_action changed from approve for release to add to release

Approved.

  Changed 6 years ago by gregorio

Hi Guys,

Please close this one ASAP. I'm writing the release notes and this will keep showing up there until it gets to Closed state.

Thanks,

Greg S

  Changed 6 years ago by cscott

  • next_action changed from add to release to test in release

Please test build 761 resulting from build-repo commit a481b717.

  Changed 6 years ago by mchua

  • owner changed from tomeu to mchua
  • status changed from reopened to new

Assigning to myself to test with builds 714 and 763, and tomeu's test case.

follow-up: ↓ 44   Changed 6 years ago by adricnet

I've got this one live on my XO (G1G1 2007 C) now. Sometime after I started trying the RCs for this release (switched over from JR), I stopped being able to access the Journal. To prove it, I just ran through this:

(1) olpc-update -r candidate-714: loads fine in the old UI; Journal is empty, so I made a video.

(2) Circle-boot over to 765: no Journal, even when I mash F1.

(3) Circle-boot back to 714: two Journal entries, Record and the video clip, in the list.

Please let me know if there is data I can collect to help pin this down.

  Changed 6 years ago by mchua

  • next_action changed from test in release to unknown

Since we don't have much time before the release, maybe we can get that "repair journal" script mentioned above instead, depending on the time cost. Greg, what do you want to do with this?

Are there any other builds that this might break for? I didn't see any Journal breakage when upgrading to 761 (the then-latest build) from 656 or 708. What other upgrade paths should we look for this bug in?

Aside: I wonder if there's a way to force Release Note reading by making it so that updating to 765 will also add a "Release Notes" entry to your Journal (say, a Write doc, or a static version of the wiki page). That way people who do hit this bug will have a single item in their Journal that happens to include an explanation of what happened (and hopefully how to fix it). I don't know the feasibility/possibility of this at all, though.

in reply to: ↑ 42   Changed 6 years ago by tomeu

Replying to adricnet:

I've got this one live on my XO (G1G1 2007 C) now. Sometime after I started trying the RCs for this release (switched over from JR), I stopped being able to access the Journal. To prove it, I just ran through this:

If that journal data was ever booted inside an 8.2 joyride earlier than 2412, then it is expected.

Note that the test case says to start with a journal in 714, then move to a build more recent than 2414, then go back.

  Changed 6 years ago by adricnet

Later that night, at Mel's suggestion, I updated to 711, did a Paint scribble, and then went back to 765, and now I have a Journal (with only Gmail and Record in it?) and Gmail as the only favorite in the ring.

  Changed 6 years ago by gregorio

  • next_action changed from unknown to test in release

Hi Guys,

I changed next action on this to test in release.

Let me know ASAP if it's not working so I can add a release note on it.

Otherwise I'll assume it's fixed.

Thanks,

Greg S

  Changed 6 years ago by mchua

  • milestone changed from 8.2.0 (was Update.2) to 8.2.1

Bumping to 8.2.1 as per the QA meeting today.

  Changed 6 years ago by mchua

  • keywords blocks:8.2.0 removed

  Changed 5 years ago by mstone-xmlrpc

  • keywords cjbfor9.1.0 added
  • milestone changed from 8.2.1 to 9.1.0

Pushing out to 9.1.0, per edmcnierney's request.

  Changed 4 years ago by mchua

  • cc mchua added
  • status changed from new to closed
  • resolution set to wontfix

I'm going to close this ticket since it's so outdated that it refers to a build that's no longer being worked on, afaict. If this is in error, please reopen and assign the ticket to someone who's not-me, since I can't really test for this any more.
