Ticket #6269 (closed defect: fixed)

Opened 6 years ago

Last modified 4 years ago

usb index backward compatibility

Reported by: erikos Owned by: tomeu
Priority: high Milestone: Future Release
Component: sugar-datastore Version:
Keywords: Cc: krstic, cscott, kim
Action Needed: never set Verified: no
Deployments affected: Blocked By:
Blocking:

Description

I played with the 'no usb drive journal' issue a bit since during upgrades I hit it a few times. One part was #6029 where a fix went in joyride already. The other part is the following:

Steps to reproduce:

A) indexed in 670, try to recognize in 690

  • plug in an usb-drive in a machine with an older image 670
  • plug it in another machine running 690
  • the device is mounted but not recognized by the journal

B) 670 - 653

  • plug it in another machine running 670
  • try to recognize in a machine with 653
  • the device is mounted but not recognized by the journal

You can remove the .olpc.store to get the usb device freshly indexed. And then you can display it. We do not have problems when a device is plugged in a machine running 690 indexed by 653 and the other way around so I think we are doing fine.

Attachments

journal690_nousb.log (8.3 kB) - added by erikos 6 years ago.
try to display usb device in 690 indexed in 670
journal670-653.log (2.7 kB) - added by erikos 6 years ago.
indexed in 670 try to display in 653

Change History

Changed 6 years ago by erikos

try to display usb device in 690 indexed in 670

Changed 6 years ago by erikos

indexed in 670 try to display in 653

  Changed 6 years ago by tomeu

  • status changed from new to closed
  • resolution set to wontfix

Right, from 669 to 681 the update.1 builds had, by mistake, a newer xapian release that wrote indexes in a format incompatible with the xapian release we already were using.

I chose not to update to a newer xapian because it wasn't clear nor the gains nor the risks. Perhaps we shouldn't have gone back given all the confusion it has caused to the testers?

  Changed 6 years ago by gnu

  • status changed from closed to reopened
  • resolution deleted

I'm glad that the problem is temporarily reverted -- but the explanation points up a much more serious bug. Next time we upgrade to a new xapian release, it will break. This bug should stay open so that we fix xapian and/or Sugar *before* that happens.

The proper fix is to never put crap "index" files onto other peoples' USB keys, nor should you try to read them from there. Plugging in a filesystem on USB should not cause the OLPC, or Sugar, to start writing any files on the new filesystem! And when writes occur, they should occur because the USER asked for something to be written there. OLPC may own the NAND, but you don't own the USB media. Didn't your momma teach you not to scribble in other peoples' books?

But since OLPC has already determined not to fix the real problem, at least you could make the code that reads your crap files able to read the crap files created by *former versions of itself*!

Of course, *forward* compatability is only half the problem, when dealing with portable media like USB sticks. You would also need *backward* compatability, i.e. the new code should be writing a crap file that will work properly when the USB stick is inevitably plugged into older laptops. You don't need to imagine what'd happen in a school where some students are trying a new release while others continue on a more stable release: their USB sticks would stop working. You have this bug report staring you in the face, but you closed it "wontfix".

The crapfile reading and writing code should also be bulletproof in the face of *random binary gibberish* and *subtly altered text* fed to it in these crap files; read-only USB sticks, and completely full USB sticks, and almost-completely-full USB sticks that fill up while writing the crap files, and terabyte USB drives that would take hours to index but are removed before they finish; and other edge conditions. If you insist on writing files in defiance of the user's wishes, the least you can do is make them reliable.

  Changed 6 years ago by marco

  • cc krstic added
  • component changed from journal-activity to datastore
  • summary changed from usb drive not recognized in journal in 690 when indexed by 670 to usb index backward compatibility

  Changed 6 years ago by OllyBetts

Since the index is apparently automatically rebuilt if it doesn't exist, the forward/backward compatibility could be solved by automatically deleting the Xapian index if you get a Xapian::DatabaseVersionError exception (which means "I don't understand this index version").

If you want to recover from corruption of the Xapian index, doing the same for Xapian::DatabaseCorruptError would work, though it might be a good idea to report the issue somehow to avoid papering over genuine problems.

As to which Xapian version to use, read the NEWS file to see the user-visible changes. This also covers big internal changes which shouldn't be user-visible, but are potentially destabilising (which would be user-visible).

There certainly have been a few bug fixes and several performance improvements since 1.0.2, but these may not be for cases which matter to you.

  Changed 6 years ago by AlbertCahalan

  • summary changed from usb index backward compatibility to usb index must not be read/written

It's not good enough to automatically delete crap files when Xapian fails to understand the version. The whole concept is broken:

The crap files will not be maintained by normal computers. The crap files are thus unusable, and so you must delete them every time anyway. The moment you read a crap file, you have made a grave error.

You can not write a crap file to media that is read-only or full.

It is pure insanity to write crap files for a 1 TB device; there could be many millions of legit files on the device. Pardon me for guessing that the Journal will get hit by the OOM killer in this case.

You must assume that people will try to create a virus by maliciously corrupting the crap files; the less data you parse the better.

BTW, for similar reasons it is also dangerous to use MIME types.

follow-up: ↓ 8   Changed 6 years ago by OllyBetts

I wasn't commenting on the wisdom of writing indexes to mounted devices. I was just suggesting a strategy for recovering from Xapian version upgrades/downgrades, etc.

follow-up: ↓ 9   Changed 6 years ago by tomeu

  • summary changed from usb index must not be read/written to usb index backward compatibility

Albert, please don't change the summary of any ticket without at least explaining why you are doing so.

"usb index must not be read/written" looks to me as a duplicate of #4406 "XO leaves trash on USB sticks".

This particular ticket is something we might have to do before we can completely remove the xapian index from removable devices (in case we decide to do so).

Btw, perhaps you and Gnu would like to send an email to the mailing list explaining why you sound so frustrated? We are surely doing many things wrong, and I think the project as a whole can grow if you expose your issues in a more constructive way.

But if you could explain calmly why you think something is wrong instead of just labeling it as "crap" or "trash", then I think your ideas would reach us much more efficiently.

Thanks!

in reply to: ↑ 6   Changed 6 years ago by tomeu

Replying to OllyBetts:

I wasn't commenting on the wisdom of writing indexes to mounted devices. I was just suggesting a strategy for recovering from Xapian version upgrades/downgrades, etc.

Just in case somebody may think that the negative remarks in this ticket and others are about the Xapian index, I would like to leave here my opinion that the problems we are having in the datastore are not caused by Xapian itself, but because of the (wrong) way we are using it.

Olly Betts and Richard Boulton have been very helpful when troubleshooting our use of Xapian and, as far as I have seen, none of the problems we are having are in Xapian.

Thanks again!

in reply to: ↑ 7   Changed 6 years ago by AlbertCahalan

Replying to tomeu:

This particular ticket is something we might have to do before we can completely remove the xapian index from removable devices (in case we decide to do so).

There can be no "in case". Putting an index on a USB stick is BROKEN. I suppose there is a quick fix:

1. When mounting, immediately delete any index. (the index can not be trusted)

2. When unmounting, delete the index. (the index will never be used again)

There is still the problem of large/full/readonly media, but at least the security/correctness problems go away. Backwards compatibility is no longer a problem.

Btw, perhaps you and Gnu would like to send an email to the mailing list explaining why you sound so frustrated? We are surely doing many things wrong, and I think the project as a whole can grow if you expose your issues in a more constructive way. But if you could explain calmly why you think something is wrong instead of just labeling it as "crap" or "trash", then I think your ideas would reach us much more efficiently.

I think that Gnu's "crap file" term is truly excellent and insightful. It fits in many ways.

  Changed 6 years ago by cscott

  • priority changed from normal to high
  • milestone changed from Never Assigned to Update.1

  Changed 6 years ago by tomeu

  • cc cscott added

Scott, could you please explain why you have moved this ticket to Update.1?

Please do this every time you move a ticket to the current milestone after code freeze (at least).

Thanks.

  Changed 6 years ago by cscott

  • cc kim added

Sorry. It's mentioned in http://wiki.laptop.org/go/Test_Group_Release_Notes as an issue with Update.1 builds, and the reporter seems to mention problems interoperating with 653. A milestone had never been assigned. Update.1 seemed the right place. If we decide that this problem is not going to be fixed for update.1, then a release note tag seems like it would need to be placed on this issue.

Again, I'm not perfect: this just represents my best guess that this is an actual problem with update.1 builds, and requires a decision to fix or not-fix by the release manager.

  Changed 6 years ago by cscott

I will mention that, in theory, trac would have us assign these bugs to an 'update.1 release candidate' *version* instead of *milestone*, but in practice we (a) don't even have a version in trac for update.1 builds, and (b) don't ever seem to look at searches based on version anyway. I'm sure we're missing lots of potential blockers for update.1 this way: suggestions for better processes are welcome (but devel@ seems the better forum for that discussion). I could also have put this bug on the update.1 roadmap without assigning it to the update.1 milestone -- would that have been preferable? (And in actuality I don't have edit permissions for the roadmap.)

  Changed 6 years ago by tomeu

Thanks for the explanation.

This ticket exists because we have no strategy for coping with format changes in the xapian index used by the datastore. This should not cause any issue in the field if we don't upgrade to a higher version of the xapian engine.

This ticket is referred in the Test Group Release Notes because, by mistake, the xapian engine got upgraded from 1.0.2 to 1.0.4 during the builds 669-680.

In a future implementation of the DS, I hope we store the metadata in a less fragile way and use indexes just for speeding up lookups.

  Changed 6 years ago by tomeu

Forgot to say that I don't think this ticket should be Update.1 because of the reasons given above.

follow-up: ↓ 17   Changed 6 years ago by cscott

OK, could we get testing to confirm that this problem doesn't occur when upgrading from 650, 653, or 656 to the latest update.1 candidate, and then close as 'fixed'? Or do we want to leave it open as a future concern for update.2?

in reply to: ↑ 16   Changed 6 years ago by tomeu

Replying to cscott:

OK, could we get testing to confirm that this problem doesn't occur when upgrading from 650, 653, or 656 to the latest update.1 candidate, and then close as 'fixed'? Or do we want to leave it open as a future concern for update.2?

All testing is welcome, but note that the xapian version is exactly the same between 650, 653, 656 and 697, and should be like this until this issue is addressed.

I think this ticket should only be closed as fixed when we implement a robust enough scheme for storing metadata and coping with format changes. If this requirement is not considered important enough, then can be closed as wontfix.

But I don't see why this should be implemented for Update.1.

If no important (refactoring) work is scheduled for update.2 in the datastore, then the target for this ticket should be update.3.

  Changed 6 years ago by tomeu

  • milestone changed from Update.1 to Future Release

Taking out from the milestone as no work is scheduled on the DS for u1-u2.

My recommendation is that the fulltext index is just used for speeding up searches and can be remade at any moment (upgrades, corruption, etc).

  Changed 4 years ago by martin.langhoff

  • status changed from reopened to closed
  • next_action set to never set
  • resolution set to fixed

Old. USB index and compat have been completely reworked, and 10.1.3 has backwards compat for deployments upgrading.

Note: See TracTickets for help on using tickets.