Ticket #4406 (closed defect: fixed)

Opened 7 years ago

Last modified 4 years ago

XO leaves trash on USB sticks

Reported by: gnu Owned by: tomeu
Priority: high Milestone: Future Release
Component: sugar-datastore Version: Development build as of this date
Keywords: Cc: william.waddington@…, ffm, Eben, krstic, christianmarc
Action Needed: never set Verified: no
Deployments affected: Blocked By:
Blocking:

Description

B4, Build 611, Q2D01.

I thought Macintoshes were bad, dropping crumbs of garbage all over your USB drive whenever you plug it into a Mac. Now the OLPC is doing the same ugly thing -- but four times as bad.

On a real Linux machine (that doesn't trash USB keys), I put a single file (SimCity.xo) in the root of an otherwise empty 2GB USB key, unmounted it, and moved it to the XO. I clicked around in the Journal, trying to get it to read the USB stick. Various things did make the USB access light turn on, though I never did see the file appear anywhere. Ultimately I rebooted the XO (it got me into a corner of the Journal that wouldn't let me out, and I couldn't unmount the USB stick, not from the dev console, and couldn't navigate to the "Unmount" pop-up because it wouldn't show me that screen). I removed the stick during the reboot, and examined it on a Linux machine. It ended up containing:

bash-3.1$ ls -la /media/COMPUSA
total 16
drwxr-xr-x 3 gnu root 4096 Dec 31 1969 .
drwxr-xr-x 6 root root 4096 Oct 23 03:44 ..
drwxr-xr-x 3 gnu root 4096 Oct 23 2007 .olpc.store
bash-3.1$ ls -la /media/COMPUSA/.olpc.store/
total 16
drwxr-xr-x 3 gnu root 4096 Oct 23 2007 .
drwxr-xr-x 3 gnu root 4096 Dec 31 1969 ..
drwxr-xr-x 2 gnu root 4096 Oct 23 2007 index
-rwxr-xr-x 1 gnu root 107 Oct 23 2007 metainfo
bash-3.1$ ls -la /media/COMPUSA/.olpc.store/index
total 136
drwxr-xr-x 2 gnu root 4096 Oct 23 2007 .
drwxr-xr-x 3 gnu root 4096 Oct 23 2007 ..
-rwxr-xr-x 1 gnu root 2140 Oct 23 2007 config
-rwxr-xr-x 1 gnu root 0 Oct 23 2007 flintlock
-rwxr-xr-x 1 gnu root 12 Oct 23 2007 iamflint
-rwxr-xr-x 1 gnu root 14 Oct 23 2007 position.baseA
-rwxr-xr-x 1 gnu root 14 Oct 23 2007 position.baseB
-rwxr-xr-x 1 gnu root 16384 Oct 23 2007 position.DB
-rwxr-xr-x 1 gnu root 14 Oct 23 2007 postlist.baseA
-rwxr-xr-x 1 gnu root 14 Oct 23 2007 postlist.baseB
-rwxr-xr-x 1 gnu root 16384 Oct 23 2007 postlist.DB
-rwxr-xr-x 1 gnu root 14 Oct 23 2007 record.baseA
-rwxr-xr-x 1 gnu root 14 Oct 23 2007 record.baseB
-rwxr-xr-x 1 gnu root 16384 Oct 23 2007 record.DB
-rwxr-xr-x 1 gnu root 14 Oct 23 2007 termlist.baseA
-rwxr-xr-x 1 gnu root 14 Oct 23 2007 termlist.baseB
-rwxr-xr-x 1 gnu root 16384 Oct 23 2007 termlist.DB
-rwxr-xr-x 1 gnu root 14 Oct 23 2007 value.baseA
-rwxr-xr-x 1 gnu root 14 Oct 23 2007 value.baseB
-rwxr-xr-x 1 gnu root 16384 Oct 23 2007 value.DB

What a load of binary trash!

The laptop shouldn't write to external USB drives at all -- unless the user writes some data to them. And when it does, it should write whatever files the user created -- no more, and no fewer. Whatever design, or lack thereof, that results in some XO program creating 19 files and two directories in place of a single file, should be redesigned.

Change History

  Changed 7 years ago by AlbertCahalan

  • priority changed from normal to high

I strongly agree, for another reason: this stuff will get messed up by non-OLPC usage; it is thus useless at best and possibly dangerous.

Writing the data is a mere annoyance. Reading it, with the expectation that it will be useful (not obsolete or malicious), is much worse.

  Changed 7 years ago by jg

  • owner changed from marco to bcsaller
  • component changed from sugar to datastore
  • milestone changed from Never Assigned to Future Release

I think that some of us have other beliefs about what is needed for a child, which does not include needing to read file names that a child cannot read yet.

On the other hand, I tend to agree with you that writing anything until some file is written is incorrect, and 19 files is indeed an issue, if only for memory and performance reasons.

But we can't fix this at this date; we'll have to revisit it in a later release when we deal with the differential datastore.

  Changed 7 years ago by zoltanthegypsy

  • cc william.waddington@… added

I just tripped over this when trying to experiment with USB booting on my G1G1 XO.

Granted, I don't understand the reasoning here, but it seems like inappropriate behavior to scribble on a drive without warning.

I also notice that plugging in a 1 GB stick with 700 MB of files on it brings on a frenzy of activity that ties up the XO for a very long time.

What's the current thinking on this?

Thanks, Bill

  Changed 7 years ago by ffm

  • cc ffm added

follow-up: ↓ 6   Changed 7 years ago by tomeu

  • cc Eben, krstic added

Certainly, if we can support the desired user experience without writing anything to disk, that would be better.

Eben, can you add to the HIG or to the Sugar specifications an explanation of how we want USB sticks to behave? Then Ivan will be able to use this as input for the new datastore spec.

in reply to: ↑ 5   Changed 7 years ago by AlbertCahalan

Replying to tomeu:

Certainly, if we can support the desired user experience without writing anything to disk, that would be better.

If not, you'll have to pick some other user experience.

My desired user experience is that I plug the XO into the back of my head, and suddenly I know Kung Fu! (plus general relativity, rocket science, the secret to success, and the question) It's pointless for me to file a bug about this (gen2 hardware of course) because such wishes have a disconnect from the reality that we face.

Writing crap files is not an option. Reading crap files is even more strongly not an option.

  Changed 7 years ago by Eben

I don't see any reason to be writing all of this onto the drive itself. We need to index it in order to browse it in a Journal-like interface, but it seems we should also just cache the index locally instead of on the device itself. The exception to my desire not to write on USB media except for explicit copies is #1848, which I think will really enhance the experience of using external storage with the laptops.

  Changed 7 years ago by tomeu

The Xapian DB contains not only the full-text index but also the metadata.

Do we want to store on removable devices just files, or also journal entries with all the metadata associated with them?

follow-up: ↓ 10   Changed 7 years ago by Eben

  • cc christianmarc added

This is one of those "really hard questions". In light of the new Journal designs, I'm tempted to say that either:

  1. We only store the object (the file), so as to allow basic transfer of well-known file formats to others and also to the non-XO world. This is in keeping with the notion that I could either invite someone to join in a collaboration on a drawing (in which case they get the associated state and a matching activity ID) or give someone a drawing I made (in which case they get only the object, and no activity state or associated collaboration). This seems to be the simplest answer.
  2. A slightly more appropriate solution in light of the new Journal design is to treat the objects (files) separately from an "activity closure" (I'm just introducing this term now). Basically, resuming from the activity-centric view will restore the state of the object and the activity as it was last edited. By contrast, opening a file from the object-centric view would simply start a new activity instance with the selected file, with a new activity ID and no associated state. Likewise, we can carry this over to external devices by treating copied objects (from object-centric view) as simple files, while wrapping up an activity closure as, perhaps, a zip bundle containing the object file(s), a state blob, and perhaps a .info or similar to tell Sugar how to put it back together later.

As a side note, we should attempt to preserve metadata as already supported by various file formats (ID3, EXIF, etc). Would this require duplication of metadata information? Could we install proxies for metadata which automatically set such fields when activities modify a particular key which matches the well defined schema? Sorry for the diversion, but it seems to relate at least in part to the above topic.

in reply to: ↑ 9   Changed 7 years ago by tomeu

Replying to Eben:

Likewise, we can carry this over to external devices by treating copied objects (from object-centric view) as simple files, while wrapping up an activity closure as, perhaps, a zip bundle containing the object file(s), a state blob, and perhaps a .info or similar to tell Sugar how to put it back together later.

We already have something like this implemented. We'll need to serialize journal entries for other scenarios like uploading to a website, transferring with tubes, etc.

My only concern is how we could represent such an entry in the UI. I think we could lay out the items inside the zip file in such a way that extracting the SVG icon and the metadata would be efficient, perhaps by storing those inside the zip file without compression.
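A minimal sketch of that layout, using Python's standard zipfile module. The member names metadata.json and icon.svg, the helper names, and the JSON encoding of the metadata are all hypothetical; this is not the datastore's actual serialization format. The small preview members are written uncompressed (ZIP_STORED) and the object itself compressed (ZIP_DEFLATED):

```python
import json
import zipfile

# Hypothetical member names; not an actual Sugar/datastore format.
METADATA_NAME = "metadata.json"
ICON_NAME = "icon.svg"


def write_entry_bundle(path, metadata, icon_svg, object_name, object_bytes):
    """Pack one journal entry (metadata, icon, object) into one zip file."""
    with zipfile.ZipFile(path, "w") as zf:
        # Preview members are stored uncompressed (ZIP_STORED), so reading
        # them back needs no decompression.
        zf.writestr(METADATA_NAME, json.dumps(metadata, sort_keys=True),
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr(ICON_NAME, icon_svg, compress_type=zipfile.ZIP_STORED)
        # The object itself, possibly large, is deflated as usual.
        zf.writestr(object_name, object_bytes,
                    compress_type=zipfile.ZIP_DEFLATED)


def read_bundle_preview(path):
    """Return (metadata, icon_svg) without reading the object's data."""
    with zipfile.ZipFile(path) as zf:
        metadata = json.loads(zf.read(METADATA_NAME).decode("utf-8"))
        icon_svg = zf.read(ICON_NAME).decode("utf-8")
    return metadata, icon_svg
```

Because zipfile locates members through the archive's central directory, read_bundle_preview can fetch the two small members without decompressing the object. The archive remains an ordinary zip file that non-Sugar tools can open.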

follow-up: ↓ 12   Changed 7 years ago by tomeu

I would also like to add that the issue in my opinion is not xapian storing 19 files and two dirs. I think that writing all that info inside only one file is equally good or bad for the user.

The problems with the current approach are:

  • need to scan the whole drive every time,
  • the metadata is stored in a binary format that breaks between xapian releases.

The index can be recreated (and cached outside the usb stick), but if we store the metadata in the device, using a simpler plain text format would be better.
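One possible shape for such a plain text format, sketched in Python: a small UTF-8 JSON sidecar per object with an explicit format number. The .metadata.json naming scheme, the field names, and the helpers are hypothetical. The reader rejects anything it does not recognize instead of trying to interpret it, since files on a stick can be altered by non-OLPC systems:

```python
import json
import os

FORMAT_VERSION = 1  # bump if the record layout ever changes


def metadata_path(object_path):
    # Hypothetical naming scheme: "drawing.png" -> "drawing.png.metadata.json"
    return object_path + ".metadata.json"


def write_metadata(object_path, metadata):
    """Store metadata as human-readable JSON next to the object."""
    record = {"format": FORMAT_VERSION, "metadata": metadata}
    target = metadata_path(object_path)
    tmp = target + ".tmp"
    with open(tmp, "w", encoding="utf-8") as f:
        json.dump(record, f, indent=2, sort_keys=True)
    # Swap the finished file into place so readers normally never see
    # a half-written record.
    os.replace(tmp, target)


def read_metadata(object_path):
    """Return the metadata dict, or None if missing, malformed, or in an
    unknown format. Nothing else from the file is interpreted."""
    try:
        with open(metadata_path(object_path), encoding="utf-8") as f:
            record = json.load(f)
    except (OSError, ValueError):  # ValueError covers bad JSON and bad UTF-8
        return None
    if not isinstance(record, dict) or record.get("format") != FORMAT_VERSION:
        return None
    metadata = record.get("metadata")
    return metadata if isinstance(metadata, dict) else None
```

Being plain text, a file like this stays readable with any text editor and does not depend on a particular Xapian release; it still adds files to the stick, which is what the reporter objects to.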

in reply to: ↑ 11 ; follow-up: ↓ 13   Changed 7 years ago by Eben

Replying to tomeu:

I would also like to add that the issue in my opinion is not xapian storing 19 files and two dirs. I think that writing all that info inside only one file is equally good or bad for the user.

I think I agree on this point. It seems that as long as everything is tucked within a single hidden top level directory, things can be equally as clean. In either case, we should attempt to minimize the size of anything we write here.

The problems with the current approach are:

  • need to scan the whole drive every time,
  • the metadata is stored in a binary format that breaks between xapian releases.

The index can be recreated (and cached outside the usb stick), but if we store the metadata in the device, using a simpler plain text format would be better.

I'm not sure if I can be of any help regarding the above, from a user experience perspective (apart from, of course, recommending we prevent things from breaking when possible, but that goes without saying). If you have more specific questions on experience, let me know.

in reply to: ↑ 12 ; follow-up: ↓ 14   Changed 7 years ago by tomeu

Replying to Eben:

I'm not sure if I can be of any help regarding the above, from a user experience perspective (apart from, of course, recommending we prevent things from breaking when possible, but that goes without saying). If you have more specific questions on experience, let me know.

Well, the intended user experience impacts directly these questions.

If we want to be able to do fulltext search inside usb sticks, then we'll need to index the names of the files plus all the metadata we want to extract from the files, including part or all of the text inside the document. In order to make sure we have the fulltext updated, we'll need to scan the device after every mount.

If we want to transfer metadata along with files then we need a way to store it. Either we use zip files that encapsulate the file plus the metadata or we store it hidden somewhere in the usb stick. The first option is less convenient when the files are seen from outside sugar, the second is what we have today.

One more thing: the current DS stores a lot of unneeded info in the index; by dropping it, we could greatly reduce the amount of data in the index, the time spent indexing, and the number of files inside .olpc.store.

in reply to: ↑ 13   Changed 7 years ago by Eben

Replying to tomeu:

Well, the intended user experience impacts directly these questions. If we want to be able to do fulltext search inside usb sticks, then we'll need to index the names of the files plus all the metadata we want to extract from the files, including part or all of the text inside the document. In order to make sure we have the fulltext updated, we'll need to scan the device after every mount.

I guess we'll have to weigh the trade-off between indexing time and the convenience of full-text search on external devices. If scanning takes a long time, maybe we should stick to the titles (and metadata?) for now. If we store the index locally, I assume that we could be intelligent about creating the index such that we don't later rescan files which haven't changed since the cached index was created.
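A sketch of such a locally cached index in Python, assuming names-only indexing (no full text) and change detection by file size and modification time. scan_device and the JSON cache layout are hypothetical; the caller chooses a cache_file on the laptop's own storage (how to name it per device, for example by filesystem UUID, is left open). Only directory listings and stat() results are read from the stick; file contents are never opened and nothing is written to it:

```python
import json
import os


def scan_device(mount_point, cache_file):
    """Index file names on a mounted device, reusing cached entries for
    files whose size and mtime are unchanged since the last scan.

    Returns (index, rescanned), where index maps relative paths to entries
    and rescanned counts the files that were new or had changed.
    Reads only directory listings and stat() results from mount_point and
    never writes to it; cache_file must live on local storage.
    """
    try:
        with open(cache_file, encoding="utf-8") as f:
            old = json.load(f)
    except (OSError, ValueError):
        old = {}  # no cache yet, or unreadable: rebuild from scratch
    if not isinstance(old, dict):
        old = {}

    index, rescanned = {}, 0
    for dirpath, _dirnames, filenames in os.walk(mount_point):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, mount_point)
            try:
                st = os.stat(full)
            except OSError:
                continue  # vanished or unreadable: leave it out
            key = [st.st_size, st.st_mtime]
            cached = old.get(rel)
            if isinstance(cached, dict) and cached.get("key") == key:
                index[rel] = cached  # unchanged: reuse the cached entry
            else:
                index[rel] = {"key": key, "title": name}
                rescanned += 1

    parent = os.path.dirname(cache_file)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(cache_file, "w", encoding="utf-8") as f:
        json.dump(index, f)
    return index, rescanned
```

On a second mount with no changes, every entry is reused and no files are rescanned. Size and mtime are a heuristic: a file rewritten with the same size within the timestamp resolution of the stick's filesystem (two seconds on FAT) would be missed.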

If we want to transfer metadata along with files then we need a way to store it. Either we use zip files that encapsulate the file plus the metadata or we store it hidden somewhere in the usb stick. The first option is less convenient when the files are seen from outside sugar, the second is what we have today.

I think that the zip/bundle approach is the only correct way to do this. This "activity closure" is, strictly theoretically, a wholly self-contained entity capable of launching as an activity, with the associated files, and restoring the associated state and metadata. (This is what an "action entry" in the new Journal represents.) The difference in implementation will likely be that we store a reference to the activity that should be used to open it, rather than the actual bundle which would waste considerable space, with the goal of implicitly obtaining the required activity when such a closure is launched.

I don't know enough about filesystems and xattr to respond reasonably on storing metadata for the ordinary objects/files transferred (those not in a "closure"). I think we want to remain consistent with what's available on other systems in this regard. Do these comments give a clear enough picture of the intent?

follow-up: ↓ 16   Changed 7 years ago by gnu

Here's an example of why computers should not leave trash on USB sticks. The Peru OLPC team was trying to make a USB key that could be used to install new XO's. They unfortunately built it on a Macintosh, and the Mac added extra trash files to the USB key. These trash files confused the installer.

On a Mac there is no way to get rid of the trash files. On an OLPC there is no way to get rid of the trash files. The user has zero control over the naming and placement of the files that they put onto a USB stick. This is a bug.

The bug isn't that the UI is broken, nor that we failed to make "activity closures", nor that the trash is in a binary format that breaks. The bug is that the trash is there AT ALL. Put your trash on your own NAND filesystem if you want to keep trash around that describes a USB key.

IRC transcript:

<cscott> yanni wondered if perhaps some of the bundles he put on the key were corrupt

<m_stone> cscott: I've tried it with basic corruption; which is to say, with a '.xo' that was a zero-length file.

<cscott> backing up -- what's our current status?

<m_stone> It manages to notice that and keep going without problems.

<cscott> they created this key on a mac, and there were .foo.xo files (note leading dot) on it which caused errors but seemed to be properly skipped.

<m_stone> cscott: the current status is that the customization-1.zip material is having difficulty installing some of the bundles cjb prepared, and that the (unsigned) initramfs with your patches appeared to go through cleanly.

We have had the opportunity for five months (since this bug was filed) to make sure that nobody will ever say "They created this key on an OLPC, so there were trash files which caused errors to the application we later plugged the key into". So far the team is still arguing about the color of the bikeshed rather than the fact that USB sticks are not zoned to have ANY bikesheds on them.

I think the involvement of the UI designer (Eben) is inappropriate. This bug is not about the UI. This bug is about what gets written to the USB keys. Eben should play no part in deciding that issue. There may be a separate bug (which, if so, someone should file) along the lines of "GUI for USB keys is inappropriate or clumsy", or "GUI can't do X for files that it has no metadata about". Those possible bugs are not this bug.

in reply to: ↑ 15 ; follow-up: ↓ 17   Changed 7 years ago by tomeu

Replying to gnu:

We have had the opportunity for five months (since this bug was filed) to make sure that nobody will ever say "They created this key on an OLPC, so there were trash files which caused errors to the application we later plugged the key into". So far the team is still arguing about the color of the bikeshed rather than the fact that USB sticks are not zoned to have ANY bikesheds on them.

The problem here is that a major redesign of the DS is scheduled, but this task hasn't been resourced yet. So, while we wait, we exchange ideas about how things could be done. Do you feel offended by this?

I think the involvement of the UI designer (Eben) is inappropriate. This bug is not about the UI. This bug is about what gets written to the USB keys. Eben should play no part in deciding that issue. There may be a separate bug (which, if so, someone should file) along the lines of "GUI for USB keys is inappropriate or clumsy", or "GUI can't do X for files that it has no metadata about". Those possible bugs are not this bug.

User experience guides implementation, not gnu's distaste of system-created hidden files.

in reply to: ↑ 16   Changed 7 years ago by AlbertCahalan

Replying to tomeu:

User experience guides implementation, not gnu's distaste of system-created hidden files.

With the word being "guides", not "rules", OK.

Subject to the constraint "the system will not read any non-user files from untrusted media", the UI designer should feel encouraged to make UI suggestions. The developers can respond as appropriate with "yes", "sorry, impossible", "sorry, too slow", etc.

Except as explicitly indicated by the user, the system must not make any attempt to interpret any untrusted data beyond the filesystem structure. To do otherwise would invite viruses/worms.

Of course, there is no point in writing files that you must not ever read.

  Changed 4 years ago by dsd

  • status changed from new to closed
  • next_action set to never set
  • resolution set to fixed

Sugar-0.84 in release 10.1.2 no longer writes a database to the USB disk.
