Ticket #6729 (closed defect: fixed)

Opened 6 years ago

Last modified 6 years ago

Read doesn't save the PDFs it opens

Reported by: cjb Owned by: homunq
Priority: blocker Milestone: 8.2.0 (was Update.2)
Component: read-activity Version:
Keywords: blocks:8.2.0 Cc: tomeu, marco, mstone, gregorio, skierpage, rwh, homunq, morgs
Action Needed: design Verified: no
Deployments affected: Blocked By: #7615
Blocking:

Description

Read only saves metadata on opened files in the datastore, rather than the files themselves. I don't know why, but there's a comment in the source saying it's intentionally so. This breaks Read sharing over the mesh -- when you download a file from someone else, it should be stored on your laptop afterwards.

Reinier, any ideas?

Attachments

0001-bug-6729-hash-and-check-for-dupes-before-saving-do.patch (2.7 kB) - added by homunq 6 years ago.
0001-bug-6729-support-in-datastore-for-fix.patch (0.7 kB) - added by homunq 6 years ago.
all6729.patch (23.5 kB) - added by homunq 6 years ago.
all6729.2.patch (3.0 kB) - added by homunq 6 years ago.

Change History

  Changed 6 years ago by cjb

  • cc mstone added

  Changed 6 years ago by rwh

Read saves metadata only to make sure the PDF doesn't end up on NAND several times. However, when a document is received through sharing it should of course be saved. I'm working on this and expect a working patch tomorrow. As cjb noted the file currently only stays around in /tmp (mem), and it should be removed from there as well.

  Changed 6 years ago by rwh

I just spend quite a bit of time being bitten by rainbow while trying to solve this :-)

Anyway, Read saves the file it gets in /home/olpc/isolation/1/uid_to_home_dir/<uid>/tmp (from tempfile.gettempdir). However, the DS runs as olpc.olpc, and does not have access to this directory, if I'm not mistaken. Therefore I can't move the file to the DS. For now I'll try to save the tempfile in .../instance, but I think the DS should definitely be able to access .../tmp too.

  Changed 6 years ago by tomeu

Hi Reinier, thanks a lot for looking into this.

As documented in the below link by Bert and others, the instance dir is in fact the right one:

http://wiki.laptop.org/go/Low-level_Activity_API#Writable_Directories

  Changed 6 years ago by cjb

Hi, did the fix get committed here?

follow-up: ↓ 26   Changed 6 years ago by skierpage

I noticed that every time I start Browse, visit OLPC Library > images > world maps and click Africa (a PDF), another full copy of the PDF is added to ~olpc/.sugar/default/datastore/store by Read activity 45. If I read this bug correctly, that behavior is a downside of this bug fix.

  Changed 6 years ago by gregorio

  • cc gregorio, skierpage, rwh added
  • next_action set to never set

Hi Skierpage et al,

That looks like a big problem to me that it will fill up the NAND!

Is someone going to work on that? Should we re-open the bug?

Thanks,

Greg S

  Changed 6 years ago by gregorio

  • next_action changed from never set to design

  Changed 6 years ago by homunq

  • cc homunq added
  • owner changed from rwh to homunq
  • status changed from new to assigned

  Changed 6 years ago by homunq

  • summary changed from Read doesn't save the PDFs it opens to Read saves multiple copies of the PDFs it opens

  Changed 6 years ago by skierpage

I brought this up on the devel mailing list in downloaded files and the Journal with some ideas for fixing. Tomeu Vizoso replied These problems are known of, and hasn't been solved yet because of lack of man power... search in trac and review that the tickets in there contain all the relevant information.

  Changed 6 years ago by homunq

  • blockedby 7615 added

  Changed 6 years ago by homunq

I am testing a patch, now that 7615 is resolved... should be done by thursday.

  Changed 6 years ago by marco

  • keywords blocks?:8.2.0 added

  Changed 6 years ago by gregorio

  • keywords blocks:8.2.0 added; blocks?:8.2.0 removed

Thanks for your help!

I'm marking it a must fix for now. Hope its over by next week.

Thanks,

Greg S

  Changed 6 years ago by homunq

  • blockedby 7934 added

Changed 6 years ago by homunq

  Changed 6 years ago by homunq

  • next_action changed from design to review

These patches fix the basic problem. The fix is somewhat brain-dead, however, as you only hash files that are downloaded by sharing, and then only AFTER downloading them. A deeper fix would either have to change the network protocol, or do something creative not to. I think it is worth it, but probably not right now; open to hear disagreement, though.

  Changed 6 years ago by homunq

patch also fixes #7934

  Changed 6 years ago by morgs

  • next_action changed from review to communicate

homunq: read patch doesn't apply, please attach the full diff with origin/master, or additional patches

  Changed 6 years ago by morgs

Also, please put in a test case so this gets picked up by the review report

Changed 6 years ago by homunq

Changed 6 years ago by homunq

  Changed 6 years ago by morgs

homunq, the problem that skierpage mentioned above in comment 7 (Africa PDF) occurs whether or not you open Read, since Browse makes a new journal entry anyway. Comments?

  Changed 6 years ago by morgs

  • next_action changed from communicate to design

homunq, see #7898 for another description of the problem reported here. It can't be solved in Read, since you don't even need to open Read to trigger it. Just click on the same PDF twice, in Browse. FWIW I don't think this has anything to do with the original problem reported and solved in this ticket.

  Changed 6 years ago by morgs

  • cc morgs added

  Changed 6 years ago by morgs

  • blockedby 7934 removed

in reply to: ↑ 7   Changed 6 years ago by morgs

  • status changed from assigned to closed
  • resolution set to fixed
  • summary changed from Read saves multiple copies of the PDFs it opens to Read doesn't save the PDFs it opens

Replying to skierpage:

I noticed that every time I start Browse, visit OLPC Library > images > world maps and click Africa (a PDF), another full copy of the PDF is added to ~olpc/.sugar/default/datastore/store by Read activity 45. If I read this bug correctly, that behavior is a downside of this bug fix.

Actually, that's nothing to do with Read - if you click on a PDF in Browse, then the download will create a new datastore entry with a new copy of the file before you even think of opening Read.

Every OS I've seen will create a duplicate file if you download the same thing twice.

There is a chance to improve this with the next generation datastore, though, if we use hard links and copy-on-write. I'm logging a new ticket to track that, so it doesn't get confused with Read - see #8155.

In the mean time, I'm closing this ticket to prevent confusion with existing Read issues.

follow-up: ↓ 28   Changed 6 years ago by skierpage

I agree with #comment:7 that it's Browse's download of the PDF download making the copies.

homunq's all6729.2.patch attempts to avoid making additional copies within Read's _download_result_cb(). This isn't invoked when I open a PDF from Browse, but it might be invoked when you join a friend's Read activity (I'm guessing, I have no friends ;-) ). So that patch might still be worthwhile to avoid multiple copies when you join someone else's activity.

It appears Read can has code to display a file given a URL. So instead of the "download in Browse-save in journal-switch to journal-start Read" dance, Browse could instead store the URL of the PDF you click on in the Journal, or even directly start the Read activity with the URL. Working with URLs rather than files would be a big change, and touches on the discussion on the devel list around July 7th and in bug #6958 about seamless launching from URLs.

in reply to: ↑ 27   Changed 6 years ago by morgs

Replying to skierpage:

It appears Read can has code to display a file given a URL. So instead of the "download in Browse-save in journal-switch to journal-start Read" dance, Browse could instead store the URL of the PDF you click on in the Journal, or even directly start the Read activity with the URL. Working with URLs rather than files would be a big change, and touches on the discussion on the devel list around July 7th and in bug #6958 about seamless launching from URLs.

It uses HTTP to download a PDF from peers in a shared session, which is not necessarily the best file transfer mechanism. Point taken though, we'll see what comes of #6958.

Note: See TracTickets for help on using tickets.