Ticket #5532 (closed defect: fixed)

Opened 10 months ago

Last modified 4 months ago

Sugar shell consuming vast amounts of memory

Reported by: kimquirk Owned by: marco
Priority: high Milestone: Update.1
Component: sugar Version:
Keywords: Cc: morgs, marco, rwh, tomeu
Action Needed: Verified: no
Blocked By: Blocking:

Description

Build 653

Testing for this release showed that after 1-2 days activities refused to load.

This has been traced to a memory leak in the sugar shell. May need to come up with a short term solution; and a better longer term solution.

Attachments

icon_release.py (1.1 kB) - added by marco 10 months ago.
Break the reference cycle using the destroy signal
mesh_leak.patch (5.9 kB) - added by tomeu 10 months ago.
fake_buddies.py (1.0 kB) - added by tomeu 10 months ago.
script for creating fake buddies in the mesh view
heap_key.diff (1.8 kB) - added by rwh 10 months ago.
find_leak.patch (3.1 kB) - added by tomeu 10 months ago.
hacks in sugar-shell to try to find the leaks
fix_mesh_leak.py (10.2 kB) - added by tomeu 10 months ago.
proposed partial fix
fix_leaks_lib.patch (2.0 kB) - added by tomeu 10 months ago.
fix_leaks_ps.patch (6.0 kB) - added by tomeu 10 months ago.
fix_leaks_shell.patch (3.1 kB) - added by tomeu 10 months ago.

Change History

  Changed 10 months ago by marco

Tomeu, I checked out the (hippo/gtk) destroy thing. It works for the case where you are connecting to the widget/item which is being destroyed (usually self) but obviously it doesn't work if you are connecting to a widget/item your class owns (the model case).

Changed 10 months ago by marco

Break the reference cycle using the destroy signal

  Changed 10 months ago by marco

I attached a testcase for another possibility we might consider. It's not great, but at least it keeps the hacks inside the icon implementation.

  Changed 10 months ago by marco

The medium term solution for BuddyIcon is probably to rework the way we interface with the PS.

  Changed 10 months ago by morgs

  • cc morgs added

  Changed 10 months ago by tomeu

  • cc tomeu added
  • owner changed from marco to tomeu

We were leaking a set of objects that were instantiated at every BuddyAppeared signal. That includes dbus proxies, and model and UI objects.

We were leaking because of three causes:

  • Palette was hold by a reference in PaletteGroup and by a bound method referred in PaletteObserver.
  • sugar.presence.buddy.Buddy was hold by the dbus signals it listens to and by the object cache in PresenceService.
  • BuddyMenu was hold by a signal in HomeModel it listens to.

The attached patch fixes all three and now we only grow a dbus.ObjectPath per buddy. This is very small compared to the previous situation, but I still want to know if it's a leak or just a cache inside dbus-python.

Marco, can you give a look to the attached patch? Perhaps there are simpler solutions to those.

  Changed 10 months ago by tomeu

  • cc marco added; tomeu removed

Changed 10 months ago by tomeu

  Changed 10 months ago by rwh

  • cc rwh added

Changed 10 months ago by tomeu

script for creating fake buddies in the mesh view

Changed 10 months ago by rwh

Changed 10 months ago by tomeu

hacks in sugar-shell to try to find the leaks

  Changed 10 months ago by marco

Tomeu, when you have something which can be reviewed please let me know. I think we should try to get something in before the freeze, even if small leaks are left.

Changed 10 months ago by tomeu

proposed partial fix

  Changed 10 months ago by tomeu

The patch attached reduces greatly the amount of memory leaked when buddies appear and disappear.

We still leak big chunks of memory when activities appear and disappear from the mesh view. I'm trying to hunt that one down.

I hope I can find that one soon, if not, I think applying this patch will be enough for update.1.

  Changed 10 months ago by marco

With this patch applied we still rely on the cyclic gc to release the objects, correct?

  Changed 10 months ago by marco

Both gtk widgets and hippo items already support explicit destruction. I think we can reuse it and cleanup code a bit. I can come up with a patch for that.

We should document how signals should be connected/disconnected because that's probably the more common cause of leaks.

I suggest we split the patch in two (UI and PS) so that they can be cleaned up and checked in separately.

Changed 10 months ago by tomeu

Changed 10 months ago by tomeu

Changed 10 months ago by tomeu

  Changed 10 months ago by tomeu

The last three patches fix the worst leaks, those caused by buddies appearing and disappearing.

Each patch can be applied separately and fixes separate reference leaks, but all of them need to be applied so objects in the cycle are correctly collected.

  Changed 10 months ago by marco

review+ for fix_leaks_lib.patch

follow-up: ↓ 16   Changed 10 months ago by marco

Review for fix_leaks_ps.patch:

+        self._s1 = None

I think we should merge _add_items in the constructor. Please also get rid of the commented out code while you are at it.

+            shell_model = self._shell.get_model()

We are doing this in so many places... I think we should store the model in an instance variable.

+        self._s1 = home_model.connect('active-activity-changed',

I think we should use descriptive names even if gets more verbose. I'm generally using hid as a suffix to indicate it's a handler id.

         self._layout.remove(icon)
         del self._activities[activity_model.get_id()]
+        icon.destroy()

I think destroyed items are removed automatically, so the remove would be unnecessary, can you verify please?

+    def __destroy_cb(self, activity_icon):
+        for key, icon in self._icons.iteritems():
+            icon.destroy()
+
+        self._icon.destroy()

I don't understand why this is necessary. Killing an item should also explicitly destroy all of its children automatically. If you just self._icons = None do we still leak?

(Please make sure to pylint all the patches, /me trying to get used to always do it)

  Changed 10 months ago by marco

We should have collabora review fix_leaks_ps.patch

in reply to: ↑ 14 ; follow-up: ↓ 17   Changed 9 months ago by tomeu

Replying to marco:

Review for fix_leaks_ps.patch: {{{ + self._s1 = None }}} I think we should merge _add_items in the constructor. Please also get rid of the commented out code while you are at it.

That code is not mine, but I think that having a separate _add_friends() method makes the code more readable. The constructor would be quite big and we wouldn't have a way to tag that block of code if we don't add a comment.

{{{ + shell_model = self._shell.get_model() }}} We are doing this in so many places... I think we should store the model in an instance variable.

If the problem you see is duplicated code, then I would prefer to add the auxiliary methods _get_shell_model() and _get_home_model() to BuddyMenu instead of adding more references. In my opinion, those classes should be singletons and accessible from any class in the shell code.

{{{ + self._s1 = home_model.connect('active-activity-changed', }}} I think we should use descriptive names even if gets more verbose. I'm generally using hid as a suffix to indicate it's a handler id.

Ok

{{{ self._layout.remove(icon) del self._activities[activity_model.get_id()] + icon.destroy() }}} I think destroyed items are removed automatically, so the remove would be unnecessary, can you verify please?

If we don't do self._layout.remove(icon), we still leak. By reading the code, looks like CanvasBox listens for 'destroy' in its children, but SpreadLayout doesn't, so this is needed for the moment.

{{{ + def destroy_cb(self, activity_icon): + for key, icon in self._icons.iteritems(): + icon.destroy() + + self._icon.destroy() }}} I don't understand why this is necessary. Killing an item should also explicitly destroy all of its children automatically. If you just self._icons = None do we still leak?

Sorry, this was a left over from a try to eliminate the leak when activities come and go. I haven't managed yet to fix this leak.

(Please make sure to pylint all the patches, /me trying to get used to always do it)

Any idea how to reduce the false positives? Looks like pylint cannot resolve the inheritance from sugar classes to GObjects.

pylint.sh in sugar always tells me my code is perfect.

in reply to: ↑ 16   Changed 9 months ago by marco

Replying to tomeu:

{{{ + shell_model = self._shell.get_model() }}} We are doing this in so many places... I think we should store the model in an instance variable.

If the problem you see is duplicated code, then I would prefer to add the auxiliary methods _get_shell_model() and _get_home_model() to BuddyMenu instead of adding more references. In my opinion, those classes should be singletons and accessible from any class in the shell code.

Sounds like a good idea to me.

{{{ self._layout.remove(icon) del self._activities[activity_model.get_id()] + icon.destroy() }}} I think destroyed items are removed automatically, so the remove would be unnecessary, can you verify please?

If we don't do self._layout.remove(icon), we still leak. By reading the code, looks like CanvasBox listens for 'destroy' in its children, but SpreadLayout doesn't, so this is needed for the moment.

Hmmm we will have to look into it at some point, but for now it's fine.

Any idea how to reduce the false positives? Looks like pylint cannot resolve the inheritance from sugar classes to GObjects.

afaik the only way is to disable those errors in the pylint config. Obviously that's going to hide some real errors though :(

  Changed 9 months ago by tomeu

<daf> tomeu: patch looks ok
<daf> tomeu: though you could give _s1 a better name :)

  Changed 9 months ago by tomeu

pushed to git

  Changed 9 months ago by tomeu

  • cc tomeu added
  • owner changed from tomeu to marco

Marco, can you include it in a rpm? Thanks!

  Changed 9 months ago by marco

  • milestone changed from Update.1 to Update.2

I don't see regressions but this obviously will require more testing. Reassigning to u.2 to keep looking into shell memory usage.

We should also come up with guidelines on how to avoid these sort of issues (both for activities and the shell).

  Changed 4 months ago by mstone

  • status changed from new to closed
  • resolution set to fixed
  • milestone changed from Update.2 (8.2.0) to Update.1

Leaks, if they still exist, are slow enough to be undetectable in casual tests of update.1-70x builds (e.g. by letting the build run overnight at 1cc).

Note: See TracTickets for help on using tickets.