Ticket #3359 (closed task: fixed)

Opened 7 years ago

Last modified 6 years ago

Time sync for XOs.

Reported by: cscott
Owned by: cscott
Priority: normal
Milestone: Update.1
Component: school server
Version:
Keywords:
Cc: cjb, wad, JordanCrouse
Action Needed:
Verified: no
Deployments affected:
Blocked By:
Blocking:

Description

Apparently we lose as much as a second every time we suspend/resume. Since we expect to be doing this a lot, we need to have a reliable time service to sync with.

Change History

Changed 7 years ago by kimquirk

  • milestone changed from Untriaged to First Deployment, V1.0

What does this mean for laptops with no access to a school server?

Changed 7 years ago by wad

It means their notion of time is not too good. Realistically, all laptops on the internet can access a time server. The default configuration on the laptop should point to the Fedora default pool of time servers (we can contribute one as a good-will gesture?), and the registration process can change that to point to 'time', the school NTP server (http://wiki.laptop.org/go/School_Service_Names).

The school server (build 128) should already support this service and service name.

Someone should check the configuration on the client to ensure that it doesn't bail if the adjustment is too drastic. The Debian defaults used to bail if the clock was more than a second out of adjustment.

Changed 7 years ago by gnu

Is the essence of the problem that the kernel copies the RTC into its idea of the time, but the RTC only has 1-second resolution? If that's it, there's a pretty good fix:

It should be possible to set the kernel clock accurately, by watching the realtime clock tick to another second. This could even be done incrementally, second by second, sneaking up on the instant that it ticks. (Gen-2 should use a realtime clock with a counter and alarm that ticks off much smaller units.)

So on wakeup, you set the time to the RTC time and 0 microseconds (say 04:06:05.000000). Set an RTC alarm to occur at the next second (04:06:06); you'll get an interrupt right after it ticks. Your kernel time will be maybe 04:06:05.314000. Now you know that time was too conservative, so bump forward your idea of what time it is to 04:06:06.000000. You're almost synchronized with the tick, but you had some interrupt latency, so you're still a little slow. Set an internal timer to wake you up 99% of the way through the next second (at 04:06:06.990000), and when you get that interrupt, sit in a tight loop and watch the RTC tick over to 04:06:07. Advance your internal clock to 04:06:07.000000 right then, and you'll be accurate to within the time it takes to do one cycle of a tight loop reading the RTC. In less than three seconds, you've probably recovered the time "lost" in suspend, and without ever making the clock run backward. Think of this as adjtime for suspended clocks.

Then you won't need NTP to sync it to somewhere else. (Of course, this kind of clock fiddling will give NTP some concern about the stability of your local oscillator. Check in with Dave Mills, truechimer wizard, for how to make this work best.)
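
The procedure above can be sketched as a small simulation. This is only an illustration under stated assumptions, not kernel code: SimulatedClock, resync_after_resume, irq_latency and poll_step are hypothetical names and values, the "true" time is simulated, and the kernel clock is assumed to run at the true rate between adjustments. Every adjustment is clamped with max() so the kernel clock is never stepped backward.

```python
import math


class SimulatedClock:
    """Simulated 'true' time, plus an RTC view truncated to whole seconds."""

    def __init__(self, true_time):
        self.true_time = true_time

    def advance(self, seconds):
        self.true_time += seconds

    def rtc_read(self):
        # The RTC only exposes whole seconds.
        return math.floor(self.true_time)


def resync_after_resume(clock, irq_latency=0.003, poll_step=0.0001):
    """Return (kernel_time, jumps); jumps lists each forward adjustment."""
    jumps = []

    # Step 1: on wakeup, copy the RTC into the kernel clock with 0 microseconds.
    kernel = float(clock.rtc_read())

    # Step 2: wait for the RTC alarm at the next second; the handler runs
    # irq_latency after the tick. The kernel clock keeps running meanwhile.
    next_second = clock.rtc_read() + 1
    wait = (next_second - clock.true_time) + irq_latency
    clock.advance(wait)
    kernel += wait
    bumped = max(kernel, float(next_second))  # never step backward
    jumps.append(bumped - kernel)
    kernel = bumped

    # Step 3: sleep until 99% of the way through the second (by kernel time),
    # then poll the RTC in a tight loop until it ticks over.
    sleep = (next_second + 0.99) - kernel
    clock.advance(sleep)
    kernel += sleep
    target = next_second + 1
    while clock.rtc_read() < target:
        clock.advance(poll_step)
        kernel += poll_step
    bumped = max(kernel, float(target))
    jumps.append(bumped - kernel)
    kernel = bumped

    return kernel, jumps


if __name__ == "__main__":
    clock = SimulatedClock(1000.686)  # resume happens 0.686 s into a second
    kernel, jumps = resync_after_resume(clock)
    print("residual error (s):", clock.true_time - kernel)
    print("forward jumps (s):", jumps)
```

In this model the residual error after the final step is bounded by one polling interval, which matches the "one cycle of a tight loop" accuracy claimed above.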

Changed 7 years ago by cscott

  • owner changed from wad to cscott

Changed 7 years ago by bemasc

There has been some question about ntp, since it's a complex machine not well-suited to environments with extreme lag. It could also be too much load on the timeservers.

Given our low accuracy requirements (within a few minutes would be fine), we should also consider rdate. That's the simplest possible network time setting program, and basically sufficient for us.
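
For reference, rdate speaks the RFC 868 Time Protocol: the server replies with a single 32-bit big-endian count of seconds since 1900-01-01 00:00 UTC, normally on port 37. The sketch below shows how little is involved. The function names and the 120-second tolerance are illustrative assumptions, not part of any existing tool.

```python
import socket
import struct

# Seconds between the RFC 868 epoch (1900-01-01) and the Unix epoch (1970-01-01).
RFC868_EPOCH_OFFSET = 2208988800


def decode_time_reply(payload):
    """Convert a 4-byte RFC 868 reply into Unix seconds."""
    if len(payload) != 4:
        raise ValueError("RFC 868 reply must be exactly 4 bytes")
    (seconds_since_1900,) = struct.unpack("!I", payload)
    return seconds_since_1900 - RFC868_EPOCH_OFFSET


def query_time_server(host, port=37, timeout=5.0):
    """Fetch the current time from a Time Protocol server over TCP."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        data = b""
        while len(data) < 4:
            chunk = sock.recv(4 - len(data))
            if not chunk:
                break
            data += chunk
    return decode_time_reply(data)


def needs_adjustment(local_time, remote_time, tolerance=120):
    """True if the local clock is off by more than `tolerance` seconds."""
    return abs(local_time - remote_time) > tolerance
```

The protocol has no authentication and no round-trip compensation, which is acceptable only because the accuracy target here is on the order of minutes.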

Changed 7 years ago by cscott

  • milestone changed from Update.2 to Update.1

This is on the update.1 roadmap.

Is there a server pool for rdate which we should be using?

Changed 7 years ago by cscott

rdate doesn't seem to have a well-known server pool, and there are some slight security issues (attackers deliberately missetting clocks) which ntp has already thought through.

I'll set up ntp.conf with 'time' as a first tier server; if we ever get schoolservers widely deployed, that should keep down load. And I'll invoke ntpdate, not full-fledged ntp, which should also help.

Changed 7 years ago by cscott

  • summary changed from Time server for XOs. to Time sync for XOs.

Committed for joyride-1461.

    We run 'ntpdate' when we get network connectivity, against both 'time'
    (which should be the schoolserver) and 0.fedora.pool.ntp.org (our
    'vendor' ntp pool).

http://dev.laptop.org/git?p=users/cscott/pilgrim;a=commitdiff;h=068d1f5eae1fbf5d15635f8f8b81a5b5bc611ff8
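
A rough sketch of the behaviour described in the commit message, for readers who do not want to open the diff. This is not the code from the pilgrim patch: the function names are hypothetical, and how the hook gets triggered on network connectivity is left out. It relies only on ntpdate accepting several servers on its command line.

```python
import subprocess

# 'time' should resolve to the schoolserver; the second entry is the vendor pool.
NTP_SERVERS = ["time", "0.fedora.pool.ntp.org"]


def build_ntpdate_command(servers=NTP_SERVERS):
    """Build the argument list for a one-shot ntpdate run."""
    return ["ntpdate", *servers]


def sync_clock(servers=NTP_SERVERS, runner=subprocess.run):
    """Run ntpdate once; return True if it reported success."""
    try:
        result = runner(build_ntpdate_command(servers), check=False)
    except FileNotFoundError:
        # ntpdate is not installed on this system.
        return False
    return result.returncode == 0
```

Calling sync_clock() once each time the network comes up gives a one-shot correction without running a full ntpd.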

Changed 7 years ago by cscott

  • owner changed from cscott to ApprovalForUpdate

This is a pilgrim patch; assign back to me after approval for commit to pilgrim update.1 branch.

Changed 7 years ago by jg

  • owner changed from ApprovalForUpdate to dgilmore

I still want to understand why the laptops are losing that much time on suspend/resume and *FIX* the root cause. Having rdate is useful, and necessary, but *NOT* an actual fix. So do *NOT* close this bug after rdate is in update.1...

The fix itself is approved.

Changed 7 years ago by cscott

  • owner changed from dgilmore to cscott

Please open another bug for suspend/resume-related time issues.

Changed 7 years ago by dsd

Given that this is included in joyride and update1, and that another bug will be opened for the suspend/resume time handling issues, can this bug be closed?

Changed 7 years ago by jg

cscott, please enter the bug for losing time....

Changed 7 years ago by JordanCrouse

  • cc JordanCrouse added

Changed 7 years ago by cscott

Opened #6053 for the 'root cause'. This bug is awaiting testing before it is closed.

Changed 6 years ago by cscott

  • status changed from new to closed
  • resolution set to fixed