RubyConf 2006—Day 1 (Friday evening, 20 October 2006)
20 Oct 2006. This article was modified from its original published form. The most recent modification was on 2014-09-25.
Series: RubyConf 2006
Dinner tonight was at Old Chicago with Hal Fulton, Ara Howard, Patrick Hurley, Tim Pease, and various others whose names I can’t remember offhand. Great dinner, and I was able to fully explain why the Ruby extension situation on Windows is so bad.
I also started talking about the big problem that I have with Transaction::Simple and haven’t figured out how to solve in a general way (details below). They weren’t quite understanding it, so before Matz’s Roundtable, I showed them a test case that I had come up with while talking with Francis Cianfrocca.
The Roundtable was pretty short; not too many questions were asked this year, and the discussion didn’t continue for an hour as it did the year before. I was shot down when asking for “become” behaviour (related to the Transaction::Simple bug). After the Roundtable, I managed to snag Matz to talk about the problem which led me to request this. I showed him the test case:
require 'rubygems'
require 'transaction/simple'

class Child
  attr_accessor :parent
end

class Parent
  include Transaction::Simple

  attr_reader :children

  def initialize
    @children = []
  end

  # Adds a child and points it back at this parent.
  def <<(child)
    child.parent = self
    @children << child
  end
end

parent = Parent.new
puts "parent.object_id: #{parent.object_id}"
parent << Child.new
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"

puts "starting transaction"
parent.start_transaction
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"

puts "aborting transaction"
parent.abort_transaction
puts "aborted transaction"

puts "parent.object_id: #{parent.object_id}"
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"
producing the output:
parent.object_id: 3265800
parent.children[0].parent.object_id: 3265800
starting transaction
parent.children[1].parent.object_id: 3265800
aborting transaction
aborted transaction
parent.object_id: 3265800
parent.children[0].parent.object_id: 3265500
parent.children[1].parent.object_id: 3265800
This bug affects PDF::Writer’s table generation and contributes significantly to the high memory usage. What’s happening is that when you call Parent#start_transaction, Transaction::Simple creates a transaction checkpoint with Marshal.dump. When you call Parent#rewind_transaction or Parent#abort_transaction, the transaction checkpoint is reverted. This reversion is extremely robust except for this one item: as the output shows, the restored child ends up pointing at a different object than the original parent. What we really need is something like self = Marshal.restore(checkpoint).
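To make the failure concrete without the gem, here is a minimal sketch that uses only Marshal. The restore_from method is a hypothetical stand-in: it assumes the checkpoint is reverted by copying instance variables from the unmarshalled copy back onto the original object, which may not match what Transaction::Simple does internally line for line.

```ruby
# Hypothetical stand-ins for the classes in the test case; no gem required.
class Child
  attr_accessor :parent
end

class Parent
  attr_reader :children

  def initialize
    @children = []
  end

  def <<(child)
    child.parent = self
    @children << child
  end

  # Assumed restore strategy (illustration only): copy every instance
  # variable from the unmarshalled copy back onto the original object.
  def restore_from(checkpoint)
    copy = Marshal.load(checkpoint)
    copy.instance_variables.each do |name|
      instance_variable_set(name, copy.instance_variable_get(name))
    end
    copy
  end
end

parent = Parent.new
parent << Child.new
checkpoint = Marshal.dump(parent)

copy = parent.restore_from(checkpoint)

# The original object keeps its identity, but the restored child still
# points at the unmarshalled copy, because self cannot be reassigned.
stale = parent.children[0].parent
puts stale.equal?(parent)
puts stale.equal?(copy)
```

The back reference inside the marshalled graph is internally consistent; it is only wrong relative to the object the rest of the program still holds.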
Obviously, that won’t work and this leads to the problem that is illustrated above. After long discussion with Tim Pease, Patrick Hurley, and Matz, we came up with a workaround that can work for the example bug and for PDF::Writer. It’s not super-efficient, though. Essentially, I will modify Transaction::Simple to have callback methods for post-processing after a transactional operation. Something like this:
# Assuming the Parent and Child classes from the broken test case.
class Parent
  # Re-points every child at this object after a checkpoint is restored.
  def post_restore_hook
    @children.each do |child|
      child.parent = self unless child.parent.equal?(self)
    end
  end
end

parent = Parent.new
puts "parent.object_id: #{parent.object_id}"
parent << Child.new
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"

puts "starting transaction"
parent.start_transaction
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"

puts "aborting transaction"
parent.abort_transaction
parent.post_restore_hook # would be called automatically in the real case
puts "aborted transaction"

puts "parent.object_id: #{parent.object_id}"
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"
Which produces the output:
parent.object_id: 3265800
parent.children[0].parent.object_id: 3265800
starting transaction
parent.children[1].parent.object_id: 3265800
aborting transaction
aborted transaction
parent.object_id: 3265800
parent.children[0].parent.object_id: 3265800
parent.children[1].parent.object_id: 3265800
This isn’t great: it doesn’t feel very Ruby to me, but it does get the job done. It’s also not very efficient. After thinking about this for the better part of an hour, Matz suggested that there might be a very ugly hack that he’ll look at for me, which may make it possible to implement everything in Transaction::Simple.
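As a rough sketch of what "called automatically" might look like, the module below is a hypothetical stand-in, not Transaction::Simple itself. HookedCheckpoint, its single-level checkpoint, and the respond_to? check are all assumptions made for illustration; only the post_restore_hook name comes from the example above.

```ruby
# Hypothetical stand-in for a transaction library; NOT Transaction::Simple.
module HookedCheckpoint
  def start_transaction
    @checkpoint = nil                # keep the dump free of older checkpoints
    @checkpoint = Marshal.dump(self)
  end

  def abort_transaction
    restored = Marshal.load(@checkpoint)
    # Copy the saved state back onto the original object. The restored
    # @checkpoint is nil, so this also clears the checkpoint.
    restored.instance_variables.each do |name|
      instance_variable_set(name, restored.instance_variable_get(name))
    end
    # Give the object a chance to repair references to itself.
    post_restore_hook if respond_to?(:post_restore_hook, true)
    self
  end
end

class Child
  attr_accessor :parent
end

class Parent
  include HookedCheckpoint

  attr_reader :children

  def initialize
    @children = []
  end

  def <<(child)
    child.parent = self
    @children << child
  end

  # Re-points every restored child at the original parent.
  def post_restore_hook
    @children.each { |child| child.parent = self }
  end
end

parent = Parent.new
parent << Child.new
parent.start_transaction
parent << Child.new
parent.abort_transaction

puts parent.children.size
puts parent.children.all? { |child| child.parent.equal?(parent) }
```

The cost is the one noted above: every class with back references has to know how to walk its own object graph and patch it after each restore.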
- 2014-09-25: The code examples in this post have been modernized, and the output including post_restore_hook has been cleaned up.