RubyConf 2006—Day 1 (Friday evening, 20 October 2006)


This article was modified from its original published form. The most recent modification was on 2014-09-25.


Series: RubyConf 2006

Dinner tonight was at Old Chicago with Hal Fulton, Ara Howard, Patrick Hurley, Tim Pease, and various others whose names I can’t remember offhand. Great dinner, and I was able to explain in full why the Ruby extension situation on Windows is so bad.

I also started talking about the big problem that I have with Transaction::Simple and haven’t figured out how to solve in a general way (details below). They weren’t quite understanding it, so before Matz’s Roundtable, I showed them a test case that I had come up with while talking with Francis Cianfrocca.

The Roundtable was pretty short; not too many questions were asked this year, and the discussion didn’t continue for an hour as it did the year before. I was shot down when I asked for “become” behaviour (related to the Transaction::Simple bug). After the Roundtable, I managed to snag Matz to talk about the problem that led me to request it. I showed him the test case:

require 'rubygems'
require 'transaction/simple'

class Child
  attr_accessor :parent
end

class Parent
  include Transaction::Simple

  attr_reader :children
  def initialize
    @children = []
  end

  def <<(child)
    child.parent = self
    @children << child
  end
end

parent = Parent.new
puts "parent.object_id: #{parent.object_id}"
parent << Child.new
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"
puts "starting transaction"
parent.start_transaction
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"
puts "aborting transaction"
parent.abort_transaction
puts "aborted transaction"
puts "parent.object_id: #{parent.object_id}"
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"

producing the output:

parent.object_id: 3265800
parent.children[0].parent.object_id: 3265800
starting transaction
parent.children[1].parent.object_id: 3265800
aborting transaction
aborted transaction
parent.object_id: 3265800
parent.children[0].parent.object_id: 3265500
parent.children[1].parent.object_id: 3265800
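
The stale object_id on the restored child can be reproduced with Marshal alone. The sketch below is not Transaction::Simple's code: Owner, Node, and restore_from are hypothetical names, and it assumes that a rollback loads a copy of the object graph and copies the copy's instance variables back onto the live object.

```ruby
# Hypothetical stand-ins for Parent and Child; Transaction::Simple is not used.
class Node
  attr_accessor :owner
end

class Owner
  attr_reader :nodes

  def initialize
    @nodes = []
  end

  def <<(node)
    node.owner = self
    @nodes << node
    self
  end

  # Assumed shape of a Marshal-based rollback: load a copy of the graph,
  # then copy the copy's instance variables onto the live object.
  def restore_from(checkpoint)
    copy = Marshal.load(checkpoint)
    copy.instance_variables.each do |name|
      instance_variable_set(name, copy.instance_variable_get(name))
    end
    self
  end
end

owner = Owner.new
owner << Node.new
checkpoint = Marshal.dump(owner)
owner.restore_from(checkpoint)

# The restored node still points at the unmarshalled copy, not at owner.
puts owner.nodes[0].owner.equal?(owner) # prints "false"
```

Marshal keeps the loaded graph consistent with itself, so the restored node refers to the loaded copy of its owner. Only the live object's instance variables are replaced, which leaves that back-reference pointing at an object nobody else holds.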

This bug affects PDF::Writer’s table generation and contributes significantly to the high memory usage. What’s happening is that when you call Parent#start_transaction, Transaction::Simple creates a transaction checkpoint with Marshal.dump. When you call Parent#rewind_transaction or Parent#abort_transaction, the transaction checkpoint is reverted. This reversion is extremely robust except for this one item: the restored children come back pointing at an unmarshalled copy of the parent rather than at the live object. What we really need is something like self = Marshal.restore(checkpoint).

Obviously, that won’t work, and this leads to the problem that is illustrated above. After a long discussion with Tim Pease, Patrick Hurley, and Matz, we came up with a workaround that can work for the example bug and for PDF::Writer. It’s not super-efficient, though. Essentially, I will modify Transaction::Simple to have callback methods for post-processing after a transactional operation. Something like this:

# Assuming the Parent and Child classes from the broken test case.
class Parent
  def post_restore_hook
    # Re-point any child whose parent is a stale copy back at this object.
    @children.each do |child|
      child.parent = self unless child.parent.equal?(self)
    end
  end
end

parent = Parent.new
puts "parent.object_id: #{parent.object_id}"
parent << Child.new
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"
puts "starting transaction"
parent.start_transaction
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"
puts "aborting transaction"
parent.abort_transaction
parent.post_restore_hook # would be called automatically in the real case
puts "aborted transaction"
puts "parent.object_id: #{parent.object_id}"
puts "parent.children[0].parent.object_id: #{parent.children[0].parent.object_id}"
parent << Child.new
puts "parent.children[1].parent.object_id: #{parent.children[1].parent.object_id}"

Which produces the output:

parent.object_id: 3265800
parent.children[0].parent.object_id: 3265800
starting transaction
parent.children[1].parent.object_id: 3265800
aborting transaction
aborted transaction
parent.object_id: 3265800
parent.children[0].parent.object_id: 3265800
parent.children[1].parent.object_id: 3265800
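
For a sense of how the hook could be called automatically, here is a minimal sketch. RestorableSketch, Basket, and Item are hypothetical names and this is not how Transaction::Simple is implemented; it assumes a restore that copies instance variables from a Marshal-loaded copy and then calls post_restore_hook when the object defines one.

```ruby
# Hypothetical mixin showing one way a restore could invoke the hook.
module RestorableSketch
  def checkpoint
    Marshal.dump(self)
  end

  def restore(saved)
    copy = Marshal.load(saved)
    copy.instance_variables.each do |name|
      instance_variable_set(name, copy.instance_variable_get(name))
    end
    # Let the object repair its own back-references, if it knows how.
    post_restore_hook if respond_to?(:post_restore_hook, true)
    self
  end
end

class Item
  attr_accessor :owner
end

class Basket
  include RestorableSketch

  attr_reader :items

  def initialize
    @items = []
  end

  def <<(item)
    item.owner = self
    @items << item
    self
  end

  def post_restore_hook
    @items.each { |item| item.owner = self }
  end
end

basket = Basket.new
basket << Item.new
saved = basket.checkpoint
basket << Item.new
basket.restore(saved)

puts basket.items.size                                      # prints "1"
puts basket.items.all? { |item| item.owner.equal?(basket) } # prints "true"
```

The cost is the one noted above: every restore walks the whole collection, and each class has to know which of its references need repairing.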

This isn’t great: it doesn’t feel very Ruby to me, but it does get the job done. It’s also not very efficient. After thinking about this for the better part of an hour, Matz suggested that a very ugly hack might be possible, one that could implement everything inside Transaction::Simple, and said he’ll look at it for me.


  • 2014-09-25: The code examples in this post have been modernized, and the output including post_restore_hook has been cleaned up.
