18 July 2013

Message to Ruby Rogues mailing list

TL;DR: autoload looks great and is used all over Rails, but it's not threadsafe and Matz said not to use it; prefer require or require_relative. Which is true? Is somebody wrong on the internet?

Content-Encoding: links, quotes and discussion
Content-Disposition: confused

Some time ago I got in the habit of always calling require with an absolute path:

require File.expand_path('../foo', __FILE__)

This ensures that every require argument is a full path, so require never loads the same file twice. I also do this when calling load, but I'm not aware of any benefit there.
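Here is a minimal sketch of the difference. It assumes a throwaway foo.rb written to a temporary directory; the file name and the $foo_loads counter are illustrative and not from the original post. require records the file in $LOADED_FEATURES and skips it on the second call. load ignores that list and runs the file every time, so an absolute path does nothing to prevent double loading with load.

```ruby
require 'tmpdir'

# Illustrative names: foo.rb and $foo_loads are not from the original post.
$foo_loads = 0
first = second = nil

Dir.mktmpdir do |dir|
  File.write(File.join(dir, 'foo.rb'), "$foo_loads += 1\n")

  # Same idea as require File.expand_path('../foo', __FILE__):
  # hand require an absolute path (the .rb extension is optional).
  abs = File.expand_path('foo', dir)

  first  = require abs   # true: file executed, path added to $LOADED_FEATURES
  second = require abs   # false: already loaded, file is not executed again

  load "#{abs}.rb"       # load ignores $LOADED_FEATURES and runs the file again
end

puts "first=#{first} second=#{second} loads=#{$foo_loads}"
# prints "first=true second=false loads=2"
```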

While reading Jose Valim's Crafting Rails Applications (awesome!), I noticed he uses autoload a lot. I thought autoload was frowned upon because of race conditions or something... but all over the Rails codebase I see

autoload :Foo, 'relative/path/to/foo'
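For anyone who hasn't used plain Ruby autoload, here is a small sketch. It assumes a hypothetical greeter.rb written to a temp dir. The constant is registered against a file name, and nothing loads until the constant is first referenced. Until then, Module#autoload? reports the pending file.

```ruby
require 'tmpdir'

# Hypothetical example: greeter.rb and Greeter are illustrative names.
pending_before = greeting = pending_after = nil

Dir.mktmpdir do |dir|
  path = File.join(dir, 'greeter.rb')
  File.write(path, <<~RUBY)
    module Greeter
      def self.hello
        'hi'
      end
    end
  RUBY

  # Register the constant; greeter.rb is not loaded yet.
  Object.autoload(:Greeter, path)
  pending_before = Object.autoload?(:Greeter)  # the registered path

  # The first reference to Greeter makes Ruby require the file.
  greeting = Greeter.hello

  pending_after = Object.autoload?(:Greeter)   # nil: nothing left to autoload
end

puts pending_before.end_with?('greeter.rb')  # prints "true"
puts greeting                                # prints "hi"
p pending_after                              # prints "nil"
```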

Since autoload always uses relative paths, as long as my code has a single entry point (such as in a gem), shouldn't this be a really common technique? Unless I want to load a file that doesn't define a class, I should always use it. Then we wouldn't need something like https://github.com/rspec/rspec-core/blob/master/lib/rspec/core.rb#L1

require_rspec = if defined?(require_relative)
  lambda do |path|
    require_relative path
  end
else
  lambda do |path|
    require "rspec/#{path}"
  end
end

which is basically what Rails does in https://github.com/rails/rails/blob/b025fca0c5/activesupport/lib/active_support/dependencies/autoload.rb, where it simply guesses the autoload path if one isn't given:

# This module allows you to define autoloads based on
# Rails conventions (i.e. no need to define the path
# it is automatically guessed based on the filename)
#
#   module MyLib
#     extend ActiveSupport::Autoload
#
#     autoload :Model
#   end
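To make the "guess the path" idea concrete, here is a minimal sketch in the spirit of ActiveSupport::Autoload. GuessingAutoload, MyLib, and the simplified underscore helper are all hypothetical. This is not the Rails implementation, which handles inflection and eager loading far more thoroughly. The module overrides autoload on whatever module extends it, derives "my_lib/model" from MyLib::Model, and then defers to Ruby's own Module#autoload.

```ruby
# Hypothetical sketch of convention-based autoload paths, in the spirit of
# ActiveSupport::Autoload. This is NOT the Rails implementation.
module GuessingAutoload
  # "MyLib::FooBar" -> "my_lib/foo_bar" (simplified inflection).
  def self.underscore(name)
    name.gsub('::', '/')
        .gsub(/([A-Z\d]+)([A-Z][a-z])/, '\1_\2')
        .gsub(/([a-z\d])([A-Z])/, '\1_\2')
        .downcase
  end

  # Same as Module#autoload, but the path is optional and guessed from
  # the enclosing module's name plus the constant name.
  def autoload(const_name, path = nil)
    path ||= GuessingAutoload.underscore("#{name}::#{const_name}")
    super(const_name, path)
  end
end

module MyLib
  extend GuessingAutoload

  autoload :Model                                 # path guessed
  autoload :HTTPClient, 'my_lib/net/http_client'  # explicit path still works
end

p MyLib.autoload?(:Model)       # prints "my_lib/model"
p MyLib.autoload?(:HTTPClient)  # prints "my_lib/net/http_client"
```

Registering an autoload never touches the file system, so the paths above don't need to exist until the constants are actually referenced.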

So autoload is great... but wait: the older discussions are quite emphatic that autoload isn't threadsafe and shouldn't be used:

Currently autoload is not safe to use in a multi-threaded application. To put it more bluntly, it’s broken.

The current logic for autoload is as follows:

  1. A special object is inserted into the target constant table, used as a marker for autoloading
  2. When that constant is looked up, the marker is found and triggers autoloading
  3. The marker is first removed, so the constant now appears to be undefined if retrieved concurrently
  4. The associated autoload resource is required, and presumably redefines the constant in question
  5. The constant lookup, upon completion of autoload, looks up the constant again and either returns its new value or proceeds with normal constant resolution

The problem arises when two or more threads try to access the constant. Because autoload is stateful and unsynchronized, the second thread may encounter the constant table in any number of states:

  1. It may see the autoload has not yet fired, if the first thread has encountered the marker but not yet removed it. It would then proceed along the same autoload path, requiring the same file a second time.
  2. It may not find an autoload marker, and assume the constant does not exist.
  3. It may see the eventual constant the autoload was intended to define.

Of these combinations, (3) is obviously the desired behavior. (1) can only happen on native-threaded implementations that do not have a global interpreter lock, since it requires concurrency during autoload’s internal logic. (2) can happen on any implementation, since while the required file is processing the original autoload constant appears to be undefined.
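The five steps and race (2) can be modeled in a few lines. This is a toy, single-process sketch: ToyConstTable, Marker, and the queue-based pause are hypothetical stand-ins, and real Ruby implements autoload in C. The loader pauses "mid-require" so that a second lookup deterministically lands between step 3 (marker removed) and the constant being defined, and that lookup sees nothing. It only models race (2); race (1) needs truly parallel threads.

```ruby
# Toy model of the marker-based autoload steps quoted above. All names are
# hypothetical; real Ruby implements this in C and differs in detail.
class ToyConstTable
  Marker = Struct.new(:loader)  # step 1: placeholder in the constant table

  def initialize
    @table = {}
  end

  def autoload(name, &loader)
    @table[name] = Marker.new(loader)
  end

  # Called by the loader, like the required file defining the constant.
  def define(name, value)
    @table[name] = value
  end

  def lookup(name)
    entry = @table[name]                     # step 2: marker found on lookup
    return entry unless entry.is_a?(Marker)  # plain value, or nil if undefined

    @table.delete(name)  # step 3: marker removed, name now looks undefined
    entry.loader.call    # step 4: "require" the file, which calls define
    @table[name]         # step 5: look the constant up again
  end
end

table   = ToyConstTable.new
started = Queue.new
release = Queue.new

table.autoload(:Foo) do
  started << true  # tell the main thread that loading has begun
  release.pop      # pause "mid-require"
  table.define(:Foo, :loaded)
end

loader = Thread.new { table.lookup(:Foo) }
started.pop                          # wait until the loader is inside step 4
seen_by_second = table.lookup(:Foo)  # race (2): marker gone, value not yet set
release << true
seen_by_first = loader.value

p seen_by_second  # prints "nil"
p seen_by_first   # prints ":loaded"
```

In this model, closing the gap would mean keeping other lookups waiting (or keeping the marker visible) until step 5 completes, for example by holding a lock across steps 3 to 5.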

other refs

Summary of discussion on the Ruby Rogues mailing list

So, to boil it all down, our consensus recommendations:

  1. When writing code that only runs within Rails:
    • use autoload :ClassName, relative_path
    • this is okay since Rails makes autoload not broken, and lazy loading is good
    • outside of Rails, autoload isn't threadsafe and should not be used
  2. Otherwise:
    1. in a gem or library that is on the load path (e.g. lib/)
      • use vanilla :require, since in all Rubies the argument will be searched for via the load path, e.g. require 'foo/bar' in lib/foo.rb, and require 'foo/bar/baz' in lib/foo/bar.rb to load lib/foo/bar/baz.rb
        • to achieve lazy loading, put the require statement in a method or block to be evaluated when needed
      • using :require_relative can speed up requires, as it essentially uses the absolute path and skips the load path search (equivalent to File.expand_path("../#{argument}", __FILE__))
        • some prefer that library authors not use require_relative, since it makes it impossible to mock/override the required file in the test environment by manipulating the load path order, e.g. $:.unshift '.'; require 'foo_gem'
          • avoid using require File.expand_path('../foo', __FILE__) for the above reason
        • others prefer :require_relative whenever possible as it is faster https://rubyforge.org/pipermail/rspec-users/2011-November/020760.html
    2. in your own codebase (e.g. your web app)
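The load path override from the list above can be sketched as follows. The sketch assumes a hypothetical foo_gem with a "real" lib/foo_gem.rb and a test stub directory; the directory names and $foo_gem_source are illustrative. Because require searches $LOAD_PATH in order, prepending the stub directory makes require 'foo_gem' pick the stub. If the library's own files reached foo_gem.rb through require_relative instead, the stub directory would never be consulted.

```ruby
require 'tmpdir'

# Hypothetical layout: lib/foo_gem.rb is the real file, test_stubs/foo_gem.rb
# is a test double. $foo_gem_source records which one ran.
source = nil

Dir.mktmpdir do |root|
  real = File.join(root, 'lib')
  stub = File.join(root, 'test_stubs')
  [real, stub].each { |d| Dir.mkdir(d) }
  File.write(File.join(real, 'foo_gem.rb'), "$foo_gem_source = :real\n")
  File.write(File.join(stub, 'foo_gem.rb'), "$foo_gem_source = :stub\n")

  $LOAD_PATH.push(real)     # the library's lib dir is on the load path
  $LOAD_PATH.unshift(stub)  # the test prepends its stub dir, like $:.unshift '.'
  begin
    require 'foo_gem'       # $LOAD_PATH is searched in order, so the stub wins
    source = $foo_gem_source
  ensure
    $LOAD_PATH.delete(real)
    $LOAD_PATH.delete(stub)
  end
end

p source  # prints ":stub"
```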

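The lazy-loading tip (a require inside a method) looks roughly like this. FooGem and parse_config are hypothetical names, and the json stdlib stands in for any heavy dependency. The dependency is only required the first time the method runs. Later calls still execute the require line, but it returns false right away because the file is already in $LOADED_FEATURES.

```ruby
# Hypothetical library module: the json stdlib is only required the first
# time parse_config runs, not when this file is loaded.
module FooGem
  def self.parse_config(text)
    require 'json'  # cheap after the first call: require returns false
    JSON.parse(text)
  end
end

config = FooGem.parse_config('{"threads": 4}')
puts config['threads']  # prints 4
```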

Other comments from the discussion

Updates coming…