Date: Tuesday, 18 Aug 2009 22:01

Seed Fu is one of the most common bootstrapping solutions for Rails. Bootstrapping is a technique for storing initial data with your application’s code. Seed Fu lets you keep dedicated fixture files for development and production: use less data in development mode, or create default test users.

Bootstrapping a new developer database makes a great example. After building their database, a new developer may need to create a user:

$ script/runner "User.create(:email => 'me@mydomain.com', :password => 'apass', :password_confirmation => 'apass')"

Once you need to set up several users with settings and relationships, this quickly becomes difficult to document. Seed Fu has you build fixture files in db/fixtures/ or db/fixtures/#{RAILS_ENV} that look like this:

User.seed(:email) do |s|
  s.email    = 'me@mydomain.com'
  s.password = 'apass'
  s.password_confirmation = s.password
end

This makes it easy for a new developer (or a production deployment) to bootstrap:

$ rake db:seed

The argument to seed(:email) defines the columns checked before writing the row. If there is already a row with the email address me@mydomain.com, Seed Fu updates that row instead of inserting a new one. This lets you run seed against a database that already has content.
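To make the update-or-insert behaviour concrete, here is a minimal plain-Ruby sketch that models a table as an array of hashes. The method name seed_row and the in-memory table are hypothetical illustrations; Seed Fu itself does this through ActiveRecord.

```ruby
# Hypothetical in-memory model of the constraint check: rows are hashes,
# and +constraints+ lists the columns that identify an existing row.
def seed_row(table, constraints, data)
  # Look for a row whose constrained columns all match the new data
  existing = table.find do |row|
    constraints.all? { |column| row[column] == data[column] }
  end

  if existing
    existing.merge!(data) # found one: update it in place
  else
    table << data.dup     # no match: insert a new row
  end
  table
end

users = []
seed_row(users, [:email], { :email => 'me@mydomain.com', :password => 'apass' })
seed_row(users, [:email], { :email => 'me@mydomain.com', :password => 'newpass' })
users.size             # => 1
users.first[:password] # => "newpass"
```

Running the same seed twice leaves one row, which is what makes re-running seeds against a populated database safe.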

The bad news is that Seed Fu was terribly slow on large data files and consumed RAM without freeing it, so it never finished seeding many large fixtures. The good news is I’ve got yer fix right here, in my fork of Seed Fu.

It’s waiting on the maintainer proper to merge it upstream (though I haven’t heard back yet). Let’s see what changed.

Go Faster

On my laptop (2 GHz Core 2 Duo, 7200 rpm hard drive, 2 GB of RAM):

real  114m1.482s
user  75m53.490s
sys  6m35.414s

And on a production server:

real  49m51.865s
user  27m4.381s
sys  1m6.247s

That’s for importing 1,223,431 rows into a truncated database: 269 seeds a second on my laptop and 753 seeds a second on production, measured against user CPU time (by wall-clock time it’s about 179 and 409 a second). Seed Fu is still checking for existing records and using ActiveRecord to add seeds. So what changed?
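The quoted rates can be reproduced from the `time` output with a little arithmetic. This sketch (the helper names are illustrative) converts each duration to seconds and divides the row count by it:

```ruby
# Converts a `time` duration such as "114m1.482s" into seconds.
def duration_in_seconds(text)
  match = text.match(/\A(\d+)m([\d.]+)s\z/)
  raise ArgumentError, "unrecognised duration: #{text}" unless match
  match[1].to_i * 60 + match[2].to_f
end

ROWS = 1_223_431

# Rows seeded per second over the given duration, rounded.
def seeds_per_second(duration)
  (ROWS / duration_in_seconds(duration)).round
end

seeds_per_second('75m53.490s') # laptop, user time       => 269
seeds_per_second('114m1.482s') # laptop, real time       => 179
seeds_per_second('27m4.381s')  # production, user time   => 753
seeds_per_second('49m51.865s') # production, real time   => 409
```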

The biggest change is dropping ActiveRecord validations. Validations are slow monsters. The next logical step would be to stop using ActiveRecord altogether, or at least to toy with disabling callbacks, but that feels one step too far. Disabling validations means keeping your seed data valid becomes your responsibility. It’s a trade-off, but worth it.

This commit also includes two smaller, 100% backwards-compatible speed-ups. The first walks the short constraints array instead of the longer data hash when building the conditions used to find an existing row:

     def condition_hash
-      @data.reject{|a,v| !@constraints.include?(a)}
+      @constraints.inject({}) {|a,c| a[c] = @data[c]; a }
     end
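A quick plain-Ruby check (with made-up data) that both versions of condition_hash build the same conditions, while the new one only iterates over the constraints:

```ruby
data        = { :email => 'me@mydomain.com', :password => 'apass', :name => 'Me' }
constraints = [:email]

# Old approach: walk every column in the (long) data hash
old_conditions = data.reject { |attr, _value| !constraints.include?(attr) }

# New approach: walk only the (short) constraints array
new_conditions = constraints.inject({}) { |conds, column| conds[column] = data[column]; conds }

old_conditions == new_conditions # => true
```

One subtle difference: if a constraint column is missing from the data, the inject version includes it with a nil value, while the reject version leaves it out.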

And the second avoids hitting method_missing after the first call to each setter:

-    def set_attribute(name, value)
-      @data[name.to_sym] = value
-    end
-

     def method_missing(method_name, *args) #:nodoc:
-      if (match = method_name.to_s.match(/(.*)=$/)) && args.size == 1
-        set_attribute(match[1], args.first)
+      if args.size == 1 and (match = method_name.to_s.match(/(.*)=$/))
+        self.class.class_eval "def #{method_name} arg; @data[:#{match[1]}] = arg; end"
+        send(method_name, args[0])
       else
         super
       end

method_missing is great for spreading some nice-looking sugar around, but it was being hit several times for every seed! By defining a real method the first time and calling it directly afterwards, we shave off more time.
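Here is a self-contained sketch of the same define-on-first-call idea. The Seeder class is a hypothetical stand-in, not Seed Fu's real class, and it swaps the string class_eval from the diff for define_method, which does the same job without building code from a string:

```ruby
# Hypothetical stand-in for a seeder object that collects attributes.
class Seeder
  attr_reader :data

  def initialize
    @data = {}
  end

  def method_missing(method_name, *args)
    if args.size == 1 && (match = method_name.to_s.match(/\A(.*)=\z/))
      attribute = match[1].to_sym
      # Define a real setter so later calls never reach method_missing
      self.class.send(:define_method, method_name) { |value| @data[attribute] = value }
      send(method_name, args.first)
    else
      super
    end
  end

  def respond_to_missing?(method_name, include_private = false)
    method_name.to_s.end_with?('=') || super
  end
end

s = Seeder.new
s.email = 'me@mydomain.com' # first call: goes through method_missing
s.email = 'you@mydomain.com' # later calls: the generated email= method
```

Because the setter is defined on the class, every later Seeder instance benefits too, which is where the savings across many seeds come from.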

Use Less IO, Memory

The seed file I used for the 1.2 million rows was 165 MB; gzipped, it is 16 MB. That means less IO for our slow disks, and smaller, less obnoxious files in source control. Seed Fu now reads .rb.gz files just like .rb files.
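Reading either kind of file takes only a few lines with Ruby's zlib library. This is a minimal sketch; read_fixture is a hypothetical helper, not Seed Fu's actual code:

```ruby
require 'zlib'
require 'tmpdir'

# Returns the Ruby source of a fixture, transparently gunzipping .gz files.
def read_fixture(path)
  if path.end_with?('.gz')
    Zlib::GzipReader.open(path) { |gz| gz.read }
  else
    File.read(path)
  end
end

Dir.mktmpdir do |dir|
  source  = "User.seed(:email) { |s| s.email = 'me@mydomain.com' }\n"
  plain   = File.join(dir, 'users.rb')
  gzipped = File.join(dir, 'users.rb.gz')
  File.write(plain, source)
  Zlib::GzipWriter.open(gzipped) { |gz| gz.write(source) }
  read_fixture(plain) == read_fixture(gzipped) # => true
end
```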

Seed Fu’s major failing was that it grew to eat all available RAM when dealing with megabytes, let alone gigabytes, of fixtures. At one point, forking for each fixture looked like the only solution, but it was clumsy and inelegant.

Instead, the better solution was to break up evaluation of large seed files. Seed Fu reads a .rb.gz or .rb file into memory as a string. When it hits a line containing:

# BREAK EVAL

It evaluates everything it has collected so far and starts again from after the comment. Memory usage on the 1.2 million row import was about 60 MB (not unheard of for a Rails process), and it stayed there for the whole import.
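The chunking can be sketched in plain Ruby: collect lines until a `# BREAK EVAL` marker, evaluate that chunk, then discard it and start over. The method names and the Collector class below are hypothetical; Seed Fu's own implementation may differ in detail:

```ruby
BREAK_MARKER = "# BREAK EVAL"

# Evaluates +source+ one chunk at a time inside +context+ and returns the
# number of chunks that actually ran.
def eval_in_chunks(source, context)
  evaluated = 0
  buffer = ''.dup

  source.each_line do |line|
    if line.strip == BREAK_MARKER
      evaluated += eval_chunk(buffer, context)
      buffer = ''.dup # drop the evaluated text so it can be garbage collected
    else
      buffer << line
    end
  end

  evaluated + eval_chunk(buffer, context) # whatever follows the last marker
end

# Evaluates one chunk in +context+; returns 1 if anything ran, else 0.
def eval_chunk(code, context)
  return 0 if code.strip.empty?
  context.instance_eval(code)
  1
end

# Hypothetical receiver standing in for the models being seeded
class Collector
  attr_reader :rows
  def initialize; @rows = []; end
  def seed(row); @rows << row; end
end

collector = Collector.new
eval_in_chunks("seed(1)\nseed(2)\n# BREAK EVAL\nseed(3)\n", collector) # => 2
collector.rows # => [1, 2, 3]
```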

Add a Generator For Large Fixtures

Nobody writes 165 MB fixture files by hand. Chances are, if you run into speed issues with Seed Fu, your data is coming from a third party. To keep bootstrapping your app as easy as rake db:seed, you need to create Seed Fu fixtures from XML, CSV, web services, or any other kind of source.

Say hello to SeedFu::Writer! Use the writer to generate large fixtures that take advantage of # BREAK EVAL and the more concise seed_many syntax. Take a look:

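require 'fastercsv'

# SEED_FILE (the fixture to write) and CITY_CSV (the source data)
# are paths you define yourself.
seed_writer = SeedFu::Writer::SeedMany.new(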
  :seed_file  => SEED_FILE,
  :seed_model => 'City',
  :seed_by    => [ :city, :state ]
)

FasterCSV.foreach( CITY_CSV,
  :return_headers => false,
  :headers => :first_row
) do |row|

  # Do some logic on row...

  # Write the seed
  #
  seed_writer.add_seed({
    :zip => row['zipcode'],
    :state => row['state'],
    :city => row['city'],
    :latitude => row['latitude'],
    :longitude => row['longitude']
  })

end

seed_writer.finish

See more detail at the bottom of my Seed Fu fork’s GitHub page. SeedFu::Writer::SeedMany takes several arguments on initialization:

  • :seed_file - Where to write output (probably db/fixtures/my_fixture.rb).
  • :seed_model - Which model to seed.
  • :seed_by - An array of which columns to constrain the seed by.
  • :quiet - Setting this to true will quiet standard out.
  • :chunk_size - How many seeds to write before breaking evaluation. Default is 100.

SeedFu::Writer::Seed also accepts an additional argument to add_seed that sets the :seed_by columns for that particular seed.
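To show what a chunking writer does, here is a hypothetical ChunkedSeedWriter. It is not SeedFu::Writer: since the seed_many format isn't shown here, it emits the one-block-per-row seed syntax from the top of this post and adds a `# BREAK EVAL` line after every chunk_size seeds:

```ruby
require 'stringio'

# Hypothetical writer: one Model.seed block per row, with a break marker
# after every +chunk_size+ seeds so the reader can evaluate in chunks.
class ChunkedSeedWriter
  def initialize(io, model, seed_by, chunk_size = 100)
    @io, @model, @seed_by, @chunk_size = io, model, seed_by, chunk_size
    @count = 0
  end

  def add_seed(attributes)
    keys = @seed_by.map { |column| ":#{column}" }.join(', ')
    @io.puts "#{@model}.seed(#{keys}) do |s|"
    # inspect produces a Ruby literal for each value ("Boston" -> "\"Boston\"")
    attributes.each { |name, value| @io.puts "  s.#{name} = #{value.inspect}" }
    @io.puts "end"
    @count += 1
    @io.puts "# BREAK EVAL" if (@count % @chunk_size).zero?
  end

  # Flushes output and returns the number of seeds written.
  def finish
    @io.flush
    @count
  end
end

out = StringIO.new
writer = ChunkedSeedWriter.new(out, 'City', [:city, :state], 2)
writer.add_seed(:city => 'Boston', :state => 'MA')
writer.add_seed(:city => 'Austin', :state => 'TX')
writer.add_seed(:city => 'Denver', :state => 'CO')
writer.finish # => 3
```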

Seed Fu is Better, Now It’s Your Turn

This is a nice step for Seed Fu, and it begins to make it a real solution for hundreds of megabytes of data. The writer gets us closer to a repeatable cycle for importing third-party data (just store your conversion scripts with the app code).

Do you have an alternative plan for large data-sets in Rails? What would you like to see Seed Fu do next?

Author: "mixonic" Tags: "performance, Rails, rails, seed-fu, web ..."
Date: Wednesday, 12 Aug 2009 12:27

Google wants your site to be faster. They want it so much, they made videos! You don’t want to watch videos, though; you want to make your site faster, faster. Take an hour and make your website 4-5x faster with these 5 high-impact techniques:

  • Set HTTP Cache Headers
  • Gzip Web Server Output
  • Use Multiple, Cookie-less Domains for Assets
  • Bundle Javascript & CSS
  • Crush Image Assets

I’m going to use Rails for these examples, but you should find something comparable in any web framework. Rails got a bad rap on speed for a long time, but where the server-side code failed, the architecture won. HTTP and Rails are already in bed together; here’s how to join in.

Set HTTP Cache Headers

HTTP is your friend. Rails already appends a timestamp to your asset URLs (that’s the ?19438273834 after the image URI, and it’s why you should always use image_tag), but you need to set up the HTTP cache headers yourself. In Apache 2 (with mod_expires and mod_headers enabled) this looks like:

<VirtualHost *:80>
  # Your config...
  ExpiresActive On
  <FilesMatch "\.(ico|gif|jpe?g|png|js|css)$">
          ExpiresDefault "access plus 1 year"
          Header unset ETag
          FileETag None
          Header unset Last-Modified
  </FilesMatch>
</VirtualHost>

In Nginx:

server {
  # Your config...
  location ~* \.(css|js|png|jpe?g|gif|ico)$ {
    expires max;
  }
}

A word to the wise: this works fairly seamlessly under Rails, but if you are on another framework, make sure your asset URLs change on every deploy (a timestamp or version argument, for example). Otherwise, the far-future expiry means browsers will not try to pull down the new content you just deployed.
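If your framework has nothing like Rails’ asset timestamps, a helper along these lines gives each asset a URL that changes whenever the file does. This is a sketch; timestamped_asset_path is a hypothetical name, similar in spirit to (but not taken from) Rails:

```ruby
require 'fileutils'
require 'tmpdir'

# Appends the file's modification time as a query string, so a changed
# file gets a new URL and far-future cache headers stay safe.
def timestamped_asset_path(public_root, path)
  file = File.join(public_root, path)
  return path unless File.exist?(file) # unknown asset: leave the URL alone
  "#{path}?#{File.mtime(file).to_i}"
end

Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, 'images'))
  logo = File.join(root, 'images', 'logo.png')
  File.write(logo, 'png bytes')
  File.utime(Time.at(1_000_000_000), Time.at(1_000_000_000), logo)
  timestamped_asset_path(root, '/images/logo.png') # => "/images/logo.png?1000000000"
end
```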

Gzip Web Server Output

Web pages, Javascript, and CSS are all text, and compress very nicely under gzip. This is a no-brainer to turn on, and it will have an immediate impact on your site. In Apache 2:

<VirtualHost *:80>
  # Your config...
  AddOutputFilterByType DEFLATE text/html text/plain text/xml application/xml application/xhtml+xml text/javascript text/css application/x-javascript
  BrowserMatch ^Mozilla/4 gzip-only-text/html
  BrowserMatch ^Mozilla/4\.0[678] no-gzip
  BrowserMatch \\bMSIE !no-gzip !gzip-only-text/html
</VirtualHost>

In Nginx:

server {
  # Your config...
  gzip             on;
  gzip_min_length  1000;
  gzip_proxied     expired no-cache no-store private auth;
  gzip_types       text/plain application/xml text/css application/javascript;
  gzip_disable     msie6;
}

The Nginx configuration skips gzip for IE6 entirely (gzip_disable msie6). That’s a darn shame, if only we could make IE6 load pages faster somehow…liiike….
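To see why text compresses so well, and why tiny responses aren’t worth compressing, try Ruby’s zlib (Zlib.gzip needs Ruby 2.4 or later). The stylesheet here is made-up, highly repetitive sample text, so real pages will compress less dramatically:

```ruby
require 'zlib'

# Repetitive text, like most CSS, HTML and JavaScript, compresses very well.
css = ".nav li a { color: #336699; text-decoration: none; }\n" * 200
compressed = Zlib.gzip(css)
puts "#{css.bytesize} bytes -> #{compressed.bytesize} bytes gzipped"

# A tiny response gets *bigger*: gzip adds a header and trailer, which is
# one reason a minimum length such as Nginx's gzip_min_length is useful.
tiny = 'ok'
puts "#{tiny.bytesize} bytes -> #{Zlib.gzip(tiny).bytesize} bytes gzipped"
```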

Use Multiple, Cookie-less Domains for Assets

This technique requires you to update your DNS entries, but it is well worth the effort. There are two Rails-side parts to this. The first is to always use image_tag, javascript_include_tag, and friends; don’t write your own tags. Then update your asset_host configuration:

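# In config/environments/production.rb
ActionController::Base.asset_host = Proc.new { |source, request|
  if request and request.ssl?
    "https://www" # Just use one domain during SSL.  This avoids mixed content errors.
  else
    # String#sum is a stable checksum, so each asset always maps to the same
    # host (String#hash can differ between processes on newer Rubies).
    "http://www#{source.sum % 4}"
  end + ".coolsite.com"
}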
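The important property is that the host choice is deterministic: if the same asset mapped to different hosts on different requests, browsers would download and cache it once per host. A standalone sketch of that selection (asset_host_for is a hypothetical name):

```ruby
ASSET_HOST_COUNT = 4

# Picks an asset host from a checksum of the asset path, so the same asset
# always lands on the same host; SSL requests share one host.
def asset_host_for(source, ssl = false)
  return "https://www.coolsite.com" if ssl
  "http://www#{source.sum % ASSET_HOST_COUNT}.coolsite.com"
end

asset_host_for('/a')                     # => "http://www0.coolsite.com"
asset_host_for('/images/logo.png', true) # => "https://www.coolsite.com"
```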

Now configure A records for each asset domain:

www0.coolsite.com 12.34.56.78
www1.coolsite.com 12.34.56.78
www2.coolsite.com 12.34.56.78
www3.coolsite.com 12.34.56.78

Lastly, configure your server. The basic configuration (again, Apache 2):

<VirtualHost *:80>
  ServerName www.coolsite.com
  ServerAlias www0.coolsite.com
  ServerAlias www1.coolsite.com
  ServerAlias www2.coolsite.com
  ServerAlias www3.coolsite.com
  
  # rewrite rules, etc
</VirtualHost>

Of course, that leaves the whole app available on every one of those domains. We can add a rewrite condition before proxying to the app so that only the static assets are available on the asset domains:

<VirtualHost *:80>
  ServerName www.coolsite.com
  ServerAlias www0.coolsite.com
  ServerAlias www1.coolsite.com
  ServerAlias www2.coolsite.com
  ServerAlias www3.coolsite.com
  
  # other config...
  RewriteEngine On

  # Proxy all non-static requests to the cluster
  RewriteCond %{DOCUMENT_ROOT}/%{REQUEST_FILENAME} !-f
  # Only if they are asking for www (Apache does not allow trailing comments)
  RewriteCond %{HTTP_HOST} ^www\.coolsite\.com$
  RewriteRule ^/(.*)$ balancer://coolsite_cluster%{REQUEST_URI} [P,QSA,L]
</VirtualHost>

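Set up multiple asset hosts now! If you haven’t done it yet, this is the most important technique on this page. Browsers limit how many connections they open to a single hostname at once, so spreading assets over four hostnames lets them download more files in parallel. It will make your website 3-4 times faster in most browsers. Yes, even Firefox will feel faster. Go, do it!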

Bundle Javascript & CSS

Fewer downloads means less wait time for the browser. Bundle your Javascript and CSS into a single file for production. Super easy in Rails:

<%= javascript_include_tag 'mootools.js', 'lightbox.js', 'application.js', :cache => 'cache-application' %>
<%= stylesheet_link_tag 'lightbox.css', 'application.css', :cache => 'cache-application' %>
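Under the hood, bundling is little more than concatenating files in order. This is a hypothetical sketch of the idea, not how Rails implements :cache:

```ruby
require 'tmpdir'

# Concatenates +paths+ in order into +output+. Order matters: a library
# must come before the code that uses it.
def bundle(paths, output)
  File.open(output, 'w') do |out|
    paths.each do |path|
      out.write(File.read(path))
      out.write("\n") # guard against files without a trailing newline
    end
  end
  output
end

Dir.mktmpdir do |dir|
  lib = File.join(dir, 'mootools.js')
  app = File.join(dir, 'application.js')
  File.write(lib, 'var Lib = {};')
  File.write(app, 'Lib.ready = true;')
  File.read(bundle([lib, app], File.join(dir, 'cache-application.js')))
  # => "var Lib = {};\nLib.ready = true;\n"
end
```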

Crush Image Assets

OK, this is the hardest item on this list, and my example is the most Rails-specific yet. You will need to install two applications: pngcrush and jpegtran. Both were easy to install on Gentoo, so check Ports, Apt, or what-have-you. Now add this file to your project:

# lib/tasks/assets.rake
require 'fileutils'

namespace :assets do

  desc "Crush png images"
  task :crush_pngs do
    Dir['public/**/*.png'].each do |file|
      crushed = "#{file}.crushing"
      # Separate arguments avoid shell-quoting problems; only replace the
      # original if pngcrush succeeded.
      if system('pngcrush', '-rem', 'alla', '-reduce', '-brute', file, crushed)
        FileUtils.mv(crushed, file)
      else
        FileUtils.rm_f(crushed)
      end
    end
  end

  desc "Crush jp(e)g images"
  task :crush_jpgs do
    Dir['public/**/*.{jpg,jpeg}'].each do |file|
      crushed = "#{file}.crushing"
      if system('jpegtran', '-copy', 'none', '-optimize', '-perfect', '-outfile', crushed, file)
        FileUtils.mv(crushed, file)
      else
        FileUtils.rm_f(crushed)
      end
    end
  end

  desc "Crush images"
  task :crush_images => [:crush_pngs, :crush_jpgs]

end

Now you can run:

rake assets:crush_images

Review the changes, commit, and push them live. These utilities are lossless: they strip metadata browsers don’t need and recompress the image data more efficiently, without changing any pixels. Run the task every time you make a large number of image changes.

Be Fast & Prove It

Before you start using these techniques, check out Firebug, YSlow, and Google Page Speed, and run the excellent IE-focused WebPagetest. Gather some hard numbers before you make a change, then do some follow-up tests. Be amazed. Be fast. Get back to building your great site.

Do you have a dead simple, quick and high-impact web technique? Share it with us!

Author: "mixonic" Tags: "Rails, rails, speed, web"
Date: Thursday, 24 Jul 2008 20:01

On June 12th MooTools 1.2 was released, to great rejoicing. It’s a release that really sets MooTools apart with better Fx, more browser compatibility effort, and a jaw-dropping element storage feature. That means it’s time to update the table sort script I’ve blogged about here before (The Joy of a Minimal, Complete Javascript Table Sort and The Joy of an Optimized, Complete Javascript Table Sort). This release is more than a straight port; it also adds a few features:

  • Sorts more out of the box:
    • strings
    • numbers
    • decimal currency (12.34, 4.50)
    • dates (YYYY-MM-DD, YYYY-M-D)
    • relative dates (1 day ago, 38 years ago)
    • disk memory (1.75 MB, 34 KB, 8 TB)
  • Passes the matcher into the conversion_function for re-use.
  • Classes set for forward and reverse sorting th tags.
  • A “don’t sort” class.
  • It’s on github!
  • Integration with the brand new pagination library.

Oh yeah, and that last one: there’s now a pagination library supporting all the same things, like expanding rows. It’ll handle multiple pagination link areas and drop offset and cut-off numbers into the DOM. Let’s take a look.

The Old: Reviewing SortingTable

SortingTable doesn’t need any wonky DOM rewriting; that’s one of its best perks. Just use a thead and tbody, and make proper use of th.

<table cellpadding="0" cellspacing="0" id="sort_this">
  <thead>
    <tr>
      <th>a header</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>a value</td>
    </tr>
    <tr>
      <td>another value</td>
    </tr>
  </tbody>
</table>

To sort this table you could use the most simplistic of initializations:

new SortingTable( 'sort_this' );

These are the default settings that are assumed:

new SortingTable( 'sort_this', {
  zebra: true,                        // Stripe the table, also on initialize
  paginator: false,                   // Pass a paginator object
  dont_sort_class: 'nosort',          // Class name on th's that don't sort
  forward_sort_class: 'forward_sort', // Class applied to forward sort th's
  reverse_sort_class: 'reverse_sort'  // Class applied to reverse sort th's
});

So if you don’t want a given column to sort, just add class="nosort" to its th and it’ll be ignored. You can change that to any other class name, too. The same goes for the forward_sort and reverse_sort classes.

Making the matcher available to the conversion_function as this.conversion_matcher keeps regex-based sorting of a td DRY. Take a look at the date sorter:

      // YYYY-MM-DD, YYYY-m-d
      { matcher: /(\d{4})-(\d{1,2})-(\d{1,2})/,
        conversion_function: function( row ) {
          var cell = $(row.row.getElementsByTagName('td')[this.sort_column]).get('text');
          cell = this.conversion_matcher.exec( cell );
          return new Date(parseInt(cell[1], 10), parseInt(cell[2], 10) - 1, parseInt(cell[3], 10));
        }
      },

We get to re-use that regex right in the conversion_function.
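The same matcher-plus-conversion idea translates to a small Ruby sketch. CONVERSIONS and sort_cells are hypothetical names; like SortingTable, the first matcher that fits a sample cell decides how every cell is converted, and the match data is handed to the conversion so the regex isn’t repeated. It assumes all cells in a column share one format:

```ruby
require 'date'

# Ordered list of [matcher, conversion]; the last entry catches anything.
CONVERSIONS = [
  # YYYY-MM-DD or YYYY-M-D: reuse the captured groups
  [/\A(\d{4})-(\d{1,2})-(\d{1,2})\z/, lambda { |m| Date.new(m[1].to_i, m[2].to_i, m[3].to_i) }],
  # Plain integers
  [/\A\d+\z/,                         lambda { |m| m[0].to_i }],
  # Fallback: case-insensitive string sort
  [/.*/m,                             lambda { |m| m[0].downcase }]
]

def sort_cells(cells)
  return cells if cells.empty?
  matcher, convert = CONVERSIONS.find { |re, _| cells.first =~ re }
  cells.sort_by { |cell| convert.call(matcher.match(cell)) }
end

sort_cells(%w[2009-8-18 2008-07-24 2009-01-02]) # => ["2008-07-24", "2009-01-02", "2009-8-18"]
sort_cells(%w[10 9 100])                        # => ["9", "10", "100"]
```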

That’s it for incremental changes to SortingTable. PaginatingTable is the new piece to introduce.

The New: Introducing PaginatingTable

To paginate a table you’ll need to add one DOM element:

<ul id="sort_table_pagination"></ul>

And then paginating the simple table provided above would look like:

new PaginatingTable( 'sort_this', 'sort_table_pagination' );

That’s the simplest way to go. A more complex pagination might involve two paginators as well as displaying offsets and cutoffs.

<ul id="sort_table_pagination"></ul>
Now showing items <span id="offset"></span> - <span id="cutoff"></span>
<table cellpadding="0" cellspacing="0" id="sort_this">
  <thead>
    <tr>
      <th>a header</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>a value</td>
    </tr>
    <tr>
      <td>another value</td>
    </tr>
  </tbody>
</table>
<ul id="sort_table_bottom_pagination"></ul>

To start this up:

new PaginatingTable( 'sort_this', ['sort_table_pagination', 'sort_table_bottom_pagination'], {
  per_page: 3,         // Only 3 items per page, please
  current_page: 2,     // Start on page 2
  offset_el: 'offset', // Use this id for offset numbers
  cutoff_el: 'cutoff'  // And this id for cutoffs
});

Swank. This will paginate, but you don’t get any sorting or zebra striping. That’s still tied to SortingTable, and you’ll need to use them together.

PaginatingTable instances also provide update_pages, so if you add rows to a table you can have the pagination update itself.
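The offset and cutoff numbers come from simple arithmetic on the row count, per_page and the current page. This is a hypothetical sketch of that arithmetic (page_window is not part of PaginatingTable, and the clamping of out-of-range pages is an assumption of this sketch):

```ruby
# Given a row count, per_page and a 1-based current page, returns the page
# count plus the 1-based offset and cutoff of the rows being shown.
def page_window(total_rows, per_page, current_page)
  page_count = (total_rows + per_page - 1) / per_page # ceiling division
  page_count = 1 if page_count.zero?
  current_page = [[current_page, 1].max, page_count].min # clamp to a real page
  offset = (current_page - 1) * per_page + 1
  cutoff = [current_page * per_page, total_rows].min
  { :pages => page_count, :offset => offset, :cutoff => cutoff }
end

page_window(8, 3, 2) # => {:pages=>3, :offset=>4, :cutoff=>6}
page_window(8, 3, 3) # => {:pages=>3, :offset=>7, :cutoff=>8}
```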

Will It Blend? SortingTable and PaginatingTable Together

SortingTable will accept a PaginatingTable object and use it to reset to the first page when you sort. Just pass it in:

new SortingTable( 'sort_this', {
  paginator: new PaginatingTable( 'sort_this', 'sort_table_pagination' )
});

If you have expanding rows, you’ll need to tell both SortingTable and PaginatingTable about it:

new SortingTable( 'sort_this', {
  details: true,
  paginator: new PaginatingTable( 'sort_this', 'sort_table_pagination', { details: true } )
});

You can check out the example to see this in action, and get the javascript from GitHub. A big thanks to all the people who wrote conversion functions, made tweaks to this script, and helped it crawl from an example into a more capable library. Enjoy!

Author: "mixonic"
Date: Thursday, 10 Jul 2008 23:55

RSpec is a tasty testing framework for Rails. Its stubbing can be enhanced with Mocha, a mocking and stubbing library. There is a smattering of documentation to get you started, but a few controller-level items were challenging to test:

  • Included modules
  • Filtered parameters
  • Before filters
  • Response codes
  • Facebook redirects

And I had to pick up a few model testing tricks too:

  • Association testing
  • ActionMailer testing

Let’s look at some good approaches for each of these.

Testing Tricks for Rails Controllers

Included modules

  it "should include AuthenticatedSystem" do
    controller.class.included_modules.should include(AuthenticatedSystem)
  end

Filtered parameters - This runs the filtering code over some parameters, then tests the output. It’s not ideal, but it’s the best I’ve come up with.

  it "should filter credit_cards" do
    filtered = controller.send(:filter_parameters, 'credit_card' => 'nogood')
    filtered['credit_card'].should == '[FILTERED]'
  end
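To see what that spec is checking, here is a hypothetical plain-Ruby re-implementation of parameter filtering: any key matching a pattern has its value replaced, recursively through nested hashes. It is similar in spirit to what Rails sets up with filter_parameter_logging, not Rails’ actual code:

```ruby
# Returns a copy of +params+ with values of matching keys replaced.
def filter_params(params, patterns)
  params.each_with_object({}) do |(key, value), filtered|
    filtered[key] =
      if patterns.any? { |pattern| key.to_s =~ pattern }
        '[FILTERED]'
      elsif value.is_a?(Hash)
        filter_params(value, patterns) # recurse into nested params
      else
        value
      end
  end
end

filter_params({ 'credit_card' => 'nogood', 'name' => 'Me' }, [/credit_card/])
# => {"credit_card"=>"[FILTERED]", "name"=>"Me"}
```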

Before filters

  it "should have a before_filter for login_required" do
    controller.class.before_filters.should include( :login_required )
  end

Response codes - This is not hard, but I always seem to forget. Test the code, not the status message:

  it "should return 200 success" do
    response.code.should == '200'
  end

Facebook redirects - Diving into Facebooker and the Facebook platform has been an…uh…engaging experience. One thing that took me a while to realize: Facebooker alters redirect_to to use Facebook’s own redirect tag (fb:redirect) on canvas requests. Don’t test those like normal redirects.

  it "redirects on facebook based signup" do
    controller.stubs(:request_is_for_a_facebook_canvas?).returns(true)
    create_user
    # Test against the body.  Maybe this could even use has_tag.
    response.body.should =~ /<fb:redirect url="\/home" \/>/
  end

Testing Tricks for Rails Models

Model testing is really very straightforward once you start feeling comfortable. The trickier spots for me were model associations and ActionMailer.

Model associations - The technique I use is a mashup of a to_hash method on association reflections and a deep_merge method for Ruby hashes. deep_merge is pretty convenient in tests generally. Add this to spec/spec_helper.rb:

# Associations to hash
module ActiveRecord
  module Reflection
    class AssociationReflection
      def to_hash
        {
          :macro => @macro,
          :options => @options,
          :class_name => @class_name || @name.to_s.singularize.camelize
        }
      end
    end
  end
end

# Hash#deep_merge
# From: http://pastie.textmate.org/pastes/30372, Elliott Hird
# Source: http://gemjack.com/gems/tartan-0.1.1/classes/Hash.html
class Hash

  # Merges self with another hash, recursively, returning a new hash.
  def deep_merge(hash)
    target = dup

    hash.keys.each do |key|
      if hash[key].is_a? Hash and self[key].is_a? Hash
        target[key] = target[key].deep_merge(hash[key])
        next
      end

      target[key] = hash[key]
    end

    target
  end

  # In-place version.
  # From: http://www.gemtacular.com/gemdocs/cerberus-0.2.2/doc/classes/Hash.html
  def deep_merge!(second)
    second.each_pair do |k,v|
      if self[k].is_a?(Hash) and second[k].is_a?(Hash)
        self[k].deep_merge!(second[k])
      else
        self[k] = second[k]
      end
    end
    self # return the receiver, not +second+
  end

end

That creates both to_hash and deep_merge. The tests look like this:

  it "should belong to an owner, which is a User" do
    assoc = Home.reflect_on_association(:owner).to_hash
    assoc.deep_merge({
      :macro => :belongs_to,
      :options => { :class_name => 'User' }
    }).should == assoc
  end

  it "should have many tree_types, through trees" do
    assoc = Home.reflect_on_association(:tree_types).to_hash
    assoc.deep_merge({
      :macro => :has_many,
      :options => { :through => :trees }
    }).should == assoc
  end

This allows for very precise tests of model associations, with useful error messages when they fail. It’s a little verbose, and there may be something on GitHub that has already moved further down this line.
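The trick in these specs is that merging the expected hash into the actual one leaves it unchanged only when every expected key and value is already there. A standalone sketch of that check (deep_merge and contains? here are free functions written for illustration, and the sample reflection hash is made up):

```ruby
# Recursively merges +other+ into a copy of +base+.
def deep_merge(base, other)
  other.each_with_object(base.dup) do |(key, value), merged|
    merged[key] =
      if value.is_a?(Hash) && merged[key].is_a?(Hash)
        deep_merge(merged[key], value)
      else
        value
      end
  end
end

# True when everything in +expected+ (recursively) is already in +actual+.
def contains?(actual, expected)
  deep_merge(actual, expected) == actual
end

assoc = { :macro => :belongs_to, :options => { :class_name => 'User', :foreign_key => 'owner_id' } }
contains?(assoc, { :macro => :belongs_to, :options => { :class_name => 'User' } }) # => true
contains?(assoc, { :macro => :has_many })                                           # => false
```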

ActionMailer - Definitely test your models separately from your notifier. For instance, test that an email is sent on User creation:

  it "should send an email on user creation" do
    UserNotifier.expects(:deliver_new_user_notification)
    User.create( @valid_user_params )
  end

That’s all that’s needed: none of the flushing of the unsent email cache you may have seen while googling for this. Now test the UserNotifier:

shared_examples_for "mysite.com email" do

  it "should have a prefix on the subject" do
    @email.subject.should =~ /\A\[My Site\] /
  end

  it "should be from noreply" do
    @email.from.should == ['noreply@mysite.com']
  end

  it "should be multi-part" do
    @email.parts.length.should be(2)
  end

end

describe UserNotifier do

  describe "when sending a new user e-mail" do

    before(:each) do
      @user = User.create( @valid_user_params )
      @email = UserNotifier.create_new_user_notification(@user)
    end

    it_should_behave_like "mysite.com email"

    it "should be sent to the user's email address" do
      @email.to.should == [@user.email]
    end

    it "should contain the activation_code" do
      @email.body.should =~ /#{@user.activation_code}/
    end

  end

end
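One regex pitfall worth checking in subject specs: unescaped square brackets form a character class, so a pattern like /[My Site] / matches any one of those letters followed by a space, and the spec passes for almost any subject. A quick Ruby check:

```ruby
unescaped = /[My Site] /   # one of M, y, space, S, i, t, e, then a space
escaped   = /\[My Site\] / # the literal text "[My Site] "

"Welcome aboard" =~ unescaped    # => 6 ("e " satisfies the class)
"Welcome aboard" =~ escaped      # => nil
"[My Site] Welcome" =~ escaped   # => 0
```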

Oh slick. Of course you should flavour to taste, but these tricks are what got me rolling. I hope to post some Facebooker testing tricks as soon as I have some important things like notifications nicely figured out. What did you have to figure out?

Author: "mixonic"
Date: Wednesday, 25 Jun 2008 12:46

“Cross site request forgery” is also known as CSRF, XSRF or just request forgery (more at Wikipedia and sans.org). It’s a method of attacking web applications: Rails 2.0 introduced a defence, and Rails 2.1 enabled that defence by default. Call form_for…

<% form_for @friend, :action => 'create' do |f| %>

and Rails spits out more than just the form; it also generates a secret authenticity_token:

<form action="/friends" method="post">
<div style="margin:0;padding:0">
  <input name="authenticity_token" type="hidden" value="e8c827c47577e013cc4c06a99cab63da95b71915" />
</div>

AJAX submission of this particular form will include the authenticity token. So here’s the rub: what if the form is generated by something else?

new Element('form', { 'action': '/friends'});

Or what if there is no form?!

new Request.JSON( ... ).post();

In these cases Rails would raise an ActionController::InvalidAuthenticityToken exception. To avoid an exception the CSRF check can be altogether disabled for a controller:

skip_before_filter :verify_authenticity_token

Of course this is a compromise between convenience and security. For those who want security and AJAXy goodness (well, with MooTools at least), there is a better way.

Using AuthenticityToken with MooTools

MooTools has a wonderful object oriented codebase. First Rails needs to pass the authenticity_token into Javascript, then we can use the inheritance from MooTools to pass the authenticity_token with every request.

The authenticity_token can be passed in the head of the Rails application layout. It’s just one line:

<%= javascript_tag "var AUTH_TOKEN = #{form_authenticity_token.inspect};" if protect_against_forgery? %>

And now pass that token with every POST request (GET requests don’t check CSRF):

// Include authenticity_token in Request.JSON fires
//
// move the send function
Request.prototype._send = Request.prototype.send;
Request.implement({

  auth_token: function(){
    return AUTH_TOKEN;
  },

  // This is more verbose than is ideal, but I don't see a better place
  // to hook this functionality in
  send: function(options){
    var type = $type(options);
    if (type == 'string' || type == 'element') options = {data: options};

    var old = this.options;
    options = $extend({data: old.data, url: old.url, method: old.method}, options);

    switch ($type(options.data)){
      case 'element': options.data = $(options.data).toQueryString(); break;
      case 'object': case 'hash': options.data = Hash.toQueryString(options.data);
    }

    // If this isn't a GET request, add the (URI-encoded) authenticity_token
    if (String(options.method).toLowerCase() != 'get') {
      var token = 'authenticity_token=' + encodeURIComponent(this.auth_token());
      options.data = options.data ? options.data + '&' + token : token;
    }

    // Call the original send, returning it so calls stay chainable
    return this._send(options);
  }

});

Look at that: now every non-GET AJAX call from MooTools will carry our authenticity token. All the ease of normal javascript, all the security of the native Rails defences.
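The rule the patched send applies is small enough to state in a few lines. Here it is as a Ruby sketch (with_authenticity_token is a hypothetical helper, handy for example when building form-encoded POST bodies in a script or test): skip GET requests, URL-encode the token, and join it to any existing data with an ampersand.

```ruby
require 'uri'

# Appends a URL-encoded authenticity_token to form-encoded +data+ for
# anything other than a GET request.
def with_authenticity_token(method, data, token)
  return data if method.to_s.downcase == 'get'
  pair = URI.encode_www_form('authenticity_token' => token)
  data.nil? || data.empty? ? pair : "#{data}&#{pair}"
end

with_authenticity_token('post', 'name=Bob', 'ab+c') # => "name=Bob&authenticity_token=ab%2Bc"
with_authenticity_token('GET', 'name=Bob', 'abc')   # => "name=Bob"
```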

Author: "mixonic"
Date: Friday, 20 Jun 2008 18:07

When you’re done poking through this, check out the final version of this script at The Joy of Cows On Tables.

A few days ago we walked through writing our own MooTools-based table sort in The Joy of a Minimal, Complete Javascript Table Sort. A poster going by “hello there” raised a good point about performance:

“would be nice to see the example with several hundred rows - performance in sorting is a huge issue, and looking at the zillions of libraries out there… many of them konk out completely, taking 5 seconds to sort a table with 1,000 rows.”

Right. Javascript should be used to enhance a user’s experience, and a 5-second wait for a table sort is completely asinine. Let’s look at some quick ways to optimize our code, and uncover a slick way to double the speed of our sorting.

The Easy Stuff

I’m going to use the Firebug Firefox plugin for my analysis. These changes should have a fairly universal effect, though. Let’s look at some sorting times on 1000 rows:

Table Sort Performance

OK, so about a second and a half on my 2.0 GHz Core 2 Duo laptop with 2 GB of RAM. Not as bad as the 5 seconds we were worried about, but not great either. Some easy optimization targets stand out:

  1. removeClass - I’d bet dollars to donuts this can be tweaked.
  2. .length is checked on every loop iteration.

.removeClass() is a function in MooTools. It looks like this:


  removeClass: function(className){
    this.className = this.className.replace(new RegExp('(^|\\s)' + className + '(?:\\s|$)'), '$1').clean();
    return this;
  },

We run removeClass at least 1000 times after each sort of our table, and the className our code always passes to removeClass is “alt”. We can avoid constructing 1000 RegExp objects by building one RegExp up front and reusing it. This is an easy change that’ll save us as much as 300 ms:


var SortingTable = new Class({
 
  // And here it is
  removeAltClassRe: new RegExp('(^|\\s)alt(?:\\s|$)'),

  initialize: function( table, options ) {
    this.options = $merge({

Now that same RegExp needs to be used where we before called removeClass. For example, in stripe_table():


      counter++;
    }
    // tr.removeClass( 'alt' );
    // Now use our already existing RegExp
    tr.className = tr.className.replace( this.removeAltClassRe, '$1').clean();
    if ( !(( counter % 2 ) == 0) ) {
      tr.addClass( 'alt' );   
    }

One down.
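For comparison, here is the same remove-a-class operation as a Ruby sketch (REMOVE_ALT_CLASS and remove_alt_class are hypothetical names). In Ruby a regex literal is only built once anyway, so the constant mostly documents intent; the JavaScript change matters because new RegExp(...) creates a fresh object on every call:

```ruby
# Built once and reused, like removeAltClassRe.
REMOVE_ALT_CLASS = /(^|\s)alt(?:\s|$)/

# Mirrors removeClass('alt') followed by clean(): drop the class, then
# collapse and trim whitespace.
def remove_alt_class(class_name)
  class_name.sub(REMOVE_ALT_CLASS, '\1').split.join(' ')
end

remove_alt_class('row alt odd') # => "row odd"
remove_alt_class('salt row')    # => "salt row" (only whole class names match)
```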

.length was our other easy tweak. Instead of looping with:


   while (this.rows.length > 0) {
     var row = this.rows.shift();

We can consolidate those lines down to:


   var row; // declare it, or row leaks out as a global
   while ( (row = this.rows.shift()) ) {

Neat.
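One caution about the consolidated loop: the assignment doubles as the loop test, so it stops at the first falsy element, not just at the end of the array. That’s harmless here because every row is an object. This plain-array sketch (hypothetical names) shows both cases, with the loop variable declared so it doesn’t leak into the global scope.

```javascript
// Drain an array front to back with shift(); `item` is declared with let
// so the assignment inside the condition doesn't create a global.
function drain(items) {
  const seen = [];
  let item;
  while ((item = items.shift())) {
    seen.push(item);
  }
  return seen;
}

const rowObjects = [{ id: 'a' }, { id: 'b' }];
console.log(drain(rowObjects).map(r => r.id), rowObjects.length); // [ 'a', 'b' ] 0

// With primitives, a falsy value (0 here) ends the loop early.
const numbers = [3, 0, 7];
console.log(drain(numbers), numbers); // [ 3 ] [ 7 ]
```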

With those two small tweaks, things have sped up slightly. Performance on the 1000-row table hovers around 2.1 seconds at best. We can do better, but things are going to get weird.

The Good Stuff

Russel over at lindsay.ie.au found out something neat: the native sort() method is far faster if you don’t pass it a function to sort with. Without a comparison function, sort() converts every array element it sorts to a string by calling its toString(), so if we overload toString, we can take advantage of that huge speed boost.

Remember this from the middle of the sort_by_header function?

    this.rows.each(function(row){
      row.compare_value = this.conversion_function( row );
    }.bind( this ));
    this.rows.sort( this.compare_rows.bind( this ) );
  }

If we overload toString for the elements on this.rows, we won’t need to pass a function into sort.

    this.rows.each(function(row){
      row.compare_value = this.conversion_function( row );
      row.toString = function(){ return this.compare_value }
    }.bind( this ));
    this.rows.sort();
  }

Now sort should be super fast.

And it is: it takes about 1.2 seconds to sort 1000 rows in Firefox. The difference in Internet Explorer under VMware isn’t as large, but it is noticeable. The big catch is that we’re now tied to how sort() sorts: alphabetically.

That means we can’t sort numbers properly. We’ll end up with this:

mixonic@pandora ~/Projects/table $ js
js> [ 0, 1, 2, 11 ].sort();
0,1,11,2
js> // So instead, let's pad numbers into strings
js> [ '000', '001', '002', '011' ].sort();
000,001,002,011
js>
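Here’s the toString() trick and the padding fix together in a small plain-JavaScript sketch. Plain objects stand in for the table rows, `makeRow` is a hypothetical helper, and `padStart` (a modern string method the original script predates) does the zero padding.

```javascript
// Each row reports its sort key from toString(), so a bare sort() with
// no comparator orders the rows by that key, compared as strings.
function makeRow(name, key) {
  return {
    name,
    compare_value: key,
    toString() { return this.compare_value; },
  };
}

// Unpadded numeric strings sort "alphabetically": '1' < '11' < '2'.
const plain = [makeRow('two', '2'), makeRow('eleven', '11'), makeRow('one', '1')];
plain.sort();
console.log(plain.map(r => r.name)); // [ 'one', 'eleven', 'two' ]

// Zero-padding every key to the same width restores numeric order.
const pad10 = s => s.padStart(10, '0');
const padded = [makeRow('two', pad10('2')), makeRow('eleven', pad10('11')), makeRow('one', pad10('1'))];
padded.sort();
console.log(padded.map(r => r.name)); // [ 'one', 'two', 'eleven' ]
```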

As “minroi_aoi” mentioned in the last post, sorting numbers was funky. The fix then was to return real integers from the conversion function instead of strings: getText() always returns a string, and parseInt() is the javascript function that converts it to an integer:

      // Numbers
      { matcher: /^\d+$/,
        conversion_function: function( row ) {
          var cell = $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
          return parseInt(cell, 10);
        }
      },

As we saw above though, we need a padded string now, not an integer. Our number function will have to look like this:

      // Numbers
      { matcher: /^\d+$/,
        conversion_function: function( row ) {
          var cell = $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
          return '0000000000'.substr(0,10-cell.length).concat(cell);
        }
      },

And if you want to sort integers longer than 10 digits, you’d need to expand the pad string and the offset. There is a tradeoff here: storing the padded strings for sorting takes up more memory than the numbers alone would. In this script that memory is only held while sorting; afterward it is freed.
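If you’d rather not pick a fixed width at all, one option (my assumption, not what the script does) is to pad each column to the length of its longest cell, at the cost of one extra pass over the column before sorting. `padColumn` is a hypothetical helper, and like the `/^\d+$/` matcher it only makes sense for non-negative integers.

```javascript
// Pad every numeric cell in a column to the width of the longest one,
// so string order matches numeric order without a hard-coded width.
function padColumn(cells) {
  const width = cells.reduce((max, c) => Math.max(max, c.length), 0);
  return cells.map(c => c.padStart(width, '0'));
}

const keys = padColumn(['7', '1200', '45']);
console.log(keys);                // [ '0007', '1200', '0045' ]
console.log(keys.slice().sort()); // [ '0007', '0045', '1200' ]
```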

All of that left us with this new script:

//
// new SortingTable( 'my_table', {
//   zebra: true,     // Stripe the table, also on initialize
//   details: false   // Has details every other row
// });
//
// The above were the defaults.  The regexes in load_conversions test a cell
// being sorted for a match, then use that conversion for all elements in that
// column.
//
// Requires mootools Class, Array, Function, Element, Element.Selectors,
// Element.Event, and you should probably get Window.DomReady if you're smart.
//

var SortingTable = new Class({

  removeAltClassRe: new RegExp('(^|\\s)alt(?:\\s|$)'),

  initialize: function( table, options ) {
    this.options = $merge({
      zebra: true,
      details: false
    }, options);
    
    this.table = $(table);
    
    this.tbody = $(this.table.getElementsByTagName('tbody')[0]);
    if (this.options.zebra) {
      SortingTable.stripe_table( this.tbody.getElementsByTagName( 'tr' ) );
    }

    this.headers = new Hash;
    var thead = $(this.table.getElementsByTagName('thead')[0]);
    $each(thead.getElementsByTagName('tr')[0].getElementsByTagName('th'), function( header, index ) {
      var header = $(header);
      this.headers.set( header.getText(), { column: index } );
      header.addEvent( 'mousedown', function(evt){
        var evt = new Event(evt);
        this.sort_by_header( evt.target.getText() );
      }.bind( this ) );
    }.bind( this ) );

    this.load_conversions();
  },

  sort_by_header: function( header_text ){
    this.rows = new Array;
    var trs = $A(this.tbody.getElementsByTagName( 'tr' ));
    var row; // declared so the while loops in this function don't create a global
    while ( ( row = trs.shift() ) ) {
      row = { row: row.remove() };
      if ( this.options.details ) {
        row.detail = trs.shift().remove();
      }
      this.rows.unshift( row );
    }
    
    var header = this.headers.get( header_text );
    if ( this.sort_column >= 0 && this.sort_column == header.column ) {
      // They were pulled off in reverse
    } else {
      this.sort_column = header.column;
      if (header.conversion_function) {
        this.conversion_function = header.conversion_function;
      } else {
        this.conversion_function = false;
        this.rows.some(function(row){
          var to_match = $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
          if (to_match == ''){ return false }
          this.conversions.some(function(conversion){
            if (conversion.matcher.test( to_match )){
              this.conversion_function = conversion.conversion_function;
              return true;
            }
            return false;
          }.bind( this ));
          if (this.conversion_function){ return true; }
          return false;
        }.bind( this ));
        header.conversion_function = this.conversion_function.bind( this );
        this.headers.set( header_text, header );
      }
      this.rows.each(function(row){
        row.compare_value = this.conversion_function( row );
        row.toString = function(){ return this.compare_value }
      }.bind( this ));
      this.rows.sort();
    }

    var index = 0;
    while ( row = this.rows.shift() ) {
      row.row.injectInside( this.tbody );
      if (row.detail){ row.detail.injectInside( this.tbody ) };
      if ( this.options.zebra ) {
        row.row.className = row.row.className.replace( this.removeAltClassRe, '$1').clean();
        if (row.detail){
          row.detail.className = row.detail.className.replace( this.removeAltClassRe, '$1').clean();
        }
        if ( ( index % 2 ) == 0 ) {
          row.row.addClass( 'alt' );
          if (row.detail){ row.detail.addClass( 'alt' ); }
        }
      }
      index++;
    }
    this.rows = false;
  },

  load_conversions: function() {
    this.conversions = $A([
      // YYYY-MM-DD, YYYY-m-d
      { matcher: /\d{4}-\d{1,2}-\d{1,2}/,
        conversion_function: function( row ) {
          var cell = $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
          var re = /(\d{4})-(\d{1,2})-(\d{1,2})/;
          cell = re.exec( cell );
          // toString() must hand sort() a string, so return a zero-padded
          // YYYYMMDD key instead of a Date object
          var pad = function( n ){ return ( n.length < 2 ? '0' : '' ) + n; };
          return cell[1] + pad( cell[2] ) + pad( cell[3] );
        }
      },
      },
      // Numbers
      { matcher: /^\d+$/,
        conversion_function: function( row ) {
          var cell = $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
          return '00000000000000000000000000000000'.substr(0,32-cell.length).concat(cell);
        }
      },
      // Fallback 
      { matcher: /.*/,
        conversion_function: function( row ) {
          return $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
        }
      }
    ]);
  }

});

SortingTable.stripe_table = function ( tr_elements  ) {
  var counter = 0;
  $$( tr_elements ).each( function( tr ) {
    if ( tr.style.display != 'none' && !tr.hasClass('collapsed') ) {
      counter++;
    }
    // `this` is SortingTable itself here, so read the RegExp off the prototype
    tr.className = tr.className.replace( SortingTable.prototype.removeAltClassRe, '$1').clean();
    if ( !(( counter % 2 ) == 0) ) {
      tr.addClass( 'alt' );   
    }
  }.bind( this ));
}

Now a bit over twice as fast as before.
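One caveat with the toString() trick: toString() has to return a primitive. If a conversion hands back an object (a Date from a date conversion, say), the default sort can’t turn the row into a string and throws a TypeError. A zero-padded YYYYMMDD string sorts correctly instead. `makeRow` and `dateKey` are hypothetical helpers in this plain-JavaScript sketch.

```javascript
function makeRow(id, key) {
  return { id, compare_value: key, toString() { return this.compare_value; } };
}

// Hypothetical helper: build a sortable YYYYMMDD string key.
function dateKey(year, month, day) {
  return String(year) + String(month).padStart(2, '0') + String(day).padStart(2, '0');
}

// A Date key: toString() returns an object, so sort() throws.
let threw = false;
try {
  [makeRow('a', new Date(2008, 5, 20)), makeRow('b', new Date(2008, 4, 2))].sort();
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true

// Padded string keys sort chronologically.
const dated = [makeRow('jun20', dateKey(2008, 6, 20)), makeRow('may2', dateKey(2008, 5, 2)),
               makeRow('jun3', dateKey(2008, 6, 3))];
dated.sort();
console.log(dated.map(r => r.id)); // [ 'may2', 'jun3', 'jun20' ]
```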

Again, you can pull this script down as javascript or see an example.

Don’t forget to find the final version of this script in The Joy of Cows On Tables.

Author: "mixonic" Tags: "Javascript, javascript, optimize, perfor..."
Date: Friday, 20 Jun 2008 18:03

When you’re done poking through this, take a peek at how you can double the speed of this script in The Joy of an Optimized, Complete Javascript Table Sort, and then check out the final version of it at The Joy of Cows On Tables

Ah, table sorting. Few problems have been solved as many times as you have. Unfortunately, some of the nicest solutions, such as mootable, are also pretty overbearing. Check out this feature list:

  1. Total re-styling of your table.
  2. Editable table cells.
  3. Loading table contents from JSON.
  4. Loading table contents from JSON over XHR.
  5. Server-side sorting using the above.
  6. Client-side sorting.
  7. Re-ordering of columns, column options.
  8. Nice fade effects.
  9. Event hooks.

Whoa, too much. Over at ICA we needed something way more lightweight. I was pretty much looking for this:

  1. Client-side sort of various formats (like mm/dd/yy).
  2. Zebra or striped tables.
  3. Support sorting with hidden rows on the table.
  4. Be fairly fast.
  5. Use mootools (which we already use).
  6. Use a table already on the DOM.

Let’s take a look at how to make a javascript table sort that follows best practices, is relatively minimal, and is fast. Nothing here is completely new, but hopefully walking through it will help you write a better table sort the next time you need just a table sort, and not all the overhead of a library. I’ll be using mootools sort of aggressively, but the core ideas and practices here are portable to any environment.

Here we go!

HTML Assumptions and Javascript Style

Our code is going to make a few assumptions. Assumptions are always a tradeoff: they can be a detriment if you don’t know what they are, but they are a great boost to your consistency and coding speed if you do. We’re going to assume the HTML of the table we want to sort looks something like this:

<table cellpadding="0" cellspacing="0" id="sort_this">
  <thead>
    <tr>
      <th>a header</th>
      ...
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>a value</td>
      ...
    </tr>
    <tr>
      <td>another value</td>
      ...
    </tr>
  </tbody>
</table>

Take note of a few things:

  1. We gave the table an id, “sort_this”
  2. We used thead and tbody sections
  3. We used “th” tags for the headers

All of that is really just good HTML; the semantic use of th and td, for example, is simply proper table markup.

Javascript can be kludged onto a page in a thousand different ways; we’re going to stick with three tenets:

  1. Be unobtrusive (keep javascript out of our HTML).
  2. Use objects and instances to keep our code reusable (and able to work with multiple tables).
  3. Use options as a hash for readability.
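The third tenet means callers pass only the options they care about, and defaults fill in the rest. Here’s a small sketch of that pattern; `Object.assign` is a shallow plain-JavaScript stand-in for the mootools `$merge` call used by the scripts in these posts, and `resolveOptions` is a hypothetical name.

```javascript
// Defaults used by the table sort scripts in these posts.
const DEFAULTS = { zebra: true, details: false };

// Merge caller options over the defaults without mutating either object.
function resolveOptions(options) {
  return Object.assign({}, DEFAULTS, options);
}

console.log(resolveOptions({ zebra: false })); // { zebra: false, details: false }
console.log(resolveOptions());                 // { zebra: true, details: false }
```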

Let’s look at a basic mootools javascript object:

var SortingTable;
SortingTable = new Class({

  initialize: function( table, options ){
    ...
  },

  sort_by_header: function( text ){
    ...
  }

});

SortingTable.stripe_table = function( tr_elements ){

}

You can see how we added instance and class functions. “initialize” is an instance function run when we instantiate our object; “sort_by_header” is an instance function we call on an object.

“stripe_table” is added in a different manner because it is a class function: we want to be able to use it without instantiating “SortingTable” at all. When we use SortingTable, it should look like this:

new SortingTable( 'sort_this' );

Or maybe if we have options:

new SortingTable( 'sort_this', { zebra: false } );

We could create an instance and call sort_by_header on it:

var sorting_table = new SortingTable( 'sort_this', { zebra: false } );
sorting_table.sort_by_header( 'name' );

Or call a class method without an instance:

SortingTable.stripe_table( $$( '#sort_this tbody tr' ) );
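For readers coming from plain JavaScript, here’s a rough analogue of the instance/class split using a native class instead of mootools’ Class; `Sorter` and its members are hypothetical. One consequence worth knowing: properties shared by instances live on the prototype (which, as far as I know, is also where mootools’ Class puts the properties you pass it), so a class function called on the constructor itself can’t reach them through `this`.

```javascript
class Sorter {
  constructor(id, options) {
    this.id = id;
    this.options = Object.assign({ zebra: true }, options);
  }

  // Instance method: needs an object created with `new`.
  describe() {
    return `${this.id} zebra=${this.options.zebra}`;
  }

  // Class function: called on Sorter itself, no instance required.
  static stripe(count) {
    return Array.from({ length: count }, (_, i) => (i % 2 === 0 ? 'alt' : ''));
  }
}

// A shared property on the prototype, like removeAltClassRe.
Sorter.prototype.altRe = /(^|\s)alt(?:\s|$)/;

console.log(new Sorter('sort_this', { zebra: false }).describe()); // sort_this zebra=false
console.log(Sorter.stripe(3));                                     // [ 'alt', '', 'alt' ]
console.log(new Sorter('t').altRe instanceof RegExp);              // true
console.log(Sorter.altRe);                                         // undefined
```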

Alright, we have a basic set of common practices we can base our code on. Let’s dive into a very basic table sort script.

Basic Javascript Table Sorting

Now, this is only a starting point: the table sort below is so basic it isn’t very user friendly. It is, however, a good place to start understanding how we sort things in general and how we deal with tables on the DOM.

var SortingTable;
SortingTable = new Class({

  initialize: function( table, options ) {
    this.table = $(table);
    this.tbody = $(this.table.getElementsByTagName('tbody')[0]);
    
    this.headers = new Hash;
    var thead = $(this.table.getElementsByTagName('thead')[0]);
    $each(thead.getElementsByTagName('tr')[0].getElementsByTagName('th'), function( header, index ) {
      var header = $(header);
      this.headers.set( header.getText(), { column: index } );
      header.addEvent( 'mousedown', function(evt){
        var evt = new Event(evt);
        this.sort_by_header( evt.target.getText() );
      }.bind( this ));
    }.bind( this ) );
  },

  sort_by_header: function( header_text ){
    this.rows = new Array;
    var trs = this.tbody.getElements( 'tr' );
    while ( trs.length > 0 ) {
      var row = { row: trs.shift().remove() };
      this.rows.unshift( row );
    }

    var header = this.headers.get( header_text );
    if ( this.sort_column >= 0 && this.sort_column == header.column ) {
      // They were pulled off in reverse
    } else {
      this.sort_column = header.column;
      this.rows.sort( this.compare_rows.bind( this ) );
    }

    while (this.rows.length > 0) {
      var row = this.rows.shift();
      row.row.injectInside( this.tbody );
    }
    this.rows = false;
  },

  compare_rows: function( r1, r2 ) {
    r1.compare_value = $(r1.row.getElementsByTagName('td')[this.sort_column]).getText();
    r2.compare_value = $(r2.row.getElementsByTagName('td')[this.sort_column]).getText();
    if ( r1.compare_value > r2.compare_value ) { return  1 }
    if ( r1.compare_value < r2.compare_value ) { return -1 }
    return 0;
  }

});

Oh, the bitter pill of javascript. Let’s boil it down; there really isn’t too much going on here. In initialize (which, remember, runs as soon as we instantiate):

  1. We find some nodes on the DOM so we don’t need to find them again later: this.table and this.tbody (thead is only needed during setup, so it stays a local variable).
  2. In $each, we walk all of the “th” tags in thead and do two things:
    1. add the innerText as a key in this.headers, with a value that includes the index, or which column we’re dealing with. This is a map for later.
    2. add a “mousedown” event handler to the “th” tag, firing sort_by_header with the th’s innerText

Stay with me. Look at the initialize again:

  initialize: function( table, options ) {
    this.table = $(table);
    this.tbody = $(this.table.getElementsByTagName('tbody')[0]);
    
    this.headers = new Hash;
    var thead = $(this.table.getElementsByTagName('thead')[0]);
    $each(thead.getElementsByTagName('tr')[0].getElementsByTagName('th'), function( header, index ) {
      var header = $(header);
      this.headers.set( header.getText(), { column: index } );
      header.addEvent( 'mousedown', function(evt){
        var evt = new Event( evt );
        this.sort_by_header( evt.target.getText() );
      }.bind( this ));
    }.bind( this ) );
  },

See the lines that close out the $each and addEvent functions? They’re using .bind( this ) to attach the internal “this” of each function back to our instantiated object. It’s a nice trick that lets us have a “mousedown” event handler that is attached to our object.
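Here’s the same idea with native `Function.prototype.bind`, which does the job mootools’ `bind` does in these scripts. `Table`, `sortByHeader`, and `handlers` are hypothetical names; each handler is called with no receiver, the way an event callback is, and still reaches the right object.

```javascript
class Table {
  constructor(headerTexts) {
    this.sortedBy = [];
    // Each handler is a plain function bound to this Table instance.
    this.handlers = headerTexts.map(text =>
      function () { this.sortByHeader(text); }.bind(this));
  }
  sortByHeader(text) {
    this.sortedBy.push(text);
  }
}

const table = new Table(['Name', 'Date']);
table.handlers.forEach(handler => handler()); // no receiver, like an event callback
console.log(table.sortedBy); // [ 'Name', 'Date' ]

// Detached without bind, the method loses its object and throws.
const detached = table.sortByHeader;
let failed = false;
try { detached('Name'); } catch (e) { failed = e instanceof TypeError; }
console.log(failed); // true
```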

Deep breath here, let’s step through the sort_by_header section.

  sort_by_header: function( header_text ){
    this.rows = new Array;
    var trs = this.tbody.getElements( 'tr' );
    while ( trs.length > 0 ) {
      var row = { row: trs.shift().remove() };
      this.rows.unshift( row );
    }

So this creates an internal array of rows, then proceeds to walk all the tr’s in tbody. For each one, it does two things: shifts one tr off the array of trs, and uses .remove() to drop it off the DOM. Each row is stuffed in an object and added to the front of the this.rows array.

It’s important that each row is added to the beginning of the rows array, because if all we need to do is reverse the rows, we can just replay this array one by one and attach its rows to the DOM again. Reverse without calling .reverse(), nice.
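A quick plain-array sketch of that trick: shifting items off the front of one array and unshifting them onto another leaves them in reverse order. `pullOffReversed` is a hypothetical name; it copies its input so the caller’s array survives. One hedge: `unshift()` generally has to move every element already in the array, so on very large tables this may cost more than it saves.

```javascript
// Move items one at a time from the front of a copy onto the front of a
// new array; the result comes out reversed without calling reverse().
function pullOffReversed(items) {
  const source = items.slice();
  const pulled = [];
  while (source.length > 0) {
    pulled.unshift(source.shift());
  }
  return pulled;
}

const original = ['a', 'b', 'c'];
console.log(pullOffReversed(original)); // [ 'c', 'b', 'a' ]
console.log(original);                  // [ 'a', 'b', 'c' ]
```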

    var header = this.headers.get( header_text );
    if ( this.sort_column >= 0 && this.sort_column == header.column ) {
      // They were pulled off in reverse
    } else {
      this.sort_column = header.column;
      this.rows.sort( this.compare_rows.bind( this ) );
    }

Here we pull the header object out of our headers hash using the passed text (the innerText from the event attached in initialize), and then compare its column to our last sort_column. If they are the same, we can move on and just re-insert the rows. If they’re different, we need to call this.rows.sort().

.sort() is a native javascript method for sorting an array. By default, .sort() converts the elements to strings and sorts them in alphabetical order. To sort any other way, you can pass it a comparison function: a negative answer (like -1) puts the first argument first, a positive one (like 1) puts the second first, and 0 treats them as equal. In this example, we’re telling sort to use this.compare_rows, and also reminding it that compare_rows should be run on our current object.
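Here’s a minimal sketch of a comparison function, plus the default string ordering that makes one necessary for numbers. The names are hypothetical and plain objects stand in for rows; sort() only looks at the sign of the result, so any negative or positive number works as well as -1 or 1.

```javascript
// Negative: a sorts first. Positive: b sorts first. Zero: treat as equal.
function byText(a, b) {
  if (a.text > b.text) return 1;
  if (a.text < b.text) return -1;
  return 0;
}

const fruit = [{ text: 'pear' }, { text: 'apple' }, { text: 'fig' }];
fruit.sort(byText);
console.log(fruit.map(r => r.text)); // [ 'apple', 'fig', 'pear' ]

// With no comparator, numbers are compared as strings.
console.log([10, 9, 1].sort());                // [ 1, 10, 9 ]
console.log([10, 9, 1].sort((a, b) => a - b)); // [ 1, 9, 10 ]
```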

Learn more about sort at w3schools.

    while (this.rows.length > 0) {
      var row = this.rows.shift();
      row.row.injectInside( this.tbody );
    }
    this.rows = false;
  },

This section is the meat: we take our sorted or reversed array, shift its contents off the top one by one, and add them to this.tbody. Shifting them off empties the array as we move along, so we don’t leave an array of rows sitting in RAM.

That’s the bulk of table sorting in Javascript right there. sort_by_header ripped rows off the table, reversed or sorted them, and then reinserted them into the DOM.

Our actual table sort logic is the following:

  compare_rows: function( r1, r2 ) {
    r1.compare_value = $(r1.row.getElementsByTagName('td')[this.sort_column]).getText();
    r2.compare_value = $(r2.row.getElementsByTagName('td')[this.sort_column]).getText();
    if ( r1.compare_value > r2.compare_value ) { return  1 }
    if ( r1.compare_value < r2.compare_value ) { return -1 }
    return 0;
  }

});

compare_rows gets two arguments from sort: the two row objects to compare. The text of the td cells is fetched and compared. Pretty straightforward here; the only trickery is finding out which column to use by reaching into this.sort_column. I like having the whole row in there to compare, since it opens the door to secondary sorting by another column (like iTunes’ “Album By Artist” sorting).
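As a sketch of where that door leads, here’s a hypothetical two-key comparator: compare on one field, and fall back to a second only when the first ties. The field names and data are made up for illustration; in the table sort they would come from two cells of the same row.

```javascript
// Sort by artist, then by album within the same artist.
function byArtistThenAlbum(a, b) {
  if (a.artist !== b.artist) return a.artist < b.artist ? -1 : 1;
  if (a.album !== b.album) return a.album < b.album ? -1 : 1;
  return 0;
}

const albums = [
  { artist: 'Zulu', album: 'Second' },
  { artist: 'Alpha', album: 'Only' },
  { artist: 'Zulu', album: 'First' },
];
albums.sort(byArtistThenAlbum);
console.log(albums.map(x => `${x.artist} / ${x.album}`));
// [ 'Alpha / Only', 'Zulu / First', 'Zulu / Second' ]
```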

Huzzah! We can nearly rejoice. I’d go back and look over the code we just walked through; if you can understand what’s going on up there, this next block of hackery should make perfect sense.

Why Our Simple Sort Sucks

There are some problems with the simple script above:

  1. It pulls a given td cell’s text off the DOM and converts it every time that cell is compared, which makes it slow.
  2. It isn’t flexible enough to sort mm/dd/yy.

Those are two pretty damning faults, so let’s clean them up before adding any new features.

var SortingTable;
SortingTable = new Class({

  initialize: function( table, options ) {
    this.table = $(table);    
    this.tbody = $(this.table.getElementsByTagName('tbody')[0]);

    this.headers = new Hash;
    var thead = $(this.table.getElementsByTagName('thead')[0]);
    $each(thead.getElementsByTagName('tr')[0].getElementsByTagName('th'), function( header, index ) {
      var header = $(header);
      this.headers.set( header.getText(), { column: index } );
      $(header).addEvent( 'mousedown', function(evt){
        var evt = new Event(evt);
        this.sort_by_header( evt.target.getText() );
      }.bind( this ));
    }.bind( this ) );

    this.load_conversions();
  },

  sort_by_header: function( header_text ){
    this.rows = new Array;
    var trs = this.tbody.getElements( 'tr' );
    while ( trs.length > 0 ) {
      var row = { row: trs.shift().remove() };
      this.rows.unshift( row );
    }

    var header = this.headers.get( header_text );
    if ( this.sort_column >= 0 && this.sort_column == header.column ) {
      // They were pulled off in reverse
    } else {
      this.sort_column = header.column;
      if (header.conversion_function) {
        this.conversion_function = header.conversion_function;
      } else {
        this.conversion_function = false;
        this.rows.some(function(row){
          var to_match = $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
          if (to_match == ''){ return false }
          this.conversions.some(function(conversion){
            if (conversion.matcher.test( to_match )){
              this.conversion_function = conversion.conversion_function;
              return true;
            }
            return false;
          }.bind( this ));
          if (this.conversion_function){ return true; }
          return false;
        }.bind( this ));
        header.conversion_function = this.conversion_function.bind( this );
        this.headers.set( header_text, header );
      }
      this.rows.each(function(row){
        row.compare_value = this.conversion_function( row );
      }.bind( this ));
      this.rows.sort( this.compare_rows.bind( this ) );
    }

    while (this.rows.length > 0) {
      var row = this.rows.shift();
      row.row.injectInside( this.tbody );
    }
    this.rows = false;
  },

  compare_rows: function( r1, r2 ) {
    if ( r1.compare_value > r2.compare_value ) { return  1 }
    if ( r1.compare_value < r2.compare_value ) { return -1 }
    return 0;
  },
  
  load_conversions: function() {
    this.conversions = $A([
      // YYYY-MM-DD, YYYY-m-d
      { matcher: /\d{4}-\d{1,2}-\d{1,2}/,
        conversion_function: function( row ) {
          var cell = $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
          var re = /(\d{4})-(\d{1,2})-(\d{1,2})/;
          cell = re.exec( cell );
          return new Date(parseInt(cell[1]), parseInt(cell[2], 10) - 1, parseInt(cell[3], 10));
        }
      },
      // Fallback 
      { matcher: /.*/,
        conversion_function: function( row ) {
          return $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
        }
      }
    ]);
  }

});

this.load_conversions(); is the big new thing in initialize. It calls the function at the end of the class that builds an array of hashes, each with a “matcher” and a “conversion_function”.

Really, the main difference is in the sorting section of sort_by_header, in the meat and bones:

      this.sort_column = header.column;
      if (header.conversion_function) {
        this.conversion_function = header.conversion_function;
      } else {
        this.conversion_function = false;
        this.rows.some(function(row){
          var to_match = $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
          if (to_match == ''){ return false }
          this.conversions.some(function(conversion){
            if (conversion.matcher.test( to_match )){
              this.conversion_function = conversion.conversion_function;
              return true;
            }
            return false;
          }.bind( this ));
          if (this.conversion_function){ return true; }
          return false;
        }.bind( this ));
        header.conversion_function = this.conversion_function.bind( this );
        this.headers.set( header_text, header );
      }
      this.rows.each(function(row){
        row.compare_value = this.conversion_function( row );
      }.bind( this ));
      this.rows.sort( this.compare_rows.bind( this ) );

Ok, don’t get thrown. Javascript’s quirks make these loops pretty messy. Notice that the first time the rows are walked we use “.some()”. some() is an array method (native in newer browsers, and supplied by mootools where it’s missing) that acts like each until the function returns true, then breaks out of the loop. This is what’s going on here:

  1. See if we have a conversion_function. If we don’t…
    1. Walk through td’s in the column.
    2. If its innerText is blank, go to the next element.
    3. Walk through the available conversions.
    4. If a conversion matches the matcher, assign it to this.conversion_function and break the loop
    5. Save the conversion_function on the header object so we don’t need to find it later.
  2. Walk all our row objects and run the conversion_function on each, save it onto the row.
  3. Run sort with compare_rows.
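The same lookup, pulled out of the class into a plain-JavaScript sketch. `findConversion` and the `name`/`convert` fields are hypothetical, `Array.prototype.find` stands in for the inner `some()`, and the mm/dd/yy entry is an extra conversion added for illustration (it assumes every two-digit year is in the 2000s); it isn’t part of the script.

```javascript
const conversions = [
  { name: 'yyyy-mm-dd', matcher: /\d{4}-\d{1,2}-\d{1,2}/,
    convert: text => {
      const m = /(\d{4})-(\d{1,2})-(\d{1,2})/.exec(text);
      return new Date(Number(m[1]), Number(m[2]) - 1, Number(m[3]));
    } },
  { name: 'mm/dd/yy', matcher: /^\d{1,2}\/\d{1,2}\/\d{2}$/,
    convert: text => {
      const [mm, dd, yy] = text.split('/').map(Number);
      return new Date(2000 + yy, mm - 1, dd); // assumes 20xx years
    } },
  { name: 'fallback', matcher: /.*/, convert: text => text },
];

// Skip blank cells; the first non-blank cell picks the conversion.
function findConversion(columnTexts) {
  let found = null;
  columnTexts.some(text => {
    if (text === '') return false; // blank: try the next cell
    found = conversions.find(c => c.matcher.test(text)) || null;
    return found !== null;         // true stops the walk
  });
  return found;
}

console.log(findConversion(['', '06/20/08']).name); // mm/dd/yy
console.log(findConversion(['2008-6-20']).name);    // yyyy-mm-dd
console.log(findConversion(['apples']).name);       // fallback
```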

compare_rows, you can see, now expects to sort with the compare_value:

  compare_rows: function( r1, r2 ) {
    if ( r1.compare_value > r2.compare_value ) { return  1 }
    if ( r1.compare_value < r2.compare_value ) { return -1 }
    return 0;
  },

By adding new conversions to load_conversions, you can support sorting of all kinds of different formats and sub-columns. And you’ll only be running a conversion once per cell (until you sort another column and come back; this doesn’t aggressively cache the whole table in memory).
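To put a rough number on the “once per cell” part, here’s a sketch that counts calls to a stand-in conversion when it runs inside the comparator versus once per row up front. The data and names are made up; the exact comparator count depends on the engine’s sort, so the sketch only checks which way the difference goes.

```javascript
let calls = 0;
const convert = text => { calls++; return parseInt(text, 10); };

// 1000 distinct numeric strings in scrambled order.
const cells = Array.from({ length: 1000 }, (_, i) => String((i * 7919) % 1000));

// Converting inside the comparator: two conversions per comparison.
calls = 0;
cells.slice().sort((a, b) => convert(a) - convert(b));
const insideComparator = calls;

// Converting once per row, then sorting on the stored key.
calls = 0;
const keyed = cells.map(text => ({ text, key: convert(text) }));
keyed.sort((a, b) => a.key - b.key);
const upFront = calls;

console.log(upFront);                    // 1000
console.log(insideComparator > upFront); // true
```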

The Sweet Smell Of Success

All that’s needed now is a sprinkling of zebra striping and some extra baggage on the row objects, and we’ll support those last two features:

  1. Zebra or striped tables.
  2. Hidden/expandable rows.

It looks something like this:

//
// new SortingTable( 'my_table', {
//   zebra: true,     // Stripe the table, also on initialize
//   details: false   // Has details every other row
// });
//
// The above were the defaults.  The regexes in load_conversions test a cell
// being sorted for a match, then use that conversion for all elements in that
// column.
//
// Requires mootools Class, Array, Function, Element, Element.Selectors,
// Element.Event, and you should probably get Window.DomReady if you're smart.
//

var SortingTable;
SortingTable = new Class({

  initialize: function( table, options ) {
    this.options = $merge({
      zebra: true,
      details: false
    }, options);
    
    this.table = $(table);
    
    this.tbody = $(this.table.getElementsByTagName('tbody')[0]);
    if (this.options.zebra) {
      SortingTable.stripe_table( this.tbody.getElements( 'tr' ) );
    }

    this.headers = new Hash;
    var thead = $(this.table.getElementsByTagName('thead')[0]);
    $each(thead.getElementsByTagName('tr')[0].getElementsByTagName('th'), function( header, index ) {
      var header = $(header);
      this.headers.set( header.getText(), { column: index } );
      header.addEvent( 'mousedown', function(evt){
        var evt = new Event(evt);
        this.sort_by_header( evt.target.getText() );
      }.bind( this ));
    }.bind( this ) );

    this.load_conversions();
  },

  sort_by_header: function( header_text ){
    this.rows = new Array;
    var trs = this.tbody.getElements( 'tr' );
    while ( trs.length > 0 ) {
      var row = { row: trs.shift().remove() };
      if ( this.options.details ) {
        row.detail = trs.shift().remove();
      }
      this.rows.unshift( row );
    }

    var header = this.headers.get( header_text );
    if ( this.sort_column >= 0 && this.sort_column == header.column ) {
      // They were pulled off in reverse
    } else {
      this.sort_column = header.column;
      if (header.conversion_function) {
        this.conversion_function = header.conversion_function;
      } else {
        this.conversion_function = false;
        this.rows.some(function(row){
          var to_match = $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
          if (to_match == ''){ return false }
          this.conversions.some(function(conversion){
            if (conversion.matcher.test( to_match )){
              this.conversion_function = conversion.conversion_function;
              return true;
            }
            return false;
          }.bind( this ));
          if (this.conversion_function){ return true; }
          return false;
        }.bind( this ));
        header.conversion_function = this.conversion_function.bind( this );
        this.headers.set( header_text, header );
      }
      this.rows.each(function(row){
        row.compare_value = this.conversion_function( row );
      }.bind( this ));
      this.rows.sort( this.compare_rows.bind( this ) );
    }

    var index = 0;
    while (this.rows.length > 0) {
      var row = this.rows.shift();
      row.row.injectInside( this.tbody );
      if (row.detail){ row.detail.injectInside( this.tbody ) };
      if ( this.options.zebra ) {
        row.row.removeClass( 'alt' );
        if (row.detail){ row.detail.removeClass( 'alt' ); }
        if ( ( index % 2 ) == 0 ) {
          row.row.addClass( 'alt' );
          if (row.detail){ row.detail.addClass( 'alt' ); }
        }
      }
      index++;
    }
    this.rows = false;
  },

  compare_rows: function( r1, r2 ) {
    if ( r1.compare_value > r2.compare_value ) { return  1 }
    if ( r1.compare_value < r2.compare_value ) { return -1 }
    return 0;
  },
  
  load_conversions: function() {
    this.conversions = $A([
      // YYYY-MM-DD, YYYY-m-d
      { matcher: /\d{4}-\d{1,2}-\d{1,2}/,
        conversion_function: function( row ) {
          var cell = $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
          var re = /(\d{4})-(\d{1,2})-(\d{1,2})/;
          cell = re.exec( cell );
          return new Date(parseInt(cell[1]), parseInt(cell[2], 10) - 1, parseInt(cell[3], 10));
        }
      },
      // Fallback 
      { matcher: /.*/,
        conversion_function: function( row ) {
          return $(row.row.getElementsByTagName('td')[this.sort_column]).getText();
        }
      }
    ]);
  }

});

SortingTable.stripe_table = function ( tr_elements  ) {
  var counter = 0;
  $$( tr_elements ).each( function( tr ) {
    if ( tr.style.display != 'none' && !tr.hasClass('collapsed') ) {
      counter++;
    }
    tr.removeClass( 'alt' );   
    if ( !(( counter % 2 ) == 0) ) {
      tr.addClass( 'alt' );   
    }
  }.bind( this ));
}

Nice.
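The striping rule in stripe_table is easy to miss, so here it is on plain objects instead of `<tr>` elements. `stripe` and the `hidden` flag are hypothetical; `hidden` stands for `display: none` or the `collapsed` class. The counter only advances on visible rows, so a hidden detail row gets the same class as the visible row above it.

```javascript
function stripe(rows) {
  let counter = 0;
  return rows.map(row => {
    if (!row.hidden) counter++;            // hidden rows don't advance the count
    return counter % 2 !== 0 ? 'alt' : ''; // odd count: add "alt"
  });
}

console.log(stripe([{}, { hidden: true }, {}, {}]));
// [ 'alt', 'alt', '', 'alt' ]
```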

That was a lot of ground to cover, as in, way more than I had any intention of covering :-). I’ve made a lot of assumptions (remember those?) about your javascript fu in this post, but if you have any questions just ask!

You can pull this code down as javascript or take a look at some running examples, including how to use hidden rows.

Also look at optimization steps and an updated script at The Joy of an Optimized, Complete Javascript Table Sort, and then check out the final script in The Joy of Cows On Tables.

Author: "mixonic" Tags: "Javascript, javascript, sort, table"
Date: Wednesday, 18 Jun 2008 14:30

Ok, so I load up Firefox 3 and go to change some settings, you know, try reverting all the tweaks I’ve made to Firefox 2. I enter the incantation “about:config” into the URL bar and get this:

What Warranty?

What? Void my warranty? I get a warranty with this open source software? That’s a new one to me.

No Warranty

Oh wait, yeah, I don’t. Please Mozilla, stick with the warnings about understanding the settings and don’t use inappropriate legalese. The whole “void your warranty” phrase is the fear-mongering motto of anti-reverse-engineering corporations everywhere. Firefox is open source. It should embrace the opposite and encourage me to explore and experiment.

To top it off, making me click “I’ll be careful, I promise!” is just a little too cute and condescending. I’m feeling a lack of UI polish in Firefox 3 so far. I’ll cover a few other things after I organize my thoughts, but this was just too good to wait on.

Author: "mixonic"
Date: Friday, 25 Apr 2008 19:23

My Mephisto install was 0.7.3. Version 0.8 doesn’t have any huge feature changes, but there are some bug fixes and small enhancements, as well as a performance boost (speed, memory use) from using Rails 2.0. My install was from a tarball, and I don’t want to move my production server to a git checkout (yet?). Like other Mephisto users, I’ve got theme customizations and maybe even some things in the public directory.

I need to upgrade, not reinstall. The Mephisto folks didn’t write a real upgrade guide, but this is Rails and it’s not too hard to figure out. Almost any Rails app could be updated this way:

  • Back Everything Up
  • Create & Apply A Patch
  • Stop Your Server & Clean Up
  • Patch Code & Migrate The Database
  • Crack Open A Beer

Back Everything Up

Back up the current code and database.

cp -a my_blog my_blog.bck
mysqldump -uroot --opt mephisto_production > pre0.8.sql

There’s your safety net.

Create A Patch

The trick here is to create a patch between the version of Mephisto you’re currently running and 0.8. For me that was 0.7.3, so that’s what I’ll use below. Fetch the 0.7.3 and the 0.8 source:

wget http://s3.amazonaws.com/mephisto-blog/mephisto-0.7.3.tar.gz
wget http://github.com/technoweenie/mephisto/tarball/master.tar.gz

And this is weird right here. Justin has linked the 0.8 release from the Mephisto download page as the git master tarball. This means you don’t know what rev of git you will actually get when you click on it.

So there isn’t even really an 0.8 release, there’s just a link to a dynamic current master. Bad form.

I ended up with 9072b487bf45c5e41e33c66b32d94aea84732d1b, but you might get something else. You can get the rev I used by fetching it directly:

wget http://github.com/tarballs/technoweenie-mephisto-9072b487bf45c5e41e33c66b32d94aea84732d1b.tar.gz

Now untar the two packages. You get a nice “mephisto-0.7.3” directory for 0.7.3, and something messy like “technoweenie-mephisto-9072b487bf45c5e41e33c66b32d94aea84732d1b” for 0.8. Make a diff between the two:

diff -Nur mephisto-0.7.3 technoweenie-mephisto-9072b487bf45c5e41e33c66b32d94aea84732d1b > ~/mephisto-0.7.3_to_0.8.patch

Or you can download the 0.7.3 to 0.8 patch from me. It weighs in at a hefty 4.9M.

Stop Your Server & Clean Up

Everyone has a custom Rails setup, so just stop your server the proper way. After that give it a good:

rake tmp:clear

To flush out the caches and such.

Patch Code & Migrate The Database

Now that the server is turned off and the cache is cleared, try patching.

cd /var/www/my_blog
patch -p1 < ~/mephisto-0.7.3_to_0.8.patch

Mostly you’ll just see “patching file ….” flying by, but you may get a changed file question like this:

patching file public/.htaccess
Reversed (or previously applied) patch detected!  Assume -R? [n]

You can likely answer “n” to both the Assume and the Apply questions:

patching file public/.htaccess
Reversed (or previously applied) patch detected!  Assume -R? [n] n
Apply anyway? [n] n
Skipping patch.
1 out of 1 hunk ignored -- saving rejects to file public/.htaccess.rej
patching file public/install.html
...

I also had some custom filters added to environment.rb, and got this:

patching file config/environment.rb
Hunk #3 FAILED at 44.
1 out of 3 hunks FAILED -- saving rejects to file config/environment.rb.rej
patching file config/environments/development.rb
...

A failed hunk means patching that file didn’t work, usually because you have local modifications. In this case the solution is easy. You want to use the version of config/environment.rb that came with 0.8, and put your customizations into config/initializers/custom.rb. Copy the good config/environment.rb from the 0.8 tarball you extracted earlier right over the local file. Your customizations are still in config/environment.rb.orig, so copy them from there into the new file for environment customizations, config/initializers/custom.rb.

Install Gems (Well, I Had To)

Before I could migrate I had to install tzinfo.

sudo gem install tzinfo

Maybe you will too.

Migrate The Database

Compared to applying the patch this is easy:

cd /var/www/my_blog
RAILS_ENV=production rake db:migrate

My database updated with no problems at all.

Crack Open A Beer

Start up your Mongrel, Thin, evented_mongrel or other server and crack open that frosty brew. You’ve just upgraded yourself to the best release of the self-proclaimed best blogging system ever.

So what kind of beer are you enjoying today?

Author: "mixonic"
Date: Thursday, 24 Apr 2008 17:09

As a side project I’ve helped some friends launch Liquidware, a simple storefront for their Arduino modules and other open/hobby hardware. They created a decent amount of buzz about their launch by saving up a bunch of content and pushing it out the same weekend. Blog posts, video, images, all that good stuff:

That was good noise, and it got them enough traffic to sell out of a few stocked items (woohoo!). But after the launch bubble a steady stream of sales needs to come from Google. I’ve been experimenting with the SEO on Liquidware’s site and learned a few smart tricks for your next storefront.

Plan Your HTML For SEO

Most of this comes for free if you’re a good standards-adherent web designer. Use alt attributes for images and title attributes for links, and try to keep your content earlier on the page than your navigation. Google looks for keywords in several places, and you should use all of these:

  • Title tag
  • Keywords meta-tag
  • Description meta-tag
  • Headers, especially h1
  • Link tags
  • URL
  • Content blocks

The nice thing about using an easily customizable cart (like Substruct, which we used) is that much of this can be automated. Product pages at liquidware.com use the first few lines of the product description as the description meta-tag content. They add their category and name to the keywords content, title tag and h1 header. For that matter, the name of every product is also in the URL.
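Here's a minimal sketch of that kind of automation in JavaScript. It assumes a product record with name, category and description fields. The field names, the /products/ URL scheme, the title and h1 formats, and the 155-character description limit are all assumptions for illustration. They are not how Substruct or liquidware.com actually do it.

```javascript
// Hypothetical sketch: derive a product page's SEO fields from its record.
// The fields (name, category, description) and the /products/ URL scheme
// are assumptions, not Substruct's actual schema or routes.
function productSeo(product, maxDescriptionLength) {
  var limit = maxDescriptionLength || 155; // arbitrary default length

  // Description meta-tag: the opening of the product description with
  // whitespace collapsed, cut at a word boundary if it runs long.
  var description = product.description.replace(/\s+/g, ' ').trim();
  if (description.length > limit) {
    description = description.slice(0, limit).replace(/\s+\S*$/, '') + '...';
  }

  // URL: the product name, lowercased, with runs of other characters as dashes.
  var slug = product.name.toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');

  // Category and name feed the title tag, h1 and keywords.
  return {
    title: product.name + ' - ' + product.category,
    h1: product.category + ': ' + product.name,
    keywords: product.category + ', ' + product.name,
    description: description,
    path: '/products/' + slug
  };
}

var seo = productSeo({
  name: 'Example Kit',
  category: 'Arduino',
  description: 'A starter kit   with a board\nand parts.'
});
console.log(seo.path); // prints "/products/example-kit"
```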

Test & Tune Your Keyword Content

Once you put up your site at a demo or production URL, run it through Google’s keyword tool. Liquidware gets these results:

Liquidware Keywords

This will give you an idea of how Google will see your site later on. You can use the keyword ideas to find alternate wordings that may index better. Even better, drop your competition’s site into the keyword tool and see if you should bother competing on their keywords or not.

Use Analytics & Webmaster Tools

Really use these tools. The Google webmaster tool will tell you when the big G indexes your site, who links to you, what their keywords are, and what your site’s keywords are. It’ll even tell you where you fall in search query results, and what position your site was in when it was clicked. It’s got some other great features too.

Google Analytics is a no-brainer. It has great info on your site and can show you how people arrive at a given page. Even better, making a purchase can be identified as a goal in Analytics, which shows you how many users start the checkout process, how many complete it, and where they came from. This guy has a nice video that sums up goals quite well: Google Analytics: Working With Goals.

Use Google Base

This is the step that really helps push your products further out front. Google Base is a way to inject items, among them individual products, into Google’s search index. When you search for “arduino” on Google, this sneaks right into the results list:

Arduino Product Search Results

Liquidware’s products should be there! Google Base lets us add them:

Google Product Search For Hack Pack

There are a couple of ways to do this, and they’re all quite confusingly documented. They break down to this:

  • Upload a TSV or XML file
  • Have Google fetch that file from a web server on a regular schedule
  • Use an API and your own cart software

We’ve combined the first two. We added a Liquidware products RSS feed for Google to fetch nightly, and uploaded it the first time to seed the process. This is a great solution, because the Liquidware guys can update products and prices on their store and see them change on Google nightly. Liquidware’s products have variations, which Google Base does not support, so each variation is spun off as another product.
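The variation workaround is easy to sketch. This JavaScript example flattens each product's variations into separate feed items and renders a minimal RSS 2.0 feed. The product shape is invented for illustration. Treat the xmlns:g namespace and the g:id and g:price element names as assumptions, and check them against Google's current feed specification before relying on them.

```javascript
// Hypothetical sketch: one feed item per product variation, rendered as a
// minimal RSS 2.0 feed. Product shape and g: element names are assumptions.
function escapeXml(value) {
  return String(value)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

// A product without variations becomes a single item. Each variation of a
// product becomes its own item with a combined title.
function flattenVariations(products) {
  var items = [];
  products.forEach(function (product) {
    var variations = (product.variations && product.variations.length)
      ? product.variations
      : [{ name: '', code: product.code, price: product.price }];
    variations.forEach(function (v) {
      items.push({
        id: v.code,
        title: v.name ? product.name + ' - ' + v.name : product.name,
        link: product.url,
        price: v.price
      });
    });
  });
  return items;
}

function renderFeed(channel, items) {
  var lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<rss version="2.0" xmlns:g="http://base.google.com/ns/1.0">',
    '<channel>',
    '<title>' + escapeXml(channel.title) + '</title>',
    '<link>' + escapeXml(channel.link) + '</link>',
    '<description>' + escapeXml(channel.description) + '</description>'
  ];
  items.forEach(function (item) {
    lines.push(
      '<item>',
      '<title>' + escapeXml(item.title) + '</title>',
      '<link>' + escapeXml(item.link) + '</link>',
      '<g:id>' + escapeXml(item.id) + '</g:id>',
      '<g:price>' + item.price.toFixed(2) + '</g:price>',
      '</item>'
    );
  });
  lines.push('</channel>', '</rss>');
  return lines.join('\n');
}

var items = flattenVariations([
  { name: 'Example Kit', url: 'http://example.com/products/example-kit',
    variations: [{ name: 'Red', code: 'EK-R', price: 10 },
                 { name: 'Blue', code: 'EK-B', price: 12.5 }] },
  { name: 'Example Cable', url: 'http://example.com/products/example-cable',
    code: 'EC', price: 3 }
]);
var feed = renderFeed({ title: 'Example Store', link: 'http://example.com/',
  description: 'Nightly product feed' }, items);
console.log(feed);
```

Serving the output of renderFeed at a fixed URL is what would let Google fetch it on a schedule, as described above.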

Stay Active

Incoming links have the largest effect on ranking. Pushing updates to YouTube will get attention and quick bursts of links, but the SEO and Google Base techniques above will keep the everyday traffic coming.

Oh, I’ve been talking about releasing an updated version of the MooTools table sort script I posted, and that’s still coming. Moving from Nashville to New York was slightly distracting, but there’s a full set of conversion_functions and other features to introduce.

Author: "mixonic"
Date: Friday, 22 Feb 2008 15:49

So just two days ago Obama swept Hawaii for his 10th Democratic primary win in a row. Barack in particular has been beaten up for lacking real content in his message, though I personally think his actions have spoken loudly (he taught constitutional law, supported net neutrality, and helped push the ethics reform bill). Using tag clouds to visualize messages has been done before, with pretty interesting results.

That example was nice for a snapshot of each candidate, but I’m looking to dig into the data for a single candidate a little more. I want to compare tag clouds for Obama to see how his message has changed over time.

Word Frequency Analysis

First we need some speeches. I’ve chosen these:

  1. DNC Speech in 2004
  2. Winning Iowa
  3. Losing New Hampshire
  4. Winning Super Tuesday
  5. Winning Wisconsin
  6. Austin Debate

With the transcripts cleaned of APPLAUSE and MR. BARACK prompts, we can get to ripping out some word frequencies. There are surely easier ways to do this, but I’ve written a bash script to handle it.

#!/bin/bash
#
# parse.sh prints out a json/javascript style word frequency list
#
echo 'frequencies: {'
EXCLUDES=`cat exclude.txt |sed -e s/\\\\\\(.*\\\\\\)/\\\\\\|\\\1/ |tr -d '[:space:]'`
tr ' ' '\n' < "$1" |\
sed -e 's/[^a-zA-Z0-9]//g'|\
tr '[:upper:]' '[:lower:]'|\
sort |\
grep -Eiv "^(\ $EXCLUDES)$" |\
uniq -c |\
grep -iv ^\\\ *[0-9]*\\\ *$ |\
grep -iv ^\\\ *[12345]\\\ .*$ |\
sed -e 's/\( *\)\([0-9]*\)\ \([^ ]*\)/  "\3": \2,/'
echo '}'

You can download it as parse_1.sh. Note: the keys are wrapped in double quotes to keep IE and Safari happy. This code assumes a file called exclude.txt that contains common words. I’m using the 100 most common English words, and you can get that list as exclude.txt. The parse script also drops words with a frequency below 6. Output is formatted to drop right into a JavaScript file.

frequencies: {
  "country": 5,
  "hope": 11,
  "just": 6,
  "led": 8,
  "me": 9,
  "moment": 8,
  "never": 10,
  "new": 6,
  "our": 9,
  "us": 7,
}
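If bash isn't your thing, the same pipeline fits in a few lines of JavaScript. This is a hypothetical port for illustration, not parse.sh itself. wordFrequencies, its arguments and the sample text are made up here. One difference worth noting is that JSON.stringify avoids the trailing comma that the sed step leaves after the last entry.

```javascript
// Hypothetical JavaScript port of the parse.sh pipeline. Like the shell
// version, it splits on whitespace, strips non-alphanumerics, lowercases,
// drops excluded words, counts, and keeps only words seen at least
// minCount times. parse.sh drops counts below 6, so use minCount = 6 to match.
function wordFrequencies(text, excludes, minCount) {
  // Null-prototype objects, so words like "constructor" count correctly.
  var excluded = Object.create(null);
  excludes.forEach(function (word) { excluded[word.toLowerCase()] = true; });

  var counts = Object.create(null);
  text.split(/\s+/).forEach(function (raw) {
    var word = raw.replace(/[^a-zA-Z0-9]/g, '').toLowerCase();
    if (word === '' || excluded[word]) { return; }
    counts[word] = (counts[word] || 0) + 1;
  });

  // Keys in sorted order, like `sort | uniq -c`.
  var result = {};
  Object.keys(counts).sort().forEach(function (word) {
    if (counts[word] >= minCount) { result[word] = counts[word]; }
  });
  return result;
}

var freq = wordFrequencies('Hope. HOPE! hope, the hope; change', ['the'], 2);
console.log(JSON.stringify(freq)); // prints {"hope":4}
```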

There we go, some word frequencies. Now to draw a basic tag cloud.

Tag Cloud Markup

Some basic cloud markup:

<style type="text/css">
ol.cloud { width: 300px; }
ol.cloud li { display:inline; padding: 2px 5px; }
ol.cloud li.hidden { padding: 2px 4px; }
</style>

<ol id="example_1_cloud" class="cloud">
<li style="font-size:14px;">A tag</li>
<li style="font-size:22px;">A big important tag</li>
</ol>

The important part of the CSS is the inline display of the list elements. Inline elements flow like words in a sentence, so the tags wrap onto new lines. The newlines after each list element tag give the browser whitespace to break on.

Drawing A Cloud In Javascript

This code uses frequency lists like the one generated earlier and styles them like the markup used above. It accepts a request like this:

var cloud = new TagCloud( 'example_cloud', {
  clouds: {
    convention: {
      frequencies: {
        blue: 4,
        country: 9,
        even: 5,
        expect: 3
      }
    },
    iowa: {
      frequencies: {
        blue: 4,
        country: 2,
        even: 5,
        let: 1
      }
    }
  }
});
// Now draw one
cloud.draw('convention');

It lines right up with the frequency lists we generated on the command line. It can also accept some arguments for customized clouds:

var cloud = new TagCloud( 'example_cloud', {
  tag_class: "tag", // By default a class of "tag" is set on all tags, change it here
  hidden_class: "hidden", // By default a class of "hidden" is applied to hidden tags
  tag_sizes: [ '8px', '16px', '30px' ], // Set as many size increments as you like
  clouds: {
    // The clouds
  }
});

So you can set three size increments or 15, whatever amount of detail you want.

TagCloud looks like this:

//
// TagCloud requires mootools v1.11 with these modules:
//
// Class.Extras, Array, Number, Element.Event, Element.Selectors,
// Window.DomReady, Fx.Style, Fx.Styles, Fx.Elements, Fx.Transitions,
// Hash
//
var TagCloud = new Class({

  initialize: function( cloud, options ) {
    this.options = $merge({
      clouds: {},
      tag_class: 'tag',
      hidden_class: 'hidden',
      tag_sizes: [ '8px', '12px', '18px', '20px', '22px', '24px', '26px', '28px' ]
    }, options);
    this.cloud = $(cloud);
    this.depth = this.options.tag_sizes.length;
    this.tags = $A();
    $each(this.options.clouds, function(v, k){
      this.reset_bounds();
      $each(v.frequencies, function(v2, k2){
        this.expand_bounds(v2);
      }.bind( this )); 
      $each(v.frequencies, function(v2, k2){
        this.update_tag( k, k2, v2 );
      }.bind( this )); 
    }.bind( this )); 
    this.sort_tags();
  },

  update_tag: function(cloud, tag_content, frequency){
    var found = this.tags.some(function(tag, i) {
      if (tag.content == tag_content) {
        tag.cloud_weights.set(cloud, this.get_weight(frequency));
        return true;
      }
      return false;
    }.bind( this ));
    if (!found){
      var cloud_weights = new Hash();
      cloud_weights.set(cloud, this.get_weight(frequency));
      var tag = { content: tag_content, cloud_weights: cloud_weights };
      tag.toString = function(){ return this.content }
      this.tags.push(tag);
    }
  },

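  get_weight: function( frequency ){
    // If every tag in this cloud has the same frequency, upper == lower and
    // the division below would be 0/0 (NaN). Use the largest size instead.
    if (this.upper == this.lower) return this.depth - 1;
    // Scale the frequency into [0, 1] between this cloud's bounds, then
    // pick one of this.depth size classes.
    var class_i = Math.floor(
      ((frequency - this.lower) / (this.upper - this.lower)) * this.depth
    );
    // The highest frequency lands exactly on this.depth, so clamp it
    // into the last size class.
    if (class_i == this.depth) class_i = class_i - 1;
    return class_i;
  },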

  reset_bounds: function(){
    this.lower = 99999999999;
    this.upper = 0;
  },

  expand_bounds: function(v){
    if (v > this.upper) this.upper = v;
    if (v < this.lower) this.lower = v;
  },

  sort_tags: function( cloud_name ){
    this.tags.sort();
  },

  draw: function( cloud_name ) {
    $each(this.tags, function( tag, i ){
      if (!tag.element) {
        tag.element = new Element( 'li', {
          'rel': 'tag',
          'class': this.options.tag_class
        });
        tag.element.setHTML(tag.content).injectInside( this.cloud );
        tag.fx = new Fx.Styles( tag.element );
        this.cloud.appendText("\n");  
      }
        
      if ( ''+tag.cloud_weights.get(cloud_name) != 'NaN' &&
           ''+tag.cloud_weights.get(cloud_name) != 'null') {
        if (this.options.hidden_class)
          tag.element.removeClass(this.options.hidden_class);
        tag.fx.start({
          'opacity': 1,
          'font-size': this.options.tag_sizes[tag.cloud_weights.get(cloud_name)]
        });
        return;
      }

      if ( tag.element.getStyle('opacity') != 0 ) {
        tag.fx.start({
          'opacity': 0,
          'font-size': 0
        });
        if (this.options.hidden_class)
          tag.element.addClass(this.options.hidden_class);
        return;
      }
    }.bind( this ));
  }

});
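The interesting bit of TagCloud is how get_weight turns raw counts into size classes. Here's a standalone sketch of that bucketing in plain JavaScript, without MooTools. sizeTags is a hypothetical name, and the sample frequencies are borrowed from the example output earlier. The sketch also handles a cloud where every count is equal, which would otherwise divide by zero.

```javascript
// Standalone sketch of TagCloud-style sizing. Each frequency is scaled
// linearly between the cloud's lowest and highest count, then mapped to
// one of tagSizes.length size buckets.
function sizeTags(frequencies, tagSizes) {
  var words = Object.keys(frequencies);
  var counts = words.map(function (w) { return frequencies[w]; });
  var lower = Math.min.apply(null, counts);
  var upper = Math.max.apply(null, counts);
  var depth = tagSizes.length;
  var sizes = {};
  words.forEach(function (word) {
    var i = (upper === lower)
      ? depth - 1
      : Math.floor(((frequencies[word] - lower) / (upper - lower)) * depth);
    if (i === depth) { i = depth - 1; } // the top count lands one past the end
    sizes[word] = tagSizes[i];
  });
  return sizes;
}

var sizes = sizeTags({ hope: 11, never: 10, me: 9, led: 8, just: 6 },
                     ['8px', '16px', '30px']);
console.log(JSON.stringify(sizes));
// prints {"hope":"30px","never":"30px","me":"16px","led":"16px","just":"8px"}
```

Because the bounds are per cloud, the same word can land in different buckets in different speeches. That is what makes the morphing between clouds meaningful.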

Give it a try!

As always, you can download morphing_cloud.js or play with an HTML example of Democratic campaign speeches. Take a look at how the use of “hope” has changed over time, and how more details have emerged in recent speeches. I’m looking forward to expanding this to look at some other candidates.

Author: "mixonic" Tags: "morph javascript tag cloud mootools"
Date: Monday, 04 Jun 2007 16:53

Oh Ferret, how lovely your speed, how confusing your documentation. Maybe we’ll go over that in another post, but for now, let’s see how we can make Ferret a bit kinder to normal users by better understanding their queries.

Over at FindYourDoc we’re searching not only a large number of records, but a large variety of fields. When we get a query such as:

  Doctor in Nashville TN

You and I know the user has given us a lot to go on, but Ferret doesn’t. When we humans look at that query, we pull out a human understanding:

  • Doctor - a type of care provider
  • in - a throwaway word
  • Nashville - a city
  • TN - a state

So if I were a programmer (imagine that), I would form the same query using Ferret’s query syntax:

  type:Doctor city:Nashville state:TN

Well great, but our visitors are not programmers, they’re grandmothers and dog trainers, patients and college students. Let’s look at that query again, and see how we could break it down:

  • Doctor & TN - These are phrases that look for a match in small, specific sets.
  • Nashville - matches data in a very large set.
  • in - throwaway word.

So let’s see how the raw query might be moved closer to my programmer’s query.

Regex to the Rescue

Wow, that was a whole intro paragraph with no code! Let’s take a basic transformation that we could do to move toward our programmer’s query, that of changing TN to state:TN. Code time!

%w( AL AK AZ AR CA CO CT DE FL GA HI ID
    IL IA KS KY LA ME MD MA MI MN
    MS MO MT NE NV NH NJ NM NY
    NC ND OH OK OR PA RI SC SD TN
    TX UT VT VA WA WV WI WY ).each do |state|
  query.gsub!( /(?:\A|\s)(#{state})(?=\s|\z)/i, " state:#{state}" )
end

And what a block of code it is. Naturally, we wouldn’t normally have a big block of states there. It would live in self.states or something similar. So what’s going on?

  query.gsub!( /(?:\A|\s)(#{state})(?=\s|\z)/i, " state:#{state}" )

For each state we run this gsub line. The regex in it has three parts:

  (?:\A|\s)

This section is a “grouping”. We know it’s a grouping because it’s in (). The magic of this particular grouping is the ?:, which makes it a non-capturing group. The regex engine still requires it to match, but doesn’t save what it matched for reference later on. Keep in mind that gsub! still replaces everything the whole pattern matched, including this leading space. That’s why our replacement string starts with a space of its own.

Inside our non-referenced group we have a short snippet “\A|\s”. Well, that’s simple:

  • \A - The beginning of the string
  • \s - A space or other white-space

The pipe symbol, |, is an “or” in regex. So we have a non-referenced group that matches either the beginning of the string, or a space. The next segment is easy:

  (#{state})

Super easy. We’re matching a state. Note that the state is in plain (), which makes it a capturing group: whatever it matches is saved and could be referred to later as \1. Our last section is quite similar to the first:

  (?=\s|\z)

Look at how we’ve used ?= inside the (). Adding ?= first thing in our parentheses turns it into a look-ahead assertion. We’re looking forward in the string to see if we can find something, but we’re neither storing it nor consuming it, so gsub! leaves the trailing space in place. The \s|\z is looking for:

  • \s - a space or other white-space
  • \z - the end of the string

So remembering |, we want to find a space or the end of the string after our match. Take a peek at the whole thing again:

%w( AL AK AZ AR CA CO CT DE FL GA HI ID
    IL IA KS KY LA ME MD MA MI MN
    MS MO MT NE NV NH NJ NM NY
    NC ND OH OK OR PA RI SC SD TN
    TX UT VT VA WA WV WI WY ).each do |state|
  query.gsub!( /(?:\A|\s)(#{state})(?=\s|\z)/i, " state:#{state}" )
end

Notice the regex also uses an i at the end. That will make our regex case insensitive so we can match TN and tn. Also notice that we don’t look for IN. Well, sorry Indiana, but we don’t want queries like:

  Cardiology in New York

to become:

  Cardiology state:IN New York

It just wouldn’t work.
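To watch the three parts of the regex work together, here's a JavaScript port of the state loop, for illustration only. JavaScript regexes don't support \A or \z, so this sketch substitutes ^ and $, which mean the start and end of the whole string when the m flag is off. tagStates and STATES are hypothetical names. Like the Ruby version, the list skips IN.

```javascript
// Hypothetical JavaScript port of the Ruby state-tagging loop.
// IN is deliberately left out so "in" is never tagged as Indiana.
var STATES = ('AL AK AZ AR CA CO CT DE FL GA HI ID IL IA KS KY LA ME MD MA ' +
              'MI MN MS MO MT NE NV NH NJ NM NY NC ND OH OK OR PA RI SC SD ' +
              'TN TX UT VT VA WA WV WI WY').split(' ');

function tagStates(query) {
  STATES.forEach(function (state) {
    // (?:^|\s)  start of string or whitespace: matched and consumed, not captured
    // (XX)      the state code itself, captured as $1
    // (?=\s|$)  look-ahead for whitespace or end of string: checked, not consumed
    var re = new RegExp('(?:^|\\s)(' + state + ')(?=\\s|$)', 'gi');
    query = query.replace(re, ' state:' + state);
  });
  // The replacement always starts with a space, so a state at the very
  // start of the query would leave a leading space. Trim it off.
  return query.trim();
}

console.log(tagStates('Doctor in Nashville tn')); // prints "Doctor in Nashville state:TN"
console.log(tagStates('Cardiology in New York')); // prints "Cardiology in New York"
```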

Wash, Rinse, Repeat

Well, neato, what other kinds of data could we apply this same technique to? Two types that I can come up with:

  • Discrete values in a set (matching a state)
  • Structured values (like a zipcode)

Take a peek at an example of the latter:

  query.gsub!( /(?:\A|\s)([0-9]{5})(?=\s|\z)/i, ' zipcode:\1' )

We’re looking for 5 digits, then tacking zipcode: onto the front of them. We’ve taken our human understanding of a structure and explained it to Ferret.

Ferret has a concept of weighting certain fields, and that can help tweak your results to better match your queries, but tricks like this can help a lot. Searching for:

  Doctor in Nashville TN

Without FindYourDoc’s query tweaking, my top result is scored at ~0.46. With it turned on, the top result is ~7.97. That’s a sign Ferret is doing much better at understanding what I was asking for. We can’t manage to trap city names, since there are just too many, but we can trap the provider type and state to get this:

  type:Doctor in Nashville state:TN

Other fields besides provider type and state are captured by our query tweaker as well. Those tweaks give every visitor a personal programmer to help rephrase what they say, and that makes our results far better for grandmothers and dog trainers. Try it out on your own Ferret site. It doesn’t disappoint.
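The “Wash, Rinse, Repeat” idea generalizes into a small table of rules, one per field. This is a hypothetical JavaScript sketch, not FindYourDoc's actual tweaker. The provider types and the short state list are placeholders, and tweakQuery, capitalize and upcase are made-up names. Each rule reuses the same (?:^|\s)(...)(?=\s|$) shape as the state regex, filled with either a set of discrete values or a pattern for structured values like zipcodes.

```javascript
// Hypothetical rule-driven query tweaker. Each rule names a Ferret field
// and either a list of discrete values or a pattern for structured values.
function capitalize(s) { return s.charAt(0).toUpperCase() + s.slice(1).toLowerCase(); }
function upcase(s) { return s.toUpperCase(); }

var RULES = [
  { field: 'type', values: ['Doctor', 'Dentist', 'Chiropractor'], normalize: capitalize },
  { field: 'state', values: ['TN', 'KY', 'GA'], normalize: upcase },
  { field: 'zipcode', pattern: '[0-9]{5}' }
];

function tweakQuery(query, rules) {
  rules.forEach(function (rule) {
    var body = rule.values ? rule.values.join('|') : rule.pattern;
    var re = new RegExp('(?:^|\\s)(' + body + ')(?=\\s|$)', 'gi');
    query = query.replace(re, function (match, value) {
      var normalized = rule.normalize ? rule.normalize(value) : value;
      return ' ' + rule.field + ':' + normalized;
    });
  });
  // Tagging a word at the very start leaves a leading space, so trim it.
  return query.trim();
}

console.log(tweakQuery('Doctor in Nashville TN', RULES));
// prints "type:Doctor in Nashville state:TN"
console.log(tweakQuery('dentist 37203', RULES));
// prints "type:Dentist zipcode:37203"
```

Keeping the rules as data means adding a field is a one-line change, and each rule can be tested on its own.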

Author: "mixonic" Tags: "code, Ferret, FindYourDoc, parse, Rails,..."
Date: Friday, 01 Jun 2007 13:58

Geocoding is like a spicy pepper: it provides an impressive kick, but should normally be sprinkled in moderation. Now that Rails has GeoKit, it’s even super-easy to do.

Once you fight your way through the API-only documentation.

But lucky you, we’re going to walk through a basic GeoKit set up modeled after what we used over at FindYourDoc (we launch next week-ish):

  1. Install GeoKit
  2. Find out where a visitor is from and stuff it in a cookie
  3. Test our geocoding (‘cause duh, you’re testing, right?)
  4. Use javascript to put it somewhere

At the end of this you’ll be customizing pages for visitors based on their location, but without killing off your action_caches. Won’t that be a ball?

Installing GeoKit

This one’s easy. Like any Rails plugin, if you run:


ruby script/plugin install -x svn://rubyforge.org/var/svn/geokit/trunk

You’ll install it as an SVN external (and then you can use Piston to manage it!), or you can run:


ruby script/plugin install svn://rubyforge.org/var/svn/geokit/trunk

and just pull it down locally. Next we’ll need some configuration lovin’. You’ll want to get a Google Maps API key for http://localhost:3000/ to use while you develop, and chances are you’ll want another key for production. GeoKit’s installer appends a bunch of lines to your config/environment.rb. Find the one that looks like this:


GeoKit::Geocoders::GOOGLE='REPLACE_WITH_YOUR_GOOGLE_KEY'

and drop the key Google gave you in there. If you got a key for your production install, you could always:

if RAILS_ENV == 'production'
  GeoKit::Geocoders::GOOGLE='MyProductionKey'
else
  # This will work for most people doing normal dev on localhost
  GeoKit::Geocoders::GOOGLE='MyDevKey'
end

You’ll also want to tell GeoKit to only use Google:


GeoKit::Geocoders::PROVIDER_ORDER=[:google]

So there you are, now to do some geocoding.

Geocode a visitor’s ip, stash it in a cookie

“A cookie?!” I hear you cry, “Why lord why?!” GeoKit comes with a nifty little helper that will stuff a geocoding object into your session automatically, but FindYourDoc needs a bit more. We wanted to know we could scale safely, so we’ve made aggressive use of caching, specifically with the action_cache plugin. Using sessions means we render our geocoded values in our view, so we can’t cache that page.

Ahh, but if we put them in a cookie, we can use a before_filter for that, and still cache our actual action. Then we can grab the cookies in Javascript and render them on our page load. Cache is good to go, and we have different geocoded values appear for each visitor.

Our application controller ends up looking something like this:

class ApplicationController < ActionController::Base
  before_filter :put_geocoding_into_cookies

private
  def put_geocoding_into_cookies
    if set_geokit_cookies?
      set_geokit_cookies( request.remote_ip )
    end
  end

  # True only if we haven't tried to geocode this visitor yet. Any value in
  # the 'geocoded' cookie ('YES' or 'NO') means we already have.
  def set_geokit_cookies?
    cookies['geocoded'].nil?
  end

  # Geocode the IP. On a US result, store zipcode/state/city in cookies.
  def set_geokit_cookies( ip )
    @location = GeoKit::Geocoders::IpGeocoder.geocode( ip )
    if @location.success &&
       @location.respond_to?( :country_code ) &&
       @location.country_code == 'US'
      cookies['zipcode']  = @location.zipcode if @location.respond_to? :zipcode
      cookies['state']    = @location.state   if @location.respond_to? :state
      cookies['city']     = @location.city    if @location.respond_to? :city
      cookies['geocoded'] = 'YES'
    else
      cookies['geocoded'] = 'NO'
    end
  end

end
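Testing private controller methods directly is awkward. One option is to move the decisions out of the controller into plain functions that take inputs and return values, and leave the controller to read and write cookies. Here's a minimal sketch of that shape, written in JavaScript rather than Ruby so it runs standalone. shouldGeocode and geocodingCookies are hypothetical names. The location object only mimics the GeoKit fields the controller uses (success, country_code, zipcode, state, city).

```javascript
// Hypothetical sketch: the controller's decisions as pure functions.

// Geocode only when no 'geocoded' cookie exists. 'YES' or 'NO' means this
// visitor has already been tried.
function shouldGeocode(cookies) {
  return !Object.prototype.hasOwnProperty.call(cookies, 'geocoded');
}

// Returns the cookies to set for a geocoding result. Fields missing from
// the result are skipped, like the respond_to? checks in the Ruby code.
function geocodingCookies(location) {
  if (!location || !location.success || location.country_code !== 'US') {
    return { geocoded: 'NO' };
  }
  var cookies = {};
  ['zipcode', 'state', 'city'].forEach(function (key) {
    if (location[key] !== undefined) { cookies[key] = location[key]; }
  });
  cookies.geocoded = 'YES';
  return cookies;
}

console.log(JSON.stringify(geocodingCookies(
  { success: true, country_code: 'US', city: 'Sunnyvale', state: 'CA' })));
// prints {"state":"CA","city":"Sunnyvale","geocoded":"YES"}
```

With the logic pulled out like this, the controller test only needs to check the wiring, and the branches can be tested without faking a request.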

Note how the controller logic is split into private methods we could write tests against. If only testing private instance methods on the application controller were that easy (if you have ideas on this, let me know). To run over the highlights of that code:


before_filter :put_geocoding_into_cookies

This tells Rails to run our put_geocoding_into_cookies method before every action. set_geokit_cookies? checks whether we have tried to geocode this visitor before. A ‘geocoded’ cookie of ‘YES’ means we succeeded, and ‘NO’ means we tried and failed already. Either way, we don’t try again.


@location = GeoKit::Geocoders::IpGeocoder.geocode( ip )

That’s our magic geocoding line. Note that the ip was passed in from request.remote_ip in put_geocoding_into_cookies. The if statement tests for a valid result, and a result in the US. If we pass the statement, we set our cookies for state, zipcode, and city. Note that we use strings for our keys:


cookies['state']    = @location.state if @location.respond_to? :state

This is a quirk in Rails, since you’d expect a HashWithIndifferentAccess there. The respond_to? test is a workaround for GeoKit’s decision not to create attributes it has no value for.

And that’s that. Your visitors’ IP addresses should be geocoded and the results tossed into cookies. Try using Firebug and you’ll see them in your response headers. Of course, you’re probably geocoding 127.0.0.1, which won’t really give you anything useful. So how do we know any of this is working?

Testing IP Geocoding

We test it. There are two little hurdles to doing that, the first is pretty handily taken care of by the assert_cookie plugin. Install that, and you’ll be able to:


assert_cookie :geocoded, :value => 'NO'

and we’ll need that. With assert_cookie we can test what cookies get set for what IPs, but we still have another hurdle:


set_geokit_cookies( request.remote_ip )

We need to fake an IP. Enter the AnywhereController:

class AnywhereController < ActionController::TestRequest
  def remote_ip
    @my_remote_ip || @remote_ip
  end

  def go_to_sunnyvale 
    @my_remote_ip = '64.233.187.99'
  end
    
  def go_to_nowhere
    @my_remote_ip = '0.0.0.0'
  end
  
end

With ActionController::TestRequest subclassed, we can override the remote_ip method to return any IP we want, and therefore test some IPs that will actually geocode. A full geocoding test might look like:

require File.dirname(__FILE__) + '/../test_helper'
require 'welcome_controller'

# Re-raise errors caught by the controller.
class WelcomeController; def rescue_action(e) raise e end; end

class AnywhereController < ActionController::TestRequest
  def remote_ip
    @my_remote_ip || @remote_ip
  end

  def go_to_sunnyvale
    # Google geocoding google? should work, right? ;-)
    @my_remote_ip = '64.233.187.99'
  end

  def go_to_nowhere
    @my_remote_ip = '0.0.0.0'
  end

end

class GeocodingTest < Test::Unit::TestCase

  def setup
    # Here we use our new AnywhereController
    @controller = WelcomeController.new
    @request    = AnywhereController.new
    @response   = ActionController::TestResponse.new
  end

  def test_should_set_cookies_for_sunnyvale
    # Try out google's IP
    @request.go_to_sunnyvale
    get :index
    assert_cookie :city, :value => 'Sunnyvale'
    assert_cookie :state, :value => 'CA'
    assert_cookie :geocoded, :value => 'YES'
  end

  def test_should_not_set_cookies
    # And nowhere-land
    @request.go_to_nowhere
    get :index
    assert_cookie :geocoded, :value => 'NO'
  end

  def test_should_not_keep_trying
    # Changing the IP halfway through should have no effect, since
    # we don't geocode if we've already tried, right?
    @request.go_to_nowhere
    assert_equal '0.0.0.0', @request.remote_ip
    get :index  
    assert_cookie :geocoded, :value => 'NO'
    @request.go_to_sunnyvale
    assert_equal '64.233.187.99', @request.remote_ip
    @request.cookies["geocoded"] = CGI::Cookie.new("geocoded", "NO")
    get :index
    assert_cookie :geocoded, :value => 'NO'
  end

end

Whammo, as someone once said: “This code may not run, I have not tried it, I have only tested it”. Our production server should geocode perfectly well if this test suite passes.

Finally, To Javascript

All right, we’ve run the gauntlet. We have GeoKit installed, we’ve stuffed some data into our cookies, and we’ve tested the whole ordeal. We did all that without disturbing the super-fast action-cached page that runs after the filter. But our visitors still can’t see a thing.

Well, Javascript has access to cookies, so let’s use that to push the info into a form field. It’ll be cool, visitors will all get the same rendered page, but the headers will send different cookies. The Javascript, which will itself be cached, can alter the rendered and drawn page so each visitor views something custom.

Accessing cookies is a bit of a pain, but there are a few options, including a Prototype module. We’re going to stick with a few lines lifted from quirksmode.org’s write-up on cookies. I’ll skip the particulars; here’s the function we’ll use:

function readCookie(name) {
  var nameEQ = name + "=";
  var ca = document.cookie.split(';');
  for (var i = 0; i < ca.length; i++) {
    var c = ca[i];
    while (c.charAt(0) == ' ') c = c.substring(1, c.length);
    if (c.indexOf(nameEQ) == 0) return c.substring(nameEQ.length, c.length);
  }
  return null;
}
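One gotcha worth guarding against: depending on your Rails version, cookie values may arrive URL-escaped (a city of Los Angeles stored as Los+Angeles, say), and readCookie hands back the raw string. Here is a minimal sketch of a decoding variant, assuming form-style escaping with + for spaces. The name readCookieFrom is hypothetical, and it takes the cookie string as a parameter so it can also run outside a browser.

```javascript
// Hypothetical variant of readCookie: takes the cookie string explicitly
// and undoes URL escaping. Assumes the server escaped values form-style,
// with '+' for spaces and %XX for everything else.
function readCookieFrom(cookieString, name) {
  var nameEQ = name + "=";
  var parts = cookieString.split(';');
  for (var i = 0; i < parts.length; i++) {
    var c = parts[i].replace(/^\s+/, '');   // drop spaces left by "; " separators
    if (c.indexOf(nameEQ) == 0) {
      var raw = c.substring(nameEQ.length);
      try {
        // '+' becomes a space first, then %XX escapes are decoded
        return decodeURIComponent(raw.replace(/\+/g, ' '));
      } catch (e) {
        return raw;                          // malformed escape: return it untouched
      }
    }
  }
  return null;                               // no cookie with that name
}

// In the browser: readCookieFrom(document.cookie, 'city');
```

Keeping the string as a parameter is just a convenience for testing; in the page you would always pass document.cookie.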

Toss that into your public/javascripts/application.js and make sure you’ve included it in your application’s layout (javascript_include_tag :defaults in Rails 2 pulls in both Prototype and application.js). Now we’ll use a few functions from Prototype, along with readCookie, to load our data into some fields:

<script type="text/javascript">
//<![CDATA[
Event.observe( window, 'load', function() {
  var zipcode_el = $('search[zipcode]');
  if ( zipcode_el && zipcode_el.value == '' ) {
    var zipcode = readCookie('zipcode');
    if (zipcode) { zipcode_el.value = zipcode }
  }

  var city_el = $('search[city]');
  if ( city_el && city_el.value == '' ) {
    var city = readCookie('city');
    if (city) { city_el.value = city }
  }

  var state_el = $('search[state]');
  if ( state_el && state_el.options[state_el.selectedIndex].value == '' ) {
    var state = readCookie('state');
    if (state) {
      for (var i=0;i < state_el.length;i++){
        if ( state_el.options[i].value == state ) {
          state_el.selectedIndex = i;
        }
      }
    }
  }
});
//]]>
</script>

We can even toss that into our form’s partial. As long as everything else attached to onload uses

Event.observe( window, 'load', function() { } );

instead of assigning window.onload directly, the onload handlers won’t overwrite each other.
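To make the difference concrete, here is a toy model using a plain object instead of a real window. It is not Prototype’s implementation: Event.observe registers listeners through the browser’s event API, which keeps every handler, while window.onload is a single property that the last assignment wins. The names fakeWindow and observeLoad are hypothetical.

```javascript
// Toy model (not Prototype's code) of property assignment vs. an observer list.
var fakeWindow = { onload: null };
var log = [];

// Style 1: direct assignment. The second assignment replaces the first.
fakeWindow.onload = function () { log.push('form prefill'); };
fakeWindow.onload = function () { log.push('analytics'); };
fakeWindow.onload();                                // only 'analytics' runs

// Style 2: a list of observers, roughly what registering listeners gives you.
var loadObservers = [];
function observeLoad(fn) { loadObservers.push(fn); }
observeLoad(function () { log.push('form prefill'); });
observeLoad(function () { log.push('analytics'); });
loadObservers.forEach(function (fn) { fn(); });     // both handlers run

// log is now ['analytics', 'form prefill', 'analytics']
```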

  var zipcode_el = $('search[zipcode]');
  if ( zipcode_el && zipcode_el.value == '' ) {
    var zipcode = readCookie('zipcode');
    if (zipcode) { zipcode_el.value = zipcode }
  }

This block first finds the field with the id search[zipcode] and skips it if the field is missing or already has a value (say, from your server’s render). Otherwise, if the cookie exists, it writes the cookie contents into the field.
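The zipcode and city blocks are the same pattern with different names, so they could share a helper. Here is a minimal sketch; prefillTextField is a hypothetical name, el is anything with a value property (a DOM input in the browser), and the cookie reader is passed in so the logic can be exercised without a page.

```javascript
// Hypothetical helper generalizing the zipcode/city blocks: fill a text
// field from a cookie only if the field exists and is still blank.
function prefillTextField(el, cookieName, readCookieFn) {
  if (!el || el.value != '') return false;  // missing, or the server already filled it
  var v = readCookieFn(cookieName);
  if (!v) return false;                     // no cookie: leave the field alone
  el.value = v;
  return true;
}

// In the browser, assuming the field ids used above:
//   prefillTextField($('search[zipcode]'), 'zipcode', readCookie);
//   prefillTextField($('search[city]'), 'city', readCookie);
```

Returning a boolean is only there to make the helper easy to test; the page itself can ignore it.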

  var state_el = $('search[state]');
  if ( state_el && state_el.options[state_el.selectedIndex].value == '' ) {
    var state = readCookie('state');
    if (state) {
      for (var i=0;i < state_el.length;i++){
        if ( state_el.options[i].value == state ) {
          state_el.selectedIndex = i;
        }
      }
    }
  }

This block handles a select box in a similar manner, setting the state only if it hasn’t already been set.
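The select box can get the same treatment. This sketch (prefillSelect is a hypothetical name) stops at the first matching option and also tolerates a selectedIndex of -1, which is what a real select reports when nothing is selected, where the original block would throw. selectEl only needs options (array-like, each with a value) and selectedIndex, both of which a real select element provides.

```javascript
// Hypothetical helper for the select-box case: pick the option whose value
// matches the cookie, but only if no non-blank option is chosen yet.
function prefillSelect(selectEl, cookieName, readCookieFn) {
  if (!selectEl) return false;
  var current = selectEl.options[selectEl.selectedIndex];  // undefined when index is -1
  if (current && current.value != '') return false;        // already chosen
  var wanted = readCookieFn(cookieName);
  if (!wanted) return false;
  for (var i = 0; i < selectEl.options.length; i++) {
    if (selectEl.options[i].value == wanted) {
      selectEl.selectedIndex = i;
      return true;                                         // stop at first match
    }
  }
  return false;                                            // no option matched the cookie
}

// In the browser: prefillSelect($('search[state]'), 'state', readCookie);
```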

Whew

Well, that’s a wrap. We’ve cached, we’ve cried, we’ve read cookies in Javascript and tested IPs our browsers didn’t even know existed. Best of all, you can now customize pages by location without fragging your caches, so scale with joy!

This is my first write-up on Rails at this blog; thanks for stopping by. My background is a pretty varied one, hence the name madhatted, but there’ll be a big focus on Rails and Javascript here, so stick around. I’ll be adding links and such over the next few weeks, so pardon my mess!

Author: "mixonic" Tags: "Caching, Cookies, Geocoding, GeoKit, Jav..."