Textile filtering with RedCloth

So, I am making a forum over at the Open Source Country.

I wanted to use RedCloth to let members write Textile-formatted posts, but I also wanted to restrict which tags they were allowed to use. How to do this? I thought it would be simple.

RedCloth has a +filter_html+ option, which I thought would do the trick, but that only filters the raw HTML that RedCloth allows users to enter. It doesn’t filter the HTML created by RedCloth itself when a user uses Textile tags.

So how to filter the textile tags?

First, assuming you are using Rails 2.3, create the following file:

config/initializers/redcloth_extension.rb

This file will contain the code that overrides RedCloth's default behaviour. Now paste the following code into the file:

module RedCloth::Formatters::HTML
  include RedCloth::Formatters::Base

  # RedCloth calls this once the Textile has been converted to HTML.
  def after_transform(text)
    text.chomp!
    # Second pass: strip any generated tag that is not in our whitelist.
    clean_html(text, ALLOWED_TAGS)
  end

  # Tag name => permitted attributes (nil means no attributes allowed).
  ALLOWED_TAGS = {
    'a' => ['href', 'title'],
    'br' => [],
    'i' => nil,
    'u' => nil,
    'b' => nil,
    'pre' => nil,
    'kbd' => nil,
    'code' => ['lang'],
    'cite' => nil,
    'strong' => nil,
    'em' => nil,
    'ins' => nil,
    'sup' => nil,
    'sub' => nil,
    'del' => nil,
    'table' => nil,
    'tr' => nil,
    'td' => ['colspan', 'rowspan'],
    'th' => nil,
    'ol' => ['start'],
    'ul' => nil,
    'li' => nil,
    'p' => nil,
    'h3' => nil,
    'h4' => nil,
    'h5' => nil,
    'h6' => nil,
    'blockquote' => ['cite'],
  }
end

ALLOWED_TAGS is a hash of the tags that you want to allow, each mapped to the attributes permitted on that tag. You can take RedCloth's BASIC_TAGS as a base and strip the tags you don’t want from the hash.
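As a rough sketch of the "start from a base and strip what you don’t want" idea, here it is in plain Ruby. DEMO_BASE_TAGS, UNWANTED and DEMO_TRIMMED_TAGS are made-up names for illustration, and the base hash is a tiny stand-in rather than RedCloth's real BASIC_TAGS.

```ruby
# Hypothetical stand-in for a base whitelist such as BASIC_TAGS.
# Tag name => permitted attributes (nil means no attributes allowed).
DEMO_BASE_TAGS = {
  'a'   => ['href', 'title'],
  'img' => ['src', 'alt'],
  'h1'  => nil,
  'h2'  => nil,
  'h3'  => nil,
  'p'   => nil
}

# Tags we do not want forum members to produce (an assumption for this demo).
UNWANTED = %w[img h1 h2]

# Build a new hash without the unwanted tags; the base hash is left untouched.
DEMO_TRIMMED_TAGS = DEMO_BASE_TAGS.reject { |tag, _attrs| UNWANTED.include?(tag) }

puts DEMO_TRIMMED_TAGS.keys.sort.inspect  # prints ["a", "h3", "p"]
```

Building a new hash with reject, rather than deleting keys from the base, means the original whitelist stays available if something else still relies on it.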

So we have defined the tags that we want to allow. Now we need to actually do some stripping (ooo-er). This is where the after_transform method comes in. RedCloth calls it as standard once it has converted the Textile to HTML. So what we can do is override the method and tell RedCloth to run clean_html again, this time on the HTML string it has just created. Step by step:

  • RedCloth is configured with +filter_html+ enabled
  • User enters string (Textile and HTML)
  • RedCloth strips HTML tags from the string according to BASIC_TAGS using the clean_html method
  • RedCloth converts the textile tags in the string to HTML

At this point the HTML’ised string is usually returned; however, we do some overriding so that…

  • RedCloth strips HTML tags from the above generated HTML string according to our ALLOWED_TAGS using the clean_html method
  • RedCloth returns the twice filtered HTML string.
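To make that second pass concrete, here is a simplified whitelist filter in plain Ruby. It is only a stand-in to show the idea and is not RedCloth's clean_html: demo_filter and DEMO_ALLOWED are hypothetical names, the regex handles only simple double-quoted attributes, and attribute values (a javascript: href, say) are not checked, so don’t treat it as production-safe.

```ruby
# Hypothetical whitelist: tag name => permitted attributes (nil means none).
DEMO_ALLOWED = {
  'a'      => ['href', 'title'],
  'strong' => nil,
  'em'     => nil,
  'p'      => nil
}

# Removes every tag that is not whitelisted (its inner text is kept) and
# drops any attribute that is not permitted on the tags that remain.
def demo_filter(html, allowed)
  html.gsub(%r{<(/?)(\w+)([^>]*)>}) do
    closing, tag, attrs = $1, $2.downcase, $3
    next '' unless allowed.key?(tag)            # tag not allowed: remove it
    next "</#{tag}>" unless closing.empty?      # allowed closing tag
    permitted = allowed[tag] || []
    kept = attrs.scan(/([\w-]+)="([^"]*)"/).select do |name, _value|
      permitted.include?(name.downcase)
    end
    attr_text = kept.map { |name, value| %( #{name}="#{value}") }.join
    "<#{tag}#{attr_text}>"
  end
end

# The kind of markup the Textile pass can generate: a heading from "h1. Shouting".
generated = '<h1>Shouting</h1><p>Hi <a href="/x" onclick="evil()">link</a></p>'
puts demo_filter(generated, DEMO_ALLOWED)
# prints: Shouting<p>Hi <a href="/x">link</a></p>
```

The h1 is gone because it is not in the whitelist, even though it came from valid Textile rather than from user-typed HTML, which is exactly the case +filter_html+ on its own does not cover.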

Thinking about it, you don’t even need +filter_html+, since everything gets filtered the second time around, but what the hell. I feel a little more secure stripping all the user-generated HTML cruft before stripping our Textile-generated HTML.

Enjoy


Comments

Dave Hollingworth
said on 09/11/10 (Tuesday) at 17:12 UTC:

Hi Jeffrey,

this is just what I was looking for – thanks for posting!

Cheers
Dave
