Robots invasion

May 6 2011

again, and again, and again.. “how can I test this?” same old question.

this week my team had to “instruct” spiders not to follow certain links, so that those pages would not be processed by search engines’ indexing. given that robots.txt is the de-facto standard for this, we tried to test this behaviour. we initially found only syntactic check tools, useful while developing, but not enough as a safety-net against regressions. so we decided to go a bit further: testing that the actual URLs were not processed by spiders.

having a robots.txt file published under the site root with these rules..

User-Agent: *
Disallow: /shop/cart
Disallow: /shop/one-page-checkout

.. our goal could be easily documented by the following test:

test "should skip disallowed paths" do
  lets_crowl "/shop/cart"
  lets_crowl "/shop/one-page-checkout"
  lets_crowl "/shop/contact-us"

  assert_surfed_only '/shop/contact-us'
end
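for the record, the Disallow matching a spider applies here boils down to a prefix check on the request path; a minimal hand-rolled sketch (the helper below is hypothetical, not part of our test suite):

```ruby
# hypothetical helper mirroring the robots.txt rules above:
# a path is allowed unless it starts with a disallowed prefix
DISALLOWED = ['/shop/cart', '/shop/one-page-checkout']

def allowed?(path)
  DISALLOWED.none? { |rule| path.start_with?(rule) }
end

allowed?('/shop/cart')        # => false
allowed?('/shop/contact-us')  # => true
```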

next step was running this test against a local development server (http://localhost:3000 for a Rails webapp). spiking around, I started looking for existing spider engines written in Ruby, so that I could easily integrate one into our test codebase. it took me about an hour to evaluate tools: the winner was ruby-spider, forked from Spider.

we actually start a new Spider instance, run it against a given URL, and check whether it was allowed to surf it or not. then we “stop it”: this is done by instructing the Spider to only process URLs that match the exact target URL; otherwise, the Spider would keep processing every URL found in the HTML response, recursively, and so on.. extra points would be setting a timeout on the test, or a max depth. anyway, here’s the fixture code:

def setup
  @surfed_urls = []
end

def lets_crowl(target_path)
  Spider.start_at(local_url(target_path)) do |site|
    # only surf the exact target URL: this "stops" the recursion
    site.add_url_check do |an_url|
      an_url == local_url(target_path)
    end
    site.on :every do |an_url, response, prior_url|
      @surfed_urls << an_url
    end
  end
end

def assert_surfed_only(path)
  assert_equal [local_url(path)], @surfed_urls
end

def assert_nothing_was_surfed
  assert @surfed_urls.empty?
end

def local_url(path)
  "http://localhost:3000#{path}"
end

last thing done: running this test against a test server. shame on me, I could not manage to set up a Webrat integration test (the server runs in-process). so, after verifying our staging server would have really slowed down the test suite, I went for a Selenium test: actually, I only changed one line of code, switching local_url to http://localhost:3001. really, really nice!

Video from IAD 2010

December 9 2010

just updated my personal page, adding links to both slides and video for the presentation I gave at the last Italian Agile Day 2010. enjoy!

Legacy pictures

July 18 2010

someone once stated that legacy code is simply “code that works” (I guess it was Michael Feathers, but I could not find any reference online). Feathers himself proposed characterization tests as a vise to put legacy code into, before starting any change.
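the idea, in a nutshell: run the code, capture whatever it actually does, and freeze that into an assertion. a toy sketch in Ruby (legacy_format is a made-up stand-in; the real code in this story was Java):

```ruby
# stand-in for some untested legacy logic we must not break
def legacy_format(amount)
  "EUR %.2f" % amount
end

# characterization test: the expected value was copied from a first run,
# not derived from a spec; it pins down current behaviour
raise "behaviour changed!" unless legacy_format(3.5) == "EUR 3.50"
```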

this week I had the pleasure of adding support to our current application for the customer’s internal search engine indexing, something already developed by some colleagues of mine for a similar web application. so, I spent some time reading product documentation and reviewing the existing code, which was clearly composed of custom domain logic and generic integration logic with the given product. I also played a bit with the code, in a sandbox, to understand how the pieces could be split.

then, I reverted every change, and started writing an initial characterization test.

the main application entry point was a “workflow” process, expected to be executed by the application container when a few events had occurred. the visible behaviour consisted of text files, with specific content, stored via FTP. so, I first “reproduced” the current behaviour by starting a local FTP server (bundled with my MacBook, and available from System Preferences). after having seen the text files written, I could write an automatic test (well, automatic but not yet reproducible on every workstation).

public void setUp() throws Exception {

  ftpFolder = createTmpFolderAt(FTP_HOME_FOLDER);

  Hashtable<String, String> properties = new Hashtable<String, String>();
  properties.put("ftp.url", FTP_HOST);
  properties.put("ftp.port", FTP_PORT);
  properties.put("ftp.user", FTP_USER);
  properties.put("ftp.password", FTP_PASSWORD);
  properties.put("", ftpFolder.getName());
  properties.put("base.url", "");
  activate(facade, new InMemoryComponentContext(properties));
}

public void shouldNotifyNewsCreation() throws Exception {
  String lavoro = "/its/a/news/lavoro";
  String[] arguments = new String[] { "creazione" };
  process.execute(workItem, workflowSession, arguments);

  String expectedContent = "VdkVgwKey:\n<<EOD>>";
  assertEquals(expectedContent, contentAt(ftpResource("/lavoro.bif")));
}

helper methods simply read the resource content stored at a given URL:

private String ftpResource(String resourcePath) {
  return "ftp://" + FTP_USER + ":" + FTP_PASSWORD +
      "@" + FTP_HOST + ":" + FTP_PORT + "/" +
      ftpFolder.getName() + resourcePath;
}

private String contentAt(String urlAsString) throws Exception {
  return ContentUtils.readUrl(new URL(urlAsString));
}

then it was time to turn my local FTP server off, and use one for testing purposes. I have to admit, I thought it would have been easier; I’ve done this many times for HTTP, mail and JMS servers, but getting an FTP server started and stopped by a JUnit test turned out to be painful! I played with MockFtpServer (hung waiting for FTP data on port 0, with no way to set any other port), then Apache FtpServer (no luck with user grants on the filesystem) and HermesFTP (gosh! it requires Spring just to run!).

then I found a lightweight and simple Java FTP server: AXL FTP Server (details for adding a Maven dependency are available here). its entry point is a main method: I just managed to start a server thread on test setup and stop it on test tear-down (actually, on suite setup and tear-down).

// see src/test/resources/ftp/ftp.cfg
private static final String FTP_HOST = "localhost";
private static final String FTP_PORT = "2121";
private static final String FTP_USER = "foo";
private static final String FTP_PASSWORD = "bar";
private static final String FTP_HOME_FOLDER = "/tmp";

private static Thread serverThread; 

public static void startServer() throws Exception {
  serverThread = new Thread(new Runnable() {
    public void run() {
      ftpd.main(new String[] { "src/test/resources/ftp" });
    }
  });
  serverThread.start();
}

public static void stopServer() {
  if (serverThread != null) {
    // assumption: interrupting the thread is enough to shut the server down
    serverThread.interrupt();
  }
}
it was just a matter of a few configuration files, which I put under the Maven test resources folder (ftp.cfg for server config, and dir.cfg for users and filesystem grants).

adding a few more test cases really gave me confidence. I could then start restructuring and refactoring the code: extracting a brand new project, then removing duplicated logic, extracting classes and interfaces with clear names. in this process, as I changed the code, my understanding of the whole notification process improved.

in the end, I could easily move the FTP integration test to the new project’s test suite, while unit-testing the original workflow process code with test-doubles (in-memory fake objects and mock objects). more unit tests were added along the way, too.

well, it was really not legacy code, not yet! it just lacked automatic tests. this gave me the opportunity to apply Feathers’s process: take a picture of the current behaviour, before starting to refactor.

these last six months have been incredibly full for me, i’ve learnt so many technologies and so much technical stuff: Ruby on Rails web application development (and a bit of S3 cloud deployment), Hippo CMS 6 and Cocoon pipelines, and now the Day CQ stack, which means JCR and Jackrabbit, the Sling RESTful web framework, and OSGi bundles with Felix. oh my!

yep, i’m currently working for a big Italian TLC company, developing their internal portal based on CQ5. i was completely new to content repositories and web content management, but i got it quickly: it’s a different paradigm, data is modeled around resources, not around relations (as with relational databases).

btw, what i want to show is my journey with the CQ stuff, and how our development approach has grown during the last weeks (and where it’s going). beware: there’s a lot of technical stuff (Maven, Day CRX, Apache Sling, Apache Felix); i won’t explain everything in detail, so i’ll refer to documentation and other blog posts.

so, first of all, start by reading the CQ tutorial “How to Set Up the Development Environment with Eclipse”: please, spend about an hour following all the steps, even the boring ones, like grabbing jars from the CRX repository and putting them manually into the local Maven repository. in the end, you’ll have two projects (ui and core), one page with a template (manually created and edited), executing a component as a JSP script (imported through VLT), which uses “domain” logic provided by a plain old Java class (from the core project). that’s a lot of stuff!

then, let’s enter the magical world of CQDE, a customized (old version of) Eclipse which provides access to remote content (via WebDAV) from within an IDE, so that you can edit, compile and debug code as if it were stored locally (but it isn’t). at first, it seems a lot better than VLT-ing from the command line; but soon you’ll miss versioning, and sharing code with others. even if it’s not clear in the tutorial, ignoring VLT-specific files lets Subversion also version the content stored in src/main/content/jcr_root. that’s not always fun, like manually merging conflicts on XML files, but it’s really a lot better than blindly editing code with CQDE, with no way back! also, sometimes i’ve found it much easier to edit pages as XML files than to use the WCM editor (the CQ authoring tool).

ok, relax, take a deep breath, and think about what you’ve done so far. do you like it? are you comfortable with this? well, i wasn’t; i missed my IDE-based development, checking code in and out, running automatic tests all the time. the good news is we can do better than this; the bad news is we’ll still miss something (so far, red/green bars for the UI). to recap, we can choose from:

  1. remote coding and debugging, with CQDE: no “native” versioning, VLT can be used as a “bridge” to Subversion
  2. local coding, with any IDE (eg Eclipse): still can’t compile JSP files, VLT used to deploy UI code

next step is (well, i’m a bit afraid, but the time has come)… deploying an OSGi bundle with Maven, with both UI code and initial content to put on the repository.

step one: compiling JSP files locally. ingredients: JARs as local Maven dependencies, and the Sling Maven JSPC plugin.

i could not find any public Day Maven repository (and it makes sense, from a business point of view), but as the tutorial shows, everything we need is already available from CRX. it takes a while, but referring to the /libs/xyz/install convention and doing searches via the CRX explorer, you can come up with something like this:


function grabDependency(){
  # reconstructed: arguments are the jar URL and its local repository path
  JAR_URL=$1
  REPOSITORY_DIR=~/.m2/repository/$2
  wget --user=admin --password=admin $JAR_URL
  mkdir -p $REPOSITORY_DIR
  mv *.jar $REPOSITORY_DIR
}

cd /tmp; rm -rf deps; mkdir deps; cd deps

grabDependency \
  http://localhost:4502/crx/repository/crx.default/libs/commons/install/day-commons-jstl-1.1.2.jar \
  com/day/commons/day-commons-jstl/1.1.2

# ... grab other jar files

then, let’s add the JSPC plugin to the Maven build chain, plus the CQ and Sling dependencies (see the attached file with sample code). this is a simple example; you’ll probably need to override the plugin’s Sling jar dependencies with the versions used by your application code!


moving the JSP code into src/main/scripts (under the apps/myApp subfolder) should be enough to have Maven build it (mvn clean compile). just remember to grab global.jsp from CRX and put it under the src/main/scripts/libs/wcm folder. Eclipse will also compile (regenerate project files with mvn eclipse:eclipse), but it needs another copy of global.jsp under /libs/wcm (i know, it’s silly; i’ll check this next time).

step two: packaging an OSGi bundle with UI code and content nodes. ingredients: the Felix Maven bundle plugin.

the key concept for me was understanding what to put into the bundle. i was used to having JSP files on CRX under the /apps node, editing node properties such as jcr:primaryType (cq:Component, cq:Template and the like) and jcr:content. deploying the application as an OSGi bundle is slightly different: code is available as bundle resources (from the bundle itself), while only property nodes are copied from the bundle to the CRX repository, as initial content. this separation was not clear to me in the beginning, but it now makes sense (even if less duplication would be nice, for example in the content structure).

so, we should create a bundle with:

  • included resources: all required resources (maven resources and the src/main/scripts folder), to be referred to later
  • bundle resources: .class and JSP files
  • initial content: node properties, as JSON files (i decided to put them into src/main/resources, under the CQ-INF/initial-content subfolder)

more details are available on the Sling website and on this blog post.

so, let’s add Felix bundle plugin to maven (remember to declare project bundle packaging with <packaging>bundle</packaging>):


<!-- reconstructed fragment: element values are examples, adapt to your project -->
<plugin>
  <groupId>org.apache.felix</groupId>
  <artifactId>maven-bundle-plugin</artifactId>
  <extensions>true</extensions>
  <configuration>
    <instructions>
      <!-- included resources folders (to be later referred):
           maven resources and JSP files -->
      <Include-Resource>
        {maven-resources}, src/main/scripts
      </Include-Resource>

      <!-- resources available from within the bundle
           (not available as CRX nodes):
           compiled .class files and JSP files -->
      <Sling-Bundle-Resources>
        /apps/myApp
      </Sling-Bundle-Resources>

      <!-- content initially copied into CRX nodes:
           properties as JSON descriptors -->
      <Sling-Initial-Content>
        CQ-INF/initial-content/apps/myApp/; overwrite:=true; path:=/apps/myApp,
        CQ-INF/initial-content/content/sample/; overwrite:=true; path:=/content/sample
      </Sling-Initial-Content>
    </instructions>
  </configuration>
</plugin>

this should be enough to create a package with mvn clean package. we’re almost done..

step three: installing the bundle. ingredients: the Maven Sling plugin.

with CQ there are two ways to install a bundle: put it under the /apps/myApp/install folder, or use the Felix console. i chose the latter, which turns out to be a plain POST request to the console URL. anyway, we can hook into the Maven build chain with the Sling plugin, this way:

<!-- reconstructed example: slingUrl, user and password depend on your instance -->
<plugin>
  <groupId>org.apache.sling</groupId>
  <artifactId>maven-sling-plugin</artifactId>
  <executions>
    <execution>
      <goals>
        <goal>install</goal>
      </goals>
    </execution>
  </executions>
  <configuration>
    <slingUrl>http://localhost:4502/system/console</slingUrl>
    <user>admin</user>
    <password>admin</password>
  </configuration>
</plugin>
just type mvn install and we’re done.

that’s it. a lot of setup, especially if, like me, you’re new to Maven and OSGi. anyway, i’ve written this mainly for later reference, and to share thoughts with colleagues. i’ve shown three approaches to developing with CQ, tested in my daily work over the last month. in my view, deploying OSGi bundles is the best one so far; it’s a trade-off between ease of use while debugging (yep, no automatic UI tests yet) and development lifecycle (versioning, building, packaging). i hope to gather much more info next year, and probably something will be easier! the next step will be setting up automatic tests for JSP files, using Koskela’s JspTest tool.

sample code is here: please, follow README and have fun.

well, happy new year to everyone!


September 24 2009

after almost a full day of nervous coding and testing..
trying to simplify a complex scenario..
before making it more complex with new requirements..
i finally found out it was easier than i thought..
simply asking the customer what he wanted.

so, please, don’t forget to ask the f**ing question!

Gimme a name

September 16 2009

i’m currently working on a Rails web application, and i’m learning a lot about Ruby and ActiveRecord these days. after a few green bars on a quite complex search, yesterday i felt a little annoyed because i was not able to refactor the logic embedded in the query enough. that’s why i did a spike on an ActiveRecord feature for making queries clearer: named scopes.

i typed rails spike to generate a new project from scratch, scaffolded User, then started with this test case, simply looking for the youngest teen named ‘bob’:

class UserTest < ActiveSupport::TestCase

  def setup
    User.create!(:name => 'alice', :age => 11)
    User.create!(:name => 'mark', :age => 18)
    User.create!(:name => 'bob', :age => 12)
    User.create!(:name => 'bob', :age => 14)
  end

  test "spike" do
    found = User.youngest_teen_Bob
    assert_equal 12, found.age
  end
end

make it pass. the initial and obvious implementation was a mix of :first, where clauses and order by:

class User < ActiveRecord::Base

  def User.youngest_teen_Bob
    User.find :first,
      :conditions => ['name = ? AND age < ?', 'bob', 15],
      :order => 'age ASC'
  end
end

yep, the example is very simple. anyway, i don’t think the query is clear enough. with any other ORM (even a hand-written one), i would have liked to use something closer to the domain, moving a little away from SQL. so, the first step could be separating the condition on ‘name’ from the rest:

named_scope :with_name, lambda { |name|
  { :conditions => { :name => name } }
}

def User.youngest_teen_Bob
  User.with_name('bob').find :first,
    :conditions => ['age < ?', 15],
    :order => 'age ASC'
end

look at User.with_name('bob').find(..): we’re now using a named scope called with_name, which simply appends a where clause on ‘name’ to the current query. so far, so good, but now i want to go further. what do you think the find(..) selection is doing? in one sentence, “find the youngest teen”. ok, so let’s split it in two:

named_scope :teens,
  :conditions => [ 'age < ?', 15 ]

named_scope :with_name, lambda { |name|
  { :conditions => { :name => name } }
}

def User.youngest_teen_Bob
  User.teens.with_name('bob').find(:first, :order => 'age ASC')
end

great! another named_scope, teens, even simpler because no parameter is passed. so, we’re almost done; the actual search is User.teens.with_name('bob').find(..). again: what do you think the find(..) is doing? sure, looking for the youngest:

def self.youngest
  find(:first, :order => 'age ASC')
end

named_scope :teens,
  :conditions => [ 'age < ?', 15 ]

named_scope :with_name, lambda { |name|
  { :conditions => { :name => name } }
}

def User.youngest_teen_Bob
  User.teens.with_name('bob').youngest
end
done! youngest teen ‘bob’ is now implemented as teens.with_name('bob').youngest. nice, isn’t it?

here a few notes:

  • teens.with_name(..) acts exactly like User.teens.with_name(..): no need to specify the class, named scopes can be used from static methods
  • youngest should be called at the end, because it invokes find. it’s silly: there’s no way to use something like youngest.teens.with_name('bob'). if you’ve got any idea, drop me a line..
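the chaining idea itself is not tied to ActiveRecord; here’s a toy, plain-Ruby sketch of composable scopes over an in-memory collection (all names are made up, this is not the Rails API):

```ruby
# toy chainable "scopes" over an array-backed collection;
# each filtering scope returns a new collection, so filters compose,
# while youngest is terminal (it returns a row), just like find above
class Users
  def initialize(rows)
    @rows = rows
  end

  def teens
    Users.new(@rows.select { |u| u[:age] < 15 })
  end

  def with_name(name)
    Users.new(@rows.select { |u| u[:name] == name })
  end

  def youngest
    @rows.min_by { |u| u[:age] }
  end
end

users = Users.new([
  { :name => 'alice', :age => 11 },
  { :name => 'bob',   :age => 14 },
  { :name => 'bob',   :age => 12 },
])
users.teens.with_name('bob').youngest  # => { :name => 'bob', :age => 12 }
```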

that’s all for today.

Say what you want

September 3 2009

did you ever want to be notified about the latest build status? think about having a script that monitors your build server via its RSS feed, and announces success or failure by saying something like “build broken for project XYZ”. add a rubyish taste, and…

i’m happy to announce cruise-monitor has just been published! as README says:

“Cruise-monitor is, well, a monitor to CruiseControl build status, via RSS feed. It uses MacOS ‘say’ command for notifications. So far, only CruiseControl.rb is supported, but plans are to support CC and CC.NET as well”.

this is the first open-source project hosted on our company’s public servers that i’m involved in. it was basically born after a few broken builds on the project my team is currently working on: a Rails application built on a CruiseControl.rb continuous integration server.

it happened a few times that no one noticed a failing test or a missed svn add. we started using a feed reader that played a sound on each new build, but it still was not able to distinguish failure from success. that’s when we thought about using the great MacOS say command.
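the core of such a monitor fits in a few lines; a minimal sketch, assuming a CruiseControl.rb feed whose item titles contain “failed” on broken builds (this is not the actual cruise-monitor code):

```ruby
require 'rss'
require 'open-uri'

# turn a feed item title into a spoken message;
# the "failed" convention is an assumption about the feed format
def build_message(title)
  title =~ /fail/i ? "build broken: #{title}" : "build ok: #{title}"
end

# poll the feed and speak the latest build status (MacOS only)
def announce(feed_url)
  feed = RSS::Parser.parse(URI.open(feed_url).read, false)
  system('say', build_message(feed.items.first.title))
end
```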

it started as a spike on Ruby, RSS and system exec, but it soon turned into a real project: a Rake build file; unit, integration and acceptance tests; README, LICENSE and TODO files.

so far i’m the only committer! i hope the community around it will grow a little. i’m going to add details on our public Confluence in the following days.. so, stay tuned!