Epic Filth: Web Form Automation with Selenium WebDriver

Have you ever had a tedious task involving a website form that you wanted to automate, but couldn't? Or perhaps you wanted to automatically test a website you were building? Well, there's a fairly well-known tool for browser automation called Selenium. I'd heard of it before, and had looked into it for a project where I wanted to take some JavaScript-based pages, run them through a JavaScript engine so they could generate the HTML I needed, and then save the results. But until now, I had never had a chance to use it. It's powerful, and there's a lot you can do with it that I won't cover. But the basics will get you a long way.

Introduction to Selenium

Selenium is a set of libraries for browser automation. This is different from other automation strategies that work by sending programmatic HTTP requests and reading the responses. The advantage is that you can test your site in a specific browser exactly as it would run in that browser. And, since Selenium supports several languages (Java, C#, Python, Ruby, PHP, and Perl), it should be easy to bring into your current testing environment.

Of course, since it is a browser automation tool, you can do more than just test sites you are building. You can also create scripts that automatically carry out actions that would normally be done manually. Selenium has an IDE that allows you to record these actions by performing them yourself, or you can hand code them from a series of very simple operations.

Selenium is open source and licensed under the Apache 2.0 license, which makes it very easy to use in both open-source and closed-source solutions. The download location for the Selenium WebDriver that I'm covering in this post is here. I'm going to use C# for my examples, as that's how I've been using Selenium thus far, but the basic principles should translate fairly easily to the other languages. If you want to follow along with the code, download the C# client drivers, create a new console application project in Visual Studio (if you don't have it, get a free version here), copy the Selenium files into the project folder, and add a reference to WebDriver.dll to your project.

Opening a Browser

A specific browser can be opened like this:

using System;
using System.Threading;

// We'll use Firefox as our example, but Chrome, IE, 
// and Android are also supported out of the box
using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

namespace BrowserAutomation
{
    public class FirefoxTest
    {
        public static void Main()
        {
            //Open the browser
            IWebDriver driver = new FirefoxDriver();

            //Automation code here - we'll just wait 5 seconds for now
            //(Thread.Sleep takes milliseconds, so 5 seconds is 5000)
            Thread.Sleep(5000);

            //Close the browser
            driver.Quit();
        }
    }
}

IWebDriver is an interface that all browser drivers will implement. Here, we're creating a FirefoxDriver, so that we can control Firefox. Note that IE and Firefox work immediately, but there's some extra setup required for Chrome automation. Running this program should open your Firefox browser, wait for 5 seconds, then close it. Boring, but it's a start!
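One thing the listing above glosses over: if the automation code throws, driver.Quit() is never reached and the browser window stays open. A minimal sketch of one way to guard against that, using a plain try/finally (the class name FirefoxCleanupSketch is made up for illustration):

```csharp
using System;
using System.Threading;

using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

namespace BrowserAutomation
{
    // Hypothetical class name, used only for this sketch
    public class FirefoxCleanupSketch
    {
        public static void Main()
        {
            //Open the browser
            IWebDriver driver = new FirefoxDriver();
            try
            {
                //Automation code would go here; for now, pause so the window is visible
                Thread.Sleep(TimeSpan.FromSeconds(5));
            }
            finally
            {
                //Runs even if the automation code above throws
                driver.Quit();
            }
        }
    }
}
```

IWebDriver also implements IDisposable, so a using block should work as well. The try/finally form just makes the cleanup step explicit.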

Navigation

Now that we can open the browser, let's go somewhere:

using System;

using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

namespace BrowserAutomation
{
    public class FirefoxTest
    {
        public static void Main()
        {
            //Open the browser
            IWebDriver driver = new FirefoxDriver();

            //Go to google
            driver.Navigate().GoToUrl("http://www.google.com");
        }
    }
}

You should see your browser open, and the Google main page show up. Here, we use the Navigate() method to get an INavigation interface. Then, we call its GoToUrl() method with the address we want to go to. You can also use the Back() and Forward() methods on INavigation to navigate backwards and forwards, as if the user clicked the back or forward button.
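As a rough sketch of Back() and Forward() in action, the program below visits two pages and then moves through the history. The second address (http://www.example.com) is just a placeholder picked for illustration, and the class name is made up. Url and Title are properties on IWebDriver that report where the browser currently is.

```csharp
using System;

using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

namespace BrowserAutomation
{
    // Hypothetical class name, used only for this sketch
    public class NavigationSketch
    {
        public static void Main()
        {
            //Open the browser
            IWebDriver driver = new FirefoxDriver();
            try
            {
                INavigation nav = driver.Navigate();

                //Visit two pages so there is some history to move through
                nav.GoToUrl("http://www.google.com");
                nav.GoToUrl("http://www.example.com"); //placeholder second page

                //Same as the user clicking the back button
                nav.Back();
                Console.WriteLine("After Back():    " + driver.Url);

                //Same as the user clicking the forward button
                nav.Forward();
                Console.WriteLine("After Forward(): " + driver.Url);

                //Reload the current page and show its title
                nav.Refresh();
                Console.WriteLine("Title: " + driver.Title);
            }
            finally
            {
                //Close the browser
                driver.Quit();
            }
        }
    }
}
```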

Finding Elements

Now that we're at Google, let's search for something. First, we need to find the search box. I've done the hard work already, and found that the search box has a name attribute of "q". Knowing that, we can do this:

using System;

using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

namespace BrowserAutomation
{
    public class FirefoxTest
    {
        public static void Main()
        {
            //Open the browser
            IWebDriver driver = new FirefoxDriver();

            //Go to google
            driver.Navigate().GoToUrl("http://www.google.com");

            //Grab the search box
            IWebElement searchBox = driver.FindElement(By.Name("q"));
        }
    }
}

Running this will look pretty much the same as the last run. If it completes without throwing an exception, that tells you that it did indeed find the search box (FindElement throws a NoSuchElementException when nothing matches). You can also search by id, CSS class, CSS selector, tag name, link text, partial link text, and XPath query by using the different static methods of By.
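Here is a small sketch of a few of those other By methods, all aimed at the same search box where possible. The CSS selector and XPath strings are my own equivalents of the name lookup, and the id "no-such-element-hypothetical" is deliberately made up to show what a failed lookup looks like. FindElements (plural) returns a collection instead of throwing when nothing matches.

```csharp
using System;
using System.Collections.ObjectModel;

using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

namespace BrowserAutomation
{
    // Hypothetical class name, used only for this sketch
    public class LocatorSketch
    {
        public static void Main()
        {
            //Open the browser
            IWebDriver driver = new FirefoxDriver();
            try
            {
                driver.Navigate().GoToUrl("http://www.google.com");

                //Three ways to ask for the element whose name attribute is "q"
                IWebElement byName = driver.FindElement(By.Name("q"));
                IWebElement byCss = driver.FindElement(By.CssSelector("[name='q']"));
                IWebElement byXPath = driver.FindElement(By.XPath("//*[@name='q']"));
                Console.WriteLine("Tag of the search box: " + byName.TagName);

                //FindElements returns every match (possibly none) without throwing
                ReadOnlyCollection<IWebElement> links = driver.FindElements(By.TagName("a"));
                Console.WriteLine("Links on the page: " + links.Count);

                //A lookup that should fail, since the id is made up
                try
                {
                    driver.FindElement(By.Id("no-such-element-hypothetical"));
                }
                catch (NoSuchElementException)
                {
                    Console.WriteLine("No element with that id, as expected.");
                }
            }
            finally
            {
                //Close the browser
                driver.Quit();
            }
        }
    }
}
```

By.ClassName, By.LinkText, and By.PartialLinkText follow the same pattern.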

Enough about searching, though - let's use it!

Simulating Typing

using System;

using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

namespace BrowserAutomation
{
    public class FirefoxTest
    {
        public static void Main()
        {
            //Open the browser
            IWebDriver driver = new FirefoxDriver();

            //Go to google
            driver.Navigate().GoToUrl("http://www.google.com");

            //Grab the search box
            IWebElement searchBox = driver.FindElement(By.Name("q"));

            //Type something in it
            searchBox.SendKeys("Epic Filth");
        }
    }
}

Notice how the browser opens, and "Epic Filth" is entered into the search box, and, thanks to Google Instant, results show up immediately. Pretty cool, right? But now, let's say we also want to click the search button...

Simulating Clicks

using System;

using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

namespace BrowserAutomation
{
    public class FirefoxTest
    {
        public static void Main()
        {
            //Open the browser
            IWebDriver driver = new FirefoxDriver();

            //Go to google
            driver.Navigate().GoToUrl("http://www.google.com");

            //Grab the search box and button
            IWebElement searchBox = driver.FindElement(By.Name("q"));
            IWebElement searchButton = driver.FindElement(By.Name("btnG"));

            //Type something in the search box
            searchBox.SendKeys("Epic Filth");

            //Click the search button
            searchButton.Click();
        }
    }
}

This probably doesn't do much, thanks to Google Instant, but it does illustrate how to click a button.
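If the button is awkward to locate, the same search can usually be triggered from the keyboard instead. The sketch below clears the box, types the query, and then presses Enter by sending Keys.Enter, a constant Selenium provides for special keys. Calling Submit() on the element is another option that should submit the enclosing form. The class name is made up, and the browser is left open so you can see the result, as in the listings above.

```csharp
using System;

using OpenQA.Selenium;
using OpenQA.Selenium.Firefox;

namespace BrowserAutomation
{
    // Hypothetical class name, used only for this sketch
    public class KeyboardSubmitSketch
    {
        public static void Main()
        {
            //Open the browser
            IWebDriver driver = new FirefoxDriver();

            //Go to Google
            driver.Navigate().GoToUrl("http://www.google.com");

            //Grab the search box
            IWebElement searchBox = driver.FindElement(By.Name("q"));

            //Remove any text already in the box, then type the query
            searchBox.Clear();
            searchBox.SendKeys("Epic Filth");

            //Press Enter instead of clicking the search button
            searchBox.SendKeys(Keys.Enter);

            //Alternative to the line above: searchBox.Submit();
        }
    }
}
```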

Conclusion

I believe you'll find that, although we've only covered a small handful of operations, there is very little you can't do. You should be able to find any visible element on the page, and either click on it or type into it. Beyond those two operations, I don't know that there's a whole lot you can do to a web page. And the best part is, Selenium is accessible from Java, C#, Python, Ruby, PHP, and Perl, so you should have an easy time fitting it into whatever you are doing.


Sunday, January 29, 2012
