Friday, February 8, 2013

Sikuli: An "On Screen" Computer Vision Library

In my last post, I briefly mentioned Sikuli as a tool that I've been using in combination with other technologies for usability analysis and automated testing. Before I get deeper into how I've been using it, this post gives an overview of what Sikuli is and why I think it is so useful.

If you go to the Sikuli website (http://www.sikuli.org/), you'll see Sikuli described as a tool for either: 1) running macros, or 2) automated software testing. It can do both of these things quite well, but it's important to note that Sikuli goes deeper and is more powerful than most other tools built for those two jobs.

From my perspective, Sikuli consists of several parts:
  • OpenCV: Sikuli utilizes the OpenCV computer vision library. This is where the real power comes from. Unlike traditional macro and software testing tools, Sikuli is based on vision. If you can see it on the computer screen, so can Sikuli. It's not perfect (text recognition is a weakness), but it can handle situations that other tools can't. For example, you can write a Sikuli script to play a game that is written in Flash (or Silverlight, or HTML, or C++) even if it doesn't have an API. As long as there are things to see and react to, Sikuli can be used.
  • CV Tuned to the Computer Screen: This isn't really a separate 'part' per se, but it's so important that it's worth its own bullet. Computer vision is such a broad and deep topic that an average Joe who downloaded OpenCV and tried to use it for recognizing something on the computer screen would have a lot of learning and work to do. The creators of Sikuli at the MIT User Interface Design Group did all of this work and bundled it into Sikuli for you.
  • Java API: The base implementation of Sikuli is written as a Java library.  You can bundle this into any Java application.  It's my preferred method of using Sikuli.
  • Custom UI and Jython: This layer is a nod towards usability for non-programmers. There are some cool features here, but as a programmer, it doesn't really fit what I'm trying to do.
Below I've provided code that shows the simplest possible example of using the Sikuli Java API to do something.  Here's what it does:

  1. First, I provide the location of a screenshot of an OK button that I'm going to ask Sikuli to find and click for me.  In future versions of this program, I will check right here that the file really exists (and if it doesn't exist, I will have Sikuli help me create it).
  2. Next, I create an instance of the 'Screen' class.  It's the starting point for most of Sikuli's functionality.
  3. Finally, I have a try-catch block where I ask Sikuli to find something on the screen that looks like the image in "OK_Button.png" and click it. It might fail for two reasons: a) the file doesn't exist, or b) it exists, but there is nothing on the screen that looks like it.
This is a pretty simple program that doesn't really show the full power of Sikuli, but I wanted to lay out the basics before I start to get into some really cool stuff in my upcoming posts...

----

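import org.sikuli.script.FindFailed;
import org.sikuli.script.Screen;

public class SikuliExample1 {

    public static void main(String[] args) {

        // Path to a screenshot of the OK button that Sikuli should look for
        String userHome = System.getProperty("user.home");
        String buttonImage = userHome + "/Sikuli/OK_Button.png";

        // Screen is the starting point for most of Sikuli's functionality
        Screen s = new Screen();

        try {
            // Find something on the screen that looks like the image and click it
            s.click(buttonImage);
        } catch (FindFailed e) {
            // Reached when the image file is missing or nothing on screen matches it
            System.out.println("Couldn't find image: " + buttonImage);
        }
    }
}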
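
Step 1 above mentions checking that the screenshot file really exists before asking Sikuli to use it. Here is a rough sketch of what that check might look like, using only the standard Java library. The class name ImageFileCheck and the helper imageExists are names I made up for illustration; they are not part of Sikuli, and this is just one way to do it.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ImageFileCheck {

    // Hypothetical helper (not a Sikuli call): true only if the path
    // points at a regular file that the current user can read.
    static boolean imageExists(String imagePath) {
        Path p = Paths.get(imagePath);
        return Files.isRegularFile(p) && Files.isReadable(p);
    }

    public static void main(String[] args) {
        String userHome = System.getProperty("user.home");
        String buttonImage = userHome + "/Sikuli/OK_Button.png";

        if (!imageExists(buttonImage)) {
            // Failure reason (a): the screenshot file itself is missing
            System.out.println("Image file is missing: " + buttonImage);
            return;
        }

        // The file is there, so a later FindFailed from s.click(buttonImage)
        // would point to reason (b): nothing on the screen looks like it.
        System.out.println("Image file found: " + buttonImage);
    }
}
```

Doing the check up front separates the two failure cases from step 3, so the catch block only has to deal with the "nothing on screen matches" case.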
