Saturday, November 27, 2010

Church Database Conversion with Groovy

This past summer, my church switched from our existing church database to Fellowship One. And when I say "existing database", we actually had two different databases, Church Helpmate and NSpire Youth Assistant, that we wanted to consolidate into the new system.
Church Helpmate is a Microsoft Access-based Windows application, and we used it for the general church population. NSpire is a Windows application with a custom database, and we used it for the youth and children's ministries. Somehow we had to get all of the families, people, addresses, phone numbers, email addresses, etc. from the two old systems into the new system, and we wanted to automate as much of the record merging as possible.
Fellowship One is a web application, and they helpfully provide an Excel spreadsheet template for the data import. This meant that I had to get the data out of the two old programs, massage and manipulate the data, merge duplicate records, and then write them out to the Fellowship One spreadsheet. Once the spreadsheet was fully populated, we would ship that off to Fellowship One and they would take care of the rest of the import.
Getting the data out of the two programs took a little bit of work. I decided to export to CSV files. For Church Helpmate, adding the following macro to the 'CustomFrontEnd.mdb' file made it simple. Once it was run, every table in the database was dumped to the temp directory as a CSV file.
Public Sub ExportAllTables_to_CSV()
    Dim obj As AccessObject, dbs As Object
    Set dbs = Application.CurrentData

    For Each obj In dbs.AllTables
        If Left(obj.Name, 4) <> "MSys" Then
            Dim strFolder As String
            strFolder = "c:\temp\"
            DoCmd.TransferText acExportDelim, , obj.Name, strFolder & obj.Name & ".csv", True
        End If
    Next obj
End Sub
NSpire provided a simple way to create a CSV export. By creating a custom report from the "All Households" and "All Individuals" reports in NSpire (with both Active and Inactive records included), I was able to pick out the fields I thought would be relevant.
Now I had to make a decision: should I manually merge and import the data, or should I write a program to automate the work? At first blush, this didn't seem like a good candidate for automation, since it was a one-time deal. However, I decided it was likely I would not get it right the first time (or the second), so having a repeatable process seemed like the best approach. I decided to write the conversion program in Groovy. This allowed me to harness all the power of the Java ecosystem, but in a more dynamic language with many Perl- and Ruby-like features (I am a big Groovy fan, in case you can't tell).
To parse the CSV files in Groovy, I used the opencsv library. With that, parsing is as simple as can be:
import au.com.bytecode.opencsv.CSVReader

CSVReader reader = new CSVReader(new FileReader(inputDir + householdCSV));
List<String[]> inputHouseholds = reader.readAll();
This code converts each CSV record into a String array. It was then easy to pull out the fields into various objects representing households, individuals, etc.
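To give a flavor of that step, here is a rough, hypothetical sketch. The column positions and Household properties are invented for illustration (the real classes had many more fields), and header-row handling is omitted:
// Hypothetical sketch only: a stand-in for the project's real Household class,
// which has many more fields. Column positions are made up for illustration.
class Household {
    String id
    String lastName
    String city
}

def households = [:]
inputHouseholds.each { String[] row ->
    def hh = new Household(id: row[0], lastName: row[1], city: row[2])
    households[hh.id] = hh
}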
Once all the data was parsed from CSV, I wrote a lot of code to validate it and convert it into the format Fellowship One expects. Some of those tasks were:
  • Determining the person's position in the household (head, spouse, child) and marital status, and handling households where no head was specified.
  • Converting dates into proper Fellowship One format.
  • Looking at the comment field to get dates for various milestones like baptism, dedication, membership, etc.
  • Determining the graduation class for children & youth based on their birthdate (a rough sketch of this follows the list).
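As an example of that last task, here is a rough Groovy sketch of the graduation-class calculation. The September cutoff and the assumption that students graduate the year they turn 18 are illustrative guesses on my part, not necessarily the rules the real conversion used:
// Rough sketch: estimate a student's graduation year from their birthdate.
// Assumes graduation in the year the student turns 18, with birthdays after a
// September 1 school-year cutoff pushed back a year. These rules are illustrative.
def graduationYear(Date birthDate) {
    def cal = Calendar.instance
    cal.time = birthDate
    int year = cal.get(Calendar.YEAR) + 18
    if (cal.get(Calendar.MONTH) >= Calendar.SEPTEMBER) {
        year++
    }
    return year
}

assert graduationYear(Date.parse('yyyy-MM-dd', '2000-03-15')) == 2018
assert graduationYear(Date.parse('yyyy-MM-dd', '2000-10-02')) == 2019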
Once all the records from Church Helpmate and NSpire were parsed, I then had to merge them. Here is the approach I used (a rough sketch of this logic in code follows the list):
  • For each person in NSpire, if they are a child or youth, see if they are already in the Helpmate data. If so, assume that the NSpire data is more up to date and accurate, and lean towards using it over the Helpmate data when creating the F1 output. For example, if both NSpire and Helpmate have an email address and they differ, assume the NSpire record is correct.
  • If the person from NSpire is an adult, and they are in Helpmate, assume that the Helpmate data is more accurate when creating the F1 output.
  • If the person from NSpire is a child/youth and they are not in Helpmate, find their parents in NSpire (using the household ID which ties all people from the same household) and see if either of their parents are in the Helpmate data. If they are, then add the child/youth as a member of that household when creating the F1 output.
  • If the person from NSpire is an adult and is not in Helpmate, add them (along with their kids) into the F1 output.
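To make those rules concrete, here is a condensed, hypothetical sketch of the merge loop. The property names and helper methods (findInHelpmate, mergePreferringNSpire, and so on) are invented for illustration and do not reflect the actual classes in the project:
// Hypothetical sketch of the merge rules above; all helper names are invented.
nspirePeople.each { person ->
    def match = findInHelpmate(person)
    if (person.childOrYouth) {
        if (match) {
            mergePreferringNSpire(match, person)      // child/youth in both: prefer NSpire fields
        } else {
            // child/youth only in NSpire: attach to a parent's Helpmate household if one exists
            def parentMatch = person.household.adults.findResult { findInHelpmate(it) }
            if (parentMatch) {
                parentMatch.household.addChild(person)
            } else {
                addNewHousehold(person.household)
            }
        }
    } else if (match) {
        mergePreferringHelpmate(match, person)        // adult in both: prefer Helpmate fields
    } else {
        addNewHousehold(person.household)             // adult only in NSpire: bring the household over
    }
}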
Finally, once the data was merged and a final validation step was done, it was time to write out the data. I decided to write directly to the Fellowship One spreadsheet using the ExcelHelper class from Groovy's Scriptom library. It is not fast by any means, but it gets the job done and is simple to use. Here is an example:
import org.codehaus.groovy.scriptom.util.office.ExcelHelper

def write(limit) {
    final outputFile = new File('../output/FT Standard Format.xls').canonicalFile

    def helper = new ExcelHelper()
    helper.create(new File("../FT Standard Format.xls"), outputFile) { workbook ->

        def list = households.values() as List<Household>
        sortHouseholds(list)

        if (limit) {
            list = list[0..limit - 1] as List<Household>
        }
        writePeople(workbook, list)
        writeAddresses(workbook, list)
        writeCommunications(workbook, list)
        writeNotes(workbook, list)
        writeAttributes(workbook, list)
    }
}
Each "write..." method wrote to a separate worksheet in the spreadsheet. Here's what the writePeople method looks like:
private def writePeople(workbook, List<Household> list) {
    def personSheet = workbook.Sheets.Item[2]
    def row = 2
    list.eachWithIndex { hh, i ->
        println "P$i: ${hh.fullName}"
        hh.individuals.values().each { person ->
            personSheet.Cells.Item[row, 1] = hh.id
            personSheet.Cells.Item[row, 2] = person.id
            personSheet.Cells.Item[row, 3] = person.position
            personSheet.Cells.Item[row, 4] = person.title
            personSheet.Cells.Item[row, 5] = person.lastName
            personSheet.Cells.Item[row, 6] = person.firstName
            personSheet.Cells.Item[row, 7] = person.nickName
            personSheet.Cells.Item[row, 8] = person.middleName
            personSheet.Cells.Item[row, 9] = person.suffix
            personSheet.Cells.Item[row, 11] = person.statusGroup
            personSheet.Cells.Item[row, 12] = person.status
            personSheet.Cells.Item[row, 13] = person.subStatus
            personSheet.Cells.Item[row, 15] = person.statusComment
            personSheet.Cells.Item[row, 16] = person.envelopeNumber
            personSheet.Cells.Item[row, 19] = person.gender
            personSheet.Cells.Item[row, 20] = person.birthDate
            personSheet.Cells.Item[row, 21] = person.maritalStatus
            personSheet.Cells.Item[row, 23] = person.occupation
            personSheet.Cells.Item[row, 24] = person.employer
            personSheet.Cells.Item[row, 27] = person.school

            row++
        }
    }
}
With this done, I was able to create a spreadsheet and send it off to the Fellowship One data conversion team. They imported the data for an initial conversion, and then our church staff reviewed it to make sure everything looked OK. I was pleased when I got the word that it did, and that they would skip a final data conversion and just do some minor cleanup manually.
Just recently I posted all of the source code for this project to Google Code. For those who are interested, you can download it at https://code.google.com/p/f1importer/

Thursday, September 2, 2010

Shell scripting technique for finding unique strings

I recently had to search through a bunch of log files to find certain entries and count how many times each occurred. Shell scripting (via Cygwin) to the rescue!

I was looking for strings in this format:

calc_name=RegularIRA
calc_name=Savings

Here is the solution:

grep -oh "calc_name=\w*" * | sort | uniq -c > calculator_counts.txt

This searches all files in the current directory for the pattern "calc_name=\w*", which stops matching as soon as a non-word character (such as a symbol) is found; the -o flag prints only the matching part of each line, and -h suppresses the file names. The matches are then sorted, and "uniq -c" counts each unique occurrence. Finally, the output is redirected to a file.

The output looks like this:

1332 Annuity
  59 AssetAllocator
4411 AutoEquityLoan
 119 AutoLoan
   4 AutoPayoff
 333 AutoRebate

Friday, April 2, 2010

Groovy Google Sites Backup

I've recently started using the Google Apps version of Google Sites to augment my church's website with subsites that are easy to farm out to people with a passion for particular areas of our church's ministry. 

The one problem with using Google Sites (as opposed to an old-school FTP site) is that there is no backup.  While Google obviously has a good reputation, it still makes me a bit nervous to be without any kind of backup.  I discovered the Google Sites Liberation tool, which allows you to export your sites.  It is part of a larger effort by a team of Google engineers to make sure Google users have free access to their data.

Google Sites Liberation (GSL) is available as a Java JAR file download which you can run as a GUI application.  However, I wanted to automate the backups, so I wrote a Groovy script to call GSL.  Here is my very simple script, BackupGoogleSites.groovy:

import java.text.SimpleDateFormat
import com.google.sites.liberation.export.Main

/**
 * Back up the various Google Sites for FEFC
 * User: Jeff Olson (jeff@olsonzoo.com)
 * Date: Dec 18, 2009
 */
def sites = [
 'abf', 
 'community-outreach', 
 'creekside',
 'small-groups', 
 'values', 
 'wiki', 
 'worship'
]

String backupDir = /C:\Backup\FFC Website Backup\Google Sites/
def username = 'your_username_here'
def password = 'your_password_here'
def host = 'sites.google.com'
def domain = 'firstfreechurch.org'

SimpleDateFormat df = new SimpleDateFormat("yyyyMMdd")
def today = df.format(new Date())

sites.each { site ->
  def location = "$backupDir/$today/$site" as String
  println "\n\nBacking up $site for $today to $location\n"

  def args = ['-h', host, '-d', domain, '-w', site, '-u', username, '-p', password, '-f', location]

  // hack because ABF site fails when getting all revisions for some reason
  if (site != 'abf') {
    args << '-r'
  }

  Main.main(args as String[])
}

To use this, just fill in your list of sites, the backup directory, username & password, host, and domain (if using Google Apps for your domain). The script will create a new subdirectory for each day and put each site into a subdirectory under that.

The "hack" for the 'abf' site is due to a so-far unexplained NullPointerException I get when trying to export all revisions that site, so I disabled the -r option for that one.

To run this Groovy script, I created a short batch file (yes, I'm running on Windows XP):

groovy -cp lib\google-sites-liberation-1.0.3.jar src\BackupGoogleSites.groovy

Oh, if you try this and your system complains it can't find 'groovy', you'll need to install the latest version of Groovy.

Tuesday, March 23, 2010

Unit Testing Java Code with JMockit

My current project has a lot of "ancient-in-Java-years" code that unit testing purists would say is "untestable".  Here are some of the things about it that make testing difficult:
  • Use of static methods in the service layer
  • No dependency injection or service locators/factories to speak of (dependencies are typically instantiated with "new", or are static method calls)
  • Little use of interfaces
About six years ago, when I was first exposed to the EasyMock mock object framework, it was more or less true that Java code with the above problems was "untestable". The standard version of EasyMock required that all collaborating objects being mocked out implement interfaces, although an EasyMock Class Extension was released some time later that allowed you to mock collaborators that do not implement interfaces.

However, EasyMock still cannot help us with the first two issues: static methods and lack of dependency injection.  So what is a poor Java developer to do?  This is where my most recent favorite unit testing tool comes in: JMockit. It can handle all of the above problems and more, such as mocking final classes, final methods, and even JDK classes.  It is probably the most powerful Java mocking framework currently available.  The other contender is PowerMock, but there are several reasons why I prefer JMockit, which I will go over later.  Another option would be to use Groovy to test your Java code, using the MockFor and StubFor classes, but for the sake of this discussion let's stick to the Java language.

About the only thing that I don't like about JMockit is the name.  It's too easily confused with JMock, a less powerful mocking framework that is somewhat similar to EasyMock.  Almost every time I've told someone about JMockit, they think I'm talking about JMock and much confusion ensues.  Perhaps JMockit should consider a name change...maybe SuperMock?  MegaMock? Hmmm...

Anyway, back to my main point.  JMockit has made it possible to test any & all of our legacy code without changing the code to fit someone's idea of "testable" code.  Here's a quote from the JMockit site that I like a lot on this point:
The set of limitations listed above, which are found in conventional mocking tools, has come to be associated with the idea of "untestable code". Often, we see the restrictions resulting from those limitations considered as inevitable, or even as something that could be beneficial. The JMockit toolkit, which breaks away from these limitations and restrictions, shows that in fact there is no such thing as truly untestable code. There is, of course, code that is harder to test because it is too complicated and convoluted, lacks cohesion, and so on and so forth.
Therefore, by eliminating the technical limitations traditionally involved in the isolation of an unit from its dependencies, we get the benefit that no artificial design restrictions must be imposed on production code for the sake of unit testing. Additionally, it becomes possible to write unit tests for legacy code, without the need for any prior adaptation or refactoring. In short, with a less restrictive mock testing tool the testability of production code becomes much less of an issue, and developers get more freedom in using Java language features, as well as more OO design choices.
Here is an example to demonstrate some basic JMockit features.  Say we have a Bookstore class, which is the class we want to test (also known as a "system under test", or SUT).  It has a dependency on a static method in a BookstoreService class, which is sometimes called a "collaborator class".  Let's take a look at the Bookstore class:

package com.olsonzoo.example.legacy;

/**
 * Legacy Code example, to demonstrate JMockit usage.
 *
 * @author Jeff Olson (jeff@olsonzoo.com)
 */
public class Bookstore {
    public String getBookTitle(String isbn) {
        return BookstoreService.getBookTitle(isbn);
    }
}

I could show you the implementation of BookstoreService, but in fact we don't really care about that, because to test Bookstore we want to mock out the BookstoreService.getBookTitle() method. This is because we are assuming, for the sake of example, that this method is actually doing something expensive like contacting a remote web service to get the book title. Since this is a static method, though, we need JMockit to come to our rescue. Here is how we would write a couple of tests to do just that:

package com.olsonzoo.example.legacy;

import mockit.Expectations;
import mockit.Mocked;
import org.junit.Test;

import static org.hamcrest.CoreMatchers.*;
import static org.junit.Assert.assertThat;

/**
 * Test class for Bookstore.
 *
 * @author Jeff Olson (jeff@olsonzoo.com)
 */
public class BookstoreTest {

    @Mocked
    private BookstoreService service;

    @Test
    public void testGetBookTitle() throws Exception {
        final String isbn = "999999999X";
        final String expectedTitle = "The Dilbert Principle";
        new Expectations() {{
            BookstoreService.getBookTitle(isbn);
            result = expectedTitle;
        }};
        Bookstore store = new Bookstore();
        String title = store.getBookTitle(isbn);
        assertThat(title, equalTo(expectedTitle));
    }

    @Test
    public void testGetBookTitle_NotFound() throws Exception {
        final String isbn = "9999999980";
        new Expectations() {{
            BookstoreService.getBookTitle(isbn);
            result = null;
        }};
        Bookstore store = new Bookstore();
        String title = store.getBookTitle(isbn);
        assertThat(title, equalTo(null));
    }
}

Notice a few things here. First, we have declared that BookstoreService is to be mocked out by JMockit by using the @Mocked annotation. Second, we put our expected behavior inside an Expectations block. (The double braces are there because we are instantiating an anonymous class and using an initialization block.) Inside the Expectations block we tell JMockit what call to expect and what the result should be. After the Expectations, we call the Bookstore.getBookTitle() method and then assert that the resulting value is what we expected to get.

And that's it. Because we used an Expectations block, which is strict by default, JMockit automatically does a verification at the end to make sure that the expected methods were actually called (and no more). There is also a NonStrictExpectations alternative, which allows you to be more lenient about which methods are called on the mocked classes, but in that case you have to do any verification yourself using a Verifications block, similar to the Expectations block. A rough sketch of that flavor follows.
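
Here is roughly what the non-strict flavor would look like for the same test. This is my own sketch based on the JMockit API (mockit.NonStrictExpectations and mockit.Verifications), not code from the Bookstore example project:

@Test
public void testGetBookTitle_nonStrict() throws Exception {
    final String isbn = "999999999X";
    final String expectedTitle = "The Dilbert Principle";

    // Non-strict recording: unexpected or missing calls are not failures by themselves...
    new NonStrictExpectations() {{
        BookstoreService.getBookTitle(isbn);
        result = expectedTitle;
    }};

    String title = new Bookstore().getBookTitle(isbn);
    assertThat(title, equalTo(expectedTitle));

    // ...so we verify the interaction explicitly afterwards.
    new Verifications() {{
        BookstoreService.getBookTitle(isbn);
        times = 1;
    }};
}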

And that is a simple example using JMockit's behavior-based testing support. Another alternative is state-based testing. Behavior-based testing is the approach typically used in EasyMock, and is concerned with testing the details of the interactions between the object under test and the collaborator object. State-based testing, on the other hand, tends to be used when the interactions are not as important and a "stub" object is used in place of the collaborator. Martin Fowler has a great overview of the differences between mocks and stubs if you are interested.

Here is a JMockit example that tests the same method in Bookstore, but this time using state-based testing.

package com.olsonzoo.example.legacy;

import com.google.common.collect.Maps;
import mockit.Mock;
import mockit.MockClass;
import mockit.Mockit;
import org.junit.BeforeClass;
import org.junit.Test;

import java.util.Map;

import static org.hamcrest.CoreMatchers.equalTo;
import static org.junit.Assert.assertThat;

/**
 * Test class for Bookstore.
 *
 * @author Jeff Olson (jeff@olsonzoo.com)
 */
public class BookstoreStateBasedTest {
    private static Map<String, String> bookMap = Maps.newHashMapWithExpectedSize(2);

    @BeforeClass
    public static void setup() {
        // register the mock class with JMockit so it replaces the real BookstoreService
        Mockit.setUpMocks(MockBookstoreService.class);

        bookMap.put("0553293354", "Foundation");
        bookMap.put("0836220625", "The Far Side Gallery");
    }

    @MockClass(realClass = BookstoreService.class)
    public static class MockBookstoreService {
        @Mock
        public static String getBookTitle(String isbn) {
            if (bookMap.containsKey(isbn)) {
                return bookMap.get(isbn);
            } else {
                return null;
            }
        }
    }

    @Test
    public void testGetBookTitle() throws Exception {
        final String isbn = "0553293354";
        final String expectedTitle = "Foundation";
        Bookstore store = new Bookstore();
        String title = store.getBookTitle(isbn);
        assertThat(title, equalTo(expectedTitle));
    }

    @Test
    public void testGetBookTitle_NotFound() throws Exception {
        final String isbn = "9999999980";
        Bookstore store = new Bookstore();
        String title = store.getBookTitle(isbn);
        assertThat(title, equalTo(null));
    }
}

Notice that the BookstoreService is replaced by MockBookstoreService using the @MockClass annotation (and registered with Mockit.setUpMocks() in the setup method). Likewise, the getBookTitle() method is replaced by the mocked version using the @Mock annotation. It's really that simple.

You can also mock out constructors and static initializers by using the special $init and $clinit methods. See the JMockit documentation for more details.
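
As a rough illustration of what that looks like, here is a hypothetical mock class. ExpensiveService and its constructor signature are invented for this example, and the $init parameters must match the real constructor being mocked:

import mockit.Mock;
import mockit.MockClass;

// Hypothetical example: suppress an expensive constructor and static initializer.
@MockClass(realClass = ExpensiveService.class)
public class MockExpensiveService {
    @Mock
    public static void $clinit() {
        // replaces ExpensiveService's static initializer: do nothing
    }

    @Mock
    public void $init(String configFile) {
        // replaces the ExpensiveService(String) constructor: skip the expensive setup
    }
}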


To include JMockit when running your unit tests, all you need to do is make sure you are running your tests under JDK 1.5 or later, and include the following line in your VM arguments (assuming your jmockit.jar is in the lib directory):

-javaagent:lib/jmockit.jar

Finally, what about PowerMock? PowerMock has a lot of the same features as JMockit. However, one thing I have seen in my limited experience with PowerMock is that it is much more low-level. You have to explicitly list, using the @PrepareForTest annotation, which classes you want PowerMock to prepare for mocking. JMockit is much easier to use in that regard.
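
To make the comparison concrete, here is roughly what the first Bookstore test might look like with PowerMock and its Mockito API. This is my own sketch of the usual pattern, not code from the project:

import org.junit.Test;
import org.junit.runner.RunWith;
import org.powermock.api.mockito.PowerMockito;
import org.powermock.core.classloader.annotations.PrepareForTest;
import org.powermock.modules.junit4.PowerMockRunner;

import static org.hamcrest.CoreMatchers.equalTo;
import static org.junit.Assert.assertThat;
import static org.mockito.Mockito.when;

@RunWith(PowerMockRunner.class)
@PrepareForTest(BookstoreService.class)   // every class whose statics get mocked must be listed here
public class BookstorePowerMockTest {

    @Test
    public void testGetBookTitle() throws Exception {
        final String isbn = "999999999X";
        final String expectedTitle = "The Dilbert Principle";

        PowerMockito.mockStatic(BookstoreService.class);
        when(BookstoreService.getBookTitle(isbn)).thenReturn(expectedTitle);

        String title = new Bookstore().getBookTitle(isbn);
        assertThat(title, equalTo(expectedTitle));
    }
}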

In conclusion, if you have legacy code that doesn't meet the traditional "testable" criteria, give JMockit a try.

Monday, March 15, 2010

Google Collections Goodness

I recently got permission at work to add Google Collections into our project.  There are a ton of great features of this library, so I wanted to share a couple of my favorites here.

Multimaps

Ever find yourself writing code like this?

import java.util.*;
public void doOldSchoolMultimap() {
    Map<String, List<Integer>> oddEvenMap = new HashMap<String, List<Integer>>();
    for (int i = 0; i < 100; i++) {
        if (i % 2 == 0) {
            String evenKey = "Even";
            if (!oddEvenMap.containsKey(evenKey)) {
                List<Integer> list = new ArrayList<Integer>();
                oddEvenMap.put(evenKey, list);
            }
            oddEvenMap.get(evenKey).add(i);
        } else {
            String oddKey = "Odd";
            if (!oddEvenMap.containsKey(oddKey)) {
                List<Integer> list = new ArrayList<Integer>();
                oddEvenMap.put(oddKey, list);
            }
            oddEvenMap.get(oddKey).add(i);
        }
    }

    for (Map.Entry<String, List<Integer>> entry : oddEvenMap.entrySet()) {
        for (Integer integer : entry.getValue()) {
            System.out.println("key=" + entry.getKey() + "; integer = " + integer);
        }
    }
}

This is a contrived example where we loop over the numbers 0 through 99 and put the even ones into a List inside a Map, with "Even" as the key. Ditto for the odd numbers.

Now how would you like to replace it with this code instead?

import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.ListMultimap;

import java.util.Collection;
import java.util.Map;

public void doGoogleMultimap() {
    ListMultimap<String, Integer> oddEvenMap = ArrayListMultimap.create();
    for (int i = 0; i < 100; i++) {
        if (i % 2 == 0) {
            oddEvenMap.put("Even", i);
        } else {
            oddEvenMap.put("Odd", i);
        }
    }

    for (Map.Entry<String, Collection<Integer>> entry : oddEvenMap.asMap().entrySet()) {
        for (Integer integer : entry.getValue()) {
            System.out.println("key=" + entry.getKey() + "; integer = " + integer);
        }
    }
}

Notice a couple of things about the second code block:
  1. No more checking to see if the List exists, then creating the List, etc.  The Google Multimap takes care of that for you.
  2. You only had to specify the generics on the left side, not on both sides.  Using the static factory method to create the ArrayListMultimap reduces the clutter considerably.

Predicates

A feature I have used to good effect in the past from Apache Commons Collections is its Predicate classes. Unfortunately, Commons Collections is not written for Java 1.5, so it doesn't have generics support. Google Collections to the rescue. Here's an example where we find all words that start with "Z" in a List of Strings.

import com.google.common.base.Predicate;
import com.google.common.collect.Collections2;
import com.google.common.collect.Lists;

import java.util.Collection;
import java.util.List;

public void doPredicateSearch() {
    Predicate<String> zWordFilter = new Predicate<String>() {
        public boolean apply(String s) {
            return s.toUpperCase().startsWith("Z");
        }
    };
    List<String> words = Lists.newArrayList("Able", "Baker", "Zulu");
    Collection<String> zWords = Collections2.filter(words, zWordFilter);
    for (String zWord : zWords) {
        System.out.println("zWord = " + zWord);
    }
}

Again, this is a very simplistic example, but the apply() method can be arbitrarily complex.  The Predicate object can also be reused as a function.  In fact, this is a step towards functional programming in Java. 

There are additional "pre-built" Predicates available from the Predicates class that allow you to create compound predicates that can do even more.
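
As a quick sketch of my own (not from the library docs), here is the same kind of filter built up with Predicates.and() and Predicates.not():

import com.google.common.base.Predicate;
import com.google.common.base.Predicates;
import com.google.common.collect.Collections2;
import com.google.common.collect.Lists;

import java.util.Collection;
import java.util.List;

public void doCompoundPredicateSearch() {
    Predicate<String> startsWithZ = new Predicate<String>() {
        public boolean apply(String s) {
            return s.toUpperCase().startsWith("Z");
        }
    };
    Predicate<String> shortWord = new Predicate<String>() {
        public boolean apply(String s) {
            return s.length() <= 4;
        }
    };

    // words that start with "Z" but are NOT short
    Predicate<String> longZWord = Predicates.and(startsWithZ, Predicates.not(shortWord));

    List<String> words = Lists.newArrayList("Zulu", "Zanzibar", "Baker");
    Collection<String> zWords = Collections2.filter(words, longZWord);
    for (String zWord : zWords) {
        System.out.println("zWord = " + zWord);   // prints only "Zanzibar"
    }
}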

Conclusion

Hopefully this quick overview has gotten you interested in Google Collections as a way to improve your Java code.