Sunday, October 22, 2006

Closures for Java (v0.3)

This post discusses a draft proposal for adding support for closures to the Java programming language for the Dolphin (JDK 7) release. It was carefully designed to interoperate with the current idiom of one-method interfaces. The latest version of the proposal and a prototype can be found at http://www.javac.info/.

We've just completed a major revision and simplification of the Closures for Java specification. Rather than post the specification on this blog, it is on its own web page, here. There are two versions of the specification: one with function types and one without. There is a pulldown menu at the top of the specification that makes it display only the text relevant to the version of the specification you want to see. Keeping the specification in this form will allow us to more easily maintain the two parallel specifications so we can compare them and delay a decision on this issue until later. These specifications are the starting point for our prototype.

There are two significant changes in this revision. First, there is a completely new syntax for closures and function types. Using the new syntax, with functon types, you can write

{int,int => int} plus = {int x, int y => x+y};

As you can see, we're proposing to add the new "arrow" token => to the syntax. Just to be perfectly clear, this code declares a variable of function type

{int,int => int}

which is a function that takes two ints and yields an int. The variable is named "plus" and it is assigned the closure value

{int x, int y => x+y}

which is a closure that receives two ints (named x and y) and yields their sum.

The second major change has to do with the treatment of "restricted" closures. We've done away with the "synchronized" parameters from the previous revision of the specification. Instead, you can inherit from a marker interface to restrict the closure conversion. If you don't use the marker interface, then closures are not restricted when converted to that type.

Another important change is to the meaning of a function type. It is now defined to be a system-provided interface type, and it is provided in a way that gives the required subtype relations among function types. That means that in order to invoke a value of function type, instead of simply placing arguments in parens after the function value, you use its "invoke" method. This significantly simplifies the name lookup rules for variables of function type. In fact, now there are no special rules at all.

As always, your feedback and ideas are welcome.

Friday, October 13, 2006

Concurrent Loops Using Java Closures

The java.util.concurrent package has very nice support for writing concurrent loops. I used it recently to improve the performance of some code in the Google Calendar Server. In the Calendar Server, we have a class representing an event on a calendar. The following method on an event computed the set of attendees to the event. The old code looked something like this:

public Collection<Principal> getAttendees() {
    List<Principal> result = new ArrayList<Principal>();
    for (EventResponse r : getResponses()) {
        if (r.mayAttend()) {
           
result.add(r.getAttendee());
        }

    }
    return result;
}

The problem with this code, it turned out, was that getAttendee() has to talk to a remote server to look up the Principal in a database, and although the remote server is very fast, it is far enough away that there is some latency. Since this loop is sequential, when there are many responses it spends most of its time waiting for a response from the remote server, over and over again. Consequently the call to getAttendees() was very slow for events with many attendees. One solution to this kind of problem would be to send a single batch request, and that is probably the best long-term solution to the problem, but the server in this case doesn't yet know how to handle batch requests. So our fix to the immediate performance problem was to use java.util.concurrent and make the requests in parallel; this did speed things up significantly:

public Collection<Principal> getAttendees() {
    final List<Principal> result = new ArrayList<Principal>();
    CompletionService<Void> ecs =
        new ExecutorCompletionService<Void>(threadPool);
    for (final EventResponse r : getResponses()) {
        ecs.submit(new Callable<Void>() {
            public Void call() {
                if (r.mayAttend()) {
                    try {

                        Principal attendee = r.getAttendee();
                        synchronized (result) {
                            result.add(attendee);
                        }
                    } catch (Throwable ex) {
                        LOGGER.log(Level.SEVERE, "Uncaught Exception", ex);
                    }
                }
                return null;
            }
        });
    }
    // wait for the tasks to complete
    for (final EventResponse r : getResponses()) {
        try {
            /*discard*/ ecs.take().get();
        } catch (InterruptedException ex) {
            throw new AssertionError(ex); // shouldn't happen
        } catch (ExecutionException ex) {
            LOGGER.log(Level.SEVERE, "Uncaught Exception", ex);
        }
    }
    return result;
}

When I find code like this I have a few reactions. First, my eyes glaze over at the complexity. Then, if I'm interested, I look carefully at the code to try to understand it. After all, I'm likely to want to do something like this again someday. Finally, I bookmark it so I can add it to my bag of tricks.
I don't think writing a concurrent loop should have to be so complex. Here is what I would like to have written:

public Collection<Principal> getAttendees() {
    List<Principal> result = new ArrayList<Principal>();
    for eachConcurrently(EventResponse r : getResponses(), threadPool) {
        if (r.mayAttend()) {
            Principal attendee = r.getAttendee();
            synchronized (result) {
                result.add(attendee);
            }
        }
    }
    return result;
}

You might think that in order to do this kind of thing we would need to add a concurrent looping statement to the language. Actually, it is possible to add the concurrent looping construct as a library method if you have closures in the language! It isn't trivial to write, but that's why we have people like Doug Lea in the world. An API like this "for eachConcurrently" thing should be written once by an expert and placed into the JDK for everyone to use.

What should happen if you use continue, break, or return within the body of this loop, or throw an exception? The continue case is easy: it just completes execution of that one iteration, and the other iterations proceed on their merry way. The semantics of break are a bit subtle, but obvious once you realize this is supposed to act like a loop: it completes the current iteration and cancels any other iterations that have not completed, and control returns to the caller of for eachConcurrently. Handling a return statement is similar: it cancels any uncompleted iterations and returns from the enclosing method, which in this case would be getAttendees. Finally, any exception that propagates out of a loop iteration cancels uncompleted iterations and propagates from the loop.

p.s.: Martin Buchholz offered the follow improved version of the code using java.util.concurrent:

public Collection<Principal> getAttendees() {
    final Collection<Principal> result
        = new ConcurrentLinkedQueue<Principal>();

    final Collection<Callable<Object>> tasks
        = new ArrayList<Callable<Object>>();
    for (final EventResponse r : getResponses())
        tasks.add(Executors.callable(new Runnable() { public void run() {
            try {
                if (r.mayAttend())
                    result.add(r.getAttendee());
            } catch (Throwable ex) {
                LOGGER.log(Level.SEVERE, "Uncaught Exception", ex);
            }}}));
    try {
        threadpool.invokeAll(tasks);
    } catch (InterruptedException ex) {}
    return new ArrayList<Principal>(result);

}

Monday, October 02, 2006

Iterative control abstraction: user-defined loops

A couple of people have written about using Java closures to write methods that act as looping constructs. For example, iterating through a map is currently an ugly exercise in boilerplate:

Droid seekDroid(Map<String,Droid> map) {
    for (Map.Entry<String,Droid> entry : map.entrySet()) {
        String droidName = entry.getKey();
        Droid droid = entry.getValue();
        if (droid.isSought()) {
            System.out.printf("'%s' is the droid we seek.%n", droidName);
            return droid;
        }
    }
}

It has been proposed that we add looping methods to Collection and Map that take advantage of closures, but it would be incompatible to add methods to an existing interface. We can accomplish the same thing using static methods. With one small additional twist to the closures proposal, we could make it even easier to write a method that iterates over maps. Here is what we want our client to look like:

import static java.util.Collections.eachEntry;
...
Droid seekDroid(Map<String,Droid> map) {
    for eachEntry(String droidName, Droid droid : map) {
        if (droid.isSought()) {
            System.out.printf("'%s' is the droid we seek.%n", droidName);
            return droid;
        }
    }
}

To enable this, the following would be added to java.util.Collections:

public interface Block2<K,V,throws E> {
    void invoke(K k, V v) throws E;
}
public static <K,V,throws E>
void for eachEntry(Map<K,V> map, Block2<K,V,E> block) throws E {

    for (Map.Entry<K,V> entry : map.entrySet()) {
        block.invoke(entry.getKey(), entry.getValue());
    }
}

Notice the "for" qualifier on the declaration of eachEntry, and the for keyword on its invocation. Methods defined this way and then invoked using the control abstraction syntax would require the use of the for keyword on the invocation. The compiler arranges such control abstractions to act like loops, by properly interpreting break and continue statements within the body of the closure. Although this example doesn't use break or continue statements, the mere presence of the for keyword on the invocation helps the reader understand the code. In this way user-defined control constructs could be written that appear and act as loops.