Asynchronous HTTP Requests in Perl Using AnyEvent


Summary

I need to make a number of HTTP requests from within a Perl program. There might be as many as a dozen (sometimes more). Doing them serially in a for loop would be too slow. So, I started exploring asynchronous programming in Perl using AnyEvent.

Wrong-Way Concurrency

I quickly found AnyEvent, described as the DBI of Perl event loops: you can use it as is, or as an abstraction over some other event-loop framework like Event or Tk. AnyEvent has an HTTP package called AnyEvent::HTTP.

BTW, if you're a Perl programmer who's been jealous of all the cool kids and their node.js, AnyEvent is how you do node.js-style programming in Perl.

The goal is to use a GET to request each URL, collect results, and then end once they're all done. Doing that requires using condition variables (built into AnyEvent) in a particular way. I put together information from a number of places to get a working demo, so I thought I'd blog it for others.

Before we see how it's implemented, though, here are the results:

[web@dev00 tmp]$ ./ae1.pl 
End of loop
done at ./ae1.pl line 8.
http://www.windley.com/  has length 86636 & loaded in  0.16s
https://www.google.com   has length 32408 & loaded in  0.29s
https://www.bing.com     has length 32022 & loaded in  0.37s
http://www.wetpaint.com  has length 93982 & loaded in  0.45s
http://www.example.com   has length 2966  & loaded in  0.49s
Total elapsed time: 0.507862091064453s

I'll explain in a minute why End of loop and done appear before the results, but notice that the program fetched five URLs and the total elapsed runtime is just a few hundredths of a second longer than the slowest single fetch. That's concurrency in action.

Here's how it works. We start off with the preliminaries:

#!/usr/bin/perl
use strict;
use AnyEvent;
use AnyEvent::HTTP;
use Time::HiRes qw(time);

The next step is to create a condition variable and give it a callback that tells us when everything is done.

my $cv = AnyEvent->condvar( cb => sub {
    warn "done";
});
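If condition variables are new to you, here's a minimal, self-contained sketch of how they behave (the timer here is a hypothetical stand-in for any asynchronous event source, such as an HTTP request): recv() blocks, running the event loop, until some callback calls send().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use AnyEvent;

# The cb fires when send() is eventually called on the condvar.
my $cv = AnyEvent->condvar(cb => sub { warn "done\n" });

# A timer stands in for any async event source (an HTTP request, a socket...).
my $timer = AnyEvent->timer(after => 0.1, cb => sub {
    $cv->send("hello from the event loop");
});

my $value = $cv->recv;   # blocks, running the event loop, until send() fires
print "$value\n";
```

The value passed to send() becomes the return value of recv(), which is exactly the plumbing the full program below relies on.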

Now some declarations, including the list of URLs to get.

my $urls = ["https://www.google.com",
            "http://www.windley.com/",
            "https://www.bing.com",
            "http://www.example.com",
            "http://www.wetpaint.com",
           ];

my $start = time;
my $result;

We call begin() on the condition variable $cv to mark the start of the concurrent section, passing a callback that calls send() on $cv so the result gets transmitted to the recv() at the end of the concurrent block. Each begin() increments a counter in the condition variable and each end() decrements it; the callback runs when the counter drops back to zero, which is how we know everything is done.

$cv->begin(sub { shift->send($result) });

Now we loop over the URLs.

for my $url (@$urls) {

Inside the loop, we call begin() again so that we count each pass through the loop. You'll see that we call end() to decrement the counter after each HTTP GET completes. When the counter gets back to zero, we know that every concurrent request has finished.

    $cv->begin;

    my $now = time;
    my $request;  

Now we make the request. The third argument is the callback subroutine. It won't be called until the HTTP GET is done. Notice that it takes the body and headers as arguments, processes them by pushing a line onto the @$result array, and then undefines $request to release the guard object that http_request returned (undefining the guard earlier would cancel an in-flight request). Finally, we call end() to decrement the condition variable's counter because this request is done.

    $request = http_request(
        GET => $url,
        timeout => 2, # seconds
        sub {
            my ($body, $hdr) = @_;
            if ($hdr->{Status} =~ /^2/) {
                push @$result, join("\t", $url,
                                    " has length ",
                                    $hdr->{'content-length'},
                                    " & loaded in ",
                                    time - $now,
                                    "s");
            } else {
                push @$result, join("",
                                    "Error for ", $url,
                                    ": (", $hdr->{Status}, ") ",
                                    $hdr->{Reason});
            }
            undef $request;
            $cv->end;
        }
    );

After the loop, we call end() one final time to match the begin() we called before the loop. When the counter reaches zero, the callback attached to that first begin() runs and sends the results to the recv() below.

}

$cv->end;
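To see the begin()/end() bookkeeping in isolation, here's a hypothetical sketch that uses timers in place of HTTP requests. The structure mirrors the code above: an outer begin() with a send callback, one begin()/end() pair per asynchronous task, and a final end() to balance the outer begin().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use AnyEvent;

my @results;
my $cv = AnyEvent->condvar;

$cv->begin(sub { shift->send(\@results) });   # counter: 1

my @timers;                                   # keep guard objects alive
for my $n (1 .. 3) {
    $cv->begin;                               # counter: 2, then 3, then 4
    push @timers, AnyEvent->timer(after => 0.01 * $n, cb => sub {
        push @results, $n;
        $cv->end;                             # one task done
    });
}

$cv->end;    # matches the outer begin; the counter can now reach zero

my $done = $cv->recv;                         # blocks until counter hits 0
print scalar(@$done), " tasks finished\n";    # prints "3 tasks finished"
```

Note that the final end() can't drop the counter to zero by itself; it only removes the "guard" added before the loop, so send() fires after the last task's end().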

The final chunk of code receives the result and prints it out.

warn "End of loop\n";
my $foo = $cv->recv;
print join("\n", @$foo), "\n" if defined $foo;
print "Total elapsed time: ", time - $start, "s\n";

So, hopefully at this point, you can make sense of the print order in the results:

[web@dev00 tmp]$ ./ae1.pl 
End of loop
done at ./ae1.pl line 8.
http://www.windley.com/  has length 86636 & loaded in  0.16s
https://www.google.com   has length 32408 & loaded in  0.29s
https://www.bing.com     has length 32022 & loaded in  0.37s
http://www.wetpaint.com  has length 93982 & loaded in  0.45s
http://www.example.com   has length 2966  & loaded in  0.49s
Total elapsed time: 0.507862091064453s

The End of loop prints immediately. The done prints when the results are received. As I play with this, I notice the order of the results changes as one Web site or another responds a little slower. I love seeing the whole thing come back in roughly the time it takes to do one call. That's going to be a big win.
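One knob worth knowing about if many of your URLs point at the same host: AnyEvent::HTTP limits simultaneous connections per host via the package variable $AnyEvent::HTTP::MAX_PER_HOST (the default has been a small number, 4, in the versions I've used), so same-host requests beyond that limit get queued rather than run in parallel. Raising it is a one-liner:

```perl
use strict;
use warnings;
use AnyEvent::HTTP;

# Allow up to a dozen parallel connections to a single host.
# The default is low, so many requests to one host would otherwise queue.
$AnyEvent::HTTP::MAX_PER_HOST = 12;
```

With URLs spread across different hosts, as in the demo above, the default never comes into play.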

I've put the code online so you don't have to cut and paste the pieces above. All you need to run it are the AnyEvent packages.