Wednesday 26 December 2007

Sunday 16 December 2007

Concurrent fun

Somebody asked a question on Experts Exchange about how to lock an object so that writes are exclusive but concurrent reads are allowed. Locking on a data-holding object using synchronized methods of that object is an all-or-nothing affair: a thread acquiring the monitor, whether to read or to write, locks out every other thread that wants to enter a synchronized method of that object. The java.util.concurrent package, present in the JDK since 1.5 and based on the work of Doug Lea, makes this granularity of lock easy to achieve. I'm glad to say I am rarely forced to write multi-threaded code, so I thought I'd write something to acquaint myself better with this package, and the little game of chance below is the result. The log4j classes should be in your classpath and your log4j.properties should be something like this one. The executable jar is here.
package net.proteanit.demo;

import org.apache.log4j.Logger;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

enum AccessType {
   READER,
   WRITER;
}

/**
* This is a simple game of chance to demonstrate ReadWrite locks in Java: a
* writer acquires a write lock and rewrites a List by shuffling it.
* If this ends up being in the correct order, the next reading thread to read
* the data wins the game
*
* @author Charles Johnson
* @version 1.0
 */
public class LockGame {
   private final ReentrantReadWriteLock rwl;
   private final List<String> data;
   private final int correctOrder;
   private final Lock r;
   private final Lock w;
   private final Logger log;
   private ExecutorService threads;
   private volatile boolean playerHasWon;

   public LockGame() {
       log = Logger.getLogger(LockGame.class);
       rwl = new ReentrantReadWriteLock();
       data = new ArrayList<String>();
       data.add("alpha");
       data.add("beta");
       data.add("gamma");
       correctOrder = data.hashCode();
       r = rwl.readLock();
       w = rwl.writeLock();
   }

   private void start() {
       final int SIZE = 16;
       List<Callable<Void>> workers = new ArrayList<Callable<Void>>(SIZE);
       for (int i = 0; i < SIZE; i++) {
           workers.add(new ReaderWriter());
       }
       threads = Executors.newFixedThreadPool(SIZE, new PrettyPrintThreadFactory(99, "thread-"));
       try {
           if (log.isDebugEnabled()) {
               log.debug("Invoking all players...");
           }
           threads.invokeAll(workers);
       } catch (InterruptedException e) {
           e.printStackTrace();
       } catch (RejectedExecutionException e) {
           // Ignore this - some thread has won
       } finally {
           // Another thread might have already called this
           stop();
       }
   }

   private void stop() {
       if (!threads.isShutdown()) {
           threads.shutdownNow();
       }
       if (!playerHasWon && log.isDebugEnabled()) {
           log.debug("There was no winner on that run");
       }
   }

   private void changeData() {
       // Acquire before the try so that finally never unlocks a lock we don't hold
       w.lock();
       try {
           if (log.isDebugEnabled()) {
               log.debug(String.format("Acquired lock in changeData() at %d", System.currentTimeMillis()));
           }
           Collections.shuffle(data);
           if (log.isDebugEnabled()) {
               log.debug(String.format("Data changed - now %s", data.toString()));
           }
           try { Thread.sleep((int)(Math.random() * 1000)); } catch(InterruptedException e) { /* ignore it */ }
       } finally {
           w.unlock();
           if (log.isDebugEnabled()) {
               log.debug(String.format("Released lock in changeData() at %d", System.currentTimeMillis()));
           }
       }
   }

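   private void readData() {
       // We can't be sure that all threads can be closed down
       // after a single winner emerges so we make the boolean check.
       // It must come before the try block: returning early from inside it
       // would run r.unlock() in finally for a read lock this thread never
       // acquired (an IllegalMonitorStateException on current JDKs)
       if (playerHasWon) {
           return;
       }
       r.lock();
       try {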
           if (log.isDebugEnabled()) {
               log.debug(String.format("Acquired lock in readData() at %d", System.currentTimeMillis()));
           }
           if (log.isDebugEnabled()) {
               log.debug(String.format("Reading data - now %s", data.toString()));
           }
           if (data.hashCode() == correctOrder) {
               if (log.isDebugEnabled()) {
                   log.debug("I'VE WON! Data now correct");
               }
               playerHasWon = true;
               stop();
           }
           else {
               try { Thread.sleep((int)(Math.random() * 1000)); } catch(InterruptedException e) { /* ignore it */ }
           }
       } finally {
           r.unlock();
           if (log.isDebugEnabled()) {
               log.debug(String.format("Released lock in readData() at %d", System.currentTimeMillis()));
           }
       }
   }

   public static void main(String[] args) {
       LockGame lg = new LockGame();
       lg.changeData();
       lg.start();
   }

   private class ReaderWriter implements Callable<Void> {
       private AccessType access;

       public ReaderWriter() {
           // Roughly 1 in 5 chance of being a writer, else reader
           access = (Math.random() > 0.8) ? AccessType.WRITER : AccessType.READER;
       }

       public Void call() {
           if (log.isDebugEnabled()) {
               log.debug(String.format("This ReaderWriter is of type %s",
                       access));
           }
           switch (access) {
           case WRITER:
               changeData();
               return null;
           case READER:
               readData();
               return null;
           default:
               return null;
           }
       }
   }
}
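
The listing refers to a PrettyPrintThreadFactory that isn't shown in this post. As a rough stand-in, and purely as an assumption about what it does (the log suggests thread names such as thread-00), something like the following would give zero-padded names. The class name PaddedNameThreadFactory and the meaning given to the first constructor argument (the highest index, used only to work out the padding width) are hypothetical.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

/**
 * Hypothetical stand-in for the PrettyPrintThreadFactory used by LockGame.
 * ASSUMPTION: the first argument is the highest index expected and only
 * decides how many digits the number is padded to.
 */
final class PaddedNameThreadFactory implements ThreadFactory {
    private final String prefix;
    private final int width;
    private final AtomicInteger next = new AtomicInteger(0);

    public PaddedNameThreadFactory(int maxIndex, String prefix) {
        this.prefix = prefix;
        this.width = String.valueOf(maxIndex).length();
    }

    /** Names threads prefix + zero-padded counter, e.g. thread-00, thread-01. */
    public Thread newThread(Runnable task) {
        String number = String.format("%0" + width + "d", next.getAndIncrement());
        return new Thread(task, prefix + number);
    }

    public static void main(String[] args) {
        ThreadFactory factory = new PaddedNameThreadFactory(99, "thread-");
        for (int i = 0; i < 3; i++) {
            Thread t = factory.newThread(new Runnable() {
                public void run() { }
            });
            System.out.println(t.getName());
        }
    }
}
```

To try it with the game, it would have to be passed to Executors.newFixedThreadPool in place of the original factory.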

The log file produced gives us some interesting information, but first we'll do a little processing on it:
 1: 14:43:50,957 [main] Acquired lock in changeData() at 1197902630929
 2: 14:43:51,486 [main] Released lock in changeData() at 1197902631485
 3: 14:43:51,567 [thread-00] Acquired lock in changeData() at 1197902631566
 4: 14:43:51,869 [thread-00] Released lock in changeData() at 1197902631869
 5: 14:43:51,872 [thread-01] Acquired lock in readData() at 1197902631871
 6: 14:43:51,876 [thread-03] Acquired lock in readData() at 1197902631875
 7: 14:43:52,070 [thread-03] Released lock in readData() at 1197902632069
 8: 14:43:52,636 [thread-01] Released lock in readData() at 1197902632633
 9: 14:43:52,640 [thread-04] Acquired lock in changeData() at 1197902632637
10: 14:43:53,626 [thread-04] Released lock in changeData() at 1197902633621
11: 14:43:53,628 [thread-06] Acquired lock in readData() at 1197902633627
12: 14:43:53,631 [thread-08] Acquired lock in readData() at 1197902633630
13: 14:43:53,749 [thread-06] Released lock in readData() at 1197902633749
14: 14:43:54,322 [thread-08] Released lock in readData() at 1197902634321
15: 14:43:54,323 [thread-07] Acquired lock in changeData() at 1197902634323
16: 14:43:55,310 [thread-07] Released lock in changeData() at 1197902635305
17: 14:43:55,312 [thread-09] Acquired lock in readData() at 1197902635311
18: 14:43:55,315 [thread-02] Acquired lock in readData() at 1197902635314
19: 14:43:55,493 [thread-09] Released lock in readData() at 1197902635493
20: 14:43:55,514 [thread-02] Released lock in readData() at 1197902635509
21: 14:43:55,516 [thread-10] Acquired lock in changeData() at 1197902635516
22: 14:43:55,554 [thread-10] Released lock in changeData() at 1197902635553
23: 14:43:55,556 [thread-05] Acquired lock in readData() at 1197902635555
24: 14:43:55,561 [thread-12] Acquired lock in readData() at 1197902635559
25: 14:43:55,918 [thread-05] Released lock in readData() at 1197902635917
26: 14:43:56,546 [thread-12] Released lock in readData() at 1197902636545
27: 14:43:56,547 [thread-13] Acquired lock in changeData() at 1197902636547
28: 14:43:57,118 [thread-13] Released lock in changeData() at 1197902637117
29: 14:43:57,119 [thread-14] Acquired lock in readData() at 1197902637119
30: 14:43:57,122 [thread-11] Acquired lock in readData() at 1197902637121
31: 14:43:57,124 [thread-15] Acquired lock in readData() at 1197902637123
32: 14:43:57,394 [thread-14] Released lock in readData() at 1197902637393
33: 14:43:57,534 [thread-15] Released lock in readData() at 1197902637533
34: 14:43:58,010 [thread-11] Released lock in readData() at 1197902638009
Lines 1 and 2 show the main application thread scrambling the data from its initial ordered state. Of course there's a one-in-six chance (three items have 3! = 6 orderings) that the shuffle could leave it unchanged, but let's live dangerously.
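For the curious, the odds can be checked by brute force. This is only a sketch and not part of the game: the class ShuffleOdds and its methods are made-up names. It lists every ordering of the three strings, counts how many equal the original order, and also counts the distinct List hash codes, since the game detects a win by comparing data.hashCode() with the hash code of the initial order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical helper: enumerates the orderings a shuffle can produce. */
final class ShuffleOdds {
    /** Builds every ordering by swapping each remaining item into position k. */
    static void permute(List<String> items, int k, List<List<String>> out) {
        if (k == items.size()) {
            out.add(new ArrayList<String>(items));
            return;
        }
        for (int i = k; i < items.size(); i++) {
            Collections.swap(items, k, i);
            permute(items, k + 1, out);
            Collections.swap(items, k, i); // undo the swap before trying the next item
        }
    }

    /** Returns all orderings of items without modifying the caller's list. */
    static List<List<String>> orderings(List<String> items) {
        List<List<String>> out = new ArrayList<List<String>>();
        permute(new ArrayList<String>(items), 0, out);
        return out;
    }

    public static void main(String[] args) {
        List<String> original = Arrays.asList("alpha", "beta", "gamma");
        int matches = 0;
        Set<Integer> hashes = new HashSet<Integer>();
        List<List<String>> all = orderings(original);
        for (List<String> p : all) {
            if (p.equals(original)) {
                matches++;
            }
            hashes.add(p.hashCode()); // List.hashCode() depends on element order
        }
        System.out.println(all.size() + " orderings, " + matches
                + " in the original order, " + hashes.size() + " distinct hash codes");
    }
}
```

If the number of distinct hash codes equals the number of orderings, comparing hash codes is a safe stand-in for comparing the lists themselves, at least for this particular data.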
The important thing to note is that in places like lines 9-10 and 21-22 the writing thread acquires and releases its lock in an orderly fashion, with no other thread holding a lock in between.
In contrast, in places like lines 5-8, concurrent threads co-exist happily when only reading is being done. 'thread-01' gets a read lock in line 5, followed by 'thread-03' in line 6, before thread-01 has released its lock in line 8.
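
Stripped of the game and the logging, the locking pattern comes down to something like the sketch below. GuardedList is a made-up class used only for illustration. The one habit worth copying is to call lock() before the try block and unlock() in finally, so that unlock() is only ever reached by a thread that actually holds the lock.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

/** Hypothetical holder class showing exclusive writes and shared reads. */
final class GuardedList {
    private final ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
    private final Lock r = rwl.readLock();
    private final Lock w = rwl.writeLock();
    private final List<String> data = new ArrayList<String>();

    /** Writers are exclusive: no reader or other writer holds a lock meanwhile. */
    public void add(String item) {
        w.lock(); // acquire BEFORE entering the try block
        try {
            data.add(item);
        } finally {
            w.unlock(); // only reached once lock() has succeeded
        }
    }

    /** Any number of threads may hold the read lock at the same time. */
    public List<String> snapshot() {
        r.lock();
        try {
            return new ArrayList<String>(data); // copy, so callers never see later writes
        } finally {
            r.unlock();
        }
    }

    public static void main(String[] args) {
        GuardedList list = new GuardedList();
        list.add("alpha");
        list.add("beta");
        System.out.println(list.snapshot());
    }
}
```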

Friday 14 December 2007

Blogger fix

I noticed yesterday that Blogger does not automatically escape html such as <p> when working in the editor in the 'Compose' tab. Since I'm going to be talking about markup, the first thing to do is to knock up a simple html escape utility so I can paste markup into this blog editor and have it come out correctly. The following should do the trick, implemented as a bash shell function, but probably isn't comprehensive:
# Escape html for unsmart html editors
function eschtml() {
  # Taking the input from a file is probably the easiest,
  # and let's save it to a file of the same name with
  # .esc appended as an extension.
  # The ampersand substitution has to run first
  sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g' "$1" >"$1.esc"
  echo "Escaped version now in $1.esc"
}
It seems to do the trick. The original file 'func.htm' looked like the above, but in order to post that, I had to paste the contents of func.htm.esc into the editor. And for recursion lovers, the latter looked like:
# Escape html for unsmart html editors
function eschtml() {
  # Taking the input from a file is probably the easiest,
  # and let's save it to a file of the same name with
  # .esc appended as an extension.
  # The ampersand substitution has to run first
  sed -e 's/&amp;/\&amp;amp;/g' -e 's/&lt;/\&amp;lt;/g' -e 's/&gt;/\&amp;gt;/g' "$1" &gt;"$1.esc"
  echo "Escaped version now in $1.esc"
}
Yes, you guessed it - that was made from the output of func.htm.esc.esc
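The same idea can be sketched in Java, the language used elsewhere on this blog. This is not a port of anything I actually use: HtmlEscape is a hypothetical class, and escaping the double quote is an addition the shell function doesn't make. Because it walks the text one character at a time, there is no ordering problem with the ampersand, whereas a chain of substitutions has to deal with & first or it would re-escape the entities it has just produced.

```java
/** Hypothetical single-pass HTML escaper; not comprehensive either. */
final class HtmlEscape {
    /** Replaces the characters that HTML treats specially with entities. */
    static String escape(String text) {
        StringBuilder sb = new StringBuilder(text.length() + 16);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;
                case '<': sb.append("&lt;"); break;
                case '>': sb.append("&gt;"); break;
                case '"': sb.append("&quot;"); break; // extra: not in the shell version
                default: sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String once = escape("<p>");
        System.out.println(once);         // escaped once
        System.out.println(escape(once)); // escaped twice, as in func.htm.esc.esc
    }
}
```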

Thursday 13 December 2007

Bloggered

It's very tempting to leave the post below entitled 'Truncating an mpeg' with its current mashed line spacing as a testament to what irritated me immensely for what seemed like a long time this evening. Just in case I don't, this is what it once looked like.
It all began yesterday when Mantex pointed out to me that the image in the post previous to this one didn't line up with the text. I discovered that Blogger allows the line breaks of standard 'typing' to be preserved by inserting <br> tags automatically. After removing the whitespace from that region, which corrected the mysterious alignment problem, I immediately turned off this 'helpful' feature.
Today I noticed that something had messed up the 'Truncating..' post so I started to look into it. After a while I thought, OK, I'll simply delete it all and repost it. This is what I posted into a blank field in the blog editor:
=======================================
Had to edit some video today in the most rudimentary fashion - namely to truncate a six minute mpeg to its first three minutes, the result of which is <a href="http://byrnegaunt.co.uk/tech/rez3.mpg">here</a>. After a bit of hunting around I came up with <a href="http://mpgtx.sourceforge.net">mpgtx</a>, which looked like the tool for the job, and one with which my tottering system could cope. Figuring out the correct command from the man page took me rather longer than I'd have liked, but I finally came up with <div><pre>mpgtx reza.mpg '[-03:00]' -b rez3</pre></div>which gave me a file with the first three minutes (or very nearly) in it called rez3.mpg. I got a message on the command line telling me something to the effect that the file had been cut in a place that was less than completely desirable and that were I to try to concatenate it with another bit, there could be a problem. I ignored this, since I'd never need to do this, but the whole business got me thinking along two lines. <br> Firstly, I find my thoughts being often drawn to the disparity between the abstract beauty of the engineering and its principles that allow us to do these things and the banality of what we actually use it for. <div style="font-style: italic;">What did you do today?</div> <div style="color: rgb(0, 0, 255);">Oh, I cut a video in half. </div> <div style="font-style: italic;">How long did that take you?</div> <div style="color: rgb(0, 0, 255);">Err...</div> <br> Now anything in excess of five minutes in response to that last question seems frankly embarrassing. OK I'm not a video editor, but even so. Of course, the author of mpgtx probably took many hours writing the stuff that allowed me to do that. <br> Secondly, why aren't there more sites, centralising sites where one can find good examples of the use of command line utilities. Some man pages are good in this respect, providing some relevant examples but normally one simply finds sites that regurgitate those very same examples. If this were organised it should be possible to find relevant examples much more easily.
=======================================
The top part of what actually got posted is this:
=======================================
<div class='post-body entry-content'> <p>Had to edit some video today in the most rudimentary fashion - namely to truncate a six minute mpeg to its first three minutes, the result of which is <a href="http://byrnegaunt.co.uk/tech/rez3.mpg">here</a>. After a bit of hunting around I came up with <a href="http://mpgtx.sourceforge.net">mpgtx</a>, which looked like the tool for the job, and one with which my tottering system could cope. Figuring out the correct command from the man page took me rather longer than I'd have liked, but I finally came up with </p><div><pre>mpgtx reza.mpg '[-03:00]' -b rez3</pre></div>which gave me a file with the first three minutes (or very nearly) in it called rez3.mpg. I got a message on the command line telling me something to the effect that the file had been cut in a place that was less than completely desirable and that were I to try to concatenate it with another bit, there could be a problem. I ignored this, since I'd never need to do this, but the whole business got me thinking along two lines. <br>
=======================================
Wait a minute - I didn't put that <p> in there! Look how it starts at the beginning and closes, seemingly arbitrarily, just before the code quote. Blogger puts these in of course, but the tag should close at the end of the posting. We're getting somewhere. Now why would the enclosed paragraph have a different line spacing from the rest of the post? If you look below, you'll see it does.
Because of the aforementioned formatting 'aids', I started looking around at whitespace issues, and after drawing a blank there, came up with the notion that the <p> tag must be redefined by CSS in terms of line spacing. Sure enough, if we look at the code for the default template, we see
.post p {
  margin:0 0 .75em;
  line-height:1.6em;
}
To be continued...

Tuesday 11 December 2007

Joomla

More work on the ByrneGaunt website. This uses the content management system Joomla with an e-commerce plugin called ECJC.

The idea of using a CMS is to avoid having to code pages oneself and to supply a framework of services and functionality around which a website can be built. You might need to be able to organise subscriptions to your site and send out newsletters. Getting that sort of stuff out of the box is a boon. A further aspiration may be to set up a site for an organisation such that the non-technical can edit it themselves although I'm not entirely convinced that leaves the realms of theory that often.

I like to write as little code as possible so have taken one of the many free templates that are available and adapted it. It's in this area though that one needs to dive in and alter the code on occasions. For instance, the template I've chosen for the front page likes to organise news items in long columns as in a magazine. I've yet to find out how the template works together with the rendering system to do this, and so far I've found detailed documentation on how this comes about hard to come by. There are usability annoyances too: having to click on both Apply and Save when editing image placement in text, and finding the WYSIWYG html editor so difficult to use, because the text was invisible with the template I'm using, that I had to turn WYSIWYG off. Nonetheless I'd recommend Joomla.

Monday 10 December 2007

Truncating an mpeg

Had to edit some video today in the most rudimentary fashion - namely to truncate a six minute mpeg to its first three minutes, the result of which is here. After a bit of hunting around I came up with mpgtx, which looked like the tool for the job, and one with which my tottering system could cope. Figuring out the correct command from the man page took me rather longer than I'd have liked, but I finally came up with
mpgtx reza.mpg  '[-03:00]'  -b rez3
which gave me a file with the first three minutes (or very nearly) in it called rez3.mpg. I got a message on the command line telling me something to the effect that the file had been cut in a place that was less than completely desirable and that were I to try to concatenate it with another bit, there could be a problem. I ignored this, since I'd never need to do this, but the whole business got me thinking along two lines.
Firstly, I often find my thoughts drawn to the disparity between the abstract beauty of the engineering, and of the principles that allow us to do these things, and the banality of what we actually use it for.
What did you do today?
Oh, I cut a video in half.
How long did that take you?
Err...

Now anything in excess of five minutes in response to that last question seems frankly embarrassing. OK I'm not a video editor, but even so. Of course, the author of mpgtx probably took many hours writing the stuff that allowed me to do that.
Secondly, why aren't there more sites, centralising sites, where one can find good examples of the use of command line utilities? Some man pages are good in this respect, providing some relevant examples, but normally one simply finds sites that regurgitate those very same examples. If this were organised it should be possible to find relevant examples much more easily.

FtpCrawl

No, not the stroke that Duncan Goodhew never learned but a utility I wrote to take some of the aggravation out of using a cheap hosting company with no proper remote access other than through a slow and clunky control panel. I put all sorts of stuff up there and then forget where. FTP itself doesn't support searching in the protocol so we have to do it in software. FtpCrawl simply burrows down listing all files, starting from a given directory. It's as bare as that and the minimal documentation (!) can be seen from running the Java jar without arguments as
java -jar ftpcrawl.jar
The output can be piped through grep or, if you're unlucky enough, through Windows find, although I think you might find grep on some Windows systems. You can get it here. It's only a naïve domestic utility but I think you'll be amused by its recursion.
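
For anyone wondering what the burrowing amounts to, here is a minimal sketch of the idea rather than the FtpCrawl source itself. Everything in it is an assumption made for illustration: CrawlSketch, the Lister interface (a stand-in for whatever issues the FTP directory listing) and the convention that directory names end in a slash. A small in-memory tree takes the place of a real server.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of a recursive listing; no real FTP traffic involved. */
final class CrawlSketch {
    /** Stand-in for an FTP directory listing. ASSUMPTION: directory names end with '/'. */
    interface Lister {
        List<String> list(String dir);
    }

    /** Depth-first walk: collect files, recurse into each subdirectory. */
    static void crawl(Lister lister, String dir, List<String> found) {
        for (String name : lister.list(dir)) {
            String path = dir + name;
            if (name.endsWith("/")) {
                crawl(lister, path, found);
            } else {
                found.add(path);
            }
        }
    }

    public static void main(String[] args) {
        // A fake remote tree keyed by directory path
        final Map<String, List<String>> tree = new HashMap<String, List<String>>();
        tree.put("/", Arrays.asList("index.html", "tech/"));
        tree.put("/tech/", Arrays.asList("notes.txt", "old/"));
        tree.put("/tech/old/", Collections.<String>emptyList());

        Lister fake = new Lister() {
            public List<String> list(String dir) {
                List<String> names = tree.get(dir);
                return names == null ? Collections.<String>emptyList() : names;
            }
        };

        List<String> found = new ArrayList<String>();
        crawl(fake, "/", found);
        for (String path : found) {
            System.out.println(path); // one path per line, so the output suits grep
        }
    }
}
```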