Longest Words Followup – Java -v- Perl
Filed Under 42 (Life the Universe & Everything), Computers & Tech on November 8, 2010 at 6:40 pm
Yesterday I posted about using Perl to solve the question “what’s the longest word I can type with just half a keyboard?”. Connor and I were joking that it would be a lot more difficult with Java, first to write the code, then to run it.
I literally used the identical algorithm for the Java program, even using the same variable names, and printed the results out identically (I verified this with the Unix diff command). I also did my best to use the various built-in Java functionality and java.util classes to minimise the amount of heavy lifting my code had to do.
So, this is the resulting code:
    import java.util.Vector;
    import java.util.Enumeration;
    import java.io.BufferedReader;
    import java.io.File;
    import java.io.FileReader;

    public class dict{
      public static void main(String args[]){
        //declare the needed variables
        String longestLeft="", longestRight="";
        int minLength = 10;
        Vector<String> longLeftWords = new Vector<String>();
        Vector<String> longRightWords = new Vector<String>();

        try{
          //open the dictionary file
          File file = new File(args[0]);
          BufferedReader reader = null;
          reader = new BufferedReader(new FileReader(file));

          // loop through the file
          String line;
          while((line = reader.readLine()) != null){
            // remove the trailing new line character from the string
            line = line.replaceAll("\n|\r", "");

            // check for characters on the right, if not, then it's an all-left word
            if(!line.toLowerCase().matches(".*[yuiophjklnm].*")){
              if(line.length() >= minLength){
                longLeftWords.add(line);
              }
              if(line.length() > longestLeft.length()){
                longestLeft = line;
              }
            }

            //vica-versa
            if(!line.toLowerCase().matches(".*[qwertasdfgzxcvb].*")){
              if(line.length() >= minLength){
                longRightWords.add(line);
              }
              if(line.length() > longestRight.length()){
                longestRight = line;
              }
            }
          }

          // close the dictionary file
          reader.close();
        }catch(Exception e){
          System.out.println("\n\nERROR - Failed to read the dictionary file '" + args[0] + "'\n");
          e.printStackTrace();
          System.exit(1);
        }

        // print the results
        System.out.println("\nLong words (at least " + minLength + " letters) with the left-side of the KB only:");
        Enumeration words = longLeftWords.elements();
        while(words.hasMoreElements()){
          System.out.println("\t" + (String)words.nextElement());
        }
        System.out.println("\t\t(total: " + longLeftWords.size() + ")");
        System.out.println("\nLong words (at least " + minLength + " letters) with the right-side of the KB only:");
        words = longRightWords.elements();
        while(words.hasMoreElements()){
          System.out.println("\t" + (String)words.nextElement());
        }
        System.out.println("\t\t(total: " + longRightWords.size() + ")");
        System.out.println("\nLongest left-only word: " + longestLeft + " (" + longestLeft.length() + " letters)");
        System.out.println("\nLongest right-only word: " + longestRight + " (" + longestRight.length() + " letters)\n");
      }
    }
The obvious thing is that it’s longer than yesterday’s final deluxe Perl solution, about twice as long in fact. The code is also much wordier, with the lines being longer than in the Perl version. There’s also a heck of a lot of ‘fluff’ in Java. In Perl, reading a line of input literally takes two characters (<>), while in Java it takes about six lines when you include the mandatory exception handling. Getting a variable-length array is also far more cumbersome. Using java.util.Vector helps a lot, but in this code it means using java.util.Enumeration to iterate through the vector for printing instead of a simple foreach loop like in Perl. Finally, notice how much clunkier the regular expressions are! Java has nothing as trivial as Perl’s m operator.
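For what it's worth, the printing step does not strictly need an Enumeration. Below is a minimal sketch using the enhanced for loop that Java has had since version 5, which works on any java.util.List (Vector included). The class name ForEachSketch, the helper formatWords and the two sample words are made up for illustration, and this is not the code that was timed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical class name, not part of the original dict program.
final class ForEachSketch {

    // Builds a listing in the same shape as the Enumeration loops above:
    // one tab-indented word per line, followed by a total.
    static String formatWords(List<String> words) {
        StringBuilder out = new StringBuilder();
        for (String word : words) { // enhanced for loop, no Enumeration or cast needed
            out.append('\t').append(word).append('\n');
        }
        out.append("\t\t(total: ").append(words.size()).append(")\n");
        return out.toString();
    }

    public static void main(String[] args) {
        // Sample data for illustration only, not results from the dictionary run.
        List<String> longLeftWords = new ArrayList<String>();
        longLeftWords.add("stewardesses");
        longLeftWords.add("aftereffects");
        System.out.print(formatWords(longLeftWords));
    }
}
```

It is still wordier than a Perl foreach, but the cast and the hasMoreElements()/nextElement() pair go away.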
OK, so the code is longer, more fluffy, and harder to read and write, but how does it run? The simple answer: slower! About three times slower, in fact:
    bartmbp:Temp bart$ time ./dict.pl /usr/share/dict/words >>/dev/null
    real    0m0.761s
    user    0m0.275s
    sys     0m0.010s
    bartmbp:Temp bart$ time java dict /usr/share/dict/words >>/dev/null
    real    0m2.391s
    user    0m2.230s
    sys     0m0.121s
    bartmbp:Temp bart$
Given that Perl is a scripting language and Java is at least partially compiled, you’d expect Java to have the edge. But, when it comes to pattern matching, Perl is in its element, while Java is really rather lost. I think it’s Java’s poor RE engine that’s making the difference here.
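One possible contributing factor, which I have not measured: String.matches() compiles its regular expression afresh on every call, so the program above compiles two patterns for every line of the dictionary. A minimal sketch of the same left/right test with the patterns compiled once might look like this. The class name HalfKeyboardSketch and the helpers isLeftOnly and isRightOnly are hypothetical names for illustration, and whether this would close the gap in the timings above is untested.

```java
import java.util.regex.Pattern;

// Hypothetical class name, not part of the original dict program.
final class HalfKeyboardSketch {

    // Compiled once and reused for every word, instead of once per call.
    private static final Pattern RIGHT_KEYS = Pattern.compile("[yuiophjklnm]");
    private static final Pattern LEFT_KEYS = Pattern.compile("[qwertasdfgzxcvb]");

    // A word is left-only if it contains no right-hand letter.
    // find() looks for a match anywhere, so no ".*" wrapping is needed.
    static boolean isLeftOnly(String word) {
        return !RIGHT_KEYS.matcher(word.toLowerCase()).find();
    }

    // A word is right-only if it contains no left-hand letter.
    static boolean isRightOnly(String word) {
        return !LEFT_KEYS.matcher(word.toLowerCase()).find();
    }

    public static void main(String[] args) {
        System.out.println(isLeftOnly("stewardesses")); // true
        System.out.println(isRightOnly("lollipop"));    // true
        System.out.println(isLeftOnly("keyboard"));     // false
    }
}
```

The character classes are the same ones used in the program above, so the two versions should classify words identically.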
So, there you have it, Perl really is quicker and simpler for messing with text. Who knew 😉