This program is part of the Comparative Programming :: Frequency Analysis set of examples.
Our Java example, while longer than most of the other frequency analysis programs, is fairly straightforward in its approach. To keep track of the number of character occurrences in our text, we want to create a map of characters to numbers, which we’ll increment every time we see a valid character. In lines 11 through 19 we create our Hashtable and initialize it with the letters ‘a’ through ‘z’, associating with each the value ‘0.0’.
We’ll need to reference the letters ‘a’ through ‘z’ twice in our program: once to create our Hashtable, and again to print the results. For this reason we create a string of letters in alphabetical order on line 12. We can iterate through the string with a simple for loop (lines 17 and 44) to access our character counts.
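The initialization described above might be sketched as follows. The class, method, and constant names are hypothetical, and the key type is assumed to be a one-character String (matching the substring-based extraction used later); the original program may differ in these details.

```java
import java.util.Hashtable;

class FreqTableSketch {
    // Letters in alphabetical order, reused for initialization and for printing.
    static final String LETTERS = "abcdefghijklmnopqrstuvwxyz";

    // Build a table that maps each one-letter string to a starting count of 0.0.
    static Hashtable<String, Double> newTable() {
        Hashtable<String, Double> freq = new Hashtable<String, Double>();
        for (int i = 0; i < LETTERS.length(); i++) {
            freq.put(LETTERS.substring(i, i + 1), 0.0);
        }
        return freq;
    }

    public static void main(String[] args) {
        Hashtable<String, Double> freq = newTable();
        System.out.println(freq.size() + " letters, 'a' starts at " + freq.get("a"));
    }
}
```

Keeping the alphabet in a single string means the same loop shape can be reused when printing the results in order, since a Hashtable does not guarantee any iteration order of its own.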
With our Hashtable ready, we can open and read our file. On line 22 we create a BufferedReader to access the contents of our text file line by line. Our while loop (line 26) extracts each line from the file, placing its value in the line variable. This while loop will continue until our input file’s readLine method returns null, which signifies the end of the file.
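A minimal sketch of the read-until-null loop is shown below. To keep it runnable without a text file, a StringReader stands in for the FileReader a real run would use; the class and method names are hypothetical, and wrapping the IOException in an unchecked exception is a simplification made only for this sketch.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

class ReadLinesSketch {
    // Count the lines available from any Reader using the readLine-until-null idiom.
    static int countLines(Reader source) {
        BufferedReader in = new BufferedReader(source);
        int lines = 0;
        String line;
        try {
            // readLine returns null once the end of the input is reached.
            while ((line = in.readLine()) != null) {
                lines++;
            }
        } catch (IOException e) {
            // Simplification for the sketch: rethrow as an unchecked exception.
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // In-memory text standing in for the contents of a file.
        System.out.println(countLines(new StringReader("first line\nsecond line\n")));
    }
}
```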
Once we have a line of text from our file, we need to break it down into characters, the fundamental unit of text for our analysis. To do this we use a for loop, starting on line 29, that counts from 0 up to the length of our line. Using the index of our for loop, we can sequentially extract the characters from our line of text (line 30) using the String class’s substring method.
We don’t want to count every character we come across; we’re only interested in letters. Line 33 of our program contains a simple regular expression to test our extracted character. The pattern [a-zA-Z] matches any single lower or upper case letter, but nothing else. If our character matches the regular expression, we want to include it in our frequency analysis, so we translate the character to lower case (in case it was an upper case letter), and increment our character frequency in the Hashtable (line 37). We increment this value by first retrieving the old value using our current character
freq.get(c) + 1.0
and then inserting our incremented value back into the Hashtable
freq.put(c, freq.get(c) + 1.0)
The last step in our for loop keeps track of the total number of valid characters encountered so far.
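Put together, the per-line work described above (extract a character, test it, lower-case it, increment its count, and track the running total) could look roughly like this. The names are hypothetical, the null check is an added safeguard for tables that were not pre-filled, and Locale.ROOT is an assumption made here so that lower-casing behaves the same on every system.

```java
import java.util.Hashtable;
import java.util.Locale;

class CountLettersSketch {
    // Add the letters of one line to freq and return how many letters were counted.
    static int countLine(String line, Hashtable<String, Double> freq) {
        int valid = 0;
        for (int i = 0; i < line.length(); i++) {
            // Extract a single character as a one-character String.
            String c = line.substring(i, i + 1);
            if (c.matches("[a-zA-Z]")) {
                c = c.toLowerCase(Locale.ROOT);
                // Treat a missing entry as zero (not needed if the table was pre-filled).
                Double old = freq.get(c);
                freq.put(c, (old == null ? 0.0 : old) + 1.0);
                valid++;
            }
        }
        return valid;
    }

    public static void main(String[] args) {
        Hashtable<String, Double> freq = new Hashtable<String, Double>();
        int total = countLine("Hello, World!", freq);
        System.out.println(total + " letters, l seen " + freq.get("l") + " times");
    }
}
```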
Once we’re done with our file, having read each line and extracted each valid character, we need to calculate our character frequencies and print them out (lines 44 to 48).
Using the string of letters from ‘a’ to ‘z’ that we defined earlier, we can get the key and value for each of the characters in our Hashtable. The key is the character itself, a letter from ‘a’ to ‘z’, while the value is how often the character occurred in the text file we analyzed, the number we computed in the course of analyzing the lines and characters of the file.
With the key and value in hand, it is a simple matter to determine the relative frequency of our character in the file (line 46). Printing the frequency of our character takes a little bit more work. We’d like the output of our program to look like:
a: 8.00
b: 1.51
c: 2.42
d: 3.90
e: 12.84
f: 2.20
. . .
t: 9.42
u: 3.02
v: 0.94
w: 2.28
x: 0.15
y: 2.16
z: 0.04
A first attempt at printing our results could be:
System.out.println(key + ": " + perc);
This correctly prints a letter, followed by a colon, a space, and our frequency, but the frequency is printed with far more digits than we want:
a: 7.9990618682967325
b: 1.5108759990916503
c: 2.4194118807679277
d: 3.9029257051809445
. . .
We want to limit our frequency to two decimal places. Thankfully there is a simple method in the String class for printing numbers and strings in a controlled manner. We can pass C-style format strings to the format method to create a better output string (line 47). The %2.2f instructs the format method to print our floating point value with a minimum total width of two characters and exactly two digits after the decimal point, rounding the value as needed.
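As a rough illustration of the percentage calculation and the formatting step, consider the sketch below. The class and method names and the exact format string around %2.2f are assumptions, and Locale.ROOT is passed only so the decimal separator is always a period regardless of system settings.

```java
import java.util.Locale;

class FormatSketch {
    // Relative frequency of one letter as a percentage of all counted letters.
    // Assumes total is greater than zero.
    static double percent(double count, double total) {
        return 100.0 * count / total;
    }

    // Format one output line such as "a: 8.00".
    static String formatLine(String key, double perc) {
        return String.format(Locale.ROOT, "%s: %2.2f", key, perc);
    }

    public static void main(String[] args) {
        System.out.println(formatLine("a", 7.9990618682967325)); // prints "a: 8.00"
        System.out.println(formatLine("e", percent(2.0, 8.0)));  // prints "e: 25.00"
    }
}
```

Because the width in %2.2f is only a minimum, a value such as 12.84 still prints in full; the width never truncates digits.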
While our frequency analysis is done, it’s important to note that our entire program is wrapped in a try-catch block, a construct of the Java language intended for exception handling. When working with any type of input/output, be it files, networks, or peripherals, there is the possibility that communication will break or otherwise be disrupted. Many Java classes leave it up to the programmer to handle these situations. In our case, opening and reading from a file can cause IOExceptions that we need to account for. In our program we don’t attempt to recover from the error, opting instead to print a stack trace showing where the error occurred (line 53).
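The shape of that error handling can be sketched as follows. This is not the original program's code: the names are hypothetical, the file name is chosen to be one that should not exist, and the try-with-resources form (Java 7 and later) is used here only to make sure the reader is closed.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

class TryCatchSketch {
    // Try to read the first line of a file; on failure, report it rather than recover.
    static boolean readFirstLine(String path) {
        try (BufferedReader in = new BufferedReader(new FileReader(path))) {
            System.out.println(in.readLine());
            return true;
        } catch (IOException e) {
            // No recovery attempted: print a stack trace showing where the error occurred.
            e.printStackTrace();
            return false;
        }
    }

    public static void main(String[] args) {
        // "missing-example.txt" is a hypothetical name for a file that does not exist.
        System.out.println(readFirstLine("missing-example.txt"));
    }
}
```

A missing file raises FileNotFoundException, which is a subclass of IOException, so the single catch clause covers both opening and reading failures.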
From the command line:
java freq republic.txt
This program is part of the Comparative Programming :: Echo set of examples.
This single Java class is about as bare-bones as a Java program can get. We utilize a Java 5 foreach loop to retrieve each program argument and print it back to System.out, finishing with a new line.
Program Source: echo.java
From the command line:
java echo Hello World
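For readers without the source at hand, an echo program of this kind might be sketched as below. The class and helper names are hypothetical, and joining the arguments with single spaces is an assumption about the intended output.

```java
class EchoSketch {
    // Join the program arguments with single spaces using a foreach loop.
    static String echo(String[] args) {
        StringBuilder out = new StringBuilder();
        boolean first = true;
        for (String arg : args) {
            if (!first) {
                out.append(' ');
            }
            out.append(arg);
            first = false;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Print the arguments back, finishing with a new line.
        System.out.println(echo(args));
    }
}
```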
The Wobble program is pure Java, utilizing the QTJava QuickTime capture extensions for video acquisition. In the program archive you will find:
javac -classpath .:VBP.JAR wobble.java
java -classpath .:VBP.JAR wobble
The SlideGame program is pure Java, utilizing the QTJava QuickTime capture extensions for video acquisition. In the program archive you will find:
javac -classpath .:VBP.JAR slideGame.java
java -classpath .:VBP.JAR slideGame
The Pointillism program is pure Java, utilizing the QTJava QuickTime capture extensions for video acquisition. In the program archive you will find:
javac -classpath .:VBP.JAR pointilism.java
java -classpath .:VBP.JAR pointilism