Patrick Dwyer

Frequency :: C

in News, Comparative Programming, C by patrick

This program is part of the Comparative Programming :: Frequency Analysis set of examples.

In our C example we don’t have a readily available construct akin to the Hashtable or Map of other languages, so we’ll use a simple array of 26 values to count our character frequency. Lines 10 through 13 declare our array of floating point numbers and initialize each value to 0. In the end our calculation of frequency will be based upon a floating point formula, so counting with floating point numbers instead of integers saves us a later conversion. (On typical platforms a float counts exactly only up to about 16.7 million, which is ample for a single book.)
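As a minimal sketch of an alternative to the loop on lines 10 through 13, C can also zero an array at the point of declaration with an initializer. The names LETTERS, all_zero, and init_styles_agree are hypothetical helpers for illustration and are not part of freq.c.

```c
#include <stddef.h>

#define LETTERS 26  /* hypothetical name for the number of counters */

/* Returns 1 if every one of the n counters is zero, 0 otherwise. */
int all_zero(const float *counts, size_t n) {
    for (size_t i = 0; i < n; i++) {
        if (counts[i] != 0.0f) {
            return 0;
        }
    }
    return 1;
}

/* Zeroes one array with a loop (as freq.c does) and one with an
   initializer, then reports whether both ended up all zero. */
int init_styles_agree(void) {
    float by_loop[LETTERS];
    float by_initializer[LETTERS] = {0};  /* remaining elements are zeroed too */

    for (int i = 0; i < LETTERS; i++) {
        by_loop[i] = 0.0f;
    }
    return all_zero(by_loop, LETTERS) && all_zero(by_initializer, LETTERS);
}
```

Either style leaves all 26 counters at zero; the initializer form is simply shorter.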

Reading a file is straightforward: we open a file in read mode ("r") on line 16, saving the resulting file handle as a FILE pointer. All of our file operations will use this file handle, starting with the while statement on line 18 that controls our character reading loop. The feof function returns true once a read has run past the End-Of-File, so our while loop continues until that happens (notice the ! before feof, which turns our statement into “while not end of file”). Because the end-of-file flag is set only after a read fails, the final fgetc call returns the special value EOF; that value falls outside both letter ranges, so it is never counted.

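To count our character frequency, we’ll read the input file one character at a time with the fgetc function (line 21), storing each character in the c variable as it is read. We declare c as an int rather than a char because fgetc returns an int, which lets it report end of file with the value EOF.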
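A common alternative to testing feof at the top of the loop is to test the return value of fgetc directly, so the loop stops the moment a read fails. The sketch below assumes that approach; count_chars and count_chars_in_file are hypothetical names, not part of freq.c, and the NULL check after fopen is an addition that the original program does not make.

```c
#include <stdio.h>

/* Hypothetical helper: counts every character readable from an open stream. */
long count_chars(FILE *in) {
    int c;       /* int, not char, so that EOF stays distinguishable */
    long n = 0;

    /* Read and test in one step: the loop body never sees EOF. */
    while ((c = fgetc(in)) != EOF) {
        n++;
    }
    return n;
}

/* Hypothetical helper: opens path, counts its characters, and closes it.
   Returns -1 if the file cannot be opened. */
long count_chars_in_file(const char *path) {
    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return -1;
    }
    long n = count_chars(f);
    fclose(f);
    return n;
}
```

The same loop shape would work for the letter counting in freq.c, with the two range checks placed in the loop body.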

For our frequency analysis we’re working with ASCII encoded files, so a lowercase letter will fall in the numeric range of 97 to 122 (a to z), and an uppercase letter will be in the range 65 to 90 (A to Z). Two if statements (lines 22 and 28) check for a character in the upper or lower case range. In either case we need a simple means of recording the letter we’ve read from the file. Subtracting 65 from an uppercase letter, or 97 from a lowercase letter, yields a number from 0 through 25: A and a both become 0, while Z and z both become 25. That number maps exactly to an index of the char_count array, so the upper and lower case forms of a letter share one counter. Each time we count a letter we also add one to total, which later serves as the denominator of our frequency formula.
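The mapping described above can be sketched with character literals in place of the numeric codes, which some readers find easier to follow. This assumes an ASCII-compatible character set in which the letters are contiguous, as the text does; letter_index is a hypothetical helper name, not part of freq.c.

```c
/* Hypothetical helper: maps a letter to its counter index (0 through 25).
   Returns -1 for anything that is not an ASCII letter, including EOF. */
int letter_index(int c) {
    if (c >= 'A' && c <= 'Z') {
        return c - 'A';   /* 'A' is 65 in ASCII */
    }
    if (c >= 'a' && c <= 'z') {
        return c - 'a';   /* 'a' is 97 in ASCII */
    }
    return -1;
}
```

With a helper like this, the body of the reading loop could check for a non-negative index and increment char_count at that index.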

Once the counting is complete we close our input file with fclose (line 36), and can begin our calculations of frequency. To start our calculation we need a for loop to move through each value in our frequency array. Counting from 0 to 25 (line 39), we can use a simple formula (line 40) to find the character frequency for a specific array index: the letter’s count divided by the total number of letters, multiplied by 100 to give a percentage. To translate the integer index (our 0 to 25 value) and calculated frequency to a printable string we’ll take advantage of the formatted printing function (printf, line 41), where the format string %c: %.2f\n translates to “a character, followed by a colon, a space, a floating point number printed to two decimal places, and a newline”. The remaining arguments to printf are a character (our index value plus 97, which is equivalent to the letters a through z), and our calculated frequency. With that our program is complete, and will print out a frequency chart like the following:

a: 8.00
b: 1.51
c: 2.42
d: 3.90
e: 12.84
f: 2.20
t: 9.42
u: 3.02
v: 0.94
w: 2.28
x: 0.15
y: 2.16
z: 0.04

01  #include <stdio.h>
03  int main(int argc, char *argv[]) {
05      int c, i;
06      float total = 0.0;
07      float freq;
09      // initialize our 26 counters to zero
10      float char_count[26];
11      for (i = 0; i < 26; i++) {
12          char_count[i] = 0.0;
13      }
15      // open our file
16      FILE *f = fopen(argv[1], "r");
18      while ( !feof(f) ) {
20          // check each character in the file
21          c = fgetc(f);
22          if ( (c >= 65) && (c <= 90) ) {
24              // upper case letter
25              char_count[c - 65] += 1.0;
26              total += 1.0;
27          }
28          else if ( (c >= 97) && (c <= 122) ) {
30              // lower case letter
31              char_count[c - 97] += 1.0;
32              total += 1.0;
33          }
35      }
36      fclose(f);
38      // calculate the frequency of each character and print the result
39      for (i = 0; i < 26; i++) {
40          freq = char_count[i] / total * 100.0;
41          printf("%c: %.2f\n", (char)(i + 97), freq);
42      }
43  }
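
The formula on line 40 divides by total, which stays at zero if the input contains no letters at all. As a hedged sketch of one way to guard against that, the helper below returns 0 for an empty total, and a second helper builds one row of the chart with snprintf using the same format as line 41 (minus the newline). Both percent_of and format_row are hypothetical names, not part of freq.c.

```c
#include <stdio.h>

/* Hypothetical helper: count as a percentage of total.
   Returns 0 when total is zero instead of dividing by zero. */
float percent_of(float count, float total) {
    if (total <= 0.0f) {
        return 0.0f;
    }
    return count / total * 100.0f;
}

/* Hypothetical helper: writes one chart row such as "a: 8.00" into buf.
   index is the 0 through 25 counter index; returns snprintf's result. */
int format_row(char *buf, size_t size, int index, float freq) {
    return snprintf(buf, size, "%c: %.2f", 'a' + index, freq);
}
```

For example, a letter counted once out of four letters in total gives percent_of(1.0f, 4.0f), which is 25, and format_row with index 0 turns that into the row "a: 25.00".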

Program Source: freq.c
Text Source: republic.txt
This text was acquired from Project Gutenberg, and
is distributed as per the license at the beginning of the text.

Compiling the example

From the command line:

gcc freq.c -o freq_c

Running the example

From the command line:

./freq_c republic.txt