20061213

Matrix Fun in Haskell

I have always thought that the snobbiest of programmers are those that use Haskell, which is a language so academic that I'm not sure if it's been used outside academia. Haskell is known for what it doesn't have. Haskell doesn't have the same loop structure. It doesn't have variables. It doesn't have the imperative design like most languages. I like Haskell. Any time I need to jump on a math program, I usually solve it in Haskell first, then the language I need second.

A friend of mine is a physics Ph.D. student who is trying to learn C. He wanted to write a program to find all of the 2-by-2 determinants found in a 4-by-4 matrix. There are 36 total. He had his own ideas on how to solve this, but I still think the quickest way is to brute force all of the possibilities.

The following patches of Haskell and C are almost identical. They use the exact same variables ("labels" for the Haskell code) to accomplish the same task. The Haskell code returns an array of floats, where the C code prints each of the 36 determinants.

twoByTwo :: [[Float]] -> [Float]
twoByTwo m = concat (map(\i->
                 concat (map(\j->
                     concat (map(\p->
                         map(\q->
                             (m!!i!!j)*(m!!p!!q)-(m!!i!!q)*(m!!p!!j)
                         )[(j+1)..(n-1)]
                     )[(i+1)..(n-1)])
                 )[0..(n-2)])
             )[0..(n-2)])
             where
                 n = length m
Yes, Haskell fans, there is a quadruple-nested lambda expression there, which does seem a little confusing. It's short and elegant, which is why Haskell reminds me of Python.
void twoByTwo(float *m, int n) {
    int i, j, p, q;
    for (i = 0; i < n-1; i++)
        for (j = 0; j < n-1; j++)
            for (p = i+1; p < n; p++)
                for (q = j+1; q < n; q++)
                    printf("%3.1f\n", m[i*n+j]*m[p*n+q] - m[i*n+q]*m[p*n+j]);
}
No one ever said C was pretty. I also wrote a recursive Matrix determinant algorithm in Haskell, but I may save that for another time.

20061121

A Review of MySpace Passwords using Ruby

Over on the site Reddit, someone posted a link to some 50,000+ MySpace user e-mails and passwords from a phishing site. I'm not going to post the link to those passwords. You'll have to find it yourself.

I wrote a quick Ruby script to pour through the text. Normally I would have done this in Perl, but Ruby seems to be the language du jour.

Here's a breakdown of things I noticed about people's passwords. I tallied the number of passwords that had a lowercase letter, uppercase letter, a number, or a symbol in the password itself. I also checked for usernames that were the same as their passwords, and passwords that followed the pattern of "word-then-number".

total: 52692
lowercase: 50818 (96.44%)
numbers: 42570 (80.79%)
symbols: 2959 (5.62%)
uppercase: 2330 (4.21%)
Same as username: 197 (0.37%)
Word-Number pattern: 35800 (67.94%)

Almost everyone is using lowercase letters in their password. 4 out of every 5 users are using numbers. Less than 6% of all users are using symbols or uppercase letters in their passwords. Almost 1/3rd of a percent of MySpace's users are dumb enough to use their e-mail's user name as their password. Two-thirds of users use a "word followed by a number" pattern for their password. If I were to write a password cracker, I would first try to hack accounts that followed this very common pattern. (I must admit: most of my own passwords follow this pattern.)

I went on to check for the most common word usage. This was done by stripping out any non-letter characters and dumping the results in a hash table.

password 334
a 299 (The single letter 'a' is commonly used with a sequence of numbers.)
soccer 285
iloveyou 273
fuckyou 173
love 139
abc 137
football 135
baseball 125
myspace 122

There are a few surprises in this list. Why on earth would anyone make their password "password"? For that matter, if you are on MySpace, why make your password "myspace". There seem to be a lot of sports fans on MySpace. "soccer","baseball", and "football" all appear in the top ten most common words. There is also a love/hate drama going on between many users. "love", "iloveyou" and "fuckyou" are common choices for passwords.

So where's the code?!?

#!/usr/bin/ruby

file = ARGV[0]

total     = 0
wordnum   = 0
numbers   = 0
lowercase = 0
uppercase = 0
symbol    = 0
samename  = 0

words = Hash.new

IO.foreach(file) do |line|

    line.strip!

    (user,password) = line.split /:/
    next if password == nil

    (name,domain)   = user.split /@/
    total = total + 1

    if password =~ /^[a-zA-Z]+[0-9]+$/
        wordnum = wordnum + 1
    end

    if password =~ /[0-9]/
        numbers = numbers + 1
    end

    if password =~ /[a-z]/
        lowercase = lowercase + 1
    end

    if password =~ /[A-Z]/
        uppercase = uppercase + 1
    end

    if password =~ /[`~!@#\$%^&*()_\+\-\\=]/
        symbol = symbol + 1
    end

    if password == name
        samename = samename + 1
    end

    lettersonly = password.gsub(/[^a-zA-Z]/, '')

    next if lettersonly =~ /^$/

    if false == words.key?(lettersonly)
        words[lettersonly] = 1
    else
        words[lettersonly] = words[lettersonly] + 1
    end

end

puts "total:               #{total}"
puts "numbers:             #{numbers}"
puts "lowercase:           #{lowercase}"
puts "uppercase:           #{uppercase}"
puts "symbols:             #{symbol}"
puts "Same as username:    #{samename}"
puts "Word-Number pattern: #{wordnum}"

words_ary = words.sort {|a,b| b[1]<=>a[1]}

10.times do |i|
    puts "#{words_ary[i][0]}  #{words_ary[i][1]}"
end

20061113

Bits of Java

Note: Today's coding gem is one that probably most Java programmers have experienced at least once.

A student of mine came to me with a problem from a different class she was taking. She had to write a program to generate truth tables for certain electrical gates. Because she's in my Java class, Java was her choice to complete the assignment. While I teach Java, I'm very unfamiliar with many of its low level aspects. This was a unique challenge for me.

These particular electrical gates required 4 binary inputs. How could I quickly generate all 16 possible binary combinations of 1's and 0's without resorting to spaghetti code? My first thought was to write a simple 'for' loop to generate the binary bits.

Solution #1:

    for (int i = 0; i < 16; i++) { // Combination Number
        for (int j = 0; j < 4; j++) { // Nth bit in combination
            int bit = (int) Math.pow(2,j);
            System.out.print( (i&bit)/bit );
        } // End For
        System.out.println("");
    } // End For

It just looks ugly, but it's mathematically correct (or at least plausible). We call "Math.pow", convert double to integer, there seems to be division taking place (but it's anyone's guess as to why), and we still have to perform the logical "and" (&). Bleh! How is anyone suppose to understand what's going on?

Solution #2:

    for (int i = 0; i < 16; i++) { // Combination Number
        for (int j = 0; j < 4; j++) { // Nth bit in combination
            System.out.print( (i>>>j)&1 );
        } // End For
        System.out.println("");
    } // End For

It's a little cleaner. There's a unsigned shift which is then compared and-wise to the number 1. It's only two operations, both of which happen at the binary level. Works for me. Too bad I figured this out after she came up with her own solution using large, ugly 2D-arrays.

20061102

The Impossible FFTW Bug

Ever had a computer bug that you spent weeks working on it? You might try every trick you know to solve it: debugging messages, profilers, Google, professors, coworkers, fellow students. You might try to explain the code to someone who doesn't even program just so you have to describe the logic in simple terms. Nothing seems to fix the problem.

I had this bug in my code for about two weeks. I had to write a program to compute the 2 dimensional FFT of an image. That's it. I'll just use FFTW, the most popular open source library for computing FFTs.

The Wrong Solution for computing FFTs with FFTW:

int j;
fftw_plan myplan;
fftw_complex in[x*y], out[x*y];

for (j = 0; j < x*y; j++) {
    in[j][0] = r[j];
    in[j][1] = i[j];
} // End For

myplan = (dir == 1)? fftw_plan_dft_2d(x, y, in, out, FFTW_FORWARD,  FFTW_MEASURE) :
                     fftw_plan_dft_2d(x, y, in, out, FFTW_BACKWARD, FFTW_MEASURE) ;

fftw_execute(myplan);

In these few lines of code lies a problem that I've spent weeks trying to find. There are no syntax errors, nor are there logic errors. Every time I ran this code, the result would be all zeros. If I ran it multiple times, I got the correct results for every execution after the first. I dropped the data into a prepared array, made a plan (as defined by the FFTW C library), then executed that library. Should work, right? Wrong.

The Correct Solution for computing FFTs with FFTW:

int j;
fftw_plan myplan;
fftw_complex in[x*y], out[x*y];

myplan = (dir == 1)? fftw_plan_dft_2d(x, y, in, out, FFTW_FORWARD,  FFTW_MEASURE) :
                     fftw_plan_dft_2d(x, y, in, out, FFTW_BACKWARD, FFTW_MEASURE) ;

for (j = 0; j < x*y; j++) {
    in[j][0] = r[j];
    in[j][1] = i[j];
} // End For

fftw_execute(myplan);

I found buried in the FFTW FAQ that I'm suppose to make a plan before giving it the data. I switched two sets of lines around and everything magically works for reasons I fail to understand. Let this be a lesson to anyone having to use FFTW.

20061024

vi Number Sequence Trick

I'm a vim user. Often times in my editing, I want to number some of the lines in a file from 1 to some ending number. It might be only a few lines, or it might be thousands. To save myself some carpal tunnel, I wrote one line of perl to do the job. This program will pass a block of lines in whatever file you are editing to a one line perl script which will number the lines starting with 1 and drop that text back into the editor:
:x,y!perl -ne'print "$.  $_";'
In this example, x is the starting line number and y is the ending line number. That's it!

20061015

File Character Frequency

In my machine learning course, our first project was to write a program to sort e-mail into nine class labels A through I. In my implementation after every e-mail was classified, I would write a single character to the screen. There were several thousand of e-mails that needed sorting, so I needed a program that displayed a final tally of the results.

I could have easily built something into the program I was writing, but I thought something more general purpose would be useful. I quickly wrote the following code. It takes a tally of each character in a file and then uses a quick sort to sort the data and then prints the results. I even added a switch to ignore whitespace characters.

#include <stdio.h>

/* Author:      James Church
 * Date:        09/11/06
 * Program:     charfreq
 * Version:     0.3
 * Description: Takes a single file from the command line as input and
 *              reports the character frequence of each character in order
 *              from most common to least common.
 *
 * Copyright © 2006 Free Software Foundation, Inc.
 *
 * Copying and distribution of this file, with or without modification,
 * are permitted in any medium without royalty provided the copyright
 * notice and this notice are preserved.
 */

#define CHARRANGE 256

typedef struct _charcount {
    unsigned char c;
    long          count;
} CharCount;

void quicksort (CharCount *a, int i, int j);
int  partition (CharCount *a, int i, int j);

int main(int argc, char **argv) {
    int            i;
    unsigned char  c;
    CharCount      freq[CHARRANGE];
    FILE          *file;
    int           ignore_whitespace = 0;
    int           argparse = 1;

    if (argc == 1) {
        printf("\nUsage: %s [-s] [filename] - Reports character frequence of file.\n", argv[0]);
        printf(" -s  Ignores whitespace (optional)\n");
        return 0;
    } // End If

    if (strcmp("-s", argv[argparse]) == 0) {
        ignore_whitespace = 1;
        argparse++;
    } // End If

    if ((file = fopen(argv[argparse], "r")) == NULL) {
        printf("Error: Cannot open %s\n", argv[1]);
        return 0;
    } // End If
    argparse++;

    for (i = 0; i < CHARRANGE; i++) {
        freq[i].c     = (unsigned char) i;
        freq[i].count = 0;
    } // End For

    while (1) {
        fread (&c, sizeof(unsigned char), 1, file);
        if (feof(file)) break;

        if (ignore_whitespace && (c == ' ' || c == '\t' || c == '\r' || c == '\n'))
            continue;

        freq[c].count++;
    } // End While

    quicksort(freq, 0, CHARRANGE-1);

    for (i = CHARRANGE-1; i >= 0; i--) {
        if (freq[i].count == 0) break;

        printf("%c[%3d]: %d\n", freq[i].c, freq[i].c, freq[i].count);
    } // End For

    fclose(file);
    return 0;
} // End Main

void quicksort (CharCount *a, int i, int j) {
    int p;

    if (i < j) {
        p = partition (a, i, j);
        quicksort (a, i, p-1);
        quicksort (a, p+1, j);
    } /* End If */
} /* End mergesort */

int partition (CharCount *a, int i, int j) {
    int val = a[i].count;
    int h   = i;
    int k;
    CharCount temp;

    for (k = i+1; k <= j; k++)
        if (a[k].count < val) {
            h++;
            temp = a[h];
            a[h] = a[k];
            a[k] = temp;
        } /* End If */

    temp = a[i];
    a[i] = a[h];
    a[h] = temp;
    return h;
} /* End partition */

Enjoy!

20061012

Finding Identical Files using bash and find

I wanted to start a blog of all the little bits of code featuring tricks that I've learned over the years. I don't know how often I'll update this blog, but any time I use a bit of code to complete a complicated task, I'll describe the task and show the code used to solve it. My first example will be a identical file finder. In my class on Java programming that I teach, I suspected two students of turning in the exact same work. As I was grading, I had seen some identical code elsewhere, but I couldn't remember. I keep all the students work in different directories on my office computer just in case I need to look at their old homework solutions. Rather than look at every solution manually for an identical file, I wrote this little bash script to do the work for me:
#!/bin/bash

if [ $# -eq 2 ]
then
    for i in $(find . -name $1); do diff -s $i $2 | grep -v differ; done
else
    echo "USAGE: findIdent [SOME FILE EXPRESSION] [SOME FILE]"
fi
This script "findIdent" takes two arguments: a file pattern (say... "*.mp3") and the file that you want to see if duplicates exist. So did I find any duplicates of students' work? No. Turns out I just found some very similar code and it was nothing to worry about. But I kept this script in case I ever need it again.