20080829

Yo.

Still there?

20071205

Perl's Diamond Operator in Python

I'm been a full Python convert for over a year now. Perl is just too cumbersome at times. I think the artist who draws XKCD has made the same choice. The one thing that still takes me back to Perl is the ultra-handy <> "diamond operator". You can't deny Perl's dominance when it comes to string processing. Look at this elegance:

#!/usr/bin/perl

while (<>) {
    print
}

Every time I encounter a string processing problem where I have to work with multiple files, I go back to Perl. I didn't think Python could do this. Python should have the same ability as Perl's diamond:

def diamond():
    """
    diamond()
    Pre: Nothing.
    This is my attempt to recreate the very handy diamond operator found in
    Perl, which is my only reason for still using that language.

    To use the code:
    for line in diamond():
        process(line)
    """
    import sys
    if len(sys.argv[1:]) > 0:
        for file in sys.argv[1:]:
            if file == '-':
                for line in sys.stdin:
                    yield line
            else:
                f = open(file, 'r')
                for line in f:
                    yield line
                f.close()
    else:
        for line in sys.stdin:
            yield line

What does this do? It checks the sys.argv variable for file names on the command line and iterates through each line of each file. If it can't find a file, it defaults to standard input.

Of course, before I start to write code, I should always use Google first. There exist a module called "fileinput" which does the same thing.

D'oh! Hey... it's practice.

20071202

A Losing Strategy

Today I ran across a year 2000 article in the New York Times titled "Paradox in Game Theory: Losing Strategy That Wins." Given two games which lose most of the time, a strategy can be devised so that we can win most of the time. The article has a (fairly) detailed description of two example games:

The paradox is illustrated by two games played with coins weighted on one side so that they will not fall by chance to heads or tails.

In game A, a player tosses a single loaded coin and bets on each throw. The probability of winning is less than half.

In game B, there are two coins and the rules are more complicated. The player tosses either Coin 1, loaded to lose almost all the time or Coin 2 loaded to win more than half the time. He plays Coin 1 if his money is a multiple of a particular whole number, like three.

If his money cannot be divided evenly by that number, he plays Coin 2. In this setup, the second coin will be played more often than the first.

After reading this story, I wrote a quick Python script to demonstrate this game. The problem description isn't definitive on things like the exact probabilities of the coin flips, so we have to make a few assumptions. I set the probability of the coin flip in Game A to p1=0.33, and in Game B, the probability of coin1 to p=0.1 and coin2 to p=0.66. These probabilities are pulled from the air, but they do meet the requirement set forth in the problem description, so they should work. So the two games alternating should outperform both Game A and Game B when run indivdually. Let's run the program a few times to see if a combination of Games A and B are better than Game A or Game B.

[jcchurch@mcart python]$ python badgames.py 
Game A, 1000 runs:
  Before: 1000 MONEY
  After:  674 MONEY

Game B, 1000 runs:
  Before: 1000 MONEY
  After:  978 MONEY

Alternating: 1000 runs:
  Before: 1000 MONEY
  After:  850 MONEY

[jcchurch@mcart python]$ python badgames.py 
Game A, 1000 runs:
  Before: 1000 MONEY
  After:  734 MONEY

Game B, 1000 runs:
  Before: 1000 MONEY
  After:  890 MONEY

Alternating: 1000 runs:
  Before: 1000 MONEY
  After:  870 MONEY

[jcchurch@mcart python]$ python badgames.py 
Game A, 1000 runs:
  Before: 1000 MONEY
  After:  636 MONEY

Game B, 1000 runs:
  Before: 1000 MONEY
  After:  932 MONEY

Alternating: 1000 runs:
  Before: 1000 MONEY
  After:  820 MONEY

[jcchurch@mcart python]$ python badgames.py 
Game A, 1000 runs:
  Before: 1000 MONEY
  After:  604 MONEY

Game B, 1000 runs:
  Before: 1000 MONEY
  After:  918 MONEY

Alternating: 1000 runs:
  Before: 1000 MONEY
  After:  846 MONEY
I just ran the problem described in the New York Times 4 times and got 4 negative results. Game B always outperforms Game A and the combined Games A and B. This article looked very interesting and very improbable. If it's too good to be true, check for yourself.

20070908

Three Parts of Division

I was going to update this blog at some point.

I'm back to solving problems on Project Euler. I've solved the easy ones, and now I'm getting around to the medium difficulty problems. I like #26.

Problem 26: Find the value of d < 1000 for which 1/d contains the longest recurring cycle in its decimal fraction part.

That's kinda interesting. Every time you divide two numbers, there are three parts: The integer part, the non-repeating decimal part, and the repeating decimal part.

I wrote up this little piece of Python to compute the three parts. It is a forced base-10 long division calculator, and it makes good use of Python's stings doubling as list, which is a feature that I always thought was nice about Python.

def division(num, den):
    ipart = str(num / den);
    fpart = ""
    rpart = ""
    seen  = [ num%den ]
    while num % den != 0:
        num        = (num%den) * 10
        multiple   = num / den
        fpart     += str(multiple)
        num        = num - (den * multiple)
        c = 0
        quit = False
        for i in seen:
            if num == i: # This number has been seen before.
                rpart = fpart[c:] # Make characters from c to end of list part of repeating part
                fpart = fpart[:c] # Truncate fpart
                quit = True
                break
            c = c + 1
        if quit:
            break
        seen += [num]
    return [ipart, fpart, rpart]

This method accepts two arguments: An integer numerator and an integer denominator. The return is a list with three items: The whole number part, the non-repeating decimal part, and the repeating decimal part. Each part is a string data type (because sometimes these parts can be excessively long for python's Integer type.)
>>> division(1,4)
['0', '25', '']
>>> division(1,6)
['0', '1', '6']
>>> division(1,7)
['0', '', '142857']

20070425

Converting ISBN13 to ISBN10 in PHP

I've been writing a textbook buyback system for a buddy of mine. He runs a service that purchases textbooks from students at the end of a school semester. He wanted me to write a program to query Amazon.com and estimate the best value to offer a student for his or her book.

As of January 1, 2007, the agency that controls ISBN codes for all books printed in the world has changed their code format from 10 digits (regular expression /[0-9Xx]{10}/) to 13 digits (regular expression /[0-9]{13}/). Amazon has not updated their Web E-Commerce API to accommodate this change in standard. Thus, any time I get an 13-digits ISBN, I must convert it to the older 10-digits standard.

When "Googling" for solutions to this problem, I found several, but I decided to write this one myself.

// Converts ISBN-13 to ISBN-10
// Leaves ISBN-10 numbers (or anything else not matching 13 consecutive numbers) alone
function ISBN13toISBN10($isbn) {
    if (preg_match('/^\d{3}(\d{9})\d$/', $isbn, $m)) {
        $sequence = $m[1];
        $sum = 0;
        $mul = 10;
        for ($i = 0; $i < 9; $i++) {
            $sum = $sum + ($mul * (int) $sequence{$i});
            $mul--;
        }
        $mod = 11 - ($sum%11);
        if ($mod == 10) {
            $mod = "X";
        }
        else if ($mod == 11) {
            $mod = 0;
        }
        $isbn = $sequence.$mod;
    }
    return $isbn;
}

20070322

Handling Multiple MySQL Queries - "AS" labels

I'm working on a project for the School of Pharmacy on campus. They wanted me to write a program that pulls numbers from a database on Marijuana into a nicely formatted report that they could print off and send to government agencies that are interested in the data. One of the reports requires me to pull hundreds of individually calculated items from a MySQL database. A single MySQL query isn't able to do the job. When this report is completed, there will probably 75 to 100 queries. To think that this quarterly report has been done manually for almost 2 decades overwhelms me as I try to write this report software. I'm trying to reduce several weeks of work to prepare this report to a few seconds.

The entire project is done in PHP and MySQL. PHP and MySQL go hand-in-hand, but every time I write a query, I can't help but wonder how this could be made easier. Pulling one value out of a query looks like spaghetti code. Pulling 300 values from 100 different queries is an explosion of crappy code. MySQL queries required for this report look like this:

SELECT THC FROM samples HAVING MAX(THC);

This is just one query that will find the one sample in the database with the highest THC composition and report what that composition value is. There are hundreds more queries in the report similar to this one. Typically, when you run a MySQL query from PHP, here are the steps you have to take:

  1. Format the query command.
  2. Execute the query command.
  3. Retrieve the first row of results from the successful execution (usually in the form of an associative array).
  4. Transfer those values from the array to variables that actually mean something to the nature of the program (like $max_thc_composition).

If I have 100 queries, and I have to perform the same 4 steps for each query, that's 400 lines of code. Obviously, as any freshmen computer science instructor will tell you (because I am one), this needs to be regulated to a function. But how do you specify a query (which can return more than one value) and the variables associated with those results in a concise manner? If the function doesn't know the variables names going into the function, it can't assign those values coming out of the function.

SELECT THC AS max_thc_composition FROM samples HAVING MAX(THC);

The "AS" expression in MySQL will rename a column into something else. It can also be used to uniquely label a query. This may not seem useful in a single call, but when you have a few 100 calls to make, it's a real time saver.

    //runQueries: Runs multiple queries.
    // Pre: An array of MySQL single-result queries where
    // each result has a unique "AS" expression.
    // Post: An associative array populated with all of those values.
    function runQueries($sqls) {
        $report = array();
        //Execute each SQL query and drop the result in the report array
        foreach ($sqls as $query) {
            $result = mysql_query($query) or die("Query Failed: $query Reason: ".mysql_error());
            $row    = mysql_fetch_assoc($result);

            // Essentially, for each "AS" expression in the query,
            // look for that result in row returned and pull that value out if it exist.
            // If it does not exist, just give it a value of 0.
            preg_match_all("/ AS (\w+)/", $query, $keys);
            foreach ($keys[1] as $key)
                if (isset($row[$key]))
                    $report[$key] = $row[$key];
                else
                    $report[$key] = 0;
        }
        return $report;
    }

This function takes an array of MySQL single-result query strings, where each result has an "AS" label identifying what this result means. This method executes each query and drops the result into a associative array where the "AS" labels double as the array keys. And of course, it uses a regular expression.

I'm suppose to be teaching a class on Python programming in the fall. Have you noticed that I haven't featured a Python script in this blog series yet? That bothers me.

20070222

Finding lost files

So a student e-mails me today saying that he doesn't have a grade recorded for one of his programming assignments. I check his folder of turned-in assignments and I can't find it.

Who knows? Maybe I just dropped his homework in a different folder by mistake. Checking through everyone's folders for his will take me a while. I make all of my students put their e-mail address in the comments of their code, so I just run a search on all the *.java files on my computer that have a certain e-mail address with this handy line:

for i in $(find . -name *.java); do grep -H 'emailName' $i; done
(Replace 'emailName' with the descriptive feature of the file you are looking for.)

I still can't find his homework.