20080829
20071205
Perl's Diamond Operator in Python
I'm been a full Python convert for over a year now. Perl is just too cumbersome at times. I think the artist who draws XKCD has made the same choice. The one thing that still takes me back to Perl is the ultra-handy <> "diamond operator". You can't deny Perl's dominance when it comes to string processing. Look at this elegance:
#!/usr/bin/perl while (<>) { print }
Every time I encounter a string processing problem where I have to work with multiple files, I go back to Perl. I didn't think Python could do this. Python should have the same ability as Perl's diamond:
def diamond(): """ diamond() Pre: Nothing. This is my attempt to recreate the very handy diamond operator found in Perl, which is my only reason for still using that language. To use the code: for line in diamond(): process(line) """ import sys if len(sys.argv[1:]) > 0: for file in sys.argv[1:]: if file == '-': for line in sys.stdin: yield line else: f = open(file, 'r') for line in f: yield line f.close() else: for line in sys.stdin: yield line
What does this do? It checks the sys.argv variable for file names on the command line and iterates through each line of each file. If it can't find a file, it defaults to standard input.
Of course, before I start to write code, I should always use Google first. There exist a module called "fileinput" which does the same thing.
D'oh! Hey... it's practice.
20071202
A Losing Strategy
After reading this story, I wrote a quick Python script to demonstrate this game. The problem description isn't definitive on things like the exact probabilities of the coin flips, so we have to make a few assumptions. I set the probability of the coin flip in Game A to p1=0.33, and in Game B, the probability of coin1 to p=0.1 and coin2 to p=0.66. These probabilities are pulled from the air, but they do meet the requirement set forth in the problem description, so they should work. So the two games alternating should outperform both Game A and Game B when run indivdually. Let's run the program a few times to see if a combination of Games A and B are better than Game A or Game B.The paradox is illustrated by two games played with coins weighted on one side so that they will not fall by chance to heads or tails.
In game A, a player tosses a single loaded coin and bets on each throw. The probability of winning is less than half.
In game B, there are two coins and the rules are more complicated. The player tosses either Coin 1, loaded to lose almost all the time or Coin 2 loaded to win more than half the time. He plays Coin 1 if his money is a multiple of a particular whole number, like three.
If his money cannot be divided evenly by that number, he plays Coin 2. In this setup, the second coin will be played more often than the first.
[jcchurch@mcart python]$ python badgames.py
Game A, 1000 runs:
Before: 1000 MONEY
After: 674 MONEY
Game B, 1000 runs:
Before: 1000 MONEY
After: 978 MONEY
Alternating: 1000 runs:
Before: 1000 MONEY
After: 850 MONEY
[jcchurch@mcart python]$ python badgames.py
Game A, 1000 runs:
Before: 1000 MONEY
After: 734 MONEY
Game B, 1000 runs:
Before: 1000 MONEY
After: 890 MONEY
Alternating: 1000 runs:
Before: 1000 MONEY
After: 870 MONEY
[jcchurch@mcart python]$ python badgames.py
Game A, 1000 runs:
Before: 1000 MONEY
After: 636 MONEY
Game B, 1000 runs:
Before: 1000 MONEY
After: 932 MONEY
Alternating: 1000 runs:
Before: 1000 MONEY
After: 820 MONEY
[jcchurch@mcart python]$ python badgames.py
Game A, 1000 runs:
Before: 1000 MONEY
After: 604 MONEY
Game B, 1000 runs:
Before: 1000 MONEY
After: 918 MONEY
Alternating: 1000 runs:
Before: 1000 MONEY
After: 846 MONEY
I just ran the problem described in the New York Times 4 times and got 4 negative results. Game B always outperforms Game A and the combined Games A and B. This article looked very interesting and very improbable. If it's too good to be true, check for yourself.
20070908
Three Parts of Division
I'm back to solving problems on Project Euler. I've solved the easy ones, and now I'm getting around to the medium difficulty problems. I like #26.
Problem 26: Find the value of d < 1000 for which 1/d contains the longest recurring cycle in its decimal fraction part.
That's kinda interesting. Every time you divide two numbers, there are three parts: The integer part, the non-repeating decimal part, and the repeating decimal part.
I wrote up this little piece of Python to compute the three parts. It is a forced base-10 long division calculator, and it makes good use of Python's stings doubling as list, which is a feature that I always thought was nice about Python.
def division(num, den): ipart = str(num / den); fpart = "" rpart = "" seen = [ num%den ] while num % den != 0: num = (num%den) * 10 multiple = num / den fpart += str(multiple) num = num - (den * multiple) c = 0 quit = False for i in seen: if num == i: # This number has been seen before. rpart = fpart[c:] # Make characters from c to end of list part of repeating part fpart = fpart[:c] # Truncate fpart quit = True break c = c + 1 if quit: break seen += [num] return [ipart, fpart, rpart]
This method accepts two arguments: An integer numerator and an integer denominator. The return is a list with three items: The whole number part, the non-repeating decimal part, and the repeating decimal part. Each part is a string data type (because sometimes these parts can be excessively long for python's Integer type.)
>>> division(1,4) ['0', '25', ''] >>> division(1,6) ['0', '1', '6'] >>> division(1,7) ['0', '', '142857']
20070425
Converting ISBN13 to ISBN10 in PHP
I've been writing a textbook buyback system for a buddy of mine. He runs a service that purchases textbooks from students at the end of a school semester. He wanted me to write a program to query Amazon.com and estimate the best value to offer a student for his or her book.
As of January 1, 2007, the agency that controls ISBN codes for all books printed in the world has changed their code format from 10 digits (regular expression /[0-9Xx]{10}/) to 13 digits (regular expression /[0-9]{13}/). Amazon has not updated their Web E-Commerce API to accommodate this change in standard. Thus, any time I get an 13-digits ISBN, I must convert it to the older 10-digits standard.
When "Googling" for solutions to this problem, I found several, but I decided to write this one myself.
// Converts ISBN-13 to ISBN-10 // Leaves ISBN-10 numbers (or anything else not matching 13 consecutive numbers) alone function ISBN13toISBN10($isbn) { if (preg_match('/^\d{3}(\d{9})\d$/', $isbn, $m)) { $sequence = $m[1]; $sum = 0; $mul = 10; for ($i = 0; $i < 9; $i++) { $sum = $sum + ($mul * (int) $sequence{$i}); $mul--; } $mod = 11 - ($sum%11); if ($mod == 10) { $mod = "X"; } else if ($mod == 11) { $mod = 0; } $isbn = $sequence.$mod; } return $isbn; }
20070322
Handling Multiple MySQL Queries - "AS" labels
I'm working on a project for the School of Pharmacy on campus. They wanted me to write a program that pulls numbers from a database on Marijuana into a nicely formatted report that they could print off and send to government agencies that are interested in the data. One of the reports requires me to pull hundreds of individually calculated items from a MySQL database. A single MySQL query isn't able to do the job. When this report is completed, there will probably 75 to 100 queries. To think that this quarterly report has been done manually for almost 2 decades overwhelms me as I try to write this report software. I'm trying to reduce several weeks of work to prepare this report to a few seconds.
The entire project is done in PHP and MySQL. PHP and MySQL go hand-in-hand, but every time I write a query, I can't help but wonder how this could be made easier. Pulling one value out of a query looks like spaghetti code. Pulling 300 values from 100 different queries is an explosion of crappy code. MySQL queries required for this report look like this:
SELECT THC FROM samples HAVING MAX(THC);
This is just one query that will find the one sample in the database with the highest THC composition and report what that composition value is. There are hundreds more queries in the report similar to this one. Typically, when you run a MySQL query from PHP, here are the steps you have to take:
- Format the query command.
- Execute the query command.
- Retrieve the first row of results from the successful execution (usually in the form of an associative array).
- Transfer those values from the array to variables that actually mean something to the nature of the program (like $max_thc_composition).
If I have 100 queries, and I have to perform the same 4 steps for each query, that's 400 lines of code. Obviously, as any freshmen computer science instructor will tell you (because I am one), this needs to be regulated to a function. But how do you specify a query (which can return more than one value) and the variables associated with those results in a concise manner? If the function doesn't know the variables names going into the function, it can't assign those values coming out of the function.
SELECT THC AS max_thc_composition FROM samples HAVING MAX(THC);
The "AS" expression in MySQL will rename a column into something else. It can also be used to uniquely label a query. This may not seem useful in a single call, but when you have a few 100 calls to make, it's a real time saver.
//runQueries: Runs multiple queries. // Pre: An array of MySQL single-result queries where // each result has a unique "AS" expression. // Post: An associative array populated with all of those values. function runQueries($sqls) { $report = array(); //Execute each SQL query and drop the result in the report array foreach ($sqls as $query) { $result = mysql_query($query) or die("Query Failed: $query Reason: ".mysql_error()); $row = mysql_fetch_assoc($result); // Essentially, for each "AS" expression in the query, // look for that result in row returned and pull that value out if it exist. // If it does not exist, just give it a value of 0. preg_match_all("/ AS (\w+)/", $query, $keys); foreach ($keys[1] as $key) if (isset($row[$key])) $report[$key] = $row[$key]; else $report[$key] = 0; } return $report; }
This function takes an array of MySQL single-result query strings, where each result has an "AS" label identifying what this result means. This method executes each query and drops the result into a associative array where the "AS" labels double as the array keys. And of course, it uses a regular expression.
I'm suppose to be teaching a class on Python programming in the fall. Have you noticed that I haven't featured a Python script in this blog series yet? That bothers me.
20070222
Finding lost files
So a student e-mails me today saying that he doesn't have a grade recorded for one of his programming assignments. I check his folder of turned-in assignments and I can't find it.
Who knows? Maybe I just dropped his homework in a different folder by mistake. Checking through everyone's folders for his will take me a while. I make all of my students put their e-mail address in the comments of their code, so I just run a search on all the *.java files on my computer that have a certain e-mail address with this handy line:
for i in $(find . -name *.java); do grep -H 'emailName' $i; done
I still can't find his homework.