Find all the files in a directory

When I find a comic amusing, I grab a screenshot of it. I thought it would be nice to display one comic on my home page each day, so I wrote a little code to look in the comics directory and find all the files, then randomly choose one to display.

There are a few interesting things about the code. First up is scope. Notice that I initialize the two arrays before the while loop. PHP loops don't create their own scope, so the arrays would be visible after the loop either way, but initializing them up front guarantees they exist (as empty arrays) even if the directory has no files. Second, notice that I don't use an index to add to the arrays. In PHP the preferred way to append an element to an array is with the $arrayName[] notation. Third, I assign the directory handle to $location inside the if statement. If opendir() can't open the directory, it returns false and none of the rest of the code runs. Fourth, I use a similar pattern to read the files in the directory: readdir() returns the next entry each time through the while loop and returns false when there aren't any more. Finally, I use a flag in my URL, read from $_GET['all'], so I can proof all of the comics.


<?php

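// Start with empty arrays so they exist even if the directory is empty.
$title = array();
$comicsLst = array();

// opendir() returns false if the directory can't be opened,
// so none of the code below runs in that case.
if ($location = opendir('./Comics')) {

    // readdir() returns false when there are no more entries.
    // Compare with !== so a file named "0" doesn't end the loop early.
    while (false !== ($entry = readdir($location))) {

        // Skip the current and parent directory entries.
        if ($entry != "." && $entry != "..") {

            // Use the part of the filename before the first dot as the title.
            $titleArray = explode(".", $entry);
            $title[] = $titleArray[0];
            $comicsLst[] = $entry;
        }
    }
    closedir($location);

    $numComics = count($comicsLst);

    // Valid indexes run from 0 to $numComics - 1.
    $randomComic = rand(0, $numComics - 1);

    // The 'all' flag is optional, so default to null when it isn't in the URL.
    $all = isset($_GET['all']) ? $_GET['all'] : null;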

    if (is_null($all)) {
        echo "<p>";
        echo "<img class='align-left' src='/Comics/$comicsLst[$randomComic]' ";
        echo "alt='$title[$randomComic]' title='$title[$randomComic]' /> ";
        echo "</p>";
        /*
        echo "<p class='attribution'>";
        echo "<a href='index.php?p=Comics&all=y'>Display all comics.</a>";
        echo "</p>";
        */
    } else {

        for($i=0; $i<$numComics; $i++) {
            echo "<p>";
            echo "$title[$i]<br />";
            echo "<img class='align-left' src='/Comics/$comicsLst[$i]' alt='$title[$i]' /> ";
            echo "</p>";
        }
    }
}
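For comparison, here is a minimal sketch of the same idea in bash: collect the regular files in a directory into an array, then pick an index from 0 to count - 1. The function name random_file is hypothetical, and this is not the code running on the site. One difference from the PHP: a * glob skips hidden files such as .DS_Store, while readdir() returns them.

```shell
#!/bin/bash
# Print the name of one randomly chosen regular file in a directory.
# Hypothetical helper; mirrors the PHP logic, not a drop-in replacement.
random_file() {
    local dir="$1"
    local files=()
    local f
    # Collect regular files only; "." and ".." never appear in a * glob,
    # and hidden files (dotfiles) are skipped too.
    for f in "$dir"/*; do
        if [ -f "$f" ]; then
            files+=("${f##*/}")   # keep just the file name
        fi
    done
    # Nothing to choose from (this also covers a glob that matched nothing).
    if [ "${#files[@]}" -eq 0 ]; then
        return 1
    fi
    # Valid indexes run from 0 to count - 1, so take the remainder.
    local index=$(( RANDOM % ${#files[@]} ))
    printf '%s\n' "${files[$index]}"
}
```

For example, random_file ./Comics prints one file name from the comics folder, or returns a non-zero status if the folder is empty.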

I frequently find new comics and add them to my local folder. To keep them synchronized with the server, I wrote a little rsync script.


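#!/bin/bash
# Push the local Comics folder to the server and log rsync's verbose output.
# --update skips files that are newer on the server; --delete removes files
# on the server that are no longer in the local folder.
rsync -a -H -vv -z -l --update --delete --exclude \ Comics --exclude .DS_Store \
    ~wellgolly/Documents/Comics/ \
    wellgolly@wellgolly:/www/WellGolly/Comics/ \
    > ~wellgolly/Sites/rsync-backup-comics-$(date +%F).log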

Remove all the files in a directory with a specific name

I have a web app that generates temporary files as part of a shell script. The files start with ‘frequency’ and have a hash appended. Sometimes the app doesn’t clear them out. This line does it. At some point I need to put this in a cron job, but I haven’t done it yet.


sudo find /tmp -iname "frequency*" -exec rm -f {} \;

By the way, I think the files aren't cleared out because they are generated by spambots. The code that deletes the temporary files is on the results page, and the bots never load that page, so the cleanup never runs.
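Since the cleanup is meant to end up in a cron job, here is a hedged sketch of a version suited to that. It only deletes regular files older than a given number of minutes, so it won't remove a file that a request is still using. The function name and the 60-minute default are my assumptions; -type, -iname, -mmin and -delete are supported by both GNU and BSD (macOS) find.

```shell
#!/bin/bash
# Delete files whose names start with "frequency" (case-insensitive)
# and that haven't been modified for more than N minutes.
# Hypothetical helper; the 60-minute default is an assumption.
clean_frequency_files() {
    local dir="${1:-/tmp}"
    local minutes="${2:-60}"
    # -type f:  only regular files, never directories
    # -mmin +N: last modified more than N minutes ago
    # -delete:  remove matches without spawning rm for each one
    find "$dir" -type f -iname 'frequency*' -mmin "+$minutes" -delete
}
```

Put the function and a call such as clean_frequency_files /tmp 60 in a script, and a root crontab entry can run that script every hour; no sudo is needed when it runs from root's crontab.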

Things I can’t remember: More GREP goodness

When cleaning up mailing lists, I often need to remove the ZIP+4 extension. It is usually at the end of a line, so this grep pattern finds it in BBEdit.


-[0-9]{4}$

Look for a dash, followed by exactly four digits (0-9), and then the end of the line.
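Outside BBEdit, the same pattern works with sed's extended regular expressions (-E, available in both GNU and BSD sed). A minimal sketch that deletes the ZIP+4 suffix rather than just finding it; the function name is hypothetical.

```shell
#!/bin/bash
# Strip a trailing ZIP+4 extension ("-1234" at the very end of a line).
# Reads stdin, writes stdout. Hypothetical helper name.
strip_zip4() {
    # -[0-9]{4}$ : a dash, exactly four digits, then the end of the line.
    sed -E 's/-[0-9]{4}$//'
}
```

For example, strip_zip4 < list.txt > cleaned.txt. A phone number at the very end of a line (555-1234) would also match, so requiring the five-digit ZIP first, s/([0-9]{5})-[0-9]{4}$/\1/, is a safer variant.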

Working on a word list, we needed to find and delete all of the words with 1, 2, or 3 letters, so we used this pattern, swapping the 1 for a 2 and then a 3. Note the ^ and $: the match has to start at the beginning of the line and run to the end of the line with nothing else in between.


^[a-zA-Z]{1}$
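An interval can also be a range, so {1,3} covers one, two, and three letters in a single pass. A minimal sketch with grep -E, where -v prints only the lines that do not match, which deletes the short words; the function name is hypothetical.

```shell
#!/bin/bash
# Remove words of 1 to 3 letters from a one-word-per-line list.
# Hypothetical helper; reads stdin, writes stdout.
drop_short_words() {
    # ^ and $ anchor the pattern, so the whole line must be 1-3 letters.
    # -v inverts the match, printing only the lines that do NOT match.
    grep -Ev '^[a-zA-Z]{1,3}$'
}
```

For example, drop_short_words < words.txt > longer-words.txt.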

We could use the same technique to find all of the words with more than 5 letters, but then we’d have to do a bunch of searches. Instead we used this.


[a-zA-Z]{5}[a-zA-Z]

Here we look for a pattern of five letters followed by one more. Not to belabor the point, but suppose "together" is in the word list. It has five letters, "toget", and then one more, "h". It also happens to have more letters after that, but we don't particularly care. We just care that there are more than five.
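A quick check of that reasoning with grep -E on a few sample words. Because the pattern isn't anchored, it matches any line containing at least six letters in a row, so [a-zA-Z]{6} or [a-zA-Z]{6,} would behave the same way. The function name is hypothetical.

```shell
#!/bin/bash
# Print the lines that contain a run of more than five letters.
long_words() {
    # Five letters followed by one more letter = at least six in a row.
    grep -E '[a-zA-Z]{5}[a-zA-Z]'
}
```

Feeding it apple, toget, together, and banana prints only together and banana, since the first two have exactly five letters.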

Processing remove requests and bad email addresses in our customer list

Update 2016-05-25. I found a much easier way.

Remove Requests

We don’t like to bother our customers with lots of emails, but from time to time we let them know about new products and sales. Some of them use the remove link on the bottom of the email to unsubscribe from the list. I have a rule in Apple Mail that automatically routes the remove response to a folder. We usually get a couple of remove requests each time we send out a mailing and we usually process them manually. I wanted to automate the process a bit and create a master list of email addresses that do not want our mailings. That way if someone orders from us under a different name or address, we won’t be sending them email if they opted out earlier.

After looking around in the Library/Mail folder, I found that our Remove folder is located at


 ~/Library/Mail/V2/Mailboxes/Remove.mbox/

I cd to that folder and then redirect the results of a recursive grep command to a file on the desktop. Since the messages are replies to our email, I look for the From: line. Note the period after the search term: it tells grep to start in the current directory. The -r option tells it to search every file in that directory and in all the directories below it.


grep -r "From: " . > ~/Desktop/Remove.txt

Then I open the file in BBEdit and remove extra lines, e.g. anything with our company name.

I only want the email addresses, so I use this grep pattern, replacing matches with nothing, to remove everything before the address.


.*<

I can remove everything after the address with this.

>.*

Sort the lines, remove the duplicates, and you're done. I then import the email addresses into my MySQL database.
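The same steps can also be chained on the command line. This is a hedged sketch, assuming the messages are plain-text files somewhere under the Remove.mbox folder and each address sits between angle brackets. grep -h drops the file-name prefix that -r would otherwise add, -I skips binary files, the ^ keeps it to header lines (quoted replies starting with "> From:" are ignored), sed applies the same two patterns, and sort -u sorts and removes duplicates. Lines without angle brackets pass through unchanged, and you still need to filter out your own company's addresses. The function name is hypothetical.

```shell
#!/bin/bash
# Collect unique sender addresses from every file under a directory.
# Assumes "From:" headers of the form: From: Name <address>
# Hypothetical helper name.
collect_senders() {
    local dir="${1:-.}"
    # -r: recurse; -h: omit file names; -I: skip binary files.
    grep -rhI '^From: ' "$dir" |
        sed -e 's/.*<//' -e 's/>.*//' |   # keep only what's inside < >
        sort -u                           # sort and drop duplicates
}
```

For example, collect_senders ~/Library/Mail/V2/Mailboxes/Remove.mbox > ~/Desktop/Remove.txt.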

Bad Domain

Our mail server keeps trying to reach a missing domain for a few days and generates warning messages. When it finally gives up, it generates a message with the subject “Returned mail: see transcript for details”. The nice thing about these messages is that the failed address is easy to find. It looks like this:


The following address(es) failed:

 abby612@earthink.net
   retry timeout exceeded

In this case the mail server couldn't find the domain because it was looking for earthink, but the customer almost certainly meant earthlink. When I pull the addresses out of the failure messages, I correct obvious typos before adding them to the bad email database.
The code to find the addresses is:


grep -r "^ .*@.*$" . > ~/Desktop/Failed.txt

The pattern matches lines that start with a space and contain an @ somewhere after it. That would also catch other indented lines that happen to contain an @, but there are surprisingly few of those false positives.
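If the false positives ever become a nuisance, the pattern can be tightened so the whole line must be leading spaces plus a single address with no spaces in it. A sketch, assuming the bounce format shown above (space-indented, not tab-indented); -h drops the file-name prefix that -r adds, and the function name is hypothetical.

```shell
#!/bin/bash
# Print failed addresses from bounce messages under a directory.
# Matches lines that consist only of leading spaces and one address.
failed_addresses() {
    local dir="${1:-.}"
    # ^ +      : one or more leading spaces
    # [^ @]+@  : a local part with no spaces or @, then the @
    # [^ @]+$  : a domain with no spaces or @, running to the end of the line
    grep -rhE '^ +[^ @]+@[^ @]+$' "$dir" | sed -E 's/^ +//'
}
```

An indented sentence such as " Contact us at help@example.com for details" no longer matches, because the text before the @ contains spaces.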

Bad Addresses

It’s much harder to remove the bad email addresses since the format from different email providers varies tremendously. I ended up looking for lines that contain an “@” and doing a lot of manual cleanup. I’ll see if I can figure out a better way next month.

Non-Hackers as Founders

This is a continuation of my thoughts about founders that I started in the post on Women as Founders. There are literally thousands of startups out there, most of which you've never heard of, either because they solve a problem that you don't have or because they never got the number of users required to be a sustainable business. I started at the beginning (2005) and looked through the list of startups funded by Y-Combinator. In the first few years, most of the startups solved problems that other hackers had. Part of this is no doubt due to the fact that Y-Combinator was founded by successful hackers. But part of it was no doubt due to the fact that the founders were solving problems they had themselves. Most of the startups seem to be solving technical problems: website design, secure payments, photo editing. Others solve a communication problem or a social-connection problem.

I've started several web-based companies. The ones that failed to get up and running did so because I was relying on other people to do the actual development. In 2001 my cousin was getting married and I thought it would be good to put up a website where people could share photos of the bride and groom, exchange anecdotes, and coordinate travel plans. There was even an obvious revenue model: link to hotels and gift registries and receive a commission on purchases. We could expand the site to other special events like reunions, golf events, fundraising walks, etc. The opportunities seemed wide open.

While I knew a little about programming a website, I didn't know enough to do the whole thing. So I enlisted a partner to do the coding. The thing about coding is that even if you have a detailed specification, which we didn't have, there are lots of decisions that get made on a regular basis that affect the final product. And every decision that the coder made was different from my vision. Many of them were things that just never occurred to the programmer. For example, if you have several pages on the site that start with a photo, the photo needs to be in exactly the same place on every page. Otherwise when you move from page to page, the pages jump all over the place. And back then, most people were still using small monitors, so we couldn't design a page that was wider than around 600 pixels or else people would have to scroll sideways to see it. Likewise, we needed to stick with a 65,536-color palette for most of the design. In the end I literally spent more time trying to persuade the coder to do things the way I wanted than they spent coding. I would have been better off spending the time learning how to code myself and then coding things the way they were supposed to be coded.

And I think that’s the difference between a coder and a hacker. The coder just wants to get the project done. The hacker wants to get it done right.

Since then, I have never let someone else be the coder on my projects. I will pay people to help me out when I am first learning a new language, but I do my own coding. And none of my projects have failed because they didn’t get built.

That’s not to say it couldn’t work. I can see how having a non-coding founder would be great for lots of startups, especially those that rely on face-to-face marketing or lots of back-end coordination with other businesses. But the kinds of startups that require a lot of coding seem to me to require hackers as founders.