Find all the files in a directory.

When I find a comic amusing, I grab a screenshot of it. I thought it would be nice to display one comic on my home page each day, so I wrote a little code to look in the comics directory and find all the files, then randomly choose one to display.

There are a few interesting things about the code. First up is initialization. Notice that I initialize the two arrays before the while loop. PHP loops don’t create their own scope, so this isn’t needed to keep the arrays from being local to the loop; it guarantees that both arrays exist, even if they are empty, when the code after the loop uses them. Second, notice that I don’t use an index to add to the arrays. In PHP the usual way to append an element to an array is with the $arrayName[] notation. Third, I define $location to be the directory handle of the directory. Since I assign it in an if statement, if it fails to open the directory, none of the rest of the code runs. Fourth, I use a similar pattern to read the files in the directory: I assign the result of readdir() in the while condition and keep reading until there aren’t any more entries. Finally, I read the all flag from my URL with $_GET['all'] so I can proof all of the comics.


<?php

$title = array();
$comicsLst = array();

if ($location = opendir('./Comics')) {

    while (false !== ($entry = readdir($location))) {
   
        if ($entry != "." && $entry != "..") {

            $titleArray = explode(".", $entry);
            $title[] = $titleArray[0];
            $comicsLst[] = $entry;
        }
    }
    closedir($location);
   
    $numComics = count($comicsLst);

    // rand() includes both endpoints, so the largest valid index is count - 1.
    $randomComic = rand(0, $numComics - 1);

    // Avoid an undefined-index notice when the flag is not in the URL.
    $all = isset($_GET['all']) ? $_GET['all'] : null;

    if (is_null($all)) {
        echo "<p>";
        echo "<img class='align-left' src='/Comics/$comicsLst[$randomComic]' ";
        echo "alt='$title[$randomComic]' title='$title[$randomComic]' /> ";
        echo "</p>";
        /*
        echo "<p class='attribution'>";
        echo "<a href='index.php?p=Comics&all=y'>Display all comics.</a>";
        echo "</p>";
        */
    } else {

        for($i=0; $i<$numComics; $i++) {
            echo "<p>";
            echo "$title[$i]<br />";
            echo "<img class='align-left' src='/Comics/$comicsLst[$i]' alt='$title[$i]' /> ";
            echo "</p>";
        }
    }
}

I frequently find new comics and add them to my local folder. To keep them synchronized with the server, I wrote a little rsync script.


#!/bin/bash
rsync -a -H -vv -z -l --update --delete --exclude \ Comics --exclude .DS_Store ~wellgolly/Documents/Comics/ \
wellgolly@wellgolly:/www/WellGolly/Comics/ > \
~wellgolly/Sites/rsync-backup-comics-`date +%F`.log

Remove all the files in a directory with a specific name

I have a web app that generates temporary files as part of a shell script. The files start with ‘frequency’ and have a hash appended. Sometimes the app doesn’t clear them out. This line does it. At some point I need to put this in a cron job, but I haven’t done it yet.


sudo find /tmp -iname "frequency*" -exec rm -f {} \;

By the way, I think the reason that the files are not cleared out is that they are generated by SpamBots. The code for clearing temporary files is on the display page. It never executes because the page that displays the results is never loaded.
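For the cron job that hasn’t been written yet, a minimal sketch could look like the one below. It is only an illustration under a few assumptions: the leftovers are regular files sitting directly in the target directory, the function name clean_frequency_files and the script path in the crontab comment are made up, and it uses the case-sensitive -name where the one-liner above uses -iname.

```shell
#!/bin/sh
# Sketch only: clean_frequency_files is a made-up helper name.
# Deletes regular files whose names start with "frequency" directly inside
# the given directory (default /tmp). Subdirectories and files with other
# names are left alone.
clean_frequency_files() {
    dir=${1:-/tmp}
    find "$dir" -maxdepth 1 -type f -name 'frequency*' -exec rm -f {} +
}

# Hypothetical crontab entry (the script path is an assumption):
# 15 3 * * * /usr/local/bin/clean-frequency.sh
```

Calling clean_frequency_files /tmp from a script run by cron would do the same job as the one-liner, with -type f as a guard against touching directories that happen to match the name.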

Things I can’t remember: More GREP goodness

When cleaning up mailing lists, I often need to remove the zip+4 info. It is usually at the end of a line so this grep works in BBEdit to find them.


-[0-9]{4}$

This looks for a dash followed by exactly four digits and then the end of the line.
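The same pattern should also work outside BBEdit. Here is a small sketch using sed with extended regular expressions; the function name strip_zip4 and the sample addresses are made up for illustration.

```shell
#!/bin/sh
# strip_zip4 is a made-up name. It reads lines on stdin and removes a
# trailing "-NNNN" (a dash plus exactly four digits) from each one.
strip_zip4() {
    sed -E 's/-[0-9]{4}$//'
}

# Invented sample data: one line with a zip+4 suffix, one without.
printf '%s\n' 'Springfield, IL 62704-1234' 'Portland, OR 97201' | strip_zip4
```

Because of the $ anchor, a dash and four digits in the middle of a line are left alone; only a suffix at the very end is removed.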

Working on a word list, we needed to find and delete all of the words with 1, 2, or 3 letters. So we used this pattern, switching out the 1 for 2 and then 3. (A range such as {1,3} would catch all three lengths in a single search.) Note the ^ and $. They mean that we start at the beginning of the line, look for the pattern, and allow nothing else until the end of the line.


^[a-zA-Z]{1}$

We could use the same technique to find all of the words with more than 5 letters, but then we’d have to do a bunch of searches. Instead we used this.


[a-zA-Z]{5}[a-zA-Z]

Here we look for a run of five letters followed by one more. Not to belabor the point, but suppose “together” is in the word list. It has five letters, “toget”, and one more, “h”. It also happens to have some more letters after that, but we don’t particularly care. We just care that there are more than 5.
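For a word list stored one word per line, both searches can be sketched on the command line with grep -E. The helper names and the sample words are made up; {1,3} and {6,} are standard interval expressions, used here as shorthand for the patterns above.

```shell
#!/bin/sh
# Both helpers read one word per line on stdin. The names are made up.

# Drop words of 1, 2, or 3 letters in one pass: {1,3} covers all three
# lengths, and -v prints only the lines that do NOT match.
drop_short_words() {
    grep -Ev '^[a-zA-Z]{1,3}$'
}

# Keep words with more than 5 letters: {6,} means "six or more", which
# selects the same lines as [a-zA-Z]{5}[a-zA-Z].
long_words() {
    grep -E '[a-zA-Z]{6,}'
}

printf '%s\n' a to cat tree house together | drop_short_words
printf '%s\n' a to cat tree house together | long_words
```

The first command prints tree, house, and together; the second prints only together, since house has exactly five letters.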

Processing remove requests and bad email addresses in our customer list.

Update 2016-05-25. I found a much easier way.

Remove Requests

We don’t like to bother our customers with lots of emails, but from time to time we let them know about new products and sales. Some of them use the remove link on the bottom of the email to unsubscribe from the list. I have a rule in Apple Mail that automatically routes the remove response to a folder. We usually get a couple of remove requests each time we send out a mailing and we usually process them manually. I wanted to automate the process a bit and create a master list of email addresses that do not want our mailings. That way if someone orders from us under a different name or address, we won’t be sending them email if they opted out earlier.

After looking around in the Library/Mail folder it looks like our Remove folder is located at


 ~/Library/Mail/V2/Mailboxes/Remove.mbox/

I cd to that folder and then redirect the results of a recursive grep command to a file on the desktop. Since they are responding to our email, I look for the From: portion of the email. Note the period after the search term. It means to start the search in the current directory. The -r says to look in all files in directories below this one too.


grep -r "From: " . > ~/Desktop/Remove.txt

Then I open the file in BBEdit and remove extra lines, e.g. anything with our company name.

I only want the email addresses, so I can use this grep line to remove everything before the address.


.*<

I can remove everything after the address with this.

>.*

Sort the lines and process duplicates and you’re done. I then import the email addresses into my MySQL database.
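The BBEdit steps can probably be folded into a pipeline as well. This is a sketch, not what I actually ran: extract_addresses is a made-up name, the sample lines and file names are invented, and lowercasing the addresses before removing duplicates is an added assumption. It only handles lines where the address sits between < and >; anything else is skipped and still needs a manual look.

```shell
#!/bin/sh
# extract_addresses is a made-up name. It reads grep output on stdin,
# keeps only what sits between "<" and ">", lowercases it, and removes
# duplicates. Lines without angle brackets are skipped.
extract_addresses() {
    sed -n -E 's/.*<([^>]*)>.*/\1/p' | tr '[:upper:]' '[:lower:]' | sort -u
}

# Invented sample lines in the shape that grep -r "From: " . produces.
printf '%s\n' \
    './1.emlx:From: Pat Example <Pat@Example.com>' \
    './2.emlx:From: "P. Example" <pat@example.com>' \
    './3.emlx:From: Lee <lee@example.org>' | extract_addresses
```

With the real mailbox, the grep command above could feed it directly: grep -r "From: " . | extract_addresses > ~/Desktop/Remove.txt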

Bad Domain

Our mail server tries to find missing domains for a few days and generates warning messages along the way. When it finally gives up it generates a message with the subject “Returned mail: see transcript for details”. The nice thing about these messages is that the failed address is easy to find. It looks like this:


The following address(es) failed:

 abby612@earthink.net
   retry timeout exceeded

In this case it probably couldn’t find the server because it was looking for earthink, but the email was probably meant for earthlink. When I pull addresses out of the failure messages, I correct obvious typos before adding them to the bad email database.

The code to find the addresses is:


grep -r "^ .*@.*$" . > ~/Desktop/Failed.txt

The pattern looks for lines that start with a space and contain an @ somewhere before the end of the line. There are surprisingly few false positives, i.e. lines that contain something other than just an email address.
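To get a clean list rather than raw grep output, the search could be followed by a trim and a de-duplication step. The sketch below assumes the bounce messages are plain text files under one directory; failed_addresses is a made-up name and the sample address is invented.

```shell
#!/bin/sh
# failed_addresses is a made-up name. It searches a directory tree for lines
# that start with a space and contain an "@", trims the leading spaces, and
# removes duplicates. -h stops grep from prefixing each hit with a filename.
failed_addresses() {
    grep -rh '^ .*@.*$' "$1" | sed 's/^ *//' | sort -u
}

# Demo on an invented bounce message in a temporary directory.
demo=$(mktemp -d)
printf '%s\n' 'The following address(es) failed:' '' \
    ' someone@example.invalid' '   retry timeout exceeded' > "$demo/bounce1"
failed_addresses "$demo"
rm -rf "$demo"
```

The indented “retry timeout exceeded” line is ignored because it has no @, so only the address line survives.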

Bad Addresses

It’s much harder to remove the bad email addresses since the format from different email providers varies tremendously. I ended up looking for lines that contain an “@” and doing a lot of manual cleanup. I’ll see if I can figure out a better way next month.