Organizing Memories

So many memories we’ve forgotten

When was the last time you had pictures printed? How many pictures do you have on your phone or tablet? Do you even remember what pictures you have anymore or where they’re from?

My wife’s grandmother passed away two years ago. She and her husband, John, were prolific picture takers and frequent vacationers. Over their last five years or so together, they had amassed several thousand pictures on various devices. Most of these were on two iPads, with the balance spread across a few CDs and USB flash drives.

So the family presented the problem to me: we have all these pictures, we don’t know what is a duplicate of what, and we don’t know what’s on these drives or the iPads, but we need it all organized in a single folder on a new laptop. The intent was for John to be able to view and curate these pictures on a laptop (which I would purchase and set up), something he was struggling to do on the iPad.

In a very rare act of taking the high road, I’ll skip my usual rant against Apple products, their apparent hatred for their user base, and their obvious goal of frustrating those of us who serve as the family tech support. Suffice it to say that I managed to invoke the right magic, speak the correct incantations and beckon the appropriate powers to get all the pictures off both iPads and safely onto my machine. We’ll skip the pain and move on.

All told, I now had around 2,600 pictures to work with, and I had two objectives in mind:

  1. Eliminate any duplicates in this collection of pictures.
  2. Sort this collection of pictures into a folder structure by the year and month each picture was taken.

Finding Duplicates

This was an interesting problem to solve. I am quite sure my approach wasn’t the most efficient one, but it was fun to work out. I walked through the various folders looking for picture files. For each file, I loaded the filename, extension and path into a database. I also took an MD5 hash of the picture, along with the date the picture was taken as reported in its EXIF data, and loaded those into the database too.

For those unaware, an MD5 hash is a fingerprint computed from the data within the file. Identical content always produces the same hash, and accidental matches between different files are vanishingly unlikely, so for my purposes it is enough to identify a file by its content rather than by its file name. EXIF data is metadata saved inside a picture file that records several properties of the picture, such as the camera it was taken on, the camera settings and the date the picture was taken.
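The same per-file data points can be sketched in Python for anyone who would rather not do this in shell. This is a minimal illustration, not the script used for this project: the function names `file_md5`, `normalize_exif_date` and `scan` are hypothetical, and reading the EXIF tag itself is left to a tool such as exiftool, so `normalize_exif_date` assumes it is handed the raw DateTimeOriginal string in the usual `YYYY:MM:DD HH:MM:SS` form.

```python
import hashlib
import os
from datetime import datetime

UNKNOWN_DATE = "1980-01-01"  # the same fallback date the shell script uses


def file_md5(path, chunk_size=1 << 20):
    """Hex MD5 digest of a file's bytes; the file name plays no part."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # read in 1 MiB chunks so large videos need not fit in memory
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def normalize_exif_date(raw):
    """'2015:06:12 14:33:10' -> '2015-06-12'; missing or malformed -> UNKNOWN_DATE."""
    if not raw:
        return UNKNOWN_DATE
    try:
        parsed = datetime.strptime(raw.strip(), "%Y:%m:%d %H:%M:%S")
    except ValueError:
        # e.g. an all-zero timestamp, which is not a valid date
        return UNKNOWN_DATE
    return parsed.strftime("%Y-%m-%d")


def scan(root):
    """Yield (path, filename, extension, md5) for every file under root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            extension = os.path.splitext(name)[1].lstrip(".")
            yield dirpath, name, extension, file_md5(full)
```

Two copies of the same photo saved under different names in different folders produce the same digest from `file_md5`, which is exactly what makes the later duplicate check work.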

From there, weeding out the duplicate pictures was a trivial SQL query selecting the unique MD5 hashes, which I thought was pretty neat. The code for this first stage is below. You can see that I am using exiftool, a great tool written by Phil Harvey, to pull out the EXIF data. After the code is the end result of this script.

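#!/bin/bash
# Record the path, name, extension, MD5 hash and capture date of every file
# under a directory in the Pictures table of the GGPics MySQL database.

if [ -z "$1" ];
then
    echo "Need to include the root directory containing pictures"
    exit 1
fi
if [ "$1" = '-h' ];
then
    echo "Usage: $0 <root directory containing pictures>"
    exit 0
fi

search_dir=$1

# Double any single quotes so names like "John's Trip" don't break the SQL
sql_escape() {
    printf '%s' "$1" | sed "s/'/''/g"
}

# -type f returns regular files only, so the root directory itself is not
# listed and the old "skip the first entry" counter is no longer needed
find "$search_dir" -type f -print0 | while IFS= read -r -d '' entry
do
    filename=$(basename "$entry")
    extension="${filename##*.}"
    path=$(dirname "$entry")
    # Hashing stdin prints "<hash>  -"; keep only the hash
    md5hash=$(md5sum < "$entry" | cut -d ' ' -f 1)

    # Prefer the EXIF capture date; -s3 prints only the value and
    # -d reformats it as YYYY-MM-DD
    imageDate=$(exiftool -s3 -d '%Y-%m-%d' -DateTimeOriginal "$entry")

    # Fall back to the file modification date
    if [ -z "$imageDate" ];
    then
        imageDate=$(exiftool -s3 -d '%Y-%m-%d' -FileModifyDate "$entry")
    fi

    # Last resort: a placeholder date the organizing step treats as unknown
    if [ -z "$imageDate" ];
    then
        imageDate="1980-01-01"
    fi

    sql="INSERT INTO Pictures (Path,File,Extension,Hash,Date) VALUES ('$(sql_escape "$path")', '$(sql_escape "$filename")', '$(sql_escape "$extension")', '$md5hash', '$imageDate')"
    # Defensive: give mysql its own stdin so it never reads the file list feeding the loop
    mysql -D GGPics -e "$sql" < /dev/null
done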
[Image: Picture hashes]
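The duplicate check comes down to grouping rows by hash. To show the idea in a form anyone can run, here is a sketch that uses Python's built-in `sqlite3` in place of the MySQL database used for the project. The `ID` column on `Pictures`, the rule of keeping the lowest `ID` for each hash, and the exact layout of `DistHash` are assumptions for illustration; they are not the original query.

```python
import sqlite3


def build_distinct_table(conn):
    """Copy one row per distinct Hash from Pictures into a fresh DistHash table.

    The copies are byte-for-byte identical, so which one survives is
    arbitrary; this keeps the row with the lowest ID (the first one scanned).
    """
    conn.executescript("""
        DROP TABLE IF EXISTS DistHash;
        CREATE TABLE DistHash (
            ID INTEGER PRIMARY KEY,
            Path TEXT, File TEXT, Hash TEXT, Date TEXT,
            Processed INTEGER NOT NULL DEFAULT 0
        );
        INSERT INTO DistHash (Path, File, Hash, Date)
        SELECT Path, File, Hash, Date FROM Pictures
        WHERE ID IN (SELECT MIN(ID) FROM Pictures GROUP BY Hash)
        ORDER BY ID;
    """)
    conn.commit()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Pictures (ID INTEGER PRIMARY KEY, Path TEXT, "
                 "File TEXT, Extension TEXT, Hash TEXT, Date TEXT)")
    conn.executemany(
        "INSERT INTO Pictures (Path, File, Extension, Hash, Date) VALUES (?, ?, ?, ?, ?)",
        [("/ipad1", "IMG_0001.JPG", "JPG", "aaa", "2015-06-12"),
         ("/usb", "beach.jpg", "jpg", "aaa", "2015-06-12"),  # same content, new name
         ("/ipad2", "IMG_0420.JPG", "JPG", "bbb", "2016-01-03")])
    build_distinct_table(conn)
    for row in conn.execute("SELECT ID, Path, File, Hash FROM DistHash"):
        print(row)
```

The `Processed` flag starts at 0 for every row, so a later step can mark pictures as it copies them.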

Organizing the Final Product

Once I had everything in the database, I made a new table of distinct hashes, which essentially weeded out the duplicate files. Next came the organizational step, which was rather trivial. For this, I wrote a small Python script (again, probably not as efficient as it could be). The script reads through the table of unique pictures, checks whether folders already exist for the year and the month each picture was taken, creates them as needed, and then copies the image into the new structure. The code and results are below.

import calendar
import os
import shutil


def organize(conn, newLocation):
    # conn is an open DB-API connection to the GGPics MySQL database (for
    # example from MySQLdb). The SELECT below assumes DistHash has ID, Path,
    # File, Date and Processed columns; adjust it to match your schema.
    curs = conn.cursor()
    curs.execute("SELECT ID, Path, File, Date FROM DistHash WHERE Processed=0")
    results = curs.fetchall()
    totalFiles = len(results)
    lastID = 0

    for result in results:
        oldPath = result[1]
        oldFilename = result[2]
        oldDate = result[3].strftime('%Y-%m-%d')
        year, monthNum = oldDate.split("-")[:2]
        # keep the original extension (.jpg, .png, .mov, ...) instead of forcing .jpg
        extension = os.path.splitext(oldFilename)[1].lower() or ".jpg"

        if year in ("1980", "1900"):
            # placeholder dates: no usable date was found for this picture
            newPath = os.path.join(newLocation, "UnknownDate")
            os.makedirs(newPath, exist_ok=True)
            newName = str(len(os.listdir(newPath)) + 1) + extension
        else:
            # create the <year>/<MonthName> folders as needed
            month = calendar.month_name[int(monthNum)]
            newPath = os.path.join(newLocation, year, month)
            os.makedirs(newPath, exist_ok=True)
            newName = oldDate + extension
            if os.path.isfile(os.path.join(newPath, newName)):
                # another picture from the same day: add a numeric suffix
                newName = oldDate + "_" + str(len(os.listdir(newPath)) + 1) + extension

        newFile = os.path.join(newPath, newName)
        # copy the raw bytes; text mode ("r"/"w") can corrupt image data
        shutil.copyfile(os.path.join(oldPath, oldFilename), newFile)

        lastID = lastID + 1
        print(newFile)
        print(str(lastID) + " of " + str(totalFiles) + " done")
        # mark the row so an interrupted run can pick up where it left off
        curs.execute("UPDATE DistHash SET Processed=1 WHERE ID=%s", (result[0],))
        conn.commit()


# Example (the destination path is hypothetical):
# organize(MySQLdb.connect(db="GGPics"), "/path/to/Pictures")

In the end, I had a structure that looks something like this. I don’t have the actual pictures anymore, so it’s showing 0 files, but you get the idea.

[Image: Picture Structure]
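The naming rules in the organizing script can also be pulled out into small pure functions, which makes them easy to check without copying a single file. This is a refactoring sketch rather than the script used for this project: `target_folder`, `target_filename` and the `existing` parameter (the set of names already in the destination folder) are hypothetical, and the caller passes in the extension.

```python
import calendar
import os

UNKNOWN_YEARS = {"1980", "1900"}  # placeholder years meaning "no usable date"


def target_folder(date_str):
    """Relative folder for a 'YYYY-MM-DD' date: <year>/<MonthName> or UnknownDate."""
    year, month = date_str.split("-")[:2]
    if year in UNKNOWN_YEARS:
        return "UnknownDate"
    return os.path.join(year, calendar.month_name[int(month)])


def target_filename(date_str, extension, existing):
    """File name inside the target folder, given the names already there."""
    if date_str.split("-")[0] in UNKNOWN_YEARS:
        # undated pictures are simply numbered
        return str(len(existing) + 1) + extension
    name = date_str + extension
    if name in existing:
        # same day already taken: suffix with the folder's file count + 1
        name = date_str + "_" + str(len(existing) + 1) + extension
    return name


if __name__ == "__main__":
    existing = set()
    for day in ["2015-06-12", "2015-06-12", "2015-06-12"]:
        name = target_filename(day, ".jpg", existing)
        existing.add(name)
        print(os.path.join(target_folder(day), name))
```

Because the numeric suffix comes from the folder's file count, which only grows as pictures are copied in, every suffixed name in a folder is distinct.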

All in all, this was a pretty fun project, and I intend to work on it a little more and use it for my own organization. My wife is a prolific picture taker, and over the years I have accumulated multiple backups. So having this type of tool to quickly organize pictures and weed out duplicates is pretty useful.

More importantly, this tool gave me the ability to give John something important: it let me give him back his memories. So often we leave our memories locked up in our devices. I was pleased to be able to give John access to pictures he hadn’t seen in a long, long time, and to the memories attached to them. It reminded me of how precious and fragile our memories are.