Today I Learned: Storage expensive, Data priceless

We have a combination Plex Media/Minecraft/Archive server that we’ve had since we purchased our first 6TB hard drive on December 30, 2019 ($99.99 at the time). After some time we upgraded to our massive 14TB hard drive ($293.00 at the time) on October 16, 2021. It took a bit over a couple of years to fill things up, and we recently invested in a 16TB hard drive ($279.00 at purchase) to keep up with our storage needs.

$ df -h
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       457G  288G  169G  64% /
/dev/sda1      1014M  202M  813M  20% /boot
/dev/sdd1        13T   12T   52G 100% /mnt/usb14
/dev/sdc1       5.5T  4.3T  962G  82% /mnt/usb03

Now it’s time to get this new drive ready for use.


Today I learned: command > script

Had to compare two files at work today. Actually, I had to compare one file to a series of files to see what data exists in both of them. This technically comes down to an INNER JOIN, where we only want left column data when it also exists in the right column.

So, written as a script in PHP, it comes down to:

<?php
ini_set('memory_limit', '256M'); // ini directive names are case-sensitive
if (!file_exists($argv[1])) { die('file ' . $argv[1] . ' not found'); }
if (!file_exists($argv[2])) { die('file ' . $argv[2] . ' not found'); }
$fp = fopen($argv[1], 'rt');
$lines = [];
do {
  $line = trim(fgets($fp));
  if (strlen($line) > 0) {
    $lines[] = $line;
  }
} while (!feof($fp));
fclose($fp);
$fp = fopen($argv[2], 'rt');
do {
  $line = trim(fgets($fp));
  if (strlen($line) > 0) {
    if (in_array($line, $lines)) {
      echo "$line\n";
    }
  }
} while (!feof($fp));
fclose($fp);

This script, albeit working like a charm, takes a while with large numbers of records, since in_array() scans the whole array for every line of the second file.

After some googling, it turns out this script isn’t really necessary if you use grep correctly. You also gain the speed of a compiled executable in one fell swoop.

$ grep -Fxf [file1] [file2]

Output is exactly the same.
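
To make the flags concrete, here is a minimal sketch using two throwaway files. The file names and their contents are made up for illustration; any two line-oriented text files should behave the same way.

```shell
#!/bin/sh
# Hypothetical sample data, created in a temporary directory.
tmpdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$tmpdir/file1.txt"
printf 'banana\ndate\ncherry pie\ncherry\n' > "$tmpdir/file2.txt"

# -F: treat every pattern as a fixed string, not a regex
# -x: a pattern only matches when it equals the whole line
# -f: read the patterns, one per line, from file1
common=$(grep -Fxf "$tmpdir/file1.txt" "$tmpdir/file2.txt")
printf '%s\n' "$common"

rm -rf "$tmpdir"
```

This prints banana and cherry, in the order they appear in the second file. The "cherry pie" line is skipped because -x insists on a whole-line match.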

Today I learned: regex > loop

While writing “quad-quad”, a set of four speakable 4-letter words that can be used as a user-friendly “bookmark” for easily finding a record, I wrote a “quick” program to extract the contents of wikidatawiki-20220820-pages-articles-multistream.xml (a Wikidata dump) and ran into a large delay in the following loop:

$alphas = 'qwertyuiopasdfghjklzxcvbnm ';
$newline = '';
for ($x = 0; $x < strlen($line); $x++) {
    $c = substr($line, $x, 1);
    if (strpos($alphas, $c) !== false) {
        $newline = $newline . $c;
    } else {
        $newline = $newline . ' ';
    }
}

The loop’s main purpose is to sanitize any non-letter data by replacing unknown characters with a space for later processing. The end result would be words that I could filter down to 4-character words and tally up.

When the program read a line around 1 MB in length it would “hang” for a bit as it chewed through the data. In a nutshell, 25,100,655 bytes of data took 24m36s. It was time to optimize.

Replacing the loop with the following regex improved performance immensely.

$newline = preg_replace('/[^a-z]/', ' ', $line);

The same amount of data took 1.892s.

Lesson: If you don’t know regexes, learn regexes.
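
In the spirit of the grep post, the whole sanitize, filter and tally idea can also be sketched as a shell pipeline. This is only an illustration and not the program I actually ran: the sample line and the variable names are made up, and sed stands in for preg_replace with the same [^a-z] pattern.

```shell
#!/bin/sh
# Byte-wise locale so that [a-z] means ASCII lowercase letters only.
export LC_ALL=C

# Hypothetical sample line standing in for one line of the dump.
line='the <b>quick</b> lazy dogs; jump over lazy cats & more'

# 1. sed:   replace every character that is not a-z with a space
# 2. tr -s: turn spaces into newlines and squeeze repeats, one word per line
# 3. grep:  keep only the words that are exactly four letters long
# 4. sort | uniq -c: tally each word; sort -rn lists the most frequent first
tally=$(printf '%s\n' "$line" \
  | sed 's/[^a-z]/ /g' \
  | tr -s ' ' '\n' \
  | grep -x '[a-z]\{4\}' \
  | sort | uniq -c | sort -rn)
printf '%s\n' "$tally"
```

For the sample line, "lazy" comes out on top with a count of 2, followed by the five other 4-letter words with a count of 1 each.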

I Envy You, Alan Rickman

I recently learned about a book of Alan Rickman’s diaries, published after his death, titled “Madly, Deeply: The Diaries of Alan Rickman”.

I, like many others, used to have a diary as a child. Mine started around 1995 when I was in 8th grade. I used to write 2-3 times a week in my 4″x6″ 3-ring bound diary, and there always seemed to be pages begging for more of my life to be etched into them. My later months when I was 16 found me burning the book and throwing it and the seared pages into a fast-flowing brook in Kennedy, New York. All those memories, committed to pages and easily referenceable, now gone like a leaf travelling down the stream.

Alan Rickman, born 1946, started to keep a detailed record of his day-to-day in 1992. He was 46 at the time. I’m 41, with a slap-in-the-face-2-weeks until I’m 42, and I’ve decided to begin to keep a diary as well. I’m not going to go buy journals with intricate designs from shops, no. I’m going to do it my own way.

https://github.com/mjheick/diary is my project, and it’ll be hosted. It’s currently in the infant stages of development, but I do have the database mockup done and I can add to that as frequently as I’d like to until the frontend is done.

I feel I have to do this, in my own way, in the style of how Alan Rickman detailed his life. The fact that he did it from 46 to his final breaths amazes me. My Grandfather did this as well until his last breaths, and then my Grandmother continued it on.

I feel nothing of value can be acquired from my legacy except by the people that stumble across it and find value for themselves in it, and that’s enough of a driver to do something as simple as this.

A quote from Alan’s diary sits with me:

14 September

11am Three minutes’ silence which we shared with Kiss Me Kate cast.

Supper at home. Watching more coverage. Still trying to understand something. Cannot remove the fact of 4 million starving in Afghanistan not to mention the innocents in Iraq. There is such political naivety in the US that it only takes one image of five Palestinians dancing in the street to obliterate the bigger picture.

Madly, Deeply: The Diaries of Alan Rickman