Had to compare two files at work today. Actually, I had to compare one file to a series of files to see what data exists in both of them. Technically, this comes down to a join where we only keep a line from one file when it also exists in the other, which is closer to an INNER JOIN (a semi-join, really) than a LEFT JOIN.
Written as a PHP script, it comes down to:
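<?php
// ini directive names are lowercase; 'MEMORY_LIMIT' is not recognized.
// The whole first file is kept in memory, hence the higher limit.
ini_set('memory_limit', '256M');

if ($argc < 3) { die("usage: php {$argv[0]} file1 file2\n"); }
if (!file_exists($argv[1])) { die('file ' . $argv[1] . " not found\n"); }
if (!file_exists($argv[2])) { die('file ' . $argv[2] . " not found\n"); }

// Read every non-empty (trimmed) line of the first file.
$fp = fopen($argv[1], 'r');
$lines = [];
while (($line = fgets($fp)) !== false) {
    $line = trim($line);
    if (strlen($line) > 0) {
        $lines[] = $line;
    }
}
fclose($fp);

// Print every non-empty line of the second file that also occurs in the first.
// strict = true, otherwise numeric strings such as "10" and "1e1" compare equal.
$fp = fopen($argv[2], 'r');
while (($line = fgets($fp)) !== false) {
    $line = trim($line);
    if (strlen($line) > 0 && in_array($line, $lines, true)) {
        echo "$line\n";
    }
}
fclose($fp);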
This script works like a charm, but it takes a while with large numbers of records.
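Most of that time goes into in_array(): it scans the whole $lines array for every line of the second file, so the work grows with the product of the two file sizes. A minimal sketch of a faster PHP variant, assuming the first file still fits in memory: store its lines as array keys and test membership with isset(), which is a hash lookup instead of a linear scan. The function name intersectLines() is hypothetical and not part of the original script.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: returns the non-empty (trimmed) lines of $fileB that
 * also occur in $fileA, in $fileB's order, duplicates included.
 */
function intersectLines(string $fileA, string $fileB): array
{
    $linesA = file($fileA, FILE_IGNORE_NEW_LINES);
    if ($linesA === false) {
        throw new RuntimeException("cannot read $fileA");
    }

    // Each line of file A becomes an array key. PHP arrays are hash tables,
    // so isset() on a key avoids the linear scan that in_array() performs.
    $seen = [];
    foreach ($linesA as $line) {
        $line = trim($line);
        if ($line !== '') {
            $seen[$line] = true;
        }
    }

    // Stream file B line by line and keep the lines that are in the set.
    $fp = fopen($fileB, 'r');
    if ($fp === false) {
        throw new RuntimeException("cannot open $fileB");
    }
    $matches = [];
    while (($line = fgets($fp)) !== false) {
        $line = trim($line);
        if ($line !== '' && isset($seen[$line])) {
            $matches[] = $line;
        }
    }
    fclose($fp);

    return $matches;
}

// CLI usage: php intersect.php file1 file2
if (PHP_SAPI === 'cli' && isset($argv[1], $argv[2])) {
    foreach (intersectLines($argv[1], $argv[2]) as $line) {
        echo $line, "\n";
    }
}
```

The trade-off matches the original script: the first file is held in memory, but each line of the second file now costs one hash lookup instead of a pass over the whole first file.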
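After some googling, it turns out no PHP script is needed at all if you use grep correctly, and you get the speed of a native tool in one fell swoop. `-F` treats the patterns as fixed strings instead of regular expressions, `-x` only accepts matches of the whole line, and `-f [file1]` reads the patterns from file1, one per line. grep then prints every line of file2 that matches one of them: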
$ grep -Fxf [file1] [file2]
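The output is the same, with one caveat: the PHP script trims each line and ignores empty ones, while grep compares lines verbatim, so trailing spaces or Windows (CRLF) line endings in either file can make the results differ.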