TakeoutExtractor v1.1

At the start of the year, as part of my general effort to minimise my exposure to Google, I ordered-up a Google Takeout and cleared-out and archived my photos and videos from Google Photos.

To this end, I’ve just released version 1.1 of Takeout Extractor over on github. This releases adds fixes for a few issues discovered during the year, including some kindly reported by other github users. It now correctly handles photos edited using the Google One add-on photo features, to which I subscribed some time back in the last year. I also updated the projects to .net 7 and did some general freshening up. As noted in the readme, this is a source-only release because publishing maui apps currently seems to be broken in Visual Studio 2022. I’ll see if this is resolved by future VS updates. For Windows and MacOS apps (sadly sans the latest fixes) see the v1.0 release

See here for some background on the project. I hope it is useful to someone.

Adventures in Yak Shaving with System.CommandLine

Over the last few months I’ve been using wasting odd moments of free time by tinkering with some code to extract pictures from a Google Takeout archive. The idea is to use the json metadata in the archive to restore the image timestamps (which Google removes from the embedded image metadata – for reasons best known to itself), rationalise the file naming, separate edits and originals, etc. The ultimate aim is to be able to grab a Takeout, extract and locally archive all the images from some period of time (say, the last year) so that I can then manually remove those images from my Google Photos collection. And the aim of that is to reduce my exposure to Google arbitrarily closing my account and consequently deleting my pics. And because I’m old fashioned enough to distrust 100% reliance on “the cloud”.

So I wrote some code and got it working as a c# .net 6 command-line app. Its a bit rough but it does what I need.

And then I had the genius idea of restructuring it as a set of providers that could be used to extract all the other stuff that you might find in a Takeout archive: contacts, emails, whatever. And of course this would need command-line options that apply to each provider, to allow the output to be customised. Which requires a way of grouping those options – basically I needed the idea of “commands” that delimit groups of options and correspond to the different types of media in the archive. I also needed some options that are global and not associated with a command – for input and output directories, for example. At this point my old CommandLineParser class that I’ve been dropping into console apps for the decade or so was not going to cut it.

So I did some reading and decided to try System.CommandLine – the shiny new way to parse command line parameters. This is still in beta but my initial impression was favourable. Basically, you create an object model of your command-line syntax, hook it up to handlers, and let the library do the grunt work of parsing the command-line into values, handling errors, automatically generating help text (particularly impressive), and lots of other stuff.

Here’s a little test app that I made:

    public static int Main(string[] args)
        // audio command
        var thresholdOpt = new Option<int>("--threshold");
        var scaleOpt = new Option<double>("--scale");
        var audioCommand = new Command("audio") { thresholdOpt , scaleOpt};
            (int threshold, double scale) => { Console.WriteLine($"threshold={threshold}, scale={scale}"); },
            thresholdOpt, scaleOpt);

        // video command
        var monochromeOpt = new Option<bool>("--mono", description: "Monochrome");
        var colourOpt = new Option<bool>("--colour");
        var brightnessOpt = new Option<int>("--brightness");
        var videoCommand = new Command("video") { monochromeOpt, colourOpt, brightnessOpt };
            (bool mono, bool colour, int brightness) => { Console.WriteLine($"mono={mono}, colour={colour}, brightness={brightness}"); },
            monochromeOpt, colourOpt, brightnessOpt);

        // root command
        var infileOpt = new Option<FileInfo>("--i");
        var outfileOpt = new Option<FileInfo>("--o);
        var rootCommand = new RootCommand("test");
            (FileInfo infile, FileInfo outfile) => { Console.WriteLine($"i={infile}, o={outfile}"); },
            infileOpt, outfileOpt);

        return rootCommand.Invoke(args);

This implements the commands for an entirely fictitious test program that might be invoked with arguments like:

test --i "input.dat" --o "output.dat" audio --threshold 42 --scale 3.14 video --mono --brightness 60

Hopefully the similarity to my Takeout extractor should be obvious.

I was initially a bit mystified by the use of lambdas as “handlers” that are passed the values of various options. This mean that there was no single place in the code where everything about the parse was “known”. I didn’t know why it was like that but I thought I could work around it.

The first difficulty I encountered was that, while it is possible to associate options with the root command and also associated commands (which have their own options), only the first command is ever parsed. So if I include the audio command then the video command is ignored. Also, if any command is included in the args array then options associated with the root command itself (e.g. --i and --o) are not parsed. Clearly I was either not understanding something, or I wasn’t using it in the way that it was designed to be used. I opened an issue on github and fairly quickly got confirmation that it was the latter.

There was, however, cause for hope: I could split the command-line at command-token boundaries and parse each subset of arguments separately. Since RootCommand.Invoke() is actually an extension method (more of this below) I wrote a new extension method to do this:

    public static int InvokeMultiCommand(
        this RootCommand command, 
        string[] args)
        var commands = new List<Command>() { command };
        foreach (var seg in SegmentArgs(args, commands.ToArray()))
            var exitCode = command.Invoke(seg);
            if (exitCode != 0)
                return exitCode;
        return 0;    }

SegmentArgs() does the job of chopping up the string[] arguments array into a string[][].

With that working, I looked at how to customise the help output to include all commands and their options. As it stood, invoking the app it with the –help option gave the following:

I needed descriptions for all the commands, and also for their options to be listed.

After reading the documentation for help customisation, and digging into how the help is generated, I realised that what I’d done so far was the easy bit. The library provides a CommandLineBuilder class, instances of which can be wired to lambdas that customise how it generates help text. But having done that, the CommandLineBuilder instance is responsible for doing the parse via it’s Invoke() method, not the root command. And there didn’t seem to be a way to make this compatible with the code I’d already written: I wanted to parse commands separately but have help generation that was aware of the syntax of all commands. There seemed to be a fundamental mismatch.

I tried extending CommandLineBuilder by the deeply unfashionable approach of sub-classing, but its Build() method (which generates a Parser object to actually do the parse) isn’t virtual so I couldn’t override it. And many of its key methods are implemented as extension methods, so I couldn’t override them either.

I tried extending CommandLineBuilder instead, but I found that I was having to wrap more and more of its functionality. And because CommandLineBuilder is injected as a dependency at various points, my non-overriding extension methods weren’t being called anyway.

So I gave up shaving the yak. At the top of my stack of requirements, I just wanted to archive photos. At the bottom of the stack I was hacking on a command-line parsing library to extend it in an unusual way. It was an interesting exercise, but I was wasting time. Its always good to know when to give up and pop the stack.