How to: Create a static copy of a Gallery 2 site

For many years I maintained an instance of Gallery 2 (http://galleryproject.org/) to share some of our photos with family and friends. However, life and technology move on, and I simply don’t have time to maintain the installation of the Gallery software* and couldn’t justify the time to upgrade it to version 3. Similarly, we don’t tend to use it for our photos any more and instead share things via Flickr, Facebook or just plain email.

That said, I didn’t want to remove the site entirely… just “make it safe” and make it trivial to host/maintain. To that end, and because I had no requirement to keep on updating the site, I looked around for solutions on how to make a static copy of the site (i.e. one that could just be served as static files without needing PHP or a MySQL database etc). The first version of Gallery had a static mode that you could put the site into and then crawl to create a static copy. Version 2 doesn’t have that functionality.

In the end I settled on crawling the site with ‘wget’ and then using a series of Perl one-liners to tidy up the resulting HTML. This results in a reasonable copy of the site in static form. The only downside is that the image sizes are lost, so the pages just show the images at the default size – no great loss, though, as we have the originals anyway.

In case anyone is faced with a similar task, here are the commands and the bash script used to clean up the pages.

wget -m -R "*g2_v*,*g2_h*,*g2_c*,*g2_i*" http://photos.example.com/

-m = mirror the site
-R “reject,list” = list of file patterns to reject
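
To get a feel for what that reject list does, here is a small sketch that applies the same four patterns to some example file names using the shell’s own pattern matching. The file names are made up for illustration, and wget’s matching may differ in detail from the shell’s, so treat this as an approximation rather than a description of wget’s internals.

```shell
#!/bin/sh
# Rough illustration of the -R patterns from the wget command above.
# Assumption: shell 'case' globbing behaves like wget's wildcard matching.
classify() {
    case "$1" in
        *g2_v*|*g2_h*|*g2_c*|*g2_i*) echo "reject" ;;
        *) echo "keep" ;;
    esac
}

# Hypothetical file names of the kind a Gallery 2 crawl might produce
for name in 'index.html?g2_page=2' 'main.php?g2_view=core.UserAdmin' \
            'main.php?g2_itemId=123' 'photo.jpg.html'; do
    printf '%-40s %s\n' "$name" "$(classify "$name")"
done
```

Note that g2_page is not in the reject list, which is why the paginated index pages survive the crawl and need renaming afterwards.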

The above command can take many hours to run. On my site the spidering took seven hours in total, although the time in wget’s summary line covers only the data transfer itself: Downloaded: 30453 files, 526M in 27m 7s (331 KB/s)

Once the spidering is done you have copies of the pages and the images etc but the pages still have references to PHP files and so on… so you need to do a bit of cleaning up.

To do this I created a shell script which I have included below.

Edit 13/02/2020 – the original shell script was created and run on Ubuntu Linux back in 2014. It uses “rename” but Daniel M. Drucker kindly got in contact with me to point out that the behaviour of this command is different on different Linux distros and macOS and may not work. I’ve left the original command in the script below but if it doesn’t work then you may be able to replace that line (4) with this one. However, I no longer have a gallery2 site to test this on so use at your own risk:

find . -name '*g2_*' | while IFS= read -r file; do newfile=$(echo "$file" | awk -F '[=.]' '{print "." $2 "_page_" $4 ".html"}'); mv "$file" "$newfile"; done
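
Since rename behaves differently between systems, it may be worth previewing the renames before moving anything. The sketch below is a dry run, assuming the crawled files are named like index.html?g2_page=N. The new_name helper is a hypothetical name introduced here for illustration (it is not part of the original script) and relies on sed -E, which should be available in both GNU and BSD sed.

```shell
#!/bin/sh
# Dry run: print old -> new names without touching any files.
# new_name is a hypothetical helper, not part of the original script.
new_name() {
    printf '%s\n' "$1" | sed -E 's/index\.html\?g2_page=([0-9]+)/index_page_\1.html/'
}

# List every paginated index page alongside the name it would get
find . -name '*g2_page=*' | while IFS= read -r file; do
    printf '%s -> %s\n' "$file" "$(new_name "$file")"
done
```

If the output looks right, replacing the printf line inside the loop with mv "$file" "$(new_name "$file")" should perform the actual rename.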

Original shell script:

# Tidy up offline gallery

# Rename files
find . -name '*g2_*' -exec rename s'/index\.html\?g2_page=(\d+)/index_page_$1.html/' {} \;

# Fix links to the renamed files above
perl -0777 -spi -e 's!\?g2_page=(\d+)!index_page_$1.html!gs' `find . -name \*.html`

# Remove various links to scripts and functionality we don't have now
perl -0777 -spi -e 's!<script.*?</script>!!gs' `find . -name \*.html`
perl -0777 -spi -e 's!<a href=\"/main\.php.*?</a>!!s' `find . -name \*.html`
perl -0777 -spi -e 's!<form\s+id=\"search_SearchBlock\".*?</form>!!gs' `find .  -name \*.html`
perl -0777 -spi -e 's!<div id=\"gsFooter\">!<div id=\"gsFooter\">Was once powered by:<br />!gs' `find . -name \*.html`
perl -0777 -spi -e 's!<div class=\"block-core-PhotoSizes.*?</div>!!s' `find .  -name \*.html`

# Fix first breadcrumb link
# CHANGEME = Update to the name of your site
perl -0777 -spi -e 's!<a href=\"/main\.php.*? class=\"BreadCrumb-1\".*?</a>!<a href="/" class="BreadCrumb-1">Someone'\''s photos</a>!gs' `find . -name \*.html`

# Fix image links
perl -0777 -spi -e 's!<a href=\".*?g2_imageViewsIndex=\d+\">\s*<img(.*?)/>\s*</a>!<img$1/>!g' `find . -name \*.html`

# Remove any remaining links to main.php
perl -0777 -spi -e 's!<a href=\"/main\.php.*?</a>!!gs' `find . -name \*.html`
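
After running the script, it can be useful to check whether any pages still refer to the dynamic site. This is a minimal sketch, assuming leftover links contain either main.php or a g2_ query parameter; list_leftovers is a hypothetical helper name, not part of the original script.

```shell
#!/bin/sh
# List HTML files that still mention main.php or a g2_ query parameter.
# list_leftovers is a hypothetical helper; the directory defaults to ".".
list_leftovers() {
    find "${1:-.}" -name '*.html' -exec grep -lE 'main\.php|g2_[A-Za-z]+=' {} +
}

# grep exits non-zero when nothing matches, so ignore the exit status here
leftovers=$(list_leftovers . || true)
if [ -n "$leftovers" ]; then
    printf '%s\n' "$leftovers"
else
    echo "No leftover references found"
fi
```

Any files it lists are candidates for another pass with the Perl one-liners or a quick manual edit.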

Use the above at your own risk… 😉

* It turns out that the core Gallery team are also taking a break from the software, and it’s “in hibernation”.