<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[The Archive]]></title><description><![CDATA[Flux capacitor? Best I can do is a Hypercore Doc.
Take a float down the archive]]></description><link>http://owforum.co.uk/category/11</link><generator>RSS for Node</generator><lastBuildDate>Tue, 17 Mar 2026 15:22:18 GMT</lastBuildDate><atom:link href="http://owforum.co.uk/category/11.rss" rel="self" type="application/rss+xml"/><pubDate>Mon, 15 Aug 2022 01:21:04 GMT</pubDate><ttl>60</ttl><item><title><![CDATA[Organising the Archive]]></title><description><![CDATA[<p dir="auto">That works :D<br />
Made a few tweaks that seemed to take fine.</p>

Changed the button text from &lt;a&gt; to &lt;span&gt;, else it generates green link text because of the CSS, which could be hard to read on the blue button. Tried to override it with some style but gave up and made it a span instead :)
Added &lt;span&gt;&amp;nbsp;&lt;/span&gt; after the &lt;/button&gt; so it creates a little space between the button and the voting bit.
Minor grammatical update to the missing post bit.

<p dir="auto">Full modified script below:</p>
<p>Spoiler <i class="fa fa-eye"></i></p><p></p>
#!/usr/bin/perl

=head1 NAME

forum-archive - Put a Google cached community.onewheel.com thread back together

=cut

use IO::Handle;
use HTTP::Request;
use LWP::UserAgent;
use File::Copy;
use File::Path;
use File::Find;
use Pod::Usage;
use POSIX;

my($HEADER, @POSTS, $FOOTER, $COUNT, $WEIRD);
my(%META)=(
	'base'		=&gt; 'https://archive.owforum.co.uk/',
	'logo'		=&gt; 'http://archive.owforum.co.uk/Images/OWForumArchive.png',
	'logo_ht'		=&gt; '60',
	'icon'		=&gt; 'https://archive.owforum.co.uk/assets/resources/OWForumArchiveIcon.png',
	'profiles'	=&gt; '../../../assets/uploads/profile',
	'resources'	=&gt; '../../../assets/resources',
	'system'		=&gt; '../../../assets/uploads/system',
);

my(%RESOURCES)=(
	'fonts'		=&gt; 'https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css',
	'style'		=&gt; 'https://owforum.co.uk/assets/client-darkly.css',
	'broken'		=&gt; 'https://icon-icons.com/downloadimage.php?id=5390&amp;root=39/PNG/128/&amp;file=brokenfile_5952.png',
);

my(%TOPICS)=();
my(%SEARCH_PATH)=();

=head1 SYNOPSIS

forum_archive &lt;directory1&gt; [directory2] [directory3] [...]

=head1 DESCRIPTION

This script takes a set of files downloaded from the google page cache
for the community.onewheel.com NodeBB forum and tries to put it back
together.

=head2 Content Management

A small number of resources are available from the internet.  B&lt;forum_archive&gt;
will download these assets, if needed, for inclusion in the archive file
structure.  If assets have been downloaded recently (within the last day),
new downloads are not attempted.  This should keep B&lt;forum_archive&gt; from
slamming remote resources during testing phases.

=cut

sub wget {
	my($src, $asset, $dst)=@_;
	my($file)=IO::Handle-&gt;new;
	my($req)=HTTP::Request-&gt;new( 'GET' =&gt; $src );
	my($get)=LWP::UserAgent-&gt;new;
	my($response);

	$asset=~s|^[/.]+||;
	
	if(!-s "$asset/$dst" || -M "$asset/$dst" &gt; 1 ) {
		$response=$get-&gt;request($req);
		if($response-&gt;is_success) {
			File::Path::make_path($asset, { 'chmod' =&gt; 0755 });
			open($file, '&gt;', "$asset/$dst");
			print $file $response-&gt;decoded_content;
			close($file);
		}
	}
}

=pod

B&lt;forum_archive&gt; will dynamically create the C&lt;assets&gt; and C&lt;topic&gt;
directory structures as needed to store content found within the
posts.  To increase efficiency for commonly used content, such as
avatars, each file is copied only once; later references to the same
content reuse the initial copy.

=cut

sub location {
	my($file)=@_;
	my($try);

	if(-r $file) {
		return($file);
	} else {
		foreach my $dir (keys(%SEARCH_PATH)) {
			$try=join('/', $dir, $file);
			return($try) if(-r $try);
		}
	}

	&amp;wget($RESOURCES{'broken'}, $META{'resources'}, 'broken-file.png');
	return(join('/', $META{'resources'}, 'broken-file.png'));
}

sub copy {
	my($src, $asset, $dst)=@_;

	$asset=~s|^[/.]+||;

	if(!-s "$asset/$dst" || -M "$asset/$dst" &gt; 1 ) {
		File::Path::make_path($asset, { 'chmod' =&gt; 0755 });
		File::Copy::copy(&amp;location($src), "$asset/$dst");
	}
	
}

=pod

Avatar images are stored in a central location, shared by the entire
archive.  This lowers the space requirements of the archive and
improves page load times and browser cache efficiency.

=cut

sub avatar {
	my($img)=@_;
	my($dst)=$img; $dst=~s|^.*/||;

	&amp;copy($img, $META{'profiles'}, $dst);

	return(join('/', $META{'profiles'}, $dst));
}

=pod

Uploaded images which are stored in the archive may be named slightly
differently in the archive than in the original.  NodeBB has gone through
a couple of iterations of how to handle this conflict, and B&lt;forum_archive&gt;
tries to handle it by using the more unique C&lt;ALT&gt; attribute as the
file name.  When that doesn't work, the original name is kept.  Images are also
grouped by post, to avoid naming conflicts between different posts.

Additionally, if an image is referenced in a post but is not contained in
the archive, a standard broken-file image is substituted.

=cut

sub upload {
	my($src, $alt)=@_;
	my($new)=$src; $new=~s|^.*/||;

	if($alt=~m/\.\w+$/) {
		$new=join('_', $META{'postid'}, $alt);
	} else {
		$new=join('_', $META{'postid'}, $new);
	}

	&amp;copy($src, $META{'path'}, $new);
	return(sprintf('&lt;img src="%s" alt="%s"', $new, $alt));
	
}

=head2 Archive Display

One major change from the original is the banner.  It is replaced by one
tailored to the archive, to set it apart from the original forum and make
it clear the archive is a wholly different entity.

=cut

sub banner {
	my($start, $img, $end)=@_;
	return($start.qq{
		&lt;div class="container"&gt;
		  &lt;div class="navbar-header"&gt;
		    &lt;a href="http://archive.owforum.co.uk"&gt;
		      &lt;img alt="The Archive homepage" src="$META{'logo'}" height="$META{'logo_ht'}"&gt;
		    &lt;/a&gt;
		  &lt;/div&gt;
		  &lt;div class="navbar-header pull-right"&gt;
		    &lt;p class="text-right" style="padding-top: 10px"&gt;
		      This page is an archived copy of the old Onewheel Forum.
		    &lt;/p&gt;
		  &lt;/div&gt;
		&lt;/div&gt;
	}.$end);
}

=pod

One unfortunate difference from the original is that some posts are
missing.  When this occurs, B&lt;forum_archive&gt; inserts a break in the
timeline with a note about the message IDs which are absent.

=cut

sub missing_post {
	my($id)=@_;

	return(qq{
	  &lt;li component="topic/necro-post" class=" necro-post timeline-event" data-index="$id"&gt;
	    &lt;small class="timeline-text"&gt;
           &lt;span&gt;Post(s) $id are missing from the archive :(&lt;/span&gt;&lt;br /&gt;
		 &lt;span&gt;
             Know where these posts are?  Visit
             &lt;a href="https://owforum.co.uk/topic/158/missing-posts"&gt;the new forum&lt;/a&gt;
             for how to help get them added :)
		 &lt;/span&gt;
         &lt;/small&gt;
	  &lt;/li&gt;
	});
}

=pod

Although most of this script works to remove unnecessary content from archive
posts, one little thing is added: a button to help people copy a
post's permalink to their clipboard, to facilitate sharing.

=cut

sub share_fn {
	return(qq{
      &lt;script&gt;
        function shareButton(index) {
          navigator.clipboard.writeText("$META{'url'}#"+index);

          var tooltip = document.getElementById("shareTooltip"+index);
          tooltip.innerHTML = "Stoke Saved To Clipboard!";
          setTimeout(function() {
            tooltip.innerHTML = "&amp;nbsp; Share This Post! &amp;nbsp;";
          }, 3000);
        }
      &lt;/script&gt;
	});
}

sub share_btn {
	return(qq{
        &lt;button onclick="shareButton($META{'postid'})" class="btn btn-sm btn-primary"&gt;
	        &lt;span class="tooltiptext" onclick="shareButton($META{'postid'});event.preventDefault();" id="shareTooltip$META{'postid'}" href="$META{'url'}#$META{'postid'}"&gt;&amp;nbsp; Share This Post! &amp;nbsp;&lt;/span&gt;
        &lt;/button&gt;
	&lt;span&gt;&amp;nbsp;&lt;/span&gt;
	});
}

=pod

An interactive, HTML5-based NodeBB forum requires a lot of JavaScript to
work.  Since the archive is a static copy of that data, all of the JavaScript
is removed, and the archive works nearly identically on all platforms.

=cut

sub global {
	s|https?://community.onewheel.com/|$META{'base'}|sg;

	s|&lt;noscript&gt;.*?&lt;/noscript&gt;||sg;
	s|&lt;script&gt;.*?&lt;/script&gt;||sg;
	s|&lt;script .*?&gt;&lt;/script&gt;||sg;

	s|\s+&lt;div component="topic/reply/container" .*?&lt;/div&gt;||s;
	s|\s+&lt;a component="topic/reply/guest" .*?&lt;/a&gt;||m;

	s|class="posts"|class="posts timeline"|mg;
	s|\n\n&lt;hr&gt;\n||sg;
	if(m|&lt;span component="topic/post-count".*?&gt;(\d+)&lt;/span&gt;|m) {
		$COUNT=$1;
	}
}

=pod

B&lt;forum_archive&gt; assumes that all the headers from all the source files
are identical, and uses the first one it finds.  With that content,
the new banner is inserted, interactive metadata and buttons are removed,
and the new style is set up.  B&lt;forum_archive&gt; also collects important
information like the page path and total message count.

=cut

sub header {
	local($_)=@_;

	#Cleanup to a reasonable starting header only
	s/(&lt;ul component="topic" class="posts timeline" .*?&gt;\s+).*$/$1/s;

	s/(&lt;body .*?&gt;).*$/$1/m;

	#Grab some important info
	if(m|&lt;link rel="canonical" href="($META{'base'}(.*?))"&gt;|) {
		$META{'url'}=$1;
		$META{'path'}=$2;
	}

	#reset links
	#strip out unneeded content
	s|(&lt;meta property="og:url" content=".*?)/\d+\?.*?"&gt;|$1"&gt;|mg;

	s|\s+&lt;meta name="msapplication-\w+" .*?&gt;||sg;

	s|\s+&lt;link rel="icon" sizes=.*?&gt;||sg;
	s|\s+&lt;link rel="prefetch" .*?&gt;||sg;
	s|\s+&lt;link rel="prefetch stylesheet" .*?&gt;||sg;
	s|\s+&lt;link rel="manifest" .*?&gt;||sg;
	s|\s+&lt;link rel="search" .*?&gt;||sg;
	s|\s+&lt;link rel="apple-touch-icon" .*?&gt;||sg;
	s|\s+&lt;link rel="alternate" .*?&gt;||sg;
	s|\s+&lt;link rel="next" .*?&gt;||sg;
	s|\s+&lt;link rel="prev" .*?&gt;||sg;

	s|(&lt;link rel="icon" type="image/x-icon" href=").*?"&gt;|$1$META{'icon'}"&gt;|mg;

	&amp;wget($RESOURCES{'style'}, $META{'resources'}, 'client-darkly.css');
	s|&lt;link rel="stylesheet" .*?&gt;|&lt;link rel="stylesheet" href="$META{'resources'}/client-darkly.css"&gt;\n\t&lt;link rel="stylesheet" href="$RESOURCES{'fonts'}"&gt;|s;

	if(m|forum-logo" src="(.*?/site-logo.png)"|m) {
		&amp;copy($1, $META{'system'}, 'site-logo.png');
		s|forum-logo" src=".*?"|forum-logo" src="$META{'system'}/site-logo.png"|mg;
	}

	s|(&lt;h1 component="post/header" .*?)&gt;|$1 style="padding-top: 50px;"&gt;|m;

	s|\s+&lt;section class="menu-section".*?&lt;/section&gt;||s;

	#Insert new banner
	s|(&lt;nav class="navbar navbar-default navbar-fixed-top header".*?&gt;).*?&lt;img alt="Onewheel Home Page" class=" forum-logo" src="(.*?)"&gt;.*?&lt;/nav&gt;|&amp;banner($1, $2, '&lt;/nav&gt;')|se;

	#Remove unnecessary buttons
	s|\s+&lt;a class="hidden-xs" target="_blank".*rss.*&lt;/a&gt;||mg;
	s|\s+&lt;div title="Sort by" .*?&lt;/div&gt;||s;
	s|&lt;li&gt;[^RL]+&lt;span&gt;Register&lt;/span&gt;.*?&lt;/li&gt;||gs;
	s|&lt;li&gt;[^RL]+&lt;span&gt;Login&lt;/span&gt;.*?&lt;/li&gt;||gs;
	s|&lt;a component="topic/reply/guest" .*?&lt;/a&gt;\s*||s;
	s|&lt;ol class="breadcrumb"&gt;.*?&lt;/ol&gt;||s;

	s|&lt;span class="hidden-xs"&gt;Loading More Posts&lt;/span&gt; &lt;i .*?&lt;/i&gt;||mg;

	s|class="slideout-panel" style=".*?"|class="slideout-panel"|m;

	#s|&lt;!--&lt;base href=.*$||m;
	s|&lt;/style&gt;&lt;/head&gt;.*$|&lt;/style&gt;|m;
	s|\s+(&lt;nav id="menu")|\n&lt;/head&gt;&lt;body&gt;\n$1|s;
	s|&lt;/head&gt;|&amp;share_fn.'&lt;/head&gt;'|es;

	return($_);
}

=pod

A lot of cleanup occurs within each forum post.  First, with the
JavaScript removed, all times are calculated and coded directly in UTC.
Interactive buttons are removed, and links to content (such as user
pages) not contained in the archive are also removed.  Other interactive
content (e.g. online status) is removed, too.

Media, such as avatars and uploaded content, is collected and placed
properly into the new archive filesystem structure.

=cut

my(@MONTH)=qw(
	January February March April May June July
	August September October November December
);
sub utctime {
	my($epoch)=int($_[0]/1000);
	my($sec, $min, $hr, $day, $month, $year, $wd, $jd, $dst)=gmtime($epoch);

	return(sprintf("%d %s %d, %02d:%02d UTC",
		$day, $MONTH[$month], $year+1900, $hr, $min));
}

sub post {
	local($_)=@_;
	my($time);
	
	if(m/data-timestamp="(\d+)"/s) {
		#$time=POSIX::strftime("%e %B %Y, %H:%M UTC", gmtime($1/1000));
		$time=&amp;utctime($1);
		s|(&gt;&lt;span class="timeago") title="(.+?)"&gt;|$1 title="$time" datetime="$2"&gt;$time|sg;
	}

	s|&lt;span class="replies-last .*&lt;/span&gt;||mg;
	s|&lt;a component="post/parent" .*?&gt;(.*?)&lt;/a&gt;|$1|mg;
	s|&lt;i component="user/status" .*?&gt;&lt;/i&gt;||mg;
	s|&lt;a href=".*?/user/.*?"&gt;(.*?)&lt;/a&gt;|&lt;span class="btn-link"&gt;$1&lt;/span&gt;|sg;
	s|&lt;a class="plugin-mentions-user .*?&gt;(.*?)&lt;/a&gt;|&lt;span class="btn-link"&gt;$1&lt;/span&gt;|mg;
	s|&lt;a href="[^"]+/user/.*?"&gt;\s+(&lt;span class="avatar.*?&gt;)\s+&lt;/a&gt;|$1|sg;
	s|(?&lt;= component="user/picture" src=")([^"]+)|&amp;avatar($1)|meg;
	s|(?&lt;= component="avatar/picture" src=")([^"]+)|&amp;avatar($1)|meg;
	s|&lt;img src="(.*?)" alt="(.*?)"(?= \s*class="\s*img-responsive)|&amp;upload($1, $2)|meg;
	s|&lt;a (component="post/reply-count".*? href=").*?/(\d+)[?#].*?(".*?)&gt;|&lt;a $1#$2$3&gt;|mg;
	s|\s+&lt;i component="post/edit-indicator".*?&lt;/i&gt;||mg;
	s|\s+&lt;i class="fa fa-fw fa-chevron-right".*?&lt;/i&gt;||mg;
	s|\s+&lt;i class="fa fa-fw fa-chevron-down hidden".*?&lt;/i&gt;||mg;
	s|\s+&lt;i class="fa fa-fw fa-spin fa-spinner hidden".*?&lt;/i&gt;||mg;
	s|\s+&lt;small class="pull-right"&gt;\s+&lt;span class="bookmarked"&gt;.*?&lt;/span&gt;\s+&lt;/small&gt;||sg;

	s|(?&lt;= class="avatar" src=")([^"]+)|&amp;avatar($1)|meg;
	s|(component="user/picture" data-uid="\d+" src=")([^"]+)|$1.&amp;avatar($2)|meg;
	s|(&lt;img component="user/picture")|$1 class="avatar  avatar-sm2x avatar-rounded"|mg;
	s|(data-uid="\d+") class="user-icon"|$1 class="avatar  avatar-sm2x avatar-rounded"|mg;
	s|(title="\w+") class="user-icon"|$1 class="avatar  avatar-xs avatar-rounded"|mg;
	s|id="[^"]*google-cache-hdr"||sg;
	s|This is Google's cache of||sg;

	s|&lt;a class="permalink" href=".*?"&gt;(.*?)&lt;/a&gt;|&lt;span class="text-muted pull-right"&gt;$1&lt;/span&gt;|mg;
	s|\s+&lt;span component="post/tools".*?&lt;/span&gt;||sg;
	s|&lt;a component="post/\w+vote" .*?&gt;\s+(.*?)&lt;/a&gt;\s+|$1|sg;
	s|&lt;span class="votes"&gt;|&lt;span class="votes text-muted"&gt;|mg;

	s|\s+&lt;span&gt;\s+&lt;/span&gt;||sg;
	s|\s+&lt;span class="visible-xs-inline-block [^&gt;]+&gt;\s+&lt;/span&gt;||sg;
	s|&lt;small data-editor="[^"]*" .*?&lt;/small&gt;\s+||sg;
	s|\s+&lt;span class="bookmarked"&gt;&lt;i class="fa fa-bookmark-o"&gt;&lt;/i&gt;&lt;/span&gt;||sg;
	s|(&lt;span class="visible-xs-inline-block[^&gt;]+&gt;)(\s+&lt;span class="text-muted pull-right"&gt;&lt;span.*?&lt;/span&gt;&lt;/span&gt;)(.*?&lt;/span&gt;)|$1$3&lt;/small&gt;\n&lt;small class="pull-right"&gt;$2|sg;

	s|&lt;span class="post-tools"&gt;|'&lt;span class="post-tools"&gt;'.&amp;share_btn|es;

	return($_);
}

=pod

As with the header, the HTML after all the posts is based on the first
file seen, and some of the content better suited to an interactive
site than a static archive site is removed.

=cut

sub footer {
	local($_)=@_;

	s|&lt;div class="progress-bar"&gt;&lt;/div&gt;||s;
	s|&lt;div class="spinner" role="spinner"&gt;&lt;div .*?&lt;/div&gt;&lt;/div&gt;||s;
	s|&lt;div id="nprogress"&gt;.*?&lt;/div&gt;&lt;/div&gt;&lt;/div&gt;||s;

	return($_);
}

=head2 Data Import Process

Each downloaded F&lt;.html&gt; file from the forum is read and separated into
three sections: a header, a list of posts, and a footer.  The first file's
header is processed and used as the archive file's header; likewise for
the footer.

Each post is pulled into an array.  If a post occurs in multiple downloaded
cache files, the last one read is kept.  Each one is processed and
prepared for the final archive topic.

=cut

sub ingest {
	my($source)=@_;
	my($html)=IO::Handle-&gt;new;
	my($move, $category_url, $category);
	local($/)="\n\t\t\t\t&lt;/li&gt;\n\t\t\t";

	open($html, '&lt;', $source);
	while(&lt;$html&gt;) {
		&amp;global;

		if(m/^&lt;!DOCTYPE html&gt;/) {
			if(m/class="nprogress-busy"/) {
				$WEIRD=0;
			} else {
				$WEIRD=1;
			}

			if(!$HEADER || !$WEIRD) {
				$HEADER=&amp;header($_);
			}
			s/^.*&lt;ul component="topic" class="posts timeline" .*?&gt;\s+\n//s;
		}

		if(m|&lt;/html&gt;$|) {
			if(!$FOOTER || !$WEIRD) {
				$FOOTER=&amp;footer($_);

				if($FOOTER=~s|(&lt;div class="post-bar"&gt;.*\n&lt;hr&gt;\n\t\t&lt;/div&gt;)||s) {
					$move=$1;
					($category)=($HEADER=~m|&lt;meta property="article:section" content="(.*?)"&gt;|m);
					($category_url)=($HEADER=~m|&lt;link rel="up" href="(.*?)"&gt;|m);
					$category=~s/&amp;amp;/&amp;/g;

					$HEADER=~s|&lt;/h1&gt;\n|&lt;/h1&gt;\n$move|s;
					$HEADER=~s|&lt;div class="tags pull-left"&gt;.*&lt;div class="topic-main-buttons pull-right"&gt;|&lt;div class="topic-main-buttons pull-left"&gt;&lt;a href="$category_url"&gt;$category&lt;/a&gt;|s;

					$HEADER=~s|class="stats hidden-xs"|class="stats text-muted"|mg;
					$HEADER=~s|(&lt;span component="topic/post-count" class="human-readable-number" title="\d+"&gt;\d+&lt;/span&gt;)&lt;br&gt;\s+&lt;small&gt;Posts&lt;/small&gt;|&lt;i class="fa fa-fw fa-pencil" title="Posts"&gt;&lt;/i&gt;$1|s;
					$HEADER=~s|(&lt;span class="human-readable-number" title="\d+"&gt;\d+&lt;/span&gt;)&lt;br&gt;\s+&lt;small&gt;Views&lt;/small&gt;|&lt;i class="fa fa-fw fa-eye" title="Views"&gt;&lt;/i&gt;$1|s;
				}

			}
			last;
		}

		if(m/data-index="(\d+)"/) {
			$META{'postid'}=$1;
			$POSTS[$META{'postid'}]=&amp;post($_);
		}
	}
	close($html);
}

=head2 Execution

The script expects a directory structure of HTML files which have valid links
to media files.  Other than that, it is pretty agnostic about the structure
of the directory.  It will read the header to find out what the name of the
document should be, create it, and write to it.

In addition to processing archived posts, a special post is inserted for
anything missing.  B&lt;forum_archive&gt; will also produce a report on F&lt;STDOUT&gt;
with information on missing posts.

=cut

sub process {
	my($html)=IO::Handle-&gt;new;
	my($posts, $total)=(0, 0);
	my(@missing)=();

	$HEADER="";
	@POSTS=();
	$FOOTER="";
	$COUNT=0;

	foreach my $entry (@_) {
		&amp;ingest($entry);
	}

	for(my $i=0; $i&lt;$COUNT; $i++) {
		if(!exists($POSTS[$i])) {
			my($begin);

			for($begin=$i; !exists($POSTS[$i+1]) &amp;&amp; $i&lt;$COUNT; $i++) {
				$posts++;
				$total++;
			}

			if($i==$begin) {
				$POSTS[$i]=&amp;missing_post($i);
			} else {
				$POSTS[$i]=&amp;missing_post("$begin-$i");
				push(@missing, "$begin-$i");
			}

			$posts++;
		}
		$total++;
	}

	if($total) {
		printf("%s, Total: %d, Coverage: %d%%, Missing: %s\n", $META{'path'},
			$total, (1-$posts/$total)*100, join(' ', @missing) || 'None');
	} else {
		printf("%s, Total: %d\n", $META{'path'}, $total);
	}

	File::Path::make_path($META{'path'}, { 'chmod' =&gt; 0755 });
	open($html, '&gt;', join('/', $META{'path'}, 'index.html'));
	print $html $HEADER;
	print $html @POSTS;
	print $html $FOOTER;
	close($html);
}

if($ARGV[0] =~ m/^-+h/i) {
	pod2usage(-verbose =&gt; 2, -exitval =&gt; 0);
} elsif(! -d $ARGV[0]) {
	pod2usage(-verbose =&gt; 1, -exitval =&gt; 0);
}

find(sub {
	$File::Find::prune=1 if(m/^assets$/);
	$File::Find::prune=1 if(m/^topic$/);

	if(m/(\d+)\s+.*\.html$/) {
		push(@{$TOPICS{$1}}, $File::Find::name);
		$SEARCH_PATH{$File::Find::dir}=1;
	}
}, @ARGV);

foreach my $topic (sort({ $a &lt;=&gt; $b }  keys(%TOPICS))) {
	&amp;process(sort(@{$TOPICS{$topic}}));
}


=head1 NOTES

B&lt;forum_archive&gt; is basically a conglomeration of regular expressions.  This
is by no means the best way to manage and manipulate complex HTML files.
However, given the static nature of this content and its relative complexity,
using regular expressions requires a substantially smaller code base and
far less interpretation of the original source files.  Essentially, in this case,
it is much easier to strip out the junk you know you don't want than
to understand the entire document schema fully enough to make the meaningful
changes the right way.

=pod

<p dir="auto"></p><p></p>

<p dir="auto">Had to adjust the CSS and start migrating assets off the old format, so if pages load funny, force a cache refresh by pressing CTRL+F5.</p>

<p dir="auto">Also... made a custom <a href="https://archive.owforum.co.uk/thispagewontload" rel="nofollow ugc">404</a> page ;)<br />
Enjoy~</p>
]]></description><link>http://owforum.co.uk/topic/30/organising-the-archive</link><guid isPermaLink="true">http://owforum.co.uk/topic/30/organising-the-archive</guid><dc:creator><![CDATA[Lia]]></dc:creator><pubDate>Mon, 15 Aug 2022 01:21:04 GMT</pubDate></item><item><title><![CDATA[Archive Gallery]]></title><description><![CDATA[<p dir="auto">It ain't much but it's something.<br />
These are taking a while to add since there are so many cool pics that I'm barely into topic 105 &gt;.&gt;</p>
<h1><a class="anchor-offset" name="a-href-https-archive-owforum-co-uk-gallery-rel-nofollow-ugc-https-archive-owforum-co-uk-gallery-a"></a><a href="https://archive.owforum.co.uk/gallery/" rel="nofollow ugc">https://archive.owforum.co.uk/gallery/</a></h1>
<p dir="auto"><a href="https://archive.owforum.co.uk/gallery/" rel="nofollow ugc"><img src="/assets/uploads/files/1660769336795-capture.png" alt="Capture.PNG" class=" img-responsive img-markdown" width="1536" height="749" /></a></p>
<p dir="auto">The gallery itself is mostly composed of code by <a href="https://timnwells.medium.com/create-a-simple-responsive-image-gallery-with-html-and-css-fcb973f595ea" rel="nofollow ugc">Tim Wells</a>, who I've credited in a comment in the HTML. Probs not needed, but I like to give credit where it's due.</p>
<p dir="auto">Before I add more I need to:</p>
<ul>
<li>Change how the images align so they flow horizontally instead of vertically.</li>
<li>Then I can attempt to <code>lazyload</code> anything in an <code>&lt;img&gt;</code> tag so it only loads once it's coming on screen. If I don't fix the alignment order, it'll need to load everything before actually figuring out the order, which would be bad for performance.</li>
<li>Add an auto-scroll to drag the archive page down a little, because posts load from the top rather than from the base of the banner, leading to posts clipping off screen.</li>
<li>Add the rest of the images slowly.</li>
<li>Maybe add breaks between years or something.</li>
<li>Tweak the hover animation. Although a fun effect, greyscaling the images when not hovered just makes the page bland.</li>
<li>Adjust the header colours or make a revised logo instead.</li>
</ul>
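For the <code>lazyload</code> item above, one lightweight option (my assumption, not what the gallery currently uses) is the browser-native <code>loading</code> attribute, which defers fetching an image until it nears the viewport with no JavaScript at all:

```html
<!-- Hypothetical gallery entry; loading="lazy" defers the network fetch
     until the image approaches the viewport. The explicit width/height
     let the grid lay out before the pixels arrive. -->
<img src="/assets/uploads/files/example-thumb.png"
     alt="example" loading="lazy" width="300" height="200">
```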
<p dir="auto">Thanks to the recent post-tagging work <a class="plugin-mentions-user plugin-mentions-a" href="http://owforum.co.uk/uid/21">@biell</a> put in, each image links back to the thread it was taken from, which is a neat feature :)</p>
<p dir="auto">So far I just have 3 rules for getting into the gallery.</p>
<ul>
<li>No screenshots.</li>
<li>Images must be of okay quality.</li>
<li>Images with kids won't be added.</li>
</ul>
<p dir="auto">It should also work on mobile; I spent some time tweaking the page so it looks okay when scaled to that sort of screen.</p>
<hr />
<h1><a class="anchor-offset" name="edit-1"></a>Edit 1:</h1>
<p dir="auto">Made a bunch of changes... omg why did I choose to build the site manually!<br />
<img src="/assets/uploads/files/1661045818027-capture.png" alt="Capture.PNG" class=" img-responsive img-markdown" width="1440" height="701" /></p>
<p dir="auto">First up, I've gone for a basic grid that crops the images to fit a pre-defined space and only scrolls vertically, instead of the prior solution, which was incredibly clunky even after I spent 2 days getting it working &gt;.&gt;</p>
<p dir="auto">Next up I changed the logo and moved the description into the footer.<br />
<img src="/assets/uploads/files/1661046000301-owforumarchivegallerymobile.png" alt="OWForumArchiveGalleryMobile.png" class=" img-responsive img-markdown" width="685" height="315" /></p>
<p dir="auto">Another change was to make the images not all load at once. Now they only load when you scroll to them. However, they don't unload, so I need to figure out how to do that, else I doubt most people will get very far before inevitably running out of RAM.</p>
<p dir="auto">Hovering over an image displays the original uploader and the date/time. I enter these manually, as with the image link and post link, so they take a while to add. As such I might be a bit stricter about what gets added.<br />
<img src="/assets/uploads/files/1661046138058-76a701ee-17c1-442f-bc6d-a64b94003037-image.png" alt="76a701ee-17c1-442f-bc6d-a64b94003037-image.png" class=" img-responsive img-markdown" width="649" height="610" /></p>
<p dir="auto"><img src="/assets/uploads/files/1661046225580-ac9374cc-7e20-4b4b-8642-3a7986ffe58b-image.png" alt="ac9374cc-7e20-4b4b-8642-3a7986ffe58b-image.png" class=" img-responsive img-markdown" width="1251" height="309" /></p>
<p dir="auto">And yes, mobile is usable :)<br />
<img src="/assets/uploads/files/1661046657423-img_10408.png" alt="IMG_10408.PNG" class=" img-responsive img-markdown" width="746" height="1613" /></p>
<hr />
<h1><a class="anchor-offset" name="edit-2"></a>Edit 2:</h1>
<p dir="auto">After some research, lazy loading isn't ideal at all. I'll probably have to make thumbnails for each of these manually instead, to lower the page size and improve performance &gt;.&gt;</p>
<p dir="auto">It would be simpler if I wasn't pulling them from the archive and just had a single directory with them all in, but then cross-referencing them would be a nuisance. A slow, simple solution is better than a slow, complex one ;)</p>
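The manual-thumbnail idea could be batch-scripted; a minimal sketch, assuming ImageMagick's <code>magick</code> CLI is installed and using the gallery capture filename from above as a stand-in path:

```shell
# Derive a thumbnail path next to the gallery originals; -thumbnail is
# ImageMagick's fast resize operator tuned for small preview images.
src="assets/uploads/files/1660769336795-capture.png"
thumb="thumbs/$(basename "$src")"
printf '%s\n' "$thumb"
# magick "$src" -thumbnail 300x300 "$thumb"   # uncomment with ImageMagick installed
```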
]]></description><link>http://owforum.co.uk/topic/164/archive-gallery</link><guid isPermaLink="true">http://owforum.co.uk/topic/164/archive-gallery</guid><dc:creator><![CDATA[Lia]]></dc:creator><pubDate>Wed, 17 Aug 2022 21:01:45 GMT</pubDate></item><item><title><![CDATA[Missing Topic&#x2F;Page]]></title><description><![CDATA[<p dir="auto" style="text-align:center"><img src="/assets/uploads/files/1660520121901-40f2bda0-3f90-48d7-8bd3-8af16922d905-image.png" alt="40f2bda0-3f90-48d7-8bd3-8af16922d905-image.png" class=" img-responsive img-markdown" width="1062" height="686" /> </p>
<h1 style="text-align:center"><a class="anchor-offset" name="so-you-came-across-this-did-you"></a>So you came across this did you?</h1>
<hr />
<p dir="auto">Archiving wasn't a streamlined process, and with 9600 different topic IDs to check, not only was there the risk of me missing some, but Google Cache sometimes didn't have the data when requested.</p>
<p dir="auto">As such, some topics that should exist are currently missing: some were deleted as spam created by bots, and others were removed by their author. 1180 topics were recovered in the process.</p>
<hr />
<h2><a class="anchor-offset" name="how-can-i-help"></a>How can I help?</h2>
<p dir="auto">Glad you asked ;) <a href="http://archive.org" rel="nofollow ugc">archive.org</a> is a great place to start looking for the content as their crawlers keep many versions of pages and retain the data.</p>
<p dir="auto">If you're wondering how to even start finding content on the Wayback Machine, <a href="https://web.archive.org/web/*/https://community.onewheel.com/topic/*" rel="nofollow ugc">here is a link that will help</a>. It's a pre-filled wildcard search specifically looking at topics. Just modify the query with the topic number to see whether the posts you're after were crawled, then check the latest working capture to see if the page exists.</p>
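The same lookup can also be scripted against the Wayback Machine's public CDX API, which lists every capture of a URL prefix (topic 9588 here is just an example ID; the query parameters are standard CDX options):

```shell
# Build a CDX API query listing successful captures of one forum topic.
topic=9588
cdx_url="https://web.archive.org/cdx/search/cdx?url=community.onewheel.com/topic/${topic}*&fl=timestamp,original&filter=statuscode:200"
printf '%s\n' "$cdx_url"
# Fetch the capture list (commented out to avoid hitting the live API here):
# curl -s "$cdx_url"
```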
<hr />
<h2><a class="anchor-offset" name="strong-for-example-strong"></a><strong>For example:</strong></h2>
<p dir="auto">One of my topics wasn't cached and got missed during the initial collection phase: topic 9588, the XS thread. However, you can find it there using the <a href="https://web.archive.org/web/*/https://community.onewheel.com/topic/*" rel="nofollow ugc">provided link</a> and adding the topic ID.<br />
<img src="/assets/uploads/files/1660583803892-68d5bb1f-ea9b-4729-840a-46553203dbd5-image.png" alt="68d5bb1f-ea9b-4729-840a-46553203dbd5-image.png" class=" img-responsive img-markdown" width="1812" height="429" /></p>
<p dir="auto"><img src="/assets/uploads/files/1660583907836-70e21212-f0e6-4dca-bcdd-f3bbf662a257-image.png" alt="70e21212-f0e6-4dca-bcdd-f3bbf662a257-image.png" class=" img-responsive img-markdown" width="1183" height="768" /><br />
<a href="https://web.archive.org/web/20211211030631/https://community.onewheel.com/topic/9588/onewheel-xs" rel="nofollow ugc">Captured page 1</a><br />
<a href="https://web.archive.org/web/20211216012411/https://community.onewheel.com/topic/9588/onewheel-xs/27" rel="nofollow ugc">Captured page 2</a></p>
<p dir="auto">There are more advanced ways of doing this, so if there is something you really want archived that cannot be found with the above method, I might be able to be a bit more forensic with a search. You'd be surprised how much hidden post data NodeBB loads on a page if you know where to look.</p>
<hr />
<p dir="auto">All you need to do is copy the link to that page and comment below with it; I'll curate the data before merging it into the site.</p>
<p dir="auto">Good luck and happy archiving ;)</p>
<p dir="auto"><img src="https://quizizz.com/media/resource/gs/quizizz-media/questions/f91bbffd-390f-4184-b7c6-af0af26f235b?w=90&amp;h=90" alt="alt text" class=" img-responsive img-markdown" /></p>
]]></description><link>http://owforum.co.uk/topic/159/missing-topic-page</link><guid isPermaLink="true">http://owforum.co.uk/topic/159/missing-topic-page</guid><dc:creator><![CDATA[Lia]]></dc:creator><pubDate>Sat, 13 Aug 2022 16:55:27 GMT</pubDate></item><item><title><![CDATA[Missing Posts]]></title><description><![CDATA[<p dir="auto">So you've been browsing the archive and seen some posts missing...<br />
<img src="/assets/uploads/files/1660404858840-c7bad6f7-c22f-4e69-8354-30f7c17df1ce-image.png" alt="c7bad6f7-c22f-4e69-8354-30f7c17df1ce-image.png" class=" img-responsive img-markdown" width="450" height="95" /></p>
<p dir="auto">Don't worry, not all is lost. Buried in the depths of <a href="http://archive.org" rel="nofollow ugc">archive.org</a> is a trove of the missing content waiting to be manually verified and merged back into the archive.</p>
<h4><a class="anchor-offset" name="span-style-color-5dc0fe-and-that-my-dear-floater-is-where-you-come-in-span"></a><span style="color:#5dc0fe">And that my dear floater is where you come in.</span></h4>
<p dir="auto">If you find a particular post you want restored and have a link to any of the content, please comment below with the topic in question and a link to the <a href="http://archive.org" rel="nofollow ugc">archive.org</a> page I can use to retrieve the info.</p>
<hr />
<p dir="auto">If you're wondering how to even start finding content on the Wayback Machine, <a href="https://web.archive.org/web/*/https://community.onewheel.com/topic/*" rel="nofollow ugc">here is a link that will help</a>. It's a pre-filled wildcard search specifically looking at topics. Just modify the query with the topic number and see if the posts you're after were crawled.</p>
<p dir="auto">Then it's just a case of linking the page here with the topic you want me to validate, and I'll get around to merging it back into the archive.</p>
<p dir="auto">Good luck ;)</p>
<p dir="auto"><img src="http://media.tumblr.com/15e7024ceaf0e4e5404d64f742d062f1/tumblr_inline_mlk5zkoQff1qz4rgp.gif" alt="alt text" class=" img-responsive img-markdown" width="500" height="246" /></p>
]]></description><link>http://owforum.co.uk/topic/158/missing-posts</link><guid isPermaLink="true">http://owforum.co.uk/topic/158/missing-posts</guid><dc:creator><![CDATA[Lia]]></dc:creator><pubDate>Sat, 13 Aug 2022 15:50:23 GMT</pubDate></item></channel></rss>