    Organising the Archive

    • Lia
      Lia GT XR Pint Plus V1 DIY last edited by Lia

      Code works. Thanks @biell
      Squee.gif
      It's churned through all the topics. Total is just over 1000 currently which is pretty decent.
      Made a really simple .bat to run the perl script and spit out the console into a txt so I have the page stats to hand for later refining :3

      However I am now manually adding the links to those pages onto the homepage.
      Wish me luck...


      9767aa7d-d586-4b9a-8270-637723fb9a3a-image.png
      65cc8971-44c3-4c43-8156-8c28c171cb5e-image.png


      Updating this as I go

      Topics 0-9600 links have been added
      Only what exists of course. (1180/1180)

      • B
        biell Plus GT DIY @Lia last edited by

        @lia Great news, I have been worried about this for a bit. If you run into any issues, don't hesitate to reach out.

        • Lia
          Lia GT XR Pint Plus V1 DIY @biell last edited by Lia

          @biell I'm steamrolling adding the links to the homepage now they're practically all there :) Thank you for getting me to the home stretch! Hope you enjoy seeing your contribution in action as much as I am with all the pages pretty much being restored :D

          Sorry the delay worried you. The past month has been brutal D:

          I think I somehow broke 2 bits but not sure what, maybe me editing some bits busted the formatting.
          Timestamps aren't generating and some avatars are being broken :( The script mentions line 344 has a regex error or something.

          The slightly modified script is below. I altered the value for "logo_ht" from 80 to 60 and changed the image on my end to a newer version.

          #!/usr/bin/perl
          
          =head1 NAME
          
          forum_archive - Put a Google cached community.onewheel.com thread back together
          
          =cut
          
          use IO::Handle;
          use HTTP::Request;
          use LWP::UserAgent;
          use File::Copy;
          use File::Path;
          use File::Find;
          use Pod::Usage;
          use POSIX;
          
          my($HEADER, @POSTS, $FOOTER, $COUNT, $WEIRD);
          my(%META)=(
          	'base'		=> 'https://archive.owforum.co.uk/',
          	'logo'		=> 'http://archive.owforum.co.uk/Images/OWForumArchive.png',
          	'logo_ht'		=> '60',
          	'profiles'	=> '../../../assets/uploads/profile',
          	'resources'	=> '../../../assets/resources',
          	'system'		=> '../../../assets/uploads/system',
          );
          
          my(%RESOURCES)=(
          	'fonts'		=> 'https://cdnjs.cloudflare.com/ajax/libs/font-awesome/5.15.4/css/all.min.css',
          	'style'		=> 'https://owforum.co.uk/assets/client-darkly.css',
          	'icon'		=> 'https://owforum.co.uk/assets/uploads/system/favicon.ico',
          	'broken'		=> 'https://icon-icons.com/downloadimage.php?id=5390&root=39/PNG/128/&file=brokenfile_5952.png',
          );
          
          my(%TOPICS)=();
          my(%SEARCH_PATH)=();
          
          =head1 SYNOPSIS
          
          forum_archive <directory1> [directory2] [directory3] [...]
          
          =head1 DESCRIPTION
          
          This script takes a set of files downloaded from the google page cache
          for the community.onewheel.com NodeBB forum and tries to put it back
          together.
          
          =head2 Content Management
          
          A small number of resources are available from the internet.  B<forum_archive>
          will download these assets, if needed, for inclusion in the archive file
          structure.  If assets have been downloaded recently (within the last day)
          then new downloads are not attempted.  This should keep B<forum_archive> from
          slamming remote resources during testing phases.
          
          =cut
          
          sub wget {
          	my($src, $asset, $dst)=@_;
          	my($file)=IO::Handle->new;
          	my($req)=HTTP::Request->new( 'GET' => $src );
          	my($get)=LWP::UserAgent->new;
          	my($response);
          
          	$asset=~s|^[/.]+||;
          	
          	if(!-s "$asset/$dst" || -M "$asset/$dst" > 1 ) {
          		$response=$get->request($req);
          		if($response->is_success) {
          			File::Path::make_path($asset, { 'chmod' => 0755 });
          			open($file, '>:raw', "$asset/$dst") or return;	# raw mode keeps binary assets intact on Windows
          			print $file $response->decoded_content;
          			close($file);
          		}
          	}
          }
          
          =pod
          
          B<forum_archive> will dynamically create the C<assets> and C<topic>
          directory structures as needed to store content found within the
          post.  In an effort to increase efficiency for commonly used content,
          such as avatars, actual copying of these files will not occur each time
          such content is seen, after its initial copy.
          
          =cut
          
          sub location {
          	my($file)=@_;
          	my($try);
          
          	if(-r $file) {
          		return($file);
          	} else {
          		foreach my $dir (keys(%SEARCH_PATH)) {
          			$try=join('/', $dir, $file);
          			return($try) if(-r $try);
          		}
          	}
          
          	&wget($RESOURCES{'broken'}, $META{'resources'}, 'broken-file.png');
          	return(join('/', $META{'resources'}, 'broken-file.png'));
          }
          
          sub copy {
          	my($src, $asset, $dst)=@_;
          
          	$asset=~s|^[/.]+||;
          
          	if(!-s "$asset/$dst" || -M "$asset/$dst" > 1 ) {
          		File::Path::make_path($asset, { 'chmod' => 0755 });
          		File::Copy::copy(&location($src), "$asset/$dst");
          	}
          	
          }
          
          =pod
          
          Avatar images are stored in a central location, shared by the entire
          archive.  This lowers the space requirements of the archive and
          improves page load times and browser cache efficiency.
          
          =cut
          
          sub avatar {
          	my($img)=@_;
          	my($dst)=$img; $dst=~s|^.*/||;
          
          	&copy($img, $META{'profiles'}, $dst);
          
          	return(join('/', $META{'profiles'}, $dst));
          }
          
          =pod
          
          Uploaded images which are stored in the archive may be named slightly
          differently on the archive than on the original.  NodeBB has gone through
          a couple iterations about how to handle this conflict, and B<forum_archive>
          tries to handle this by using the more unique C<ALT> tag element parameter
          name.  When that doesn't work, the original name is kept.  Images are also
          grouped by post, to avoid naming conflicts between different posts.
          
          Additionally, if an image is referenced in a post, but is not contained in
          the archive, a standard broken file image is substituted.
          
          =cut
          
          sub upload {
          	my($src, $alt)=@_;
          
          	if($alt=~m/\.\w+$/) {
          		&copy($src, $META{'path'}, $alt);
          
          		return(sprintf('<img src="%s" alt="%s"', $alt, $alt));
          	} else {
          		my($new)=$src; $new=~s|^.*/||;
          
          		&copy($src, $META{'path'}, $new);
          		return(sprintf('<img src="%s" alt="%s"', $new, $alt));
          	}
          }
          
          =head2 Archive Display
          
          One major change in the archive from the original is the banner.  The original
          banner is replaced by one tailored to the archive, to set it apart from
          the original forum and make it clear it is a wholly different entity.
          
          =cut
          
          sub banner {
          	my($start, $img, $end)=@_;
          	return($start.qq{
          		<div class="container">
          		  <div class="navbar-header">
          		    <a href="http://archive.owforum.co.uk">
          		      <img alt="The Archive homepage" src="$META{'logo'}" height="$META{'logo_ht'}">
          		    </a>
          		  </div>
          		  <div class="navbar-header pull-right">
          		    <p class="text-right" style="padding-top: 10px">
          		      This page is an archived copy of the old Onewheel Forum.
          		    </p>
          		  </div>
          		</div>
          	}.$end);
          }
          
          =pod
          
          Unfortunately, one way the archive differs from the original is that
          some posts are missing.  When this occurs, B<forum_archive> inserts a
          break in the timeline with a note about the message IDs which are absent.
          
          =cut
          
          sub missing_post {
          	my($id)=@_;
          
          	return(qq{
          	  <li component="topic/necro-post" class=" necro-post timeline-event" data-index="$id">
          	    <small class="timeline-text">Post(s) $id missing from the archive</small>
          	  </li>
          	});
          }
          
          =pod
          
          An interactive, HTML5 based NodeBB forum requires a lot of javascript to
          work.  Since the archive is a static copy of that data, all of the javascript
          is removed, and the archive works nearly identically on all platforms.
          
          =cut
          
          sub global {
          	s|https?://community\.onewheel\.com/|$META{'base'}|sg;
          
          	s|<noscript>.*?</noscript>||sg;
          	s|<script>.*?</script>||sg;
          	s|<script .*?></script>||sg;
          
          	s|\s+<div component="topic/reply/container" .*?</div>||s;
          	s|\s+<a component="topic/reply/guest" .*?</a>||m;
          
          	s|class="posts"|class="posts timeline"|mg;
          	s|\n\n<hr>\n||sg;
          	if(m|<span component="topic/post-count".*?>(\d+)</span>|m) {
          		$COUNT=$1;
          	}
          }
          
          =pod
          
          B<forum_archive> assumes that all the headers from all the source files
          are identical, and uses the first one it finds.  With that content,
          the new banner is inserted, interactive metadata and buttons are removed,
          and the new style is set up. B<forum_archive> also collects important
          information like the page path and total message count.
          
          =cut
          
          sub header {
          	local($_)=@_;
          
          	#Cleanup to a reasonable starting header only
          	s/(<ul component="topic" class="posts timeline" .*?>\s+).*$/$1/s;
          
          	s/(<body .*?>).*$/$1/m;
          
          	#Grab some important info
          	if(m|<link rel="canonical" href="($META{'base'}(.*?))">|) {
          		$META{'url'}=$1;
          		$META{'path'}=$2;
          	}
          
          	#reset links
          	#strip out unneeded content
          	s|(<meta property="og:url" content=".*?)/\d+\?.*?">|$1">|mg;
          
          	s|\s+<meta name="msapplication-\w+" .*?>||sg;
          
          	s|\s+<link rel="icon" sizes=.*?>||sg;
          	s|\s+<link rel="prefetch" .*?>||sg;
          	s|\s+<link rel="prefetch stylesheet" .*?>||sg;
          	s|\s+<link rel="manifest" .*?>||sg;
          	s|\s+<link rel="search" .*?>||sg;
          	s|\s+<link rel="apple-touch-icon" .*?>||sg;
          	s|\s+<link rel="alternate" .*?>||sg;
          	s|\s+<link rel="next" .*?>||sg;
          	s|\s+<link rel="prev" .*?>||sg;
          
          	&wget($RESOURCES{'icon'}, $META{'resources'}, 'favicon.ico');
          	s|(<link rel="icon" type="image/x-icon" href=").*?">|$1$META{'resources'}/favicon.ico">|mg;
          
          	&wget($RESOURCES{'style'}, $META{'resources'}, 'client-darkly.css');
          	s|<link rel="stylesheet" .*?>|<link rel="stylesheet" href="$META{'resources'}/client-darkly.css">\n\t<link rel="stylesheet" href="$RESOURCES{'fonts'}">|s;
          
          	if(m|forum-logo" src="(.*?/site-logo.png)"|m) {
          		&copy($1, $META{'system'}, 'site-logo.png');
          		s|forum-logo" src=".*?"|forum-logo" src="$META{'system'}/site-logo.png"|mg;
          	}
          
          	s|(<h1 component="post/header" .*?)>|$1 style="padding-top: 50px;">|m;
          
          	s|\s+<section class="menu-section".*?</section>||s;
          
          	#Insert new banner
          	s|(<nav class="navbar navbar-default navbar-fixed-top header".*?>).*?<img alt="Onewheel Home Page" class=" forum-logo" src="(.*?)">.*?</nav>|&banner($1, $2, '</nav>')|se;
          
          	#Remove unnecessary buttons
          	s|\s+<a class="hidden-xs" target="_blank".*rss.*</a>||mg;
          	s|\s+<div title="Sort by" .*?</div>||s;
          	s|<li>[^RL]+<span>Register</span>.*?</li>||gs;
          	s|<li>[^RL]+<span>Login</span>.*?</li>||gs;
          	s|<a component="topic/reply/guest" .*?</a>\s*||s;
          	s|<ol class="breadcrumb">.*?</ol>||s;
          
          	s|<span class="hidden-xs">Loading More Posts</span> <i .*?</i>||mg;
          
          	s|class="slideout-panel" style=".*?"|class="slideout-panel"|m;
          
          	return($_);
          }
          
          =pod
          
          A lot of cleanup occurs within each forum post.  Firstly, and with the
          javascript removed, all times are calculated and coded directly in UTC.
          Interactive buttons are removed, and links to content (such as user
          pages) not contained in the archive are also removed.  Other interactive
          content (e.g. online status) is removed, too.
          
          Media, such as avatars and uploaded content is collected and placed
          properly into the new archive filesystem structures.
          
          =cut
          
          sub post {
          	local($_)=@_;
          	my($time);
          	
          	if(m/data-timestamp="(\d+)"/s) {
          		$time=POSIX::strftime("%e %B %Y, %H:%M UTC", gmtime($1/1000));
          		s|(><span class="timeago") title="(.+?)">|$1 title="$time" datetime="$2">$time|sg;
          	}
          
          	s|<span class="replies-last .*</span>||mg;
          	s|<a component="post/parent" .*?>(.*?)</a>|$1|mg;
          	s|<i component="user/status" .*?></i>||mg;
          	s|<a href=".*?/user/.*?">(.*?)</a>|<span class="btn-link">$1</span>|sg;
          	s|<a class="plugin-mentions-user .*?>(.*?)</a>|<span class="btn-link">$1</span>|mg;
          	s|<a href="[^"]+/user/.*?">\s+(<span class="avatar.*?>)\s+</a>|$1|sg;
          	s|(?<= component="user/picture" src=")([^"]+)|&avatar($1)|meg;
          	s|(?<= component="avatar/picture" src=")([^"]+)|&avatar($1)|meg;
          	s|<img src="(.*?)" alt="(.*?)"(?= \s*class="\s*img-responsive)|&upload($1, $2)|meg;
          	s|<a (component="post/reply-count".*? href=").*?/(\d+)[?#].*?(".*?)>|<a $1#$2$3>|mg;
          	s|\s+<i component="post/edit-indicator".*?</i>||mg;
          	s|\s+<i class="fa fa-fw fa-chevron-right".*?</i>||mg;
          	s|\s+<i class="fa fa-fw fa-chevron-down hidden".*?</i>||mg;
          	s|\s+<i class="fa fa-fw fa-spin fa-spinner hidden".*?</i>||mg;
          	s|\s+<small class="pull-right">\s+<span class="bookmarked">.*?</span>\s+</small>||sg;
          
          	s|(?<= class="avatar" src=")([^"]+)|&avatar($1)|meg;
          	s|(?<= component="user/picture" data-uid="\d{1,5}" src=")([^"]+)|&avatar($1)|meg;
          	s|(<img component="user/picture")|$1 class="avatar  avatar-sm2x avatar-rounded"|mg;
          	s|(data-uid="\d+") class="user-icon"|$1 class="avatar  avatar-sm2x avatar-rounded"|mg;
          	s|(title="\w+") class="user-icon"|$1 class="avatar  avatar-xs avatar-rounded"|mg;
          	s|id="[^"]*google-cache-hdr"||sg;
          	s|This is Google's cache of||sg;
          
          	return($_);
          }
          
          =pod
          
          Similarly to the header, the HTML after all the posts is based on the first
          file seen and removes some of the content better suited to an interactive
          site than a static, archive site.
          
          =cut
          
          sub footer {
          	local($_)=@_;
          
          	s|<div class="progress-bar"></div>||s;
          	s|<div class="spinner" role="spinner"><div .*?</div></div>||s;
          	s|<div id="nprogress">.*?</div></div></div>||s;
          
          	return($_);
          }
          
          =head2 Data Import Process
          
          Each downloaded F<.html> file from the forum is read and separated into
          3 sections: a header, a list of posts, and a footer.  The first file's
          header will be processed and used as the archive file's header; the same
          goes for the footer.
          
          Each post is pulled into an array.  If a post occurs in multiple downloaded
          cache files, then the last one read is kept.  Each one is processed and
          prepared for the final archive topic.
          
          =cut
          
          sub ingest {
          	my($source)=@_;
          	my($html)=IO::Handle->new;
          	my($move, $category_url, $category);
          	local($/)="\n\t\t\t\t</li>\n\t\t\t";
          
          	open($html, '<', $source);
          	while(<$html>) {
          		&global;
          
          		if(m/^<!DOCTYPE html>/) {
          			if(m/class="nprogress-busy"/) {
          				$WEIRD=0;
          			} else {
          				$WEIRD=1;
          			}
          
          			if(!$HEADER || !$WEIRD) {
          				$HEADER=&header($_);
          			}
          			s/^.*<ul component="topic" class="posts timeline" .*?>\s+\n//s;
          		}
          
          		if(m|</html>$|) {
          			if(!$FOOTER || !$WEIRD) {
          				$FOOTER=&footer($_);
          
          				if($FOOTER=~s|(<div class="post-bar">.*\n<hr>\n\t\t</div>)||s) {
          					$move=$1;
          					($category)=($HEADER=~m|<meta property="article:section" content="(.*?)">|m);
          					($category_url)=($HEADER=~m|<link rel="up" href="(.*?)">|m);
          					$category=~s/&amp;/&/g;
          
          					$HEADER=~s|</h1>\n|</h1>\n$move|s;
          					$HEADER=~s|<div class="tags pull-left">.*<div class="topic-main-buttons pull-right">|<div class="topic-main-buttons pull-left"><a href="$category_url">$category</a>|s;
          
          					$HEADER=~s|class="stats hidden-xs"|class="stats text-muted"|mg;
          					$HEADER=~s|(<span component="topic/post-count" class="human-readable-number" title="\d+">\d+</span>)<br>\s+<small>Posts</small>|<i class="fa fa-fw fa-pencil" title="Posts"></i>$1|s;
          					$HEADER=~s|(<span class="human-readable-number" title="\d+">\d+</span>)<br>\s+<small>Views</small>|<i class="fa fa-fw fa-eye" title="Views"></i>$1|s;
          				}
          
          			}
          			last;
          		}
          
          		if(m/data-index="(\d+)"/) {
          			$POSTS[$1]=&post($_);
          		}
          	}
          	close($html);
          }
          
          =head2 Execution
          
          The script expects a directory structure of HTML files which have valid links
          to media files.  Other than that, it is pretty agnostic about the structure
          of the directory.  It will read the header to find out what the name of the
          document should be, create it, and write to it.
          
          In addition to processing archived posts, a special post is inserted for
          anything missing.  B<forum_archive> will also produce a report on F<STDOUT>
          with information on missing posts.
          
          =cut
          
          sub process {
          	my($html)=IO::Handle->new;
          	my($posts, $total)=(0, 0);
          	my(@missing)=();
          
          	$HEADER="";
          	@POSTS=();
          	$FOOTER="";
          	$COUNT=0;
          
          	foreach my $entry (@_) {
          		&ingest($entry);
          	}
          
          	for(my $i=0; $i<$COUNT; $i++) {
          		if(!exists($POSTS[$i])) {
          			my($begin);
          
          			for($begin=$i; $i+1<$COUNT && !exists($POSTS[$i+1]); $i++) {	# stop at the last valid index
          				$posts++;
          				$total++;
          			}
          
          			if($i==$begin) {
          				$POSTS[$i]=&missing_post($i);
          			} else {
          				$POSTS[$i]=&missing_post("$begin-$i");
          				push(@missing, "$begin-$i");
          			}
          
          			$posts++;
          		}
          		$total++;
          	}
          
          	if($total) {
          		printf("%s, Total: %d, Coverage: %d%%, Missing: %s\n", $META{'path'},
          			$total, (1-$posts/$total)*100, join(' ', @missing) || 'None');
          	} else {
          		printf("%s, Total: %d\n", $META{'path'}, $total);
          	}
          
          	File::Path::make_path($META{'path'}, { 'chmod' => 0755 });
          	open($html, '>', join('/', $META{'path'}, 'index.html'));
          	print $html $HEADER;
          	print $html @POSTS;
          	print $html $FOOTER;
          	close($html);
          }
          
          if($ARGV[0] =~ m/^-+h/i) {
          	pod2usage(-verbose => 2, -exitval => 0);
          } elsif(! -d $ARGV[0]) {
          	pod2usage(-verbose => 1, -exitval => 0);
          }
          
          find(sub {
          	$File::Find::prune=1 if(m/^assets$/);
          	$File::Find::prune=1 if(m/^topic$/);
          
          	if(m/(\d+)\s+.*\.html$/) {
          		push(@{$TOPICS{$1}}, $File::Find::name);
          		$SEARCH_PATH{$File::Find::dir}=1;
          	}
          }, @ARGV);
          
          foreach my $topic (sort({ $a <=> $b }  keys(%TOPICS))) {
          	&process(sort(@{$TOPICS{$topic}}));
          }
          
          
          =head1 NOTES
          
          B<forum_archive> is basically a conglomeration of regular expressions.  This
          is by no means the best way to manage and manipulate complex HTML files.
          However, given the static nature of this content and its relative complexity,
          using regular expressions requires a substantially smaller code base and
          less interpretation of the original source files.  Essentially, in this case,
          it is much easier to strip out the junk you know you don't want than
          to understand the entire document schema fully enough to make the meaningful
          changes the right way.
          
          =cut
          

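The gap handling in process() can be illustrated in isolation. This is only a minimal sketch: `missing_ranges` and the `%found` hash are hypothetical names made up for the example and are not part of the script, which works on the global @POSTS array instead.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of the script above: given the expected
# post count and a hash of the indexes that were actually found, return
# the gaps as "N" (single post) or "N-M" (run of posts).
sub missing_ranges {
    my ($count, $present) = @_;
    my @gaps;

    for (my $i = 0; $i < $count; $i++) {
        next if $present->{$i};

        # Extend the gap while the next index is still inside the topic
        # and is also absent.
        my $begin = $i;
        $i++ while ($i + 1 < $count && !$present->{$i + 1});

        push(@gaps, $begin == $i ? "$begin" : "$begin-$i");
    }
    return @gaps;
}

# Topic with 8 posts (indexes 0-7) where only 0, 1, 3 and 4 were cached.
my %found = map { $_ => 1 } (0, 1, 3, 4);
print join(' ', missing_ranges(8, \%found)), "\n";   # prints "2 5-7"
```

Each returned entry corresponds to one "Post(s) ... missing from the archive" marker in the timeline.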
          • B
            biell Plus GT DIY @Lia last edited by biell

            @lia Line 344 does have to do with avatars

            s|(?<= component="user/picture" data-uid="\d{1,5}" src=")([^"]+)|&avatar($1)|meg;
            

            That should be OK. Maybe the version of perl you are using has an issue with variable width look-behind, but I purposely used \d{1,5} instead of \d+ so it wouldn't be infinite (which should cause an issue).

            What version of perl (perl -v) do you have?

            Edit: misspelled "width"

            • Lia
              Lia GT XR Pint Plus V1 DIY @biell last edited by Lia

              @biell Look-behind was the message it gave, well caught.
              I'm using Strawberry Perl 5.32.1.1-64bit on a Windows10 machine

              Looks like my version of perl doesn't work with either. Gave \d+ a shot and it failed to run. What version do you run and I'll see if I can use that. Might explain why timestamps didn't load either :3

              • NotSure
                NotSure XR Pint @Lia last edited by NotSure

                @lia said in Organising the Archive:

                @biell Look-behind was the message it gave, well caught.
                I'm using Strawberry Perl 5.32.1.1-64bit on a Windows10 machine

                Looks like my version of perl doesn't work with either. Gave \d+ a shot and it failed to run. What version do you run and I'll see if I can use that. Might explain why timestamps didn't load either :3

                virtualized/containerized i hope. with a proxy. @biell

                XR's got what plants crave!

                • Lia
                  Lia GT XR Pint Plus V1 DIY @NotSure last edited by Lia

                  @notsure Work laptop, completely isolated from everything including work since I don't have the VPN hooked up.
                  I use the laptop as a little sandbox that I reformat every now and then when it gets a little funky :)

                  Back to adding more links to the homepage. Dropped off at 4am last night ...this morning? Oh dear more energy drinks needed.

                  • NotSure
                    NotSure XR Pint @Lia last edited by

                    @lia said in Organising the Archive:

                    I use the laptop as a little sandbox that I reformat every now and then when it gets a little funky :)

                    vm. containerize. proxy. time-gated, off-site backups. @biell !!!


                    • Lia
                      Lia GT XR Pint Plus V1 DIY last edited by Lia

                      So going through these posts I'm seeing a ton of amazing pics that most people will never see unless they actually go digging through threads. Like, look at this one from DreamTour in the "Your Pint Shipping Date" thread.
                      alt text

                      Anyone think it might be a cool idea to have a gallery page on the archive that carousels some random pics while also giving access to a gallery of pics ranging in time uploaded?

                      • B
                        biell Plus GT DIY @Lia last edited by

                        @lia I added two fixes here: https://drive.google.com/file/d/1FFmb1LVADMPUvIMumuJdX50xRmqIq7iM/view

                        I assume Strawberry perl is having trouble with the zero-width look-behind, so I removed it and replaced it with a slightly less efficient construct.
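The updated script itself is not quoted in this thread, so the following is only a sketch of one way a look-behind can be avoided: capture the prefix and put it back in the replacement. `avatar_path` is a hypothetical stand-in for the script's avatar() that only rewrites the path and copies nothing, and the sample markup is made up.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the script's avatar(): only rewrites the path,
# without copying any file.
sub avatar_path {
    my ($img) = @_;
    (my $name = $img) =~ s|^.*/||;
    return "../../../assets/uploads/profile/$name";
}

sub rewrite_avatar {
    my ($html) = @_;
    # Capture the prefix as $1 and emit it again, instead of asserting it
    # with a look-behind. A plain \d+ is fine in a capture group.
    $html =~ s|( component="user/picture" data-uid="\d+" src=")([^"]+)|$1 . avatar_path($2)|eg;
    return $html;
}

my $sample = '<img component="user/picture" data-uid="1234" src="/assets/uploads/profile/1234-profileavatar.png">';
print rewrite_avatar($sample), "\n";
```

The capture version re-copies the prefix on every match, which is the small cost of dropping the zero-width assertion.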

                        I am also assuming that Strawberry perl doesn't have a proper POSIX::strftime (Windows isn't POSIX compliant, so this wouldn't be surprising). I just added my own little function to do the same thing for this specific instance:

                        my(@MONTH)=qw(
                             January February March April May June July
                             August September October November December
                        );
                        sub utctime {
                             my($epoch)=int($_[0]/1000);
                             my($sec, $min, $hr, $day, $month, $year, $wd, $jd, $dst)=gmtime($epoch);
                        
                             return(sprintf("%d %s %d, %02d:%02d UTC",
                                  $day, $MONTH[$month], $year+1900, $hr, $min));
                        }
                        
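As a usage sketch, the function above can be dropped in where the script called POSIX::strftime. The sample markup below is made up for illustration and the substitution is a simplified form of the one in post(); the timestamp is in milliseconds, as in the data-timestamp attribute.

```perl
use strict;
use warnings;

my @MONTH = qw(
    January February March April May June July
    August September October November December
);

# Same logic as the function above: millisecond epoch in, UTC string out.
sub utctime {
    my ($epoch) = int($_[0] / 1000);
    my ($sec, $min, $hr, $day, $month, $year) = gmtime($epoch);

    return sprintf("%d %s %d, %02d:%02d UTC",
        $day, $MONTH[$month], $year + 1900, $hr, $min);
}

# Made-up sample in the general shape of a NodeBB post.
my $post = '<li data-timestamp="1500000000000"><span class="timeago" title="2017-07-14T02:40:00.000Z"></span></li>';

if ($post =~ m/data-timestamp="(\d+)"/) {
    my $time = utctime($1);
    # Keep the original ISO value as datetime and show the readable time.
    $post =~ s|(<span class="timeago") title="(.+?)">|$1 title="$time" datetime="$2">$time|g;
}
print "$post\n";   # the span now reads "14 July 2017, 02:40 UTC"
```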

                        You should be able to just rerun this version and it will overwrite the old files with fixed ones. Topic 22 had examples of the messed up avatar.

                        Also, LET'S GO, SPURS!!!

                        • Lia
                          Lia GT XR Pint Plus V1 DIY @biell last edited by

                          @biell Amazing, it worked :D Thank you so much!

                          I can't seem to get the favicon to work for some reason.
                          Tried to strip out the type="image/x-icon" from the below but that seems to cause the favicon section to then pull extra text and break.

                          s|(<link rel="icon" type="image/x-icon" href=").*?">|$1$META{'resources'}/favicon.ico">|mg;
                          

                          Spent an hour trying to figure it out but I'm no good with interpreting code lol.

                          Currently the end result is the following line:
                          <link rel="icon" type="image/x-icon" href="../../../assets/resources/favicon.ico">
                          Could it be tweaked to instead generate:
                          <link rel="icon" href="../../../assets/resources/OWForumArchiveIcon.png">

                          • NotSure
                            NotSure XR Pint @Lia last edited by NotSure

                            @lia said in Organising the Archive:

                            Anyone think it might be a cool idea

                            duh!


                            • B
                              biell Plus GT DIY @Lia last edited by

                              @lia So, that was my bad. I was downloading to assets/resources, but looking for it in assets/system/uploads :(

                              Updated now so that the META config at the top points to the location you want for 'icon'. This way you can change it in the future if you want to, e.g. put it in /Images with OWForumArchive.png.

                              https://drive.google.com/file/d/1FFmb1LVADMPUvIMumuJdX50xRmqIq7iM/view
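For readers without access to the linked file, here is a rough sketch of what a config-driven favicon rewrite could look like. The key name 'icon' inside %META, the `rewrite_favicon` helper and the PNG file name are assumptions for illustration, not a copy of the updated script.

```perl
use strict;
use warnings;

# Hypothetical config entry; the key name and file name follow the
# request earlier in the thread, not the actual updated script.
my %META = (
    'icon' => '../../../assets/resources/OWForumArchiveIcon.png',
);

sub rewrite_favicon {
    my ($html) = @_;
    # Match the whole original tag, with or without a type attribute,
    # and replace it with a minimal one pointing at the configured icon.
    # [^"]* cannot run past the closing quote the way .*? can across tags.
    $html =~ s|<link rel="icon"(?: type="[^"]*")? href="[^"]*">|<link rel="icon" href="$META{'icon'}">|g;
    return $html;
}

my $head = qq{<link rel="icon" type="image/x-icon" href="/assets/uploads/system/favicon.ico">\n};
print rewrite_favicon($head);
```

Changing the icon later then only means editing the one value in %META.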

                              • Lia
                                Lia GT XR Pint Plus V1 DIY @biell last edited by Lia

                                @biell Ah found the issue, there's something in the topmost tags that's busting the favicon.

                                Surrounding the top <style> tag are some <head> and <body> tags which, although they don't mention an icon, I think are causing the browser to ignore the tag later. I used topic 20 as the test example below.

                                Starts with a <html> tag that we can keep

                                <!DOCTYPE html>
                                <html lang="en-US" data-dir="ltr" style="direction: ltr;" class="nprogress-busy">
                                

                                Then the element that seems to break it starts.

                                <head><meta http-equiv="Content-Type" content="text/html; charset=UTF-8"><!--<base href="https://archive.owforum.co.uk/topic/20/france-onewheel-riders/990?page=110&amp;lang=en-US">--><base href=".">
                                

                                In between is a style tag that we need to keep as removing this bit actually breaks some elements.

                                <style>body{margin-left:0;margin-right:0;margin-top:0}#bN015htcoyT__google-cache-hdr{background:#f8f9fa;font:13px arial,sans-serif;text-align:left;color:#202124;border:0;margin:0;border-bottom:1px solid #dadce0;line-height:16px;padding:16px 28px 24px 28px}#bN015htcoyT__google-cache-hdr *{display:inline;font:inherit;text-align:inherit;color:inherit;line-height:inherit;background:none;border:0;margin:0;padding:0;letter-spacing:0}#bN015htcoyT__google-cache-hdr a{text-decoration:none;color:#1a0dab}#bN015htcoyT__google-cache-hdr a:hover{text-decoration:underline}#bN015htcoyT__google-cache-hdr a:visited{color:#4b11a8}#bN015htcoyT__google-cache-hdr div{display:block;margin-top:4px}#bN015htcoyT__google-cache-hdr b{font-weight:bold;display:inline-block;direction:ltr}</style>
                                

                                Then comes the closing </head> for the earlier <head>, plus an opening <body> tag, both of which need removing.

                                </head><body class="page-topic page-topic-20 page-topic-france-onewheel-riders page-topic-category-2 page-topic-category-general-discussion parent-category-2 page-status-200 user-guest skin-noskin">
                                

                                I've "repaired" topic 20 if you want to take a peek
                                https://archive.owforum.co.uk/topic/20/france-onewheel-riders/index.html
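The manual repair described above could be sketched as a small function. This is only an illustration in JavaScript, not what biell's Perl script does: the name stripCacheWrapper is hypothetical, and it assumes the page has exactly the layout shown in this post (a <head> opening, then the <style> block, then "</head><body ...>" straight after it). If that layout isn't found, the page is returned untouched.

```javascript
// Hypothetical helper, not part of the actual archive script.
// Removes the wrapper "<head ...>...<base ...>" before the first <style>
// and the "</head><body ...>" right after it, keeping the <style> block.
function stripCacheWrapper(html) {
  // Locate the opening <head> tag (\b avoids matching <header>).
  const headOpen = html.search(/<head\b[^>]*>/i);
  if (headOpen === -1) return html;

  // Locate the first lowercase <style>...</style> block after it.
  const styleOpen = html.indexOf("<style", headOpen);
  const styleClose = html.indexOf("</style>", styleOpen);
  if (styleOpen === -1 || styleClose === -1) return html;
  const afterStyle = styleClose + "</style>".length;

  // The closing </head> and opening <body ...> must directly follow the style.
  const tail = html.slice(afterStyle);
  const wrapperClose = tail.match(/^\s*<\/head>\s*<body\b[^>]*>/i);
  if (!wrapperClose) return html;

  return (
    html.slice(0, headOpen) +            // doctype and <html ...> are kept
    html.slice(styleOpen, afterStyle) +  // the <style> block is kept
    tail.slice(wrapperClose[0].length)   // everything after <body ...> is kept
  );
}
```

Because it bails out whenever the expected layout is missing, running it over a page that has already been repaired should leave that page unchanged.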

                                • B
                                  biell Plus GT DIY @Lia last edited by

                                  @lia Sorry, I didn't realize all that junk was up there. Firefox was honoring a lot of that meta content outside of HEAD, which it shouldn't have, so I didn't notice it. The latest version with a fix (same google drive link) is updated and ready.

                                  Are you sure that <style>...</style> section is necessary? What did it break? I tried wiping all of it out and couldn't find any issues.

                                  • B
                                    biell Plus GT DIY @Lia last edited by

                                    @lia I also just uploaded a version where the dates are no longer "permalink" items pointing to post links which don't exist.

                                    • Lia
                                      Lia GT XR Pint Plus V1 DIY @biell last edited by Lia

                                      @biell If I inspect the page with F12 and remove the style, a few errors show up, so presumably that section is referenced. It's probably not worth trying to find and remove the references, so I think keeping that small <style> section will be fine.

                                      • B
                                        biell Plus GT DIY @Lia last edited by

                                        @lia I removed the post hamburger menu and made the upvote/downvote chevrons no longer links (if you clicked on any of them, they did nothing except take you to the top of the page).

                                        Also, in doing that, I noticed that sometimes the timestamp is to the left because different posts have different HTML layouts. So, I fixed that too.

                                        Same google drive link.

                                        • Lia
                                          Lia GT XR Pint Plus V1 DIY @biell last edited by Lia

                                          @biell That's amazing thank you!

                                          I noticed that sometimes the timestamp is to the left because different posts have different HTML layouts. So, I fixed that too

                                          I did notice this but felt it wasn't worth bothering you over. Really appreciate you going out of your way to fix that. I assume that's from the pages that didn't get saved normally.

                                          Currently re-running the script and preparing to replace the live data after. Finished and swapped out to the new data, looks good!!!

                                          Just got to make my way through adding the links. There are so many that I might paginate the homepage a little. Is anyone finding any issues with load times or overall experience so far using the archive?

                                          • B
                                            biell Plus GT DIY @Lia last edited by biell

                                            @lia For the main page, I feel like it should have a filter so you can type in something and search for topics. If you are interested, I did a mock-up here:

                                            https://drive.google.com/file/d/1KCtjxqxqH3dJZPpe7QhAcb4v0qUWGS6c/view

                                            I just threw this together in a few minutes, so the input box is literally just thrown on there in some semi-reasonable place. Essentially, as you type, it goes through the list items and changes anything which doesn't match to "display: none" and anything which does match to "display: block".

                                            You can ignore the <base> tag, as I just needed that to make my copy work. Then, I have a new <style> which should be moved to your style.css, and a <script> which can stay there or go into its own file. After that, I added an id to your <ul> element so I could find it, and put the <input> element on the page to type into.
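For anyone who can't open the mock-up, the filter-as-you-type idea could look roughly like the sketch below. It isn't the code from the Drive file: the ids "topic-filter" and "topic-list" and the function name filterTopics are assumptions made up for illustration. It does follow the description above, setting non-matching list items to "display: none" and matching ones to "display: block".

```javascript
// Hypothetical sketch of the homepage topic filter, not the actual mock-up.
// Hides every list item whose text does not contain the typed query.
function filterTopics(items, query) {
  const needle = query.trim().toLowerCase();
  let visible = 0;
  for (const item of items) {
    // Case-insensitive substring match; an empty query matches everything.
    const matches = item.textContent.toLowerCase().includes(needle);
    item.style.display = matches ? "block" : "none";
    if (matches) visible++;
  }
  return visible; // number of topics still shown
}

// Browser wiring, skipped when there is no DOM (e.g. when run under Node).
// The ids below are assumed; use whatever ids the page actually has.
if (typeof document !== "undefined") {
  const input = document.getElementById("topic-filter");
  const list = document.getElementById("topic-list");
  if (input && list) {
    input.addEventListener("input", () => {
      filterTopics(list.querySelectorAll("li"), input.value);
    });
  }
}
```

Keeping the matching logic in its own function means it can be tried without a browser, and the "input" event fires on every keystroke so the list updates as you type.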
