Organising the Archive
-
@lia Thanks, I downloaded it and was expecting to find HTML files as well, but all I seem to have are the ancillary/media files. Did I miss something? You will have to excuse me, I have been supporting a change this weekend and it didn't go well, so I have been working for about 12 hours so far today.
-
@biell Oh balls it didn’t compress the actual html links. I’ll fix that in the morning, sorry about that D:
Hope what you’re working on goes well, I know the feeling getting stuck on a work project that just doesn’t go anywhere.
-
@biell There we go, put all the files in this time (and I checked it again just in case lol).
20 FRANCE OneWheel Riders _ Onewheel Forum_files.zip
The HTML files each have a corresponding folder that contains images and other dependencies that were downloaded when I saved the pages.
For the one I repaired I just merged all the folders into 1 then made sure all the references to the files in the master HTML were pointing at the master folder and not the individual ones if that makes sense.
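The merge described above could be automated along these lines. This is only a rough sketch under assumptions: it assumes every saved page keeps its dependencies in a folder whose name ends in `_files`, and the function name `point_at_master_folder` and the folder name `master_files` are made up for illustration.

```python
import re

# Matches src="..." or href="..." values whose path goes through a folder
# ending in "_files" (an assumption about how the browser saved the pages).
REFERENCE = re.compile(r'''(\b(?:src|href)=["'])(?:\./)?[^"'/]*_files/([^"']+)''')


def point_at_master_folder(html: str, master: str = "master_files") -> str:
    """Rewrite references into any per-page '_files' folder so they
    point into one shared folder instead."""
    return REFERENCE.sub(lambda m: f"{m.group(1)}{master}/{m.group(2)}", html)


page = '<img src="./Topic A_files/2170-profileimg.jpg"><a href="https://community.onewheel.com/post/3836">link</a>'
print(point_at_master_folder(page))
```

Ordinary links that do not pass through a `_files` folder are left alone, so only the saved dependencies get redirected.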
-
*clears throat*
Haven't given the site an SSL certificate yet buuuuut...
archive.owforum.co.uk
Totally got Will Smith to show this one off for me. Thx Will x
Seems to work fine on mobile too so far. I'm tempted to put an iframe window on the right side to preview the clicked topic but need to figure out how to do that.
For now any feedback would be great (be gentle lol).
To Do List
- Add a custom header to all archived topics to return to the main page.
- Maaaaybe figure out an okay-ish search facility
- Git gud
-
@lia -- I am in awe!!! -- of your drive, of your dedication, and of your abilities!!!
-
@s-leon Thank you <3
You all give me the strength and desire to do all of this. I can't think of anything else that would have me attempt even a fraction of this.
-
-
@lia So, this should be pretty easy. The posts are all contained within a single giant unordered list. Each post is a list item, and there is metadata with a "data-index". So, both getting the correct post order and deduplicating entries from the different files will be a breeze.
All the media/aux files under the _files structure are identical across all posts by name (e.g. 2170-profileimg.jpg is always the same picture).
If you are still interested, it would only take an hour or two to write the program, and then it could stitch all of this together. I would write it in perl, because I'm old. So, either you could upload zip files like this, or I can easily write it so that the most basic perl install will work for you to run it yourself.
I can also program to rewrite the header to clean it up, for example I could remove the "Register" and "Login" links. I can also remove the text added by Google Cache, if you would like.
I can also remove the upvote/downvote links while I am at it, if you want. Or, I can completely remove that section, including the vote-count.
Let me know how you want it, I can then write version 1, send you the output, and we can tweak from there. But, this should all be very straightforward.
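For illustration, the stitching idea (order by "data-index", drop duplicates across files) might look roughly like the sketch below. It is written in Python rather than perl and is not the actual program. It assumes each post's opening `<li>` tag carries the `data-index` attribute itself and that nested list items inside a post do not; `extract_posts` and `stitch` are hypothetical names.

```python
import re

# Opening tag of a post: an <li> that carries data-index="N" (assumed layout).
POST_START = re.compile(r'<li\b[^>]*\bdata-index="(\d+)"[^>]*>')


def extract_posts(page_html: str) -> dict[int, str]:
    """Map data-index -> raw <li>...</li> markup for one saved page."""
    starts = list(POST_START.finditer(page_html))
    posts = {}
    for i, match in enumerate(starts):
        # A post runs until the next post's opening tag (or the end of the page).
        end = starts[i + 1].start() if i + 1 < len(starts) else len(page_html)
        chunk = page_html[match.start():end]
        # Trim anything after the post's final closing </li>, e.g. the </ul>.
        close = chunk.rfind("</li>")
        if close != -1:
            chunk = chunk[: close + len("</li>")]
        posts[int(match.group(1))] = chunk
    return posts


def stitch(pages: list[str]) -> list[str]:
    """Merge posts from several saved pages into data-index order."""
    merged: dict[int, str] = {}
    for page in pages:
        for index, markup in extract_posts(page).items():
            merged.setdefault(index, markup)  # first copy wins, duplicates dropped
    return [merged[i] for i in sorted(merged)]
```

Keeping the first copy of a duplicate is an arbitrary choice; if later saves are more complete, the merge could prefer the longest copy instead.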
-
@biell music to my ears, removing those bits will be fine as I plan to go in and add a template header that’ll call to some master .css for the rest. Getting all the broken topics together would be a dream.
Would it be possible to have it spit out a txt or something per completed topic of missing post ids? I’ll go in after and add a placeholder list item and later try locating them on archive.org
I think we can keep the up/downvotes, Those might be interesting to see still and I’m working to repair the missing icons and later a way to do the timestamps.
Thanks for taking the time to have a look, will be sure to give you some credit on the archive page for helping simplify this mammoth task.
-
@lia Providing a list of missing posts should be very easy.
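As a rough sketch of that report (in Python for illustration, with made-up names), assuming posts are numbered by "data-index" starting at 0 with no intentional gaps:

```python
def missing_indices(found: set[int]) -> list[int]:
    """Return the data-index values absent between 0 and the highest one seen.

    Posts missing after the highest recovered index cannot be detected this way.
    """
    if not found:
        return []
    return [i for i in range(max(found) + 1) if i not in found]


def missing_report(topic: str, found: set[int]) -> str:
    """Build the text for a per-topic report, one missing index per line."""
    lines = [f"Missing posts for: {topic}"]
    lines += [str(i) for i in missing_indices(found)]
    return "\n".join(lines) + "\n"


print(missing_report("Example topic", {0, 1, 3, 6}))
```

The returned text could be written to one .txt file per topic. Note that this lists "data-index" positions, not the post ids used in permalinks.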
If you copy the "/assets" directory structure from the current forum into an "/assets" directory on the archive website, then the missing icons should be fixed. From what I can tell, that is where they were, and that is where I expect the pages to look for them after I de-googleify them. Or, if you like, I could try to point the missing assets to this forum.
-
@lia What's wrong with the timestamps? Do you just want them rendered in UTC, or do you want them rendered in local time? If you want UTC, I can just hard code it into the page from the metadata and put that in the correct place. If you want local time, I should be able to add a touch of javascript to render that, but it will take longer to get working.
-
@biell sorry I meant the icons like up arrow, down arrow, pinned, online, offline etc. You probably noticed the header nav icons are missing. They appear to currently rely on some internal site icons from FontAwesome but I plan to add custom ones and reference them instead so they render.
Time stamps don’t render currently, seems like the data is there but compared to a working page the html elements are not there. They seem to exist as a meta tag only. If I gave you the template for a working timestamp element would it be simple to insert onto the posts using the meta data stored in these tags?
I’m genuinely impressed with how capable this all seems. I was previously looking to try doing something in python but it seemed way out of my league since I’m not even an amateur D:
-
@lia Send me what you want for the timestamp template, and I will see what I can do. There are lots of options. The best way to do all of this is to write the program to make all the changes you want, then it can just do the work for you all at once, straight from the source files. Then, if we want to tweak anything, we just update the program and re-run it against the source files. If you find more pages to fill in gaps, you just re-run the program with the new files in place, and it recreates everything exactly how you want it.
Those icons at the top are done the same way, they are essentially from a font, linked to from the CSS file, and the forum stores that data under /assets. Here is an example, navigate to this page, then "View Source": https://owforum.co.uk/assets/fonts/glyphicons-halflings-regular.eot
-
-
@biell said in Organising the Archive:
Those icons at the top are done the same way, they are essentially from a font, linked to from the CSS file, and the forum stores that data under /assets. Here is an example, navigate to this page, then "View Source": https://owforum.co.uk/assets/fonts/glyphicons-halflings-regular.eot
Good find, I did notice some of it appears to be in the client.css file but the icons don't show regardless. I'll likely simplify it and just create the icons myself then refer to them instead.
For the time stamps I've dug some more and think I found it, there's a class "timeago" that I assume might be missing and stopping it render.
Going to dig around and see if I can either find it or rebuild it.
Edit: Ah never mind it's a jQuery thing D:
In that case since they all exist like below:
<span class="visible-xs-inline-block visible-sm-inline-block visible-md-inline-block visible-lg-inline-block"> <a class="permalink" href="https://community.onewheel.com/post/3836"><span class="timeago" title="2015-09-09T19:09:53.329Z"></span></a>
Can it be replaced with this format:
</small> <small class="pull-right"> <span class="visible-xs-inline-block visible-sm-inline-block visible-md-inline-block visible-lg-inline-block"> <a class="permalink" href="">2015-09-09T19:09:53.329</a>
This closes the above "pull left" and sets the timestamp to pull right, relying on the already existing closing statement. The href gets null'd and the timestamp becomes detached from the jQuery, leaving it as just text. Ends up looking a bit like this:
(I've imported the client-darkly.css for this hence the difference in color)
If it's possible to have Perl re-write that as something more human readable like "9 September 2015, 19:09" that would be ideal :)
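A hedged sketch of what that substitution could look like, in Python rather than Perl and only for the empty "timeago" span shown above. It assumes the title attribute always holds a UTC timestamp in exactly that format; the names `TIMEAGO`, `render` and `replace_timeago` are invented for the example, and it leaves the surrounding permalink untouched rather than blanking the href.

```python
import re
from datetime import datetime

# The empty jQuery "timeago" placeholder, with the ISO timestamp in its title.
TIMEAGO = re.compile(r'<span class="timeago" title="([^"]+)"></span>')


def render(match: re.Match) -> str:
    """Turn the captured ISO timestamp into plain, human-readable text."""
    moment = datetime.strptime(match.group(1), "%Y-%m-%dT%H:%M:%S.%fZ")
    # moment.day avoids the leading zero that %d would print.
    return f"{moment.day} {moment.strftime('%B %Y, %H:%M')}"


def replace_timeago(html: str) -> str:
    """Swap every timeago placeholder for static text."""
    return TIMEAGO.sub(render, html)


snippet = '<a class="permalink" href=""><span class="timeago" title="2015-09-09T19:09:53.329Z"></span></a>'
print(replace_timeago(snippet))
```

The month name comes from the default locale, so it prints in English unless the locale has been changed.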
-
@lia said in Organising the Archive:
If it's possible to have Perl re-write that as something more human readable like "9 September 2015, 19:09" that would be ideal :)
lol uh oh @biell ... u know where this train goes...
-
@notsure Oh no is that an issue?
-
@lia said in Organising the Archive:
@notsure Oh no is that an issue?
lol no its probably super easy.
but it always starts with innocuous little requests. next thing u know, code everywhere...
-
@notsure Aha I get that, these should really be all that's needed due to the sheer volume of repetitive edits involved. I've narrowed the scope of the archive to purely the basics, stripping everything other than just the content, which can be improved later if needed since I keep a copy of the raw un-edited posts I gathered. Bossman wonders why I have a 17TB NAS... this is why lol.
The rest I'm happy to manually scrub to clean a handful of things that would probably be a pain to automate due to edge cases.
-
@lia Yes, writing it in any format will be easy. The HTML code also has the time as milliseconds since the epoch, so I will just use that with whatever strftime string is necessary for the format you want (probably "%d %B %Y, %H:%M UTC"). Are you cool with 24-hour time, or do you want AM/PM? Personally, I prefer 24-hour time. In pass 1, I will just print it in UTC. Given that this is an archive, that should be sufficient.
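To show what that strftime string produces, here is a small Python sketch (the real program would be perl, and `format_timestamp` is just an illustrative name). The sample value is the epoch-millisecond equivalent of the 2015-09-09T19:09:53.329Z timestamp quoted earlier in the thread.

```python
from datetime import datetime, timezone


def format_timestamp(epoch_ms: int) -> str:
    """Format milliseconds since the Unix epoch as a 24-hour UTC string."""
    # Whole seconds are enough, since the output only goes down to minutes.
    moment = datetime.fromtimestamp(epoch_ms // 1000, tz=timezone.utc)
    return moment.strftime("%d %B %Y, %H:%M UTC")


print(format_timestamp(1441825793329))  # 09 September 2015, 19:09 UTC
```

"%d" zero-pads the day; for AM/PM, "%I:%M %p" could replace "%H:%M".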
-
@lia said in Organising the Archive:
The rest I'm happy to manually scrub to clean a handful of things that would probably be a pain to automate due to edge cases.
You should at least let me know what they are, because if I can code for them, I can save a lot of time, and allow you to re-run the join later if we want to change something on lots of pages.
My rule on a computer is to only ever do something 3 times.
- Learn the process
- Code the process while performing it
- Run the code and fix up all the bugs
After that, run the automation every time.