Calling all nerds: Is there a way to download (even a stripped-down) archive of an entire forum?

NecroJoe (Stool Chef)
Joined: Apr 12, 2005 · Messages: 24,073 · Location: San Francisco area, CA, USA · Car(s): 2015 Mazda 3 S GT, 2015 VW e-Golf

I believe I posted about this in the "random thoughts" thread a while back, but the replies were over my head then, so I moved on without pursuing it any further. However, as of today, it's officially "do or die" time, and I'd appreciate any thoughts/advice I can get.

My favorite forum related to one of my hobbies is closing up shop at the end of this month. It's run by a manufacturer, and they are planning to simply let it vanish, without archiving any of it, when they disconnect from the old server. Their new website has been totally revamped and is on different servers, and they want all of this to happen on social media rather than on their infrequently visited BBS, especially since their staff no longer really monitors it.

I was "leaked" the information about the closing, and it's got me really, really down. My own posts make up just over 5% of all of the posts there, despite my being only .00659% of the user base. :ROFLMAO:😅😳🤔:cry:

I know nearly nothing about the back-end of a forum/bbs, but I know vBulletin and phpBB are two similar things, and I believe this forum is based on phpBB.

Question 1: As a non-admin (and without the admin's help), is there any way to archive this, even in a huge batch of text documents?

Question 2: Is there something the admin could do, with little or no effort on their end, that would make this way easier, faster, or even possible? If so, what would I need to ask for specifically? Like, if what I want to do is literally impossible as things stand, or if the only data I'd get would be raw HTML source code unless the admin flipped some "switch" or enabled some setting that made the data much more usable, or downloadable more automatically or faster...what would that look like?

The BBS dates back to 2007, and there are 392,003 total posts in 26,223 total topics, in case that helps figure out the scale of the job.

In my head, a GREAT end result would be a zip of 26,223 .txt documents, one for each topic, with all of the visible post info: titles, members, dates, and text body. If it could retain some formatting, even better. (With images? Even better better...but I assume that'd be impossible). Or if not a .zip, some sort of way to let a batch-download process run and save them all individually into a folder on my machine.

Any thoughts would be appreciated.
 
As an admin, exporting the database [sans user table] and the software should be possible, but probably leads to a) work b) legal questions if they're an official company operation.

As a user, you could wget everything: https://en.wikipedia.org/wiki/Wget
It would run much like a search engine's spider bot, downloading each page and crawling it for more links to hop to. You'd end up with a massive directory of HTML files.
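
For anyone curious what that spidering looks like under the hood, here is a minimal sketch in Python using only the standard library. It is not how wget itself is implemented, just the same idea: fetch a page, collect the links that stay on the same host, and queue them. The function names, the page limit, and the two-second delay are all my own assumptions for illustration.

```python
import time
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse, urldefrag


class LinkParser(HTMLParser):
    """Collects the href of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def same_site_links(page_url, html):
    """Return absolute, fragment-free links that stay on page_url's host."""
    parser = LinkParser()
    parser.feed(html)
    host = urlparse(page_url).netloc
    links = set()
    for href in parser.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        parts = urlparse(absolute)
        if parts.scheme in ("http", "https") and parts.netloc == host:
            links.add(absolute)
    return links


def crawl(start_url, max_pages=50, delay_seconds=2.0):
    """Breadth-first crawl from start_url; returns {url: html}."""
    queue, seen, pages = [start_url], {start_url}, {}
    while queue and len(pages) < max_pages:
        url = queue.pop(0)
        try:
            with urllib.request.urlopen(url, timeout=30) as response:
                html = response.read().decode("utf-8", errors="replace")
        except OSError:
            continue  # skip pages that fail to load
        pages[url] = html
        for link in sorted(same_site_links(url, html)):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        time.sleep(delay_seconds)  # be polite: do not hammer the server
    return pages
```

Forum URLs often carry session or sort parameters, so the same topic can show up under several addresses. A real run would want to normalize those before adding them to the queue, and with tens of thousands of topics it would take many hours at a polite request rate.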
 
Is there anything on the Internet Archive that would eliminate the need for doing this? If so, maybe it's possible to get an archive from them.

Edit: PowerShell can download pages; the issue would be navigating through the links to get everything.
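
A quick way to check whether the Internet Archive already holds a copy of a given page is its Wayback Machine availability endpoint. The sketch below is in Python rather than PowerShell, and the JSON layout it reads (`archived_snapshots` then `closest`) is how I understand the response to look, so treat that as an assumption and confirm against the current documentation. The function names are mine.

```python
import json
import urllib.request
from urllib.parse import urlencode

WAYBACK_API = "https://archive.org/wayback/available"


def availability_url(page_url):
    """Build the lookup URL for one page."""
    return WAYBACK_API + "?" + urlencode({"url": page_url})


def closest_snapshot(api_response_text):
    """Return the archived URL from a response body, or None if there is none."""
    data = json.loads(api_response_text)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(page_url):
    """Ask the Wayback Machine for the closest snapshot of page_url."""
    with urllib.request.urlopen(availability_url(page_url), timeout=30) as response:
        return closest_snapshot(response.read().decode("utf-8"))
```

Calling `lookup()` on a handful of topic URLs would show whether the archive's coverage is good enough to skip a full crawl.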
 
httrack might work (free and open-source). Make sure not to hammer the site.

IDM (Internet Download Manager) or Bulk Image Downloader, both paid, might help with images and files.

JDownloader 2 (the adware-free version) can help with files.


There are probably other tools available. Maybe search for 'website copy' or 'website clone'
 
A similar piece of software

Then, if you want to make it widely available, you could use archive.ph or start saving the pages on the Internet Archive.

This is, of course, unless you can persuade someone to give you a SQL dump, yes.
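
To give a feel for why a SQL dump makes the "one .txt per topic" idea so much easier, here is a rough sketch in Python. It assumes the dump has been loaded into SQLite with two simplified, hypothetical tables, `topics(topic_id, topic_title)` and `posts(post_id, topic_id, username, post_time, post_text)`, and that `post_time` is a Unix timestamp. A real phpBB database has different table and column names and more of them, so the queries would need adjusting.

```python
import sqlite3
from datetime import datetime, timezone
from pathlib import Path


def topic_as_text(conn, topic_id):
    """Render one topic as plain text: title, then each post in order."""
    title_row = conn.execute(
        "SELECT topic_title FROM topics WHERE topic_id = ?", (topic_id,)
    ).fetchone()
    lines = [title_row[0], ""]
    posts = conn.execute(
        "SELECT username, post_time, post_text FROM posts "
        "WHERE topic_id = ? ORDER BY post_time, post_id",
        (topic_id,),
    )
    for username, post_time, post_text in posts:
        # Assumption: post_time is stored as seconds since the Unix epoch.
        when = datetime.fromtimestamp(post_time, tz=timezone.utc)
        lines.append(f"{username} | {when:%Y-%m-%d %H:%M} UTC")
        lines.append(post_text)
        lines.append("")
    return "\n".join(lines)


def export_topics(conn, out_dir):
    """Write topic_<id>.txt for every topic; returns how many were written."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    topic_ids = [row[0] for row in conn.execute("SELECT topic_id FROM topics")]
    for topic_id in topic_ids:
        (out / f"topic_{topic_id}.txt").write_text(
            topic_as_text(conn, topic_id), encoding="utf-8"
        )
    return len(topic_ids)
```

Post bodies in a dump usually still contain the forum's own markup (BBCode or similar), so the text files would show raw tags unless those are stripped or converted.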
 
I appreciate all of the suggestions. I like to think I'm not stupid, but I wasn't able to figure out a solution. Some of the ones I tried, I just didn't understand well enough to use, and I spent a lot of time looking up definitions of words and processes...and some I thought I did understand, but I just couldn't get them to work right. *sigh* Humbling, to say the least!

I was also trying to do it while school work was ramping up, so I wasn't able to dedicate enough time to it, and as of this morning, the forum's gone. :cry:
 
If it's just the front page that is down, httrack could probably copy some of it
 
Nah, all of my thread bookmarks are dead, and all of the Google image search results are at that, like, blurry 3% quality from the source no longer being available.
 