Downloading from Wayback
Revision as of 04:10, 7 April 2022
This page describes how to download web game/animation files from Wayback Machine. Once the files are downloaded, they can be curated using Flashpoint Core.
Understanding Wayback Machine Capture URLs
Let's break down this example of a typical Wayback Machine capture URL:
https://web.archive.org/web/20150118221400if_/http://www.teagames.com/games/crazygolf2
First we have the Wayback Machine base URL: https://web.archive.org/web/. This is present at the beginning of all Wayback Machine capture URLs.
Next is the date code: 20150118221400. This indicates the date (2015-01-18) and the time (22:14:00) that the capture was taken. Wayback Machine automatically finds the closest capture to any date code you give it. It treats any incomplete date code as a range and searches for the closest capture to the end of that range. Here are some examples:
- https://web.archive.org/web/2/http://example.com/ will load the latest capture of example.com.
- https://web.archive.org/web/0/http://example.com/ will load the earliest capture of example.com.
- https://web.archive.org/web/1/http://example.com/ will load the capture closest to the end of the year 1999.
- https://web.archive.org/web/2005/http://example.com/ will load the capture closest to the end of the year 2005.
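As an illustration of how a full date code maps to a date and time, here is a minimal Python sketch. The function name `parse_date_code` is hypothetical, and the sketch only handles complete 14-digit codes; it does not try to reproduce how Wayback Machine resolves incomplete ones.

```python
from datetime import datetime


def parse_date_code(date_code: str) -> datetime:
    """Decode a full 14-digit Wayback date code (YYYYMMDDhhmmss)."""
    if len(date_code) != 14 or not date_code.isdigit():
        raise ValueError("expected a 14-digit date code")
    return datetime.strptime(date_code, "%Y%m%d%H%M%S")


captured = parse_date_code("20150118221400")
print(captured.isoformat())  # 2015-01-18T22:14:00
```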
Next is the modifier suffix: if_. A suffix can be appended to the date code of a capture URL to change Wayback's behavior.
- The if_ suffix removes the Wayback toolbar from the top of the archived page. But it does not disable other Wayback modifications, such as URL rewriting.
- The most useful suffix is the id_ suffix. It allows you to download the original, unmodified file! When downloading files for curating in Flashpoint, you should always use the id_ suffix.
Finally, the last part of the capture URL is the URL from the original site. In this case it is http://www.teagames.com/games/crazygolf2.
- The original URL (not the full capture URL!) will always be your launch command in Flashpoint.
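The breakdown above can be sketched in Python as a pair of helpers that split a capture URL into its parts and put one back together. The names `split_capture_url` and `build_capture_url` are hypothetical, and the regular expression assumes that a modifier suffix is always two lowercase letters followed by an underscore, as with if_ and id_.

```python
import re

BASE_URL = "https://web.archive.org/web/"

# Assumption: a date code is a run of digits, optionally followed by a
# two-letter modifier suffix such as "if_" or "id_".
CAPTURE_RE = re.compile(
    r"^https?://web\.archive\.org/web/"
    r"(?P<date_code>\d+)(?P<suffix>[a-z]{2}_)?/"
    r"(?P<original_url>.+)$"
)


def split_capture_url(capture_url: str) -> dict:
    """Return the date code, modifier suffix, and original URL."""
    match = CAPTURE_RE.match(capture_url)
    if match is None:
        raise ValueError("not a recognized capture URL: " + capture_url)
    # A missing suffix is reported as an empty string.
    return match.groupdict(default="")


def build_capture_url(date_code: str, original_url: str, suffix: str = "id_") -> str:
    """Compose a capture URL; id_ requests the unmodified file."""
    return BASE_URL + date_code + suffix + "/" + original_url


parts = split_capture_url(
    "https://web.archive.org/web/20150118221400if_/"
    "http://www.teagames.com/games/crazygolf2"
)
print(parts["original_url"])  # the launch command for Flashpoint
print(build_capture_url(parts["date_code"], parts["original_url"]))
```

The second call swaps if_ for id_, which is the form recommended above when downloading files for curation.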
Searching Wayback Machine for Captures
There are two main methods of searching Wayback Machine for captures. Keep in mind that for unknown reasons, sometimes only one method will bring up a certain result. If you can't find a URL or asset using one method, try the other.
Quick Search
Let's say you want to search for all captures of this URL: http://www.teagames.com/games/crazygolf2.
Use the same structure as a capture URL, but replace the date code and modifier suffix with an asterisk: https://web.archive.org/web/*/http://www.teagames.com/games/crazygolf2.
This will show a calendar with colored circles for each capture. Hover over a circle and click a time code to access a capture.
- A blue circle means the file was found at the requested URL (200 OK). This is generally what you want.
- A green circle means there was a redirect (301 or 302).
- A yellow circle means the URL was not found (4xx).
- A red circle means there was a server error on the original website (5xx).
Now let's say you want to look at all the SWFs in TeaGames's swf directory.
As before, use an asterisk instead of a date code in the capture URL. But now, you also need to put an asterisk at the end of the original URL portion. Here is what you will get: http://web.archive.org/web/*/https://teagames.com/swf/*
Use the "filter results" box on the top-right to search for specific filenames or URLs. You can also search for specific MIME types, such as application/x-shockwave-flash.
CDX Search
This section goes over some basic examples of searches with Wayback's CDX API. The full documentation is here.
This search will list all captures of URLs starting with www.bbc.co.uk/cbbc/games/:
http://web.archive.org/cdx/search?url=www.bbc.co.uk/cbbc/games/&matchType=prefix&collapse=urlkey&filter=!statuscode%3A[45]&fl=original
This search will list all captures with status 200 OK from the domain www.happyfeet-game.com. These portions of the URL accomplish that: matchType=domain and statuscode%3A200.
https://web.archive.org/cdx/search?url=www.happyfeet-game.com&matchType=domain&collapse=urlkey&filter=statuscode%3A200&fl=original
You can also add a date range to a CDX search. For example, adding &from=2006&to=2008 to the previous search will return only captures from 2006 to 2008:
https://web.archive.org/cdx/search?url=www.happyfeet-game.com/&matchType=domain&collapse=urlkey&filter=statuscode%3A200&fl=original&from=2006&to=2008
Finally, this search will list all captures of SWF, DCR, and CCT files from URLs starting with superdudes.net:
https://web.archive.org/cdx/search/cdx?url=superdudes.net&matchType=prefix&filter=original:.*(\.swf|\.dcr|\.cct).*&fl=original&collapse=urlkey
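If you run many CDX searches, it may help to assemble the query string in code instead of by hand. The following is a rough Python sketch that only builds the search URL and does not send any request. The function name `build_cdx_url` and its parameter names are hypothetical; the query parameters themselves (url, matchType, collapse, filter, fl, from, to) are the ones used in the examples above.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(url, match_type="prefix", filters=(), date_from=None, date_to=None):
    """Build a CDX search URL that lists each matching original URL once."""
    params = [
        ("url", url),
        ("matchType", match_type),
        ("collapse", "urlkey"),  # one result per unique URL
        ("fl", "original"),      # output only the original URL field
    ]
    # Each filter is passed as its own "filter" parameter.
    params += [("filter", f) for f in filters]
    if date_from:
        params.append(("from", date_from))
    if date_to:
        params.append(("to", date_to))
    # urlencode percent-encodes special characters such as ":".
    return CDX_ENDPOINT + "?" + urlencode(params)


print(build_cdx_url(
    "www.happyfeet-game.com",
    match_type="domain",
    filters=["statuscode:200"],
    date_from="2006",
    date_to="2008",
))
```

The printed URL can be pasted into a browser. Note that urlencode also encodes slashes and brackets, so the result may look different from a hand-written URL while carrying the same parameter values.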
Using Wayback Download Mode
Wayback Download Mode is a new mode of Flashpoint Core that allows you to download files directly from Wayback Machine. These files can then be added to your curations.
Setting Up Wayback Download Mode
Wayback Download Mode is compatible with Flashpoint Core 9 and 10. Assuming a compatible version of Core is installed, here are the steps to activate this mode:
- Download the Wayback Download Patch and extract it into your Flashpoint folder. Replace files when prompted.
- Restart Flashpoint Launcher, then switch to the Config tab.
- Click the "Server" dropdown menu and switch to "Wayback Download Mode."
- Click "Save and Restart." Flashpoint Core will now use Wayback Download Mode when you launch games or curations.
Using Wayback Download Mode
- In Flashpoint Core, click New Game on the bottom-right corner of the launcher. If you do not see a New Game button, switch to the Config tab and check "Enable Editing."
- Alternatively, you can switch to the Curate tab, then click New Curation instead of New Game. For more information about using the Curate tab, see Curation Tutorial.
- Enter the title and platform of the game, and any other metadata you wish. For the Launch Command, paste the original URL that you want to download from Wayback.
- Note that if the original URL uses HTTPS, you will need to replace https with http.
- For the Application Path, click the dropdown and choose the appropriate application in Flashpoint for the type of game you are curating.
- Double-click the game to launch it. The game should load in the application you specified. Watch the Logs tab of Flashpoint Launcher as you play the game.
- Once all of the assets seem to have loaded, navigate to Flashpoint's Legacy\htdocs folder and sort the contents by Date Modified. Determine which files and folders belong to the game you just downloaded.
- Copy the files and folders from the htdocs folder to your curation's content folder, retaining the same structure. Refer to the Curation Tutorial.
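Sorting by Date Modified in a file manager only covers one folder at a time. As an optional aid, this Python sketch walks the whole htdocs tree and lists files changed recently, newest first. The function name `recently_modified`, the one-hour default window, and the relative path `Legacy/htdocs` are all assumptions; adjust the path to wherever your Flashpoint folder lives.

```python
import os
import time


def recently_modified(root, since_seconds=3600):
    """List files under root modified within the window, newest first."""
    cutoff = time.time() - since_seconds
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mtime = os.path.getmtime(path)
            if mtime >= cutoff:
                # Keep paths relative to root so the folder structure is visible.
                found.append((mtime, os.path.relpath(path, root)))
    return [rel for _mtime, rel in sorted(found, reverse=True)]


# Assumed location, relative to the Flashpoint folder.
for rel_path in recently_modified(os.path.join("Legacy", "htdocs")):
    print(rel_path)
```

The relative paths it prints show the structure to retain when copying into the curation's content folder. You still need to judge which of the listed files belong to the game.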
Turning Off Wayback Download Mode
After you're done downloading from Wayback, you will need to turn it off to use Flashpoint Core normally. To do this, click the "Config" tab of Flashpoint Launcher, then switch back to "Apache Webserver" using the "Server" dropdown menu.
Using cURLsDownloader
You can also download files from Wayback Machine using a combination of CDX Search and cURLsDownloader by following the steps below.
- Use CDX Search to obtain a list of original URLs to download from Wayback Machine.
- Copy and paste these URLs into a text editor such as Notepad++ or Sublime Text.
- Now you need to turn these original URLs into capture URLs. First, determine a suitable date code.
- For example, if the captures you want are from around January 2014, 201401 is a good date code.
- Compose a capture URL prefix using the Wayback Machine base URL, your date code, and the id_ suffix.
- An example capture URL prefix would be https://web.archive.org/web/201401id_/.
- Use your text editor's Macros or Find & Replace functionality to insert your capture URL prefix before each original URL in the list.
- An example capture URL would be https://web.archive.org/web/201401id_/http://example.com/.
- That's it! Save the text file, then refer to the Curation Tutorial and the cURLsDownloader Manual to complete your curation.
- After the files are finished downloading, cURLsDownloader will give you the option to automatically move the downloaded files to their original URLs. Be sure to take advantage of this option!
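If you would rather script the prefixing step than use Find & Replace, the idea can be sketched in a few lines of Python. The function name `to_capture_urls` is hypothetical, and the input is assumed to be one original URL per line, as returned by a CDX search with fl=original.

```python
def to_capture_urls(original_urls, date_code, suffix="id_"):
    """Prefix each original URL with a Wayback capture URL prefix."""
    prefix = "https://web.archive.org/web/" + date_code + suffix + "/"
    # Strip whitespace and skip blank lines from the pasted list.
    return [prefix + url.strip() for url in original_urls if url.strip()]


lines = ["http://example.com/\n", "\n", "http://example.com/game.swf\n"]
for capture_url in to_capture_urls(lines, "201401"):
    print(capture_url)
```

Save the printed lines to a text file and continue with cURLsDownloader as described above.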