Downloading from Wayback

Revision as of 04:10, 7 April 2022

This page describes how to download web game/animation files from Wayback Machine. Once the files are downloaded, they can be curated using Flashpoint Core.

Understanding Wayback Machine Capture URLs

Let's break down this example of a typical Wayback Machine capture URL: https://web.archive.org/web/20150118221400if_/http://www.teagames.com/games/crazygolf2

First we have the Wayback Machine base URL: https://web.archive.org/web/. This is present at the beginning of all Wayback Machine capture URLs.

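Next is the date code: 20150118221400. This indicates the date (2015-01-18) and the time (22:14:00) that the capture was taken. Wayback Machine automatically finds the closest capture to any date code you give it. It treats any incomplete date code as a range and searches for the closest capture to the end of that range. Here are some examples:

  • https://web.archive.org/web/2/http://example.com/ will load the latest capture of example.com.
  • https://web.archive.org/web/0/http://example.com/ will load the earliest capture of example.com.
  • https://web.archive.org/web/2005/http://example.com/ will load the capture closest to the end of the year 2005.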

Next is the modifier suffix: if_. A suffix can be appended to the date code of a capture URL to change Wayback's behavior.

  • The if_ suffix removes the Wayback toolbar from the top of the archived page, but it does not disable other Wayback modifications, such as URL rewriting.
  • The most useful suffix is the id_ suffix. It allows you to download the original, unmodified file! When downloading files for curating in Flashpoint, you should always use the id_ suffix.

Finally, the last part of the capture URL is the URL from the original site. In this case it is http://www.teagames.com/games/crazygolf2.

  • The original URL (not the full capture URL!) will always be your launch command in Flashpoint.
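
The breakdown above can be sketched in Python. This is only an illustration of the URL structure described in this section: the function name split_capture_url and the regular expression are assumptions made for this example, not part of any Wayback Machine or Flashpoint tool, and the date parsing at the end assumes a complete 14-digit date code.

```python
import re
from datetime import datetime

BASE_URL = "https://web.archive.org/web/"

# Date code (digits), an optional two-letter modifier suffix such as if_ or id_,
# a slash, then the original URL.
CAPTURE_RE = re.compile(r"^(\d+)([a-z]{2}_)?/(.+)$")


def split_capture_url(capture_url):
    """Split a capture URL into (date_code, modifier, original_url)."""
    if not capture_url.startswith(BASE_URL):
        raise ValueError("not a Wayback Machine capture URL")
    match = CAPTURE_RE.match(capture_url[len(BASE_URL):])
    if match is None:
        raise ValueError("could not find a date code in the URL")
    date_code, modifier, original_url = match.groups()
    return date_code, modifier or "", original_url


date_code, modifier, original_url = split_capture_url(
    "https://web.archive.org/web/20150118221400if_/http://www.teagames.com/games/crazygolf2"
)
# A complete date code has the form YYYYMMDDhhmmss.
captured_at = datetime.strptime(date_code, "%Y%m%d%H%M%S")
print(date_code, modifier, original_url, captured_at.isoformat())
```

The third value returned, the original URL, is the part that becomes the launch command.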

Searching Wayback Machine for Captures

There are two main methods of searching Wayback Machine for captures. Keep in mind that for unknown reasons, sometimes only one method will bring up a certain result. If you can't find a URL or asset using one method, try the other.

Quick Search

Let's say you want to search for all captures of this URL: http://www.teagames.com/games/crazygolf2.

Use the same structure as a capture URL, but replace the date code and modifier suffix with an asterisk: https://web.archive.org/web/*/http://www.teagames.com/games/crazygolf2.

This will show a calendar with colored circles for each capture. Hover over a circle and click a time code to access a capture.

  • A blue circle means the file was found at the requested URL (200 OK). This is generally what you want.
  • A green circle means there was a redirect (301 or 302).
  • A yellow circle means the URL was not found (4xx).
  • A red circle means there was a server error on the original website (5xx).

Now let's say you want to look at all the SWFs in TeaGames's swf directory.

As before, use an asterisk instead of a date code in the capture URL. But now, you also need to put an asterisk at the end of the original URL portion. Here is what you will get: http://web.archive.org/web/*/https://teagames.com/swf/*

Use the "filter results" box on the top-right to search for specific filenames or URLs. You can also search for specific MIME types, such as application/x-shockwave-flash.
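
As a rough sketch, the two Quick Search URL patterns and the circle colors can be expressed in Python. The helper names quick_search_url and circle_color are made up for this example. Treating every 2xx code as blue and every 3xx code as green is an assumption that generalizes the specific codes listed above.

```python
CALENDAR_PREFIX = "https://web.archive.org/web/*/"


def quick_search_url(original_url, include_subpaths=False):
    """Build a Quick Search URL for one URL, or for everything under it."""
    if include_subpaths:
        # A trailing asterisk asks for every captured URL that starts with original_url.
        original_url = original_url.rstrip("*") + "*"
    return CALENDAR_PREFIX + original_url


def circle_color(status_code):
    """Map an HTTP status code to the calendar circle color (assumed ranges)."""
    if 200 <= status_code < 300:
        return "blue"
    if 300 <= status_code < 400:
        return "green"
    if 400 <= status_code < 500:
        return "yellow"
    if 500 <= status_code < 600:
        return "red"
    raise ValueError("unexpected status code")


print(quick_search_url("http://www.teagames.com/games/crazygolf2"))
print(quick_search_url("https://teagames.com/swf/", include_subpaths=True))
```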

CDX Search

This section goes over some basic examples of searches with Wayback's CDX API. For the full list of parameters, see the Wayback CDX Server API documentation.

This search will list all captures of URLs starting with www.bbc.co.uk/cbbc/games/:
http://web.archive.org/cdx/search?url=www.bbc.co.uk/cbbc/games/&matchType=prefix&collapse=urlkey&filter=!statuscode%3A[45]&fl=original

This search will list all captures with status 200 OK from the domain www.happyfeet-game.com. The matchType=domain and filter=statuscode%3A200 parameters accomplish that.
https://web.archive.org/cdx/search?url=www.happyfeet-game.com&matchType=domain&collapse=urlkey&filter=statuscode%3A200&fl=original

You can also add a date range to a CDX search. For example, adding &from=2006&to=2008 to the previous search will return only captures from 2006 to 2008:
https://web.archive.org/cdx/search?url=www.happyfeet-game.com/&matchType=domain&collapse=urlkey&filter=statuscode%3A200&fl=original&from=2006&to=2008

Finally, this search will list all captures of SWF, DCR, and CCT files from URLs starting with superdudes.net:
https://web.archive.org/cdx/search/cdx?url=superdudes.net&matchType=prefix&filter=original:.*(\.swf|\.dcr|\.cct).*&fl=original&collapse=urlkey
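
The searches above can also be assembled in a script. The following is a minimal sketch using only the Python standard library. The function names are invented for this example, and it uses the /cdx/search/cdx endpoint form from the last search above. urlencode percent-encodes more characters than the hand-written URLs do, which should not change the meaning of the query.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url, match_type="prefix", filters=(), date_from=None, date_to=None):
    """Build a CDX search URL that lists each matching original URL once."""
    params = [
        ("url", url),
        ("matchType", match_type),
        ("collapse", "urlkey"),
        ("fl", "original"),
    ]
    # One filter parameter is added per condition.
    params += [("filter", condition) for condition in filters]
    if date_from:
        params.append(("from", date_from))
    if date_to:
        params.append(("to", date_to))
    return CDX_ENDPOINT + "?" + urlencode(params)


def fetch_original_urls(query_url):
    """Download the search results and return one original URL per line."""
    with urlopen(query_url, timeout=60) as response:
        return response.read().decode("utf-8", errors="replace").splitlines()


query = cdx_query_url(
    "www.happyfeet-game.com",
    match_type="domain",
    filters=["statuscode:200"],
    date_from="2006",
    date_to="2008",
)
print(query)
```

Calling fetch_original_urls(query) would contact Wayback Machine and return the list of URLs, which is convenient when the list is too long to copy from a browser.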

Using Wayback Download Mode

Wayback Download Mode is a new mode of Flashpoint Core that allows you to download files directly from Wayback Machine. These files can then be added to your curations.

Setting Up Wayback Download Mode

Wayback Download Mode is compatible with Flashpoint Core 9 and 10. Assuming a compatible version of Core is installed, here are the steps to activate this mode:

  1. Download the Wayback Download Patch and extract it into your Flashpoint folder. Replace files when prompted.
  2. Restart Flashpoint Launcher, then switch to the Config tab.
  3. Click the "Server" dropdown menu and switch to "Wayback Download Mode."
    [Image: WaybackDownloadMode.png]
  4. Click "Save and Restart." Flashpoint Core will now use Wayback Download Mode when you launch games or curations.

Using Wayback Download Mode

  1. In Flashpoint Core, click New Game on the bottom-right corner of the launcher. If you do not see a New Game button, switch to the Config tab and check "Enable Editing."
    • Alternatively, you can switch to the Curate tab, then click New Curation instead of New Game. For more information about using the Curate tab, see Curation Tutorial.
  2. Enter the title and platform of the game, and any other metadata you wish. For the Launch Command, paste the original URL that you want to download from Wayback.
    • Note that if the original URL uses HTTPS, you will need to replace https with http.
  3. For the Application Path, click the dropdown and choose the appropriate application in Flashpoint for the type of game you are curating.
  4. Double-click the game to launch it. The game should load in the application you specified. Watch the Logs tab of Flashpoint Launcher as you play the game.
  5. Once all of the assets seem to have loaded, navigate to Flashpoint's Legacy\htdocs folder and sort the contents by Date Modified. Determine which files and folders belong to the game you just downloaded.
  6. Copy the files and folders from the htdocs folder to your curation's content folder, retaining the same structure. Refer to the Curation Tutorial.
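
Steps 5 and 6 can be done by hand in a file manager, but a small script can help when many files were downloaded. The sketch below is only an illustration: the function names and the folder paths in the usage comments are placeholders, and it assumes you noted the time just before launching the game.

```python
import shutil
from pathlib import Path


def files_modified_since(htdocs, since):
    """Return files under htdocs modified at or after `since`, newest first."""
    htdocs = Path(htdocs)
    recent = [
        path
        for path in htdocs.rglob("*")
        if path.is_file() and path.stat().st_mtime >= since
    ]
    return sorted(recent, key=lambda path: path.stat().st_mtime, reverse=True)


def copy_keeping_structure(files, htdocs, content):
    """Copy files into the content folder, keeping their paths relative to htdocs."""
    htdocs, content = Path(htdocs), Path(content)
    for source in files:
        target = content / source.relative_to(htdocs)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(source, target)


# Example usage (both paths are placeholders; adjust them to your installation):
#   started = time.time()          # recorded just before launching the game
#   recent = files_modified_since(r"C:\Flashpoint\Legacy\htdocs", started)
#   copy_keeping_structure(recent, r"C:\Flashpoint\Legacy\htdocs", r"C:\path\to\curation\content")
```

Review the list returned by files_modified_since before copying, since other games launched in the same session may also have written files to htdocs.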

Turning Off Wayback Download Mode

After you're done downloading from Wayback, you will need to turn Wayback Download Mode off to use Flashpoint Core normally. To do this, click the "Config" tab of Flashpoint Launcher, then switch back to "Apache Webserver" using the "Server" dropdown menu.

Using cURLsDownloader

You can also download files from Wayback Machine using a combination of CDX Search and cURLsDownloader by following the steps below.

  1. Use CDX Search to obtain a list of original URLs to download from Wayback Machine.
  2. Copy and paste these URLs into a text editor such as Notepad++ or Sublime Text.
  3. Now you need to turn these original URLs into capture URLs. First, determine a suitable date code.
    • For example, if the captures you want are from around January 2014, 201401 is a good date code.
  4. Compose a capture URL prefix using the Wayback Machine base URL, your date code, and the id_ suffix.
    • An example capture URL prefix would be https://web.archive.org/web/201401id_/.
  5. Use your text editor's Macros or Find & Replace functionality to insert your capture URL prefix before each original URL in the list.
  6. That's it! Save the text file, then refer to the Curation Tutorial and the cURLsDownloader Manual to complete your curation.
    • After the files are finished downloading, cURLsDownloader will give you the option to automatically move the downloaded files into folders matching their original URLs. Be sure to take advantage of this option!
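
If you would rather script steps 3 to 5 than use Find & Replace, the prefixing can be sketched in Python as follows. The function names and the example.com URLs are placeholders for this illustration. The id_ suffix and the 201401 date code come from the steps above.

```python
BASE_URL = "https://web.archive.org/web/"


def capture_url_prefix(date_code, modifier="id_"):
    """Combine the base URL, a date code, and a modifier suffix."""
    return f"{BASE_URL}{date_code}{modifier}/"


def to_capture_urls(original_urls, date_code):
    """Insert the capture URL prefix before each original URL."""
    prefix = capture_url_prefix(date_code)
    # Blank lines are skipped so the output has one capture URL per original URL.
    return [prefix + url.strip() for url in original_urls if url.strip()]


# Placeholder list; in practice this would be the output of a CDX search.
originals = [
    "http://example.com/game.swf",
    "http://example.com/data/levels.xml",
]
print("\n".join(to_capture_urls(originals, "201401")))
```

The printed lines can be saved to a text file and passed to cURLsDownloader as described in step 6.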