Downloading from Wayback

This page describes how to download web game/animation files from Wayback Machine. Once the files are downloaded, they can be curated using Flashpoint Infinity.

Understanding Wayback Machine Capture URLs

Let's break down this example of a typical Wayback Machine capture URL:

https://web.archive.org/web/20150118221400/http://www.teagames.com/games/crazygolf2

First we have the Wayback Machine base URL: https://web.archive.org/web/. This is present at the beginning of all Wayback Machine capture URLs.

Next is the date code: 20150118221400. This indicates the date (2015-01-18) and the time (22:14:00) that the capture was taken. Wayback Machine automatically finds the closest capture to any date code you give it. It treats any incomplete date code as a range and searches for the closest capture to the end of that range. For example, an incomplete date code such as 201401 is treated as all of January 2014.

The last part of the capture URL is the URL from the original site. In this case it is http://www.teagames.com/games/crazygolf2.

  • The original URL (not the full capture URL!) will always be your launch command in Flashpoint.
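The breakdown above can be sketched in a few lines of Python. This is only an illustration and not part of Flashpoint: the names Capture and split_capture_url are made up for this example, and the pattern assumes a date code of 1 to 14 digits, optionally followed by a modifier suffix such as id_ (described in the next section).

```python
import re
from datetime import datetime
from typing import NamedTuple

class Capture(NamedTuple):
    date_code: str     # e.g. "20150118221400"
    modifier: str      # e.g. "id_", or "" when the URL has no modifier
    original_url: str  # this is what becomes the launch command

# Base URL, then the date code, then an optional modifier, then the original URL.
_CAPTURE_RE = re.compile(
    r"^https?://web\.archive\.org/web/(\d{1,14})([a-z]+_)?/(.+)$"
)

def split_capture_url(capture_url: str) -> Capture:
    """Split a Wayback capture URL into its date code, modifier, and original URL."""
    match = _CAPTURE_RE.match(capture_url)
    if match is None:
        raise ValueError(f"not a Wayback capture URL: {capture_url}")
    date_code, modifier, original_url = match.groups()
    return Capture(date_code, modifier or "", original_url)

capture = split_capture_url(
    "https://web.archive.org/web/20150118221400/http://www.teagames.com/games/crazygolf2"
)
print(capture.original_url)  # http://www.teagames.com/games/crazygolf2
# A complete 14-digit date code can be read as a timestamp.
print(datetime.strptime(capture.date_code, "%Y%m%d%H%M%S"))  # 2015-01-18 22:14:00
```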

Downloading the Original Files

By default, Wayback rewrites links and inserts a toolbar at the top of all archived pages. But when curating for Flashpoint, we want the original, unmodified files! To get those, you must add a modifier suffix to the end of the capture's date code. Wayback supports several suffixes, but the one you should use is id_, which means "identical" (to the original). Here is the capture URL from before, with the id_ suffix added:

https://web.archive.org/web/20150118221400id_/http://www.teagames.com/games/crazygolf2

Notice that the Wayback toolbar no longer appears. When downloading files for Flashpoint curations, you should always use the id_ suffix!
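If you have many capture URLs to convert, the suffix can be added with a short script instead of by hand. The following is a minimal Python sketch, not an official tool: the function name with_id_suffix is hypothetical, and it assumes the date code is 1 to 14 digits and that any existing modifier is made of lowercase letters followed by an underscore.

```python
import re

# Match the base URL and date code, plus any modifier already attached to it.
_DATE_CODE_RE = re.compile(r"^(https?://web\.archive\.org/web/\d{1,14})(?:[a-z]+_)?/")

def with_id_suffix(capture_url: str) -> str:
    """Return the capture URL with id_ placed right after the date code."""
    new_url, count = _DATE_CODE_RE.subn(r"\g<1>id_/", capture_url, count=1)
    if count == 0:
        raise ValueError(f"not a Wayback capture URL: {capture_url}")
    return new_url

print(with_id_suffix(
    "https://web.archive.org/web/20150118221400/http://www.teagames.com/games/crazygolf2"
))
```

Running the function on a URL that already has the id_ suffix returns it unchanged, so it is safe to apply to a mixed list.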

Searching Wayback Machine for Captures

There are two main methods of searching Wayback Machine for captures. Keep in mind that for unknown reasons, sometimes only one method will bring up a certain result. If you can't find a URL or asset using one method, try the other.

Quick Search

Let's say you want to search for all captures of this URL: http://www.teagames.com/games/crazygolf2.

Use the same structure as a capture URL, but replace the date code and modifier suffix with an asterisk: https://web.archive.org/web/*/http://www.teagames.com/games/crazygolf2.

This will show a calendar with colored circles for each capture. Hover over a circle and click a time code to access a capture.

  • A blue circle means the file was found at the requested URL (200 OK). This is generally what you want.
  • A green circle means there was a redirect (301 or 302).
  • A yellow circle means the URL was not found (4xx).
  • A red circle means there was a server error on the original website (5xx).

Now let's say you want to look at all the SWFs in TeaGames's swf directory.

As before, use an asterisk instead of a date code in the capture URL. But now, you also need to put an asterisk at the end of the original URL portion. Here is what you will get: http://web.archive.org/web/*/https://teagames.com/swf/*

Use the "filter results" box on the top-right to search for specific filenames or URLs. You can also search for specific MIME types, such as application/x-shockwave-flash.
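Both kinds of Quick Search URL follow the same pattern, so they are easy to generate. Here is a small Python sketch of that pattern; quick_search_url and its prefix parameter are names invented for this example.

```python
WAYBACK_BASE = "https://web.archive.org/web/"

def quick_search_url(original_url: str, prefix: bool = False) -> str:
    """Build a Quick Search URL for one URL, or for everything under it."""
    if prefix and not original_url.endswith("*"):
        # A trailing asterisk asks for every URL starting with this one.
        original_url += "*"
    # The asterisk after the base URL stands in for the date code.
    return WAYBACK_BASE + "*/" + original_url

print(quick_search_url("http://www.teagames.com/games/crazygolf2"))
print(quick_search_url("https://teagames.com/swf/", prefix=True))
```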

CDX Search

This section goes over some basic examples of searches with Wayback's CDX API. The full documentation is here.

This search will list all captures of URLs starting with www.bbc.co.uk/cbbc/games/:
http://web.archive.org/cdx/search?url=www.bbc.co.uk/cbbc/games/&matchType=prefix&collapse=urlkey&filter=!statuscode%3A[45]&fl=original

This search will list all captures with status 200 OK from the domain www.happyfeet-game.com. These portions of the URL accomplish that: matchType=domain and statuscode%3A200.
https://web.archive.org/cdx/search?url=www.happyfeet-game.com&matchType=domain&collapse=urlkey&filter=statuscode%3A200&fl=original

You can also add a date range to a CDX search. For example, adding &from=2006&to=2008 to the previous search will return only captures from 2006 to 2008:
https://web.archive.org/cdx/search?url=www.happyfeet-game.com/&matchType=domain&collapse=urlkey&filter=statuscode%3A200&fl=original&from=2006&to=2008

Finally, this search will list all captures of SWF, DCR, and CCT files from URLs starting with superdudes.net:
https://web.archive.org/cdx/search/cdx?url=superdudes.net&matchType=prefix&filter=original:.*(\.swf|\.dcr|\.cct).*&fl=original&collapse=urlkey
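The searches above differ only in their query parameters, so they can be assembled programmatically. The sketch below uses Python's standard urllib.parse.urlencode and only the parameters that appear in the examples (url, matchType, collapse, filter, fl, from, to). The function name cdx_query and its arguments are hypothetical, and the endpoint is the one used in the last example.

```python
from urllib.parse import urlencode

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"

def cdx_query(url, match_type="prefix", filters=(), date_from=None, date_to=None):
    """Build a CDX search URL that lists each matching original URL once."""
    params = [
        ("url", url),
        ("matchType", match_type),  # "prefix" or "domain", as in the examples above
        ("collapse", "urlkey"),     # one row per distinct URL
        ("fl", "original"),         # only output the original URL column
    ]
    # Each filter becomes its own filter= parameter.
    params += [("filter", f) for f in filters]
    if date_from is not None:
        params.append(("from", str(date_from)))
    if date_to is not None:
        params.append(("to", str(date_to)))
    # urlencode percent-encodes special characters, e.g. ":" becomes %3A.
    return CDX_ENDPOINT + "?" + urlencode(params)

print(cdx_query(
    "www.happyfeet-game.com",
    match_type="domain",
    filters=["statuscode:200"],
    date_from=2006,
    date_to=2008,
))
```

The printed URL can be opened in a browser or passed to a download tool. Filters are written unencoded (statuscode:200, not statuscode%3A200) because urlencode handles the encoding.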

Using Wayback Download Mode

Wayback Download Mode is a mode of Flashpoint that lets you download files directly from Wayback Machine, which makes it easier to curate titles archived there.

Setting Up Wayback Download Mode

Wayback Download Mode is currently part of the Development Branch of version 13. It is enabled in the third tab of the Manager.

  1. Open Flashpoint and switch to the Config tab.
  2. Click the "Curate Server" dropdown menu and switch to "Wayback Download Mode."
  3. Click "Save and Restart." Curations in the curate tab will now use Wayback Download Mode when you launch them.

Using Wayback Download Mode

  1. In Flashpoint, switch to the Curate tab, then click New Curation. For more information about using the Curate tab, see Curation Tutorial.
  2. Enter the title and platform of the game, and any other metadata you wish. For the Launch Command, paste the original URL that you want to download from Wayback.
    • Note that if the original URL uses HTTPS, you will need to replace https with http.
  3. For the Application Path, click the dropdown and choose the appropriate application in Flashpoint for the type of game you are curating.
  4. Launch the game. The game should load in the application you specified. Watch the Logs tab of Flashpoint Launcher as you play the game.
  5. Once all of the assets seem to have loaded, navigate to Flashpoint's Legacy\htdocs folder and sort the contents by Date Modified. Determine which files and folders belong to the game you just downloaded.
  6. Copy the files and folders from the htdocs folder to your curation's content folder, retaining the same structure. Refer to the Curation Tutorial.
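As an alternative to sorting by Date Modified in a file manager for step 5, a short script can list which files appeared in htdocs during your session. This is a rough Python sketch, not part of Flashpoint: recently_modified is a hypothetical helper, and the path in the usage comment is only an example, so adjust it to wherever your copy of Flashpoint lives.

```python
import time
from pathlib import Path

def recently_modified(htdocs: Path, since: float) -> list[Path]:
    """Return files under htdocs modified at or after `since`, newest first.

    `since` is a Unix timestamp, such as the value of time.time()
    recorded just before launching the curation.
    """
    hits = [
        path for path in htdocs.rglob("*")
        if path.is_file() and path.stat().st_mtime >= since
    ]
    return sorted(hits, key=lambda path: path.stat().st_mtime, reverse=True)

# Example usage (the path is an assumption):
#   started = time.time()
#   ... launch the game and let its assets load ...
#   for path in recently_modified(Path(r"C:\Flashpoint\Legacy\htdocs"), started):
#       print(path)
```

The script only helps narrow the list down. You still need to judge which files belong to the game before copying them to the content folder.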

Turning Off Wayback Download Mode

After you're done downloading from Wayback, you will need to turn Wayback Download Mode off to use the Curate tab normally. To do this, click the "Config" tab of Flashpoint Launcher, then switch back to "Apache Webserver" using the "Server" dropdown menu.

Using cURLsDownloader

You can also download files from Wayback Machine using a combination of CDX Search and cURLsDownloader by following the steps below.

  1. Use CDX Search to obtain a list of original URLs to download from Wayback Machine.
  2. Copy and paste these URLs into a text editor such as Notepad++ or Sublime Text.
  3. Now you need to turn these original URLs into capture URLs. First, determine a suitable date code.
    • For example, if the captures you want are from around January 2014, 201401 is a good date code.
  4. Compose a capture URL prefix using the Wayback Machine base URL, your date code, and the id_ suffix.
    • An example capture URL prefix would be https://web.archive.org/web/201401id_/.
  5. Use your text editor's Macros or Find & Replace functionality to insert your capture URL prefix before each original URL in the list.
  6. That's it! Save the text file, then refer to the Curation Tutorial and the cURLsDownloader Manual to complete your curation.
    • After the files are finished downloading, cURLsDownloader will give you the option to automatically move the downloaded files to their original URLs. Be sure to take advantage of this option!
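If you would rather not use Find & Replace, steps 3 to 5 can be done with a few lines of Python. This is a minimal sketch under a few assumptions: the function names and the file names urls.txt and capture_urls.txt are made up for illustration, and the default date code 201401 is just the example from step 3.

```python
def to_capture_urls(original_urls, date_code="201401"):
    """Prefix each original URL with the Wayback base URL, date code, and id_."""
    prefix = f"https://web.archive.org/web/{date_code}id_/"
    # Skip blank lines and strip stray whitespace and newlines.
    return [prefix + url.strip() for url in original_urls if url.strip()]

def convert_file(source_path, output_path, date_code="201401"):
    """Read original URLs (one per line) and write the matching capture URLs."""
    with open(source_path, encoding="utf-8") as source:
        capture_urls = to_capture_urls(source, date_code)
    with open(output_path, "w", encoding="utf-8") as output:
        output.write("\n".join(capture_urls) + "\n")

# Example usage (hypothetical file names):
#   convert_file("urls.txt", "capture_urls.txt", date_code="201401")
print(to_capture_urls(["http://www.example.com/game.swf"]))
```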

Using Wayback Machine Downloader

Wayback Machine Downloader is a command line tool that can automatically download whole websites from the Wayback Machine. You can also use different filters to customize exactly which files are downloaded. See its GitHub page for more information.