Downloading from Wayback
Latest revision as of 00:50, 6 May 2024
This page describes how to download web game/animation files from Wayback Machine. Once the files are downloaded, they can be curated using Flashpoint Infinity.
Understanding Wayback Machine Capture URLs
Let's break down this example of a typical Wayback Machine capture URL:
https://web.archive.org/web/20150118221400/http://www.teagames.com/games/crazygolf2
First we have the Wayback Machine base URL: https://web.archive.org/web/. This is present at the beginning of all Wayback Machine capture URLs.
Next is the date code: 20150118221400. This indicates the date (2015-01-18) and the time (22:14:00) at which the capture was taken. Wayback Machine automatically finds the closest capture to any date code you give it. It treats any incomplete date code as a range and searches for the capture closest to the end of that range. Here are some examples:
- https://web.archive.org/web/2/http://example.com/ will load the latest capture of example.com.
- https://web.archive.org/web/0/http://example.com/ will load the earliest capture of example.com.
- https://web.archive.org/web/1/http://example.com/ will load the capture closest to the end of the year 1999.
- https://web.archive.org/web/2005/http://example.com/ will load the capture closest to the end of the year 2005.
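The "end of the range" rule can be modelled in a few lines. The Python sketch below is a hypothetical model, not Wayback Machine's actual code: `range_end` fills each missing digit of a date code with its largest value (so "1" becomes the end of 1999 and "2005" the end of 2005, matching the examples above), and `closest_capture` then picks the capture timestamp nearest to that moment. It assumes timestamps in the full 14-digit YYYYMMDDhhmmss form.

```python
import calendar
from datetime import datetime

# Largest value for each digit of a 14-digit YYYYMMDDhhmmss date code.
# Hypothetical model: Wayback's own range handling is not shown on this page.
_RANGE_END_TEMPLATE = "99991231235959"


def range_end(date_code: str) -> datetime:
    """Return the last moment covered by a (possibly incomplete) date code."""
    if not date_code.isdigit() or len(date_code) > 14:
        raise ValueError(f"not a date code: {date_code!r}")
    # Fill in the missing digits, e.g. "2005" -> "20051231235959".
    padded = date_code + _RANGE_END_TEMPLATE[len(date_code):]
    year, month = int(padded[0:4]), int(padded[4:6])
    # Clamp the day so short months stay valid ("201402" -> 28 February 2014).
    last_day = calendar.monthrange(year, month)[1]
    day = min(int(padded[6:8]), last_day)
    return datetime(year, month, day,
                    int(padded[8:10]), int(padded[10:12]), int(padded[12:14]))


def closest_capture(date_code: str, timestamps: list[str]) -> str:
    """Pick the capture timestamp nearest to the end of the date code's range."""
    target = range_end(date_code)
    return min(timestamps,
               key=lambda ts: abs(datetime.strptime(ts, "%Y%m%d%H%M%S") - target))


if __name__ == "__main__":
    captures = ["20040601000000", "20051130120000", "20060105000000"]
    print(range_end("1"))                     # 1999-12-31 23:59:59
    print(closest_capture("2005", captures))  # 20060105000000
```

With these sample timestamps, "2005" resolves to the January 2006 capture because it is nearer to the end of 2005 than the November 2005 one. If you need a capture from inside a specific period, check the date code of the capture Wayback actually returns.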
The last part of the capture URL is the URL from the original site. In this case it is http://www.teagames.com/games/crazygolf2.
- The original URL (not the full capture URL!) will always be your launch command in Flashpoint.
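Because the launch command is always the original URL, it can help to split a capture URL into the parts described above. A minimal Python sketch, assuming capture URLs look like the examples on this page (base URL, digits, an optional two-letter modifier suffix such as id_, then the original URL); `parse_capture_url` and `CaptureURL` are hypothetical names:

```python
import re
from typing import NamedTuple

# Base URL, the digits of the date code, an optional two-letter modifier
# suffix (assumed to look like "id_"), then the original URL.
_CAPTURE_RE = re.compile(
    r"^(?P<base>https?://web\.archive\.org/web/)"
    r"(?P<date_code>\d+)"
    r"(?P<modifier>[a-z]{2}_)?/"
    r"(?P<original>.+)$"
)


class CaptureURL(NamedTuple):
    base: str
    date_code: str
    modifier: str  # empty string when no suffix is present
    original: str  # this is what goes into the Launch Command


def parse_capture_url(url: str) -> CaptureURL:
    """Split a Wayback Machine capture URL into its parts."""
    match = _CAPTURE_RE.match(url)
    if match is None:
        raise ValueError(f"not a Wayback capture URL: {url!r}")
    return CaptureURL(
        base=match["base"],
        date_code=match["date_code"],
        modifier=match["modifier"] or "",
        original=match["original"],
    )


if __name__ == "__main__":
    parts = parse_capture_url(
        "https://web.archive.org/web/20150118221400/"
        "http://www.teagames.com/games/crazygolf2"
    )
    print(parts.date_code)  # 20150118221400
    print(parts.original)   # http://www.teagames.com/games/crazygolf2
```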
Downloading the Original Files
By default, Wayback rewrites links and inserts a toolbar at the top of all archived pages. But when curating for Flashpoint, we want the original, unmodified files! To get those, you must add a modifier suffix to the end of the capture's date code. Wayback supports several suffixes, but the one you should use is id_, which means "identical" (to the original). Here is the capture URL from before, with the id_ suffix added:
https://web.archive.org/web/20150118221400id_/http://www.teagames.com/games/crazygolf2
Notice that the Wayback toolbar no longer appears. When downloading files for Flashpoint curations, you should always use the id_ suffix!
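If you have a list of ordinary capture URLs, the id_ suffix can be added programmatically. This Python sketch (the `to_identical` name is hypothetical) inserts id_ right after the date code and, as an assumption, replaces any other two-letter suffix that is already there:

```python
import re

# Base URL plus date code, followed by an optional existing two-letter
# modifier suffix (assumed to look like "im_" or "js_") that gets replaced.
_DATE_CODE_RE = re.compile(r"^(https?://web\.archive\.org/web/\d+)(?:[a-z]{2}_)?/")


def to_identical(capture_url: str) -> str:
    """Return the capture URL with the id_ suffix after its date code."""
    result, count = _DATE_CODE_RE.subn(r"\1id_/", capture_url, count=1)
    if count == 0:
        raise ValueError(f"not a Wayback capture URL: {capture_url!r}")
    return result


if __name__ == "__main__":
    print(to_identical(
        "https://web.archive.org/web/20150118221400/"
        "http://www.teagames.com/games/crazygolf2"
    ))
```

Running it on a URL that already has id_ returns the URL unchanged, so it is safe to apply to a mixed list.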
Searching Wayback Machine for Captures
There are two main methods of searching Wayback Machine for captures. Keep in mind that for unknown reasons, sometimes only one method will bring up a certain result. If you can't find a URL or asset using one method, try the other.
Quick Search
Let's say you want to search for all captures of this URL: http://www.teagames.com/games/crazygolf2.
Use the same structure as a capture URL, but replace the date code and modifier suffix with an asterisk: https://web.archive.org/web/*/http://www.teagames.com/games/crazygolf2.
This will show a calendar with colored circles for each capture. Hover over a circle and click a time code to access a capture.
- A blue circle means the file was found at the requested URL (200 OK). This is generally what you want.
- A green circle means there was a redirect (301 or 302).
- A yellow circle means the URL was not found (4xx).
- A red circle means there was a server error on the original website (5xx).
Now let's say you want to look at all the SWFs in TeaGames's swf directory.
As before, use an asterisk instead of a date code in the capture URL. But now, you also need to put an asterisk at the end of the original URL portion. Here is what you will get: http://web.archive.org/web/*/https://teagames.com/swf/*
Use the "filter results" box on the top-right to search for specific filenames or URLs. You can also search for specific MIME types, such as application/x-shockwave-flash.
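Both Quick Search URLs above follow one pattern: the date code is swapped for an asterisk, and a trailing asterisk on the original URL turns an exact search into a prefix search. A minimal Python sketch of that pattern (the `quick_search_url` name is hypothetical):

```python
def quick_search_url(original: str, prefix: bool = False) -> str:
    """Build a Quick Search URL by putting an asterisk where the date code goes.

    With prefix=True, a trailing asterisk is added to the original URL so that
    every archived URL starting with it is listed (for example a whole folder).
    """
    url = "https://web.archive.org/web/*/" + original
    if prefix and not original.endswith("*"):
        url += "*"
    return url


if __name__ == "__main__":
    print(quick_search_url("http://www.teagames.com/games/crazygolf2"))
    print(quick_search_url("https://teagames.com/swf/", prefix=True))
```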
CDX Search
This section goes over some basic examples of searches with Wayback's CDX API. The full documentation is here.
This search will list all captures of URLs starting with www.bbc.co.uk/cbbc/games/:
http://web.archive.org/cdx/search?url=www.bbc.co.uk/cbbc/games/&matchType=prefix&collapse=urlkey&filter=!statuscode%3A[45]&fl=original
This search will list all captures with status 200 OK from the domain www.happyfeet-game.com. These portions of the URL accomplish that: matchType=domain and statuscode%3A200.
https://web.archive.org/cdx/search?url=www.happyfeet-game.com&matchType=domain&collapse=urlkey&filter=statuscode%3A200&fl=original
You can also add a date range to a CDX search. For example, adding &from=2006&to=2008 to the previous search will return only captures from 2006 to 2008:
https://web.archive.org/cdx/search?url=www.happyfeet-game.com/&matchType=domain&collapse=urlkey&filter=statuscode%3A200&fl=original&from=2006&to=2008
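The CDX queries above differ only in a few parameters, so they can be composed from a list of key/value pairs. In this Python sketch, `cdx_query_url` is a hypothetical helper. It assumes the /cdx/search/cdx endpoint used in the superdudes.net example (the other examples use /cdx/search), and it relies on urlencode percent-encoding characters such as ":" the same way the hand-written URLs above do (statuscode%3A200). Filter strings are passed through unchanged and are not validated.

```python
from urllib.parse import urlencode

# Assumed endpoint; the examples on this page use both /cdx/search and
# /cdx/search/cdx.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def cdx_query_url(url, match_type="prefix", filters=(), fl="original",
                  collapse="urlkey", date_from=None, date_to=None):
    """Compose a CDX search URL from the parameters used in this section."""
    params = [("url", url), ("matchType", match_type)]
    if collapse:
        params.append(("collapse", collapse))
    # Each filter is "field:regex", optionally prefixed with "!" to exclude.
    params.extend(("filter", f) for f in filters)
    params.append(("fl", fl))
    if date_from:
        params.append(("from", date_from))
    if date_to:
        params.append(("to", date_to))
    return CDX_ENDPOINT + "?" + urlencode(params)


if __name__ == "__main__":
    # The Happy Feet search with a 2006-2008 date range.
    print(cdx_query_url("www.happyfeet-game.com", match_type="domain",
                        filters=["statuscode:200"],
                        date_from="2006", date_to="2008"))
```

urlencode also encodes "/" inside the url parameter (as %2F), which the server decodes back to the same value.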
Finally, this search will list all captures of SWF, DCR, and CCT files from URLs starting with superdudes.net:
https://web.archive.org/cdx/search/cdx?url=superdudes.net&matchType=prefix&filter=original:.*(\.swf|\.dcr|\.cct).*&fl=original&collapse=urlkey
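The superdudes.net filter is an ordinary regular expression applied to the original URL. To preview what it matches, or to narrow down a CDX list you have already saved, the same pattern can be applied locally. A Python sketch (the `filter_asset_urls` name is hypothetical): because of the surrounding .*, an extension anywhere in the URL matches, including in a query string, and in this sketch the match is case-sensitive, so ".SWF" is skipped.

```python
import re

# Same regular expression as the filter parameter in the superdudes.net search.
ASSET_RE = re.compile(r".*(\.swf|\.dcr|\.cct).*")


def filter_asset_urls(original_urls):
    """Keep URLs matching ASSET_RE, dropping exact duplicates (order kept).

    This only approximates collapse=urlkey, which collapses on Wayback's
    normalized URL key rather than on the exact URL text.
    """
    seen = set()
    kept = []
    for url in original_urls:
        url = url.strip()
        if url and ASSET_RE.fullmatch(url) and url not in seen:
            seen.add(url)
            kept.append(url)
    return kept


if __name__ == "__main__":
    sample = [
        "http://superdudes.net/games/a.swf",
        "http://superdudes.net/index.html",
        "http://superdudes.net/load.php?file=b.dcr",
    ]
    print(filter_asset_urls(sample))
```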
Using Wayback Download Mode
Wayback Download Mode is a Flashpoint mode that downloads files directly from Wayback Machine, making it easier to curate titles archived there.
Setting Up Wayback Download Mode
Wayback Download Mode is currently part of the Development Branch of version 13. It is enabled in the third tab of the Manager.
- Open Flashpoint and switch to the Config tab.
- Click the "Curate Server" dropdown menu and switch to "Wayback Download Mode."
- Click "Save and Restart." Curations in the Curate tab will now use Wayback Download Mode when you launch them.
Using Wayback Download Mode
- In Flashpoint, switch to the Curate tab, then click New Curation. For more information about using the Curate tab, see Curation Tutorial.
- Enter the title and platform of the game, and any other metadata you wish. For the Launch Command, paste the original URL that you want to download from Wayback.
  - Note that if the original URL uses HTTPS, you will need to replace https with http.
- For the Application Path, click the dropdown and choose the appropriate application in Flashpoint for the type of game you are curating.
- Launch the game. The game should load in the application you specified. Watch the Logs tab of Flashpoint Launcher as you play the game.
- Once all of the assets seem to have loaded, navigate to Flashpoint's Legacy\htdocs folder and sort the contents by Date Modified. Determine which files and folders belong to the game you just downloaded.
- Copy the files and folders from the htdocs folder to your curation's content folder, retaining the same structure. Refer to the Curation Tutorial.
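The last two steps can also be scripted. The Python sketch below assumes that files written by Wayback Download Mode get their modification time from the moment they are downloaded, which is what sorting by Date Modified relies on. `copy_recent_files` is a hypothetical helper that copies every file under htdocs modified at or after a given time into a content folder, keeping the same relative structure.

```python
import shutil
from pathlib import Path


def copy_recent_files(htdocs: Path, content: Path, since: float) -> list[Path]:
    """Copy files under htdocs modified at or after `since` into content.

    Relative paths are preserved, so htdocs/www.example.com/game.swf becomes
    content/www.example.com/game.swf. Returns the copied relative paths.
    """
    copied = []
    for path in sorted(htdocs.rglob("*")):
        if path.is_file() and path.stat().st_mtime >= since:
            relative = path.relative_to(htdocs)
            target = content / relative
            target.parent.mkdir(parents=True, exist_ok=True)
            shutil.copy2(path, target)  # copy2 also keeps the timestamps
            copied.append(relative)
    return copied
```

For example, passing Path(r"C:\Flashpoint\Legacy\htdocs"), your curation's content folder, and time.time() - 1800 would copy everything modified in the last 30 minutes (these paths are hypothetical). Review the returned list before finishing the curation, since other titles launched during the same period also write to htdocs.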
Turning Off Wayback Download Mode
After you're done downloading from Wayback, you will need to turn Wayback Download Mode off to use the Curate tab normally. To do this, click the "Config" tab of Flashpoint Launcher, then switch back to "Apache Webserver" using the "Curate Server" dropdown menu.
Using cURLsDownloader
You can also download files from Wayback Machine using a combination of CDX Search and cURLsDownloader by following the steps below.
- Use CDX Search to obtain a list of original URLs to download from Wayback Machine.
- Copy and paste these URLs into a text editor such as Notepad++ or Sublime Text.
- Now you need to turn these original URLs into capture URLs. First, determine a suitable date code.
  - For example, if the captures you want are from around January 2014, 201401 is a good date code.
- Compose a capture URL prefix using the Wayback Machine base URL, your date code, and the id_ suffix.
  - An example capture URL prefix would be https://web.archive.org/web/201401id_/.
- Use your text editor's Macros or Find & Replace functionality to insert your capture URL prefix before each original URL in the list.
  - An example capture URL would be https://web.archive.org/web/201401id_/http://example.com/.
- That's it! Save the text file, then refer to the Curation Tutorial and the cURLsDownloader Manual to complete your curation.
- After the files are finished downloading, cURLsDownloader will give you the option to automatically move the downloaded files to their original URLs. Be sure to take advantage of this option!
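The Find & Replace step can also be done with a short script. This Python sketch (the `convert_url_list` name is hypothetical) takes the text of your list of original URLs and a date code, and returns the list with the capture URL prefix (base URL, date code, id_) added to each non-empty line:

```python
def convert_url_list(text: str, date_code: str) -> str:
    """Turn a newline-separated list of original URLs into id_ capture URLs."""
    if not date_code.isdigit():
        raise ValueError(f"not a date code: {date_code!r}")
    prefix = f"https://web.archive.org/web/{date_code}id_/"
    # Skip blank lines and surrounding whitespace left over from copy/paste.
    lines = [prefix + line.strip() for line in text.splitlines() if line.strip()]
    return "\n".join(lines) + "\n" if lines else ""
```

For example, convert_url_list("http://example.com/", "201401") returns https://web.archive.org/web/201401id_/http://example.com/ followed by a newline. Save the returned text as your list file, as in the last step above.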
Using Wayback Machine Downloader
Wayback Machine Downloader is a command line tool that can automatically download whole websites from the Wayback Machine. You can also use different filters to customize exactly which files are downloaded. See its GitHub page for more information.