http://www.searchlores.org
http://www.fravia.org
  
Milan Linux day

Searching, combing, klebing, luring, hacking




Accessing info
I hope you are not bored yet. The different "areas" we have examined above must be accessed using different techniques.
I will try to explain some of them to you. This will concentrate on the fundamental issues, since I could otherwise speak for hours just explaining the differences among the main search engines. If you are really interested, the links I offer will bring you far away inside our jungle.

In order to search effectively you must understand various techniques and apply them accordingly.
  1. Searching: Searching yourself, using main, regional and local search engines
  2. Combing: Searching those that have searched and are willing to share
  3. klebing, luring, stalking: Searching those that have searched and are not willing to share
  4. guessing and hacking: entering servers and databases you are not supposed to enter

Searching
In order to search effectively you must first of all understand the basic search approaches.
The web is full of search 'tutorials' and self-proclaimed search experts. Yet you'll soon realize (if you are - as you should be - able to evaluate the results of your queries) that most of the information is 'hollow'. Have a look at a famous resource, Chris Sherman's http://www.websearch.about.com, for instance. The 'best links' and 'outstanding tutorials' he refers to are small and simple collections of elementary notions by Danny Sullivan (http://www.searchenginewatch.com) and by the Libraries of the University at Albany (http://www.albany.edu/library/internet/boolean.html).
There are sound (if evil) economic reasons for that: the lore of searching the web is nowadays a very valuable gift. To make just one example: people earn a LOT of money placing sites in the best positions on the search engines (which is relatively easy once you reverse the pool of algos used by the main engines). Therefore the real, working (and in-depth) searching tips are mostly offered by universities, educational sites and librarians.
But there are (some) other possibilities to gain this knowledge. Apart from the relative help that people who still adhere to the spirit of the "web of old" can offer (give to be given), there are also valuable messageboards (where the search engine spammers exchange their dirty tricks, for instance: http://www.searchengineforums.com/bin/Ultimate.cgi and http://www.webmasterworld.com/), and some documents about the 'guts' of the search engines (see for instance, on my site, altavista in depth and some oddities @ raging). Last, but by all means NOT least, you also have the possibility to build your own search bots.
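To give you an idea of how little is needed, here is a minimal search-bot sketch in Python. The engine URL and its "q" query parameter are hypothetical placeholders (adapt them to whatever engine you have actually studied), and a real bot would of course respect the engine's robots.txt and throttle itself.

# Minimal search-bot sketch (Python). The engine URL and the "q"
# parameter are placeholders, not any real engine's interface.
import urllib.parse
import urllib.request
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href attribute of every anchor tag in a result page."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def search(query, engine="http://searchengine.example.com/search"):
    url = engine + "?" + urllib.parse.urlencode({"q": query})
    request = urllib.request.Request(url, headers={"User-Agent": "my-searchbot/0.1"})
    with urllib.request.urlopen(request) as response:
        page = response.read().decode("utf-8", errors="replace")
    parser = LinkExtractor()
    parser.feed(page)
    return parser.links

if __name__ == "__main__":
    for link in search("+searching +lore -spam"):
        print(link)

Once you have the raw result links you can filter, rank and re-query them automatically, which is exactly where a home-made bot starts beating the engines' own interfaces.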
The main search engines, even if they are spammed, even if they sell the best (top 10) positions and even if they take, of course, note of every activity and search you perform (sniffing is the very reason someone creates and gives you 'free' search engines, duh) can nevertheless be used for 'zeroing in' purposes.
As a simple rule of thumb, it goes without saying that you should also learn how to evaluate the results you get and ditch without mercy all the noise you'll have to wade through to reach your signal.

Combing
A very effective technique and approach: you search those that have already searched
Since combing includes a lot of techniques, I'll go back to it when examining the local and regional search engines and usenet resources. If you want a short description: you lurk around usenet, mailing lists and messageboards trying to find some authorities in the matters you are seeking. Then you use the cumulated knowledge of these savvy people to jumpstart your search. Some social engineering, stalking and klebing capacities are also required.
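The "find the authorities" part can be done half-automatically. A small sketch, assuming you have saved a mailing-list or newsgroup archive locally in mbox format (the file name and the topic string below are placeholders): it simply counts who keeps posting on the subject you are researching.

# Combing sketch (Python): scan a locally saved mbox archive and count
# who posts most often on a given topic. The archive path is a placeholder.
import mailbox
from collections import Counter

def find_authorities(archive_path, topic, top=10):
    authors = Counter()
    for message in mailbox.mbox(archive_path):
        subject = message["subject"] or ""
        if topic.lower() in subject.lower():
            authors[message["from"] or "unknown"] += 1
    return authors.most_common(top)

if __name__ == "__main__":
    for author, count in find_authorities("list-archive.mbox", "search engines"):
        print(f"{count:4d}  {author}")

The most frequent names are your candidate authorities: from there on it is lurking, reading their old posts and, if needed, a little social engineering.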

klebing, luring, stalking
In order to find the outside linkers and the lurkers and leechers' treasures
The 'combing' approach is useful in order to find treasures hoarded by people that are willing to share what they know and have found. Unfortunately, in a more and more 'commercial' web, the number of bastards that 'hide' what they have found is increasing. Most of the juicy hidden sites are in the 'outside linkers' part of the web, with no link pointing back to them.
How do we find them?
The most used method is klebing.
Basically, klebing is using the information found inside the referrer fields of your logs when the target visits your site. In a list of referrers taken from my site logs you can find, for instance, many urls belonging to more or less useful messageboards of the professional search engine spammers, like the url http://www.webmasterworld.com/forum10/112.htm.
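Pulling those referrers out of the raw logs is trivial. A minimal sketch, assuming your web server writes the usual Apache-style "combined" log format; the log path and your own domain below are placeholders.

# Klebing sketch (Python): extract the referrer field from a "combined"
# format access log and count the external pages that link to you.
import re
from collections import Counter
from urllib.parse import urlparse

# In the combined log format the referrer is the second-to-last
# double-quoted field, just before the user-agent string.
LINE = re.compile(r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"\s*$')

def external_referrers(log_path, own_domain="mysite.example.com"):
    referrers = Counter()
    with open(log_path, encoding="utf-8", errors="replace") as log:
        for line in log:
            match = LINE.search(line)
            if not match:
                continue
            ref = match.group("referrer")
            if ref in ("", "-"):
                continue
            if urlparse(ref).netloc.endswith(own_domain):
                continue            # ignore clicks coming from your own pages
            referrers[ref] += 1
    return referrers

if __name__ == "__main__":
    for ref, hits in external_referrers("access.log").most_common(20):
        print(f"{hits:5d}  {ref}")

Every url that comes out of this and that you have never seen before is a potential outside linker worth visiting.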
The point is that very few people use proxies when surfing, and even fewer can resist the immediate temptation to click on a link they find on a page or receive by email, especially if it carries a juicy promise.
The trick is to lure your target onto an interesting page of yours and keep updating it until he lands there COMING FROM HIS OUTSIDE LINKER SITE. Alternatively, a common trick is to caper a public email account of your target, using common luring tricks, stalking/guessing techniques or bruteforcing it.
Once you have done it, read his private stuff until you know where his servers are located better than he does.
Of course this could be slightly illegal, but klebing is an aggressive technique and there are no fixed rules on the web anyway.
If you just wonder how you can caper an email account, and would like to try it out for knowledge purposes, you may try this technique on one of your own accounts.

guessing and hacking
The art of getting to the roof.


There are many ways to find info on a target website:
- Browse their FTP site looking for hidden directories
- Browse their FTP site looking for stuff out in the open that they have forgotten about
- Use a FrontPage attack (there are many)
- Exploit weaknesses in Active Server Pages
- View the source of pages (especially registering and purchasing online pages)
- And my favorite: Guessing

Guessing is an extremely important lore in a web where "nomen est omen". If you find a site with a subdirectory structure something like:
http://www.targetsite.com/juicy/demos/a.zip
I would, if I were you, first of all try to fetch the robots.txt of the site. (We already spoke of the simple technique of checking the robots.txt file, the exclusion listing for automated spiders, in order to find 'unallowed' subdirectories.) There is also an automated "robots.txt" checker (@ The University of Edinburgh) that can do the job, for instance on Luc's mirror of my site:
http://www.dcs.ed.ac.uk/cgi/sxw/parserobots.pl?site=http%3A%2F%2Fwww.2113.ch%2Frobots.txt
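You don't even need the external checker, by the way: a few lines fetch and read the file yourself. A minimal sketch; the target hostname is a placeholder.

# robots.txt sketch (Python): fetch the spider exclusion file of a site
# and print the "disallowed" paths, which are often exactly the
# subdirectories worth a closer look. Hostname is a placeholder.
import urllib.request

def disallowed_paths(site="http://www.targetsite.com"):
    with urllib.request.urlopen(site.rstrip("/") + "/robots.txt") as response:
        text = response.read().decode("utf-8", errors="replace")
    paths = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()        # drop comments
        if line.lower().startswith("disallow:"):
            path = line.split(":", 1)[1].strip()
            if path:
                paths.append(path)
    return paths

if __name__ == "__main__":
    for path in disallowed_paths():
        print(path)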
Anyway, failing to fetch it, I would try to guess:
http://www.targetsite.com/juicy/downloads/a.zip
The key to guessing is research. Look around at their website and see what they name things and where they put things. Look at pictures and links and downloads.
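The guessing step itself can also be automated, within reason. A small sketch, assuming the candidate directory names (which you would build from the naming habits you actually observed on the site) and the target host and filename are placeholders: it probes each guess with a HEAD request and reports which ones answer.

# Guessing sketch (Python): probe plausible directory names with HEAD
# requests. Host, filename and word list are placeholders.
import urllib.error
import urllib.request

CANDIDATES = ["demos", "downloads", "download", "files", "pub", "old", "beta"]

def probe(base="http://www.targetsite.com/juicy", filename="a.zip"):
    found = []
    for name in CANDIDATES:
        url = f"{base}/{name}/{filename}"
        request = urllib.request.Request(url, method="HEAD")
        try:
            with urllib.request.urlopen(request) as response:
                found.append((url, response.status))
        except urllib.error.HTTPError as err:
            if err.code != 404:
                found.append((url, err.code))   # a 403 still hints the path exists
        except urllib.error.URLError:
            pass                                # host unreachable, skip
    return found

if __name__ == "__main__":
    for url, status in probe():
        print(status, url)

Keep the word list short and the pace slow: the point is educated guessing, not hammering the server.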

Let's, moreover, not forget how useful our holy software reversing skills will be each time we decide to use some of the many tools that the Web offers to track down our targets (tools that are unfortunately at times crippled or simply too short-lived :-)

But even simple searching is not all that simple: in order to search effectively you must first of all understand some of the most important differences between the main search engines.
Proceed to The main search engines, differences & tricks


(c) 2000: [fravia+], all rights reserved