While gathering information related to an ongoing project, I started to study and slightly reverse WebFerret, which seemed an interesting source of information and ideas for the above-mentioned project. Although that project wasn't aimed at improving WebFerret, I thought the discoveries I made could be worth an essay of their own.
What actually interests me is how WebFerret builds the queries and parses the results of the different search engines it supports. Given that WebFerret is software that runs locally on your machine, and given that search engines often do (or at least may) change their page layouts, there must be a way for WebFerret to keep up to date with the latest specifications. I can't imagine having to download a new version each time a slight change affects just one single search engine. This made me think that the results-page parsing algorithm cannot simply be 'hardcoded' in WebFerret.
A quick check of WebFerret's options shows that it has built-in support for proxies. That's a very interesting idea. Let's launch our favorite local proxy software (I had proxyplus at hand), tell WebFerret to connect via localhost:4480, run a simple WebFerret search session and ... oh oh, what's that? Here is the proxyplus log file:
01/28/2001:18:05:57 127.0.0.1 - HTTP "GET http://www.euroseek.net:80/query?iflang=uk&query=fravia&domain=world&lang=world&style=ferret HTTP/1.0" 200 254 254/256 MISS 212.209.54.40 D
01/28/2001:18:05:57 127.0.0.1 - HTTP "GET http://www.search.com:80/search?ferret=1&q=fravia HTTP/1.0" 200 279 279/177 MISS 216.200.247.146 D
01/28/2001:18:05:58 127.0.0.1 - HTTP "GET http://www.altavista.com:80/cgi-bin/query?pg=aq&stype=stext&Translate=on&q=fravia&r=fravia&stq=10 HTTP/1.0" 200 299 299/225 MISS 209.73.180.3 D
01/28/2001:18:05:58 127.0.0.1 - HTTP "GET http://findwhat.com:80/bin/findwhat.dll?getresults&mt=fravia&dc=40&aff_id=7114 HTTP/1.0" 200 128 128/206 MISS 216.216.246.30 D
01/28/2001:18:05:58 127.0.0.1 - HTTP "GET http://www.hotbot.com:80/?MT=fravia&SM=B&DV=0&LG=any&DC=50&DE=2&_v=2&OPs=MDRTP HTTP/1.0" 200 282 282/206 MISS 209.185.151.128 D
01/28/2001:18:05:58 127.0.0.1 - HTTP "GET http://search.excite.com:80/search.gw?s=fravia&c=web&start=0&showSummary=true HTTP/1.0" 200 297 297/205 MISS 199.172.148.11 D
01/28/2001:18:05:58 127.0.0.1 - HTTP "GET http://northernlight.com:80/nlquery.fcg?cb=0&qr=fravia&orl=2:1 HTTP/1.0" 200 544 544/231 MISS 216.34.102.230 D
01/28/2001:18:05:59 127.0.0.1 - HTTP "GET http://search.msn.com:80/results.asp?q=fravia HTTP/1.0" 200 184 184/173 MISS 207.46.185.99 D
01/28/2001:18:05:59 127.0.0.1 - HTTP "GET http://search.aol.com:80/dirsearch.adp?query=fravia&start=web HTTP/1.0" 200 208 208/189 MISS 205.188.180.25 D
01/28/2001:18:05:59 127.0.0.1 - HTTP "GET http://val.looksmart.com:80/r_search?comefrom=izu-val&look=x&isp=zu&key=fravia&search=0 HTTP/1.0" 200 244 244/215 MISS 207.138.42.25 D
01/28/2001:18:05:59 127.0.0.1 - HTTP "GET http://wwwp.goto.com:80/d/search/p/cnet/xml/?Keywords=fravia&maxCount=40 HTTP/1.0" 200 138 138/200 MISS 206.132.152.249 D
01/28/2001:18:06:00 127.0.0.1 - HTTP "GET http://search.icq.com:80/dirsearch.adp?query=fravia&wh=web&bm=0 HTTP/1.0" 200 208 208/191 MISS 205.188.180.249 D
01/28/2001:18:06:01 127.0.0.1 - HTTP "POST http://vorlon.ferretsoft.com:80/update HTTP/1.0" 200 291 127/291 MISS 206.103.246.239 D
01/28/2001:18:06:01 127.0.0.1 - HTTP "POST http://vorlon.ferretsoft.com:80/update HTTP/1.0" 200 291 2798/291 MISS 206.103.246.239 D
01/28/2001:18:06:02 127.0.0.1 - HTTP "GET http://findwhat.com:80/bin/findwhat.dll?getresults&mt=fravia&dc=40&aff_id=7114 HTTP/1.0" 200 281 281/206 MISS 216.216.246.30 D
01/28/2001:18:06:02 127.0.0.1 - HTTP "GET http://www.hotbot.com:80/?MT=fravia&SM=B&DV=0&LG=any&DC=50&DE=2&_v=2&OPs=MDRTP HTTP/1.0" 200 480 480/206 MISS 209.185.151.128 D
01/28/2001:18:06:03 127.0.0.1 - HTTP "GET http://northernlight.com:80/nlquery.fcg?cb=0&qr=fravia&orl=2:1 HTTP/1.0" 200 1026 1026/231 MISS 216.34.102.230 D
01/28/2001:18:06:03 127.0.0.1 - HTTP "GET http://www.search.com:80/search?ferret=1&q=fravia HTTP/1.0" 200 958 958/177 MISS 216.200.247.146 D
01/28/2001:18:06:03 127.0.0.1 - HTTP "GET http://val.looksmart.com:80/r_search?comefrom=izu-val&look=x&isp=zu&key=fravia&search=0 HTTP/1.0" 200 321 321/215 MISS 207.138.42.25 D
01/28/2001:18:06:04 127.0.0.1 - HTTP "GET http://wwwp.goto.com:80/d/search/p/cnet/xml/?Keywords=fravia&maxCount=40 HTTP/1.0" 200 293 293/200 MISS 206.132.152.249 D
01/28/2001:18:06:04 127.0.0.1 - HTTP "GET http://www.euroseek.net:80/query?iflang=uk&query=fravia&domain=world&lang=world&style=ferret HTTP/1.0" 200 3967 3967/256 MISS 212.209.54.40 D
01/28/2001:18:06:05 127.0.0.1 - HTTP "GET http://bcs.zdnet.com:80/ads/ferret-ad?RGROUP=504/BRAND=637/QT=%3Afravia HTTP/1.0" 200 388 388/331 MISS 205.181.112.84 D
01/28/2001:18:06:06 127.0.0.1 - HTTP "GET http://search.excite.com:80/search.gw?s=fravia&c=web&start=0&showSummary=true HTTP/1.0" 200 6729 6729/205 MISS 199.172.148.11 D
01/28/2001:18:06:06 127.0.0.1 - HTTP "GET http://www.webcrawler.com:80/cgi-bin/WebQuery?search=fravia&showSummary=true&src=wc_results HTTP/1.0" 200 351 351/327 MISS 198.3.99.101 D
01/28/2001:18:06:06 127.0.0.1 - HTTP "GET http://search.aol.com:80/dirsearch.adp?query=fravia&start=web HTTP/1.0" 200 4400 4400/189 MISS 205.188.180.25 D
01/28/2001:18:06:07 127.0.0.1 - HTTP "GET http://www.altavista.com:80/cgi-bin/query?pg=aq&kl=XX&r=fravia&search=Search&q=fravia&d0=&d1= HTTP/1.0" 200 5165 5165/288 MISS 209.73.180.3 D
01/28/2001:18:06:08 127.0.0.1 - HTTP "GET http://search.icq.com:80/dirsearch.adp?query=fravia&wh=web&bm=0 HTTP/1.0" 200 2888 2888/191 MISS 205.188.180.249 D
01/28/2001:18:06:08 127.0.0.1 - HTTP "GET http://search.msn.com:80/results.asp?q=fravia HTTP/1.0" 200 4733 4733/173 MISS 207.46.185.99 D
01/28/2001:18:06:09 127.0.0.1 - HTTP "GET http://www.euroseek.net:80/query?iflang=uk&query=fravia&domain=world&lang=world&style=ferret&of=10 HTTP/1.0" 200 254 254/262 MISS 212.209.54.40 D
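For a log this short, spotting the odd one out by eye is easy; for longer sessions, a few lines of Python can group the requests by host and make the POSTs to vorlon.ferretsoft.com stand out. This is a minimal sketch: the regular expression assumes the field layout of the proxyplus lines above (timestamp, client address, '-', protocol, quoted request line, status code), which I am inferring from this log rather than from any proxyplus documentation, and `requests_by_host` is a name made up for the example.

```python
import re
from collections import defaultdict
from urllib.parse import urlsplit

# Layout inferred from the log above (an assumption, not a documented format):
# timestamp client - protocol "METHOD URL HTTP/x.y" status ...
LOG_LINE = re.compile(
    r'^(?P<time>\S+) (?P<client>\S+) - (?P<proto>\S+) '
    r'"(?P<method>[A-Z]+) (?P<url>\S+) HTTP/[\d.]+" (?P<status>\d{3})'
)


def requests_by_host(lines):
    """Group (method, url) pairs by target host; non-matching lines are skipped."""
    by_host = defaultdict(list)
    for line in lines:
        m = LOG_LINE.match(line)
        if m is None:
            continue  # not a request line
        host = urlsplit(m["url"]).hostname
        by_host[host].append((m["method"], m["url"]))
    return dict(by_host)
```

Feeding it the log above and keeping only the hosts that received a POST leaves vorlon.ferretsoft.com as the only candidate.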
POST /update HTTP/1.0
Content-type: application/x-www-form-urlencoded
Content-length: 96
Pragma: no-cache
Accept: */*
Host: vorlon.ferretsoft.com
X-Forwarded-For: 127.0.0.1
Via: 1.0 Proxy+ (v2.30 http://www.proxyplus.cz)

SASF FerretSoft YourName YourCountry YourCompany

Here comes the first discovery: WebFerret implements a malicious 'phone home' feature (cf. the "malwares" lab). It sends your name, country and company back home. I call it malicious because this isn't needed at all!!
HTTP/1.0 200 OK                  <-- 200 OK, hehe, we could fake it!
Content-Length: 2672             <-- quite a lot of info here
Expires: Thu, 01 Dec 1994 16:00:00 GMT
Content-Type: image/gif          <-- uh? a gif?
Pragma: no-cache

SASF REGPATCH1.0000              <-- this + what's below clearly shows this is a registry patch file
[Web]
"RegistryVersion"=number:120
"InstalledEngines"=strings:\
"AltaVista",\
"AOLNetFind",\
"Anzwers",\
"CNET",\
"EuroSeek",\
"Excite",\
"FindWhat",\
"GOTO",\
"HotBot",\
"ICQ",\
"LookSmart",\
"LycosUSA",\
"MSN",\
"SearchUK",\
"WebCrawler"
"ActiveEngines"=numbers:\
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
"NorthName"=
"NorthHome"=
"NorthURL"=
"NorthMethod"=
"NorthQueryType"=
"NorthQueryOps"=
"NorthQueryCloseness"=
"NorthQueryCommand"=
"NorthGrammar"=
"SearchDelay"=number:3000
"ExciteQueryCommand"=string:\
"#0; >xx; <urlcloseness; sx~[<null~;>urlname~|3~]; $WebFerret; >httpUser-Agent; $search=; <+urlquerytext; $+&c=web&start=0&showSummary=true&perPage=50; >urlquery"
"ExciteGrammar"=strings:\
"R:<li>*.<a href=*.('http://[eh; tb; >url|*.]')*.\">[eh; tb; >title|*.]</a>*.size8>[eh; tb; >abstract|*.]</span>"
"FindWhatQueryCommand"=string:\
"<urlcloseness; sx~[<null~;>urlname~|3~]; $WebFerret; >httpUser-Agent; <urlquerytext; ?,:%2C; >urlquerytext; $getresults&mt=; <+urlquerytext; $+&dc=40&aff_id=7114; >urlquery"
"GOTOQueryCommand"=string:\
"<urlcloseness; sx~[<null~;>urlname~|3~]; $WebFerret; >httpUser-Agent; <urlquerytext; ?,:%2C; >urlquerytext; $Keywords=; <+urlquerytext; $+&maxCount=40; >urlquery"
"SearchUKName"=string:"SearchUK"
"SearchUKHome"=string:"http://www.searchuk.com/"
"SearchUKURL"=string:"http://uk.searchengine.com/cgi-bin/search"
......
......

I didn't paste the whole answer because it would make this essay unreadable. For those interested (and you had better be, if you're gonna build your bots on this :-) the whole reply is available here. You'd better download that file and view it with a good editor, because your browser probably won't render it correctly.
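To poke at that reply programmatically, you can turn it into nested dictionaries. The sketch below is a hypothetical reader written only from the excerpt above: it assumes that a trailing backslash continues a value on the next line, that the type prefixes number:, numbers:, string: and strings: mean what they look like, and that \" inside a quoted string stands for a literal quote. Anything it does not recognise (the SASF REGPATCH1.0000 header line, the '......' elisions) is simply skipped.

```python
import re

QUOTED = re.compile(r'"((?:[^"\\]|\\.)*)"')  # one "..." string; \" allowed inside
ENTRY = re.compile(r'^"([^"]+)"=(.*)$')        # "Name"=rest of the logical line


def _quoted_strings(payload):
    # Pull every quoted string out of the payload and unescape \" to ".
    return [s.replace('\\"', '"') for s in QUOTED.findall(payload)]


def parse_regpatch(text):
    """Hypothetical reader for an SASF REGPATCH reply: {section: {name: value}}."""
    # 1. A trailing backslash continues the value on the next physical line.
    logical, pending = [], ""
    for raw in text.splitlines():
        line = raw.strip()
        if line.endswith("\\"):
            pending += line[:-1]
        else:
            logical.append(pending + line)
            pending = ""
    if pending:
        logical.append(pending)

    # 2. Decode [Section] headers and "Name"=type:value entries.
    sections, current = {}, None
    for line in logical:
        if line.startswith("[") and line.endswith("]"):
            current = sections.setdefault(line[1:-1], {})
            continue
        m = ENTRY.match(line)
        if m is None or current is None:
            continue  # header line, "......" elisions, blank lines
        name, rhs = m.groups()
        kind, _, payload = rhs.partition(":")
        if kind == "number":
            value = int(payload)
        elif kind == "numbers":
            value = [int(n) for n in payload.split(",")]
        elif kind == "string":
            value = _quoted_strings(payload)[0]
        elif kind == "strings":
            value = _quoted_strings(payload)
        else:
            value = None  # empty entries such as "NorthName"=
        current[name] = value
    return sections
```

With the full reply, parse_regpatch(reply)["Web"]["ExciteGrammar"] would give the grammar strings ready for experimenting.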
OK, you've certainly guessed it by now: the whole bazaar is stored in the Windows registry. A quick search for 'ExciteGrammar' in the registry confirms it.
So, what's left? Well, I mentioned above some binary data being sent along with your private details to the vorlon server. It quickly becomes apparent (especially when you compare that POST request with one sent by an old version, 3.0200, of WebFerret) that the version number, revision and patch level are included at offsets FE, FF/100 and 109 respectively in those files. This allows the /update script to send back only the updates your current version of WebFerret needs. And this, unlike your name, company and country, isn't malicious at all, quite the contrary.
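If you want to check those offsets against your own captures, the comparison and the extraction can be sketched as follows. Only the offsets come from my comparison; the field widths (one byte each for version and patch level, two bytes for the revision at FF/100), the little-endian order of the revision bytes, and counting the offsets from the start of the captured request are assumptions. Both `differing_offsets` and `read_version_fields` are hypothetical helpers.

```python
def differing_offsets(a, b):
    """Offsets at which two captured requests differ (over their common length)."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


def read_version_fields(dump):
    """Extract version, revision and patch level from a captured /update request.

    Offsets FE, FF/100 and 109 are from the essay; widths and byte order are assumed.
    """
    if len(dump) <= 0x109:
        raise ValueError("capture too short to contain offset 0x109")
    return {
        "version": dump[0xFE],                                   # 1 byte (assumed)
        "revision": int.from_bytes(dump[0xFF:0x101], "little"),  # 2 bytes (assumed LE)
        "patch_level": dump[0x109],                              # 1 byte (assumed)
    }
```

Running differing_offsets on a capture from each WebFerret version is the scripted equivalent of comparing the two dumps side by side in a hex editor.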
Well, this is exactly what I was looking for. In the registry I can find all the information WebFerret uses to build a URL query and to parse the results for each search engine it supports. At first sight, it seems to use a mix of regular expressions and embedded scripts. For example, take this: <a href=*.('http://[eh; tb; >url|*.]')*.\">. What this does seems clear: match the results page against <a href=*.('http://[*.]')*."> and then assign the content of [ ] to a variable named url (>url), after some unknown 'eh; tb;' operations.
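To make that reading concrete, here is a hypothetical translation of such a grammar rule into a Python regular expression. It encodes only my interpretation: *. is treated as a non-greedy 'anything' (.*?), [actions|pattern] becomes a capturing group named after the >variable among the actions, the unknown eh and tb actions are ignored, a backslash makes the next character literal, a leading R:/S:/N: tag is stripped (its meaning is not decoded here), and everything else, parentheses and quotes included, is matched literally. The sample HTML in the test is made up.

```python
import re


def _translate(rule):
    """Turn WebFerret grammar syntax into regex source (my interpretation)."""
    out, i = [], 0
    while i < len(rule):
        if rule.startswith("*.", i):              # wildcard: shortest run of anything
            out.append(".*?")
            i += 2
        elif rule[i] == "\\" and i + 1 < len(rule):  # \x means a literal x
            out.append(re.escape(rule[i + 1]))
            i += 2
        elif rule[i] == "[":                        # [actions|pattern] -> capture
            end = rule.index("]", i)
            actions, _, pattern = rule[i + 1:end].rpartition("|")
            names = [a.strip()[1:] for a in actions.split(";")
                     if a.strip().startswith(">")]  # keep >var, ignore eh/tb
            inner = _translate(pattern)
            out.append(f"(?P<{names[0]}>{inner})" if names else f"(?:{inner})")
            i = end + 1
        else:                                       # anything else is literal
            out.append(re.escape(rule[i]))
            i += 1
    return "".join(out)


def compile_rule(rule):
    """Compile one grammar rule such as the ExciteGrammar 'R:' string."""
    if len(rule) > 1 and rule[1] == ":":            # strip the R:/S:/N: tag
        rule = rule[2:]
    return re.compile(_translate(rule), re.DOTALL)
```

Note that with this reading the 'http://' prefix sits outside the brackets, so the url variable captures only what follows it.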
I'll skip my experiments (they were quite boring, much more so than what you are reading now, which is already fairly boring) and deliver my findings on a silver plate, using the ExciteQueryCommand above as an example:
Command                                          | Effect
$search=;                                        | value <= 'search='
<+urlquerytext;                                  | append the value of the variable urlquerytext to value
                                                 | (if urlquerytext='fravia', value becomes 'search=fravia')
$+&c=web&start=0&showSummary=true&perPage=50;    | append the given string to value
                                                 | (value is now 'search=fravia&c=web&start=0&showSummary=true&perPage=50')
>urlquery                                        | assign the current value to a variable named 'urlquery'
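The same decoding can be written down as a tiny interpreter. This is a sketch of my interpretation, not WebFerret's code: it implements only $ (set), $+ (append a literal), <+ (append a variable) and > (store), plus <, which I take to mean 'load a variable' judging from the FindWhat and GOTO commands. Everything else (#0, sx~[...], and ?,:%2C, which might be a ',' to '%2C' substitution but that is only a guess) is silently skipped. Statements are split on ';' except where it is written as ~;, as inside sx~[...]. `run_query_command` is a hypothetical name.

```python
import re


def run_query_command(command, variables):
    """Interpret the decoded subset of WebFerret's query-command language.

    Returns a new dict of variables; the caller's dict is left untouched.
    """
    env = dict(variables)
    value = ""
    # ';' separates statements, but '~;' (inside sx~[...]) does not.
    for stmt in re.split(r"(?<!~);", command):
        stmt = stmt.strip()
        if stmt.startswith("$+"):
            value += stmt[2:]                 # append a literal
        elif stmt.startswith("$"):
            value = stmt[1:]                  # set value to a literal
        elif stmt.startswith("<+"):
            value += env.get(stmt[2:], "")    # append a variable
        elif stmt.startswith("<"):
            value = env.get(stmt[1:], "")     # load a variable (assumed meaning)
        elif stmt.startswith(">"):
            env[stmt[1:]] = value             # store value in a variable
        # anything else: meaning unknown, ignored
    return env
```

Run on the FindWhatQueryCommand with urlquerytext='fravia', it yields getresults&mt=fravia&dc=40&aff_id=7114, which matches the findwhat.dll request in the proxy log.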
Although I figured out the meaning of most of the functions and syntax, I'm convinced there are many more juicy things to learn inside WebFerret itself (like functions that are implemented but not yet used by any search engine). Alas! My reversing capabilities don't go that far, and I'm lost in the disassembly (especially when it comes to something written in C++ with classes and so on, which is the case for WebFerret). So, if any of you has already done that work or is going to investigate this further, I would love to hear about it, as this is actually what interests me the most (I suppose you've already guessed what I'm trying to do :-)
OK, this is the second discovery and probably what some of you were looking for: how to add more engines to WebFerret. Well, it should be quite easy if you've followed me up to now: just write a little registry patch file. As an example, we'll add Google to the list of engines supported by WebFerret. Here is what should be added to the registry:
[HKEY_CURRENT_USER\Software\FerretSoft\NetFerret\CurrentVersion\Web]
"InstalledEngines"=strings:\
"AltaVista",\
"AOLNetFind",\
"Anzwers",\
"CNET",\
"EuroSeek",\
"Excite",\
"FindWhat",\
"GOTO",\
"HotBot",\
"ICQ",\
"LookSmart",\
"LycosUSA",\
"MSN",\
"SearchUK",\
"WebCrawler",\
"Google"
"ActiveEngines"=numbers:\
1,1,1,1,1,1,1,1,1,1,1,1,1,1,1,1
"GoogleName"="Google"
"GoogleURL"="http://www.google.com/search"
"GoogleHome"="http://www.google.com/"
"GoogleQueryType"="lip"
"GoogleMethod"=dword:00000000
"GoogleQueryCommand"="$WebFerret; >httpUser-Agent; $q=;<+urlquerytext; $+&lr=&safe=off&sa=N; >urlquery"
"GoogleQueryOps"=strings:" + "," OR "
"GoogleGrammar"=strings:\
"R:<p><A HREF=[>url|*.]>[eh;tb;>title|*.]</A><font size=-1><br>[eh;tb;>abstract|*.]<font color=green>",\
"S:<a href=/search\?[tb; >urlquery|*.]>",\
"N:<b>Next</b>"

First, we have to add our new engine to the list of installed ones (drawback: see below). Next, we define some new Google-specific values: its name, URL, home URL, query type, request method, query command, query operators and finally the parsing grammar. I won't go into details; most of those values are self-explanatory. However, some still have unknown meanings to me. The QueryType, for example, can take values like lip, lpp, sa, sap... but I have no clue what these mean, so some experiments on this would be welcome. The Method indicates whether WebFerret must use a POST (00000001) or a GET (00000000) request.
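Putting the Google values together, the request WebFerret should end up sending can be sketched as follows. This is only an assumption about how the pieces combine: the query terms are joined with one of the GoogleQueryOps strings (taking the first as 'all words' and the second as 'any word' is a guess), URL-encoded into urlquerytext, turned into urlquery the way GoogleQueryCommand does, and then either appended to GoogleURL after a '?' for GET, as the requests in the proxy log suggest, or sent as the body for POST. `build_google_request` is a hypothetical helper.

```python
from urllib.parse import quote_plus

GOOGLE_URL = "http://www.google.com/search"   # "GoogleURL"
GOOGLE_QUERY_OPS = (" + ", " OR ")            # "GoogleQueryOps"; AND/OR order is a guess


def build_google_request(terms, any_word=False, method=0):
    """Return (HTTP method, URL, body) for a Google search.

    method follows "GoogleMethod": 0 = GET, 1 = POST.
    """
    op = GOOGLE_QUERY_OPS[1] if any_word else GOOGLE_QUERY_OPS[0]
    urlquerytext = quote_plus(op.join(terms))  # URL-encoding here is an assumption
    # What "GoogleQueryCommand" builds: $q=; <+urlquerytext; $+&lr=&safe=off&sa=N
    urlquery = "q=" + urlquerytext + "&lr=&safe=off&sa=N"
    if method == 0:
        return ("GET", GOOGLE_URL + "?" + urlquery, None)
    return ("POST", GOOGLE_URL, urlquery)
```

For example, build_google_request(["fravia"]) gives a GET to http://www.google.com/search?q=fravia&lr=&safe=off&sa=N.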
REGEDIT4

[HKEY_CURRENT_USER\Software\FerretSoft\NetFerret\CurrentVersion\Web]
"InstalledEngines"=hex(7):\
  41,6C,74,61,56,69,73,74,61,00,41,4F,4C,4E,65,74,46,69,6E,64,00,41,6E,7A,77,65,72,\
  73,00,43,4E,45,54,00,45,75,72,6F,53,65,65,6B,00,45,78,63,69,74,65,00,46,69,6E,64,\
  57,68,61,74,00,47,4F,54,4F,00,48,6F,74,42,6F,74,00,49,43,51,00,4C,6F,6F,6B,53,6D,\
  61,72,74,00,4C,79,63,6F,73,55,53,41,00,4D,53,4E,00,53,65,61,72,63,68,55,4B,00,57,\
  65,62,43,72,61,77,6C,65,72,00,47,6F,6F,67,6C,65,00,00
"ActiveEngines"=hex:01,00,00,00,01,00,00,00,01,00,00,00,01,00,00,00,\
  01,00,00,00,01,00,00,00,01,00,00,00,01,00,00,00,\
  01,00,00,00,01,00,00,00,01,00,00,00,01,00,00,00,\
  01,00,00,00,01,00,00,00,01,00,00,00,01,00,00,00
"GoogleName"="Google"
"GoogleURL"="http://www.google.com/search"
"GoogleHome"="http://www.google.com/"
"GoogleQueryType"="lip"
"GoogleQueryOps"=hex(7):20,2B,20,00,20,4F,52,20,00,00
"GoogleQueryCommand"="$WebFerret; >httpUser-Agent; $q=;<+urlquerytext; $+&lr=&safe=off&sa=N; >urlquery"
"GoogleGrammar"=hex(7):\
  52,3A,3C,70,3E,3C,41,20,48,52,45,46,3D,5B,3E,75,72,6C,7C,2A,2E,5D,3E,5B,65,68,3B,\
  74,62,3B,3E,74,69,74,6C,65,7C,2A,2E,5D,3C,2F,41,3E,3C,66,6F,6E,74,20,73,69,7A,65,\
  3D,2D,31,3E,3C,62,72,3E,5B,65,68,3B,74,62,3B,3E,61,62,73,74,72,61,63,74,7C,2A,2E,\
  5D,3C,66,6F,6E,74,20,63,6F,6C,6F,72,3D,67,72,65,65,6E,3E,00,\
  53,3A,3C,61,20,68,72,65,66,3D,2F,73,65,61,72,63,68,3F,5B,74,62,3B,20,3E,75,72,6C,\
  71,75,65,72,79,7C,2A,2E,5D,3E,00,\
  4E,3A,3C,62,3E,4E,65,78,74,3C,2F,62,3E,00,00
"GoogleMethod"=dword:00000000

Save this registry patch under whatever name you fancy, with a .reg extension, and it's ready to be merged into the registry. For your convenience, this file is available here.
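The hex(7) and hex: lines of that patch are just the raw bytes of the values, so you don't have to type them by hand. In REGEDIT4 files, hex(7) is a REG_MULTI_SZ: each string is NUL-terminated and the list ends with one extra NUL. For ActiveEngines I assume a REG_BINARY array of little-endian 32-bit integers, one per engine, which is what the repeating 01,00,00,00 pattern suggests. The helpers below are hypothetical; strings are encoded as ASCII, which is enough for these engine names, and long values are wrapped with regedit-style ',\' continuations at an arbitrary width.

```python
import struct


def wrap_hex(prefix, data, per_line=25):
    """Format bytes as REGEDIT4 hex text, continuing long values with ',\\'."""
    items = [f"{b:02X}" for b in data]
    chunks = [",".join(items[i:i + per_line]) for i in range(0, len(items), per_line)]
    return prefix + ",\\\n  ".join(chunks)


def multi_sz(strings):
    """REG_MULTI_SZ: each string NUL-terminated, then one extra NUL ends the list."""
    raw = b"".join(s.encode("ascii") + b"\x00" for s in strings) + b"\x00"
    return wrap_hex("hex(7):", raw)


def dword_array(values):
    """Little-endian 32-bit integers stored as a REG_BINARY blob (assumed layout)."""
    return wrap_hex("hex:", b"".join(struct.pack("<I", v) for v in values))


if __name__ == "__main__":
    engines = ["AltaVista", "AOLNetFind", "Anzwers", "CNET", "EuroSeek", "Excite",
               "FindWhat", "GOTO", "HotBot", "ICQ", "LookSmart", "LycosUSA",
               "MSN", "SearchUK", "WebCrawler", "Google"]
    print('"InstalledEngines"=' + multi_sz(engines))
    print('"ActiveEngines"=' + dword_array([1] * len(engines)))
    print('"GoogleQueryOps"=' + multi_sz([" + ", " OR "]))
```

Adding yet another engine then only means appending its name to the list and re-running the script.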
First, let me be clear: I'm not stating that you should use WebFerret, nor that adding a new engine to WebFerret is something really worth doing per se. I personally never used WebFerret before, and I probably never will in the future. The purpose of this essay was simply to show you, first, that even without any software reversing knowledge you can tweak software to do what you want it to do. Second, I tried to show you that there is a lot to learn by studying some interesting targets. If I hadn't studied WebFerret, I would probably still be trying to figure out how to write a universal parsing script. WebFerret gave me a lot of inspiration on this topic, and I can now apply what I have learned here to my original target: writing a sort of universal parser. I now know that some regular expressions plus scripts in some 'very simple language' could be very helpful. If everything goes fine, I could end up with something worth publishing again very soon. So stay tuned :-)
As always, but here more than ever, feedback, criticism and suggestions on this topic are really welcome. You can reach me at phplab@2113.ch.
Thank you for reading this essay; I hope it was worth it.
(c) Laurent 2001