Copernic 2001 Pro (Version 5.0)
Light Version from: http://wwww.copernic.com/
[Use it to find its bigger brother ;)]
W32Dasm 8.93 - Recommended
HexWorkshop - Essential Tool
Filemon - Essential Tool
C Compiler - Language for Tool Writing
I have been on a quest to find the query URLs and the structure of queries, as part
of my hunt for data for my local search bot. My last essay was finished
and the target's data had been extracted. With a fresh set of data in my hands,
I sat down and started writing a converter to put the data into a common file format.
This is where this essay begins. I had decided on a basic subset of the data
to use, but thought I should check it against other sources (in other bots) first.
First on the pile was webferret, a search-bot about which
Laurent has written an essay that you will find
here.
As is my usual habit I did not let the software within wire distance of the
internet, so it did not get its updates, and the dataset provided as standard is
pretty poor - so I threw it in the bin.
Laurent had mentioned to me that I might find copernic interesting. Umm...
could this be a good target? I had heard of it, but had until recently steered
clear of all these search-bot programs. This was because I know you do not get anything
for nothing, and the thing that makes them money is knowing your searches, and
being able to make you sit through advert after advert after advert...
So off to the web, do a search for copernic and read some reviews. Seems like
another of these local search bots, where the main advantage is it knowing how
to talk to the search engines and co-ordinate the replies and present them to
the user in a nice simple way. This sounded interesting and it seemed to support
a large number of search engines but no specific numbers were given. I went to
some lengths to avoid visiting any of the copernic sites, for reasons, which will
become apparent later.
So the target was picked, next step was to go find it on the web.
So off to the web, where I grabbed the Pro version. I did not even go near their
site, so if they are busy checking logs they will not find me ;)
The Pro version came with a key - nice!
Out came the clean PC. This machine was not connected to any network or the internet,
after all we did not want any uncontrolled data to go out ;). Filemon was started
and left running and then copernic was installed on the PC. Once the installation
process had finished, the program was not run straight away. The filemon log
of the installation was saved for later reference. The Filemon log was then cleared
and left running, to log the files accessed by the program.
Next step is to run the program and set it to point to the local proxy. Right - the first
thing it does is ask you for some registration details. When all the data has been entered
and the proxy set up, it tries to connect to get an update. [This is very optimistic of
the company - that all people who install and run it for the first time will be connected
to the internet]
Right, so look at the logs on the proxy and there are a number of requests to "updates.copernic.com".
Now let's try a search, for 'searchlores'. At this point I know it is not going to get
any results, as the proxy does not connect to the internet and just returns 404 for every
request, as though routing was broken. So I did the search. Looking at the proxy logs, in
amongst the requests for search engine pages there is one that stands out, to
"regcards.copernic.com".
Now follows an explanation of these requests, as they are quite interesting. They go
to the copernic.com domain so they must contain some user data or be used to track
users of this program in some way.
Firstly let's look at the update requests:

HEAD http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1

This is the request sent:

HEAD http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1
Host: updates.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Pragma: no-cache

Second it does a:

GET http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1

GET http://updates.copernic.com/copernic2001upd/copernic2001plus.cui HTTP/1.1
Host: updates.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Pragma: no-cache

Why do a HEAD, if when it fails you go on to do the GET anyway? Why not simply do a GET? This seems very pointless ;)

GET http://www.copernic.com/cgi-bin/nph-osnvs2.pl?ns=##########################&iu=%7B********-****-****-****-************%7D&lo=http://updates.copernic.com/copernic2001upd/copernic2001plus.cui&cl=0 HTTP/1.1
Host: www.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Pragma: no-cache

The field marked with '*'s will be explained in the next request, as it is a common parameter which is passed in both requests. The field marked with '#'s also seems to be a number of some form to be sent to their server.
Now let's look at the regcard information:

POST http://regcards.copernic.com/cgi-bin/regcard HTTP/1.1

This is the request sent:

POST http://regcards.copernic.com/cgi-bin/regcard HTTP/1.1
Host: regcards.copernic.com
Accept: */*
Connection: close
User-Agent: Copernic
Content-Type: application/x-www-form-urlencoded
Content-Length: 129

%5Ejohndoe%40mort.somewhere%5EUnited%20States%5E12345%5E0%5E0%5EENGPRO%5E5001%5E%7B********-****-****-****-************%7D%5EFrom%20web%20site%5E%5E0%5EJohn%20Doe

Plain text of last line:

^johndoe@mort.somewhere^United States^12345^0^0^ENGPRO^5001^{********-****-****-****-************}^From web site^^0^John Doe
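The plain text line above can be recovered from the POST body mechanically. Below is a minimal sketch in C, assuming the body is ordinary percent-encoding with '^' as the field separator, which is how it looks in the capture. The function names percent_decode and split_carets are my own and have nothing to do with Copernic's code.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Decode %XX escapes from src into dst (dst_size bytes, NUL-terminated).
   Returns the decoded length, or -1 on a malformed escape or overflow. */
int percent_decode(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    if (dst_size == 0)
        return -1;
    while (*src != '\0') {
        unsigned char c = (unsigned char)*src++;
        if (c == '%') {
            /* Both characters after '%' must be hex digits. */
            if (!isxdigit((unsigned char)src[0]) ||
                !isxdigit((unsigned char)src[1]))
                return -1;
            char hex[3] = { src[0], src[1], '\0' };
            c = (unsigned char)strtol(hex, NULL, 16);
            src += 2;
        }
        if (out + 1 >= dst_size)
            return -1;
        dst[out++] = (char)c;
    }
    dst[out] = '\0';
    return (int)out;
}

/* Split text in place on '^', keeping empty fields (strtok would drop
   them). Stores up to max_fields pointers, returns the number found. */
size_t split_carets(char *text, char **fields, size_t max_fields)
{
    size_t count = 0;
    char *start = text;

    for (;;) {
        char *sep = strchr(start, '^');
        if (count < max_fields)
            fields[count] = start;
        count++;
        if (sep == NULL)
            break;
        *sep = '\0';
        start = sep + 1;
    }
    return count;
}
```

Because the line starts with a '^', the first field returned is empty, and the email address is the second one. The empty field between the two carets near the end is kept as well, so the positions line up with the registration card fields.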
Value | Description |
johndoe@mort.somewhere | Email Address |
United States | Country |
12345 | Zip Code |
0 | Unknown |
0 | Unknown |
ENGPRO | Version of Software |
5001 | Registration Card Version |
{********-****-****-****-************} | GUID |
From web site | Referrer for Product |
(empty) | Unknown |
0 | Unknown |
John Doe | Username |

"http://regcards.copernic.com/cgi-bin/regcard"
"http://updates.copernic.com/copernic2001upd/"
"http://www.copernic.com/cgi-bin/nph-osnvs2.pl"
"www.copernic.com"

The first ones can be nullified by writing "http://127.0.0.1/" at the start of the strings. This will then prevent all accesses to their servers. This is a good alternative to the hosts file, as the program seems to bypass the hosts file when using a proxy and just sends the requests straight to the proxy.
So next step is to close the program, save the filemon log and have a look around my system.
I had a browse through the install filemon log file and made a note of the location of files
added to my system. The first thing that hit me was a load of '.csf' files which had
the names of search engines, and a list of '.ssf' files which seemed to represent
categories.
The next thing is to look at the run filemon log, it seems to read the .ssf and .csf files
and then create a set of files, under the directory 'data' which seems to be a user profile
with the users name as the folder name. Ummm, so some kind of translation or copying going
on, but a lot fewer files get written than read.
So to open up the main executable in our favourite hex viewer and have a quick browse, but
first to extract all the strings from the file. I had a browse through the strings and it
looks like it was coded in Delphi. This was just a hunch, and I remembered having a copy of
DFM-Explorer around, so I tried it on the file and sure enough out came all the resources,
so it is for sure Delphi. So the task is now to find a Delphi decompiler. My thinking here
was that even though it might not be needed, if it is then it might make the program code
a bit easier to understand. Also better to check this option to start with rather than
later. As a teacher once told me "Always get all your tools ready before starting any task!"
The catch is : this is a delphi application, warning bloatware imminent. I had thought that
the executable was a bit on the large side for something so seemingly simple, and this explained
it. No extra DLL's or files, so the delphi libs must be statically linked. I remember when
applications used to fit on a floppy, now the icon files will not ;(.
First step is to grab ye ole webbrowser and search for a delphi decompiler (I must admit shame
and say I had never used one before). Right the one that pops up the most in the list when
ranked is 'DeDe' by DaFixer!. Ok so lets grab it and let it rip.
A few sips of my drink later and it has finished downloading, so lets run it and see what
it comes up with. DeDe recognises the file and does its stuff, and yes it is delphi because
I now have the forms and pascal code nicely disassembled on my HD. So a quick browse through
them to get an idea of the structure. umm
I noticed that DeDe also supports exporting all its references to a W32dasm project. Since
one of the steps I was going to do was to disassemble the file, I ran Wdasm and generated
a project file, then pointed DeDe to it and let it do its stuff. Hopefully when it finishes
it will leave a nice big file with the combined references, so that should make life easier
later on. Being able to see the references to the Pascal and Delphi bits should make the code
a bit easier to follow.
While that was running (it takes some time) my next step was to search all the .pas files
for references to 'ssf' and 'csf' to find where it loaded the data files, I did not find
any references of these strings in any of the .pas files. Ok time to load up the W32Dasm
project and have a look in that file. OK PROBLEM! - the project is still being accessed
during the combining of references, so that option is out for an hour or so, as it seems
to take quite some time (35Mb File to process).
So lets have a look around, there are some DLL's in the directory, so lets check them out:
c4dll.dll is Database Engine Library (Sequiter CodeBase Components for Delphi)
xcdunz32.dll is a Zip Library [Xceed Zip Compression Library]
SSCE5253.dll is the Sentry Spelling-Checker Engine [Wintertree Software]
Zip Library - is this just there for the installation or unpacking updates, or might it
be used on the data files? Time to check, if the data files are zipped then they should
be fairly easy to unpack. That would make life very easy ;)
So lets look at the files that were generated when the program was run, the files in
what looked like a profile directory.
channel.ctb seems the most likely candidate, and matches (by some coincidence) roughly
the size of all the .ssf and .csf files. (1,158,690 bytes)
All .ssf - category files (73,718 bytes). All .csf - engine files (1,131,657 bytes)
This seems a strange coincidence, as opening up this file shows it does have the engine
names and the category names (from filenames) but also contains a LOT of space characters,
so given this is in a directory called after the user, this should be the users preferences
for searches or something similar.
Back to the data files, as the only files looking good candidates are the '*.*sf' files
which fit the bill perfectly. So opened one up in notepad and it looks unreadable.
So right, copied three .ssf and three .csf files of different sizes to a temporary
directory to start looking at them. Opened the first one in a hex viewer and noticed
that it is not plain text, ok so it was expected they would be packed or encrypted
in some way, they would not leave their whole product out in the open. But one thing
that did jump out was the pattern of the characters.
Here is an excerpt from one of the files: (Boxes are unprintable characters)
Sssx?y[SSsS3SrQSSSSSSSSsx;SSss= SS'3rrrQPSsS3SrQsS3rpSSssx;[yzys3| xySSss\_yX[yyx;yxSSss?[[ۜ SSss=yzX|SSss;x3SSSSSssx ;xSSss}۸[ySSss=Xy?X|S Sss=Xy?3X|SSs"SSs|xSSss?9X [;y9Xs3xxy99xx{zyٛ;99xSSss; ;__ysSSSss;}xyӐSQP9[yx2Q?|Q ӐSs2Qxٸ;QrRpSSss;Pyp?yy8ظ98{ 'SSss;Pypy8ҙQy90=XP3p0} xy0=XP3p0Q;x0=XP3p0s0=XP3p0'SSss;P };;p

Notice the repeated 'SS', 'SSs' and 'SSss' sequences. Instinct at this point says that this is not a packed file, as these repeats would have been eliminated by the compression process. There are other repeated sequences present in the encoded text.
This is the header common to the 1K category files: Auctions and Buyhardware
9D9D5373F41473F414DF78F8F93FDB79
F85BF213535373F05333F073F3515353
125353125353F414535373F31FF9B978
3BDBF91BF41453537373BE3DB8989BF2
11D3535313F0923372727251505373F0
5333F073 . . . . (more data) F414

This is also the same in Buysoftware which is a 2k file, apart from one byte:

9D9D5373F41473F414DF78F8F93FDB79
F85BF213535373F05333F073 72 [changed F3 to 72] 515353
125353125353F414535373F31FF9B978
3BDBF91BF41453537373BE3DB8989BF2
11D3535313F0923372727251505373F0
5333F073 . . . . (more data) F414

This seems the only difference but is not the same in all 2k files... in the copernic.csf file it is:

9D9D5373F41473F414DF78F8F93FDB79
F85BF213535373F05333F0 53 [changed 73 to 53] 72 [changed F3 to 72] 515353
125353125353F414535373F31FF9B978
3BDBF91BF41453537373BE3D . . . . (more data) F414

different after this..
So this looks like they are all encoded with the same method, and this is some kind of common header to the files.. Also all files seem to end with 'F414'
This does not look like an xor'd pkzip, as the header is wrong. If this was a zip file with a zip header, you would expect more bytes to be different; if this was a zip file with the header removed, then the data would not show the same repetitive patterns at such regular intervals. This led me towards thinking they were just encrypted in some way. This was backed up by the observation that they come in all sizes from 926 bytes to 3,000 bytes (in all steps), so they are not a fixed structure. They do have a header and a footer which seem to be common. This could just be some text at the start of the file, or could designate something else. To me it looks like a constant bit at the start of the decoded file rather than a packed header, or else more of it would change. So it looks like they are just mildly encrypted and are not packed - hopefully anyway ;)
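The "xor'd pkzip" idea can be tested in a few lines rather than by eye. The sketch below assumes the simplest case only: a zip file whose bytes have all been XORed with one constant byte. A zip local file header starts with the signature 50 4B 03 04, so XORing the first four bytes of the file with that signature must give the same key four times. The function name is made up for this example.

```c
#include <stddef.h>

/* Returns 1 if the first four bytes are the zip local file header
   signature 50 4B 03 04 XORed with a single constant byte (a key of 0
   means a plain zip), otherwise 0. On success the key goes in *key. */
int looks_like_xored_zip(const unsigned char *buf, size_t len,
                         unsigned char *key)
{
    static const unsigned char sig[4] = { 0x50, 0x4B, 0x03, 0x04 };
    unsigned char k;

    if (len < 4)
        return 0;
    k = (unsigned char)(buf[0] ^ sig[0]);
    for (int i = 1; i < 4; i++) {
        /* Every position must agree on the same key byte. */
        if ((unsigned char)(buf[i] ^ sig[i]) != k)
            return 0;
    }
    *key = k;
    return 1;
}
```

Fed the common header bytes 9D 9D 53 73, the first two positions already disagree (0x9D ^ 0x50 is 0xCD, 0x9D ^ 0x4B is 0xD6), so the function returns 0. This only rules out the single-byte XOR case, not every possible way of hiding a zip.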
The 'F414' sequence bothered me as soon as I saw it, the spacing throughout the file and also the positioning of it, together with the fact that it appeared in the header made me think that this could be '0d0a' or a newline in a text file. This fits with the decoded file being plain text. So made a little tool which copied the file and just changed those bytes over - the result was a file with what looked like reasonable line lengths for a text configuration file. So I was on the right track, or so it seemed.
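The little tool was nothing special. A sketch of its core is shown here, rebuilt from the description above rather than copied from the original tool; the function name is my own. It works on a buffer that has already been read in from the file.

```c
#include <stddef.h>

/* Replace each F4 14 byte pair in buf with 0D 0A (CR LF).
   Returns the number of pairs replaced. */
size_t swap_line_endings(unsigned char *buf, size_t len)
{
    size_t swapped = 0;

    for (size_t i = 0; i + 1 < len; i++) {
        if (buf[i] == 0xF4 && buf[i + 1] == 0x14) {
            buf[i] = 0x0D;
            buf[i + 1] = 0x0A;
            swapped++;
            i++;            /* skip the second byte of the pair */
        }
    }
    return swapped;
}
```

Only the pair is touched; every other byte is left encoded, which is what makes the line lengths visible without knowing anything else about the cipher.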
Here is a snippet of the above file: (with line splits inserted)
Ss
s
x?y[SSsS3SrQSSSSSS
SSsx;
SSss=SS'3rrrQPSsS3SrQsS3rp
SSssx;[yzys3|xy
SSss\_yX[yyx;yx
SSss?[[ۜ
SSss=yzX|
SSss;x3SSS
SSssx;x
SSss}۸[y
SSss=Xy?X|
SSss=Xy?3X|
SSss;Pypy8ҙQy90=XP3p0}xy0=XP3p0Q;x0=XP3p0s0=XP3p0'
SSss;P};;p

This seems to fit the structure of a configuration file: short line lengths. Later in the file are longer lines, about the size of a query URL, so this seems right ;) There is also a pattern to the characters at the start of the line, and notable is that the repeated 'SS' combination appears at the end of strings - this means (hopefully) that it is not a position dependent (or offset) substitution.
After a bit of thinking I was convinced that these files are protected by a substitution cipher, and more looking at the file content seemed to back this up as there are many repeating patterns, as you would expect to see in a file with URL's inside it. So the target was to find the translation function or table. I by this time had discounted a packed format and had also discarded a binary file, it is a plain text file - this may seem like a jump but if you had been sitting on my shoulder you would have seen it the same way.
So there are two methods they could use to achieve this, the first would be to use a lookup table to do the translation and the second would be to use a function to do the same thing. In order to confirm some options, another look at the running program was required, when viewed it seemed they did include all lower and uppercase chars and also European characters - this was important as it means they have to use all 8 bits of the character and cannot throw any away in the function, whereas if they had not included any European characters they might be able to throw a bit away somewhere in the function and this could affect the findings dramatically. It was also obvious that they used normal ASCII characters as the patterns would have been different if they had used some form of unicode or multi-byte character set. This gives us more ammunition for the coming hunt.
One thing I must add at this point is that there are many known attacks on substitution ciphers. These were discarded because they assume a language and work from character occurrence probability tables. They are very effective on normal text, but the contents of this configuration file were known not to match normal text, as it would (presumably) be using repeated keywords and values which would be meta tags and/or URLs. This meant that they might give some results but probably would not. So I discounted them to save time!
Getting Hands Dirty
DeDe has now finished, so we can start looking at the assembler for the file. First task is to hunt down the references to any .ssf or .csf files. When looking through the file you will find a few references to this string. These were used as a starting point and breakpoints were set on them.
I shall take a wander here - bear with me! When I started looking at DeDe, I was intending to work from the disassembled files and track through the code in order to find the decryption routine which would restore the files to plaintext. Now my priorities had changed somewhat. What I was now after was a portion of the plaintext file, and hopefully all of one of the files in memory so that it could be saved. The fact that the cipher seemed to be a substitution one (from the data shown above) means that although finding the decryption routine would be nice, finding a portion of the plaintext would be just as nice in helping find the result. If they have used a table, then hopefully once we have a portion of the plaintext and what it maps to in the encrypted file, finding the table in memory would be very easy. This seems a nicer and quicker approach than reading through page after page of disassembled code trying to put it together. This point is made more by the fact that the app is in Delphi, so a simple instruction could quite easily call many functions all over the place.
So, trying to resist the urge to go through the code and reassemble what happens (which is very hard), I start the code running in W32Dasm with breakpoints set on every instance of a string that ends in '.ssf' and '.csf'. It soon breaks on one of them. At this point I set auto-api stop, show parameters for local and system calls, and set it running again. What I am hoping for is that one of the calls has a pointer to the plaintext among its parameters.
Here is the bit of code that loads 'Copernic.csf', which is thought to be the master configuration file.
* Possible StringData Ref from Code Obj ->"Copernic.csf"
                                  |
:52A00A BAB8A75200              mov edx, 52A7B8
:52A00F E8FCA0EDFF              call 404110
:52A014 8B55E0                  mov edx, dword ptr [ebp-20]
:52A017 8B45FC                  mov eax, dword ptr [ebp-04]
:52A01A 8B4020                  mov eax, dword ptr [eax+20]
:52A01D 8B08                    mov ecx, dword ptr [eax]
:52A01F FF5158                  call [ecx+58]
:52A022 8B45FC                  mov eax, dword ptr [ebp-04]
:52A025 8B4020                  mov eax, dword ptr [eax+20]
// This following call seems to handle the
// file and contains a call which exposes the
// plaintext
:52A028 E8970AFAFF              call 4CAAC4     // HANDLEFILE
:52A02D 85C0                    test eax, eax
:52A02F 7425                    je 52A056
:52A031 6A00                    push 0
:52A033 6A00                    push 0
:52A035 A1C4255B00              mov eax, dword ptr [5B25C4]
:52A03A 8B00                    mov eax, dword ptr [eax]
:52A03C 8B4050                  mov eax, dword ptr [eax+50]
:52A03F BA02000000              mov edx, 2

The code below is the start of the HANDLEFILE routine:
* Referenced by a CALL at Addresses:
|:4EB84D, :52A028, :599F7B, :59A81A
:4CAAC4 55                      push ebp
.
... next part is further down the function.
.
:4CAAFA 8D55E8                  lea edx, dword ptr [ebp-18]
:4CAAFD 8B45FC                  mov eax, dword ptr [ebp-04]
:4CAB00 8B08                    mov ecx, dword ptr [eax]
:4CAB02 FF511C                  call [ecx+1C]
:4CAB05 8B45E8                  mov eax, dword ptr [ebp-18]
:4CAB08 BA01000000              mov edx, 1
// This function has the plain text for the
// line from the file passed into and out of
// it, so the decoding must happen before this!!!
:4CAB0D E892EDFFFF              call 4C98A4
// [ebp-10] points to the start of text, both into
// and out of this function
So we have found a function that is called with one of the parameters as the plaintext for the file currently being handled. This is what we were after, so remove all other breakpoints and set a new breakpoint on 0x004CAB0D and make sure we tick the display parameters to local calls in W32Dasm. Right now every time we hit this function filemon tells us which file we are reading and the parameter display gives us the location of the string.
After placing the breakpoint and grabbing a string of plaintext,
the start of the plaintext is: "FF01" - 0x46 0x46 0x30 0x31 0x0d 0x0a
While looking at this, I noticed a bit of code further down the disassembly listing, which jumped out at me as some possible plaintext.
This is the code that seems to handle parsing the configuration files:
* Possible StringData Ref from Code Obj ->"DisplayName"
:599FA0 BA14A65900              mov edx, 59A614
:599FA5 8B45E4                  mov eax, dword ptr [ebp-1C]
:599FA8 E8AB4DF2FF              call 4BED58
:599FAD 8D45C4                  lea eax, dword ptr [ebp-3C]
:599FB0 33D2                    xor edx, edx
:599FB2 E8B5B6E6FF              call 40566C
:599FB7 8D4DC4                  lea ecx, dword ptr [ebp-3C]

this code is repeated with the following string references:

* Possible StringData Ref from Code Obj ->"Description"
* Possible StringData Ref from Code Obj ->"HomePage"

So this bit of code is parsing a file of some kind, looking for the identifiers given in the string references, and so that means our file MUST contain some of the above strings, as they do not seem to be used in any other files.
Decoding files
So now we have a portion of the plaintext written down (or in a file) and this looks very good, and seems to confirm a lot of things. The string pointed to is shown below, and when looking for the first time you should also refer back to the previous text and see what bells ring ;)
A portion of the plaintext:
FF01
0015Register
0011_Conv="4002->3999 (01-03-09, 10:37:59)"
0011DisplayName="123India"
0011HomePage="http://www.altavista.in/"

The order is slightly changed from the order in the file (only a couple of entries swapped), but note the line lengths as these are a giveaway. So we now know for sure that we are on the right track - GOOD! Now you can call me stupid if you want, but '0011' looks a bit like 'SSss', and '001' would also explain the 'SSs' occurrences.
So this data was saved to a file, and a file was created with the lines mixed and grouped in pairs of matching line length. Then a bit of code to read the lines in and generate a mapping table from the characters in an encoded line to the matching character in the decoded file. This table was then saved to a file as a 256 byte list. Obviously this did not include all characters from the table as the chances were that not all characters would be used in this one file, but the thought was that as I stated above it would either give enough of a clue to find the lookup table in memory, or a clue to the function. It was more appealing than running through lines and lines of code. So the map table was created and any holes were left with their original values, so that errors could be spotted and added. Then this substitution lookup was loaded into the decoder and compiled ready for use. At this point I decided to view the encrypted values with the decrypted values in the form of the table, luckily there was a good spread in the table and luckily I had picked a file with European characters inside it so there were some of those represented in the table.
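The table builder amounts to very little code. Here is a sketch of the idea in C, not the original tool: the names init_map and learn_pairs are invented for this example, and it assumes each encoded line has already been paired with its plaintext line of the same length.

```c
#include <stddef.h>

/* Start with every entry mapped to itself, so that holes keep their
   original value, and mark every entry as not yet seen. */
void init_map(unsigned char map[256], unsigned char known[256])
{
    for (int i = 0; i < 256; i++) {
        map[i] = (unsigned char)i;
        known[i] = 0;
    }
}

/* Record enc[i] -> dec[i] for n byte pairs. Returns 0 on success, or
   -1 if an encoded byte is seen with two different plaintext values,
   which would mean the cipher is not a plain substitution. */
int learn_pairs(unsigned char map[256], unsigned char known[256],
                const unsigned char *enc, const unsigned char *dec,
                size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (known[enc[i]] && map[enc[i]] != dec[i])
            return -1;
        map[enc[i]] = dec[i];
        known[enc[i]] = 1;
    }
    return 0;
}
```

Feeding it the header bytes 9D 9D 53 73 F4 14 against "FF01" plus the newline already fills in five entries. The conflict check is a cheap way of confirming the guess that the substitution does not depend on position.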
The original encoded file was then decoded using this partial table as a sort of proof-of-concept for the code and the idea. Sure enough the file was decrypted and shown in total plain text. So I had proved to myself that I was on the right track and I had not even bothered to hunt the disassembly file for the decode routine.
The next step was to check for a lookup table in any of the files, so I took a portion of the substitution table that contained proper plaintext values and did a search of all the files in the root folder for copernic. NOTHING! - so it seems they either do not have it in the files, they generate it, or the data is encoded by a function. This was good news, because the last two options both mean that it is created by a function without a lookup table, which means there has to be a simple logic to it, as there are only so many ways to scramble 256 entries using code without losing any entries or values.
Now at this point I should really have dived into the dead listing and tried to find the routine, but I took a different approach. I instead turned my attention to the output of my lookup table creator, and the results it had given me. I was trying to look for a pattern within the mapping.
This is a partial dump of the lookup table and values, showing the relationship between the encoded and decoded characters: (all values are HEX)
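Encoded | Decoded |
10 | 2a |
11 | 22 |
12 | 3a |
13 | 32 |
14 | 0a |

Encoded | Decoded | Encoded | Decoded |
18 | 6a | f8 | 6d |
19 | 62 | f9 | 65 |
1a | 7a | fa | 7d |
1b | 72 | fb | 75 |
1c | 4a | fc | 4d |
1d | 42 | fd | 45 |
1e | 5a | fe | 5d |
1f | 52 | ff | 55 |
38 | 6b | 58 | 68 |
39 | 63 | 59 | 60 |
3a | 7b | 5a | 78 |
3b | 73 | 5b | 70 |
3c | 4b | 5c | 48 |
3d | 43 | 5d | 40 |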
It did not take long for one to jump out at me, did you pay attention to the above table, did any bells go off? I left holes in the table on purpose so you had to look at it. Have you seen the pattern, it is a nice one I must admit - if you just arrange the table with the characters showing instead of the hex, a pattern does jump out, but not as much as when viewing the hex bytes. Hopefully you should agree with me when I now say that the dead listing approach suddenly lost a LOT of its appeal for this target.
This is a regular pattern based substitution, done by a bit of code which is not very complex or large. I have already gone down the road of abandoning the dead listing, and it is now firmly in the bin. So to reverse this encoding we simply need to analyse the pattern.
It also appears as though the resulting value is made up from two separate nibbles (4bits) and they are bolted together, this is shown by the way they seem to change out of step with each other. Pseudo code:
Variables:
IN_A = encoded_byte
IN_H = encoded_byte_high_nibble
IN_L = encoded_byte_low_nibble
OUT_H = decoded_byte_high_nibble
OUT_L = decoded_byte_low_nibble

to set up the code do the following:
IN_A = read_from_file();
IN_H = IN_A & 0xf0;
IN_L = IN_A & 0x0f;

before exiting:
OUT_A = OUT_H | OUT_L;

Taking the examples: 0x38 -> 0x6B and 0x39 -> 0x63

It seems like there are two values for the lower nibble, and these seem to be offset by 8, so no matter what the lower value is the higher one is that plus 8. (Look at the table above to confirm this.) The use of this value seems to be dependent on the lower bit of IN_A. So the final step is to take the low bit of IN_A and, if it is clear, to add 0x08 to the output byte.
You can also see that the lower nibble of decoded char (OUT_L) is related to the upper nibble of encoded data (IN_H). And that the upper nibble of decoded char (OUT_H) is related to lower nibble of encoded char (IN_L).
Look at the 0x*8 and 0x*9 values they all map to 0x6*, just like 0x*A and 0x*B values map to 0x7*, and like 0x*E and 0x*F map to 0x5*. Now look at 0xff, the lower value for the lower nibble is '5' so 0xf* -> *5 and 0x*F -> 0x5*.
If you do more checking it will reassure you, what is of interest is that these mappings seem to be the same for both halves, which should make life a lot easier. So now that we have isolated the components, lets create a mapping for the nibbles, just taking the values from the previous table.
Original Nibble | Output Nibble |
0x0,0x1 | 0x2 |
0x2,0x3 | 0x3 |
0x4,0x5 | 0x0 |
0x6,0x7 | 0x1 |
0x8,0x9 | 0x6 |
0xA,0xB | 0x7 |
0xC,0xD | 0x4 |
0xE,0xF | 0x5 |
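The claim that both halves share one nibble table can be checked against the observed pairs instead of taken on trust. The sketch below is my own reconstruction of that check and was not part of the original tools; struct pair and derive_nibble_map are made-up names. It assumes the rule described above: swap the nibbles through a shared table, then add 8 when the low bit of the encoded byte is clear.

```c
#include <stddef.h>

struct pair { unsigned char enc, dec; };

/* Set nib[index], or fail if it already holds a different value. */
static int set_nibble(int nib[16], int index, int value)
{
    if (nib[index] != -1 && nib[index] != value)
        return -1;
    nib[index] = value;
    return 0;
}

/* Fill nib[16] from observed (encoded, decoded) pairs; -1 marks a
   nibble never seen. Returns 0 if every pair fits one shared nibble
   table plus the "+8 when the low bit is clear" rule, otherwise -1. */
int derive_nibble_map(const struct pair *p, size_t n, int nib[16])
{
    for (int i = 0; i < 16; i++)
        nib[i] = -1;
    for (size_t i = 0; i < n; i++) {
        int in_l = p[i].enc & 0x0f;
        int in_h = p[i].enc >> 4;
        int out_h = p[i].dec >> 4;
        int out_l = p[i].dec & 0x0f;
        int offset = (p[i].enc & 1) ? 0 : 8;

        if ((out_l & 8) != offset)
            return -1;                      /* offset rule broken */
        if (set_nibble(nib, in_l, out_h) != 0)
            return -1;                      /* low -> high half */
        if (set_nibble(nib, in_h, out_l & 7) != 0)
            return -1;                      /* high -> low half */
    }
    return 0;
}
```

Run over the pairs in the partial dump it reports no conflicts, and the entries it can fill in agree with the nibble table above. The excerpt shown here has no examples for input nibbles 0x6 and 0x7, so those stay at -1 until more pairs are fed in.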
So putting this together gives us:

Variables:
IN_A = encoded_byte
IN_H = encoded_byte_high_nibble
IN_L = encoded_byte_low_nibble
OUT_H = decoded_byte_high_nibble
OUT_L = decoded_byte_low_nibble
LOOKUP = [2,2,3,3,0,0,1,1,6,6,7,7,4,4,5,5]

to set up the code do the following:
IN_A = read_from_file()
IN_H = (IN_A & 0xf0)>>4        // Get high nibble into low nibble
IN_L = IN_A & 0x0f             // Isolate low nibble
OUT_H = LOOKUP[IN_L]<<4        // To get into high nibble
OUT_L = LOOKUP[IN_H]           // this is low nibble
OUT_A = OUT_H | OUT_L          // merge the two
if ((IN_A & 0x01) == 0)        // This does the offset on
    OUT_A = OUT_A + 0x08       // the lower nibble
This can be simplified to the code below:

char lookup[]={2,2,3,3,0,0,1,1,6,6,7,7,4,4,5,5};

/* Decode one byte: swap the nibbles through the lookup table,
   then add 8 when the low bit of the encoded byte is clear. */
int decode_character(int encoded)
{
    if (encoded & 0x01)
        return( (lookup[encoded&0xf]<<4) + lookup[(encoded&0xf0)>>4] );
    else
        return( (lookup[encoded&0xf]<<4) + lookup[(encoded&0xf0)>>4] +8 );
}
I have not looked in the executable for this code, or for the bit that does the same job, as that does not matter. If you use the above function as a decoder for each character in all the '*.ssf' and '*.csf' files within the program's directories, it will convert them to the plaintext (unencoded) versions.
So I had the files in plain text form and they were all text configuration files as I had thought. I counted (in the version I have) 754 search engines or URLs - that is quite a lot of data. This product has also got them grouped nicely, which helps with the problem of how to organise them: it is already done.
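As a sanity check, the same rule can be run over the first 16 bytes of the common header quoted earlier (9D 9D 53 73 F4 14 ...). This is a self-contained sketch; decode_byte and decode_buffer are names chosen for this example, and the table is the nibble table derived above.

```c
#include <stddef.h>

static const unsigned char nibble_map[16] =
    { 2, 2, 3, 3, 0, 0, 1, 1, 6, 6, 7, 7, 4, 4, 5, 5 };

/* Decode one byte: the low nibble picks the output high nibble, the
   high nibble picks the output low nibble, and bit 3 of the result is
   set when the low bit of the encoded byte is clear. */
unsigned char decode_byte(unsigned char enc)
{
    unsigned char out = (unsigned char)
        ((nibble_map[enc & 0x0f] << 4) | nibble_map[enc >> 4]);

    if ((enc & 0x01) == 0)
        out |= 0x08;    /* same as +8: table values never exceed 7 */
    return out;
}

/* Decode a whole buffer in place. */
void decode_buffer(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = decode_byte(buf[i]);
}
```

The 16 header bytes come out as "FF01", a line holding just "1", and the start of "TimeStamp", each line ending in 0D 0A. One thing worth keeping in mind: every table value is 7 or less, so as written this routine can never produce a byte of 0x80 or above, and any accented characters in a decoded file deserve a second look.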
So at this point I am pretty happy with how things have gone, I have a routine which decodes their input files and have converted them all to plain text, so the data is now usable. And to think this has been achieved with only minimal time in front of code, only the period when scanning for the plain text.
Scripting Language
When examination of the decoded files was started, one of the first files looked at was 'copernic.csf' as this sits in the approot and is named the same as the application, this was a good choice for master configuration or some kind of global parameters file.
You should remember from earlier that most lines in the conf files seem to have a 4 digit number (0011) of varying value at the start of the line. The example given earlier did not show this as clearly as the following example hopefully will. This is an instruction for the internal scripting language to tell it how to handle the rest of the line.
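Pulling the number off the front of a line is the first step for any tool that wants to read these files. The sketch below assumes nothing about what the individual numbers mean; it only separates the four digits from the rest of the line. The function name is invented for this example.

```c
#include <ctype.h>
#include <stddef.h>

/* Split a decoded line into its leading four-digit code and the rest.
   Returns 1 and fills *code and *rest when the line starts with four
   decimal digits, otherwise returns 0 and leaves them untouched. */
int split_script_line(const char *line, int *code, const char **rest)
{
    int value = 0;

    for (int i = 0; i < 4; i++) {
        /* A NUL byte is not a digit, so short lines stop here. */
        if (!isdigit((unsigned char)line[i]))
            return 0;
        value = value * 10 + (line[i] - '0');
    }
    *code = value;
    *rest = line + 4;
    return 1;
}
```

A line such as 0011UseCookies=True gives the code 11 and the text UseCookies=True, a bare 0016 gives the code 16 and an empty rest, and lines like FF01 that do not start with four digits are reported as having no code.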
This is the decoded version of 'copernic.csf':
FF01 1 TimeStamp=2001-03-09 00:00:00
0015Register
0011ChannelSet="Ad"
0011ChannelSet3="Ad"
0011Version=2525
0011FileVersion=0
0011SoftwareVersions="eng;engplus;engpro;fra;fraplus;frapro"
0016
0015Init
0011UseCookies=True
1001
0011SearchQuerySeparator="+"
1003
0011Key=SearchQuery
0011RNDSEED=""
0018Length(RNDSEED)<>12
0011RNDSEED=String(Random(99999999)*Random(9999))
0019
0011T=Random(999999)
0011PromoT=Numeric(Substring(RNDSEED,8,1))
0011PromoTI=Numeric(Substring(RNDSEED,9,1))
0011Random100=Numeric(Substring(RNDSEED,10,2))
0011SourceFLYCAST=Replace("ENG|1|http://ad-adex3.flycast.com/server/_img/Copernic/software/$RANDOMNUMBER$|http://ad-adex3.flycast.com/server/click/Copernic/software/$RANDOMNUMBER$","$RANDOMNUMBER$",String(T))
0011Source247ENG=Replace(Replace("ENG|1|http://connect.247media.ads.link4ads.com/serv/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011Source247FRA=Replace(Replace("FRA|1|http://connect.247media.ads.link4ads.com/serv/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceUFS="UFS|1|http://banner.unifiedweb.com/cgi-bin/getimage.exe/copernic?GROUP=copernic|http://banner.unifiedweb.com/cgi-bin/redirect.exe/copernic"
0011SourceVALUECLICK="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0136917&b=1&noscript=1|http://kansas.valueclick.com/redirect?host=hs0136917&b=1&v=0"
0011SourceVALUECLICKOLD="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0194203&size=468x60&b=indexpage&noscript=1|http://kansas.valueclick.com/redirect?host=hs0194203&size=468x60&b=indexpage&v=0"
0011SourceSERVERFRA4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceSERVERENG4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceSERVERFRA4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0011SourceSERVERENG4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
0012Find("ENGUFS",Edition)<>0
0011SourceUrl=Entry(3,SourceUFS,"|")
0011TargetUrl=Entry(4,SourceUFS,"|")
0013
0012(Find("PLUS",Edition)<>0)or(Find("PRO",Edition)<>0)
0012BuildNumber>4551
0011SourceUrl=Entry(3,SourceVALUECLICK,"|")
0011TargetUrl=Entry(4,SourceVALUECLICK,"|")
0013
0011SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
0011TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
0014
0013
0012BuildNumber>4551
0011SelfPromoPercent=0
0013
0012Substring(Edition,1,3)="FRA"
0011SelfPromoPercent=0
0013
0011SelfPromoPercent=10
0014
0014
0012Random100<SelfPromoPercent
0011SourceUrl=Entry(3,SourceSERVERENG4551,"|")
0011TargetUrl=Entry(4,SourceSERVERENG4551,"|")
0013
0012BuildNumber>4551
0012Substring(Edition,1,3)="FRA"
0011SourceUrl=Entry(3,SourceSERVERFRA4552,"|")
0011TargetUrl=Entry(4,SourceSERVERFRA4552,"|")
0013
0011SourceUrl=Entry(3,SourceSERVERENG4552,"|")
0011TargetUrl=Entry(4,SourceSERVERENG4552,"|")
0014
0013
0012Random100>54
0012Substring(Edition,1,3)="FRA"
0011SourceUrl=Entry(3,Source247FRA,"|")
0011TargetUrl=Entry(4,Source247FRA,"|")
0013
0011SourceUrl=Entry(3,Source247ENG,"|")
0011TargetUrl=Entry(4,Source247ENG,"|")
0014
0013
0011SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
0011TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
0014
0014
0014
0014
0014
0011RotationInterval=120000
0016
11A2
This is a table giving the function for each command string:
String  COMMAND  Description
0011    SET      SET variable=value
0012    IF       IF expression THEN
0013    ELSE     ELSE
0014    ENDIF    ENDIF
0015    FUNC     Function Definition Start
0016    ENDFUNC  End Function Def
0018    WHILE    WHILE expression DO
0019    WEND     End While Loop
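The command table can be applied mechanically. Below is a minimal sketch in Python, assuming the decoded file has already been split into one statement per line; the names COMMANDS and translate_line are mine and not anything found in Copernic. Command strings that are not in the table (1001, 1003 and so on) are passed through unchanged, since their meaning is not known.

```python
# Command strings as listed in the table; anything else is left untouched.
COMMANDS = {
    "0011": "SET",
    "0012": "IF",
    "0013": "ELSE",
    "0014": "ENDIF",
    "0015": "FUNC",
    "0016": "ENDFUNC",
    "0018": "WHILE",
    "0019": "WEND",
}


def translate_line(line):
    """Replace a leading 4-character command string with its readable name."""
    opcode, rest = line[:4], line[4:]
    if opcode not in COMMANDS:
        return line  # header, footer or an unknown command such as 1001
    return (COMMANDS[opcode] + " " + rest).rstrip()


# A few statements taken from the decoded 'copernic.csf' shown above.
decoded = [
    "0015Init",
    "0011UseCookies=True",
    "1001",
    "0012BuildNumber>4551",
    "0011SelfPromoPercent=0",
    "0014",
    "0016",
]
for statement in decoded:
    print(translate_line(statement))
```

The sketch only swaps the prefix; it does not try to indent the blocks or understand the expressions.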
Also there are some functions:
Replace(String A,String B,String C)
This takes the string A, finds all occurrences of string B in it and replaces them with string C. So Replace("ABCCCBA","CCC","YYY") would return "ABYYYBA"
Substring(String A,Number B,Number C)
This takes the string A and grabs C characters, starting at position B. So Substring("ENGPRO",1,3) would return "ENG"
Numeric(String A)
This returns the number represented by the string A. So Numeric("100") would return 100
Length(String A)
This returns the length of the String passed in. So Length("ENG") would return 3
Random(Number A)
This returns a random number up to the value of A. So Random(99999) could return any value up to 99999.
String(Number A)
This returns the string representation of the Number A. So String(100) would return "100"
Find(String A,String B)
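This returns true if string A is found in string B. So Find("PRO","ENGPRO") would return true. Note that the script always compares the result against 0, as in Find("PRO",Edition)<>0, so "true" here is presumably a non-zero value such as the position of the match, with 0 meaning not found.
Entry(Number A,String B,String C)
This returns an entry from a string which contains delimited values, as in Entry(3,Source247FRA,"|") from the script. A is the number of the data segment to return. B is the string which holds the data. C is the character used for the separator.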
Using the example Entry(NUM,"AAA|BBB|CCC|DDD","|")
if NUM is set to 1 it would return "AAA", if NUM is 2 then "BBB", if NUM is 3 then "CCC".
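To check my reading of these functions, here is a rough Python emulation of them. It is a sketch based only on the behaviour described above, not on Copernic's code: the lower-case names are mine, positions are assumed to be 1-based (as the Substring example suggests), and find is assumed to return a 1-based position or 0, which would fit the way the script tests it with <>0. Random is left out because its exact range is not known.

```python
def replace(a, b, c):
    """Replace(A,B,C): every occurrence of b in a becomes c."""
    return a.replace(b, c)


def substring(a, start, count):
    """Substring(A,B,C): count characters starting at 1-based position start."""
    return a[start - 1:start - 1 + count]


def numeric(a):
    """Numeric(A): the number represented by the string a."""
    return int(a)


def length(a):
    """Length(A): number of characters in a."""
    return len(a)


def string(n):
    """String(A): the string representation of the number n."""
    return str(n)


def find(a, b):
    """Find(A,B): assumed to give the 1-based position of a in b, or 0."""
    return b.find(a) + 1


def entry(n, data, separator):
    """Entry(A,B,C): the n-th (1-based) field of data split on separator."""
    return data.split(separator)[n - 1]


print(replace("ABCCCBA", "CCC", "YYY"))
print(substring("ENGPRO", 1, 3))
print(entry(2, "AAA|BBB|CCC|DDD", "|"))
```

The three calls at the end are the examples from the text, so the output can be compared against the values given there.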
Using the above command table, we can translate the script into a more readable form (the // comments are mine):
FF01 1 TimeStamp=2001-03-09 00:00:00
FUNC Register
  SET ChannelSet="Ad"
  SET ChannelSet3="Ad"
  SET Version=2525
  SET FileVersion=0
  SET SoftwareVersions="eng;engplus;engpro;fra;fraplus;frapro"
ENDFUNC
FUNC Init
  SET UseCookies=True
  1001
  SET SearchQuerySeparator="+"
  1003
  SET Key=SearchQuery
  SET RNDSEED=""
  WHILE Length(RNDSEED)<>12
    SET RNDSEED=String(Random(99999999)*Random(9999))
  WEND
  SET T=Random(999999)
  SET PromoT=Numeric(Substring(RNDSEED,8,1))
  SET PromoTI=Numeric(Substring(RNDSEED,9,1))
  SET Random100=Numeric(Substring(RNDSEED,10,2))
  SET SourceFLYCAST=Replace("ENG|1|http://ad-adex3.flycast.com/server/_img/Copernic/software/$RANDOMNUMBER$|http://ad-adex3.flycast.com/server/click/Copernic/software/$RANDOMNUMBER$","$RANDOMNUMBER$",String(T))
  SET Source247ENG=Replace(Replace("ENG|1|http://connect.247media.ads.link4ads.com/serv/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  SET Source247FRA=Replace(Replace("FRA|1|http://connect.247media.ads.link4ads.com/serv/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$?$KEY$|http://connect.247media.ads.link4ads.com/click/2/fr-Copernic/ros/468x60/40543;uniq=$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  SET SourceUFS="UFS|1|http://banner.unifiedweb.com/cgi-bin/getimage.exe/copernic?GROUP=copernic|http://banner.unifiedweb.com/cgi-bin/redirect.exe/copernic"
  SET SourceVALUECLICK="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0136917&b=1&noscript=1|http://kansas.valueclick.com/redirect?host=hs0136917&b=1&v=0"
  SET SourceVALUECLICKOLD="VALUECLICK|1|http://kansas.valueclick.com/cycle?host=hs0194203&size=468x60&b=indexpage&noscript=1|http://kansas.valueclick.com/redirect?host=hs0194203&size=468x60&b=indexpage&v=0"
  SET SourceSERVERFRA4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  SET SourceSERVERENG4552=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/recent/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  SET SourceSERVERFRA4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/fra/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  SET SourceSERVERENG4551=Replace(Replace("BANNERSERVER|1|http://bannerpush.copernicserver.com/RealMedia/ads/adstream_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$?$KEY$|http://bannerpush.copernicserver.com/RealMedia/ads/click_nx.cgi/copernicclient/free/eng/old/$RANDOMNUMBER$","$KEY$",String(Key)),"$RANDOMNUMBER$",String(T))
  IF Find("ENGUFS",Edition)<>0 // if ENGUFS version
    SET SourceUrl=Entry(3,SourceUFS,"|")
    SET TargetUrl=Entry(4,SourceUFS,"|")
  ELSE
    IF (Find("PLUS",Edition)<>0)or(Find("PRO",Edition)<>0) // PRO or PLUS
      IF BuildNumber>4551 // BUILD > 4551
        SET SourceUrl=Entry(3,SourceVALUECLICK,"|")
        SET TargetUrl=Entry(4,SourceVALUECLICK,"|")
      ELSE // BUILD <= 4551
        SET SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
        SET TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
      ENDIF
    ELSE
      IF BuildNumber>4551 // BUILD > 4551
        SET SelfPromoPercent=0 // clear addshow variable
      ELSE
        IF Substring(Edition,1,3)="FRA" // FRENCH
          SET SelfPromoPercent=0 // clear addshow variable
        ELSE // ENGLISH
          SET SelfPromoPercent=10 // set addshow to 10%
        ENDIF
      ENDIF
      IF Random100<SelfPromoPercent // if random < addshow
        SET SourceUrl=Entry(3,SourceSERVERENG4551,"|")
        SET TargetUrl=Entry(4,SourceSERVERENG4551,"|")
      ELSE // if random >= addshow
        IF BuildNumber>4551 // BUILD > 4551
          IF Substring(Edition,1,3)="FRA" // FRENCH
            SET SourceUrl=Entry(3,SourceSERVERFRA4552,"|")
            SET TargetUrl=Entry(4,SourceSERVERFRA4552,"|")
          ELSE // ENGLISH
            SET SourceUrl=Entry(3,SourceSERVERENG4552,"|")
            SET TargetUrl=Entry(4,SourceSERVERENG4552,"|")
          ENDIF
        ELSE // BUILD <= 4551
          IF Random100>54 // if random > 54
            IF Substring(Edition,1,3)="FRA" // FRENCH
              SET SourceUrl=Entry(3,Source247FRA,"|")
              SET TargetUrl=Entry(4,Source247FRA,"|")
            ELSE // ENGLISH
              SET SourceUrl=Entry(3,Source247ENG,"|")
              SET TargetUrl=Entry(4,Source247ENG,"|")
            ENDIF
          ELSE // random <= 54
            SET SourceUrl=Entry(3,SourceVALUECLICKOLD,"|")
            SET TargetUrl=Entry(4,SourceVALUECLICKOLD,"|")
          ENDIF
        ENDIF
      ENDIF
    ENDIF
  ENDIF
  SET RotationInterval=120000
ENDFUNC
11A2
So this is a script which seems to control all the adverts, so surely a bit of creative writing is called for. As we already have a decoder, we can simply reverse the process to encode the file after we have created the new one.
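The nested IF/ELSE tree in Init is easier to follow when restated in an ordinary language. What follows is only my sketch of it in Python, not code from Copernic: the function name select_ad_source is made up, Edition is assumed to be an upper-case string such as "ENGPRO", and Find(...)<>0 is treated as a plain substring test.

```python
def select_ad_source(edition, build_number, random100):
    """Return the name of the Source... variable the Init script would use.

    edition      -- assumed upper-case edition string, e.g. "ENG" or "FRAPLUS"
    build_number -- the BuildNumber value from the script
    random100    -- the two-digit Random100 value (0..99)
    """
    if "ENGUFS" in edition:
        return "SourceUFS"
    if "PLUS" in edition or "PRO" in edition:
        if build_number > 4551:
            return "SourceVALUECLICK"
        return "SourceVALUECLICKOLD"

    french = edition[:3] == "FRA"
    # SelfPromoPercent is only non-zero for old English free builds.
    if build_number > 4551 or french:
        self_promo_percent = 0
    else:
        self_promo_percent = 10

    if random100 < self_promo_percent:
        return "SourceSERVERENG4551"
    if build_number > 4551:
        return "SourceSERVERFRA4552" if french else "SourceSERVERENG4552"
    if random100 > 54:
        return "Source247FRA" if french else "Source247ENG"
    return "SourceVALUECLICKOLD"


print(select_ad_source("ENG", 4500, 80))
print(select_ad_source("FRA", 4600, 30))
```

Walking a few editions and build numbers through the function is a quick way to see which advert server a given installation would end up talking to.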
We can also figure out a couple of other things. The first is that the following segment is the header for each file; it does not seem to contain any of the script commands found so far, or even the characters for them. This segment seems to be present at the start of all the files:
FF01 1 TimeStamp=2001-03-09 00:00:00
The second is this entry at the end of the file, which seems to be a footer of some kind. At first glance it looks like it might be some form of CRC:
11A2
But what if you are told that the length of this file in hex is 0x11C4? Another example is a file with a footer of 03AC and a file length of 0x3CE.
So if we do 0x11C4 - 0x11A2 we get 0x22, and 0x3CE - 0x3AC = 0x22. This means that this entry is the length of the file minus 0x22 (34 decimal). So if we are to alter the config file (with the hope of replacing it), then we should put the correct value into this entry as well as encoding the file.
It should be noted that in experiments the file was not parsed and loaded unless this file length value was correct. Copernic probably uses it when parsing the input file, to strip the header, so it must give the length of the data within the file. Make sure this value is set correctly!
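The footer arithmetic is simple enough to put in a helper. This is a small sketch in Python under the assumption, taken from the two files measured above, that the footer is always the total file length minus 0x22, written as four upper-case hex digits. The names HEADER_OVERHEAD, footer_for_length and footer_matches are mine.

```python
HEADER_OVERHEAD = 0x22  # observed difference between file length and footer


def footer_for_length(file_length):
    """Return the 4-digit hex footer expected for a file of this total length."""
    if file_length < HEADER_OVERHEAD:
        raise ValueError("file is shorter than the 0x22 bytes of overhead")
    return format(file_length - HEADER_OVERHEAD, "04X")


def footer_matches(file_length, footer):
    """Check a footer read from a file against the file's total length."""
    return footer.upper() == footer_for_length(file_length)


print(footer_for_length(0x11C4))  # 11A2
print(footer_for_length(0x3CE))   # 03AC
```

Both printed values agree with the footers seen in the two example files, which is the only evidence I have for the rule.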
Search Query Spying
It should be noted that all adverts that are grabbed from the two servers "bannerpush.copernicserver.com" and "connect.247media.ads.link4ads.com" contain the user query variable from the script in the request. This means that if your parameters cause adverts to be grabbed from either of these two locations then they are getting details on what you are searching for.
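To make the point concrete, here is a sketch in Python of what the nested Replace calls do to the Source247ENG template from the script. The template string is copied from the decoded file; the function name build_ad_urls and the example query and random number are mine, and I am assuming the query words arrive joined with "+" as SearchQuerySeparator suggests.

```python
# Template copied from the Source247ENG line of the decoded script.
TEMPLATE_247ENG = (
    "ENG|1|http://connect.247media.ads.link4ads.com/serv/2/Copernic/ros/"
    "468x60/40543;uniq=$RANDOMNUMBER$?$KEY$"
    "|http://connect.247media.ads.link4ads.com/click/2/Copernic/ros/"
    "468x60/40543;uniq=$RANDOMNUMBER$"
)


def build_ad_urls(template, search_query, t):
    """Mimic Replace(Replace(template,"$KEY$",Key),"$RANDOMNUMBER$",T).

    Returns the 3rd and 4th "|" separated fields, which the script stores
    in SourceUrl and TargetUrl via Entry(3,...) and Entry(4,...).
    """
    filled = template.replace("$KEY$", str(search_query))
    filled = filled.replace("$RANDOMNUMBER$", str(t))
    fields = filled.split("|")
    return fields[2], fields[3]


# Hypothetical query and random number, for illustration only.
source_url, target_url = build_ad_urls(TEMPLATE_247ENG, "reverse+engineering", 123456)
print(source_url)
print(target_url)
```

The first URL, the one used to fetch the banner, ends with the search terms, while the click URL does not carry them.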
You can verify this for yourself by looking at the above script and finding the entries for these two servers.
Advert Removal
Even though the 'PRO' version has a tick box to turn off adverts, I assumed that the free version probably displays loads of adverts. It also puzzles me why anyone with the pro version would leave the tick box turned on. Perhaps both versions use the same dialog, and in the free version the box is simply ticked and disabled so the user cannot change it; I will not verify this. But this gave me an idea: if all versions use the config files, then we can make a new one for the free version, thus removing that part of the whole advert problem.
So the task was to create a new version of 'copernic.csf' with the references to the advert servers removed. Because I was not sure of the effect of returning empty strings, I chose instead to point the requests to the local machine. This should at least avoid the remote requests and save the user the bandwidth spent fetching the advert images.
This is my version of the script:
FF01 1 TimeStamp=2001-03-09 00:00:00
0015Register
0011ChannelSet="Ad"
0011ChannelSet3="Ad"
0011Version=2525
0011FileVersion=0
0011SoftwareVersions="eng;engplus;engpro;fra;fraplus;frapro"
0016
0015Init
0011UseCookies=True
1001
0011SearchQuerySeparator="+"
1003
0011SelfPromoPercent=0
0011SourceUrl="http://127.0.0.1/"
0011TargetUrl="http://127.0.0.1/"
0011RotationInterval=120000
0016
11A2
We should not forget to change the size value at the end, so set it to the length of the file minus 0x22, and write the encoded file to 'copernic.csf'.
Also 'updates.copernic.com', 'regcards.copernic.com' and 'www.copernic.com' should be added to your hosts file as local host, or to the banned list for your local proxy ;) This is to stop any updates or personal data transfer from happening. It should stop the software from phoning home, and hopefully removes all adverts without having to touch any of the code. After all, we are simply using the program's own scripts against itself.
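For the hosts file route, the entries are just an address followed by a host name. A tiny Python sketch that prints them is given here; the three host names are the ones listed above, while the helper name hosts_lines and the choice of 127.0.0.1 as the local address are my own.

```python
# Host names taken from the text above.
BLOCKED_HOSTS = [
    "updates.copernic.com",
    "regcards.copernic.com",
    "www.copernic.com",
]


def hosts_lines(hostnames, address="127.0.0.1"):
    """Return hosts-file lines that point each name at the given address."""
    return ["{} {}".format(address, name) for name in hostnames]


print("\n".join(hosts_lines(BLOCKED_HOSTS)))
```

The printed lines can be appended to the hosts file by hand.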
I have not tested this but it should work, and I see no reason why it would not have the desired effect!
Adding a Group
Looking at the decoded .ssf and .csf files you will see that they share the same scripting language, with a few additions. So the thought was: as it parses all the files in the set directories and not specific ones, could a new file or files be added, and so add engines and groups to the copernic engine? This would mean that we are no longer tied to the ones they supply, and it would also prove how it works.
Using one of the group files as an example, the following file was created:
FF01 1 TimeStamp=2001-03-15 00:00:00
0015Register
0011_Conv="4002->3999 (01-03-15, 10:58:42)"
0011DisplayName="Custom"
0011DisplayNames("FRA")="Custom French"
0011DisplayNames("DEU")="Custom German"
0011DisplayNames("ITA")="Custom Italian"
0011DisplayNames("ESP")="Custom Spanish"
0011DisplayNames("POR")="Custom Portugese"
0011Description="Custom Search Group"
0011Descriptions("FRA")="Custom Search Group"
0011Descriptions("DEU")="Custom Search Group"
0011Descriptions("ITA")="Custom Search Group"
0011Descriptions("ESP")="Custom Search Group"
0011Descriptions("POR")="Custom Search Group"
0011ResultsPerChannel=10
0011TotalResults=1000
0011Version=3000
0011FileVersion=1
0011AutoUpdate=True
0011SearchType="keywords"
0016
0015AfterDownload
0016
This file was saved as 'Custom.ssf', encoded using the encode routine and placed in the 'Categories' directory. Now to run the application and see if the group is now in the lists. The puzzling thing was that the group did not appear in the drop down of groups, or the main tab on the left giving all the groups, but if we do a search and then in that screen browse the groups it is there at the bottom of the list. This might be because we have no search engines assigned to this group. When we find the group setting in the category dialog it shows no engines under the group. This is a good sign.
Note that the group appears only at the end of the list in the categories dialog until you have either done a search using that group or closed the program and reopened it, then it seems to be alpha sorted into the list.
Adding a Search Engine
To create a search engine file, I will use searchlores' own Namazu engine as an example. The following file was created:
FF01 1 TimeStamp=2001-03-09 00:00:00
0015Register
0011_Conv="4002->3999 (01-03-09, 10:52:49)"
0011DisplayName="Namazu"
0011HomePage="http://www.searchlores.org/"
0011SupportNew=True
0011Category="Custom"
0011Version=3000
0011FileVersion=2
0011AutoUpdate=True
0011ChannelSet="Custom"
0011ChannelSet3="Custom"
0011SupportOr=True
0011SupportAnd=True
0011SupportQuotes=True
0016
0015Init
0011SourceUrl="http://www.searchlores.org/cgi-bin/search?query="
0011ResultsPerPage=20
100A("")
1004("searchlores.org")
0011Rules("Range").StartMarker="Search Results for"
0011Rules("Range").EndMarker=""
0011Rules("Address").Key=True
0011Rules("Title").StartMarker=">"
0011Rules("Title").EndMarker=""
0011Rules("Title").StartLine=0
0011Rules("Title").NbLines=1
0011Rules("Description").StartMarker=""
0011Rules("Description").EndMarker=""
0011Rules("Description").StartLine=0
0011Rules("Description").NbLines=1
0011SearchQuerySeparator="+"
1003
0016
0015BeforeDownload
1001
1002("query="+SearchQuery)
1002("result=normal")
1002("sort=score")
1002("max=20")
0016
0015AfterDownload
0016
This file was saved as 'Namazu.csf', encoded using the encode routine and placed in the 'Categories\Engines' directory. Now to run the application and see if the group is now in the lists.
Nope, the group is not in the normal lists, but it is still in the category dialog. Also, if you click on a group to do a search it is in the dropdown box, and when viewing it you can see the Namazu engine within the group. So that worked quite well; I still have to figure out how to get it into the quick groups dropdown and the left hand list in the main view.
But I can select the group and also the search engine, and the request does seem to go out (to the local proxy). So the engine configuration and group configuration will pick up any files you place in the app directories. This is really nice and opens up a lot of possible routes.
It should be noted that the above file for Namazu is not quite complete: the results parsing part has been taken from another file and may not match, but the parameters passed in are correct. Examination of the engine configuration files is recommended, as their scripting language allows some very nice things to be performed and is certainly powerful enough for the task required.
After a bit of looking round the menus in copernic (I had not used it before) I spotted Options in the Tools menu. In Options there is a button labelled 'Category Bar' settings. OK, so let's click on it. We have all the other groups in one list as being part of the category bar (the groups shortcut menus), and Custom sitting alone in the other list (not included), so this seems simple. Select the group and add it to the other list using the supplied button, then use up or down to put it where you want. Now exit from this dialog. Lo and behold, the groups list in the main view now contains the group 'Custom', and if we look inside Custom there is 'Namazu'. So adding groups and engines is now possible with copernic.
Conclusions
My aim was not to take the program apart too much, just to get to the data on the search engines without spending hours looking at assembler code. But during this task I have found out many things about how this program does other things, some good and some bad. There are a lot of hardcoded bits, especially to do with language and syntax (lexicon), which cannot be changed by updates, or at least that is how it appears. I do not like the intrusive phone home features of this product at all, although at least it uses the proxy you give it for these requests and does not try to bypass it like some similar products.
I was very disappointed with the encryption on the data files; mind you, the application was coded in Delphi. But seriously, you would have thought the developers would have put a bit more in. After all, if you are going to put some encryption in, at least make it worthwhile. The task was also made a bit easier by the fact that the filenames and directory structure of the configuration files told you exactly what group or engine each file related to and what to expect in each file. It seems like the author wants you to get the data out of the program, or at least does not want to make our task too hard.
In hindsight (always a good thing), once it had been decided that the method of encryption was a substitution cipher, a known plaintext attack on the encoded files would have been possible. Collecting the request URLs from the proxy server, the strings from the executable and the details in the groups files would have given enough data to recover the encoding method. This would have worked just as well as the path I chose to follow, and without even having to touch a disassembler or debugger, though it might have taken a bit longer. I chose to grab the plaintext from the program, so that a whole file of plaintext could be grabbed in one go and a translation table built easily, but a partial plaintext lookup generator program would have worked equally well.
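The known plaintext idea can be sketched in a few lines of Python. This assumes the simplest case, a fixed byte for byte substitution that does not depend on position, and it uses a toy cipher of my own invention for the demonstration, not Copernic's real table. The names build_table, decode and toy_encode are hypothetical.

```python
def build_table(ciphertext, plaintext):
    """Learn a cipher byte -> plain byte mapping from an aligned sample pair."""
    if len(ciphertext) != len(plaintext):
        raise ValueError("samples must be aligned and of equal length")
    table = {}
    for c, p in zip(ciphertext, plaintext):
        # A conflict means the cipher is not a simple substitution.
        if table.setdefault(c, p) != p:
            raise ValueError("byte %#04x maps to two different values" % c)
    return table


def decode(ciphertext, table, unknown=ord("?")):
    """Decode with a partial table, marking bytes not yet recovered."""
    return bytes(table.get(c, unknown) for c in ciphertext)


# Toy substitution used only to exercise the functions above.
TOY_KEY = {b: (b * 7 + 3) % 256 for b in range(256)}


def toy_encode(data):
    return bytes(TOY_KEY[b] for b in data)


# Pretend this string was guessed from a request seen at the proxy.
known = b'0011SourceUrl="http://www.searchlores.org/cgi-bin/search?query="'
table = build_table(toy_encode(known), known)

secret = toy_encode(b"0011ResultsPerPage=20")
print(decode(secret, table))
```

Characters that never appeared in the known sample come out as "?", which shows where more plaintext is still needed before the table is complete.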
The scripting language they have included interested me most. It has some nice ideas in it, even though it seems to have its roots in a BASIC type language. Bot writers and OSLSE project fans should examine this and how it works to learn many things. It can provide many pointers and ideas to programmers of VSLs for bots and other such programs, as it is simple in concept but offers expandability and flexibility. It also seems a lot more flexible than a simple macro type VSL, where you include commands in strings and then parse them out, as in webferret. This is not meant to say that one is better or worse than the other, but that both are interesting, and that it would be easier to include the webferret idea in this than the other way around. From looking at it, it would be very simple to parse and implement because of its defined structure and the flexibility of being text based and not some form of microcode. This also makes it very suitable for inclusion in a format such as XML, as an embedded script.
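To back up the claim that the language is simple to parse, here is a minimal sketch in Python that turns a flat list of decoded statements into a nested tree using a stack. It relies only on the block commands identified earlier (0012/0013/0014 for IF/ELSE/ENDIF, 0015/0016 for FUNC/ENDFUNC, 0018/0019 for WHILE/WEND); everything else is stored as a plain statement. The node layout and the name parse are my own choices, not anything taken from Copernic.

```python
OPENERS = {"0012", "0015", "0018"}                          # IF, FUNC, WHILE
CLOSERS = {"0014": "0012", "0016": "0015", "0019": "0018"}  # end -> opener
ELSE = "0013"


def make_node(op, arg):
    return {"op": op, "arg": arg, "body": [], "orelse": []}


def parse(lines):
    """Build a nested tree from decoded statements, one statement per line."""
    root = make_node("ROOT", "")
    # Each stack entry is (open block, list currently being filled).
    stack = [(root, root["body"])]
    for line in lines:
        op, arg = line[:4], line[4:]
        node, target = stack[-1]
        if op in OPENERS:
            child = make_node(op, arg)
            target.append(child)
            stack.append((child, child["body"]))
        elif op == ELSE:
            if node["op"] != "0012":
                raise ValueError("ELSE without IF")
            stack[-1] = (node, node["orelse"])  # switch to the ELSE branch
        elif op in CLOSERS:
            if node["op"] != CLOSERS[op]:
                raise ValueError("unbalanced block end: " + line)
            stack.pop()
        else:
            target.append(make_node(op, arg))
    if len(stack) != 1:
        raise ValueError("unclosed block at end of script")
    return root["body"]


sample = [
    "0015Init",
    "0012BuildNumber>4551",
    "0011SelfPromoPercent=0",
    "0013",
    "0011SelfPromoPercent=10",
    "0014",
    "0016",
]
tree = parse(sample)
print(tree[0]["arg"], len(tree[0]["body"]))
```

An interpreter would then only need to walk this tree and evaluate the expressions, which is the part this sketch deliberately leaves out.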
Final Thoughts
Firstly I would like to point out that you should try to learn how your target works before trying to take it apart. Reading the essay, you should hopefully have seen how the clues picked up early on helped later in the process. While you are installing, LOG what the program does. When you run the program for the first and subsequent times, LOG what the program does. These log files will not cost you anything to make (apart from the time to start filemon and regmon) and will save you doing it later. Then when a question comes up you do not have to think "oh, I must uninstall and reinstall to get a log of every change", when not everything may be removed or put back, depending on the program. So do it the first time. Pick your target and work it, right from the start.
After seeing the script code I realised that I was trying to over complicate matters and produce some fancy parsing macro type thing for the parsing part of my bot. Seeing this has brought me back to a simple but very expandable idea, which will be much easier to implement and expand as development requires. Sometimes it takes seeing another point of view to bring some clarity to your thoughts and put you back on the right track.
If you are going to write a paper on a subject you would normally research other works on the same subject first; surely the same should be done if you are working on some software. This might save you from reinventing the wheel as a square. I am not saying use their ideas exactly as they do, but you should observe and learn from them, then create a solution which brings together all the parts most suited to your task.
I would also like to point out that people tend to download and use software without really understanding what it does, or what data about them goes where. You should take care over what software you use and should understand the hidden data that it sends about you. A prime example is the entry in the advert request in this product which tells them what you are searching for, quite apart from the update and regcard information. Most products of this type seem to conduct this form of activity, and the users should be made aware of this before using the products.
The use of adverts in products is actually robbing, yes robbing, the users of their precious bandwidth: while they are showing adverts you are losing bandwidth. I believe that reducing the advert shown to a 1x1 image or simply hiding the advert is not a solution, as you are still using bandwidth. The only proper method of advert removal is to make sure the request never gets out, or at least not as far as your internet connection.
Disclaimer
I must point out that during the writing of this essay, at no point was Copernic allowed to interact with the internet in any way shape or form. It has now been removed from the PC it was installed on and will not be returning.
A lot of information was gained from log files, and some reversing of course! ;)
Hope you enjoyed reading.
Copyright (c) 2001, WayOutThere
(c) III Millennium: [fravia+], all rights reserved