Courtesy of            
http://www.searchlores.org 
Updated in January 2004





A discussion about the utility of the searchers' library


the library (09/12/03 11:18:43)
    what can we do with it?

    i have printed a number of the articles, and they are of a very high
    academic level.

    let's say we'd want to implement a personalized spider that makes use
    of one of the probabilistic algorithms explained in so many of these
    pdfs. we would still need a huge amount of research, to even understand
    the article.
    well ok, depending on your background you may need more or less, varying
    from a beginner's course in statistics and graph theory to just a quick
    refresher on what the SAT (satisfiability) problem was again.

    i'm trying to print the localsearch.pdf now, but i have a hard time convincing
    the printer script that it should print it as size A4, not Letter :)

    anyway, this is hard stuff. any ideas on how we can put such theory to
    work for our searching needs?

    just read it - ok, that's easy. slightly boring, even. but perhaps if you read
    a large number of these essays you get a general feeling for how search
    algorithms are developed, and that might prove very useful indeed.
    pick one and work on it - you need to pick just one, because it's a lot of
    work. you need to do background research to figure out the algos that are
    referred to but that you don't understand, etc.

    so. :)

    - ritz
ritz

Re: the library (09/12/03 22:00:55)
    Yeah, or we can go one step further and try to use tools which are able to "summarize" those articles. Oh, first we need to find/set up/evaluate those...
    I feel EVERY JOB, even the ones we're doin' for fun, is just chained together with a lot of other jobs, and so on... like a web :-)?
    But the interesting question IMO is: in this wonderful technoworld of ours, where we have these supercomputers and everything from the net, what is the shortcut to knowledge? Translate everything to your own language, AND automatically summarize/index it? I am a librarian type, I love to read and learn, but I am able to read only a small part of the stuff I collect. THE ULTIMATE PERSONAL SOFTWARE IS some kind of secretary, I think.
    And also a nice but different path is reinventing things. Just build your own tool without reading anything, refine it, learn from its weaknesses.

    You can try some pdf2txt, they come with ghostscript & TeX I think, if you don't need the fancy printing, only the text.
have

Re: Re: the library (09/12/03 23:05:24)
    "Yeah, or we can go one step further and try to use tools which are able to 'summarize' those articles. Oh, first we need to find/set up/evaluate those...
    I feel EVERY JOB, even the ones we're doin' for fun, is just chained together with a lot of other jobs, and so on... like a web :-)?"


    .. so we need to find a way to organize a huge web of knowledge.. hmm
    i think i know some pdfs that discuss such a thing.. ^_^

    and we're back to start..

    perhaps we can put a bit of an inductive loop here somewhere.. ehm
    bootstrapping anyone? :))))

    "But the interesting question IMO is: in this wonderful technoworld of ours, where we have these supercomputers and everything from the net, what is the shortcut to knowledge? Translate everything to your own language, AND automatically summarize/index it? I am a librarian type, I love to read and learn, but I am able to read only a small part of the stuff I collect. THE ULTIMATE PERSONAL SOFTWARE IS some kind of secretary, I think.
    And also a nice but different path is reinventing things. Just build your own tool without reading anything, refine it, learn from its weaknesses.

    You can try some pdf2txt, they come with ghostscript & TeX I think, if you don't need the fancy printing, only the text."


    yeah i got pdf2txt, but have you tried to read some of those scientific
    papers? they're full of formulas (LaTeX indeed) ..

    no, my experience is that these are read better in a comfortable chair, on
    paper.. or in the bus, or waiting for it.. or being somewhere else, pretending
    you're reading something highly interesting (which it is) :)

    i have printed localsearch.pdf, reading it whenever i have time (which is
    not right now) .. looks interesting, it's about traversing a 'small world'
    exponential linkage graph (like the web, or p2p networks) .. perhaps i can
    implement it in a simple spider.. although i never got the spider example
    on searchlores to work.. (didn't try really hard, though)..

    if anyone else has read any of those library pdfs, i'd love to hear about
    their thoughts..

    perhaps write some small summary or whatever about it, to link on the
    library page?

    - ritz
ritz

"Just build your own tool without reading anything, refine it, learn from its weaknesses." (10/12/03 10:41:03)
    My sentiments exactly. I wanted to implement a proximity search algo and tried a couple of 'academic' papers which were just a load of dingo's kidneys, so I scrapped them and started from scratch. Had two minor rewrites since - to allow for fuzzy and exact string searching.

    On the other hand, for the exact string search I will (when I find time to complete it) use an academic paper (also 'crappy' in a way - the guy released three almost identical papers in three consecutive years... also, I'm sure that this is a standard algorithm that has already existed for some years). Well, it's actually for *substring* searching in a string, but the algo is just what I need.

    The major problem I see with these papers is that they are made to sound pompous in order to impress the other empty 'academic' heads. For a person that just wants to make the damn thing work they are tedious to read, hard to quickly evaluate and can probably be replaced with an hour or so serious thinking on the problem. Now, with the more avant-garde problems they may be the only source of reliable knowledge, which sadly means that one has to swallow the tons of crap to find the important bits in there, but hey - nobody said it should be easy :)

    The scholars just a decade ago had to battle with library catalogues, countless issues of scientific magazines etc. - we're just lucky :))
Mordred

Re: "Just build your own tool without reading anything, refine it, learn from its weaknesses." (10/12/03 15:31:47)
    The odd paper or two is actually readable. And this is the only way to popularize a topic. See how Paul Graham popularized naive Bayes by explaining the algo nicely in 'A Plan for Spam'.

    Then there is Terry Welch who did LZW, and actually made it readable in '84 I think... this started off the mad LZW boom iirc. Before then no one implemented it coz it was locked up in academic-crap-papers.

    There are a million other algorithms that are wonderful to behold and that make me shed a single tear. but most of these are also locked up.

    I have a nice idea for an algo worth popularizing: one that still hasn't hit the masses, but is rather cool. It is for visualization, and there is a tool called 'spacemonger' that implements it on win9x. This visualization algo has much merit. Check it out and think about how it can be used to represent search matches -- clustering, etc.

    Enjoy and get cracking!

rai.jack
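
A rough sketch of how such a display could be used for search matches. It assumes the layout is a treemap (as far as I know that is the family Spacemonger's display belongs to) and uses the simplest "slice-and-dice" variant, which is not necessarily what Spacemonger itself does. The function name and the (label, weight) input format are made up for illustration.

```python
def slice_and_dice(items, x, y, w, h, horizontal=True):
    """Lay out (label, weight) pairs as rectangles inside the box (x, y, w, h).

    Each rectangle's area is proportional to its weight.
    Returns a list of (label, x, y, w, h) tuples.
    """
    total = sum(weight for _, weight in items)
    rects = []
    if total <= 0:
        return rects  # nothing to draw
    offset = 0.0  # fraction of the box already used
    for label, weight in items:
        share = weight / total
        if horizontal:
            # split the box left to right
            rects.append((label, x + offset * w, y, share * w, h))
        else:
            # split the box top to bottom
            rects.append((label, x, y + offset * h, w, share * h))
        offset += share
    return rects


if __name__ == "__main__":
    # hypothetical clusters of search matches, weight = number of hits
    clusters = [("spiders", 50), ("compression", 30), ("bayes", 20)]
    for rect in slice_and_dice(clusters, 0, 0, 100, 100):
        print(rect)
```

To show sub-clusters, call the function again inside each returned rectangle with horizontal flipped, so the cuts alternate direction at every level.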

Re: Re: "Just build your own tool without reading anything, refine it, learn from its weaknesses." (10/12/03 19:41:25)
    It seems like many of the writers of the "old school" are much more readable than modern technical authors. Good technical writing is characterized by plain speaking and clear thinking. For instance, Claude Shannon's papers are among the most readable you could ask for, as are Knuth's, or Henry Baker's. Paul Graham, Ron Rivest and Jeremy Gibbons (functional programming guy) are all excellent technical writers too.

    There is a correlation between the writing ability and the quality of the ideas, I think. Overwrought writing is maybe a result of trying to inflate poor and undercooked ideas.
sonof

Re: Re: Re: Conclusion (10/12/03 22:13:39)
    "The major problem I see with these papers is that they are made to sound pompous in order to impress the other empty 'academic' heads. For a person that just wants to make the damn thing work they are tedious to read, hard to quickly evaluate and can probably be replaced with an hour or so serious thinking on the problem. Now, with the more avant-garde problems they may be the only source of reliable knowledge, which sadly means that one has to swallow the tons of crap to find the important bits in there, but hey - nobody said it should be easy :)"
    "There is a correlation between the writing ability and the quality of the ideas, I think. Overwrought writing is maybe a result of trying to inflate poor and undercooked ideas."

    So there is a chance to write:
    1. a parser which classifies the document and rings the alarm if it is "Overwrought"? Make a diff between wordlists from "good" and "bad" articles, and fish the "stopwords" out of the bad ones.
    2. Or a dumb summarizer/distiller which changes exuberant terms to simple ones (or, if they mean nothing, to nothing). One tool that works this way is Solvay's Newspeak.
have
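
A minimal sketch of the wordlist diff from point 1 above, under stated assumptions: the "good" and "bad" corpora are plain strings you have sorted by hand, and the function names, the min_count cutoff and the 0.1 threshold are all invented for illustration, not tuned on real papers.

```python
import re
from collections import Counter


def word_counts(text):
    """Lowercase the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def overwrought_words(good_texts, bad_texts, min_count=2):
    """Words seen at least min_count times in the bad corpus and never in the good one."""
    good = Counter()
    for text in good_texts:
        good.update(word_counts(text))
    bad = Counter()
    for text in bad_texts:
        bad.update(word_counts(text))
    return {w for w, n in bad.items() if n >= min_count and w not in good}


def overwrought_score(text, flagged):
    """Fraction of the words in text that are on the flagged list."""
    counts = word_counts(text)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(n for w, n in counts.items() if w in flagged) / total


def is_overwrought(text, flagged, threshold=0.1):
    """Ring the alarm when the flagged share exceeds the (arbitrary) threshold."""
    return overwrought_score(text, flagged) > threshold


if __name__ == "__main__":
    good = ["we sort the list and print it", "the list is short"]
    bad = ["we leverage a synergistic paradigm",
           "the paradigm is synergistic and holistic"]
    flagged = overwrought_words(good, bad)
    print(sorted(flagged))
    print(is_overwrought("a synergistic paradigm for sorting", flagged))
```

The same flagged set could feed point 2: a distiller would delete or replace those words instead of just counting them.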

Alternatively, you could play bingo with them :) (11/12/03 16:34:21)
    http://www.hobotraveler.com/wankwordbingo.htm
Mordred

Re: Re: Spacemonger (10/12/03 21:48:34)
    I love Spacemonger and use it; I even thought about making static pictures from the different levels of its output and using them as imagemaps on the front of a CD-navigating system. I made something like a software-encyclopedia (software organized in some logic). What's interesting is that with Spacemonger you can see that class "A" is half the size of class "B", and "C" is a tenth of the whole. Great program. In my collection, on the same level as Spacemonger, there is "Scanner" by Steffen Gerlach; check that one out if you want.
have

http://www.tigerbliss.com/disk_analysis.html (11/12/03 00:31:15)
    Can you explain your CD-navigating system and your software-encyclopedia? It sounds interesting. Thanks!

rai.jack

yet another: http://www.methylblue.com/filelight/ (n/t) (11/12/03 00:52:46)

sonof

Re: explain... (12/12/03 00:13:37)
    Now I remember SequoiaView; I don't like it as much as Spacemonger.
    The navigating system is nothing special. If you did not understand it from my post above, that is my fault. How you "navigate" your HD with Spacemonger is clear to you. Now, a CD is a static thing. You can take screenshots of Spacemonger's output at every level of the directory tree, starting from your chosen (virtual) CD root, then turn those pictures into imagemaps on html pages, organize the pages into some structure, and burn them together with the original data. What you have in the end is Spacemonger's functionality through a browser, without using the program itself (which makes the CD a bit more platform-independent).
    The software-encyclopedia follows the same line as above: I didn't see any solution I loved, so I worked on my own. The target (was) to pick the best/freeware tools for doing fundamental things on a PC and organize them in some reusable form for the younger ones, to teach them not to fear the machine, to find simple solutions to problems which look too difficult for a 'user/student', to encourage them. It is a sketch; I wish to split it into a 'necessary' and a 'recommended' pack maybe (or kid and adult). The future big thing may include docs and programs in each section, so you can read the iso-doc and try a tool like isobuster. The main nodes are:
    1 byteorganization ( partitions, fileformats, crc, backup )
    2 packing ( data-in/out, program-in/out )
    3 visualization ( viewers, editors )
    4 analyze ( data, program, diffs )
    5 search ( offline, online, database )
    6 convert ( automatic-data/program, manual-sed,awk )
    7 tools ( offline, online ), mostly the programs which clean up windows's shit
    8 advanced ( registry, filemanagers, inctrl )
    9 progtools ( dummys, winshow-likes, PE-muckers )
    A interne ( defense, offense, lowlevel )
    B teaching ( geogr., radcarbon, ET-count, pi, calculators )
    C security ( some crypto-tool )
    AppendixA ( needed, missed dlls, maybe later with scripts )
    AppendixB BOOKS ( comp-related )
    AppendixC WEB_MIRROR

have
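
The screenshot-to-imagemap step described in the post above could be scripted roughly like this. It is only a sketch: it assumes you already know the pixel rectangle of each directory block in a screenshot (read off by hand, for instance), and the function name, file names and coordinates are all hypothetical.

```python
from html import escape


def image_map_html(image_file, map_name, areas):
    """Build an HTML fragment: an <img> plus a <map> of clickable rectangles.

    areas is a list of (href, label, left, top, right, bottom) in pixels.
    """
    lines = [
        f'<img src="{escape(image_file)}" usemap="#{escape(map_name)}" '
        f'alt="{escape(map_name)}">',
        f'<map name="{escape(map_name)}">',
    ]
    for href, label, left, top, right, bottom in areas:
        # one clickable rectangle per directory block in the screenshot
        lines.append(
            f'  <area shape="rect" coords="{left},{top},{right},{bottom}" '
            f'href="{escape(href)}" alt="{escape(label)}">'
        )
    lines.append("</map>")
    return "\n".join(lines)


if __name__ == "__main__":
    # hypothetical top-level screenshot with two directory blocks
    areas = [
        ("tools/index.html", "tools", 0, 0, 200, 150),
        ("search/index.html", "search", 200, 0, 400, 150),
    ]
    print(image_map_html("root.png", "root", areas))
```

One such page per directory level, each linking down to the next level's page, gives the browsable structure that gets burned to the CD next to the data.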









© 1952-2032 Fravia's searchlore, all rights reserved, all wrongs reversed