Ask, and it shall be given you; seek, and ye shall find; knock, and it shall be opened unto you.
First published @ Searchlores in June 2007 | Version 0.01 | By Nemo
Although search engines often have the data you want, that data is not always presented in the way you need. For instance, search engines show a snapshot of each document in their search results, but they do not show frequency data about the words appearing in those snapshots. In other cases the engines' search syntax is not versatile enough: you have no way of excluding dynamic documents (those with a '?' in the URL), or of sorting search results by date to see who was the first to have a given idea, even when the last modified date is shown in the search results. Post-processing search results may save the day by presenting or sifting them in a more adequate way. In yet other cases the information you want is scattered among several services and you need to combine them to do your research.
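The two sifting tasks mentioned above (dropping dynamic documents and ordering results by date) could be sketched as follows. This is only an illustration: the result objects and their field names (url, title, modified) are assumptions made for the example, not the format returned by any particular search API.

```javascript
// Hypothetical result shape: { url, title, modified } where 'modified'
// is a Unix timestamp in seconds. These field names are assumptions.

// Keep only static documents, i.e. those whose URL has no '?'.
function excludeDynamic(results) {
  return results.filter(function (r) {
    return r.url.indexOf('?') === -1;
  });
}

// Return a copy of the list sorted by last modified date, oldest first.
function sortByDate(results) {
  return results.slice().sort(function (a, b) {
    return a.modified - b.modified;
  });
}

// Made-up sample data, for illustration only.
var sample = [
  { url: 'http://example.com/page.php?id=3', title: 'Dynamic', modified: 1180000000 },
  { url: 'http://example.org/essay.htm', title: 'Later essay', modified: 1170000000 },
  { url: 'http://example.net/first.htm', title: 'First essay', modified: 1100000000 }
];

var sifted = sortByDate(excludeDynamic(sample));
for (var i = 0; i < sifted.length; i++) {
  console.log(sifted[i].title + ' ' + sifted[i].url);
}
```

Sorting a copy (slice) rather than the original list leaves the raw results available for other views of the same data.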
Thanks to Yahoo! APIs, which offer JSON access with a callback function, and Yahoo! Pipes, which offers JSON access with a callback function and can interface with foreign APIs, you can do these tasks browser side using javascript and share those scripts with others.
With the advent of web 2.0, more and more sites offer APIs (Application Programming Interfaces) to access some of their services. Those services range from plain vanilla RSS feeds (a sort of live bookmarks, which sites update when they have new content and recent browsers check automatically) to more sophisticated ones such as the Google or Yahoo! APIs. Until the end of 2005 all but one of those APIs (del.icio.us, bought by Yahoo!) required a server side proxy due to XMLHttpRequest's cross domain hassles, but everything has changed, because Yahoo! now offers JSON (JavaScript Object Notation) access to its APIs and all you need is a not too old javascript enabled browser (IE 5.01+, Netscape 7.2+, Opera 7.54+, Firefox 1.06+ or Safari 1.3+).
Let's see an easy example to get started on JSON wizardry. The main ingredients to do the magic are the following:
<script type="text/javascript" src="http://differentsite.com/script.js"></script>
which means that the url can be used to encode some actions previously taken by the user. In our example we will use Yahoo's web and image search APIs to encode those actions.
var script=document.createElement('script');
script.src=url;
script.type="text/javascript";
document.getElementsByTagName('head')[0].appendChild(script);
http://developer.yahooapis.com/TimeService/V1/getTime?appid=YahooDemo&output=json
To see it in action, click the Get Time button of the following page. WARNING: it will not work if you have javascript disabled. The &rnd=" + Math.random() part is a workaround for IE and Opera, which cache the script when the script's url doesn't change (Yahoo ignores unknown variables). Usually script caching is not a problem, except in those rare cases where the url remains the same but the script's content changes during the browser's session.
<html>
<head>
<title>Yahoo Time</title>
<script type="text/javascript">
// Ask Yahoo for the time by adding a script tag to the page.
function get_time() {
  var script=document.createElement('script');
  script.src="http://developer.yahooapis.com/TimeService/V1/getTime?appid=";
  script.src+="YahooDemo&output=json&callback=YahooTime&rnd=" + Math.random();
  script.type="text/javascript";
  document.getElementsByTagName('head')[0].appendChild(script);
}
// Callback named in the url: Yahoo wraps its JSON answer in a call to it.
function YahooTime(json) {
  // Timestamp comes in seconds, Date wants milliseconds.
  var theDate=new Date(json.Result.Timestamp * 1000);
  alert("Time: " + theDate.toGMTString());
}
</script>
</head>
<body>
<input type="submit" value="Get Time" onclick="get_time()">
</body>
</html>
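As a rough sketch, the url used in the example above can also be assembled by a small helper that escapes each value and appends the callback and the random cache-busting variable. The helper name buildJsonUrl and its params object are made up for this illustration; only the variables appid, output, callback and rnd come from the example itself.

```javascript
// Build a script url for a JSON-with-callback request.
// buildJsonUrl is a hypothetical helper, not part of any Yahoo! library.
function buildJsonUrl(base, params, callbackName) {
  var parts = [];
  for (var key in params) {
    if (Object.prototype.hasOwnProperty.call(params, key)) {
      // Escape names and values so spaces, '&' and '=' cannot break the url.
      parts.push(encodeURIComponent(key) + '=' + encodeURIComponent(params[key]));
    }
  }
  parts.push('callback=' + encodeURIComponent(callbackName));
  // Random value so that IE and Opera do not reuse a cached copy of the script.
  parts.push('rnd=' + Math.random());
  return base + '?' + parts.join('&');
}

var url = buildJsonUrl(
  'http://developer.yahooapis.com/TimeService/V1/getTime',
  { appid: 'YahooDemo', output: 'json' },
  'YahooTime'
);
console.log(url);
```

The resulting string can be assigned to script.src exactly as in get_time(). Escaping matters as soon as the variables carry user input, such as a search query with spaces or ampersands.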
With the previous ingredients we can build our tool [Seekers' Oracle], which adds two tricks: multiple calls to Yahoo!'s JSON APIs and the removal of unneeded script tags, as we do not want to push the memory limits of browsers through continual use. What this tool does is build a list of the words appearing in search results, be they web or image search results, by consulting Yahoo!'s web and image search APIs (in the latter case, what the image search API shows you is an image's thumbnail and text from points 1, 3 and 4 as explained here). For those interested in knowing how the tool was made, use view source; for the others, just use it.
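The word list at the heart of such a tool can be sketched as below. This is not the Seekers' Oracle source, just a minimal illustration of the counting step. It assumes the text of each search result has already been collected into an array of strings, and it splits on anything that is not an ASCII letter or digit, which is a simplification.

```javascript
// Count how often each word appears in a list of text snippets.
function wordFrequencies(snippets) {
  // Object without a prototype, so words like 'constructor' count normally.
  var counts = Object.create(null);
  for (var i = 0; i < snippets.length; i++) {
    // Lowercase, then split on runs of non-alphanumeric characters.
    var words = snippets[i].toLowerCase().split(/[^a-z0-9]+/);
    for (var j = 0; j < words.length; j++) {
      var w = words[j];
      if (w.length === 0) continue; // empty pieces at the edges
      counts[w] = (counts[w] || 0) + 1;
    }
  }
  return counts;
}

// Turn the counts into a list sorted by frequency, then alphabetically.
function sortByCount(counts) {
  var list = [];
  for (var w in counts) {
    list.push({ word: w, count: counts[w] });
  }
  list.sort(function (a, b) {
    if (b.count !== a.count) return b.count - a.count;
    return a.word < b.word ? -1 : (a.word > b.word ? 1 : 0);
  });
  return list;
}

// Made-up snippets standing in for the text of search results.
var snippets = [
  'Seek and ye shall find',
  'Knock and it shall be opened',
  'Ask and it shall be given'
];
var ranking = sortByCount(wordFrequencies(snippets));
for (var k = 0; k < ranking.length; k++) {
  console.log(ranking[k].count + ' ' + ranking[k].word);
}
```

In a real tool the snippets would be filled in by the callback functions of the successive JSON calls, and a list of common words to skip would probably be needed.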
The purpose of this tool is twofold:
Patient: Doctor, it hurts when I do it this way...
Doctor: Then don't do it that way!
JSON and the dynamic script tag have been bad-mouthed for their security issues, but those problems are easy to handle:
(c) Nemo 2007 nemo vitam meam regit@yahoo.com replace white spaces by underscores.