useful Babel
  
Version 01.01, September 2004
Das grosse européenne bellissimo search
by fravia+
(Taking advantage of free polylinguistic tools when searching)

first published at searchlores in September 2004, in fieri

Part of the searching essays and of the Seekers' Linguistic Station sections.


Introduction
First example: "resale right"
Second example: "web searching"





The importance of a regional approach when performing some long term searches is well known among seekers, and has been demonstrated ad abundantiam, for instance through "The 'Moundarren' field case" examined in The synecdochical searching method, where we saw that simply using the term haïku (with the ï dieresis) in place of haiku brought us new and amazing "french" target constellations instead of the (relatively easy to find) anglo-saxon ones.

We have moreover already seen, in a previous essay by il-li, how to fetch on the fly any EU-document, in any of the older 11 (and more recently 20) EU official languages.
There were eleven official languages of the European Union:
Danish, Dutch, English, Finnish, French, German, Greek, Italian, Portuguese, Spanish, Swedish.
Nine more languages have been slowly coming in since mid-2004 (slowly from a "document finding" point of view: it takes time to find enough translators):
Czech, Estonian, Hungarian, Latvian, Lithuanian, Maltese, Polish, Slovak and Slovenian.

Kinda Babel, but it still seems to work and quietly produces millions of translations: Stultorum plena sunt omnia.

As our readers may and will easily imagine, this opens -for us- an incredible wealth of documents of all sorts, on all possible matters, that have been translated into 20 languages and that will at once deliver us FURTHER interesting search arrows even in languages that we may not know at all.

Alas, first of all, in order to search for it, and for its multiple and various linguistic versions, we need the REFERENCES of a given document. Of course, as always on the web, nomen est omen (and reference is reference).

In order to fetch the documents corresponding to a specific text, you may use the following three forms:

   Anything:      text (input text) - from (modify begin date if needed) - to
   Legislation:   text (input text) - from (modify begin date if needed) - to
   Preparation:   text (input text) - from (modify begin date if needed) - to


WARNING: the forms above may not work if your Opera + Proxomitron configuration is too "harsh" with javascript scripts and redirections. Should this be the case, just try these forms out with a "weaker" browser à la Mozilla or M$ie

Else, for your direct text searching needs, you may jump directly to the eurovoc thesaurus and/or to the Glossary

Alternatively (better) you can use the CELEX menu: "Access to Celex menu search is free of charge from 1 July 2004. Nevertheless the use of a login and a password is temporarily required. Please use the login enlu0000 and the password europe."


So, basically, what we are trying to do here is to zap one of the greatest (if not THE GREATEST) free linguistic resources of the web in order to ameliorate (and how!) our searches. Note that it is not simply a matter of using some "dictionary" approach, like those described in our Linguistic Station, for instance through the use of the available free on-line Vocabularies, though you can of course always use them as well for simple searches. Nossir, the point is quite different.

We have here literally MILLIONS of documents in 11 languages (and, since 2004, in 20 languages, duh) that cover sectors as different as "food security" or "animal protection". You are BOUND to find, for any search in your own language (provided it is among the 11 or, later, the 20 official ones of the Union) the english, german, french and spanish equivalent (plus italian, dutch and whatnot).

This opens to our sharp seekers' eyes millions of TARGET web-pages we would not have found with a monolinguistic search!
And thus you'll be able to REPEAT YOUR SEARCHES on the main or regional search engines using the correct linguistic definition for the exact target you're seeking with a chainsaw of different languages... "Scusate se è poco".

This matters both for queries that start in a language other than english (and of course need the english equivalent to be effective on the web) and for queries that start in english but badly need, often enough, some opening to other languages. So do not underestimate it :-)

The fact that the European Union allows free access to this huge (truly immense) linguistic database is tantamount -in terms of importance- to the US-Pentagon having allowed people to use the military GPS services for free in order -inter alia- to sail around the world :-)

Now let's see an application of this "polylinguistic" approach to some sample searches. As we have seen in the past, there are TWO possible query scripts for the European Union documents: the RenderServlet? script and the s97.vts script.

Let's begin with a broad search for... copyright, a concept which should mean "the right to copy", but has been recently ruthlessly "privatized" by the enemies of knowledge: the commercial powers that be and their political lackeys.

Note the exact formulation of the (broad) searchstring for the RenderServlet? script (btw: it seems that the from and to date fields are unnecessarily repeated inside the form):

http://europa.eu.int/servlet/portail/RenderServlet?
search=Query&lg=en&nb_docs=25&domain=&coll=&in_force=NO&title=&party=&year_from=2001&month_from=01&day_from=01
&year_to=2004&month_to=01&day_to=01&text=copyright&from=01%2F01%2F2001&to=01%2F01%2F2004

Here you could change on the fly both document language (&lg=en) and querystring (&text=copyright).
This specific query (copyright) gives us (at the moment) 178 documents.
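
For those who prefer to build such strings programmatically rather than by hand, here is a minimal Python sketch. It only assembles the URL shown above, it does not fetch anything, and it does not check whether the 2004-era server still answers. The function name render_servlet_url and its keyword arguments are my own invention for illustration; the parameter names inside the query are copied from the string above.

```python
from urllib.parse import urlencode

# Base address copied from the essay; only the query part is generated.
RENDER_SERVLET = "http://europa.eu.int/servlet/portail/RenderServlet"

def render_servlet_url(text, lg="en", date_from=(2001, 1, 1),
                       date_to=(2004, 1, 1), nb_docs=25):
    """Assemble a RenderServlet search URL (hypothetical helper).

    Dates are (year, month, day) tuples. The form repeats them twice:
    once split into year/month/day fields, once as dd/mm/yyyy.
    """
    yf, mf, df = date_from
    yt, mt, dt = date_to
    params = [
        ("search", "Query"),
        ("lg", lg),                      # document language
        ("nb_docs", nb_docs),
        ("domain", ""),
        ("coll", ""),
        ("in_force", "NO"),
        ("title", ""),
        ("party", ""),
        ("year_from", yf),
        ("month_from", f"{mf:02d}"),
        ("day_from", f"{df:02d}"),
        ("year_to", yt),
        ("month_to", f"{mt:02d}"),
        ("day_to", f"{dt:02d}"),
        ("text", text),                  # the querystring itself
        ("from", f"{df:02d}/{mf:02d}/{yf}"),
        ("to", f"{dt:02d}/{mt:02d}/{yt}"),
    ]
    # urlencode escapes "/" as %2F and spaces as "+"
    return RENDER_SERVLET + "?" + urlencode(params)

print(render_servlet_url("copyright"))
print(render_servlet_url("droit d'auteur", lg="fr"))
```

Changing lg or text in the call corresponds to editing &lg=en and &text=copyright by hand in the address bar.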

Of course you can use analogous strings with more terms and -of course- you could also use the s97.vts script instead, for instance
&queryText=re-use+and+commercial+exploitation+of+public+sector+documents,
thus narrowing the results to three documents.

Here you would have to automate the document language selection (awkwardly present in far too many occurrences inside the s97.vts search strings), still you can always modify on the fly the querystring itself (&queryText=re-use+and+commercial+exploitation+of+public+sector+documents).
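
The "+" signs in that querystring are just URL-encoded spaces. A minimal Python sketch of the encoding step follows; the helper name s97_query_fragment is hypothetical, and the rest of the s97.vts address (which is not reproduced in this essay) is deliberately left out rather than guessed.

```python
from urllib.parse import quote_plus

def s97_query_fragment(phrase):
    """Return only the &queryText=... fragment for a given phrase.

    Hypothetical helper: the remaining s97.vts parameters are not
    given in the text, so they are not reconstructed here.
    """
    return "&queryText=" + quote_plus(phrase)

print(s97_query_fragment(
    "re-use and commercial exploitation of public sector documents"))
# -> &queryText=re-use+and+commercial+exploitation+of+public+sector+documents
```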

Of course you can use this ad hoc s97 form, but read the First aid, and then Re-read it.

SEARCH DOCUMENTS on the Europa server
   Formulate your query: (How to formulate your query?)
   Number of documents to display:
   Retrieve only documents updated after: (dd/mm/yyyy; date can be left blank)
   Document types: Only HTML / Multiple (HTML, PDF, Word, ...)
   




In BOTH approaches, however, the real point is that once you have identified the documents with the exact jargon that interests you for a given search-project, you can at once retrieve them in 11 (since 2004: in 20) different languages and thus replenish your seeker's quiver with new (and powerful) arrows :-)

Let us for instance imagine that the result from the previous "broad search" that happens to interest you most is the following one:

119. Directive 2001/84/EC of the European Parliament and of the Council of 27 September 2001 on the resale right for the benefit of the author of an original work of art
      
  Official Journal L 272 , 13/10/2001 P. 0032... which will translate as the following string: "2001l272" for the "Fetch a JO" form below.


You can then immediately use the form below to fetch the document in PDF format, in english (or in any other language).

Fetch a JO on the fly  ("l" or "c")
(Build a string like 1999l138 or 2001c011)
  
   string → 
                             (Leading zeroes MUST be MANUALLY added)
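
Since the leading zeroes must be added manually, a tiny helper can take care of them. The following Python sketch assumes, from the two sample strings above (1999l138 and 2001c011), that the issue number is always padded to three digits; the function name oj_string is hypothetical.

```python
def oj_string(year, series, number):
    """Build a "Fetch a JO" string such as 2001l272 or 2001c011.

    Assumption (inferred from the examples in the text): the issue
    number is zero-padded to three digits.
    """
    series = series.lower()
    if series not in ("l", "c"):
        raise ValueError("series must be 'l' or 'c'")
    return f"{year}{series}{number:03d}"

print(oj_string(2001, "L", 272))  # -> 2001l272
print(oj_string(2001, "c", 11))   # -> 2001c011
```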


Note that you will have to click on the link representing the number of the page (in this case the document begins on page 32) in order to fetch the PDF version of this document (yes, pdf formats are VERY annoying, but there are ways to bypass them :-)

Still, of course, you can fetch its html version as well using a DIFFERENT approach. The equivalent HTML version reference would be, as we saw in a previous essay by il-li, the following one:
http://europa.eu.int/smartapi/cgi/sga_doc?smartapi!celexapi!prod!CELEXnumdoc&lg=EN&numdoc=32001L0084&model=guichett
(in this case "32001L0084" = document number 2001/84 in sector 3, subsector L)

Just change "EN" in the "&lg=EN" substring above to any one of the following codes and you'll at once fetch the chosen language version:
FR - DE - IT - NL - ES - EL - PT - SV - DA - FI
and, since 2004, the following are also possible:
CS - ET - LV - LT - HU - MT - PL - SK - SL
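
The whole language swap can be sketched in a few lines of Python. The address and the language codes are copied from above; the function names celex_number and celex_url are hypothetical, and the four-digit padding of the document number is an assumption drawn from the single example "32001L0084". The sketch only prints addresses, it fetches nothing.

```python
# Address and codes copied from the essay.
SGA_DOC = "http://europa.eu.int/smartapi/cgi/sga_doc"
OLD_CODES = ["EN", "FR", "DE", "IT", "NL", "ES", "EL", "PT", "SV", "DA", "FI"]
NEW_CODES = ["CS", "ET", "LV", "LT", "HU", "MT", "PL", "SK", "SL"]

def celex_number(year, number, sector="3", subsector="L"):
    """2001/84 in sector 3, subsector L -> 32001L0084.

    Assumption: the document number is zero-padded to four digits,
    as in the one example given in the text.
    """
    return f"{sector}{year}{subsector}{number:04d}"

def celex_url(numdoc, lg="EN"):
    """Address of one language version of a CELEX document."""
    return (f"{SGA_DOC}?smartapi!celexapi!prod!CELEXnumdoc"
            f"&lg={lg}&numdoc={numdoc}&model=guichett")

numdoc = celex_number(2001, 84)
for code in OLD_CODES + NEW_CODES:
    print(code, celex_url(numdoc, code))
```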

First example

Forward with this small essay. Let's imagine that your search was all about "resale right" (just to make it quick, I am using a part of the title of the previous document, but of course the real juicy terminological arrows would be found inside the legislative text, as you will soon be able to ascertain if you use this "polylinguistic" searching approach).

Well, in that case, here we go:
resale right
droit de suite
Folgerecht des Urhebers
diritto sulle successive vendite
volgrecht ten behoeve
and so on, and so on... your original "resale right" search is now (or at least could now be) MUCH more powerful, both in scope and in depth, independently of your specific linguistic knowledge.

Second example

Well, I see from your face that you are still unconvinced. Let's try again, let's imagine that we start a search for "web searching" on the europa server :-)
As you can see, using the s97 form above, we get (at the moment) as answer: Your search for ""web searching"" matched 3 of 1504180 documents. Uuhh, seems like the EU-bureaucrats haven't yet realized how important this is; try "tobacco" instead and you get 6000 documents :-(

The first document is: "Privacy on the Internet - An integrated EU Approach to On-line Data Protection", which is called wp37en.pdf and has of course all its corresponding linguistic versions, which you'll obtain just by changing -- inside your address bar -- the two-letter code en for english into another language code. As always on the web, nomen est omen:
wp37de.pdf     wp37fr.pdf     wp37fi.pdf     wp37it.pdf     wp37es.pdf     wp37pt.pdf
wp37sv.pdf     wp37nl.pdf     wp37da.pdf     ...and even...     wp37el.pdf
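
The same renaming trick, sketched in Python under the assumption that the two-letter language code always sits immediately before the file extension (as it does in wp37en.pdf). The function name language_variants is hypothetical, and the list of codes is simply the set of filenames shown above.

```python
import re

# Codes taken from the filenames listed in the essay.
LANG_CODES = ["de", "fr", "fi", "it", "es", "pt", "sv", "nl", "da", "el"]

def language_variants(filename, codes=LANG_CODES):
    """Swap the two-letter code before the extension for each code.

    Assumption: names look like <stem><two lowercase letters>.<ext>.
    """
    match = re.fullmatch(r"(.+)([a-z]{2})(\.\w+)", filename)
    if match is None:
        raise ValueError(f"no language code found in {filename!r}")
    stem, _current, ext = match.groups()
    return [f"{stem}{code}{ext}" for code in codes]

print(language_variants("wp37en.pdf"))
```

The helper works on the bare filename; whatever directory path precedes it in your address bar stays untouched.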


Things are complicated by the silly use of pdf files, but we can quickly - and of course automatically - port them to -say- rtf format for our working needs :-)
Now pay attention: The snippet "web searching" is here in wp37en.pdf:
"Portal site
A portal site provides an overview of weblinks in an ordered way. Via the visited portal the Internet user can easily visit selected websites of other content providers. Modern portals are "supersites" that provide a variety of services including web searching, news, white and yellow pages directories, free e-mail, discussion groups, online shopping and links to other sites."

Note that this is part of a "GLOSSARY OF TECHNICAL TERMS" at the bottom of the document, a "bingo added value" per se, and that many more juicy terms await us there... yet since the first commandment when searching is to keep an iron concentration and never, ever, go astray (see tip 2), we'll remain on our worthy web searching path :-)

Let's see the german version:
"Portalseite
Eine Portalseite bietet in geordneter Form einen Überblick über die Web-Verknüpfungen. Über das besuchte Portal im Internet kann der Nutzer leicht ausgewählte Websites anderer Anbieter von Inhalten besuchen. Moderne Portale sind "übergeordnete Standorte" ("supersites"), die eine Vielzahl von Dienstleistungen bieten, etwa die Suche im Netz, Neuigkeiten, weiße und gelbe Telefonbücher (=Personen- und Branchenverzeichnisse), freie E-Mail-Adressen, Diskussionsgruppen, "Online-shopping" und Links zu anderen Standorten."


Let's see the spanish version:
"Portal
Los portales proporcionan una vista general de los vínculos web de una manera ordenada. Pasando por un portal, el usuario de Internet puede visitar fácilmente otros sitios web seleccionados de otros proveedores de contenidos. Los portales modernos son "supersitios" que ofrecen una serie de servicios tales como búsqueda en la Web, noticias, guías de páginas blancas y amarillas, correo electrónico gratuito, grupos de debate, compras en línea y vínculos con otros sitios."

And now let's see even the swedish version! I do not know swedish (unfortunately) and I did not know that in swedish web searching was "webbsökning":
"Portalplats
En portalplats erbjuder en ordnad översikt över webblänkar. Internet-användaren kan enkelt besöka andra innehållsleverantörers webbplatser via den besökta portalen. Moderna portaler är "superwebbplatser" som tillhandahåller en rad tjänster, exempelvis webbsökning, nyheter, vita och gula sidor, gratis e-post, diskussionsgrupper, inköp och länkar till andra webbplatser."

And why should that be important? Because now you can go regional and find all those darn elusive swedish seekers' sites: webbsökning

Such is the slow, clever linguistic path of the überseekers :-)

Enjoy !!




(c) 3rd Millennium: [fravia+], all rights reserved.
Copyright (in the sense of "the right to copy"): fravia 2004