Using the TIGERSearch treebank query program in Windows
Ciarán Ó Duibhín
TIGERSearch is a treebank query program developed by Wolfgang Lezius at the University of Stuttgart in the period 1999–2003. It was created for use with the TIGER German Treebank, but it can be used with treebanks (i.e. syntactically-analysed corpora) in several formats.
This document gives some information about setting up TIGERSearch in Windows. Since it was written, the Stuttgart webpages on TIGERSearch referenced above have been rewritten, and may render the present document superfluous.
The program is written in Java and requires a Java Runtime Environment to be installed on your computer. (I am using Java 6 Update 23 in Windows Vista Home Premium SP1.)
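A quick way to confirm that a Java runtime can be reached from the command line is sketched below in Python. This is only an illustration: the function name java_version_banner is hypothetical, and a runtime that is installed but not on the PATH will be reported as absent.

```python
import shutil
import subprocess
from typing import Optional


def java_version_banner() -> Optional[str]:
    """Return the first line printed by `java -version`,
    or None if no `java` executable is found on the PATH."""
    java = shutil.which("java")
    if java is None:
        return None
    # `java -version` writes its banner to stderr rather than stdout.
    result = subprocess.run([java, "-version"], capture_output=True, text=True)
    lines = (result.stderr or result.stdout).splitlines()
    return lines[0] if lines else ""


if __name__ == "__main__":
    banner = java_version_banner()
    print(banner if banner is not None else "No java executable found on PATH")
```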
TIGERSearch 2.1.1 is best downloaded from the copy held by Ciprian Gerstenberger at the University of Tromsø. Unless you want the program source, use the link there, under Tools, labelled TIGERSearch and TIGERRegistry [tar.gz]; that link is reproduced here.
The download is an archive file called TIGERSearchTools.tar.tar. It cannot be unpacked using WinZip (at least, not using version 9.0 SR-1), but can be unpacked by WinRAR (version 3.80). The contents consist of a folder TIGERSearchTools, which it is suggested be placed in your Program Files folder. The manual, which describes an earlier version, hints that Java programs may have problems with folder names containing a space, so you may wish to create a ProgramFiles folder for this purpose, but I have had no trouble with Program Files.
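If neither WinZip nor WinRAR is to hand, the archive can probably also be unpacked with Python's standard tarfile module. The following is a minimal sketch, not a tested procedure for this particular download: the function name unpack_archive and the paths in the usage section are assumptions, and writing into Program Files normally requires administrator rights.

```python
import tarfile
from pathlib import Path


def unpack_archive(archive: Path, destination: Path) -> list[str]:
    """Unpack a (possibly gzip-compressed) tar archive, preserving folders.

    Returns the names of the members found in the archive.
    """
    destination.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression itself, so the
    # misleading ".tar.tar" extension does not matter.
    with tarfile.open(archive, mode="r:*") as tar:
        names = tar.getnames()
        tar.extractall(destination)
    return names


if __name__ == "__main__":
    # Hypothetical locations; adjust to where the download was saved.
    archive = Path("TIGERSearchTools.tar.tar")
    if archive.exists():
        for name in unpack_archive(archive, Path("C:/Program Files"))[:10]:
            print(name)
```

Only unpack archives from a source you trust, since extractall writes whatever paths the archive contains.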
The unpacked folder TIGERSearchTools contains two .jar files (Java programs, for search and registry) and two .sh files (a shell script to run each program), as well as subfolders at several levels. Beyond preserving this folder structure during unpacking, no further installation is necessary. Uninstallation involves only removal of the unpacked folders and files.
The two .sh files should have their extensions changed to .bat for Windows. They can then be run, e.g. by a double-click on an icon or from the command line. Both, however, refused to run at first for me. runTSearch.bat reported:
The TIGERSearch configuration could not be loaded. Error in building: no protocol: tigersearch.dtd. A default configuration can be used instead. Continue? (I chose not to continue.)
while runTRegistry.bat reported:
Could not start TIGERRegistry due to error(s) reading the configuration file. Error in building: no protocol: tigerregistry.dtd.
Both the named files, tigersearch.dtd and tigerregistry.dtd, were present in subfolder config.
When tried again some days later, both .bat files worked and ran their respective programs. I have no explanation for these errors, which have not recurred.
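The preparation steps above (giving the launch scripts a .bat extension and confirming that the two DTD files are in the config subfolder) can be sketched in Python as follows. The function names are hypothetical, and the sketch copies each .sh file to a .bat file instead of renaming it, so that the originals are kept. It does not explain or cure the "no protocol" errors; it only rules out a missing file.

```python
import shutil
from pathlib import Path


def make_batch_copies(folder: Path) -> list[Path]:
    """Copy each .sh launcher in `folder` to a .bat file of the same name.

    Existing .bat files are left untouched. Returns the files created.
    """
    created = []
    for script in sorted(folder.glob("*.sh")):
        batch = script.with_suffix(".bat")
        if not batch.exists():
            shutil.copyfile(script, batch)
            created.append(batch)
    return created


def missing_dtds(folder: Path) -> list[str]:
    """Return the names of the expected DTD files absent from folder/config."""
    expected = ["tigersearch.dtd", "tigerregistry.dtd"]
    return [name for name in expected if not (folder / "config" / name).is_file()]
```

For example, make_batch_copies(Path("C:/Program Files/TIGERSearchTools")) followed by missing_dtds on the same folder should return an empty list for the second call if the archive was unpacked with its folder structure intact.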
The download already contains a considerable number of sample treebanks, in various languages, and the search program may be run immediately on any of these. The program contains a Help menu option; a User Manual can be downloaded from Stuttgart, and User Tutorials are available at Stuttgart. The information in all of these sources concerning TIGERSearch installation does NOT apply to the version recommended here for download.
It is suggested that the User Manual be placed in the TIGERSearchTools folder, and that Windows Start Menu shortcuts be created to the batch file for the Search Program, to the batch file for the Registry Program, and to the User Manual.
Any treebank not supplied along with TIGERSearch must be downloaded separately and registered. The treebank must be in a format for which a conversion exists, and conversion will take place in the course of registration.
The full TIGER Corpus may be obtained and registered. The non-commercial licence should be accepted, and this will lead to the download page. I downloaded Release 2.1 in the file tigercorpus2.1.zip, which contains the corpus in two formats, Negra export format (tiger_release_aug07.export) and TIGER-XML format (tiger_release_aug07.xml), as well as some documentation. Either format is acceptable for registration: I have used the Negra export format.
A simple query to the TIGER corpus, to verify that everything is working, might be (in textual mode): [word="Zimmer"] and then click on "Search". This query matches 12 graphs in the corpus.
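As a rough cross-check outside TIGERSearch, the TIGER-XML file can be scanned directly with Python's standard library. The sketch below assumes that each sentence is an s element whose terminal nodes are t elements carrying a word attribute; the function name count_sentences_with_word is hypothetical, and the result has not been checked against the figure quoted above.

```python
import xml.etree.ElementTree as ET


def count_sentences_with_word(xml_source, word: str) -> int:
    """Count <s> elements containing at least one <t> whose `word`
    attribute equals `word` exactly.

    `xml_source` may be a file name or a binary file object.
    """
    count = 0
    # iterparse streams the file, so the whole corpus is never held in memory.
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag == "s":
            if any(t.get("word") == word for t in elem.iter("t")):
                count += 1
            elem.clear()  # discard the finished sentence's children


    return count
```

A call such as count_sentences_with_word("tiger_release_aug07.xml", "Zimmer") counts sentences, not individual tokens, which is the closer analogue to the number of matching graphs.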
The NEGRA Corpus can also be registered.
If you intend to use the Options/Software Preferences menu of the Registry program, e.g. to increase the amount of memory available to the search or registry programs, you will need two files, TIGERSearch.lax and TIGERRegistry.lax, which are not included in the above download. They are included in the source downloads, or may be obtained directly from the links just given. They should be placed in a \bin subfolder of the folder containing the main TIGERSearch files (e.g. C:\Program Files\TIGERSearchTools). When supplying the name of the corpora directory to Options/Software Preferences, give the absolute location, e.g. C:\Program Files\TIGERSearchTools\CorporaDir. We are advised to place such strings within double quotation marks, but this causes an error in the TIGERRegistry program; if the quotes are really necessary, the .lax files can be hand-edited.
Completely separate from TIGERSearch, Wolfgang Lezius has also written the Morphy morphological processor for German (1999). Morphy will try to assign a part-of-speech tag and lemma to each word of a German text. Installation in Windows is unproblematic — just run the installer. A manual is available.
This page is offered as a facility for corpus analysis on Windows. By using it, you are deemed to accept that the author bears no responsibility for any adverse consequences. Needless to say, he hopes that there will be no such consequences. He will be pleased to receive comments, but cannot promise to act upon them.