Using Babelfy for word sense disambiguation under Windows
Ciarán Ó Duibhín
Babelfy is a system for performing
word-sense-disambiguation (WSD) — that is, assigning senses, in the form of BabelNet synset-IDs,
to words of running text — in a number of languages.
It was created at the Sapienza Università di Roma by a group headed by Roberto Navigli.
For example, the BabelNet synset-ID for "dictionary" is "bn:00026967n" ("n" for noun).
(Compare WordNet synset-IDs for "dictionary": 04227476, 06333079, 06418901, 06430544 and 06482309 in WordNet 1.5, 2.1, 3.0, 3.1 and English WordNet 2020 respectively.)
The same synset-ID "bn:00026967n" should be assigned to "dictionnaire" in French text, or to "Wörterbuch" in German text, or to "словарь" in Russian text.
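As a small illustration of the ID format, the following Python sketch splits a synset-ID into its numeric part and its part-of-speech letter. The pattern (the prefix "bn:", eight digits, one letter) is inferred only from the IDs quoted on this page and is an assumption, not a published specification; the function name is hypothetical.

```python
import re

# Pattern inferred from the IDs quoted on this page: "bn:", eight digits,
# then one part-of-speech letter. An assumption, not an official format.
SYNSET_ID = re.compile(r"bn:(\d{8})([a-z])")

def split_synset_id(synset_id):
    """Return (digits, pos_letter) for an ID such as 'bn:00026967n'."""
    match = SYNSET_ID.fullmatch(synset_id)
    if match is None:
        raise ValueError(f"not a recognisable BabelNet synset-ID: {synset_id!r}")
    return match.group(1), match.group(2)

print(split_synset_id("bn:00026967n"))  # ('00026967', 'n')
```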
The difficulty in WSD lies in choosing the correct sense for a polysemous word, from the context. Babelfy is state-of-the-art for this task.
Babelfy can no longer (since version 1.0) be installed on a PC. Instead, the resources are held on a server, and are accessed via the internet, either in a web browser or by program.
The Babelfy guide page, http://babelfy.org/guide, contains a sample HTML file (in PHP, JavaScript, Ruby and Python versions)
which you can copy and save to a local file on your own machine. Opening this file in a web browser runs Babelfy on the sample text:
BabelNet is both a multilingual encyclopedic dictionary and a semantic network
The Babelfy results for the recognized significant words will appear in the browser, in the form of a JSON document (shown here with whitespace and newlines added for readability):
[{"tokenFragment":{"start":0,"end":0},
"charFragment":{"start":0,"end":7},
"babelSynsetID":"bn:03083790n",
"DBpediaURL":"http://dbpedia.org/resource/BabelNet",
"BabelNetURL":"http://babelnet.org/rdf/s03083790n",
"score":1.0,
"coherenceScore":0.6,
"globalScore":0.09574468085106383,
"source":"BABELFY"},
{"tokenFragment":{"start":4,"end":4},
"charFragment":{"start":19,"end":30},
"babelSynsetID":"bn:00107021a",
"DBpediaURL":"",
"BabelNetURL":"http://babelnet.org/rdf/s00107021a",
"score":0.0,
"coherenceScore":0.0,
"globalScore":0.0,
"source":"MCS"},
{"tokenFragment":{"start":5,"end":5},
"charFragment":{"start":32,"end":43},
"babelSynsetID":"bn:00102202a",
"DBpediaURL":"",
"BabelNetURL":"http://babelnet.org/rdf/s00102202a",
"score":0.0,
"coherenceScore":0.0,
"globalScore":0.0,
"source":"MCS"},
{"tokenFragment":{"start":5,"end":6},
"charFragment":{"start":32,"end":54},
"babelSynsetID":"bn:02290297n",
"DBpediaURL":"http://dbpedia.org/resource/Encyclopedic_dictionary",
"BabelNetURL":"http://babelnet.org/rdf/s02290297n",
"score":1.0,
"coherenceScore":0.4,
"globalScore":0.0425531914893617,
"source":"BABELFY"},
{"tokenFragment":{"start":6,"end":6},
"charFragment":{"start":45,"end":54},
"babelSynsetID":"bn:00026967n",
"DBpediaURL":"http://dbpedia.org/resource/Dictionary",
"BabelNetURL":"http://babelnet.org/rdf/s00026967n",
"score":0.8823529411764706,
"coherenceScore":1.0,
"globalScore":0.3191489361702128,
"source":"BABELFY"},
{"tokenFragment":{"start":9,"end":9},
"charFragment":{"start":62,"end":69},
"babelSynsetID":"bn:00110347a",
"DBpediaURL":"",
"BabelNetURL":"http://babelnet.org/rdf/s00110347a",
"score":1.0,
"coherenceScore":0.2,
"globalScore":0.010638297872340425,
"source":"BABELFY"},
{"tokenFragment":{"start":9,"end":10},
"charFragment":{"start":62,"end":77},
"babelSynsetID":"bn:02275757n",
"DBpediaURL":"http://dbpedia.org/resource/Semantic_network",
"BabelNetURL":"http://babelnet.org/rdf/s02275757n",
"score":1.0,
"coherenceScore":0.6,
"globalScore":0.1276595744680851,
"source":"BABELFY"},
{"tokenFragment":{"start":10,"end":10},
"charFragment":{"start":71,"end":77},
"babelSynsetID":"bn:00057379n",
"DBpediaURL":"",
"BabelNetURL":"http://babelnet.org/rdf/s00057379n",
"score":0.0,
"coherenceScore":0.0,
"globalScore":0.0,
"source":"MCS"}]
You may of course edit your local copy of the HTML file, to try a different text or language.
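To show how the JSON output relates to the input text, here is a minimal Python sketch that pairs each annotation with the stretch of text it covers. It uses two entries abridged from the sample output above. In that sample the "end" value of "charFragment" is the index of the last character of the fragment, so one is added when slicing. The function name is hypothetical, and the sketch assumes that Babelfy counts characters the same way Python indexes strings, which may not hold for unusual characters.

```python
import json

TEXT = "BabelNet is both a multilingual encyclopedic dictionary and a semantic network"

# Two entries abridged from the sample output above.
SAMPLE_JSON = """
[{"charFragment":{"start":0,"end":7},
  "babelSynsetID":"bn:03083790n","score":1.0,"source":"BABELFY"},
 {"charFragment":{"start":32,"end":54},
  "babelSynsetID":"bn:02290297n","score":1.0,"source":"BABELFY"}]
"""

def annotated_fragments(text, json_string):
    """Pair each annotation with the stretch of text it covers."""
    pairs = []
    for entry in json.loads(json_string):
        span = entry["charFragment"]
        # "end" is the index of the last character, so add 1 for slicing.
        surface = text[span["start"]:span["end"] + 1]
        pairs.append((surface, entry["babelSynsetID"]))
    return pairs

for surface, synset_id in annotated_fragments(TEXT, SAMPLE_JSON):
    print(surface, synset_id)
# BabelNet bn:03083790n
# encyclopedic dictionary bn:02290297n
```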
To go beyond the demo sentence, you should register at babelfy.org/guide#access. Once registered, you can log in, and will receive a key, in the form of a character string, and a daily allowance of usage tokens. The key must be added to the URL for all further queries.
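Adding the key to the URL can be sketched in Python as follows. The endpoint address, the key value and the parameter names "text", "lang" and "key" are assumptions made for illustration; the actual names should be taken from the sample file on the Babelfy guide page. The point of the sketch is only that the text must be percent-encoded when it is placed in a URL, which matters for non-ASCII text.

```python
from urllib.parse import urlencode

def build_query_url(endpoint, text, lang, key):
    """Append text, language and key to the endpoint as a query string."""
    # The parameter names are assumptions; check them against the guide page.
    params = {"text": text, "lang": lang, "key": key}
    # urlencode percent-encodes non-ASCII characters as UTF-8 bytes.
    return endpoint + "?" + urlencode(params)

# Hypothetical endpoint and key, for illustration only.
url = build_query_url("https://example.org/disambiguate", "словарь", "RU", "YOUR-KEY")
print(url)
```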
For practical use, we must be able to supply text of our own choice and specify its language, and to retrieve the Babelfy output in a program. Such a program is provided at WinBabelfyClient.zip, both as a Windows executable and as Delphi Pascal source. It accepts the input text, the language and the key from a file, shows the output, and optionally copies it to a file. If you know how to program in Delphi Pascal, you can modify the program source to perform additional processing in accordance with your needs. If not, you may be able to use another programming language to read the file output by this program and manipulate it.
The supplied program invites you to choose an input file.
The input file should be utf8-encoded (with byte-order-mark) and should contain:
• on the first line, your Babelfy key
• on the second line, a language code (e.g. EN for English, FR for French, DE for German, etc.)
• on as many subsequent lines as necessary, the text to be analysed.
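If the input file is to be produced by a script rather than in a text editor, the layout above can be written with a few lines of Python. This is only a sketch; the function name and the file name in the usage note are hypothetical. Python's "utf-8-sig" codec writes the byte-order-mark at the start of the file.

```python
def write_input_file(path, key, lang, text):
    """Write the key, the language code and the text, one after another."""
    # "utf-8-sig" makes Python emit the UTF-8 byte-order-mark first.
    with open(path, "w", encoding="utf-8-sig") as handle:
        handle.write(key + "\n")   # first line: the Babelfy key
        handle.write(lang + "\n")  # second line: the language code
        handle.write(text + "\n")  # remaining lines: the text to be analysed
```

A call such as write_input_file("input.txt", "YOUR-KEY", "FR", "Le dictionnaire est ouvert.") would then produce a file that can be chosen as input.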
The amount of text acceptable at one time varies with language,
currently from about 5669 characters for English to about 1439 characters for Russian.
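A longer text therefore has to be submitted in pieces. The following Python sketch cuts a text into pieces no longer than a given number of characters, breaking at spaces. The function name is hypothetical and the limit is left as a parameter, since the figures above are approximate. Note that the sketch collapses runs of whitespace and may break in mid-sentence; breaking at sentence boundaries would probably give Babelfy better context, but is not attempted here.

```python
def split_into_chunks(text, max_chars):
    """Split text into pieces of at most max_chars, breaking at spaces."""
    chunks, current = [], ""
    for word in text.split():
        # A single word longer than the limit is cut into hard slices.
        while len(word) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(word[:max_chars])
            word = word[max_chars:]
        candidate = word if not current else current + " " + word
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The word does not fit: close the current piece, start a new one.
            chunks.append(current)
            current = word
    if current:
        chunks.append(current)
    return chunks

print(split_into_chunks("a bb ccc dddd", 6))  # ['a bb', 'ccc', 'dddd']
```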
The results from Babelfy will appear on the screen, suitably formatted.
You then have the choice of saving the results to an output file,
either as they appear on the screen, or in JSON format exactly as they are output by Babelfy.
The output file will be utf8-encoded, with byte-order-mark.
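An output file saved in JSON format can be read back in another programming language. In Python, for example, a minimal sketch might look like this; the function names are hypothetical. The file is opened with the "utf-8-sig" codec because of the byte-order-mark, which Python's JSON parser would otherwise reject.

```python
import json

def load_babelfy_json(path):
    """Read a saved JSON result file, tolerating the byte-order-mark."""
    # "utf-8-sig" strips the byte-order-mark; with plain "utf-8" it would
    # stay in the string and the JSON parser would refuse the document.
    with open(path, encoding="utf-8-sig") as handle:
        return json.load(handle)

def synsets_from_source(entries, source="BABELFY"):
    """Keep the synset-IDs of entries whose "source" field matches."""
    return [e["babelSynsetID"] for e in entries if e.get("source") == source]
```

For the sample text, synsets_from_source(load_babelfy_json(path)) would return the IDs of the five entries marked "BABELFY" and leave out the three marked "MCS".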
Here is the program output for the sample text:
Text: BabelNet is both a multilingual encyclopedic dictionary and a semantic network
For users of the Java environment, there is, on the Babelfy website, a client program written in Java.
If Java is not already installed on your Windows machine, you should download and install Java 1.7 JDK or later.
The Babelfy Java client may be downloaded from http://babelfy.org/data/BabelfyAPI-1.0.zip
and unzipped, typically to Program Files (x86) with "use folder names" ticked.
This will extract the files to Program Files (x86)\Babelfy-online-API-1.0 and subdirectories.
Documentation will be placed in Program Files (x86)\Babelfy-online-API-1.0\docs.
Now, edit Program Files (x86)\Babelfy-online-API-1.0\config\babelfy.var.properties to add your key value to the only line in the file, immediately following the = sign.
Edit Program Files (x86)\Babelfy-online-API-1.0\run-babelfydemo.bat, by adding an extra line "pause" at the end.
Double-clicking this .bat file in Windows runs Babelfy in a command-line window, and produces an attenuated form of the sample output above. The output of the client is shown below. The sample text appears to be built into the demo, and to be unalterable, as the source is not provided.
This page is offered as a facility for corpus analysis on Windows. By using it, you are deemed to accept that the author bears no responsibility for any adverse consequences. Needless to say, he hopes that there will be no such consequences. He will be pleased to receive comments, but cannot promise to act upon them.