Using Babelfy for word sense disambiguation under Windows

Ciarán Ó Duibhín

Babelfy is a system for performing word sense disambiguation (WSD) in a number of languages; that is, it assigns senses, in the form of BabelNet synset-IDs, to the words of running text.

It was created at the Sapienza Università di Roma by a group headed by Roberto Navigli.

For example, the BabelNet synset-ID for "dictionary" is "bn:00026967n" ("n" for noun).
(Compare WordNet synset-IDs for "dictionary": 04227476, 06333079, 06418901, 06430544 and 06482309 in WordNet 1.5, 2.1, 3.0, 3.1 and English WordNet 2020 respectively.)
The same synset-ID "bn:00026967n" should be assigned to "dictionnaire" in French text, or to "Wörterbuch" in German text, or to "словарь" in Russian text.
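
As a minimal sketch of how such an ID can be taken apart in a program, the Python function below splits a BabelNet synset-ID into its numeric part and its part-of-speech letter. The function name parse_synset_id is hypothetical. The pattern (eight digits and one final letter) and the letters v, a and r (verb, adjective, adverb, following WordNet usage) are assumptions; only "n" for noun is stated above.

```python
import re

# Assumed pattern: "bn:" + 8 digits + one part-of-speech letter.
_SYNSET_RE = re.compile(r"^bn:(\d{8})([nvar])$")

# Only "n" (noun) is confirmed by the text; the others follow
# WordNet usage and are an assumption.
_POS_NAMES = {"n": "noun", "v": "verb", "a": "adjective", "r": "adverb"}


def parse_synset_id(synset_id):
    """Return (digits, part_of_speech) for an ID such as 'bn:00026967n'."""
    match = _SYNSET_RE.match(synset_id)
    if match is None:
        raise ValueError("not a BabelNet synset-ID: %r" % synset_id)
    digits, pos = match.groups()
    return digits, _POS_NAMES[pos]
```

For the ID given above, parse_synset_id("bn:00026967n") returns the pair ("00026967", "noun").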

The difficulty in WSD lies in choosing, from the context, the correct sense for a polysemous word. Babelfy is state-of-the-art for this task.

Babelfy can no longer (since version 1.0) be installed on a PC. Instead, the resources are held on a server, and are accessed via the internet, either in a web browser or by program.

Looking at Babelfy in a web browser.

The Babelfy guide page, http://babelfy.org/guide, contains a sample HTML file (in PHP, JavaScript, Ruby and Python versions) which you can copy and save to a local file on your own machine. Opening this file in a web browser runs Babelfy on the sample text:

       BabelNet is both a multilingual encyclopedic dictionary and a semantic network

The Babelfy results for the recognized significant words will appear in the browser, in the form of a JSON document (shown here with whitespace and newlines added for readability):

       [{"tokenFragment":{"start":0,"end":0},
        "charFragment":{"start":0,"end":7},
        "babelSynsetID":"bn:03083790n",
        "DBpediaURL":"http://dbpedia.org/resource/BabelNet",
        "BabelNetURL":"http://babelnet.org/rdf/s03083790n",
        "score":1.0,
        "coherenceScore":0.6,
        "globalScore":0.09574468085106383,
        "source":"BABELFY"},

        {"tokenFragment":{"start":4,"end":4},
        "charFragment":{"start":19,"end":30},
        "babelSynsetID":"bn:00107021a",
        "DBpediaURL":"",
        "BabelNetURL":"http://babelnet.org/rdf/s00107021a",
        "score":0.0,
        "coherenceScore":0.0,
        "globalScore":0.0,
        "source":"MCS"},

        {"tokenFragment":{"start":5,"end":5},
        "charFragment":{"start":32,"end":43},
        "babelSynsetID":"bn:00102202a",
        "DBpediaURL":"",
        "BabelNetURL":"http://babelnet.org/rdf/s00102202a",
        "score":0.0,
        "coherenceScore":0.0,
        "globalScore":0.0,
        "source":"MCS"},

        {"tokenFragment":{"start":5,"end":6},
        "charFragment":{"start":32,"end":54},
        "babelSynsetID":"bn:02290297n",
        "DBpediaURL":"http://dbpedia.org/resource/Encyclopedic_dictionary",
        "BabelNetURL":"http://babelnet.org/rdf/s02290297n",
        "score":1.0,
        "coherenceScore":0.4,
        "globalScore":0.0425531914893617,
        "source":"BABELFY"},

        {"tokenFragment":{"start":6,"end":6},
        "charFragment":{"start":45,"end":54},
        "babelSynsetID":"bn:00026967n",
        "DBpediaURL":"http://dbpedia.org/resource/Dictionary",
        "BabelNetURL":"http://babelnet.org/rdf/s00026967n",
        "score":0.8823529411764706,
        "coherenceScore":1.0,
        "globalScore":0.3191489361702128,
        "source":"BABELFY"},

        {"tokenFragment":{"start":9,"end":9},
        "charFragment":{"start":62,"end":69},
        "babelSynsetID":"bn:00110347a",
        "DBpediaURL":"",
        "BabelNetURL":"http://babelnet.org/rdf/s00110347a",
        "score":1.0,
        "coherenceScore":0.2,
        "globalScore":0.010638297872340425,
        "source":"BABELFY"},

        {"tokenFragment":{"start":9,"end":10},
        "charFragment":{"start":62,"end":77},
        "babelSynsetID":"bn:02275757n",
        "DBpediaURL":"http://dbpedia.org/resource/Semantic_network",
        "BabelNetURL":"http://babelnet.org/rdf/s02275757n",
        "score":1.0,
        "coherenceScore":0.6,
        "globalScore":0.1276595744680851,
        "source":"BABELFY"},

        {"tokenFragment":{"start":10,"end":10},
        "charFragment":{"start":71,"end":77},
        "babelSynsetID":"bn:00057379n",
        "DBpediaURL":"",
        "BabelNetURL":"http://babelnet.org/rdf/s00057379n",
        "score":0.0,
        "coherenceScore":0.0,
        "globalScore":0.0,
        "source":"MCS"}]

You may of course edit your local copy of the HTML file, to try a different text or language.
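
The JSON document shown above is easy to process further. The following Python sketch pairs each result with the stretch of the sample text that it covers. It assumes, as the figures above suggest, that the "end" value of charFragment is inclusive. The function name surface_forms is hypothetical, and only two of the eight objects are reproduced, cut down to the fields used.

```python
import json


def surface_forms(text, babelfy_json):
    """Pair each annotation with the stretch of text it covers."""
    results = []
    for item in json.loads(babelfy_json):
        frag = item["charFragment"]
        # "end" is inclusive in the output above, so add 1 for slicing.
        surface = text[frag["start"]:frag["end"] + 1]
        results.append((surface, item["babelSynsetID"], item["source"]))
    return results


TEXT = ("BabelNet is both a multilingual encyclopedic dictionary "
        "and a semantic network")

# Two of the eight objects shown above, abridged to the fields used here.
SAMPLE = """[
 {"charFragment":{"start":0,"end":7},
  "babelSynsetID":"bn:03083790n","source":"BABELFY"},
 {"charFragment":{"start":32,"end":54},
  "babelSynsetID":"bn:02290297n","source":"BABELFY"}
]"""

for surface, synset_id, source in surface_forms(TEXT, SAMPLE):
    print(surface, synset_id, source)
```

This prints "BabelNet" and "encyclopedic dictionary", each followed by its synset-ID and source. Whether the character offsets still line up with Python string positions for text containing characters outside the Basic Multilingual Plane has not been checked here.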

Using Babelfy from a program. A simple Windows client.

To go beyond the demo sentence, the user should register at babelfy.org/guide#access. On registering, the user can log in, and will receive a key, in the form of a character string, and a daily allowance of usage tokens. The key must be added to the URL for further queries.
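
How a key might be added to a query URL can be sketched in Python as follows. The endpoint address and the parameter names text, lang and key used here are assumptions made for illustration only, and should be checked against the Babelfy guide page; YOUR_KEY is a placeholder for the key received on registering.

```python
from urllib.parse import urlencode

# Hypothetical endpoint: take the real address from the Babelfy guide page.
ENDPOINT = "https://babelfy.example/v1/disambiguate"


def build_query_url(text, lang, key, endpoint=ENDPOINT):
    """Return a GET URL carrying the text, its language code and the key."""
    # urlencode percent-encodes non-ASCII characters (as UTF-8)
    # and turns spaces into "+".
    # The parameter names are assumptions, not taken from the guide.
    params = urlencode({"text": text, "lang": lang, "key": key})
    return endpoint + "?" + params


print(build_query_url("a semantic network", "EN", "YOUR_KEY"))
```

The sketch only builds the URL and does not contact any server.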

For practical use, we must be able to supply text of our own choice, specify its language, and retrieve the Babelfy output in a program. Such a program is provided in WinBabelfyClient.zip, both as a Windows executable and as Delphi Pascal source. It accepts the input text, the language and the key from a file, shows the output, and optionally copies it to a file. If you know how to program in Delphi Pascal, you can modify the program source to perform additional processing in accordance with your needs. If not, you may be able to use another programming language to read the file output by this program and manipulate it.

The supplied program invites you to choose an input file.
The input file should be UTF-8 encoded (with byte-order mark) and should contain:
• on the first line, your Babelfy key
• on the second line, a language code (e.g. EN for English, FR for French, DE for German, etc.)
• on as many subsequent lines as necessary, the text to be analysed.
The amount of text acceptable at one time varies with language,
currently from about 5669 characters for English to about 1439 characters for Russian.
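
An input file in the layout just described can be produced by a short script. The Python sketch below is one way of doing so, not part of the supplied program. The function names are hypothetical, and the limit passed to split_text is left to the caller, since the figures quoted above are approximate. Note that split_text replaces line breaks in the text by single spaces.

```python
def split_text(text, limit):
    """Split text at whitespace into pieces of at most `limit` characters.

    A single word longer than `limit` is kept whole, so that piece
    will exceed the limit.
    """
    pieces, current = [], ""
    for word in text.split():
        candidate = word if not current else current + " " + word
        if len(candidate) <= limit or not current:
            current = candidate
        else:
            pieces.append(current)
            current = word
    if current:
        pieces.append(current)
    return pieces


def write_input_file(path, key, lang, text):
    """Write key, language code and text as UTF-8 with a byte-order mark."""
    # The "utf-8-sig" codec writes the byte-order mark at the start.
    with open(path, "w", encoding="utf-8-sig") as f:
        f.write(key + "\n" + lang + "\n" + text + "\n")
```

A long English text could, for example, be cut up with split_text(long_text, 5000) and each piece written to its own input file with write_input_file.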

The results from Babelfy will appear on the screen, suitably formatted.
You then have the choice of saving the results to an output file,
either as they appear on the screen, or in JSON format exactly as they are output by Babelfy.
The output file will be UTF-8 encoded, with byte-order mark.
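
A file saved in JSON format can then be read and manipulated in another language. As a sketch under stated assumptions, the Python code below reads such a file (allowing for the byte-order mark) and, where the character fragments of several results overlap, keeps only the longest. Preferring the longest fragment is merely one possible policy, chosen here for illustration; it is not something done by Babelfy or by the supplied program. The function names are hypothetical.

```python
import json


def load_annotations(path):
    """Read a JSON results file; "utf-8-sig" skips the byte-order mark."""
    with open(path, encoding="utf-8-sig") as f:
        return json.load(f)


def longest_non_overlapping(annotations):
    """Keep the longest annotation wherever character fragments overlap."""
    def length(ann):
        frag = ann["charFragment"]
        return frag["end"] - frag["start"]

    kept = []
    # Visit longer fragments first, so shorter overlapping ones are dropped.
    for ann in sorted(annotations, key=length, reverse=True):
        start = ann["charFragment"]["start"]
        end = ann["charFragment"]["end"]
        if all(end < k["charFragment"]["start"] or
               start > k["charFragment"]["end"] for k in kept):
            kept.append(ann)
    # Return the survivors in text order.
    return sorted(kept, key=lambda a: a["charFragment"]["start"])
```

Applied to character fragments 0-7, 19-30, 32-43, 32-54, 45-54, 62-69, 62-77 and 71-77, this keeps the four fragments 0-7, 19-30, 32-54 and 62-77.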

Here is the program output for the sample text:

Text: BabelNet is both a multilingual encyclopedic dictionary and a semantic network

Results for 8 objects.

   Object 0: "BabelNet"

     tokenFragment:
       start:0
       end:0
     charFragment:
       start:0
       end:7
     babelSynsetID:bn:03083790n
     DBpediaURL:http://dbpedia.org/resource/BabelNet
     BabelNetURL:http://babelnet.org/rdf/s03083790n
     score:1
     coherenceScore:0.6
     globalScore:0.0957446808510638
     source:BABELFY

   Object 1: "multilingual"

     tokenFragment:
       start:4
       end:4
     charFragment:
       start:19
       end:30
     babelSynsetID:bn:00107021a
     DBpediaURL:
     BabelNetURL:http://babelnet.org/rdf/s00107021a
     score:0
     coherenceScore:0
     globalScore:0
     source:MCS

   Object 2: "encyclopedic"

     tokenFragment:
       start:5
       end:5
     charFragment:
       start:32
       end:43
     babelSynsetID:bn:00102202a
     DBpediaURL:
     BabelNetURL:http://babelnet.org/rdf/s00102202a
     score:0
     coherenceScore:0
     globalScore:0
     source:MCS

   Object 3: "encyclopedic dictionary"

     tokenFragment:
       start:5
       end:6
     charFragment:
       start:32
       end:54
     babelSynsetID:bn:02290297n
     DBpediaURL:http://dbpedia.org/resource/Encyclopedic_dictionary
     BabelNetURL:http://babelnet.org/rdf/s02290297n
     score:1
     coherenceScore:0.4
     globalScore:0.0425531914893617
     source:BABELFY

   Object 4: "dictionary"

     tokenFragment:
       start:6
       end:6
     charFragment:
       start:45
       end:54
     babelSynsetID:bn:00026967n
     DBpediaURL:http://dbpedia.org/resource/Dictionary
     BabelNetURL:http://babelnet.org/rdf/s00026967n
     score:0.882352941176471
     coherenceScore:1
     globalScore:0.319148936170213
     source:BABELFY

   Object 5: "semantic"

     tokenFragment:
       start:9
       end:9
     charFragment:
       start:62
       end:69
     babelSynsetID:bn:00110347a
     DBpediaURL:
     BabelNetURL:http://babelnet.org/rdf/s00110347a
     score:1
     coherenceScore:0.2
     globalScore:0.0106382978723404
     source:BABELFY

   Object 6: "semantic network"

     tokenFragment:
       start:9
       end:10
     charFragment:
       start:62
       end:77
     babelSynsetID:bn:02275757n
     DBpediaURL:http://dbpedia.org/resource/Semantic_network
     BabelNetURL:http://babelnet.org/rdf/s02275757n
     score:1
     coherenceScore:0.6
     globalScore:0.127659574468085
     source:BABELFY

   Object 7: "network"

     tokenFragment:
       start:10
       end:10
     charFragment:
       start:71
       end:77
     babelSynsetID:bn:00057379n
     DBpediaURL:
     BabelNetURL:http://babelnet.org/rdf/s00057379n
     score:0
     coherenceScore:0
     globalScore:0
     source:MCS

Using Babelfy from the Java environment.

For users of the Java environment, there is, on the Babelfy website, a client program written in Java.
If Java is not already installed on your Windows machine, you should download and install Java 1.7 JDK or later.

The Babelfy Java client may be downloaded from http://babelfy.org/data/BabelfyAPI-1.0.zip and unzipped, typically to Program Files (x86) with "use folder names" ticked.
This will extract the files to Program Files (x86)\Babelfy-online-API-1.0 and subdirectories.
Documentation will be placed in Program Files (x86)\Babelfy-online-API-1.0\docs.

Now, edit Program Files (x86)\Babelfy-online-API-1.0\config\babelfy.var.properties to add your key value to the only line in the file, immediately following the = sign.
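
The edit can be made in any text editor. For those who prefer to script it, the following Python sketch shows the text manipulation involved. The function name add_key is hypothetical, and since the name of the property is not given here, the sketch simply acts on the first non-comment line containing an = sign, replacing anything that already follows it.

```python
def add_key(properties_text, key):
    """Put `key` after the '=' of the first non-comment 'name=' line.

    Any value already present on that line is replaced.
    """
    lines = properties_text.splitlines()
    for i, line in enumerate(lines):
        name, sep, _old_value = line.partition("=")
        if sep and not line.lstrip().startswith("#"):
            lines[i] = name + "=" + key
            return "\n".join(lines) + "\n"
    raise ValueError("no 'name=' line found")


# "property.name" is a placeholder, not the real property name.
print(add_key("property.name=\n", "YOUR_KEY"), end="")
```

Saving the changed file under Program Files (x86) will normally require administrator rights.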

Edit Program Files (x86)\Babelfy-online-API-1.0\run-babelfydemo.bat, by adding an extra line "pause" at the end.

Double-clicking this .bat file in Windows runs Babelfy in a command-line window, and produces an attenuated form of the sample output above. The output of the client is shown below. The sample text appears to be built into the demo, and to be unalterable, as source is not provided.



Disclaimer

This page is offered as a facility for corpus analysis on Windows.  By using it, you are deemed to accept that the author bears no responsibility for any adverse consequences.  Needless to say, he hopes that there will be no such consequences.  He will be pleased to receive comments, but cannot promise to act upon them.


Ciarán Ó Duibhín
Updated 2020/07/16