Response by Caoimhín Ó Donnaíle to Nominet consultation on IDNs - 2005-08-25

I teach computing at Sabhal Mòr Ostaig, a small college on the island of Skye which offers courses up to University honours-degree level through the medium of Scottish Gaelic. I maintain much of the college website (www.smo.uhi.ac.uk), which is one of the main resources on the Internet for Scottish Gaelic material and is also a major resource for Irish Gaelic. I also have some knowledge of Manx Gaelic and Welsh, and a lesser knowledge of other languages.

Question 1: Do you think the four primary options encompass the possibilities?

I cannot think of any other major options worth considering.

Question 2: Which of the options do you think Nominet should adopt, and why?

Certainly not Option 1 ("Do nothing").

I do not believe that there is any major difference between options 2, 3 and 4, because I believe that on the one hand support for IDNs will be simple and easy for Nominet to provide, and that on the other hand, applicants, given a few simple pointers, could easily find the ACE code they require and register that. I think that Nominet should go for Option 4, but if it were to go for Option 2 and plan to add the other facilities later at its leisure I do not think that that would be a problem.

Question 3: Are you aware of, or do you have experience of, any legal issues that could affect Nominet's policy choice as regards IDNA?

The strongest legislation, both in theory and practice, relates to the Welsh language. In the past, the Welsh language has also had the greatest technical obstacles to the use of the accents it requires (see Question 6), but these are now disappearing very fast with the global implementation of Unicode (see Question 4), so that the full impact of the Welsh legislation can be expected to be increasingly felt in regard to support for accented characters.

The Welsh Language Act 1993 has been interpreted as requiring public bodies and agencies operating in Wales to have a Welsh language "identity" as well as an English language identity, where "identity" includes such things as logos, headed paper, official name in Welsh, and Welsh domain name. In its Illustrated Handbook for Web Management Teams (May 2002), the UK Government advises its departments and agencies in Section 1.10.6 Legal issues - Welsh Language Act that:

"Ministers have committed government departments and agencies to introducing Welsh language schemes under the Welsh Language Act 1993. [...] Departments and agencies should apply the principles of their language schemes [...] to their Internet sites. The aim should be that the public has access to information in both English and Welsh in line with the commitments in schemes."

In another document advising departments and agencies on domain names, its advice on the Welsh Language is that:

"Welsh Language Board advise that departments, councils and agencies providing a service to the public in Wales should, where there is a difference between their Welsh and English names, consider registering the Welsh language equivalent. For example: www.anglesey.gov.uk and www.ynysmon.gov.uk"

This advice has been widely implemented. Some examples of parallel English/Welsh domains are given below (with the correct Welsh spelling, were this available, shown on the bottom line):

www.wales.gov.uk
www.cymru.gov.uk

www.countryside.wales.gov.uk
www.cefngwlad.cymru.gov.uk

www.walesontheweb.org
www.cymruarywe.org
www.cymruarywê.org

www.richardcommission.gov.uk
www.comisiwnrichard.gov.uk

www.anglesey.gov.uk
www.ynysmon.gov.uk
www.ynysmôn.gov.uk

www.pembrokeshire.gov.uk
www.sir-benfro.gov.uk

www.ruthin-wales.co.uk
www.rhuthun-cymru.co.uk

www.cardiff.ac.uk
www.caerdydd.ac.uk

There are of course numerous other examples of organisations with Welsh names whose domain name is "missing" an accent. For example:

It seems to me that Nominet could be given notice under the Welsh Language Act 1993 to prepare a Welsh Language Scheme. Since this is a Westminster act, not a Welsh Assembly act, it applies to bodies based outside Wales as well as to those based inside Wales. The Act states that

In addition to the Welsh Language Act 1993, other Acts and treaties of relevance are:

Bwrdd yr Iaith Gymraeg (Welsh), Bòrd na Gàidhlig (Scottish Gaelic), Foras na Gaeilge (Irish Gaelic, cross-border) and the UK Committee of EBLUL (UK indigenous minority languages in general) will be able to offer advice on language-related legislation and its current workings. Legislation, of course, needs to be interpreted in the context of what is technically and practically feasible.

I have no doubt that Nominet, as the monopoly supplier of most of the .uk domain space, has an obligation to support the requirements of all indigenous UK languages - and possibly also the main immigrant community languages. When only ASCII characters were possible in domain names, this obligation clearly could not be taken to include non-ASCII characters, but now that the standards for IDNs are in place and are being implemented worldwide, lack of support for them by Nominet would increasingly be seen as providing a substandard service to users of languages other than English. Whether this obligation would be interpreted legally as an immediate obligation on Nominet, or else as an obligation on the Government to oblige Nominet to act, I cannot say. But in either case its impact will be increasingly felt.

Question 4: Do you have any other comments on the options and their advantages and disadvantages?

The computing world is currently in the middle of what I believe is one of the most significant changes and simplifications in its history, namely the changeover to the universal character set, "Unicode" (ISO 10646) (generally implemented on the Internet as "utf-8"). This removes many of the major "fault lines" which had existed up until now in computing - between the Macintosh world and the PC/Unix world; between Western Europe and Eastern Europe, Cyrillic, Greek, Japanese, and so on. Much of the changeover has already taken place without computer users or even many computer professionals being aware of it happening. Copying and pasting Russian, Greek, Arabic or Chinese words from web pages to Google, for example, simply "works". The simplification which Unicode brings is so great that I expect that the changeover will be complete for any maintained software or data within a couple of years at most. Character code difficulties will be a thing of the past. Everything will simply work. Users will be used to seeing and handling a wide range of characters, and rather than thinking of them as anything unusual, they will object to any restrictions and demand to know the reason for them. The iso-8859-1 character set which has been the backbone of Western European language computing (except Welsh) for the past ten or fifteen years will be a thing of the past.

I see that the pages of the Nominet website are still encoded as iso-8859-1 (as ours were until a few weeks ago). Nominet ought to convert everything - webpages and database fields - to utf-8. This should be very easy to do. When I converted our website with several thousand pages and all our dictionary and terminology databases to utf-8, I was amazed at how simple it was and how few problems there were. This is something which Nominet ought to be doing anyway, irrespective of IDNs, and the time is now ripe. Indeed, any serious website which will be maintained in the future ought to be converting to utf-8. Another requirement is to ensure that staff have the facilities to work in utf-8, but this is merely a case of ensuring that operating systems and software are up-to-date.
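
By way of illustration, the core of such a conversion can be sketched in a few lines of Python. This is only a sketch under stated assumptions, not a description of how our site or Nominet's was or would be converted: the function names are hypothetical, and it assumes the old data really is iso-8859-1 throughout.

```python
def latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode iso-8859-1 bytes as utf-8 (hypothetical helper)."""
    # Every byte value is a valid iso-8859-1 character, so this decode cannot
    # fail; it can only mis-read input that was not really iso-8859-1.
    return raw.decode("iso-8859-1").encode("utf-8")


def looks_like_utf8(raw: bytes) -> bool:
    """Return True if the bytes already decode cleanly as utf-8."""
    try:
        raw.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# In iso-8859-1 the letter "ò" is the single byte 0xF2.
old = b"Sabhal M\xf2r Ostaig"
# Skip data that is already utf-8, so that nothing is converted twice.
new = old if looks_like_utf8(old) else latin1_to_utf8(old)
print(new.decode("utf-8"))
```

The check before converting matters in practice, because text that has been converted twice shows the familiar two-character garbage in place of each accented letter. Web pages would also need their declared charset updated, which this sketch does not attempt.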

With this groundwork done, the extra tools to turn Option 2 into Options 3 or 4 - punycode conversion, for example - should be trivial to provide, and lots of other TLDs already have them so a modicum of cooperation would save Nominet the work of reimplementing them.
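
To give an idea of how little is involved, here is a minimal sketch of the conversion between a Unicode domain name and its ACE form, using the "idna" codec in Python's standard library (which implements the 2003 IDNA rules of RFC 3490). The function names to_ace and from_ace are my own, for illustration only; a registry would presumably want its own checks on top of this.

```python
def to_ace(domain: str) -> str:
    """Convert a Unicode domain name to its ASCII-compatible (ACE) form."""
    # The built-in "idna" codec applies nameprep and Punycode label by label.
    return domain.encode("idna").decode("ascii")


def from_ace(domain: str) -> str:
    """Convert an ACE domain name back to Unicode."""
    return domain.encode("ascii").decode("idna")


for name in ["ynysmôn.gov.uk", "ynysmon.gov.uk", "sàbaid.org.uk", "sabaid.org.uk"]:
    ace = to_ace(name)
    # Labels containing only ASCII pass through unchanged; others gain the
    # "xn--" prefix.
    print(name, "->", ace, "->", from_ace(ace))
```

Note that accented and unaccented spellings produce different ACE strings, so as far as the DNS is concerned they are simply different names.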

Question 5: If Nominet adopts IDNA, should it restrict the domain names that can be registered (either in terms of the characters allowed in domain names, or otherwise) over and above the limitations of IDNA itself?

Yes, probably. See the following question.

Question 6: If so, what restrictions should be applied, and why?

Nominet should allow, at the absolute minimum, the characters required for UK indigenous languages, which in effect means Scottish Gaelic, Irish Gaelic, and Welsh (see below). At the other end of the spectrum of possibilities, I think it would be wise to require all the characters of a domain name label to belong to the same script (as per ISO 15924), as I can think of no good reason for an applicant wishing to mix scripts and the restriction would cut down the scope for phishing. As to where the decision line should be drawn between these two extremes - whether characters for, say, French, Greek, Urdu, Arabic or Chinese should be allowed in .uk - I have no particular personal opinion nor expertise, other than a feeling that it would be nice to be liberal and that it would probably be as easy to be fairly liberal as not to be. Languages do not respect national boundaries. A rolling plan for increasing liberalisation as Nominet gains experience is one possible strategy.
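
The single-script restriction could be checked along the following lines. This is a rough sketch only: Python's standard library does not expose the Unicode Script property, so the first word of each character's Unicode name (LATIN, GREEK, CYRILLIC, ...) is used here as a stand-in for it, which is an approximation of my own and not what any standard prescribes. The function names are hypothetical.

```python
import unicodedata

# Digits and the hyphen may accompany any script, so they are not counted.
NEUTRAL = set("0123456789-")


def scripts_in_label(label: str) -> set:
    """Approximate the set of scripts used in one domain name label.

    Assumption: the first word of the Unicode character name is treated as
    the script. A real implementation would use the Unicode Script property.
    """
    return {
        unicodedata.name(ch, "UNKNOWN").split()[0]
        for ch in label
        if ch not in NEUTRAL
    }


def is_single_script(label: str) -> bool:
    """Return True if the label does not mix scripts (by the rough test above)."""
    return len(scripts_in_label(label)) <= 1


print(is_single_script("ynysmôn"))        # Latin letters only
print(scripts_in_label("p\u0430ypal"))    # second letter is Cyrillic "а"
```

The second example is the kind of label the restriction is meant to catch: it looks like an ordinary Latin word but contains one Cyrillic letter.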

There should be no requirement (other than during the sunrise period - Question 9) for an unaccented version of a domain name to belong to the same registrant as an accented version, since accents can change the meaning of words and are not just optional decorations. More on this below, but first a review of the UK's indigenous languages, with the proviso that I have no great expertise other than in Scottish Gaelic and Irish.

Irish Gaelic is alive and thriving in Northern Ireland, which is part of the UK. Its only requirement is the acute accent on all five vowels, á, é, í, ó, ú.

(In the spelling system which was most commonly used up until the 1950s, a diacritic "dot-above" was placed on the consonants b,c,d,f,g,m,p,s,t instead of following them with a letter 'h' as is done nowadays (rather akin to the way in which an umlaut diacritic is an alternative to following a vowel with a letter 'e' in German). This was rather a good system, but it had all but completely disappeared in modern use until it underwent a slight revival in recent years, mostly for decorative purposes, with the increased availability of computer fonts. Under ISO 15924, it has a script to itself, Latg. Although I am quite keen on it myself as a method of writing Irish, I think it is a complication which both Irish speakers and Nominet can do without at the moment in domain names. It may be something for the future, and will come anyway under the most liberal scenario.)

Scottish Gaelic requires the grave accent on all five vowels, à, è, ì, ò, ù. Prior to a spelling reform (GOC), introduced in 1981 by the schools examination board, it also had acute accents on some of the vowels. Quite a few people, myself included, still use and prefer the pre-reform accents, but since acute accents are covered anyway by the Irish Gaelic requirement, this issue can be ignored.

Welsh's most important requirement is the circumflex diacritic, and the important thing here is that it features not only on the five "English" vowels, â, ê, î, ô, û, but also on the semi-vowels, ŵ and ŷ. The diaeresis is less common than the circumflex, but it is also required for correct spelling, and it also features on all seven Welsh vowels, ä, ë, ï, ö, ü, ẅ, ÿ. I am no expert on Welsh, but I believe that it also requires the grave and the acute accent, even though these are rarer than the circumflex and diaeresis. In fact, its full requirements are for all four diacritics - circumflex, diaeresis, grave and acute - on all seven vowels - a, e, i, o, u, w, y, as can be seen from the Unicode character names list. Thus it can be seen that Welsh is the most demanding of the indigenous UK languages, and that its full requirements subsume those of Scottish Gaelic and Irish Gaelic. Welsh has had a rather raw deal in the past. Its requirements, even for the important ŵ and ŷ, were not covered by any of the original parts of ISO 8859 - it was quite simply forgotten - and so its characters were often missing from fonts. However, with the coming of Unicode this will now be history and its lot will improve.
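
The minimum repertoire described above (four diacritics on seven vowels) is small enough to generate mechanically. The following is a sketch under my own assumptions, not a proposed Nominet policy: the names WELSH_ACCENTED, ALLOWED and is_allowed_label are hypothetical, and the base set of a-z, digits and hyphen is simply the traditional hostname alphabet.

```python
import unicodedata

VOWELS = "aeiouwy"  # the seven Welsh vowels
COMBINING = {
    "grave": "\u0300",
    "acute": "\u0301",
    "circumflex": "\u0302",
    "diaeresis": "\u0308",
}

# NFC normalisation composes each vowel + combining mark pair into the
# single precomposed Unicode character.
WELSH_ACCENTED = {
    unicodedata.normalize("NFC", vowel + mark)
    for vowel in VOWELS
    for mark in COMBINING.values()
}

# Assumed minimum character set: traditional hostname characters plus the
# accented vowels, which also cover Scottish Gaelic and Irish Gaelic.
ALLOWED = set("abcdefghijklmnopqrstuvwxyz0123456789-") | WELSH_ACCENTED


def is_allowed_label(label: str) -> bool:
    """Return True if every character of the label is in the minimum set."""
    normalised = unicodedata.normalize("NFC", label.lower())
    return all(ch in ALLOWED for ch in normalised)


print(len(WELSH_ACCENTED), "".join(sorted(WELSH_ACCENTED)))
for label in ["ynysmôn", "bòrd-na-gàidhlig", "féar", "garçon"]:
    print(label, is_allowed_label(label))
```

The last label fails because ç is outside this minimum set, which illustrates where a more liberal policy covering French would have to extend it.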

Cornish was all but dead for over a hundred years, but has been revived and now has several hundred fluent speakers. There are several slightly different spelling systems. Of the main two, Kernewek Kemmyn (Common Cornish) does not seem to use accented characters. The other, Kernowek Ünys Amendys (UCR) seems to use the set: ā, ē, ō, ū, ü, ȳ. (The latter two characters are written ű and ý in the Omniglot article but this seems to be a mistake.) However, these seem to be rather optional: "Note that any macron (symbol shown above a letter) is only a guide to help pronunciation and should never be written". Thus it seems unlikely that Cornish speakers will wish to use accented characters in domain names, at least not for the moment.

I mention Manx and Breton in passing to complete the tour of the six Celtic languages. However, they are not UK languages and so Nominet has no special responsibility towards them. The Isle of Man is not even part of the EC, and it has its own top-level domain, .im. The only special requirement of Manx is ç, and this will be covered if Nominet decides to support French. Breton has various slightly different spelling systems and requires a variety of characters. Of these ñ is very common, ù is common in plurals, and the others (ê, ü, ...) are rarer. They would seem to be all covered if Nominet decides to support Spanish and French. The main complication with Breton names is that they often feature apostrophes in the middle, since c'h is regarded as a separate "character", distinct from ch (to distinguish the Scottish/German "ch" sound from the French "ch" sound). However, apostrophes, unless I am mistaken, are not allowed in IDNA, so the Bretons will just have to do without them in domain names.

In passing, I also mention French and Jèrriais (the dialect of French spoken in Jersey) since they are spoken in the Channel Islands. However, the Channel Islands are not part of the UK (nor even of the EC), and they have their own top level domains, .je for Jersey and .gg for Guernsey, so Nominet has no special responsibility towards them. If Nominet decides to support French, it seems that this will cover Jèrriais also.

Scots certainly is a UK indigenous language, so Nominet has a special responsibility towards it, and also to Ullans (or "Ulster-Scots"), the form of Scots spoken in Northern Ireland. However, they seem to have no very settled spelling system. Although various diacritics are sometimes used on vowels to aid the link between spelling and pronunciation, I do not believe that there is sufficient standardisation that there would be much desire to have them in domain names (and if there were, they would be more than covered by the requirements of Welsh). Middle Scots made use of the letter yogh, ȝ (which was also used in Middle English), and this is the origin of the 'z' in words such as Menzies, Dalzell, capercailzie. I do not think, though, that it would be used in modern spelling or required for domain names.

Michael Everson's webpages, The Alphabets of Europe, give a comprehensive and voluminous survey of the character set requirements of virtually all European languages. His character set lists generally tend towards inclusivity and may include characters which are fairly rare in the language.

I stated that I do not believe that domain names differing only in accents should be restricted to belong to the same registrant. This is because accents are not just optional decorations - they can radically change the pronunciation and meaning of words. In Irish Gaelic, "léann" means academic learning, whereas "leann" means beer. In Scottish Gaelic these days, a "fèis" is a summer school/festival with classes in music, dance, song and Gaelic, mostly for older children, whereas "feis" means sexual intercourse. To further support this contention, I have compiled a short list of examples of pairs of Scottish Gaelic and Irish Gaelic words whose meaning is radically altered by the presence or absence of an accent. The same is also true in Welsh: "Llŷn" is the Llŷn peninsula, whereas "llyn" means a lake.

To give a few putative examples, the Isle of Lewis branch of the Lord's Day Observance Society might wish to register the domain name sàbaid.org.uk, whereas a Gaelic speaking war-games club might wish to register sabaid.org.uk. If the Lord's Day Observance Society were to register sabaid.org.uk it would be regarded as hilarious, because the pronunciation would be all wrong and because there has been a lot of nasty infighting leading to a major schism in the Free Church in recent years. In Northern Ireland, a hay-grower's association might want to register féar.org.uk, while a Gaelic-speaking men's group might want to register fear.org.uk (if it had not already been taken by an English-speaking phobia support group).

I believe that domain names differing only in accents should be allowed to belong to different owners even though there are cases of Gaelic words where the spelling has not been completely standardised and where the presence or absence of an accent would not change the pronunciation or meaning. There would be little point in trying to police such things without also trying to police pre-/post- spelling-reform variants in both Irish and Scottish Gaelic, or variants such as encyclopedia/encyclopaedia, standardise/standardize, nightclub/niteclub in English - an impossible task. Such issues are best left to the Dispute Resolution Service.

Implementing ownership restrictions on domains differing only in accents would also, I guess, require Nominet to make some rather complicated and burdensome system changes - to ensure that the restrictions were enforced even when domains changed expiry status or changed ownership or ownership details. I note that the .de registry makes no such restrictions, and this seems to be the trend.

Question 7: Are there other forms of support for Internationalised Domain Names that you would like to see Nominet provide?

Not really, but I did manage to think of one minor facility which it would be nice to have, namely an "accent-independent" option in the Whois service. I see that Nominet's Whois service is very limited at present - it does not seem to have a facility for wildcard searches. Perhaps this is deliberate, to increase confidentiality. But if not, and if a richer facility were provided in the future, it would be nice if this had an "ignore accents" checkbox. If ticked, a search for "bord*" would then come up with "bòrd-na-gàidhlig.org.uk" as well as "bord-na-gaidhlig.org.uk". (Accent-independent search is, by the way, easy to implement in a MySQL database.)
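
Outside a database, the same effect can be had by stripping accents from both the search pattern and the candidate names before comparing them. The sketch below is illustrative only; whois_search and strip_accents are hypothetical names, the list of names stands in for the registry database, and it says nothing about how Nominet's Whois is actually built.

```python
import fnmatch
import unicodedata


def strip_accents(text: str) -> str:
    """Remove diacritics by decomposing and dropping the combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def whois_search(pattern, names, ignore_accents=False):
    """Wildcard search over domain names, optionally ignoring accents."""
    if not ignore_accents:
        return [n for n in names if fnmatch.fnmatchcase(n, pattern)]
    folded = strip_accents(pattern)
    return [n for n in names if fnmatch.fnmatchcase(strip_accents(n), folded)]


registered = [
    "bòrd-na-gàidhlig.org.uk",
    "bord-na-gaidhlig.org.uk",
    "canan.co.uk",
]
print(whois_search("bord*", registered))                       # exact spelling only
print(whois_search("bord*", registered, ignore_accents=True))  # "ignore accents" ticked
```

With the option off, only the unaccented name matches; with it on, both spellings are returned, which is the behaviour the checkbox would provide.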

Question 8: For each of the IDN support facilities mentioned in this section, and also any you suggest in answer to question 7, should they be provided regardless of cost, not be provided at all, or be provided only if cheap and simple to do?

Nothing should ever be provided "regardless of cost", but I believe that it will all be very cheap and simple to do.

Question 9: How should the sunrise period operate, and why?

I do not think that there will be any huge initial rush for IDNs in .uk, so I do not think that the sunrise period will be a major issue. It would be good, I think, if there were an initial period - perhaps two weeks, perhaps a month - in which only owners of existing domains were allowed to register accented versions of their domain names. There could be an additional initial charge during this period due to some manual checking which might be required. After that, IDN registrations and renewals should operate like any other registrations and renewals.

Question 10: Do you think any of these dispute resolution issues are a significant problem?

No. The issues of similarity of domain names are no different in nature from those which have already existed for very many years. Consider, for example:

novell.co.uk | novel.co.uk | nove1l.co.uk
float.co.uk | f1oat.co.uk
microsoft.co.uk | micr0soft.co.uk
encyclopedia.co.uk | encyclopaedia.co.uk | encyclopaeclia.co.uk | encydopedia.co.uk
sanitise.co.uk | sanitize.co.uk

It is very important to have a Dispute Resolution Service which is functioning well - whether or not we have IDNs. Unicode Technical Report #36 gives good advice on minimising any potential problems in country-specific IDNs.

I think it would be reasonable if the application form for domain names were to include, in the case of IDNs, a textbox requiring a note of the language in question (e.g. Welsh, Scottish Gaelic, Irish Gaelic, etc.), and of the connection if any between the applicant and the language/domain-name. It would be explained that this was not for day-to-day policing, but so that appropriate expertise could be called in by the dispute resolution service if necessary, and that any falsification would be regarded as strong evidence of bad faith in the dispute resolution procedure. I think that the requirement to fill in this box would by itself act as a significant deterrent against abuse.

Question 11: Are there any other ways in which IDNA will affect dispute resolution and the DRS?

The DRS should compile a list of language bodies who could offer advice when any language-related questions arose in disputes. I expect that they would often be prepared to advise free of charge in the case of simple queries requiring little staff time, such as whether or not a word was genuine and meant what the applicant said it meant.

Question 12: Would you expect to use IDNs as your primary domain, or would you only be expecting to register them for "defensive" reasons?

As an educational institution, Sabhal Mòr Ostaig is not in the business of registering large numbers of domains and its primary domain is and will remain "smo.uhi.ac.uk". However, we do register domains for particular projects and will certainly want to register some domains with Scottish Gaelic names using accented characters (and I may well wish to register one or two myself as a private individual). Initially, this would only be domains to which we wished to attach a particularly "Gaelic" image, but as users become more used to IDNs, we would expect to extend usage.

We would register any "real" variants, not just defensively but for redirection and possible use in the future, but would not paranoically register meaningless variants. I am sure, for example, that our offshoot company, Cànan Ltd, would wish to register cànan.co.uk in addition to the canan.co.uk which they already have, but that they would not wish to register meaningless or unrelated variants such as cânan.co.uk or canàn.co.uk.

Question 13: Are there any other issues relating to IDNA that should be taken into account, or that you want to mention?

No. I think that's it. Thank you for this consultation.

Caoimhín Ó Donnaíle
2005-08-25