Creating bilingual phrasebooks from a multilingual XML database

This is very old, and probably not very useful.

SUMMARY: This example produces bilingual phrasebooks for multiple markets from a single static XML database, using XSLT to extract the relevant data.

Figure 1 holds three words ("cow", "house" and "milk") in three languages (English, Albanian and Dutch). It also holds the name of each language in each of the others, the word for "Dictionary" in each language, and a simple rule stating whether that word comes before or after the language pair in the book's title. For an English - Albanian/Albanian - English pair of phrasebooks, the titles come out as English - Albanian Dictionary and Fjalor Shqip - Anglisht.

Figure 2 shows an XSLT script that formats the data for output as text, given three parameters:

  1. the source language
  2. the destination language
  3. the language of the intended market.

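Note that the script makes heavy use of the XPath lang() function. Provided ISO 639 language codes are used in xml:lang, the database can be extended to include other languages without changing the XSLT script. The output is also sorted on the source-language word; because the xsl:sort in Figure 2 sets no lang attribute, how language-specific that ordering is depends on the processor and its environment.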
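
To illustrate what lang() buys over a plain attribute comparison, here is a minimal sketch, separate from Figure 2, that can be run against lexicon.xml. It relies on the XPath 1.0 rule that lang() ignores case and also accepts sublanguages (so lang('en') would match xml:lang="en-GB" as well). The default value 'EN' for the source parameter is an assumption, chosen purely to show the difference.

```xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 version="1.0">

   <!-- sketch only: the upper-case default is deliberate -->
   <xsl:param name="source" select="'EN'"/>

   <xsl:output encoding="ISO-8859-1" method="text"/>

   <xsl:template match="/">
      <!-- lang() compares language codes case-insensitively -->
      <xsl:text>lang() matches: </xsl:text>
      <xsl:value-of select="count(//word/instance[lang($source)])"/>
      <xsl:text>&#x0A;</xsl:text>
      <!-- a plain attribute test needs an exact string match -->
      <xsl:text>@xml:lang matches: </xsl:text>
      <xsl:value-of select="count(//word/instance[@xml:lang = $source])"/>
      <xsl:text>&#x0A;</xsl:text>
   </xsl:template>

</xsl:stylesheet>
```

Run against Figure 1, this should report 3 matches for lang() and 0 for the attribute test, since no instance carries xml:lang="EN" exactly.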

Figures 3 to 7 show sample output from Michael Kay's Saxon XSLT processor on a Linux box. Other XSLT engines pass parameters differently (xsltproc, for instance, takes --stringparam name value), so you will have to modify the command lines accordingly.
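
One way to depend less on each engine's command-line syntax is to give the parameters default values in the stylesheet itself. The fragment below is a sketch of that idea, not part of the original script; the particular defaults are assumptions. It would replace the three bare xsl:param declarations in Figure 2.

```xml
<!-- illustrative defaults; the inner quotes make each value a string
     rather than an XPath expression -->
<xsl:param name="source" select="'en'"/>
<xsl:param name="dest" select="'sq'"/>
<!-- assume the book is sold in the source-language market
     unless told otherwise -->
<xsl:param name="market" select="$source"/>
```

With these defaults, running the stylesheet with no parameters should give the output shown in Figure 3; any parameter supplied on the command line still overrides its default.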

Update, 5 May 02003: Daniel Biddle has helpfully pointed out that I'm misusing the xml:lang attribute in Figure 1 when I say:

         <instance xml:lang="sq">Albanian</instance>

The content of this element should, of course, be in Albanian. As this proof-of-concept does what I meant it to, the correct solution is left to the reader … ☺
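
For readers who want a starting point, here is one possible correction, offered as a sketch rather than as the author's solution. It identifies the language being named with an ordinary attribute (a hypothetical "code" attribute, not present in Figure 1) and keeps xml:lang for the language the name is actually written in. The names themselves are taken unchanged from Figure 1.

```xml
<!-- in lexicon.xml: "code" says which language is being named;
     xml:lang now states the language of each element's content -->
<langname code="en">
   <instance xml:lang="en">English</instance>
   <instance xml:lang="sq">Anglisht</instance>
   <instance xml:lang="nl">Engels</instance>
</langname>
<langname code="sq">
   <instance xml:lang="en">Albanian</instance>
   <instance xml:lang="sq">Shqip</instance>
   <instance xml:lang="nl">Albanees</instance>
</langname>
<langname code="nl">
   <instance xml:lang="en">Dutch</instance>
   <instance xml:lang="sq">Holandez</instance>
   <instance xml:lang="nl">Nederlands</instance>
</langname>

<!-- in lexicon.xsl: the matching change to the langpair variable -->
<xsl:variable name="langpair"
  select="concat(langname[@code=$source]/instance[lang($market)],
  ' - ', langname[@code=$dest]/instance[lang($market)])"/>
```

With source=sq, dest=en and market=sq this still yields "Shqip - Anglisht", as in Figure 5. The trade-off is that @code is compared as an exact string, so the source and dest parameters lose the case-insensitive matching that lang() provides.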


Figure 1: lexicon.xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<lexicon>
<!--

 lexicon - uses ISO 639 language codes to choose language pair:

        en      English
        sq      Albanian
        nl      Dutch

 (There must be better element names than "instance"...)

 Stewart C. Russell <scruss@bigfoot.com> - 23/07/02001

 -->
   <head>
      <title placement="after" xml:lang="en">Dictionary</title>
      <title placement="before" xml:lang="sq">Fjalor</title>
      <title placement="after" xml:lang="nl">Woordenboek</title>
      <langname xml:lang="en">
         <instance xml:lang="en">English</instance>
         <instance xml:lang="sq">Albanian</instance>
         <instance xml:lang="nl">Dutch</instance>
      </langname>
      <langname xml:lang="sq">
         <instance xml:lang="en">Anglisht</instance>
         <instance xml:lang="sq">Shqip</instance>
         <instance xml:lang="nl">Holandez</instance>
      </langname>
      <langname xml:lang="nl">
         <instance xml:lang="en">Engels</instance>
         <instance xml:lang="sq">Albanees</instance>
         <instance xml:lang="nl">Nederlands</instance>
      </langname>
   </head>
   <body>
      <word>
         <instance xml:lang="en">cow</instance>
         <instance xml:lang="sq">lopë</instance>
         <instance xml:lang="nl">koe</instance>
      </word>
      <word>
         <instance xml:lang="en">house</instance>
         <instance xml:lang="sq">shtëpi</instance>
         <instance xml:lang="nl">huis</instance>
      </word>
      <word>
         <instance xml:lang="en">milk</instance>
         <instance xml:lang="sq">qumësht</instance>
         <instance xml:lang="nl">melk</instance>
      </word>
   </body>
</lexicon>

Figure 2: lexicon.xsl

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 version="1.0">

<!--
 creates a SOURCE - DEST lexicon for MARKET
 Stewart C. Russell <scruss@bigfoot.com> - 23/07/02001

usage example:
 saxon lexicon.xml lexicon.xsl source=en dest=sq market=en
-->

   <xsl:param name="source"/>
   <xsl:param name="dest"/>
   <xsl:param name="market"/>

   <xsl:output encoding="ISO-8859-1" method="text"/>

   <xsl:template match="*">
      <xsl:apply-templates/>
   </xsl:template>

   <xsl:template match="head">
      <xsl:variable name="langpair"
        select="concat(langname[lang($market)]/instance[lang($source)], 
        ' - ', langname[lang($market)]/instance[lang($dest)])"/>
      <xsl:variable name="bookname">
         <xsl:choose>
            <xsl:when test="title[lang($market)][@placement='before']">
               <xsl:value-of select="concat(title[lang($market)], ' ', 
                 $langpair)"/>
            </xsl:when>
            <xsl:when test="title[lang($market)][@placement='after']">
               <xsl:value-of select="concat($langpair, ' ', 
                 title[lang($market)])"/>
            </xsl:when>
            <xsl:otherwise/>
         </xsl:choose>
      </xsl:variable>
      <xsl:text>&#x0A;</xsl:text>
      <xsl:value-of select="$bookname"/>
      <xsl:text>&#x0A;</xsl:text>
   </xsl:template>

   <xsl:template match="body">
      <xsl:for-each select="word">
         <xsl:sort select="instance[lang($source)]"/>
         <xsl:text>&#x0A;</xsl:text>
         <xsl:value-of select="concat(instance[lang($source)], ': ',
           instance[lang($dest)])"/>
         <xsl:text>&#x0A;</xsl:text>
      </xsl:for-each>
   </xsl:template>

</xsl:stylesheet>
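
The xsl:sort in Figure 2 does not say which language's ordering rules to apply, so the processor falls back on its environment. A possible refinement, sketched below as a variant of the body template, passes the source language through the lang attribute of xsl:sort. Whether a given processor actually has collation rules for a given language is implementation-dependent, so treat this as a hint to the processor rather than a guarantee.

```xml
<xsl:template match="body">
   <xsl:for-each select="word">
      <!-- lang is an attribute value template, so {$source} hands the
           source language code to the processor's sort -->
      <xsl:sort select="instance[lang($source)]" lang="{$source}"/>
      <xsl:text>&#x0A;</xsl:text>
      <xsl:value-of select="concat(instance[lang($source)], ': ',
        instance[lang($dest)])"/>
      <xsl:text>&#x0A;</xsl:text>
   </xsl:for-each>
</xsl:template>
```

For the three words in Figure 1 the order should not change; the difference would only appear with words whose position depends on language-specific rules, such as accented initial letters.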

Figure 3: English - Albanian Dictionary for the English market

$ saxon lexicon.xml lexicon.xsl source=en dest=sq market=en           
   
English - Albanian Dictionary

cow: lopë

house: shtëpi

milk: qumësht

Figure 4: Albanian - English Dictionary for the English market

$ saxon lexicon.xml lexicon.xsl source=sq dest=en market=en
   
Albanian - English Dictionary

lopë: cow

qumësht: milk

shtëpi: house

Figure 5: Albanian - English Dictionary for the Albanian market

$ saxon lexicon.xml lexicon.xsl source=sq dest=en market=sq
   
Fjalor Shqip - Anglisht
   
lopë: cow

qumësht: milk

shtëpi: house

Figure 6: Dutch - English Dictionary for the Dutch market

$ saxon lexicon.xml lexicon.xsl source=nl dest=en market=nl
   
Nederlands - Engels Woordenboek
   
huis: house

koe: cow

melk: milk

Figure 7: English - Dutch Dictionary for the English market

$ saxon lexicon.xml lexicon.xsl source=en dest=nl market=en
   
English - Dutch Dictionary
   
cow: koe

house: huis

milk: melk

Copyright © 02003–02006 Stewart C. Russell, Scarborough, Ontario.

Updated 25 June 02006