Updated 25 July 02001

Creating bilingual phrasebooks from a multilingual XML database

(Could be better documented, I'm afraid.)

SUMMARY: This example solves the problem of producing bilingual phrasebooks for multiple markets, working from a static XML database and extracting the relevant data using XSLT.

Figure 1 shows three words ("cow", "house" and "milk") in three languages (English, Albanian and Dutch). It also holds what is needed to build the title of the book in each language, together with a simple rule: the placement attribute on each title element states whether the word for "Dictionary" comes before or after the language pair. For an English - Albanian/Albanian - English pair of phrasebooks, this gives "English - Albanian Dictionary" for the English market and "Fjalor Shqip - Anglisht" for the Albanian market.
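The title rule is simple enough to express as a small function. The Python sketch below is illustrative only: build_title is a hypothetical helper, not part of the original files, and it assumes that the market-language word for "Dictionary", its placement value and the two language names have already been looked up.

```python
def build_title(title_word: str, placement: str,
                source_name: str, dest_name: str) -> str:
    """Hypothetical helper: join the market's word for "Dictionary" to the
    language pair, on the side given by the placement attribute."""
    langpair = f"{source_name} - {dest_name}"
    if placement == "before":
        return f"{title_word} {langpair}"
    if placement == "after":
        return f"{langpair} {title_word}"
    # The stylesheet would quietly produce an empty title here instead.
    raise ValueError(f"unknown placement: {placement!r}")


# English market: "Dictionary" follows the pair.
print(build_title("Dictionary", "after", "English", "Albanian"))
# Albanian market: "Fjalor" comes first.
print(build_title("Fjalor", "before", "Shqip", "Anglisht"))
```

Raising an error on an unknown placement is a deliberate choice in this sketch, so that a typo in the data shows up at once rather than as a blank title.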

Figure 2 shows an XSLT script that formats the data as plain text, given three parameters:

  1. source: the source language
  2. dest: the destination language
  3. market: the language of the intended market.

Note that it makes heavy use of the XPath lang() function, which tests an element's xml:lang attribute. As long as languages are identified by their ISO 639 codes, the database can be extended to include other languages without any change to the XSLT script. The entries are also sorted on their source-language words, so each book is in order for its own source language.
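lang() does a little more than compare two strings. In XPath 1.0 it looks for the nearest xml:lang attribute on the element or its ancestors, compares it with the argument ignoring case, and also accepts sublanguages, so lang('en') is true for xml:lang="en-GB". The Python sketch below approximates that rule with the standard library's ElementTree; xpath_lang and the parent map are hypothetical helpers, and the XML fragment is made up for the demonstration.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the reserved XML namespace URI.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def xpath_lang(elem, parents, wanted):
    """Approximate XPath 1.0 lang(wanted) for elem.

    parents maps each element to its parent (ElementTree keeps no
    parent pointers). The nearest xml:lang on elem or an ancestor
    decides; it matches if equal to wanted, or if it is wanted plus a
    "-suffix" (a sublanguage), ignoring case either way.
    """
    node = elem
    while node is not None:
        value = node.get(XML_LANG)
        if value is not None:
            value = value.lower()
            wanted = wanted.lower()
            return value == wanted or value.startswith(wanted + "-")
        node = parents.get(node)
    return False  # no xml:lang in scope, so lang() is false


# A made-up fragment: the first instance inherits xml:lang from <word>.
word = ET.fromstring(
    '<word xml:lang="en-GB">'
    "<instance>cow</instance>"
    '<instance xml:lang="nl">koe</instance>'
    "</word>"
)
parents = {child: parent for parent in word.iter() for child in parent}
first, second = word.findall("instance")

print(xpath_lang(first, parents, "en"))     # True: en-GB is a sublanguage of en
print(xpath_lang(first, parents, "EN-gb"))  # True: case is ignored
print(xpath_lang(second, parents, "en"))    # False: its own xml:lang="nl" wins
```

One consequence for the database: an entry tagged with a regional code such as "en-US" would still be picked up by a source or dest parameter of "en".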

Figures 3, 4, 5, 6 and 7 show sample output from Michael Kay's Saxon XSLT processor on a Linux box. Be advised that other XSLT engines have a different way of passing parameters, so you will have to modify the command lines accordingly.

Figure 1: lexicon.xml

<?xml version="1.0" encoding="ISO-8859-1"?>
<!--
 lexicon - uses ISO 639 language codes to choose language pair:

	en	English
	sq	Albanian
	nl	Dutch

 (There must be better element names than "instance"...)

 Stewart C. Russell <scruss@bigfoot.com> - 23/07/02001
-->
<lexicon>
   <head>
      <title placement="after" xml:lang="en">Dictionary</title>
      <title placement="before" xml:lang="sq">Fjalor</title>
      <title placement="after" xml:lang="nl">Woordenboek</title>
      <langname xml:lang="en">
         <instance xml:lang="en">English</instance>
         <instance xml:lang="sq">Albanian</instance>
         <instance xml:lang="nl">Dutch</instance>
      </langname>
      <langname xml:lang="sq">
         <instance xml:lang="en">Anglisht</instance>
         <instance xml:lang="sq">Shqip</instance>
         <instance xml:lang="nl">Holandez</instance>
      </langname>
      <langname xml:lang="nl">
         <instance xml:lang="en">Engels</instance>
         <instance xml:lang="sq">Albanees</instance>
         <instance xml:lang="nl">Nederlands</instance>
      </langname>
   </head>
   <body>
      <word>
         <instance xml:lang="en">cow</instance>
         <instance xml:lang="sq">lopë</instance>
         <instance xml:lang="nl">koe</instance>
      </word>
      <word>
         <instance xml:lang="en">house</instance>
         <instance xml:lang="sq">shtëpi</instance>
         <instance xml:lang="nl">huis</instance>
      </word>
      <word>
         <instance xml:lang="en">milk</instance>
         <instance xml:lang="sq">qumësht</instance>
         <instance xml:lang="nl">melk</instance>
      </word>
   </body>
</lexicon>
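Since every string is looked up by its language code, extending the database means adding a title, a langname and an instance in every langname and word for the new language. A missing entry does not cause an error; it simply comes out as an empty string. A rough completeness check might look like this in Python. It is a sketch under assumptions: missing_entries is a hypothetical helper, it treats the languages that have a title as the full set, it compares xml:lang values exactly, and the two-language sample document is made up.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def missing_entries(root):
    """List the gaps that would print as empty strings in a phrasebook.

    Assumes the layout of lexicon.xml: <head> holds <title> and
    <langname> elements, <body> holds <word> elements, and every
    <langname> and <word> has one <instance> per language.
    """
    head, body = root.find("head"), root.find("body")
    # Treat the languages that have a title as the complete set.
    langs = {t.get(XML_LANG) for t in head.findall("title")}
    problems = []

    named = {ln.get(XML_LANG) for ln in head.findall("langname")}
    for lang in sorted(langs - named):
        problems.append(f"no langname for {lang}")

    groups = [("langname", ln) for ln in head.findall("langname")]
    groups += [("word", w) for w in body.findall("word")]
    for kind, group in groups:
        have = {i.get(XML_LANG) for i in group.findall("instance")}
        # Label a langname by its code, a word by its first instance.
        label = group.get(XML_LANG) or group.findtext("instance", default="?")
        for lang in sorted(langs - have):
            problems.append(f"{kind} {label!r}: no instance for {lang}")
    return problems


# Made-up sample: the Dutch word for "milk" has been left out.
sample = ET.fromstring("""<lexicon>
  <head>
    <title placement="after" xml:lang="en">Dictionary</title>
    <title placement="after" xml:lang="nl">Woordenboek</title>
    <langname xml:lang="en">
      <instance xml:lang="en">English</instance>
      <instance xml:lang="nl">Dutch</instance>
    </langname>
    <langname xml:lang="nl">
      <instance xml:lang="en">Engels</instance>
      <instance xml:lang="nl">Nederlands</instance>
    </langname>
  </head>
  <body>
    <word>
      <instance xml:lang="en">cow</instance>
      <instance xml:lang="nl">koe</instance>
    </word>
    <word>
      <instance xml:lang="en">milk</instance>
    </word>
  </body>
</lexicon>""")

for problem in missing_entries(sample):
    print(problem)
```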

Figure 2: lexicon.xsl

<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">

<!--
 creates a SOURCE - DEST lexicon for MARKET
 Stewart C. Russell <scruss@bigfoot.com> - 23/07/02001

usage example:
 saxon lexicon.xml lexicon.xsl source=en dest=sq market=en
-->

   <xsl:param name="source"/>
   <xsl:param name="dest"/>
   <xsl:param name="market"/>

   <xsl:output encoding="ISO-8859-1" method="text"/>

   <!-- keep the indentation whitespace of lexicon.xml out of the output -->
   <xsl:strip-space elements="*"/>

   <!-- any other element (i.e. the root): just process its children -->
   <xsl:template match="*">
      <xsl:apply-templates/>
   </xsl:template>

   <!-- the book title: "Dictionary" before or after the language pair -->
   <xsl:template match="head">
      <xsl:variable name="langpair" select="concat(langname[lang($market)]/instance[lang($source)], ' - ', langname[lang($market)]/instance[lang($dest)])"/>
      <xsl:variable name="bookname">
         <xsl:choose>
            <xsl:when test="title[lang($market)][@placement='before']">
               <xsl:value-of select="concat(title[lang($market)], ' ', $langpair)"/>
            </xsl:when>
            <xsl:when test="title[lang($market)][@placement='after']">
               <xsl:value-of select="concat($langpair, ' ', title[lang($market)])"/>
            </xsl:when>
         </xsl:choose>
      </xsl:variable>
      <xsl:value-of select="$bookname"/>
      <xsl:text>&#10;&#10;</xsl:text>
   </xsl:template>

   <!-- one "source: dest" line per word, sorted on the source word -->
   <xsl:template match="body">
      <xsl:for-each select="word">
         <xsl:sort select="instance[lang($source)]"/>
         <xsl:value-of select="concat(instance[lang($source)], ': ', instance[lang($dest)])"/>
         <xsl:text>&#10;</xsl:text>
      </xsl:for-each>
   </xsl:template>

</xsl:stylesheet>
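For comparison, the same three-parameter transformation can be approximated outside XSLT. The Python sketch below is not the author's code: it uses the standard library's ElementTree, make_phrasebook and its helpers are hypothetical names, the root element name lexicon is assumed, and language matching is a simplified lang() that only checks each element's own xml:lang (enough for this file, where every matched element carries one). Entries are sorted by plain Unicode code point rather than by any language-aware collation, and the output layout, with a blank line after the title, is also an assumption.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# The data of Figure 1 with the lines run together (root name assumed).
LEXICON = """<lexicon>
<head>
  <title placement="after" xml:lang="en">Dictionary</title>
  <title placement="before" xml:lang="sq">Fjalor</title>
  <title placement="after" xml:lang="nl">Woordenboek</title>
  <langname xml:lang="en"><instance xml:lang="en">English</instance>
    <instance xml:lang="sq">Albanian</instance><instance xml:lang="nl">Dutch</instance></langname>
  <langname xml:lang="sq"><instance xml:lang="en">Anglisht</instance>
    <instance xml:lang="sq">Shqip</instance><instance xml:lang="nl">Holandez</instance></langname>
  <langname xml:lang="nl"><instance xml:lang="en">Engels</instance>
    <instance xml:lang="sq">Albanees</instance><instance xml:lang="nl">Nederlands</instance></langname>
</head>
<body>
  <word><instance xml:lang="en">cow</instance><instance xml:lang="sq">lopë</instance>
    <instance xml:lang="nl">koe</instance></word>
  <word><instance xml:lang="en">house</instance><instance xml:lang="sq">shtëpi</instance>
    <instance xml:lang="nl">huis</instance></word>
  <word><instance xml:lang="en">milk</instance><instance xml:lang="sq">qumësht</instance>
    <instance xml:lang="nl">melk</instance></word>
</body>
</lexicon>"""


def matches(elem, code):
    """Simplified lang(code): the element's own xml:lang equals code or is
    a sublanguage of it (code plus "-suffix"), ignoring case."""
    value = (elem.get(XML_LANG) or "").lower()
    code = code.lower()
    return value == code or value.startswith(code + "-")


def pick(parent, tag, code):
    """First <tag> child of parent in language code, or None."""
    if parent is None:
        return None
    return next((c for c in parent.findall(tag) if matches(c, code)), None)


def text_of(elem):
    """String value, with a missing element giving "" as in XSLT."""
    return "" if elem is None else (elem.text or "")


def make_phrasebook(root, source, dest, market):
    head, body = root.find("head"), root.find("body")

    # Title: language names in the market's language, plus "Dictionary".
    names = pick(head, "langname", market)
    langpair = (text_of(pick(names, "instance", source)) + " - "
                + text_of(pick(names, "instance", dest)))
    title = pick(head, "title", market)
    placement = None if title is None else title.get("placement")
    if placement == "before":
        bookname = f"{text_of(title)} {langpair}"
    elif placement == "after":
        bookname = f"{langpair} {text_of(title)}"
    else:
        bookname = ""  # like the stylesheet when neither xsl:when matches

    # Entries: one "source: dest" line per word, sorted on the source word.
    def entry(word, code):
        return text_of(pick(word, "instance", code))

    words = sorted(body.findall("word"), key=lambda w: entry(w, source))
    lines = [bookname, ""]
    lines += [f"{entry(w, source)}: {entry(w, dest)}" for w in words]
    return "\n".join(lines)


root = ET.fromstring(LEXICON)
print(make_phrasebook(root, "en", "sq", "en"))
```

Called with source="en", dest="sq" and market="en", this prints the English - Albanian Dictionary of Figure 3 with one entry per line; with market="sq" and the languages swapped, the title becomes Fjalor Shqip - Anglisht, as in Figure 5.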


Figure 3: English - Albanian Dictionary for the English market

$ saxon lexicon.xml lexicon.xsl source=en dest=sq market=en

English - Albanian Dictionary

cow: lopë

house: shtëpi

milk: qumësht

Figure 4: Albanian - English Dictionary for the English market

$ saxon lexicon.xml lexicon.xsl source=sq dest=en market=en

Albanian - English Dictionary

lopë: cow

qumësht: milk

shtëpi: house

Figure 5: Albanian - English Dictionary for the Albanian market

$ saxon lexicon.xml lexicon.xsl source=sq dest=en market=sq

Fjalor Shqip - Anglisht

lopë: cow

qumësht: milk

shtëpi: house

Figure 6: Dutch - English Dictionary for the Dutch market

$ saxon lexicon.xml lexicon.xsl source=nl dest=en market=nl

Nederlands - Engels Woordenboek

huis: house

koe: cow

melk: milk

Figure 7: English - Dutch Dictionary for the English market

$ saxon lexicon.xml lexicon.xsl source=en dest=nl market=en

English - Dutch Dictionary

cow: koe

house: huis

milk: melk

© 02001 Stewart C. Russell <scruss@bigfoot.com>