Creating bilingual phrasebooks from a multilingual XML database
SUMMARY: This example solves the problem of producing bilingual phrasebooks for multiple markets, working from a static XML database and extracting the relevant data using XSLT.
Figure 1 shows three words ("cow", "house" and "milk") in three languages (English, Albanian and Dutch). It also holds the pieces needed to build the title of the book in each language, together with a simple rule stating whether the word for "Dictionary" comes before or after the language pair. For an English - Albanian/Albanian - English pair of phrasebooks, this gives English - Albanian Dictionary and Fjalor Shqip - Anglisht.
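The placement rule can be sketched outside XSLT as well. The following Python fragment is only an illustration of the rule as described above, not part of the original scripts; the function name book_title and its arguments are hypothetical, and the two placement values mirror the placement attribute on the title elements in Figure 1.

```python
def book_title(title_word, placement, source_name, dest_name):
    """Build a phrasebook title from a language pair and a placement rule.

    placement is "before" or "after", as in the placement attribute of
    the <title> elements in Figure 1. Any other value gives an empty
    title, mirroring the empty xsl:otherwise branch in Figure 2.
    """
    pair = f"{source_name} - {dest_name}"
    if placement == "before":
        return f"{title_word} {pair}"
    if placement == "after":
        return f"{pair} {title_word}"
    return ""


# English market: the title word follows the language pair.
print(book_title("Dictionary", "after", "English", "Albanian"))
# Albanian market: the title word precedes the language pair.
print(book_title("Fjalor", "before", "Shqip", "Anglisht"))
```

The two calls reproduce the two titles quoted above for the English and Albanian markets.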
Figure 2 shows an XSLT script that formats the data for output as text, given three parameters:
- the source language
- the destination language
- the language of the intended market.
Note that it makes heavy use of the XPath lang() function, which tests the xml:lang value in scope for a node. As long as ISO 639 language codes are used, the database can be extended to include other languages without changing the XSLT script. The output is also sorted on the source-language word.
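For readers who want to see what the lang() test amounts to, here is a rough Python approximation. It is a sketch under stated assumptions rather than a faithful reimplementation: the helper names lang_matches and in_lang are hypothetical, the sample data is a cut-down copy of two entries from Figure 1, and only an element's own xml:lang attribute is consulted, whereas the real lang() also honours a value inherited from an ancestor. Python's sorted() compares code points, which is not the same as a collation chosen by an XSLT processor.

```python
import xml.etree.ElementTree as ET

# ElementTree's expanded name for the xml:lang attribute.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def lang_matches(declared, wanted):
    """Approximate XPath 1.0 lang(): same language or a sublanguage,
    ignoring case (so "en-GB" matches a request for "en")."""
    declared, wanted = declared.lower(), wanted.lower()
    return declared == wanted or declared.startswith(wanted + "-")


def in_lang(parent, tag, wanted):
    """Return the text of the first child `tag` whose own xml:lang matches.

    Simplification: inherited xml:lang values are not considered.
    """
    for child in parent.findall(tag):
        if lang_matches(child.get(XML_LANG, ""), wanted):
            return child.text
    return ""


SAMPLE = """<body>
  <word>
    <instance xml:lang="en">milk</instance>
    <instance xml:lang="sq">qumësht</instance>
  </word>
  <word>
    <instance xml:lang="en">cow</instance>
    <instance xml:lang="sq">lopë</instance>
  </word>
</body>"""

body = ET.fromstring(SAMPLE)

# Pair each source-language word with its translation, sorted on the source word.
pairs = sorted(
    (in_lang(word, "instance", "en"), in_lang(word, "instance", "sq"))
    for word in body.findall("word")
)
for source_word, dest_word in pairs:
    print(f"{source_word}: {dest_word}")
```

Because the selection depends only on the language codes, adding a fourth language to the sample would not require any change to the helpers, which is the same property the stylesheet gets from lang().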
Figures 3, 4, 5, 6 and 7 show sample output from Michael Kay's Saxon XSLT processor on a Linux box. Other XSLT engines pass parameters differently (xsltproc, for example, takes --stringparam name value pairs), so you will have to modify the command lines accordingly.
Update, 5 May 02003: Daniel Biddle has helpfully pointed out that I'm misusing the xml:lang attribute in Figure 1 when I say:
<instance xml:lang="sq">Albanian</instance>
The content of this element should, of course, be in Albanian. As this proof-of-concept does what I meant it to, the correct solution is left to the reader … ☺
Figure 1: lexicon.xml
<?xml version="1.0" encoding="ISO-8859-1"?>
<lexicon>
  <!--
    lexicon - uses ISO 639 language names to choose language pair:
      en  English
      sq  Albanian
      nl  Dutch
    (There must be better element names than "instance"...)
    Stewart C. Russell <scruss@bigfoot.com> - 23/07/02001
  -->
  <head>
    <title placement="after" xml:lang="en">Dictionary</title>
    <title placement="before" xml:lang="sq">Fjalor</title>
    <title placement="after" xml:lang="nl">Woordenboek</title>
    <langname xml:lang="en">
      <instance xml:lang="en">English</instance>
      <instance xml:lang="sq">Albanian</instance>
      <instance xml:lang="nl">Dutch</instance>
    </langname>
    <langname xml:lang="sq">
      <instance xml:lang="en">Anglisht</instance>
      <instance xml:lang="sq">Shqip</instance>
      <instance xml:lang="nl">Holandez</instance>
    </langname>
    <langname xml:lang="nl">
      <instance xml:lang="en">Engels</instance>
      <instance xml:lang="sq">Albanees</instance>
      <instance xml:lang="nl">Nederlands</instance>
    </langname>
  </head>
  <body>
    <word>
      <instance xml:lang="en">cow</instance>
      <instance xml:lang="sq">lopë</instance>
      <instance xml:lang="nl">koe</instance>
    </word>
    <word>
      <instance xml:lang="en">house</instance>
      <instance xml:lang="sq">shtëpi</instance>
      <instance xml:lang="nl">huis</instance>
    </word>
    <word>
      <instance xml:lang="en">milk</instance>
      <instance xml:lang="sq">qumësht</instance>
      <instance xml:lang="nl">melk</instance>
    </word>
  </body>
</lexicon>
Figure 2: lexicon.xsl
<?xml version="1.0" encoding="ISO-8859-1"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!--
    creates a SOURCE - DEST lexicon for MARKET
    Stewart C. Russell <scruss@bigfoot.com> - 23/07/02001
    usage example:
      saxon lexicon.xml lexicon.xsl source=en dest=sq market=en
  -->
  <xsl:param name="source"/>
  <xsl:param name="dest"/>
  <xsl:param name="market"/>
  <xsl:output encoding="ISO-8859-1" method="text"/>

  <xsl:template match="*">
    <xsl:apply-templates/>
  </xsl:template>

  <xsl:template match="head">
    <xsl:variable name="langpair"
      select="concat(langname[lang($market)]/instance[lang($source)],
                     ' - ',
                     langname[lang($market)]/instance[lang($dest)])"/>
    <xsl:variable name="bookname">
      <xsl:choose>
        <xsl:when test="title[lang($market)][@placement='before']">
          <xsl:value-of select="concat(title[lang($market)], ' ', $langpair)"/>
        </xsl:when>
        <xsl:when test="title[lang($market)][@placement='after']">
          <xsl:value-of select="concat($langpair, ' ', title[lang($market)])"/>
        </xsl:when>
        <xsl:otherwise/>
      </xsl:choose>
    </xsl:variable>
    <xsl:text>
</xsl:text>
    <xsl:value-of select="$bookname"/>
    <xsl:text>
</xsl:text>
  </xsl:template>

  <xsl:template match="body">
    <xsl:for-each select="word">
      <xsl:sort select="instance[lang($source)]"/>
      <xsl:text>
</xsl:text>
      <xsl:value-of select="concat(instance[lang($source)], ': ', instance[lang($dest)])"/>
      <xsl:text>
</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
Figure 3: English - Albanian Dictionary for the English market
$ saxon lexicon.xml lexicon.xsl source=en dest=sq market=en
English - Albanian Dictionary
cow: lopë
house: shtëpi
milk: qumësht
Figure 4: Albanian - English Dictionary for the English market
$ saxon lexicon.xml lexicon.xsl source=sq dest=en market=en
Albanian - English Dictionary
lopë: cow
qumësht: milk
shtëpi: house
Figure 5: Albanian - English Dictionary for the Albanian market
$ saxon lexicon.xml lexicon.xsl source=sq dest=en market=sq
Fjalor Shqip - Anglisht
lopë: cow
qumësht: milk
shtëpi: house