|
|
|
|
Program
Details
Monday, October 21, 2013
|
08:30-10:00 |
MORNING
TUTORIALS |
|
Presenter:
Richard Ishida
Internationalization
Activity Lead,
W3C
|
Track 1: An Introduction to Writing Systems & Unicode
The tutorial will provide you with a good understanding of the many
unique characteristics of non-Latin writing systems, and illustrate
the problems involved in implementing such scripts in products. It
does not provide detailed coding advice, but does provide the
essential background information you need to understand the
fundamental issues related to Unicode deployment, across a wide range
of scripts. It has also proved to be an excellent orientation for
newcomers to the conference, providing the background needed to assist
understanding of the other talks! The tutorial goes beyond encoding
issues to discuss characteristics related to input of ideographs,
combining characters, context-dependent shape variation, text
direction, vowel signs, ligatures, punctuation, wrapping and editing,
font issues, sorting and indexing, keyboards, and more. The concepts
are introduced through the use of examples from Chinese, Japanese,
Korean, Arabic, Hebrew, Thai, Hindi/Tamil, Russian and Greek. While
the tutorial is perfectly accessible to beginners, it has also
attracted very good reviews from people at an intermediate and
advanced level, due to the breadth of scripts discussed. No prior
knowledge is needed. |
|
|
Presenter:
Steven R.
Loomis
Software Engineer, IBM |
Track 2: Putting ICU to Work
This tutorial gives attendees everything they need to know to get
started with working with text in computer systems: character encoding
systems, character sets, Unicode, and text processing, using the
International Components for Unicode library (ICU).
ICU is a very popular internationalization software solution.
However, while it vastly simplifies the internationalization of
products, there is a learning curve.
The goal of this tutorial is to help new users of ICU install and
use the library. Topics include: Installation (C++ libraries, Java
.jar files, Java SPI for JDK integration), verification of
installation, introduction and detailed usage analysis of ICU's
frameworks (normalization, formatting, calendars, collation,
transliteration). The tutorial will walk through code snippets and
examples to illustrate the common usage models, followed by
demonstration applications and discussion of core features and
conventions, advanced techniques and how to obtain further
information. It is helpful if participants are familiar with Java, C
and C++ programming. Issues relating to ICU4C/C++ as well as ICU4J
(Java) will be discussed. After the tutorial, participants should be
able to install and use ICU for solving their internationalization
problems.
This updated presentation will include newer ICU features such as
AlphabeticIndex and the DateTimePatternGenerator. |
|
|
Presenter:
Addison Phillips
Globalization
Architect,
Lab126 (Amazon) |
Track 3: Internationalization: An Introduction
What is internationalization? What do developers, product
managers, or quality engineers need to know about it? How does a
software development organization incorporate internationalization
into the design, implementation, and delivery of an application?
This tutorial track provides an introduction to the topics of
internationalization, localization and globalization. Attendees will
understand the overall concepts and approach necessary to analyze a
product for internationalization issues, develop a design or approach,
and deliver a global-ready solution. The focus is on architectural
approaches and general concepts, but will include specific examples
and exercises.
|
|
10:00-10:30 - Morning
Refreshments |
|
10:30-12:00 |
MORNING
TUTORIALS |
|
Presenter:
Richard Ishida Internationalization
Activity Lead,
W3C |
Track 1: An Introduction to Writing Systems & Unicode
(Cont'd.)
The tutorial will provide you with a good understanding of the many
unique characteristics of non-Latin writing systems, and illustrate
the problems involved in implementing such scripts in products. It
does not provide detailed coding advice, but does provide the
essential background information you need to understand the
fundamental issues related to Unicode deployment, across a wide range
of scripts. It has also proved to be an excellent orientation for
newcomers to the conference, providing the background needed to assist
understanding of the other talks! The tutorial goes beyond encoding
issues to discuss characteristics related to input of ideographs,
combining characters, context-dependent shape variation, text
direction, vowel signs, ligatures, punctuation, wrapping and editing,
font issues, sorting and indexing, keyboards, and more. The concepts
are introduced through the use of examples from Chinese, Japanese,
Korean, Arabic, Hebrew, Thai, Hindi/Tamil, Russian and Greek. While
the tutorial is perfectly accessible to beginners, it has also
attracted very good reviews from people at an intermediate and
advanced level, due to the breadth of scripts discussed. No prior
knowledge is needed.
|
|
|
Presenter:
Jim DeLaHunt
Principal, Jim
DeLaHunt & Associates |
Track 2: Building Multilingual Websites in Drupal 7 and Joomla 3
A practical look at the language and locale capabilities of Joomla!
3 and Drupal 7, two leading free software content management systems (CMSs).
They let you build more powerful, more international websites faster.
We look at: their core internationalization and locale services;
localization of UI and content. Each platform just had a major
release, with advances in internationalization. You will leave with
specific tips for building your own site. We don't assume Joomla or
Drupal experience, but do include material for advanced practitioners.
A good tutorial for web site product managers, web designers,
developers, and managers of international web teams. |
|
|
Presenter:
Tex Texin
Globalization Architect, Xencraft
|
Track 3: Web Internationalization - Standards and Best Practices
This tutorial is an introduction to internationalization on the
World Wide Web. The audience will learn about the standards that
provide for global interoperability and come away with an
understanding of how to work with multilingual data on the Web.
Character representation and the Unicode-based Reference Processing
Model are described in detail. HTML, including HTML5, XHTML, XML (eXtensible
Markup Language; for general markup), and CSS (Cascading Style Sheets;
for styling information) are given particular emphasis. The tutorial
addresses language identification and selection, character encoding
models and negotiation, text presentation features, and more. The
design and implementation of multilingual Web sites and localization
considerations are also introduced. |
|
12:00-13:00 - LUNCH |
|
13:00-14:30 |
AFTERNOON
TUTORIALS |
|
Presenters:
Craig R. Cummings Globalization
Lead, Zynga, Inc.
Michael McKenna
International Engineering Leader, Zynga, Inc. |
Track 1 - Unicode - The Advanced Tour
This tutorial will cover some of the more advanced subjects in
Unicode. We'll discuss properties and the Unicode Code Database (UCD),
normalization, character encoding forms, case mapping, boundary
analysis, supplementary characters, the Unicode Collation Algorithm (UCA),
Common Locale Data Repository (CLDR), and bidirectional text support,
including shaping algorithms. For each, discussion will range over
algorithms, implementations, pros and cons, and gotchas. We'll end
with examples from real-world cases including last year's hit slide
set -- 'Unicode Gone Bad'. |
|
|
Presenters:
Aharon Lanin
Software
Engineer, Google
Richard Ishida
Internationalization Activity Lead,
W3C
Andrew Glass
Program
Manger, Microsoft
|
Track 2 - Handling Arabic, Hebrew, and
Other Right-to-Left Scripts in HTML
This tutorial explains the problems encountered when authoring web
pages in a right-to-left language, or even left-to-right pages
containing snippets of right-to-left text, and offers a step-by-step
guide for preventing most of them. It will also highlight the relevant
new features in HTML5, CSS3, and Unicode 6.3."
|
|
|
Presenters:
Daniel Goldschmidt
Sr.
International Program Manager,
Microsoft Corporation
Iris Orriss
Head
of Localization, Facebook
|
Track 3 - Localization Workshop
Two highly experienced industry experts will illuminate the basics
of localization for session participants over the course of three
one-hour blocks. This instruction is particularly oriented to
participants who are new to localization. Participants will gain a
broad overview of the localization task set, issues and tools.
Subjects covered will be fundamental problems that localization
addresses such as components of localization projects, localization
tools and localization project management. There will also be time for
questions and answers plus the opportunity to take individual
questions offline with the presenters.
|
|
14:30-15:00 - Afternoon Refreshments |
|
15:00-16:30 |
AFTERNOON
TUTORIALS |
|
Presenter:
Martin J.
Dürst
Professor,
Aoyama Gakuin University
|
Track 1 - Character Normalization
This part of the tutorial covers character normalization in
Unicode. It starts with a short history of normalization to help
participants understand the purpose of normalization. We will then
introduce the four standard normalization forms (NFC, NFCK, NFD, NFDK),
followed by (non-official!) normalization variants and other
normalization operations such as quick checks. Next, we cover
normalization pitfalls such as the interaction of normalization with
other operations (e.g. string concatenation, casing) and Unicode
versions and corrigenda. We continue by discussing
implementation-related details including where to find the necessary
data for implementation and testing, and various trade-offs for
efficient implementations. We conclude giving guidelines and advice on
where to use which form of normalization (and where to not use
normalization).
|
|
|
Presenter:
Norbert Lindenberg
Internationalization
Consultant, Lindenberg
Software LLC |
Track 2 - Internationalizing JavaScript Applications
Somewhere on the way to global success, you'll have to get your
software ready to support different human languages and cultures. For
software written in JavaScript it's not obvious how to start: While
the new ECMAScript Internationalization API Specification defines core
functionality, it's not available in all browsers yet, and doesn't
cover all the functionality you need. This talk surveys what browsers
and libraries provide today to make the task easier, and how you can
best take advantage of them.
Topics include resource loading, message construction, sorting,
number formatting and date and time formatting, support for emoji and
other supplementary characters, regular expressions, Unicode
normalization, and internationalized domain names.
|
|
|
Presenter:
Daniel Goldschmidt
Sr.
International Program Manager,
Microsoft Corporation
Iris Orriss
Head
of Localization, Facebook
|
Track 3 -
Localization Workshop (Cont.)
Two highly experienced industry experts will illuminate the basics
of localization for session participants over the course of three
one-hour blocks. This instruction is particularly oriented to
participants who are new to localization. Participants will gain a
broad overview of the localization task set, issues and tools.
Subjects covered will be fundamental problems that localization
addresses such as components of localization projects, localization
tools and localization project management. There will also be time for
questions and answers plus the opportunity to take individual
questions offline with the presenters.
|
Tuesday, October 22, 2013
|
09:00-09:15 |
WELCOME & OPENING REMARKS
|
|
09:15-10:00
|
KEYNOTE PRESENTATION - TBA
|
|
Presenter:
|
|
|
10:00-10:30 - Morning Refreshments |
|
10:30-11:20 |
SESSION 1 |
|
Presenter:
Roy Yokoyama
Principal
Global Engineer, Google-Motorola
Mobility
|
Track 1 - Internationalization and
Localization for Android Devices
Android is the world's most popular mobile platform. There
are more than 600,000 apps and games available worldwide and
growing strong. This panel will cover the anatomy of Android
localization frameworks, structure of resource assets and
qualifiers, and how to localize your applications.
|
|
|
Presenter:
Peter
Edberg
Senior
Software Engineer, Apple Inc.
Mark Davis
Sr. Internationalization Architect, Google Inc.
Steven R.
Loomis
Software Engineer, IBM
|
Track 2 -
What's New in CLDR
The Unicode Consortium's Common Locale Data Repository
project (CLDR) defines LDML (Locale Data Markup Language) and
uses it to organize and provide the most extensive open
repository of locale data, with data collected primarily via the
web-based Survey Tool. This session provides a brief overview of
CLDR, then focuses on recent and forthcoming enhancements,
including more consistent non-Gregorian calendar support and
addition of the Korean Dangi lunar calendar, formatting time
durations, collation data changes, more control of currency
formatting, improvements to the data collection process, and
JSON support. A significant amount of time will be reserved for
demos of behavior based on CLDR data. |
|
|
Presenters:
Addison Phillips
Globalization
Architect,
Lab126 (Amazon)
Richard Ishida
Internationalization Activity Lead, W3C |
Track 3 - The Multilingual Web: A Status
Report
New standards activity at the W3C and other standards bodies
is helping improve the globalization of the Web. From HTML5 to
CSS3, from JavaScript to Unicode bidi, support for creating
international content and Web-based applications is evolving
quickly. This presentation, from W3C internationalization
leaders Addison Phillips and Richard Ishida explores the changes
that are available today, the status of on-going work, and the
challenges that remain.
|
|
11:30-12:20 |
SESSION 2 |
|
Presenter:
Michael S.
Kaplan
Program
Manager, Microsoft
|
Track 1 - Metro or Whatever it's Called
These Days
Coming Soon...
|
|
|
Presenters:
Peter
Edberg
Senior
Software Engineer, Apple Inc.
Mark Davis
Sr. Internationalization Architect, Google Inc.
Steven R.
Loomis
Software Engineer, IBM
|
Track 2 - CLDR Users Panel and Discussion
Coming
Soon...
|
|
|
Presenter:
Felix Sasaki
DFKI / W3C Fellow
|
Track 3 - ITS 2.0: Facilitating Automated
Creation and Processing of Multilingual Web Content
The Internationalization Tag Set (ITS) 2.0 http://www.w3.org/TR/its20/
is an upcoming technology developed by the World Wide Web
Consortium (W3C). Its predecessor, ITS 1.0, provides a set of
metadata items ("data categories") for creating
internationalized XML which can be localized effectively. ITS
2.0 extends the scope of ITS 1.0 in various ways. First, ITS 2.0
can be applied to further formats, e.g. HTML5 or other versions
of HTML. Second, ITS 2.0 provides additional data categories
(one aim is to create further application areas for ITS in the
realm of automated language processing such as machine
translation, text analysis annotation or workflows including
automatic language quality checks). Third, ITS 2.0 provides
implementations in interoperability usage scenarios. A dedicated
draft document http://www.w3.org/TR/mlw-metadata-us-impl/
already has been published to provide information about these
scenarios. This presentation will 1) introduce the basic
principles of ITS 2.0; 2) describe its relation to technologies
like HTML5 and XLIFF; 3) exemplify selected data categories and
usage scenarios; and 4) discuss what ITS 2.0 cannot achieve and
what might be desirable in the future.
|
|
12:30-13:30 - LUNCH |
|
13:30-14:20 |
SESSION 3 |
|
Presenter:
Jeremy Cole
Sr.
System Engineer, Google, Inc.
|
Track 1 -
MySQL History and Practical i18n and i10n
MySQL is one of the world's most popular relational
database systems, powered many of the world's busiest websites.
As the world has moved toward Unicode,MySQL has had to do so as
well. Learn about the challenges specific to MySQL, general
database challenges, solutions, and pitfalls for
internationalization of databases and internationalization and
localization of MySQL itself. MySQL has not always made the best
decisions in its support of Unicode, so we'll also look at some
of the snags MySQL has encountered along the way.
|
|
|
Presenter:
John Emmons
Senior
Software Engineer, IBM
|
Track 2 - CLDR Data for Javascript and the
Web
The Unicode Common Locale Data Repository (CLDR) is the most
extensive and widely used standard repository of locale data for
software developers to use as a resource in building
internationalized applications. The data is typically published
in XML format as defined by the Locale Data Markup Language (
LDML ) specification ( Unicode Technical Report TR#35 ). XML
format, however, is not always the best choice when it comes to
writing applications for the web. This presentation will
highlight recent efforts by the CLDR technical committee to
create and maintain an algorithm and tool set to convert CLDR's
data to a standardized JSON format that would be consumable by
Javascript applications, while still being able to represent all
the various types of data that are currently maintained in the
CLDR.
|
|
|
Presenter:
Nick Patch
Software
Engineer, International,
Shutterstock
|
Track 3 - Unicode Programming in Modern
Perl
In 2010 the Perl 5 language switched to a yearly major
release cycle with monthly developer releases. Perl has a
history of great Unicode support and the new development process
has enabled the language to rapidly enhance Unicode
functionality and support new Unicode standards. This talk will
demonstrate the current state of Unicode in Perl, review the
exciting changes in the last few years, and touch on future
development.
The following core language features will be discussed:
- character strings
- character properties
- regular expressions
- case mapping and case folding
- normalization
- extended grapheme clusters
- segmentation
- identifier names
- UTF-8 source code
- I/O encoding layers
|
|
14:30-15:20 |
SESSION 4 |
|
Presenter:
Jungshik Shin
Systems
Engineer, Google, Inc.
|
Track 1 - No More Tofus and Ransom Note:
Beautiful Fonts for All
Unicode has over 110 thousand characters in over 100 scripts
for all the languages. No set of fonts covers them all with the
consistent and harmonious look and feel. Instead, we have a lot
of fragmented fonts with a hodgepodge coverage with a lot of
"holes" leading to Tofus and "ransom
note-like" rendering. Google embarked on a very challenging
endeavor of developing a set of fonts to cover all the
characters in the Unicode with beauty and harmony across
scripts. In this talk, we will present what we have accomplished
and how we have overcome multitude of challenges and
difficulties, expected and unexpected, in our journey to
"Beautiful Fonts for All".
|
|
|
Presenter:
George Rhoten
Language
Technologies Engineer, Apple Inc.
|
Track 2 - Do You Speak CDLR?
CLDR (Common Locale Data Repository) is a wonderful resource for
localized data. It has a vast supply of localized data for
dates, times, numbers, names and related data. Most of this data
is geared towards the printed world of the Internet and visual
applications.
The printed format of this data is typically short hand
notation for concepts that are speakable. How would you say a
number like the number 1? How do you say the time like 12:00 PM?
What is the pronunciation of a date like January 1? What is the
pronunciation of a currency value like $1.99? This presentation
covers pronunciation topics that are being planned for CLDR. |
|
|
Presenter:
Martin J.
Dürst
Professor,
Aoyama Gakuin University
|
Track 3 - Implementing Normalization in
Pure Ruby - the Fast and Easy Way
Unicode normalization eliminates encoding variants for what
is essentially the same character. We present a new efficient
way to implement normalization in scripting languages such as
Ruby. Our implementation takes advantage of two facts: First,
that large percentages of many kinds of text do not need to be
changed when normalizing, and second, that the number of
different sequences that need to be normalized in a certain
language or group of languages is always much lower than the
number of theoretically possible sequences. We exploit each of
these facts with built-in functionality of Ruby.
Specially-crafted regular expressions capture sequences of
characters in possible need for normalization, and a hash
structure is used to look up their normalization. The hash
structure starts empty and is updated by a default block of code
whenever a new sequence of characters needing normalization is
identified. As a result, no actual normalization code is
executed once the hash structure incorporates the character
sequences used in a particular language.
Performance tests with many different implementations and a
wide range of languages (German, Vietnamese, Korean,...) with
different normalization needs show that our method is one or
more orders of magnitude faster than straightforward
implementations in scripting languages, and in some cases close
to the speed of implementations in compiled languages. Because
the code is written in pure Ruby, it is very easily portable and
configurable, and similars techniques may be useful for other
internationalization operations such as case conversions. |
|
15:20-16:00 - Afternoon Refreshments |
|
|
SESSION
5 |
|
Presenter:
Mark Davis
Sr.
Internationalization Architect, Google Inc.
|
Track 1 - Innovations in
Internationalization at Google
This presentation covers the range of internationalization
challenges that Google has encountered and overcome in the past
year, illustrating the challenges that many companies face.
Among the topics are how to better serve multilingual users
(personal names, plurals, gender, language resolution),
localization of entities (http://goo.gl/3s7SR),
phone numbers and addresses, expanding CLDR and ICU (and Google
products) to more languages, speech-to-text, machine
translation, input & fonts, client-side i18n, and the
overall localization process.
|
|
|
Presenter:
Shan Fu
International
QE, Adobe Systems
|
Track 2 - Best Practice in A New Working
Model for Localization Testing
Localization testing of multilingual software is important
for a company's international marketing strategy. Nowadays,
language-oriented working model is widely used in software
localization functional testing; however, this model doesn't
work as expected in testing efficiency and effectiveness in
practice. This proposal introduces a new working model for
localization functional testing to improve the product quality
and reduce the testing cost by taking one Adobe product
localization functional testing as an example; what's more, it
also discusses how to choose a proper working model for
large-scale localization functional testing.
Unlike the original "language-oriented" model, this
new working model is based on feature-oriented model, and has
been improved according to team size, languages, product type,
testing phase and other factors. This enhanced working model has
proved to be effective in localization functional testing on an
Adobe enterprise desktop product, which supports over 20
languages.
As an enhanced feature-oriented working model, its main
advantages are as follows:
1. Better product quality. With this model, quality engineers
could have more opportunities to execute new feature test cases,
so they could find more bugs on localization, even on core
functions.
2. Higher efficiency, lower cost. In main release of the
Adobe product, around 50% of features are new ones, so the cost
of time for new features learning is very high. In terms of
localization functional testing on new features, the enhanced
feature-oriented working model could help reduce working hours
by around 30% as opposed to language-oriented one.
3. More helpful for team building. With this model, quality
engineers could have a chance to study product features and
discuss testing issues in depth, so that both individual
capability and team building could be improved.
Practice shows that with this new working model, 15% of cost
on localization functional testing has been reduced, and 40%
more bugs have been found, which turns out to be more effective
than before; besides, it also has a huge business value to a
software product targeting international markets.
|
|
|
Presenters:
Tex Texin
Globalization Architect, Xencraft
Craig R. Cummings
Globalization Lead, Zynga, Inc. |
Track 3 - Comparing Javascript Libraries
Which JavaScript library is best for international
deployment? This session presents the results of an
investigation of the features of several JavaScript libraries
and their suitability for international markets. We will show
how the libraries were tested and compare the results for at
least: Dojo, jQuery, and YUI. The results may surprise you and
will be useful to anyone designing new international or
multilingual JavaScript applications or supporting existing
ones. |
|
17:00-17:50 |
SESSION
6 |
|
Presenter:
TBA
|
Track 1 - TBA
|
|
|
Presenter:
Tex Texin
Globalization Architect, Xencraft
|
Track 2 - Critical Values for i18n Testing
This presentation is an extension of the popular presentation
from last year "Does it Hurt When I do This? Data for I18n
Testing".
In this session, we recommend specific data values that are
likely to identify internationalization problems in software
intended for global markets.
Based on years of global software experience, these data
values are useful for functional or linguistic QA tests of
internationalized software. In the previous session, data value
recommendations included character encoding, postal address,
locale and other data types typically used in software and will
trigger common internationalization problems. This presentation
will offer specific new test suggestions. |
|
|
Presenters:
Norbert Lindenberg
Internationalization
Consultant, Lindenberg
Software LLC
Nebojša Ćirić
Software Engineer, Google,
Inc.
Suresh Jayabalan
Program
Manager, Microsoft
|
Track 3 - Internationalization the Core of
JavaScript, Second Edition
After last year's approval of the first edition of the
ECMAScript Internationalization API Specification,
internationalization work on the core of JavaScript continued in
three areas: Implementation of the API in browsers, broadening
its scope in a second edition of the specification, and
strengthening Unicode support in the ECMAScript Language
Specification. We provide an update on the latest work in all
three areas.
|
|
18:00-19:00 - IUC37 CONFERENCE RECEPTION |
Wednesday, October 23, 2013
|
09:00-09:50 |
SESSION
7 |
|
Presenter:
Adam Asnes
President,
Lingoport
|
Track 1 - A Cost and ROI Model of
Internationalization Issues
In many organizations, localization engineering teams
and i18n engineers must battle to gain developer attention
and organizational focus. Often the i18n Lead, if there is
one, does not independently command budget initiatives.
Yet i18n and localization engineering issues that are
found during testing or after the "English"
release are tolerated and even considered normal process.
When compared to better-funded development initiatives
like security, we've heard statements like "nobody
gets into the news because of an encoding issue."
The costs in engineering and time are not well
understood, and we think if they are clearly accounted for
and calculated in a flexible system, we can further the
case for software globalization planning and efficiencies
at the highest management levels.
Working in association with customers at Intel, the LDS
Church and others, we have catalogued the most pervasive
types of globalization engineering issues and their
effects in an engineering ROI calculator. The calculator
accounts for broad scope, variable engineering costs, the
time during the development process issues are found and
fixed, cost in time and money, and information about
delays in planned revenues.
We would like to share this calculator model with the
industry to help conference participants and their
organizations understand and communicate financial
consequences in time, money and strategic objectives.
|
|
|
Presenter:
Travis Keep
Software
Engineer, Google, Inc.
|
Track
2 - A Time for Everything
There is a time for everything. Dates and times are an
important part of our everyday lives, and displaying them
correctly in software is essential. This session will
cover some of the subtle details associated with
formatting dates and times as well as how the ICU API
abstracts away many of these details while still remaining
powerful and flexible. Topics include adapting localized
formats to a client specification of which fields to
include, formatting intervals and durations, handling
non-Gregorian calendars, handling special number systems
in date formats, and multiple ways of formatting timezone
names.
|
|
|
Presenters:
Deborah Anderson
Researcher
University of California Berkeley
Anshuman
Pandey
Doctoral Student,
Dept. of History, University of
Michigan
|
Track
3 - The Script Encoding Initiative: The Successes-and
the Trials and Tribulations-of Encoding Scripts
Since 2002, the UC Berkeley Script Encoding Initiative
has been assisting user communities to get various
eligible characters and scripts proposed for inclusion in
the Unicode Standard. This joint presentation, given by
the Project Leader of SEI and one of its key Unicode
proposal authors, will discuss the highlights as well as
the outstanding problems faced by those proposing
unencoded scripts.
The SEI project has been very successful, with over 69
scripts approved by the Unicode Technical Committee. The
project has also sponsored work on over 38 preliminary
script proposals. (These proposals require additional
information and/or user community buy-in before they are
considered mature and ready for approval by the standards
committees.)
The project's work on final and preliminary proposals
has been made possible thanks to funding from NEH, Google,
and individual and group donors. However, the NEH and
Google support has come to an end, so funding is limited.
This presents a problem, for script research and the
standards approval process itself take several years.
Long-term support is the key to ensuring eligible scripts
are kept in the encoding pipeline. The SEI Project Leader
will suggest a few ideas on possible approaches to
acquiring stable funding.
After 11 years, it would seem the list of unencoded
scripts should be significantly reduced, but because of
the development of new scripts and increased interest (and
recognition of the importance of encoding) of historic
scripts, the number of unencoded scripts is at about 90.
The co-presenter, Anshuman Pandey, will discuss the
creation and propagation of newly constructed scripts, a
phenomenon found particularly in South Asia. He will also
discuss problems in getting newly approved scripts
supported in commercial software and fonts, which hobbles
the user communities.
The talk will end with some suggestions on how members
of the audience can assist in the effort to get unencoded
scripts into Unicode, and be supported in software.
|
|
10:00-10:50 |
SESSION
8 |
|
Presenter:
Loïc
Dufresne de Virel
Localization
Strategist, Intel Corporation
|
Track 1 - From User Experience to
Benutzererlebnis - UX Joins L10N to Offer a Better Global
User Experience
There is much to gain by combining UX and L10N, often
afterthoughts in the world of software development, to
provide a better global user experience. From basic
internationalization to customization and culturalization,
we'll cover topics that seem trivial but present some
significant challenges from a User Experience and
Localization standpoint, such as organization of
information, methods of payment, collection of personal
information, handling of genders, numbers, and
declensions, or language selection. We'll then look into
the new usage models offered by technologies such as Voice
Recognition or Perceptual Computing, which promise to
change the way we interact with our devices.
|
|
|
Presenters:
Addison Phillips
Globalization
Architect, Lab126 (Amazon)
|
Track 2 -
How to Spot a Flying Unicorn: Analyzing User Interface
Designs for Localizability
The user interface is often the difference between
success and irrelevance for a software product. Getting
the maximum of usability, accessibility, and functionality
out of design, with a minimum of user befuddlement is a
big job.
But what happens when that user interface will be
translated into 20 languages? Will the design accommodate
their different needs? Will they even fit onto the screen?
Are cultural or linguistic issues even addressable at
design time? How can the designers, developers, or the
localizers see these "flying unicorns" lurking
in their design choices without actually building the
different language products?
This presentation gives concrete examples from real
products showing an approach to reviewing user experience
design for a global audience.
|
|
|
Presenters:
Brian Kemler
Program Manager, Google, Inc.
Craig Cornelius
i18n
Engineer, Google, Inc.
|
Track 3 - Burmese - Challenges &
Lessons Learned from an Early Adopter
Google Search is the first major online service to
support the Burmese language in its interface. Although
the Unicode standard includes a block for the Burmese
writing system, Burmese text in websites is written using
multiple, non-unicode schemes including Zawgyi and other
font encodings. Enthusiastic speakers developed fonts in
the vacuum left by major vendors due to fear over trade
sanctions or lack of a perceived market importance. Now
that Myanmar is opening, interesting challenges arise.
Detecting the encoding methods and processing Burmese
text present some intriguing challenges around
segmentation, input and output methods. Our discusses
these themes, and outlines our progress toward encouraging
the standardization of the Burmese language on the web. Of
particular importance is promoting the adoption of the
Unicode font MM3 within the context of not only Burmese,
but also minority languages in Myanmar such as; Karen,
Kayah, Pali, Rumai Palaung, and Shan.
|
|
10:50-11:10 - Morning Refreshments |
|
11:10-12:00 |
SESSION
9 |
|
Presenter:
Michael McKenna
International Engineering Leader, Zynga, Inc.
|
Track 1 - Tech View of Global Social
Games Development
This session will cover many of the technology problems
and many of the solutions involved in creating, managing,
and deploying social games for the World. Zynga creates
games in Flash for Facebook and the Web, and mobile games
for iOS and Android using a range of different
technologies. But globalization, to be effective in this
environment, must follow a consistent model that is
portable, flexible, and meets the needs of the many issues
games encouter when designed for multiple languages and
cultures. We will cover the following topics:
- Design issues for global content and global asset
database design
- Linguistic issues for social dialog across langauges
- What to track for effective metrics
- Accelerated development life cycles
- Cadence and content release driving blinding speed
of process
- Build systems and process automation
- Localization test automation (where possible)
- Portability issues for core technology and build
systems across development environments
- Continuous improvement
After this talk, you should have a pretty good idea of
what its like to be in the midst of international
engineering and production at a fast (really fast) paced
social gaming company, publishing dailiy in multiple
languages and cultures.
|
|
|
Presenters:
Michael
Kuperstein
Localization
Engineer, Intel
Octavio Ramos
Localization
Software Quality Assurance Lead
& Engineer, Intel
|
Track
2 - Which *Bleep* Language Tag Should I Use?
How hard could it possibly be to choose a list of
language abbreviations for your product? It seems pretty
easy, since most of the new development environments all
claim to support BCP 47 / RFC 5646. But as it turns out,
language tag choice is much harder than it seems, since
there isn't a definitive list of tags in the BCP 47
specification. Therefore, every company has their own list
of language tags, and some are very, very creative!
Besides, the end user will probably never see these
language tags, so it's not that important, right?
Unfortunately, choosing the wrong language tag can
cause technical failures in products, or can cost many
thousands of dollars in lost translation memory
leveraging, or can create confusion in teams that
generates hundreds of emails and awkward workarounds.
Perhaps you have installed the Chinese Traditional
language pack for Windows, and your application is
displaying English instead? Maybe you have localized into
French for Canada, and the app refuses to show the
translations? Customers may ask, "We want Spanish for
Latin America!" Ok then, what is the language tag for
Spanish in Latin America - for each of the dozen systems
that process language data in your company? What do you
do?!?
In this entertaining session, we will place giant red
construction cones around the potholes of choosing the
wrong language tags when developing drivers, desktop apps,
and web applications on various operating systems, or when
transferring data between different enterprise and
operating systems. |
|
|
Presenter:
|
Track
3 - TBA
|
|
12:00-13:00 - LUNCH |
|
|
SESSION
10 |
|
Presenters:
Steven R. Loomis
Software
Engineer, IBM
Markus
Scherer
Unicode
Software Engineer, Google, Inc.
|
Track 1 - What's New in ICU
The International Components for Unicode library, or
ICU, provides a full range of services for Unicode
enablement, and is the globalization foundation used by
many software packages and operating systems, from mobile
phones like Android or iPhone all the way up to mainframes
and cloud server farms. Freely available as open-source,
it provides cross-platform C, C++, and Java APIs, with a
thread-safe programming model.
This presentation will provide a brief overview of ICU,
with emphasis on the recent updates in ICU 51 & 52,
including the latest support for Unicode 6.3 and CLDR 24,
date/time formatting & parsing improvements, and other
changes (see http://site.icu-project.org/download).
The presentation will also touch on ICU's planned
direction for future releases. |
|
|
Presenter:
John Huan Vu
Senior
Software Engineer, Zynga
Inc.
|
Track
2 - Localization Framework for Dynamic Text
"Localization Framework for Dynamic Text" is
a pending patent that was developed on the Mafia Wars team
at Zynga. During the time, Mafia Wars could not scale to a
typical localization framework due to the fact that (1)
the game runs mostly on PHP with no mature
internationalization framework at the time, (2) the game
generates a lot of dynamic text with at least 4.5 million
words, and (3) the game's codebase is over 2 years old.
This innovative framework addresses the problems by (1)
capturing the most up-to-date generated dynamic strings,
(2) scaling to an existing (legacy) codebase, (3)
supporting engineers with little or no localization
knowledge, and (4) enabling a continuous localization
process for new features and content created. In this
talk, you will learn about this innovative framework in
detail and how it can be utilized as an alternate to a
typical localization framework.
|
|
|
Presenters:
Markus
Scherer
Unicode
Software Engineer, Google, Inc.
Mark Davis
Sr.
Internationalization Architect, Google Inc.
|
Track
3 - One Hundred Ways of Sorting Strings (and Counting)
How do we bring order to 100,000 characters? How do we
support 100 language-specific sort orders with little
data, and make it fast? How do we build phone book
indexes, put your native script first, ignore accents and
punctuation, and search for words in a web page?
Join us for a tour of the Unicode Collation Algorithm (UCA),
the CLDR collation data, and the ICU APIs for sorting,
indexing and searching.
|
|
14:00-14:50 |
SESSION 11 |
|
Presenter:
Roshan Singh
MTS
2, Adobe Systems
|
Track 1 - NLP: A Quick Global
Multilingual Approach to Develop Named Entity Recognizer (NER)
Using DBpedia
Named-entity recognition (NER) (also known as entity
identification and entity extraction) is a subtask of
information extraction that seeks to locate and classify
atomic elements in text into predefined categories such as
the names of persons, organizations, locations,
expressions of times, quantities, monetary values,
percentages, etc. It forms the core of Semantic Text
analytics applications. NER systems have been created that
use linguistic grammar-based techniques as well as
statistical models. Hand-crafted grammar-based systems
typically obtain better precision, but at the cost of
lower recall and months of work by experienced
computational linguists. Statistical NER systems typically
require a large amount of manually annotated training
data. We discuss DBpedia's multilingual data resource and
techniques to extract meaningful information to develop
NER which can be extended to all major languages like
German, Chinese, Japanese, etc. apart from English. And
hence we share a quick global approach to develop a simple
NER out of DBpedia with least effort for all major
languages (EN, DE, ZH, JA, etc.) which tags more than 10
types of named entities like Person, Place, Organization,
Work, Medicine, Transport (vehicle), etc. We demonstrate
the NER for EN and DE on various test articles.
|
|
|
Presenter:
Alolita Sharma
Director
of Engineering, Wikimedia Foundation
|
Track
2 - Socializing Bi-Di
Innovation is key to software engineering success.
Wikipedia's innovative language engineering projects use
agile development to create high-quality
internationalization and localization features and apps in
faster iterations. This talk examines what works and what
doesn't when using agile development for large open source
projects with globally distributed teams. This talk will
help developers and engineering managers better implement
a successful agile process for their open source projects
especially for internationalization and localization
engineering.
|
|
|
Presenter:
Addison Phillips
Globalization
Architect, Lab126 (Amazon)
|
Track
3 - A Dog's Breakfast: Sorting Earth's Largest Collection of
Content Without Going Crazy
Sorting is one of the most basic, fundamental things we
do with text. When you have lots of data, putting it into
some kind of order makes it easier for people to find,
access, and use it.
Your Amazon Kindle can store hundreds of books (videos,
MP3s, etc.), in many languages used around the globe. How
would you sort "Earth's Biggest Selection"? How
can you sort many languages of content at the same time?
The Unicode Collation Algorithm is a start. But much
more is required. This session gives the basic
"tutorial facts" about sorting, and then delves
into the additional issues that surround handling user
content--and how we turned a real mess (a "dog's
breakfast") into something users could navigate.
|
|
14:50 – 15:10 - Afternoon Refreshments |
|
15:10 - 16:00 |
SESSION 12 |
|
Presenter:
Manish Aggarwal
Computer
Scientist, Adobe Systems
|
Track 1 - Enabling Indic Languages
in Applications
Adobe has finally begun to support the languages of India
in CS6 and CS7. But supporting complex Indic scripts such
as Devanagari is non-trivial. For example, the input
sequence order of characters can be different from the
actual pictorial representation, and they rely heavily on
diacritics and ligature formation. This talk covers the
requirements, challenges, and architecture of the Indic
text shaping engine that was developed to support Indic
languages.
|
|
|
Presenter:
Gaurav Bathla
Quality
Engineer Lead Globalization, Adobe
Systems
|
Track
2 - Overcoming Testing Challenges in Agile Model for Adobe
Products
This topic will cover various challenges in the
localization cycle of products using agile/ scrum workflow
and their mitigation. While most of the products localized
today use the agile-scrum model; the challenges include
optimizing the localization testing process w.r.t. core
development cycle, re-work effort, bugs deferred due to
time constraint, and the most feared last minute changes
amongst others. After studying the bugs/ issues
encountered over various localization cycles; we ran a
pilot project where the approach was changed to using a
modified variant of the V-model with the scrum model in
order to reach a resolution to counter the various
concerns.
With localization of various Adobe products reaching
millions of customers; nearly 26 locales across multiple
Windows and Mac platforms get modified for any change in
the English version of the product; increasing the
complexity as the change needs to be tested across a
matrix of around 300 variants. The situation moves towards
a nail biting end; around the release time frame of the
product. With the agile-scrum model these changes come
regularly and with a shorter turnaround time; thus
generating a need for a better managed model for
localization.
Reviewing the challenges faced and the adhoc solutions
employed over the past cycles for various Adobe products;
a strategic change to the approach of the agile
development process was tested. As part of the change
every module of the product was treated as a sovereign
artifact complete with exclusive development, testing, UI
freeze and release cycles before integrating the same into
the product mainline independent of the scrum timelines.
These modules were strictly frozen from development
perspective once integrated. This methodology along with
multiple other process enhancements reduced the re-work
timeframe to a miniscule level as compared to earlier
cycle in addition to a huge saving in localization
efforts.
|
|
|
Presenter:
Andrew Glass
Program
Manger, Microsoft
|
Track
3 - INew Shaping Engines for Windows
This presentation will look at a few complex writing
systems that have recently been enabled for input and
display on Windows. |
|
16:10 - 17:00 |
SESSION
13 |
|
Presenter:
Roshan Singh
MTS
2, Adobe Systems
|
Track 1 - Text Analytics:
Multilingual Dynamic Ontology Creation Using Wikipedia
Ontology provides controlled, consistent vocabularies
to describe concepts and relationships. It forms the core
of Semantic Text analytics applications. Currently
building ontology is complex, manual and time consuming.
We discuss Wikipedia's multilingual data resource
structured in the form of complex directed graph and
issues in parsing the graph to extract meaningful
information and ontology tree which can be extended to all
major languages like German, Chinese, Japanese, etc. apart
from English. We have developed ontology in EN, DE, ZH and
JA and the solution can be extended to all other major
languages supported by wikipedia.
We demonstrate "Manthan" (the document
categorizer), an application of this dynamic Ontology, to
classify/categorize arbitrary documents in EN and DE. The
result section also visually displays the tree hierarchy
of the category path extracted out of ontology.
|
|
|
Presenter:
Saurav Gupta
Sr.
Globalization Lead, Adobe Systems
|
Track
2 - Oiling the Friction Between Agile and Localization
Agile is demand of today's market to adapt to frequent changes in business.
Agile is always on fast move, and team often stumbles at
last minutes due to unanticipated localization issues. The
resulted repercussions impact could be drastic, can
sometimes lead to delay in projects, and no one wants to
dirty his pants at end, but unfortunately you find
yourself in it.
This paper will take you through the journey of one of
the flagship products (Illustrator) starting from
transition from typical waterfall model to Agile, adapting
to Agile with adoptions of new strategies, defining the
new Localization best practices, writing down the success
story of making to market.
Also with this paper, putting few thoughts on table
around handling of Localization errands in an effective
and timely manner with our new improvised strategies. The
new strategies, tools and techniques presented here might
add a new dimension to your thinking process and to your
operational style in Agile. The strategies would talk
about forestalling the Localization issues (selective
approach), early detecting the truncation and Layout
issues, pushing Linguistic testing earlier in the cycle,
optimizing the testing efforts by smarter distribution and
redefining build transfer strategies to share with testing
partners in shorter duration.
This paper broadly covers:
- Market and Agile
- Localization in Agile World
- Adobe Products - Successful story
- Comparison of Waterfall vs. Agile (Factual Data
collected during the execution of project)
- Challenges encountered
- Process refinements
- New strategies, tools and techniques
- Localization Best Practices (in Agile World)
|
|
|
Presenter:
Behnam Esfahbod
Research
Assistant, Xencraft
|
Track 3 - Arabic Script SHOULD NOT be so
Scary!
Arabic script has faced radical changes in the past
century that have left it in a critical situation in the
digital era. Writing styles for the script look fairly
different from region to region, yet it always is
considered the same script in the mind of its natives.
Logical encoding of the script and existence of similar
characters in Unicode Arabic blocks have caused serious
challenges for most of multilingual applications.
Providing a model for the textual part of
human-computer interaction, we model security and
usability issues of Latin and Arabic scripts, demonstrate
the similarity of the problems in these two scripts, and
introduce the Arabic Script Shape Mapping algorithm as a
unified method for applications - like internationalized
domain names and user identifiers - to handle such
problems. The algorithm is designed for simplicity and
behaves similar to how Unicode Case Mapping algorithm
works for Latin script and the like.
Shifting the solution from the servers to applications,
we show that how this algorithm can be used to maintain
enhance usability for average users, whilst eliminate the
need for complicated methods like "bundling"
used in internationalized domain names.
|
|
Program is subject to change.
|
|
|