Alex Sexton

Web Hacking. JavaScript.

Introducing the Jed Toolkit


The Jed Internationalization Toolkit

The Jed Toolkit is a collection of interoperable tools to help facilitate the full process of internationalizing applications in JavaScript. These tools have a wide range of utility, from small modules to help format messages, dates, and numbers to services that facilitate translation and code integration. The goal of the project is to bring the experience and quality of internationalizing JavaScript applications up to par with the rest of the current state of JavaScript tooling.

I’m in the process of moving everything over, but you’ll likely want to watch this space: github.com/jedtoolkit and jedtoolkit.org

The Dojo Foundation and Future

I’m excited that The Jed Toolkit has been accepted into the Dojo Foundation so its users can be sure that it will be a safe, unencumbered resource for them into the future. I’m extremely happy to be part of a family that includes require.js, sizzle, and the dojo toolkit.

After being tasked with internationalizing a large application that I was building, I quickly realized that there was little available for JavaScript developers. I had been using Gettext in a Python application a few weeks prior, and decided it might be nice to implement in JavaScript. So I did. I called it “Jed” (soon to be gettext2.js) after Jed Schmidt, everybody’s favorite “hobbyist” JavaScripter / Japanese translator.

I was drawn to this problem because there were so few people considering its intricacies, but I was shown by some very smart folks that there was a lot more to internationalization than the little library I wrote. So I wrote another library, and ported a few others.

I was quite happy with how these were turning out. They weren’t especially hard to create, because most of them follow well-documented specifications. I really liked how ICU MessageFormat made a lot of decisions based on how translators think, instead of how programmers think. But naturally, they locked away that goodness behind a syntax/grammar that no non-programmer should ever have to deal with. MessageFormat is great for translations but not for translators. Not in the real world at least. That’s when I realized that the problem was not (only) the tools for writing international apps, but even deeper: in the tools and integration with translators.

It’s all about tools

The translation space hasn’t grown much since computers first existed. We can barely encode files correctly in 2012. However, in other spaces, like content authoring, we have a whole system of tools and integrations to map non-technical users’ intent to structured usable data for consumption. If your local tech writer wants to start a blog, they can! They don’t need to know how to set up a server or that HTML even exists.

The process of getting an app translated is cumbersome, and is a blocker to getting good applications out there. FTP zips and crazy XML specs mixed with Word Documents rule the landscape. There are no decent APIs or automatic integrations that anybody is using at scale. I want to set out to change this.

Translators aren’t all to blame. If you were a translator and got the message “fair”, would you translate it as a carnival, or as ‘just’? We set our translators up for failure with our lack of context. We can do better. We can describe messages, and their variables. We can offer examples and photos of the context. We can even translate the app in real time so they can see their translation literally running in the place it will live.

The goal of the Jed I18n Toolkit is to help make the internationalization process much more accurate and enjoyable for all parties. We should be able to write our messages directly in our templates in whatever format we think is best. Our messages should be automatically culled, deduped, and sent into a translation queue. The translator shouldn’t be presented with anything other than things that help them translate. The programmer’s format should be irrelevant. Context is king, and a bunch of crazy sprintf characters and HTML are just noise. When the translations are done, they should exist as a service or API and be updated in real time. Gone should be the days of the two-month translation code freeze. You should be able to write a post-commit hook that gets your translations through the system as fast as you can find someone to translate them.

There’s a lot to decide on how to bring all of these ideas into the project in a generic, but still usable way, and it will take some time to get everything right. Right now I’m starting by putting in the few open source projects that are already out there as well as showing early beta work on some of the integration tools. Please be patient with me and send me your suggestions and frustrations so we can finally bring internationalization out of the dark ages.

Third Party JavaScript in the Third Person


At the end of last year I spoke at the awesome CapitolJS Conference in DC. I was encouraged to talk a little more broadly about third-party JavaScript development and its various quirks. When I got back home to AustinJS we had some time for a short talk. Luckily, Logan Lindquist is usually there to film talks for Austin Tech Videos. Long story short, my talk and slides are now available for Third Party JavaScript in the Third Person.

The slides are available (likely in webkit) at: alexsexton.com/talks/thirdparty

I can’t thank Logan enough for doing Austin Tech Videos. It’s a treasure trove of all kinds of talks from around town.

Third Party Front-end Performance


I work for a company called Bazaarvoice. Our core products (Ratings and Reviews is our biggest) are all implemented as third party JavaScript applications. We are white-label, so you don’t see a ton of our brand around, but we power the User Generated Content (UGC) behind Walmart, Samsung, Best Buy, Procter & Gamble, etc. Needless to say, we have one of the highest volume third party applications on the internet. Fun stuff. There are other massively successful and smart companies doing similar things (take a look at Disqus or even peek into the Google or Facebook button code).

Performance Matters

Our core applications were built nearly 7 years ago, and gained features every day over that period of time. As you can imagine, performance started to suffer. Since we’re on the product page of major retailers, we knew that this wouldn’t stand. I was tasked with re-thinking our solution with performance at the forefront of our architectural and deployment strategies. I attacked three different types of performance at varying levels of depth.

  • Network
  • Injection/Rendering
  • Application

Bazaarvoice has a developer blog that I sometimes write for, so I wrote an article on Third Party Front-end Performance. Normally I don’t just link across, but I think this information is actually fairly applicable to a large chunk of developers. So check it out. Feel free to comment either place. Part 2 and 3 to come.

The UX of Language


I once read a tweet that I couldn’t attribute to its original source (Paul Irish has since sleuthed it out and found it to be a Greg Brockman tweet), but it said something along the lines of

Web programming is the science of coming up with increasingly complicated ways of concatenating strings.

I think that there’s a good reason for this.

As we turn to the next chapter of the web and start building more web applications, rather than just web documents, we begin to need to mix language with data. This presents an extremely difficult challenge even if you only have to consider a single language. Oftentimes we have to support many.

While the data density in language increases on our sites and apps, we must consider the user who has to read those sentences. The goal is to offer interesting data to the user without just generating a table and without sounding like Borat writes the copy. You can spend as much time as you’d like on the interaction and visual design, but if your app doesn’t have a good flow, you’re leaving the most important cards on the table: the content.

Let’s Focus on English for a Second

Number of Results: 15

in Number of Categories: 3

This is the type of language that we’ve all come to expect out of the web. It gets the point across, but I’d argue that it’s not very personal or natural. That may not matter in this particular example, but consider building anything social. Identity is becoming a huge part of the next wave of apps. If you want to have any chance at personalization or identity, then the following isn’t going to cut it:

Alex added 5 friend(s) to their group on March 19, 2012

This is in stark contrast to some of the more popular social networks.

/images/uxoflang/google-mf2.png

Not only does this handle the correct pluralization for ‘people’, it also takes into consideration the offset from the total count of the 3 people that are listed explicitly. This attention to detail doesn’t stop at decent pluralization, but also extends to gender.

/images/uxoflang/facebook-mf.png

/images/uxoflang/facebook-mf2.png

Remember the old days when Facebook used to refer to everyone as a ‘their’? That was weird. Don’t fall into that trap. When it comes to good user experience, you can’t ignore the text. If you can go through your app and find any sentence that you’d write differently if it wasn’t generated by a computer, you are reducing the effectiveness of that text and degrading the overall experience.

Iteration

Assuming you’re sold on the idea of a proper language treatment of your app, let’s try to solve the problem using some of the examples we’ve already seen.

There are X result(s)

In JavaScript, the naïve solution would be to do something like the following:

if ( X === 1 ) {
  return "There is one result";
}
else {
  return "There are " + X + " results";
}

This results in a proper English sentence in all cases. That’s great, but let’s get a touch more complex. Now we want to support another language. The pluralization rules for English are different than the rules for French. If we want to support French, our solution starts looking more like this:

if ( lang === "en" ) {
  if ( X === 1 ) {
    return "There is one result";
  }
  else {
    return "There are " + X + " results";
  }
}
else if ( lang === "fr" ) {
  if ( X > 1 ) {
    return "Le, there are " + X + " results";
  }
  else {
    return "Le, there is " + X + " result";
  }
}

This quickly becomes an unscalable solution. You might be thinking “Well, Alex, I never plan on internationalizing my app, so the rest of this article doesn’t apply to me.” – I’d encourage you to consider the original example as a more pleasant English-only sentence:

There are 8 results in 2 categories.

Even when ignoring locale, we still have some combinatoric debts to pay. Let’s check out the code for handling this naïvely:

if ( ResCount !== 1 && CatCount !== 1 ) {
  return "There are " + ResCount + " results in " + CatCount + " categories.";
}
else if ( ResCount === 1 && CatCount !== 1 ) {
  return "There is one result in " + CatCount + " categories.";
}
else if ( ResCount !== 1 && CatCount === 1 ) {
  return "There are " + ResCount + " results in one category.";
}
else if ( ResCount === 1 && CatCount === 1 ) {
  return "There is one result in one category.";
}

This is pretty painstaking even without the chore of translating it into other languages. You cannot split up the halves of the sentences safely (especially if you want multiple languages), because the two halves may not match. You can imagine that when gender is added to this sentence, things explode even further.

Alex searched for an image. He found 5 results in one category.

Now we have to multiply that logic times the number of gender choices. Most specs have 3 gender choices: “male”, “female”, and “other”. “Other” specifically is treated “as if you cannot determine the gender of someone who is far away from you.” In the case of this simple sentence, we’d need twelve copies of the sentence – one for each combination of gender and plural form of the nouns.

Gettext (jed)

I recently released a library, Jed, for using Gettext style messages in JavaScript. Gettext is a GNU spec that’s been around for ages. I had been exposed to a little bit of Gettext from my Python days, and considered it to be the most popular solution to many of these problems. The main feature of Gettext is that you can decouple the messages from the plural-forms of a given language.

var pluralForms = {
  "en" : function ( x ) {
    if ( x === 1 ) {
      return 0;
    }
    return 1;
  },
  "fr" : function ( x ) {
    if ( x > 1 ) {
      return 1;
    }
    return 0;
  }
};

We now get back an index of sorts. “Gettext” refers to the lookup/loading/encoding mechanism more than anything, but with these plural forms we could do a lookup for the correct string, stored as data.

var translations = {
  "en" : {
    "somekey" : [ "There is one result.", "There are %s results." ]
  },
  "fr" : {
    "somekey" : [ "Le, there is %s result.", "Le, there are %s results." ]
  }
};

var lang = "en";
var msg = sprintf( translations[ lang ][ "somekey" ][ pluralForms[ lang ]( X ) ], X );

In this case we’re using sprintf style replacement after we do the key lookup. That tends to be the most common way to do substitution with Gettext. Now we have a solution that relies on data instead of one-off code blocks for each message. Also, if your sprintf supports positional variables, you can now solve the problem that different languages order sentences differently than English does.
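To make the lookup concrete, here is a self-contained sketch of the same idea. JavaScript has no built-in sprintf, so the `format` helper below is a hypothetical stand-in that only understands `%s` and positional `%1$s` style placeholders; the `translate` function, the `results` key, and the French strings are likewise illustrative assumptions, not part of Jed.

```javascript
// Hypothetical mini-sprintf: handles "%s" and positional "%1$s", "%2$s", ...
function format(template, args) {
  var next = 0;
  return template.replace(/%(?:(\d+)\$)?s/g, function (match, position) {
    // Positional placeholders are 1-based; plain "%s" consumes args in order.
    var index = position ? Number(position) - 1 : next++;
    return String(args[index]);
  });
}

// Plural-form functions return an index into each message's array of forms.
var pluralForms = {
  en: function (n) { return n === 1 ? 0 : 1; },
  fr: function (n) { return n > 1 ? 1 : 0; }
};

var translations = {
  en: { results: [ "There is one result.", "There are %s results." ] },
  fr: { results: [ "Il y a %s résultat.", "Il y a %s résultats." ] }
};

// Look up the message by key, pick the plural form, then substitute the count.
function translate(lang, key, count) {
  var forms = translations[lang][key];
  var index = pluralForms[lang](count);
  return format(forms[index], [ count ]);
}

translate("en", "results", 0); // "There are 0 results."
translate("fr", "results", 0); // "Il y a 0 résultat."

// Positional variables let a translation reorder its arguments.
format("%2$s results were found by %1$s.", [ "Alex", 5 ]);
// "5 results were found by Alex."
```

Note how zero takes the plural form in English but the singular form in French, purely because of the data in `pluralForms`.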

Soon after I released Jed it was shared on es-discuss. Immediately Norbert Lindenberg stepped up to tell me that I was making a mistake by choosing Gettext. How right he was. The best example of my oversight is actually one that we’ve already seen:

There are 8 results in 2 categories

How would we represent this in Gettext’s PO format? The plural-form functions can only take a single number to decide the plural form. This sentence would need to be split up again, which won’t work across languages and often won’t work well even in English. Gender can be added in by utilizing Gettext’s context feature, but it only goes one level deep. What if I needed an actual context AND a gender selection?

Norbert was kind enough to point me in the direction of the ICU MessageFormat spec. I could quickly see that some smart people had thought about this a lot longer than I had. Using Jed for Gettext can still be nice if you already have invested in using Gettext in other parts of your stack, but I’d generally suggest against it in favor of MessageFormat.

ICU MessageFormat

MessageFormat is actually just a few specs pasted together, but they look similar. They may seem vaguely familiar to those who have ever used Java’s ChoiceFormat utility. They are different in a few ways, but the important part is that they more or less solve multiple plurals and gender specificity without as much of the combinatorics game.

The MessageFormat spec contains PluralFormat and SelectFormat in the most common cases. Using the syntax in PluralFormat we can address multiple plurals in the same sentence. All the pluralization data is standardized and pulled from CLDR and not needed as user input. There are keywords that come back as a result for any given input number: “zero”, “one”, “two”, “few”, “many”, “other”. All languages can be roughly mapped to these keywords, and it is the basis for some of the keywords in the message.

I won’t go into much detail about the syntax, as that’s not the point of this post.

PluralFormat

There {ResCount, plural,
        one {is one result}
        other {are # results}
      } in {CatCount, plural,
        one {one category}
        other {# categories}
      }.

Using PluralFormat we were able to decouple the pluralization of each of the nouns.
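As a rough illustration of that decoupling in plain JavaScript, the sketch below picks each noun's form independently and then joins the pieces. It assumes a runtime that ships the built-in `Intl.PluralRules` (which returns the CLDR keywords mentioned above); it is not how messageformat.js works internally, and the `pick` and `describe` names are made up for this example.

```javascript
// Forms keyed by CLDR plural keyword; "#" stands in for the number.
var resultForms = { one: "is one result", other: "are # results" };
var categoryForms = { one: "one category", other: "# categories" };

// Choose the form for a single count, falling back to "other".
function pick(forms, count, locale) {
  var keyword = new Intl.PluralRules(locale).select(count);
  var hasForm = Object.prototype.hasOwnProperty.call(forms, keyword);
  var form = hasForm ? forms[keyword] : forms.other;
  return form.replace("#", String(count));
}

// Each noun is pluralized on its own, so no 2x2 table of sentences is needed.
function describe(resCount, catCount) {
  return "There " + pick(resultForms, resCount, "en") +
    " in " + pick(categoryForms, catCount, "en") + ".";
}

describe(8, 2); // "There are 8 results in 2 categories."
describe(1, 3); // "There is one result in 3 categories."
```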

SelectFormat

Gender is usually handled via SelectFormat which works much like a switch statement (except default becomes other).

{GENDER, select,
  male {He}
  female {She}
  other {They}
} just found {ResCount, plural,
        one {one result}
        other {# results}
      } in {CatCount, plural,
        one {one category}
        other {# categories}
      }.

At the top we are able to determine the gender to use and then reuse most of our code from above to have a multiple plural and gender-specific sentence in as few characters as possible. Exceedingly complex sentences can often still require nesting and combinatorics, but for the majority of cases, you can avoid repeating any logic.

There are also complex ‘plugins’ that can be used. The offset option will help you generate sentences like the one in the Google Plus example earlier.

NumberFormat

Technically, I don’t think NumberFormat is part of MessageFormat – but it is usually necessary to pull in. NumberFormat allows you to internationalize things that we haven’t even covered yet. Ever consider that other countries use , characters where the US uses . and vice versa? NumberFormat is how you handle numbers, percentages, and currencies across languages.

1234.5       //Decimal number
$1234.50     //U.S. currency
1.234,57€    //German currency
123457%      //Percent
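For a feel of what this looks like in code, here is a small sketch using the built-in `Intl.NumberFormat`, assuming a runtime that provides it. This is not the numberformat.js port mentioned below, and `formatExamples` is a made-up helper name.

```javascript
// Format the same values as a decimal, a currency amount, and a percentage.
function formatExamples(locale, currency) {
  return {
    decimal: new Intl.NumberFormat(locale).format(1234.5),
    currency: new Intl.NumberFormat(locale, {
      style: "currency",
      currency: currency
    }).format(1234.5),
    // The percent style multiplies by 100 before formatting.
    percent: new Intl.NumberFormat(locale, { style: "percent" }).format(1234.57)
  };
}

var us = formatExamples("en-US", "USD");
// us.decimal  is "1,234.5"
// us.currency is "$1,234.50"
// us.percent  is "123,457%"

// Passing "de-DE" and "EUR" swaps the separators and moves the currency sign,
// provided the runtime has locale data for German.
var de = formatExamples("de-DE", "EUR");
```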

Tools

messageformat.js

Shortly after releasing Jed, I released messageformat.js. It’s a much less sexy name, but perhaps I’ll fix that soon. Google also has an implementation for people using the Google Closure library: http://code.google.com/p/closure-library/source/browse/trunk/closure/goog/i18n/messageformat.js.

While both are likely to be sufficiently fast, I did implement messageformat.js as a compile to JS language. This means that at build time, you can ‘precompile’ your messages and ditch the majority of the library. This creates some great opportunities to be able to include MessageFormat style strings directly inside of your precompilable templates and have it all compile to a series of string concats. The readme on the project page should be quite helpful to learn the syntax as well as the integration and api.
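To give a sense of what “a series of string concats” can mean, here is a hand-written approximation of a function that the gender and plural message from earlier might boil down to. It is only a sketch: the actual output of messageformat.js will differ, and `pluralEn` and `foundMessage` are hypothetical names.

```javascript
// English plural rule, returning a CLDR-style keyword.
function pluralEn(n) {
  return n === 1 ? "one" : "other";
}

// Stand-in for a precompiled message: no parser needed at runtime,
// just branches and string concatenation.
function foundMessage(d) {
  var pronoun = d.GENDER === "male" ? "He"
    : d.GENDER === "female" ? "She"
    : "They";
  var results = pluralEn(d.ResCount) === "one"
    ? "one result"
    : d.ResCount + " results";
  var categories = pluralEn(d.CatCount) === "one"
    ? "one category"
    : d.CatCount + " categories";
  return pronoun + " just found " + results + " in " + categories + ".";
}

foundMessage({ GENDER: "female", ResCount: 5, CatCount: 1 });
// "She just found 5 results in one category."
```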

numberformat.js

My co-worker Oliver Wong was able to do a quick port of the Google Closure NumberFormat.js to not need Google Closure (under Apache 2).

EDIT: moment.js

If anyone was wondering how I handle dates in my JS apps, I figured I’d add it here. Moment.js, by Tim Wood, is a library that I lean on a lot. There are plenty of additional internationalization libraries that I could start adding (for collation, RTL, etc.), but for right now, the built-in localization and friendly ago syntax of moment.js make for a great user experience around dates.

All Together Now

I plan on updating Jed to actually contain this group of tools rather than the Gettext ones. I think these tools better suit the needs of modern applications. I will certainly keep the old Jed (Gettext) code around for those that require that format. It’s not terribly difficult to integrate with these tools separately now, though.

Conclusion

Language is important. It can get complex. A lot of incredibly bright people have been looking into the constraints of same-language message generation, as well as multilingual message generation (the spec writers, not necessarily the library creators). These tools and/or ideas should be the starting point of any application that desires to have a good UX.

It doesn’t matter how many drop-shadows or rounded-corners you have, the user shouldn’t have to decode your words. The words are often the most valuable experience.

My Thoughts on AMD


So I know it was cool to write blog posts about AMD (A CommonJS JavaScript Module Specification) like a month ago, but I’ve recently had the desire to put my two cents into the discussion. I am likely not entirely objective – as I frequently use RequireJS and have even committed tiny little parts to the project that James likely had to rewrite. I would like to point out up front, though, that AMD !== RequireJS. RequireJS is an implementation of AMD plus a whole ecosystem of usefulness.

I recently had some in-depth discussions with Tom Dale, who wrote a pretty popular article against AMD. Tom is a good friend and I respect all of his opinions. I wouldn’t say that he’s since ‘come around’ on the AMD issue, though I may have worn down his desire to fight against it. I’ll take what I can get. His business partner and all-around web-tech badboy Yehuda Katz remains less convinced and sent me a few questions that he wanted answered. I won’t add them here verbatim, but I’ll try and touch on a lot of that stuff as well. I think we more or less agree these days (but he by no means necessarily endorses what I’m writing here). I’ll explain.

I’m going to try not to rehash the great responses to Tom’s concerns by James, Dave, and Miller.

AMD is not a script loader

This misconception is likely propagated by the fact that RequireJS shows up on all of the script loader shootouts. I have written most of my thoughts on this already in my most popular (read: only) Quora answer: What are the use cases for RequireJS vs Yepnope vs LABjs. I’ll mostly just leave it at this: Script loading is a means by which AMD/RequireJS meet their requirements. It’s neither a focus of the project nor an integral part of using AMD in production.

AMD makes async loading possible, not required

I often hear that people don’t have an interest in asynchronous modules. This is perfectly fine. There are tons of integrated systems in full stack web frameworks that do a lot of great things for preprocessing that just don’t require asynchronous script loading to be great.

AMD is also so other people can include your module asynchronously.

So if you have your app split up into modules via some preprocessor system, that’s fine, but if you’d like your code to be accessible to people who don’t run the exact same stack as you: AMD works everywhere.

So write all your code how you like it, and right before you release it on github, consider adding this:

if (typeof define === 'function' && define.amd) {
  define(function () {
    return TheModule;
  });
}
// Feel free to also leak the global and/or test for a CJS environment.
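If you do want to cover all three cases mentioned in that comment, one common shape is sketched below. `TheModule` and its `greet` method are placeholders, and the CommonJS check is just one reasonable heuristic, not something the AMD spec prescribes.

```javascript
(function (root) {
  // Hypothetical module body; `TheModule` and `greet` are placeholder names.
  var TheModule = {
    greet: function (name) {
      return "Hello, " + name + "!";
    }
  };

  if (typeof define === 'function' && define.amd) {
    // An AMD loader is present: register as an anonymous module.
    define(function () {
      return TheModule;
    });
  } else if (typeof module === 'object' && module && module.exports) {
    // CommonJS-style environment such as Node.
    module.exports = TheModule;
  } else {
    // No module system at all: leak a single global.
    root.TheModule = TheModule;
  }
}(typeof globalThis !== 'undefined' ? globalThis : this));
```

The order of the checks is a design choice: testing for AMD first means a page that happens to have both a loader and a stray `module` global still registers with the loader.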

And to take this a step further, I’d encourage the preprocessing systems to just process to AMD.

// Node style modules
var x = require('x');
var y = require('y');

var res = doSomething(x, y);

module.exports = res;

Instead of preprocessing this code to work on the web by just concatenating the globals in the correct order, why not just use AMD? It could easily be translated at preprocess time to:

define(['x','y'], function(x,y) {
  var res = doSomething(x, y);
  return res;
});
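A preprocessor doesn’t even have to rewrite the `require` calls to get there. AMD also defines a simplified CommonJS wrapper form, `define(function (require, exports, module) { ... })`, so the crudest possible translation step is a string wrap. The `wrapAsAmd` function below is a hypothetical sketch of that step, not part of any existing tool, and it produces the wrapper form rather than the dependency-array form shown above.

```javascript
// Wrap Node-style module source in AMD's simplified CommonJS wrapper.
// `wrapAsAmd` is a made-up name for illustration.
function wrapAsAmd(source) {
  var indented = source.split("\n").map(function (line) {
    // Indent non-empty lines so the output stays readable.
    return line ? "  " + line : line;
  }).join("\n");
  return "define(function (require, exports, module) {\n" + indented + "\n});\n";
}

var nodeStyle = "var x = require('x');\nmodule.exports = x;";
var amdStyle = wrapAsAmd(nodeStyle);
// amdStyle is:
// define(function (require, exports, module) {
//   var x = require('x');
//   module.exports = x;
// });
```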

Now you have a module that anyone can pick up (and translate it into their own module format if they’d like).

The problem with using anyone else’s format is that AMD is the only format that is suited well for asynchronous loading. This is not important to one group of people, but it is important to a different group of people, and is a valid concern in web development. In this case, AMD is the inclusive module format.

As for ‘synchronous’ module loading with AMD, the api must use the asynchronous pattern, but if a module is already registered, the result is atomic. Not to mention that it’s perfectly valid to use the require('x'); syntax when you can be sure a module already exists. The syntax allows for asynchronous loading, but it doesn’t require you to load things that way.

If you don’t like parts about AMD, and prefer to preprocess, that’s fine. I would encourage you to process to AMD though. Much like JavaScript is a common compilation target, I’d like to see AMD become that as well, for modules. It won’t help when in 5 years every project handles modules with their own preprocessor format.

AMD requires preprocessing anyways, right?

Yep, but not until build time. Any sane user of RequireJS uses the amazing r.js build tool before pushing to production. So why not just preprocess on request and output “modules” in the correct order and wrap them in a big IIFE?

It would likely work, but in my opinion there is value in being able to develop directly out of a folder, with only static resources. It’s the reason that so many new developers are using LESS.js – just load the file and go. It doesn’t require you to install a watcher, or set up rails, or even learn how to install an additional whole programming language to your highly varied OS landscape. AMD works without all the stuff. It is a developer experience with immediate gratification.

Many people disagree with the sentiment that our tools should all be able to run on the client side in order to facilitate the idea that our web pages can run out of static directories. I definitely agree that there are tons of uses for preprocessors that run outside of the browser, but I don’t fundamentally agree that modules are part of the ‘just nice to have if you can figure out the watchers’ group of tools. Real modules are going into browsers soon, and I think they should be part of the first class citizen group of tools that we run in our browsers. Not to mention, if you do opt-in to using a pre-processor and you want someone else’s preprocessor to be able to understand your modules, you’ll have to agree on a standard compilation target anyways. What better than AMD?

Much of what James writes in his posts is about how there are a few ‘meh’ things about AMD, but that it meets nearly every requirement of a module system better than any alternative. He doesn’t often mention AMD as a compilation target, but I think it solves everyone’s problems.

It’s not nearly as ‘complex’ as you think it is.

AMD is implemented by attaching a scriptloader to an object. Load something, store it in the object, if someone else asks for it, pull it back out. Much of the size in RequireJS is for other amazing features and developer tooling. You can switch out RequireJS in production with Almond – it is 857 bytes.
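The “store it in the object, pull it back out” part is small enough to sketch. The toy registry below is not RequireJS or Almond: it assumes every module has already been defined in memory, so there is no script loading, no plugin support, and no cycle detection. The `miniDefine` and `miniRequire` names are made up to avoid suggesting it is the real API.

```javascript
var registry = {}; // name -> { deps, factory }
var cache = {};    // name -> resolved module value

// Store the module's dependencies and factory under its name.
function miniDefine(name, deps, factory) {
  registry[name] = { deps: deps, factory: factory };
}

// Resolve a module by name, running its factory at most once.
function resolve(name) {
  if (Object.prototype.hasOwnProperty.call(cache, name)) {
    return cache[name];
  }
  var entry = registry[name];
  if (!entry) {
    throw new Error("Module not defined: " + name);
  }
  var args = entry.deps.map(resolve);
  cache[name] = entry.factory.apply(null, args);
  return cache[name];
}

// Resolve each dependency and hand the values to the callback.
function miniRequire(deps, callback) {
  return callback.apply(null, deps.map(resolve));
}

miniDefine('x', [], function () { return 2; });
miniDefine('y', ['x'], function (x) { return x * 10; });

var total = miniRequire(['x', 'y'], function (x, y) {
  return x + y;
});
// total is 22
```

A real loader replaces the “not defined” error with a script fetch and calls the callback once everything has arrived, which is where the asynchronous API shape comes from.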

The other notion is that it’s a chore to write. If we ignore the fact that most modern code editors could easily store the boilerplate for you, I would argue that it’s actually fewer characters than not doing it. Also, (nearly) everyone who says this also complains that it forces them to nest their code an extra level, and then they turn around and immediately wrap all of their code in an IIFE.

// An IIFE for leakage. One level of nesting.
(function(){
  // require statement per dependency
  var x = require('path/x'),
        y = require('path/y'),
        z = require('path/z'),
        a = require('path/a');

  doStuff();
  module.exports = { ... }; // if you're doing more commonjs type stuff, not always
})(this);

Here’s the same AMD module:

define([
    'path/x',
    'path/y',
    'path/z',
    'path/a'
], function ( x, y, z, a ) {
  doStuff();
  return { ... };
});

~36% less to type, by my quick and dirty measure (stripping whitespace and counting chars). It can be even less if you don’t have any dependencies and just include a define at the end of your file like we did in a previous example.

Sure there’s a level of nesting, but it’s one that you’ll need to put there regardless of whether you’re using AMD or not. Sure that can be generated at request time, but so can the AMD callback function boilerplate.

Why not just polyfill ES6 Modules?

Well, first off, they’re not entirely baked. David Herman – the author of the original Module proposal – just posted a pretty delicious update for some ideas for a simpler and sweeter module syntax.

Secondly, I think it’s a great idea. As soon as the syntax settles back down I think we could all consider setting up preprocessors to allow for the syntax in the browser. This would require a request-time preprocessor because of the invalid nature of the syntax. So if you were able to run a preprocessor, then I think you should use the ES Harmony Module syntax. I think it should then compile to AMD, obviously, but you shouldn’t ever have to worry about it.

Quick RequireJS plug

I obviously like AMD a decent amount, but I also think RequireJS is a fantastic tool in my stack. The ‘plugin’ architecture is not incredibly well-documented, but it essentially acts as module middleware. My favorite application of this feature involves templates.

I use the pattern in my require-handlebars-plugin to great effect. Rather than require a precompiled template during dev mode, I simply prefix my module name, and the template compilation occurs before the callback is invoked.

// In a no template middleware world
require(['handlebars', 'text!someTemplate.hbs'], function ( Handlebars, strTemplate ) {
  var fnTemplate = Handlebars.compile( strTemplate );
  fnTemplate({"some": data});
});

This is also similar to the technique where templates are stored in type="script/tmpl" script tags and pulled in by their id. This means that if you want to compile your templates at build time, you need to fundamentally alter every location that retrieves a template.

With the hbs template plugin, this is handled for you.

// With require-handlebars-plugin
require(['hbs!someTemplate'], function ( fnTemplate ) {
  fnTemplate({"some":data});
});

During development, this loads the template in as text and precompiles it for you. At build time it automatically outputs a precompiled module that is concatenated into the build. This ability alone is like magic to me and I just wanted to tell everyone how much I like it.

Conclusion

You don’t have to like AMD more than whatever you use. It would be nice if we had a standard, compatible web module syntax though. I think AMD is the best-suited candidate for that role, and that the preprocessors should use AMD as a compilation target, rather than a direct competitor. Obviously it’s still cool if you just use it straight up, but that doesn’t need to happen for it to succeed.