Thursday, January 3, 2013

Kicking Apps and Making Names

Way back in the dawn of time, before I joined Google — OK, fine, two months ago — I was a video-game designer. Occasionally, I had to come up with names for people or places. And I'm no good at naming things.

So once, instead of manually naming hundreds of towns in a (fictitious) foreign country, I weaseled my way out of creativity by writing a ridiculous set of custom spreadsheet functions. My spreadsheet analyzed a list of real placenames from a similar country, then spit out plausible fake names with the same lexical structure.

It worked — but I was pushing spreadsheet functions so far that the “code” (if you can call it that) became difficult to maintain. At the same time, the reason I used a spreadsheet in the first place was so I could lean on the analytical power of pivot tables.

That’s what made Google Apps Script perfect for revamping the project. With Apps Script, I can still use pivot tables, then do the heavy lifting in JavaScript and package everything up as a tidy web app. I call it Name Generator, or NameGen for short … because I’m terrible at naming software, too.

Now, before you say, “There’s no way I’d call my daughter Harliance,” remember that the goal wasn’t to produce real names. The goal was to produce names that were good enough for a video game. Perhaps Harliance is a cyborg woman of negotiable virtue in dystopian future-America? You should probably pick your daughter’s name the old-fashioned way.

So let’s look at a few of the techniques that NameGen uses.

1. We start out in Google Sheets, first dropping a list of real names into column A, then slicing it into overlapping three-letter segments using the formula =MID($A2,COLUMN(B2)-1,3) (that’s the version of the formula you’d use in cell B2; from there, just copy and paste the formula across the rest of the sheet and the cell references will update accordingly). Here’s a sample of one of the spreadsheets so you can see how the data is set up.

2. We then create a pivot table for each column in that first sheet, just summarizing the column by COUNTA (the number of times each segment occurs). For example, since Lakisha, Nakia, and Nakisha (from our list of real names) share “aki” as letters 2 through 4, the pivot table for Segment 2 shows “aki: 3.”
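
If spreadsheet formulas aren't your thing, the same slicing and counting can be sketched in plain JavaScript. This is only an illustration, not part of NameGen: sliceSegments() and countSegments() are made-up helper names, and unlike the sheet, the sketch stops at the last letter instead of padding each row with blank cells.

```javascript
// Hypothetical helper: split a name into overlapping three-letter segments,
// mirroring =MID($A2,COLUMN(B2)-1,3) copied across the row.
function sliceSegments(name) {
  var segments = [];
  for (var start = 0; start < name.length; start++) {
    segments.push(name.slice(start, start + 3));
  }
  return segments;
}

// Hypothetical helper: count how often each segment occurs at one position,
// which is what the pivot table's COUNTA summary reports for one column.
// Position 0 corresponds to Segment 1 in the sheet, position 1 to Segment 2.
function countSegments(names, position) {
  var counts = {};
  for (var i = 0; i < names.length; i++) {
    var segment = sliceSegments(names[i])[position];
    if (segment !== undefined) {
      counts[segment] = (counts[segment] || 0) + 1;
    }
  }
  return counts;
}

var sample = ['Lakisha', 'Nakia', 'Nakisha'];
console.log(sliceSegments('Nakia'));        // ['Nak', 'aki', 'kia', 'ia', 'a']
console.log(countSegments(sample, 1).aki);  // 3
```

For the three sample names, the second position comes out as aki: 3, matching the pivot table for Segment 2.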

The plan is that NameGen will randomly pick one of the starting three-letter segments, then look at the last two letters of its selection so that it can find an overlapping segment from the next column. The script then uses the pivot-table statistics to weight its selections toward more common segments. It continues until a segment ends in a blank. This diagram shows how it might build the name Calina:
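
To make the overlap concrete, here's a small sketch that stitches a chain of segments back together the way the plan describes. The joinSegments() helper and the particular chain are assumptions for illustration; the real script picks each segment at random rather than from a fixed list.

```javascript
// Hypothetical helper: join a chain of overlapping segments into one name,
// dropping the two letters that each new segment repeats.
function joinSegments(chain) {
  var name = '';
  for (var i = 0; i < chain.length; i++) {
    name = name.slice(0, Math.max(name.length - 2, 0)) + chain[i];
    if (chain[i].length < 3) {
      break;  // A short segment means we ran into the end of a name.
    }
  }
  return name;
}

// One possible chain for Calina (the segments are illustrative).
console.log(joinSegments(['Cal', 'ali', 'lin', 'ina', 'na']));  // Calina
```

Each step keeps everything except the last two letters, then appends the next segment: Cal, Cali, Calin, Calina, and finally the short segment "na" signals the end.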

3. This is where Apps Script takes over. Just once per source list of names, we run the utility function below (shown slightly simplified) to convert the spreadsheet data to a more useful format and store it in ScriptDb. The script can then pull the data from ScriptDb in about 0.5s, versus about 5s to read directly from the spreadsheet. Note that we’re using Script Properties to store the spreadsheet ID rather than cluttering up the code with extra variables.

function updateDb() {
  var language = 'americanFemale';

  // Look up the spreadsheet ID in Script Properties, then grab its sheets. 
  var ssId = ScriptProperties.getProperty(language);
  var sheets = SpreadsheetApp.openById(ssId).getSheets();
  var dictSize = sheets[0].getLastRow() - 1;
  var segment = {};

  // Transform each sheet into the segment object we want later.
  for (var i = 0; i < sheets.length; i++) {    

    // Retrieve the list of real names (first loop) or a pivot table.
    if (i === 0) {
      segment.data = sheets[0].getRange('A:A').getValues();
    } else {
      segment.data = sheets[i].getDataRange().getValues();
    }
    
    // Store other properties so we can retrieve the right record later.
    segment.index = i;
    segment.language = language;
    segment.size = dictSize;

    // Save the object as a ScriptDb record, then start the loop again.
    ScriptDb.getMyDb().save(segment);
  }
}

4. Now, every time the app runs, it jumps straight to the generateNames() function shown below (again slightly simplified). After it queries the database, it’s straight JavaScript — but one Apps Script–specific trick you’ll notice is that we assemble the data into an array using segments[current.index] = current.data rather than segments.push(current.data). Because ScriptDb returns the records in an unpredictable order, we gave our objects an index property to store the correct order.

function generateNames(language, numNames) {
  
  // Query the database to find all results for this language.
  var results = ScriptDb.getMyDb().query({language: language});
  var current = {};
  var segments = [];
  
  // Assemble the DB records into an array so we can pass it around.
  while (results.hasNext()) {
    current = results.next();
    segments[current.index] = current.data;
  }
  
  var names = [];
  var segment = '';
  
  for (var i = 0; i < numNames; i++) {
    var name = '';
    
    // For each requested name, pick one segment, making
    // sure it overlaps with the previous two letters.    
    for (var j = 1; j < segments.length; j++) {
      segment = randomSegment(segments[j], name);
      name = name.slice(0, name.length - 2);
      name += segment;
      
      // If the segment wasn't full length (end of a name), done!
      if (segment.length < 3) {
        break;
      }
    }
    
    names.push(name);
  }
  
  return names;
}
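
To see why the index matters, here's a tiny stand-alone illustration that uses plain objects in place of ScriptDb records (the records array and its contents are made up for the example):

```javascript
// Stand-ins for ScriptDb records, deliberately out of order.
var records = [
  {index: 2, data: 'segment column 2'},
  {index: 0, data: 'real names'},
  {index: 1, data: 'segment column 1'}
];

var byIndex = [];
var byPush = [];
for (var i = 0; i < records.length; i++) {
  byIndex[records[i].index] = records[i].data;  // Lands in its proper slot.
  byPush.push(records[i].data);                 // Keeps the arrival order.
}

console.log(byIndex);  // ['real names', 'segment column 1', 'segment column 2']
console.log(byPush);   // ['segment column 2', 'real names', 'segment column 1']
```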

I haven’t explained the randomSegment() function, but you can probably guess at how it works based on the description above. Still, if you want to dig in further, you can view the full source code here.
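
For the curious, here's one way a function like randomSegment() could work. It's a guess based on the description above, not the code from the full source: it assumes each pivot table arrives as rows of [segment, count] with no header or total rows, and it adds a hypothetical third parameter, rand, so the random choice can be swapped out for testing.

```javascript
// Hypothetical sketch of randomSegment(); the real one is in the full source.
// Assumes `table` is an array of [segment, count] rows with no header or
// total rows, and that `rand` (an added parameter) returns a number in [0, 1).
function randomSegment(table, name, rand) {
  rand = rand || Math.random;
  var overlap = name.slice(-2);  // Last two letters ('' for the first pick).

  // Keep only the segments that begin with the overlapping letters.
  var candidates = [];
  var total = 0;
  for (var i = 0; i < table.length; i++) {
    var segment = String(table[i][0]);
    var count = Number(table[i][1]);
    if (count > 0 && segment.indexOf(overlap) === 0) {
      candidates.push([segment, count]);
      total += count;
    }
  }

  // Nothing fits: return the overlap so the caller leaves the name unchanged.
  if (candidates.length === 0) {
    return overlap;
  }

  // Weighted pick: walk the counts until the random target is used up.
  var target = rand() * total;
  for (var j = 0; j < candidates.length; j++) {
    target -= candidates[j][1];
    if (target < 0) {
      return candidates[j][0];
    }
  }
  return candidates[candidates.length - 1][0];
}

var table = [['ali', 3], ['alo', 1], ['ber', 5]];
console.log(randomSegment(table, 'Cal', function() { return 0; }));    // ali
console.log(randomSegment(table, 'Cal', function() { return 0.9; }));  // alo
```

Because "ali" has a count of 3 and "alo" a count of 1, a name ending in "al" continues with "ali" about three times out of four, which is the weighting toward common segments described earlier.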

5. The only remaining step is to expose the results to the world through a web app. Apps Script provides several ways of doing this; I used HTML Service, but didn’t require any of the advanced features. Here’s how little HTML we need to turn NameGen into a functional app:

<html>
  <body>
    <script>
      function sendRequest() {
        var language = document.getElementById('language').value;
        var numNames = document.getElementById('numNames').value;
        google.script.run.withSuccessHandler(updateField).
            generateNames(language, numNames);
      }
    
      function updateField(names) {
        var output = "";
        for (var i = 0; i < names.length; i++) {
          output += (names[i] + '<br/>');
        }
        document.getElementById('resultsBox').innerHTML = output;
      }
    </script>
    <select id="language">
      <option value="americanFemale">American Females</option>
      <option value="americanMale">American Males</option>
      <option value="american">American Towns</option>
      <option value="british">British Towns</option>
      <option value="french">French Towns</option>
      <option value="irish">Irish Towns</option>
      <option value="italian">Italian Towns</option>
      <option value="spanish">Spanish Towns</option>
    </select>
    <select id="numNames">
      <option value=1>1</option>
      <option value=10 selected>10</option>
      <option value=100>100</option>
      <option value=1000>1000</option>
    </select>
    <input id="generateButton" type="button"
     value="Generate" onclick="sendRequest()">
    <div id="resultsBox">
    </div>
  </body>
</html>
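
One piece the HTML doesn't show is the server-side function that serves the page. A minimal sketch, assuming the HTML above is saved in the script project as a file named index (the file name is an assumption):

```javascript
// Minimal sketch: serve the page above when someone opens the web app.
// 'index' is an assumed name for the HTML file in the script project.
function doGet() {
  return HtmlService.createHtmlOutputFromFile('index');
}
```

Apps Script calls doGet() whenever the web app's URL is requested, and the page's google.script.run call then reaches generateNames() on the server.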

And presto chango, you have a web app that harnesses the tremendous power of Google’s infrastructure … to produce names that could only ever exist in a parallel universe. It’s like Adriano Celentano’s "Prisencolinensinainciusol" — a convincing rendition of an American pop song, unless you actually speak English, in which case it’s total gibberish.




Dan Lazin

Dan is a technical writer on the Developer Relations team for Google Apps Script. Before joining Google, he worked as a video-game designer and newspaper reporter. He has bicycled through 17 countries.

2 comments:

  1. It sounds a lot like Markov chaining, one of my favorite algorithms. I came up with some similar scripts exploring why it is so hard for most people to come up with new words. I could run it against any word list, generate similar names that weren't in that list. I could also control the "look-ahead" levels to end up with even more reasonable words.

  2. Yep, it's a Markov chain -- well-identified! I should have named it in the post.
