In Apple’s latest Machine Learning Journal entry, the Siri Speech Recognition Team shares an overview of the work behind improving Siri’s understanding of the names of regional points-of-interest by incorporating the user’s location.

Drawing in part on data from the U.S. Census Bureau, Apple has been able to tune Siri to better understand users based on where they are and which POIs they’re most likely to ask about.

Apple says machine learning on its own has helped improve automatic speech recognition for general language over the years, but “recognizing named entities, like small local businesses” has proved a performance bottleneck.

That’s done partly by relying on data collected by the U.S. Census Bureau: Apple divides the country into the bureau’s Combined Statistical Areas (CSAs), groups of adjacent metropolitan areas with close social and economic ties, and builds region-specific language models for each of them.
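As a rough sketch of that idea (not Apple’s implementation), the Swift snippet below routes a request to a region-specific language model keyed by CSA identifier and falls back to a general model when the user’s location is unknown or outside any covered region; the type names and the integer CSA key are hypothetical.

```swift
// Illustrative sketch only: pick a region-specific language model by CSA,
// falling back to a general model. Type names and keys are hypothetical.
struct GeoLanguageModel {
    let name: String
    let csaID: Int?   // nil marks the general, location-agnostic model
}

struct LanguageModelSelector {
    let generalModel: GeoLanguageModel
    let regionalModels: [Int: GeoLanguageModel]   // keyed by CSA identifier

    /// Returns the model for the user's CSA, or the general model when the
    /// location is unknown or falls outside every covered region.
    func model(forCSA csaID: Int?) -> GeoLanguageModel {
        guard let id = csaID, let regional = regionalModels[id] else {
            return generalModel
        }
        return regional
    }
}

// Usage: CSA 500 is a made-up identifier for this example.
let selector = LanguageModelSelector(
    generalModel: GeoLanguageModel(name: "general-en-US", csaID: nil),
    regionalModels: [500: GeoLanguageModel(name: "geo-en-US-500", csaID: 500)]
)
let chosen = selector.model(forCSA: 500)    // geo-en-US-500
let fallback = selector.model(forCSA: nil)  // general-en-US
```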

The entry goes on to detail the mechanics of identifying the correct points-of-interest from a user’s speech based on their location. The Siri Speech Recognition Team says the approach is language-independent, too, so it can be applied to locales beyond U.S. English.

To efficiently search the CSA for a user, we store a latitude and longitude lookup table derived from the rasterized cartographic boundary (or shapefile) provided by the U.S. Census Bureau [2]. At runtime, the complexity of geolocation lookup is O(1).
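The snippet below sketches one way that constant-time lookup could work, assuming the CSA boundary shapefiles have been rasterized offline into a fixed-resolution latitude/longitude grid whose cells store a CSA identifier; the 0.1-degree cell size, the GridKey type, and the sample coordinates are illustrative assumptions rather than details from Apple’s entry.

```swift
// Illustrative sketch only: an O(1) CSA lookup over a precomputed grid.
// The grid resolution and contents here are hypothetical.
struct CSALookupTable {
    struct GridKey: Hashable {
        let latIndex: Int
        let lonIndex: Int
    }

    let cellDegrees: Double        // grid resolution in degrees
    let cells: [GridKey: Int]      // grid cell -> CSA identifier

    /// Constant-time lookup: quantize the coordinate to a grid cell and
    /// read the precomputed CSA identifier, if any.
    func csaID(latitude: Double, longitude: Double) -> Int? {
        let key = GridKey(
            latIndex: Int((latitude / cellDegrees).rounded(.down)),
            lonIndex: Int((longitude / cellDegrees).rounded(.down))
        )
        return cells[key]
    }
}

// Usage: a single cell covering part of the Seattle area, mapped to a
// made-up CSA identifier (500).
let table = CSALookupTable(
    cellDegrees: 0.1,
    cells: [CSALookupTable.GridKey(latIndex: 476, lonIndex: -1224): 500]
)
let csa = table.csaID(latitude: 47.61, longitude: -122.33)   // Optional(500)
```

Quantizing the coordinate to a cell and reading a single hash-table entry is what keeps the runtime lookup O(1), at the cost of precomputing and storing the grid.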

Read the full entry on Apple’s Machine Learning Journal for a behind-the-scenes look at the expertise that goes into improving Siri.