In our last post, we talked about how we use semantic tagging to sift through vast amounts of news and alternative media in order to surface relevant content for Alacra Pulse users.
An inherent challenge in semantic technology is accurately matching the variations of a name. Whether we are looking for a person or a company, understanding the myriad ways a name can appear is not a trivial task for a computer.
Some name variations are easy. When searching for people, one can create or obtain lists of common nicknames: Bill or Billy for William, and Peggy or Maggie for Margaret. But name variations can be much more complex. In certain countries, surnames precede given names; some people go by nicknames unrelated to their given names; and news articles may contain typos or simply misprint someone’s name.
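To make the person-name problem concrete, here is a minimal Python sketch. It is an illustration only, not the matcher Alacra Pulse actually uses. It assumes a small hand-built nickname table (the `NICKNAMES` dictionary and both function names are hypothetical) and treats a name as an unordered set of tokens so that surname-first orderings still match. As a fallback for typos, it uses the standard library's `difflib.SequenceMatcher` similarity ratio with an arbitrarily chosen threshold of 0.85.

```python
from difflib import SequenceMatcher

# Hypothetical nickname table: variant -> canonical given name.
NICKNAMES = {
    "bill": "william",
    "billy": "william",
    "peggy": "margaret",
    "maggie": "margaret",
}


def canonical_tokens(name: str) -> frozenset[str]:
    """Lowercase a name, map known nicknames to their canonical given name,
    and return an unordered token set so "Gates, William" equals "William Gates"."""
    tokens = name.lower().replace(",", " ").split()
    return frozenset(NICKNAMES.get(t, t) for t in tokens)


def names_match(a: str, b: str, threshold: float = 0.85) -> bool:
    """True if two names are equal after canonicalization, or close enough
    character-wise to be a likely typo or misprint."""
    ta, tb = canonical_tokens(a), canonical_tokens(b)
    if ta == tb:
        return True
    # Sort tokens so word order does not affect the similarity ratio.
    sa, sb = " ".join(sorted(ta)), " ".join(sorted(tb))
    return SequenceMatcher(None, sa, sb).ratio() >= threshold


print(names_match("Bill Gates", "William Gates"))            # True
print(names_match("Gates, William", "William Gates"))        # True
print(names_match("Margaret Thatcher", "Magaret Thatcher"))  # True
print(names_match("Bill Gates", "Melinda Gates"))            # False
```

Nickname tables and fuzzy matching cover the easy cases. Nicknames that are not tied to a person's given name cannot be derived this way, however. They need a curated list of aliases for each person.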
With companies, it can be even more challenging. Companies are frequently referred to by their acronyms (IBM or AIG), by their tickers (GOOG or MSFT), by their brands (iPhone or Prius) or by familiar names (“Marks & Sparks” for Marks & Spencer). It’s also critical to understand corporate family relationships. When an article talks about the Wall Street Journal or MySpace, we need to “understand” that it is really about News Corp.
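Some acronyms can be guessed from the company name itself. The sketch below is an illustration under stated assumptions: the stopword and legal-suffix lists (`STOPWORDS`, `LEGAL_SUFFIXES`) and both function names are hypothetical. It builds a candidate acronym from the initials of the remaining words, then looks for a case-sensitive, whole-word occurrence of that acronym in a piece of text.

```python
import re

# Illustrative word lists; a production system would need far more care.
STOPWORDS = {"of", "and", "the"}
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "plc", "llc"}


def candidate_acronym(company_name: str) -> str:
    """Build an acronym from the initials of a name's significant words."""
    words = re.findall(r"[A-Za-z]+", company_name)
    return "".join(
        w[0].upper()
        for w in words
        if w.lower() not in STOPWORDS and w.lower() not in LEGAL_SUFFIXES
    )


def mentions_acronym(text: str, company_name: str) -> bool:
    """Case-sensitive whole-word search for the derived acronym.

    Requiring at least two letters and exact case keeps ordinary
    lowercase words from being mistaken for acronyms."""
    acronym = candidate_acronym(company_name)
    if len(acronym) < 2:
        return False
    return re.search(rf"\b{re.escape(acronym)}\b", text) is not None


print(candidate_acronym("International Business Machines Corp."))  # IBM
print(candidate_acronym("American International Group, Inc."))     # AIG
print(mentions_acronym("Shares of IBM rose today.",
                       "International Business Machines Corp."))  # True
```

This heuristic handles IBM and AIG, but it cannot produce tickers such as GOOG, brands such as Prius, or familiar names such as “Marks & Sparks”. It also yields “MS” rather than the common “M&S” for Marks & Spencer, and unrelated companies can share the same initials. Cases like these require a curated mapping of names and identifiers rather than rules.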
When tagging content for Alacra Pulse, our first step is to identify the companies a story is about. Whether a story refers to the iPhone, AAPL or Apple, we need to tag it accurately as being about Apple.
To ensure high levels of accuracy in our tagging, we rely on the Alacra Concordance database. The Concordance, which serves as the information management backbone for all Alacra products, is our master database of companies and identifiers. Tracking more than 400,000 companies, it houses both public and proprietary identifiers, product and brand names, names of key executives and common nicknames for the companies. So when an analyst forecasts iPhone sales or comments on the declining user base at MySpace, we know to attribute those stories to Apple and News Corp, respectively.
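Conceptually, this kind of lookup amounts to an alias index plus a parent pointer for each company. The toy in-memory model below is a sketch under that assumption only. The `Company` and `Concordance` classes, their fields, the `tag_company` helper and the sample rows are hypothetical and do not reflect the actual Concordance schema. A mention is resolved case-insensitively to a company id, and the corporate family tree is then walked up to the top-level parent. The sample rows are simplified: MySpace is linked directly to News Corp, and the Wall Street Journal is modeled as a brand of Dow Jones & Company.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Company:
    """One row in a toy company master table (hypothetical schema)."""

    company_id: str
    name: str
    parent_id: Optional[str] = None
    # Tickers, brands, nicknames: every surface form that should map here.
    aliases: set[str] = field(default_factory=set)


class Concordance:
    """In-memory stand-in for a company master database."""

    def __init__(self, companies: list[Company]) -> None:
        self.by_id = {c.company_id: c for c in companies}
        # Index every alias and the official name, case-folded, to its company id.
        self.by_alias: dict[str, str] = {}
        for c in companies:
            for alias in c.aliases | {c.name}:
                self.by_alias[alias.casefold()] = c.company_id

    def resolve(self, mention: str) -> Optional[str]:
        """Map a surface form to a company id, or None if it is unknown."""
        return self.by_alias.get(mention.casefold())

    def ultimate_parent(self, company_id: str) -> str:
        """Walk up the corporate family tree; the seen-set guards against cycles."""
        seen: set[str] = set()
        current = self.by_id[company_id]
        while current.parent_id and current.company_id not in seen:
            seen.add(current.company_id)
            current = self.by_id[current.parent_id]
        return current.company_id


def tag_company(concordance: Concordance, mention: str) -> Optional[str]:
    """Return the top-level company a mention should be attributed to."""
    company_id = concordance.resolve(mention)
    return None if company_id is None else concordance.ultimate_parent(company_id)


# Illustrative sample rows, simplified for the example.
concordance = Concordance([
    Company("apple", "Apple Inc.", aliases={"Apple", "AAPL", "iPhone"}),
    Company("newscorp", "News Corp"),
    Company("dowjones", "Dow Jones & Company", parent_id="newscorp",
            aliases={"Wall Street Journal"}),
    Company("myspace", "MySpace", parent_id="newscorp"),
])

print(tag_company(concordance, "iPhone"))               # apple
print(tag_company(concordance, "Wall Street Journal"))  # newscorp
print(tag_company(concordance, "MySpace"))              # newscorp
```

Returning only the ultimate parent is one design choice among several. A real tagger would likely record both the entity that was mentioned and its parent, so that a story about MySpace could surface for users following either one.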
Understanding name variations, corporate family trees and company-product relationships is a critical step in finding relevant nuggets from the web. And it’s one of the key building blocks we use to generate quality results in Alacra Pulse.