Tagging is at the heart of Alacra Pulse: it's the technology that sifts through hundreds of thousands of stories and blog posts, delivering only the relevant content to our users. In defining the requirements for Alacra Pulse, we needed a technology that:

  • Is adaptable, letting us move from a concept like M&A to another like bankruptcies
  • Is scalable, so we can deliver tagged news stories in minutes, not hours, and provide the service to clients on a cost-effective basis
  • Can “understand” complex concepts like events and not just return matches to keywords
  • Is highly accurate; 60-70% accuracy may be sufficient for a search engine but would not meet the needs of our markets

A simple search engine can't "understand" a concept like an analyst comment on a company, or distinguish a new M&A rumor from a story about a deal that took place years earlier. Because of the vagaries of language, search engines cannot easily tell the difference between Company A buying Company B and Company A buying a product from Company B.

The technology best suited for this type of task is semantic tagging. This approach parses sentences and infers the meaning of terms from the context in which they are used. Yet even the best semantic taggers typically achieve only 70-80% accuracy, which we knew would be insufficient for this type of product.

Our solution to this challenge has three components:

1.       We start with a state-of-the-art semantic tagging engine, which we use to identify both entities (such as companies or people) and events (such as M&A transactions). The semantic tagger splits each document into sentences, identifies the parts of speech (nouns, verbs, etc.), and then matches them to known entities. Once the entities are identified, it detects relevant events based on rules we have defined.

2.       Next, we add what we consider our "secret sauce": Alacra's knowledge base, which includes specific information about companies, people, deals and more. If the tagger sees the word "Apple" and needs to know whether it refers to Apple, Inc., it draws on the extensive information in the knowledge base to make that determination. For example, we know that Apple's ticker is AAPL; its CEO is Steve Jobs; it is headquartered in Cupertino, CA; it makes iPhones, iPods, MacBooks and more; it is in the digital media, computer and electronics industries; its partners include AT&T, Rogers, Telus, Bell, Orange and Vodafone; and its competitors include Microsoft, Google, Sony, Palm, RIM and others. Using all of this information, the tagger can assess whether a given mention refers to Apple, Inc.

Beyond basic company information, the Alacra knowledge base covers analysts, their firms and their coverage. We also maintain a proprietary database of M&A deals, so we can accurately determine whether a story concerns a real, current deal.

3.       Finally, we rely on human review of the results. Technology is critical because it allows the product to scale, but on its own it reaches only about 80-85% accuracy. That may be sufficient for some products, but when we're pushing alerts out to users, we believe the bar is higher. So we have a 24-hour team of editors who review the tagged events to ensure they are accurate.
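The three stages can be illustrated with a short Python sketch. This is a toy model under stated assumptions, not Alacra's engine: the knowledge base is a hypothetical in-memory list, the `min_clues=2` threshold is arbitrary, the sentence splitter is a naive regular expression, and the M&A rule is a single hypothetical verb pattern. Every tagged event lands in a review queue, mirroring the editor step.

```python
import re
from dataclasses import dataclass


@dataclass
class Entity:
    """Hypothetical knowledge-base record for one company."""
    name: str
    aliases: frozenset  # lower-case surface forms, e.g. "apple"
    clues: frozenset    # context terms: ticker, CEO, HQ, products, partners, rivals


# Toy knowledge base (illustrative only; clues are a subset of the Apple facts above).
KNOWLEDGE_BASE = [
    Entity(
        name="Apple, Inc.",
        aliases=frozenset({"apple"}),
        clues=frozenset({"aapl", "jobs", "cupertino", "iphone", "ipod", "macbook",
                         "at&t", "vodafone", "microsoft", "google", "sony", "palm", "rim"}),
    ),
]

# Hypothetical event rule: an acquisition verb in the same sentence as a resolved company.
ACQUISITION_VERB = re.compile(r"\b(acquires?|acquired|buys|bought|to buy|merger)\b",
                              re.IGNORECASE)


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lower-case word tokens; '&' is kept so names like 'at&t' survive."""
    return re.findall(r"[a-z0-9&]+", text.lower())


def resolve_entities(text, kb=KNOWLEDGE_BASE, min_clues=2):
    """Keep an entity only if an alias appears AND at least `min_clues`
    of its knowledge-base clues also appear somewhere in the document."""
    words = set(tokenize(text))
    return [e for e in kb
            if e.aliases & words and len(e.clues & words) >= min_clues]


def detect_events(text, kb=KNOWLEDGE_BASE):
    """Tag candidate M&A events; every candidate is queued for editor review."""
    entities = resolve_entities(text, kb)
    review_queue = []
    for sentence in split_sentences(text):
        words = set(tokenize(sentence))
        for entity in entities:
            if entity.aliases & words and ACQUISITION_VERB.search(sentence):
                review_queue.append({"entity": entity.name, "event": "M&A",
                                     "sentence": sentence, "status": "pending_review"})
    return review_queue


story = ("Apple bought a small chip designer, the Cupertino company said. "
         "The deal should help future iPhone models.")
print(detect_events(story))  # one candidate M&A event, for the first sentence
print(detect_events("The farmer bought apple seedlings at the market."))  # []
```

The verb rule on its own would also fire on a sentence like "Apple bought chips from a supplier", which is exactly the product-purchase case a simple pattern cannot separate from an acquisition. In the approach described above, the knowledge base (including the M&A deal database) and the editors are what filter such cases out.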

Together, semantic tagging, the Alacra knowledge base and skilled editors provide the highest level of accuracy. Recent tests show this approach yields precision in the 92-95% range, enabling Pulse users to consistently see results that are relevant, timely and accurate.
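Precision is the share of tagged items that turn out to be correct: true positives divided by everything the system tagged. A minimal sketch, using hypothetical audit counts rather than Alacra's actual test data:

```python
def precision(true_positives, false_positives):
    """Fraction of tagged items that were actually relevant: TP / (TP + FP)."""
    tagged = true_positives + false_positives
    if tagged == 0:
        raise ValueError("precision is undefined when nothing was tagged")
    return true_positives / tagged


# Hypothetical audit: 1,000 delivered tags, 935 judged correct by reviewers.
print(f"{precision(935, 65):.1%}")  # 93.5%
```

Precision on its own does not measure how many relevant stories were missed; that is recall, a separate figure.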