“AI” and “machine learning” encompass a number of different technologies. Here is how the NoNLP representation compares with them:
Compared with Legacy Vector Methods: Legacy distributed vector representations treat text as a “bag of words” and therefore lose the meaning carried by document and sentence structure, especially negation. Because only a limited amount of information from the original text survives in the vector, machine learning built on such vectors can achieve only limited accuracy. The NoNLP representation preserves that “lost” information, leading to highly accurate machine-learned predictive models.
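The limitation described above can be seen in a minimal sketch. A bag-of-words representation keeps only word counts, so two sentences with the same words in a different order, even ones stating opposite facts, map to the identical vector. The sentences below are illustrative examples, not from any real corpus:

```python
from collections import Counter

def bag_of_words(text):
    # Lowercase and split into tokens; counting tokens discards
    # word order and sentence structure entirely.
    return Counter(text.lower().split())

a = bag_of_words("the drug helped, not the placebo")
b = bag_of_words("the placebo helped, not the drug")

# Opposite findings, identical bags: no downstream model can
# distinguish them from these vectors alone.
print(a == b)  # True
```

Any classifier trained on such counts is blind to the distinction, which is the information loss the paragraph above refers to.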
Compared with Rules-based NLP: The most commonly deployed technology for automatic inference from unstructured text is rules-based natural language processing, in which humans write explicit rules that teach the computer to parse and interpret the input text. Rules-based NLP systems are very brittle: they perform well in domains where the rules are followed, but poorly on anything outside the pre-specified rules. Consider a system whose rules all refer to a “doctor” that is then presented with text about a “surgeon” or a “physician.” A rule could certainly be written to overcome this particular synonym problem, but that points to another weakness: rules-based NLP technology does not scale well as problem complexity grows, and negation is harder still to handle with human-coded rules. As a result, a typical medical coding NLP system carries over 500,000 rules that must be maintained by an expensive staff of humans, resulting in both high cost and slow response to queries.
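The synonym failure described above is easy to demonstrate. The rule below is a hypothetical, deliberately minimal example of the kind of pattern a rules-based system might use; real systems chain thousands of such patterns, but the failure mode is the same:

```python
import re

# Hypothetical hand-written rule: flag notes in which a doctor
# recommends something. (Illustrative only, not a real system's rule.)
RULE = re.compile(r"\bdoctor\b.*\brecommend", re.IGNORECASE)

notes = [
    "The doctor recommends surgery.",
    "The surgeon recommends surgery.",    # synonym: rule misses it
    "The physician recommends surgery.",  # synonym: rule misses it
]

matches = [bool(RULE.search(n)) for n in notes]
print(matches)  # [True, False, False]
```

Patching this rule for “surgeon” and “physician” is trivial; doing so for every synonym, abbreviation, and negation across a whole vocabulary is what drives rule counts into the hundreds of thousands.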
Compared with Statistical NLP: Statistical methods, like vector methods before the NoNLP representation, have limited applicability when the text is varied and nuanced. No two doctors, for instance, enter notes the same way or use the same abbreviations and shorthand, which severely limits the use of these methods on healthcare records.
Compared with Deep Learning and Neural Networks: Deep Learning models perform exceptionally well in some cases. However, the training algorithms are not guaranteed to converge on a solution, and designing the model (e.g., how many layers, how many neurons per layer, and how they connect) requires a great deal of expertise. Deep Learning also requires a very large set of training data and the compute power to process it, which limits its ability to handle certain scaled-out problems, such as multi-label classification.