When they proved too brittle and expensive to scale for large buildings, stone and masonry were supplanted by steel and concrete. Natural Language Processing (NLP) faces the same challenges.
This blog talks a lot about systems architecture, which lives in a software world. Physical architecture, built of real materials, nonetheless provides an analogy. Natural Language Processing dates back to the earliest days of computing[1], so it’s analogous to the earliest building materials: stones and masonry.
Aimee Maree (@aimee_maree) tweeted:
programmers are the stone masons of the modern world lets not kid ourselves, and the many of us have sat for many years just building, toiling away at our works, perfecting our art, carving a new world for the better or worse depending who’s glass your looking through
And, like those ancient building materials, NLP is useful, but labor-intensive and brittle. For buildings to rise to new heights, new building materials were required. The same is true of software systems.
NLP: Bricks and Stones
Natural Language Processing falls into two main categories: rules-based and statistical. Rules-based systems were more popular initially. They’re conceptually simple: write rules to interpret text, then collect them into one system. They’re the masonry – thousands to hundreds of thousands of human-created elements leaning on each other to build a system.
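To make the "bricks" concrete, here is a toy sketch of a rules-based extractor. The rules and labels below are hypothetical, not taken from any real product; the point is that every pattern is a hand-laid brick, and a production system needs thousands of them.

```python
import re

# Each rule is a hand-written pattern paired with a label.
# A realistic system would have thousands to hundreds of thousands of these.
RULES = [
    (re.compile(r"\b\d+\s*mg\b", re.I), "DOSAGE"),
    (re.compile(r"\bhypertension\b", re.I), "DIAGNOSIS"),
    (re.compile(r"\baspirin\b", re.I), "MEDICATION"),
]

def apply_rules(text):
    """Run every rule over the text and collect labeled matches."""
    findings = []
    for pattern, label in RULES:
        for match in pattern.finditer(text):
            findings.append((label, match.group(0)))
    return findings

print(apply_rules("Patient with hypertension given aspirin 81 mg."))
```

Each new phrasing in the input ("HTN" for hypertension, "81mg" with no space) demands another rule, which is exactly the maintenance burden described below.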
But in the 1980s, we saw the emergence of statistical methods – Bayesian, Hidden Markov Models, etc. – which took advantage of the growing availability of both computational power and data. These models don’t “understand” text the way a human does – in some ways, they’re as dumb as rocks – but they do achieve results automatically that decades of rule-writing did not.
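As a minimal illustration of the statistical approach, here is a Naive Bayes sentiment classifier written from scratch (a hypothetical toy, not any particular production model). Nobody writes rules; the model just counts words in labeled examples.

```python
import math
from collections import Counter

def train(examples):
    """examples: list of (list_of_words, label). Returns count tables."""
    word_counts = {}          # label -> Counter of words seen with that label
    label_counts = Counter()  # label -> number of training examples
    for words, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(words)
    return word_counts, label_counts

def classify(words, word_counts, label_counts):
    """Pick the label maximizing log P(label) + sum of log P(word|label)."""
    vocab = len({w for counts in word_counts.values() for w in counts})
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        n = sum(counts.values())
        score = math.log(label_counts[label] / total)
        for w in words:
            # Laplace smoothing: unseen words don't zero out the probability.
            score += math.log((counts[w] + 1) / (n + vocab))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

examples = [
    ("great movie loved it".split(), "pos"),
    ("terrible plot hated it".split(), "neg"),
    ("loved the acting".split(), "pos"),
    ("hated every minute".split(), "neg"),
]
wc, lc = train(examples)
print(classify("loved this movie".split(), wc, lc))  # → pos
```

The model has never been told what "loved" means – it is, as the text says, as dumb as rocks – yet more labeled data automatically improves it, with no rule-writing at all.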
Masonry and stones can be used to build grand structures – Notre Dame, the Taj Mahal, the pyramids in Egypt and Mexico. But stone and masonry are heavy. Bigger buildings require thicker walls to support the weight, but thicker walls add weight of their own, which requires still thicker walls. That cycle limits the size of masonry buildings, just as ever more material – more rules or more training data – must be provided to grow an NLP system. Even with great technical leaps – flying buttresses or HMMs – the limits of diminishing returns remain.
This is compounded by the brittleness of these materials. Stone and masonry perform very well, but cracks can spread rapidly with catastrophic results… in both cathedral towers and NLP systems. In the latter, for instance, accuracy measures are carefully manipulated to look only at areas where rules have been written, hiding the cracks in the system’s handling of rare, long-tail events.
Further, clay must be dug, shaped into bricks, fired, shipped, and then lifted into place. That’s a lot of labor, and that’s precisely what we see with rules-based NLP systems. For instance, one computer-assisted (medical) coding NLP system we are familiar with has well over 500,000 rules… 500,000 bricks that had to be put into place by hand. Maintenance? Well, it’s like replacing a brick in Il Duomo, hoping it does not collapse in the process!
ML: Steel and Concrete
Then steel came along. Strong in both compression and tension, steel enabled entirely new types of structures and solved some previously insoluble problems. Elegant suspension bridges gracefully spanned rivers; skyscrapers grew taller and taller. And innovation in building forms was unleashed. Steel was the right material for these new structures.
So what is the ‘steel’ in text processing? Many propose that some variant of Machine Learning is the new material. And certainly, the success of Deep Learning in some text-based problems is laudable. Tremendous strides in machine translation – a problem which bedeviled NLP for decades – have been made in less than a decade by Deep Learning.
Unfortunately, Deep Learning is not as adaptable as steel… nor is it as generally applicable as many want to believe. One need look no further than the struggles of IBM Watson Oncology[2] to see that it has limits. Perhaps there is no steel for text processing.
But that does not mean that some problems… some types of buildings… won’t benefit from other types of Machine Learning. The Romans built the Pantheon from unreinforced concrete. As the technology advanced, enormous structures like the Hoover Dam could be built using steel-reinforced concrete.
We believe text representation with distributed, fixed-length vectors is analogous to concrete. Just as concrete “competes” with traditional masonry and stone, vector representations now compete wherever traditional NLP technologies would be deployed. Further, early text representations were a “bag of words” with no semantic structure to reinforce them… a bit like the unreinforced concrete in the Pantheon.
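A few lines of code show why a bag of words is "unreinforced": the fixed-length vector discards all structure. This is a generic sketch of the classic technique, not a description of any particular product's representation.

```python
# Bag of words: map a sentence to a fixed-length vector of word counts.
# Word order -- the sentence's structure -- is thrown away entirely.
def bag_of_words(sentence, vocab):
    words = sentence.lower().split()
    return [words.count(term) for term in vocab]

vocab = ["man", "bites", "dog"]
v1 = bag_of_words("man bites dog", vocab)
v2 = bag_of_words("dog bites man", vocab)
print(v1, v2, v1 == v2)  # identical vectors: the structure is lost
```

Two sentences with opposite meanings produce the same vector. A representation that folds the text’s structure back into the vector is the "reinforcement" discussed next.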
NoNLP™ representation has that reinforcement as it incorporates the text’s structure into the vectors. We therefore expect that it will be better than the rocks and masonry of NLP in multiple applications. We have shown that new buildings, and new building types, are possible with this improved material. Concrete and steel have surpassed stone and brick.
[1] Like this 1957 classic from Noam Chomsky: https://onlinelibrary.wiley.com/doi/abs/10.1002/asi.5090080406
[2] See IBM Has a Watson Dilemma (WSJ). It’s not clear whether Watson Oncology is based on Deep Learning or an old-style rules-based decision engine; some of ‘Watson’ is based on Deep Learning. Its use here is thus illustrative of the (marketing) feeling that Deep Learning is the solution to all problems… that it’s steel.