Many of you will know that OrganizationView specialise in the analysis of employee-sourced text. We have always used a human-in-the-loop approach, and over time we’ve come to realise that the biggest performance improvements come from carefully curating training data.
With any ML model you need both a large volume and a wide variety of examples. What matters most are the edge cases: the rare and often ambiguous sentences that your ML models struggle with. Having large numbers of similar ‘easy’ cases can even reduce the effectiveness of your models by crowding out more valuable examples.
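To make the idea concrete, here is a minimal sketch of one common way to surface edge cases: rank sentences by the margin between the model’s top two predicted labels, keep every low-margin (ambiguous) sentence, and cap the number of ‘easy’ ones. This is an illustration under stated assumptions, not a description of our own tooling. The function name, the thresholds, the labels and the example scores are all hypothetical.

```python
def select_for_review(scored, margin_threshold=0.2, max_easy=2):
    """Split scored sentences into ambiguous cases and a capped set of easy ones.

    `scored` is a list of (sentence, {label: probability}) pairs from any
    classifier. All names and default values here are illustrative assumptions.
    """
    ambiguous, easy = [], []
    for text, probs in scored:
        top_two = sorted(probs.values(), reverse=True)[:2]
        # Margin between the best and second-best label; small means ambiguous.
        margin = top_two[0] - (top_two[1] if len(top_two) > 1 else 0.0)
        bucket = ambiguous if margin < margin_threshold else easy
        bucket.append((margin, text))
    ambiguous.sort()  # most ambiguous sentences first
    # Keep every ambiguous sentence, but only the first few easy ones.
    return [t for _, t in ambiguous], [t for _, t in easy[:max_easy]]


# Made-up scores, purely for illustration.
scored = [
    ("The pay is fine", {"pay": 0.95, "workload": 0.05}),
    ("My manager never has time for the team",
     {"management": 0.48, "workload": 0.45, "pay": 0.07}),
    ("Great salary", {"pay": 0.97, "workload": 0.03}),
    ("Good pay packet", {"pay": 0.93, "workload": 0.07}),
    ("It is a lot", {"workload": 0.40, "pay": 0.35, "management": 0.25}),
]

ambiguous, easy_sample = select_for_review(scored)
print(ambiguous)
print(easy_sample)
```

In a human-in-the-loop workflow the ambiguous list would go to a reviewer for labelling, while the cap on easy cases stops near-duplicates from dominating the training set.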
I’m absolutely delighted to be able to speak to my local NLP group next week about how we tackle this challenge. At OrganizationView we’ve built, and continue to build, our own tooling to enable us to turn around even the most difficult client projects in a few hours.
As we approach the milestone of 100 million rows of feedback analysed, we’ve learnt a huge amount about how to industrialise these analyses.
This talk is going to be ‘For the Analysts’. I’m hoping to share some of the many techniques that we’ve found to work.