Anybody who attended either of my recent presentations at People Analytics World will know that I’m increasingly of the opinion that in our area of focus - text analytics - the really important part of the solution is the curation of a comprehensive and, most importantly, diverse training data set. Curation is the right word, and the most important one, because to me at least it’s a skilled, careful and detailed process in which each item (in our case, an example of a theme) is chosen because it adds more meaning to the overall set than its own example alone.
In the first article this week the author (Marianne Bellotti) states that perfect data shouldn’t be the holy grail. In one part she focuses on why we use data in decision making:
For that reason, the process of making a decision is less about an objective analysis of data and more about an active negotiation between stakeholders with different tolerances for risk and priorities. Data is used not for the insight it might offer but as a shield to protect stakeholders from fallout.
Does this mean that our efforts - which we can show provide more accurate classification of data - are pointless? No, I don’t think it does. The key is to think of the ‘AI’ not as making decisions but as providing insight for discussion. For us, some of the most important data errors come from the context of what is not in the data, not what is in it. It’s why we tend to have a conversation with our clients after performing the analysis, and why our new reporting is designed to provoke those discussions.