Ben Teusch wrote last week about how data scientists' scripts can be caught out by things like changing data structures. As I commented, we’ve all been there, but working with people who have a strong computer science background - for me, our development team - can reduce how often it happens. As a data scientist I’ve learnt to appreciate the rigour, though at times I was frustrated at how much longer it took to move something to production than it took to do the initial work.
One thing some of our larger clients with always-on or repeatable surveys have long been pressing us for is an API that lets them run new data through the models we’ve developed for them (it’s coming soon, folks!). The challenge has been that our process previously took hours to run, especially on the larger, multilingual datasets. We’ve been tackling this by rewriting the pipeline to run much more in parallel, using a cluster of about 100 machines. This week I was told that we’d reduced the time required to process a large dataset from several hours to several minutes.
I’m not sure how large a People Analytics team needs to be to warrant hiring ML engineers or developers as well as statisticians and data scientists. What I do know is that every data scientist could benefit from the rigour that good developers bring.