SciComp

Beware of the Python

There is a huge thing missing from Paul Wilmott’s Big Book of Banking. Data. The dirty secret of much quant work is that sucking data out of databases, Bloomberg, Reuters, spreadsheets and files stands between you and what you need to build to actually make some money. (Yes, people are making money; it’s just you who isn’t.) This can often be done with a little C++ or VBA, but the data source frequently has all sorts of garbage in it: data you don’t need, and data that is simply wrong or even corrupted. So the code to deal with it is going to be more complex than a simple read. Finding the good data and filtering out the bad may require a lot of business logic.
A “bad” price may be a function of what the prices have been recently, or it may be an error such as when bond yields (typically 5-10%) are given rather than the price (often around 100). Going from one to the other isn’t that hard, but if you’ve dealt with data you know that the source will sometimes just change for no reason. Corporate actions like stock splits will need to be lined up with the time series of the price, and dates may be in DD/MM/YYYY form, or MM/DD/YYYY, or DD/JAN/YY, or the number of milliseconds since Nassim Taleb was born. And yes, a given source will sometimes change format with no warning, causing all sorts of fun. Sometimes the data will get rejected by the thing you want to put it in, and finding the error is helped by some degree of automation.
This is best handled by a scripting language like Python, Perl or Ruby, though I have seen UNIX shell scripts based upon awk, sed and grep. This work is critical to your business unit, but it’s a dangerous area to get too deeply into. It’s very hard to be “excellent” at pumping data around, but as the examples above show, it’s easy to be the person blamed for it going wrong. You are always playing catch up.
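To give a flavour of the business logic described above, here is a minimal Python sketch of three such checks: parsing dates when the feed’s day/month order is known, guessing whether a bond quote is a yield or a price, and flagging a price that is out of line with recent history. The function names, thresholds and formats are all illustrative assumptions rather than anything from a real feed, and month names like JAN are parsed according to the machine’s locale (English is assumed here).

```python
from datetime import datetime
from statistics import median


def parse_feed_date(text, day_first):
    """Parse a date string; the caller declares the feed's day/month order.

    "03/04/2021" is ambiguous on its own, so the order is NOT guessed here.
    """
    text = text.strip()
    numeric = "%d/%m/%Y" if day_first else "%m/%d/%Y"
    for fmt in (numeric, "%d/%b/%y"):  # second format covers DD/JAN/YY
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognised date: {text!r}")


def classify_bond_quote(value, yield_max=25.0, price_min=40.0, price_max=200.0):
    """Guess whether a quote is a yield (in %) or a price (per 100 face).

    The thresholds are assumptions for illustration; tune them per market.
    """
    if 0.0 < value <= yield_max:
        return "yield"
    if price_min <= value <= price_max:
        return "price"
    return "suspect"


def is_suspect_price(price, recent, max_move=0.2):
    """Flag a price more than max_move (as a fraction) from the recent median."""
    if not recent:
        return False  # nothing to compare against yet
    ref = median(recent)
    return abs(price - ref) > max_move * abs(ref)
```

A quote of 4.8 arriving after a run of prices near 100 would be flagged by `is_suspect_price`, and `classify_bond_quote` would suggest that a yield has been sent in place of a price. Anything that fails to parse or is flagged should be logged and quarantined rather than silently dropped, since that log is what you will be asked for when it goes wrong.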
It’s also work that is never finished. You are the one they call to fix and upgrade it, and it is far from unknown for this to become your main work, sometimes all of it. The importance of the work gives your boss a strong incentive to keep you doing it, regardless of previous commitments to develop you at the firm. The fact that the toolset of third-party software and languages is separate from “real” quant work just makes this worse. It’s harder for others to take over your work, and many managers will see it as “trivial”. It is of course not very portable to other business units either. Search for jobs that say “Quant with data importing experience wanted” and see how far you get.
It’s nice to be useful, but this is not work that attracts the best bonuses, nor will you learn as much about your business. It will not just slow down your progress: since this is basically IT work, there is a real danger that you will be sucked into the IT department, taking a big hit on your bonus and crippling your career progression.
You can’t easily avoid doing some of this, but you need to watch the percentage of your time spent on it. If it seems to be trending up, then it is unfortunately necessary to sit your boss down and talk it through. If the trend still goes up, you really do need to leave. Not today, but you probably have to leave before you find that you cannot leave.