Why data science?

If you’ve followed my blog for a while, you’ll have seen that I’m fascinated by how data analytics has changed the way people do their jobs. One of the sectors that has benefited from the adoption of analytics is the one I work in – the reputational due diligence sector. Reputational due diligence is the process of examining the character and integrity of an entity, usually a company or an individual. It is an important part of the wider risk mitigation framework that companies usually undertake before entering into a deal. Against a backdrop of tightening regulatory scrutiny, demand for quick yet trustworthy reputational due diligence methodologies has been steadily increasing.

I have witnessed firsthand the positive impact of analytics in the reputational due diligence sector. I used to work at a traditional due diligence company that performed all of its work manually. The company’s analysts, myself included, had to comb through stacks of media articles, litigation files and corporate records to uncover potential red flags – a laborious and costly process. For the past year, I have been working at Datarama, a company that combines human analysis and analytics to perform due diligence research. Our team has developed analytic models that enable quick identification of conflicts of interest, political connections and hidden beneficiaries, thus significantly accelerating the overall due diligence analysis.

As a Senior Analyst at Datarama, my main responsibility is performing due diligence research on Southeast Asian subjects. Although my work so far has been mostly non-technical, I have had the opportunity to work closely with Datarama’s engineers and contribute to the improvement of the company’s automated due diligence platform. One project that I was involved in was a machine learning project that applied natural language processing to quantitatively assess the public perception of a company. Specifically, we used Python’s NLTK package to analyze media articles about the subject company – first categorizing them by topic (topic segmentation) and then analyzing how their tone evolved over time (sentiment analysis). This project exposed me to the entire pipeline of a machine learning project, from data collection and preprocessing to model building and deployment.
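To give a flavor of the two steps described above, here is a minimal sketch. It is not the model we built: it uses only the Python standard library instead of NLTK, swaps in simple keyword matching for real topic segmentation, and scores tone with a tiny hand-made word list. The word lists, function names and sample articles are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy word lists (assumptions for illustration only). A real
# project would rely on a trained model or a curated lexicon instead.
TOPIC_KEYWORDS = {
    "legal": {"lawsuit", "court", "fraud", "investigation"},
    "financial": {"profit", "revenue", "loss", "earnings"},
}
POSITIVE = {"profit", "growth", "award", "cleared"}
NEGATIVE = {"lawsuit", "fraud", "loss", "investigation"}


def tokenize(text):
    """Lowercase the text and strip surrounding punctuation from each word."""
    return [w.strip(".,;:!?\"'()").lower() for w in text.split()]


def assign_topic(tokens):
    """Pick the topic with the most keyword hits, or 'other' if none match."""
    counts = {
        topic: sum(t in keywords for t in tokens)
        for topic, keywords in TOPIC_KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else "other"


def tone_score(tokens):
    """Return a score in [-1, 1]: (positive - negative) / (positive + negative)."""
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    total = pos + neg
    return 0.0 if total == 0 else (pos - neg) / total


def tone_by_topic_and_month(articles):
    """Average the tone score for each (topic, month) pair."""
    buckets = defaultdict(list)
    for month, text in articles:
        tokens = tokenize(text)
        buckets[(assign_topic(tokens), month)].append(tone_score(tokens))
    return {key: mean(scores) for key, scores in sorted(buckets.items())}


# Made-up sample articles as (month, text) pairs.
articles = [
    ("2018-01", "Regulator opens fraud investigation into the company."),
    ("2018-01", "Company reports record profit and revenue growth."),
    ("2018-02", "Court dismisses lawsuit; company cleared."),
]
result = tone_by_topic_and_month(articles)
for (topic, month), score in result.items():
    print(topic, month, score)
```

Grouping the scores by topic and month is what lets you see tone evolve over time: in this toy data, the "legal" coverage moves from clearly negative in January to neutral in February.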

Unfortunately, in the end, our company decided not to commercially deploy this model. Nonetheless, the project allowed me to appreciate the transformative power of analytics and stirred in me a curiosity about the field of data science. I then started toying with the idea of becoming a data scientist myself. Thus began my “exploratory” phase, in which I spent a good month trying to understand the field of data science by reading blogs and articles and attending local meetups. In May 2018, I decided to get my feet wet and begin studying data science in earnest. I enrolled in two programming courses: MIT’s 6.0001, which was Python-focused, and GaTech’s ISYE60501, which was R-focused. After surviving these two courses, I took Andrew Ng’s Machine Learning course, a seminal course that is arguably a rite of passage for aspiring data scientists.

Although I really enjoyed these courses, they definitely weren’t an easy ride. It had been almost five years since I completed my master’s degree in chemistry, which was the last time I coded and actively used college-level mathematics. Unfortunately, those five years were enough to wipe out most of my memory of these topics. I needed to re-learn the basics – performing matrix multiplication, understanding various probability distributions, constructing Python classes, and so on. Despite the challenges and the lack of sleep they induced, I am proud to say that I completed all three courses.

After finishing these courses, I came to the realization that data science could be my calling. I took a hard look at myself and found that I possess many of the requisite skills (logical thinking and quantitative reasoning) and personality traits (curiosity, creativity and tenacity) for a successful career in this field. I then began to chart a career path towards becoming a data scientist. Pivoting one’s career is a big decision, and when faced with any major decision, I believe that thorough preparation is imperative. I scoured the internet for recommendations on how I should move forward. One thing that kept coming up in the sources I consulted was that a graduate degree in computer science or statistics is an important prerequisite to landing a data science job. A master’s degree did seem to be a natural progression in my effort to become a data scientist, so I decided to pursue one. In my next post, I will detail my experience in selecting, applying to and eventually getting admitted to a data science master’s program.
