Sunday, August 10, 2014

Science to Data Science, Pt 1

I’ve traveled to London for a 5-week program on data science techniques. The program is called S2DS (Science to Data Science), and so far it has been really great. There are about 80 participants, divided into small (~4-person) teams. Each team partners with a company for the month to complete a project that is relevant to the business and teaches the participants valuable data science techniques. I do have some open questions regarding intellectual property, but I’ll gloss over that for now . . .

We’ve just completed the first week, and we’ve covered a lot of material. It is moving pretty fast, and I feel like I’ve been here a while already, even though I still haven’t met all of the participants. The lectures have been rather superficial. One simply cannot give a substantial introduction to any of these topics in the short time frames we have. You can’t learn Python in 2 hours, and the same goes for Hadoop, SQL, and machine learning. That said, the introductory lectures are valuable. I am still trying to grasp all the concepts, but hopefully I will get a much more detailed experience as the program progresses.

What excites me most about this program is expanding my skill set. I’ve always been somewhat uncomfortable with my level of competency in programming and various technical tasks. I’ve previously written about trying to learn more Python, including through the Udacity online coursework. This workshop will be a real chance to dive into some new things that are applicable both in and outside of astronomy.

So far we have had lectures on good coding practices, Hadoop, an introduction to databases, SQL, NoSQL, statistics, R, natural language processing, and machine learning techniques, plus some more business-oriented seminars on economics and marketing. Meanwhile, we’ve also met with our project mentors to outline the scope of our group work over the next few weeks. I am most excited to dive deeper into the machine learning techniques, because I feel they are the most useful across a wide variety of potential future jobs.

For my team’s project, we’ll be doing a lot of machine learning and trying some predictive analysis. The data set we are getting access to is proprietary, so I’m not allowed to share information about it. Generally, though, we’ll be working with customer data to predict the purchasing habits and/or demographics of customers. It is kind of crazy to think about the amount of information people are willingly (or unwittingly) handing over to various companies. Regardless, I am looking forward to learning the techniques used in the project and potentially applying them in academia or other industries.

Kyle D Hiner
Data scientist?
