Friday, July 11, 2014

Python Pt 1

Over the past couple of months, I have been trying to gain more experience with the Python programming language. I do not have a very strong programming background. I took a C/C++ course as an undergraduate, but at the graduate level I have had to teach myself IDL. I never took a statistics course or a computational methods course, so I feel quite behind when it comes to scientific computing and data science. To rectify this, and to make myself a stronger candidate for positions outside academia, I am trying to develop my skill set.

Python is a completely free programming language that anyone with the time and desire can learn. It is available for nearly all operating systems (Linux/Unix, OS X, Windows), and there are plenty of freely available tools to help one learn it. I have decided to work through The Python Tutorial, and I am using version 2.7.5. This is not the most recent release of Python, but I don’t really want to dig around in my computer to see whether I have a more recent release somewhere.

After spending some time with the Python tutorial, I learned about many other free online resources for Python. I decided to take a look at the Udacity course Intro to Computer Science, which teaches the basics of programming using Python. I was already familiar with a lot of the basic logical constructs, like loops and conditionals, from my previous programming experience (the C/C++ class and a working knowledge of IDL), but the course was a good introduction to Python's syntax and showed off some of its features.

The main objective of the course is to build a web search engine, which I found very interesting. Many good search engines already exist, so everyone knows what one is supposed to do. That makes the task of building one very accessible to a general audience, if a bit less practical. Probably no one is going to replace Google with a homemade search engine, but it is nice to know that one could, given sufficient resources, and without an incredible amount of sophistication.

Building the search engine was broken down into parts. The first part is a web crawler that searches web pages for links to other web pages. The second is an index that maps keywords to the pages on which they appear. Finally, to make the engine practical, one needs a ranking system that returns pages in an order that, hopefully, matches the user’s intentions. The ranking system implemented in the Intro to Computer Science course is called PageRank, and is the ranking system that Larry Page used in the early days of Google. It basically ranks a page based on the number of other useful pages that link to it, as a kind of “popularity” measure of web pages.
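
To make those three parts concrete, here is a minimal sketch of how they might fit together. This is my own toy illustration, not the code from the course: instead of fetching real pages, it crawls a small made-up "web" stored in a dictionary, and every name in it (TOY_WEB, crawl, compute_ranks, search, and the example URLs) is hypothetical. The damping factor and number of iterations are assumed values, and the ranking step is a simplified PageRank that expects every crawled page to have at least one outgoing link.

```python
# Hypothetical toy "web": each URL maps to (page text, outgoing links).
TOY_WEB = {
    "a.example": ("python tutorial basics", ["b.example", "c.example"]),
    "b.example": ("python search engine", ["c.example"]),
    "c.example": ("ranking pages", ["a.example"]),
    "d.example": ("unreachable page", ["a.example"]),
}


def crawl(seed, web):
    """Follow links starting from seed; return (index, graph)."""
    to_visit = [seed]
    crawled = set()
    index = {}  # keyword -> list of URLs containing that keyword
    graph = {}  # URL -> list of URLs it links to
    while to_visit:
        url = to_visit.pop()
        if url in crawled or url not in web:
            continue
        text, links = web[url]
        crawled.add(url)
        graph[url] = links
        for word in text.split():
            urls = index.setdefault(word, [])
            if url not in urls:
                urls.append(url)
        to_visit.extend(links)
    return index, graph


def compute_ranks(graph, damping=0.8, iterations=30):
    """Simplified PageRank: a page's rank is fed by the ranks of pages linking to it."""
    n = len(graph)
    ranks = {url: 1.0 / n for url in graph}
    for _ in range(iterations):
        new_ranks = {}
        for url in graph:
            rank = (1 - damping) / n
            for other, links in graph.items():
                if url in links:
                    # Each page splits its rank evenly among its outgoing links.
                    rank += damping * ranks[other] / len(links)
            new_ranks[url] = rank
        ranks = new_ranks
    return ranks


def search(index, ranks, keyword):
    """Return the pages containing keyword, highest rank first."""
    pages = index.get(keyword, [])
    return sorted(pages, key=lambda url: ranks[url], reverse=True)


index, graph = crawl("a.example", TOY_WEB)
ranks = compute_ranks(graph)
print(search(index, ranks, "python"))
```

In this toy web, searching for "python" lists a.example ahead of b.example, because a.example receives all of c.example's rank while b.example receives only half of a.example's. The page d.example never shows up, since nothing reachable from the seed links to it.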

It took me about two months to complete the free-access version of the Intro to Computer Science course on Udacity. I found the exercise quite useful, and I’d like to move to using Python as my primary language for research when I can. That is a big move, and it opens another can of worms that I won’t get into, mainly because I don’t have experience doing it yet.

I enjoyed the course so much that I started working on another Udacity course, Intro to Data Science, which gives an overview of some other very interesting tools, including the pandas library for Python, SQL, and working with APIs. I’ve only worked through the first two lessons so far, but I am really enjoying it. I will write another post on the topic after completing more of the course.

Kyle D Hiner, Ph.D.
