Friday 29 January 2016

Basics of Numpy & Pandas

NumPy and pandas are very popular packages for data analysis. This post gives you a quick guide to the basics of both. We assume you are familiar with vector operations.

NumPy is mostly written in C. Pandas is built on top of NumPy and also uses C for its core implementation, so both are fast compared to the general-purpose data structures that ship with Python.

A NumPy array is similar to a Python list, but all of its elements share the same data type and it is much faster.
We can also treat a NumPy array as a vector.
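A minimal sketch of how a list and an array behave differently, assuming NumPy is installed (written for Python 3; the variable names and values are illustrative only):

```python
import numpy as np

py_list = [1, 2, 3]
np_array = np.array([1, 2, 3])

# "+" concatenates lists, but adds NumPy arrays element by element.
list_sum = py_list + py_list
array_sum = np_array + np_array

# "* 2" repeats a list, but scales every element of a NumPy array.
list_times_2 = py_list * 2
array_times_2 = np_array * 2

# Mixed numeric input is coerced to one common data type.
mixed = np.array([1, 2.5, 3])

print(list_sum)
print(array_sum)
print(mixed.dtype)
```

The element-wise behaviour is what lets us treat an array as a vector.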

Pandas uses a Series instead of a list or NumPy array, but we can think of a Series as a NumPy array with some advanced functionality.
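As a rough illustration, the one-dimensional pandas structure (called a Series) can be sketched like this, assuming pandas and NumPy are installed. The labels and values are made-up sample data:

```python
import numpy as np
import pandas as pd

# A Series wraps a one-dimensional array and adds an index of labels.
# The labels and values below are made-up sample data.
s = pd.Series([3, 4, 2], index=["a", "b", "c"])

doubled = s * 2        # vectorized arithmetic, as with a NumPy array
value_b = s["b"]       # lookup by label instead of by position
as_array = s.values    # the underlying NumPy array

print(doubled)
print(s.describe())    # summary statistics in one call
```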


NumPy uses arrays whereas pandas uses Series
In [2]:
import numpy as np

Arrays are similar to Python lists, but all elements must be of the same data type, and arrays are faster than lists.

In [13]:
num = np.array([3,4,2,5,7,23,56,23,7,23,89,43,676,43])
array([  3,   4,   2,   5,   7,  23,  56,  23,   7,  23,  89,  43, 676,  43])

Let's see some of the functionality.
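A few commonly used array operations, sketched on the same `num` array as above (written for Python 3; the choice of operations is only a sample, not a complete tour):

```python
import numpy as np

num = np.array([3, 4, 2, 5, 7, 23, 56, 23, 7, 23, 89, 43, 676, 43])

total = num.sum()            # sum of all elements
average = num.mean()         # arithmetic mean
largest = num.max()          # largest value
largest_at = num.argmax()    # position of the largest value
above_40 = num[num > 40]     # boolean mask keeps matching elements only

print(total, average, largest, largest_at)
print(above_40)
```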

Importing CSV File in Python

CSV is one of the most commonly used formats we encounter. Let's look at a few techniques for importing such data into Python efficiently.
CSV stands for comma-separated values. You could easily write your own code to import CSV data,
but today we will look at some more efficient ways to do so.

Python offers a few packages for this job, such as csv, unicodecsv, and pandas.

  • csv : Follows the basic approach and is slow; it reads the data line by line and splits each line.
  • unicodecsv : Imports the data and creates a list of dictionaries, where each value is stored under its attribute (column) name as the key. This can be very useful in many cases, but the process is also slow.
  • pandas : Very fast compared to the two methods above.
You can download the IPython notebook and test it on your computer [Github_link]
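The three approaches can be sketched side by side as follows. This is a minimal illustration written for Python 3, where the standard csv module already handles Unicode, so it stands in for unicodecsv (which mirrors the csv module's interface). The sample data is made up, and `io.StringIO` is used in place of a real file:

```python
import csv
import io

import pandas as pd

# Made-up sample data standing in for a real CSV file.
CSV_TEXT = "name,score\nasha,10\nravi,20\n"

# 1) csv.reader: each row is a list of strings.
rows = list(csv.reader(io.StringIO(CSV_TEXT)))

# 2) csv.DictReader: each row is a dict keyed by the header names.
records = list(csv.DictReader(io.StringIO(CSV_TEXT)))

# 3) pandas.read_csv: parses the whole input into a DataFrame
#    and infers column types (score becomes an integer column).
frame = pd.read_csv(io.StringIO(CSV_TEXT))

print(rows[1])
print(records[0]["score"])
print(frame["score"].sum())
```

Note that only pandas converts the numbers for you; the other two leave every value as a string.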


In [2]:
import unicodecsv
import pprint
import csv
In [3]:

JSON-style import of a CSV file

Import the data in a JSON-like format (a list of dictionaries). All values are imported as strings.

In [16]:
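enrolment = []

# NOTE: hypothetical file name; the original post does not show which file was opened
f = open('your_file.csv', 'rb')  # unicodecsv expects a file opened in binary mode
reader = unicodecsv.DictReader(f)
# reader is an iterator, so it can be looped over only once
print "type(reader) =", reader

# for each_row in reader:
#     enrolment.append(each_row)
enrolment = list(reader)  # shorthand for the two lines above

# close file
f.close()

print "Total no of row : ", len(enrolment), "\n\n"

# print demo data (assumed here to be the first row)
pprint.pprint(enrolment[0])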
type(reader) = <unicodecsv.py2.DictReader instance at 0x04A10A08>
Total no of row :  1640 

{u'account_key': u'448',
 u'cancel_date': u'2014-11-10',
 u'days_to_cancel': u'5',
 u'is_canceled': u'True',
 u'is_udacity': u'True',
 u'join_date': u'2014-11-05',
 u'status': u'canceled'}
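Since every value in the output above is a string, a typical next step is to convert the fields to proper types. The following is only a sketch written for Python 3: the row is rebuilt in memory from the output above, the column order is an assumption, and `parse_date` and `parse_int` are hypothetical helper names:

```python
import csv
import io
from datetime import datetime

# One row taken from the output above, rebuilt as in-memory CSV text.
# The column order is an assumption.
CSV_TEXT = (
    "account_key,status,join_date,cancel_date,"
    "days_to_cancel,is_udacity,is_canceled\n"
    "448,canceled,2014-11-05,2014-11-10,5,True,True\n"
)


def parse_date(text):
    """Turn 'YYYY-MM-DD' into a date; empty strings become None."""
    return datetime.strptime(text, "%Y-%m-%d").date() if text else None


def parse_int(text):
    """Turn a digit string into an int; empty strings become None."""
    return int(text) if text else None


raw = next(csv.DictReader(io.StringIO(CSV_TEXT)))

typed = dict(raw)  # copy, so the raw strings stay untouched
typed["join_date"] = parse_date(raw["join_date"])
typed["cancel_date"] = parse_date(raw["cancel_date"])
typed["days_to_cancel"] = parse_int(raw["days_to_cancel"])
typed["is_udacity"] = raw["is_udacity"] == "True"
typed["is_canceled"] = raw["is_canceled"] == "True"

print(type(raw["days_to_cancel"]), type(typed["days_to_cancel"]))
```

Treating empty strings as None is a design choice made here so that rows without a cancel date do not raise an error.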

## Simple import

Wednesday 20 January 2016

Pattern Search using Regular Expression in Python

In daily work we run into many pattern-search problems, such as finding all mobile numbers or email addresses in a given web page or file.

Writing manual code for this is inefficient and gets messy quickly. Regular expressions are a very popular technique for pattern search (the lexers of many compilers and interpreters are built on them); they are easy to use and efficient.
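As a small illustration, email addresses can be pulled out of a piece of text with Python's built-in re module. The pattern below is deliberately simple and is only an assumption about what a typical address looks like; it does not cover every valid email format. The sample text is made up:

```python
import re

# Deliberately simple pattern: local part, "@", domain, ".", top-level domain.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Made-up sample text standing in for the contents of a web page.
page_text = (
    "Contact admin@example.com or sales.team@example.co.in for help. "
    "Not an email: user@localhost"
)

emails = EMAIL_PATTERN.findall(page_text)
print(emails)
```

The last candidate is skipped because its domain has no dot followed by a top-level domain.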

I suggest you write code that extracts all email addresses from a web page without using regular expressions and test it.
Email addresses found on a web page may follow patterns such as
