CSV file is one of most common used format we encounter. Let see few technique to efficiently import these data to python for better & efficient use.
CSV is also known as comma separated format. you can easily write a code to import CSV data .
But today we will see some of very efficient way to do so.

Python provide few package like CSV, unicodecsv, pandas Thease packages help you to do your job.

CSV : package follow basic approach & slow , it reads data line by line & split.
unicodecsv : Import data & create a list of dictionaries each data is associated with its attribute as key. So it coud be very usefull in many cases but this process is also slow
pandas : This technique is very fast as compare to above two methods.

You can download & test in your computer using Ipython notebook [Github_link]

Example

In [2]:

import unicodecsv
import pprint
import csv

In [3]:

csv_file_name="enrollments.csv"

Json Import of csv file¶

Import data in json formate, All data are imported as string¶

In [16]:

enrolment = []

f=open(csv_file_name,'rb')

reader =unicodecsv.DictReader(f)
#reader is a itterater so loop is possible only once
print "type(reader) =",reader

#for each_row in reader:
#    enrolment.append(each_row)
enrolment=list(reader) #shorthand for above two line

#close file
f.close()

print "Total no of row : ",len(enrolment),"\n\n"

#print demo data
pprint.pprint(enrolment[1])

type(reader) = <unicodecsv.py2.DictReader instance at 0x04A10A08>
Total no of row :  1640 


{u'account_key': u'448',
 u'cancel_date': u'2014-11-10',
 u'days_to_cancel': u'5',
 u'is_canceled': u'True',
 u'is_udacity': u'True',
 u'join_date': u'2014-11-05',
 u'status': u'canceled'}

## Simple import

Import data Row by row

In [23]:

data = []
with open(csv_file_name, 'rb') as csvfile:
    reader = csv.reader(csvfile, delimiter=' ', quotechar='|')
    for row in reader:
        data.append(row)

print "Headings :",(data[0])
print"Data :",(data[1])

Headings : ['account_key,status,join_date,cancel_date,days_to_cancel,is_udacity,is_canceled']
Data : ['448,canceled,2014-11-10,2015-01-14,65,True,True']

Using Pandas ( Very fast than above two)¶

In [4]:

import pandas as pd

In [7]:

enrol = pd.read_csv(csv_file_name)

In [10]:

enrol

Out[10]:

	account_key	status	join_date	cancel_date	days_to_cancel	is_udacity	is_canceled
0	448	canceled	2014-11-10	2015-01-14	65	True	True
1	448	canceled	2014-11-05	2014-11-10	5	True	True
2	448	canceled	2015-01-27	2015-01-27	0	True	True
3	448	canceled	2014-11-10	2014-11-10	0	True	True
4	448	current	2015-03-10	NaN	NaN	True	False
5	448	canceled	2015-01-14	2015-01-27	13	True	True
6	448	canceled	2015-01-27	2015-03-10	42	True	True
7	448	canceled	2015-01-27	2015-01-27	0	True	True
8	448	canceled	2015-01-27	2015-01-27	0	True	True
9	700	canceled	2014-11-10	2014-11-16	6	False	True
10	429	canceled	2014-11-10	2015-03-10	120	False	True
11	429	canceled	2015-03-10	2015-06-17	99	False	True
12	60	canceled	2014-11-10	2015-01-14	65	False	True
13	60	canceled	2015-01-14	2015-04-01	77	False	True
14	60	current	2015-04-01	NaN	NaN	False	False
15	1300	canceled	2014-11-10	2014-11-16	6	False	True
16	369	current	2014-11-10	NaN	NaN	False	False

In [11]:

len(enrol["account_key"].unique()) #unique account keys

Out[11]:

_Beginner 2 Computer Science

Friday 29 January 2016

Importing CSV File in Python

Example

Json Import of csv file¶

Import data in json formate, All data are imported as string¶

## Simple import

Import data Row by row

Using Pandas ( Very fast than above two)¶

No comments:

Post a Comment

Google adsense regulation