How to create a Pandas data frame in Python


A Pandas data frame is a data object that stores tabular data. It works like a spreadsheet in Microsoft Excel: each row represents a sample, and each column holds one kind of information about that sample. Data frames are widely used for reading and storing labeled data because they carry two indexes, one along the rows and one along the columns, that store the label information. In addition, different columns of a data frame can hold different data types.

The easiest way to generate a data frame is usually to call the Pandas DataFrame() function with a dictionary: the keys become the column labels and the values become the column values of the data frame.

#Import Pandas and Numpy module
import pandas as pd
import numpy as np
#create a dictionary
dict1 = {'name' : ['wilson', 'shirley', 'mico', 'mia', 'miaomiao'],
        'age' : [32, 31, 8, 3, 13],
        'gender' : ['male', 'female', 'male', 'male', 'male']}
#create a data frame by inputting dictionary
df1 = pd.DataFrame(dict1)
df1
#output
       name  age  gender
0    wilson   32    male
1   shirley   31  female
2      mico    8    male
3       mia    3    male
4  miaomiao   13    male
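
The points made earlier about the two indexes and about per-column types can be checked directly on a data frame like the one above. The following is a minimal sketch; the variable names (`people`, `df`, `age_is_integer`, `name_is_numeric`) are illustrative and not part of the original example, and the exact type reported for text columns depends on the installed Pandas version.

```python
import pandas as pd

# Same data as in the example above (variable names are illustrative)
people = {'name': ['wilson', 'shirley', 'mico', 'mia', 'miaomiao'],
          'age': [32, 31, 8, 3, 13],
          'gender': ['male', 'female', 'male', 'male', 'male']}
df = pd.DataFrame(people)

# A data frame carries two sets of labels: rows (index) and columns
print(df.index)    # automatically generated row labels 0..4
print(df.columns)  # column labels taken from the dictionary keys

# shape is (number of rows, number of columns)
print(df.shape)    # (5, 3)

# Each column has its own data type
print(df.dtypes)
age_is_integer = pd.api.types.is_integer_dtype(df['age'])
name_is_numeric = pd.api.types.is_numeric_dtype(df['name'])
print(age_is_integer, name_is_numeric)
```

Checking types through `pd.api.types` rather than comparing against a fixed type name keeps the check independent of how a given Pandas version stores text columns.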

You do not have to pass every column of the dictionary to DataFrame() when creating a data frame. Use the ‘columns’ option to select only the key-value pairs you want.

#select key-value pairs from a dictionary, to create a data frame
df2 = pd.DataFrame(dict1, columns=['name', 'age'])
df2
#output
       name  age
0    wilson   32
1   shirley   31
2      mico    8
3       mia    3
4  miaomiao   13
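
Two further behaviors of the ‘columns’ option are worth knowing: it also sets the column order, and a label that is not a key of the dictionary produces a column filled with missing values rather than an error. The sketch below illustrates both; the label 'city' and the variable names are hypothetical and chosen only for this illustration.

```python
import pandas as pd

# Same data as in the examples above (variable names are illustrative)
people = {'name': ['wilson', 'shirley', 'mico', 'mia', 'miaomiao'],
          'age': [32, 31, 8, 3, 13],
          'gender': ['male', 'female', 'male', 'male', 'male']}

# The order given in columns= is the order of the resulting columns
df_reordered = pd.DataFrame(people, columns=['age', 'name'])
print(df_reordered)

# 'city' is a hypothetical label that is not a key of the dictionary,
# so the resulting column contains only missing values (NaN)
df_missing = pd.DataFrame(people, columns=['name', 'city'])
print(df_missing)

city_all_missing = df_missing['city'].isna().all()
print(city_all_missing)
```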

In the previous examples, Pandas automatically added integer index labels for the rows when creating the data frame. You can instead define the row labels manually with the ‘index’ option of the DataFrame() function.

#manually define row label index, by setting option 'index'
df3 = pd.DataFrame(dict1, index=['p1', 'p2', 'p3', 'p4', 'p5'])
df3
#output
        name  age  gender
p1    wilson   32    male
p2   shirley   31  female
p3      mico    8    male
p4       mia    3    male
p5  miaomiao   13    male
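
Once row labels are defined, they can be used to look up rows with `.loc`. The number of labels also has to match the number of rows in the data. The following sketch shows both points under those assumptions; the variable names are illustrative and not part of the original example.

```python
import pandas as pd

# Same data as in the examples above (variable names are illustrative)
people = {'name': ['wilson', 'shirley', 'mico', 'mia', 'miaomiao'],
          'age': [32, 31, 8, 3, 13],
          'gender': ['male', 'female', 'male', 'male', 'male']}
labels = ['p1', 'p2', 'p3', 'p4', 'p5']
df = pd.DataFrame(people, index=labels)

# Look up a whole row, or a single cell, by row label
row_p3 = df.loc['p3']
mico_age = df.loc['p3', 'age']
print(row_p3)
print(mico_age)  # 8

# Passing fewer labels than rows is rejected with a ValueError
try:
    pd.DataFrame(people, index=['p1', 'p2'])
    mismatch_rejected = False
except ValueError as err:
    mismatch_rejected = True
    print('Could not build data frame:', err)
```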

In many cases, a data frame is generated from a NumPy array. Both the ‘index’ and ‘columns’ options can be set if necessary.

#generating a data frame by inputting a Numpy array
#setting index and columns labels manually
df4 = pd.DataFrame(np.arange(12).reshape((4,3)),
                   index=['p1', 'p2', 'p3', 'p4'],
                   columns=['Person', 'Age', 'Sex'])
df4
#output
      Person  Age  Sex
p1       0    1    2
p2       3    4    5
p3       6    7    8
p4       9   10   11
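
Unlike a dictionary of lists, a NumPy array has a single data type, so every column of the resulting data frame shares that type. The sketch below checks this and converts the data frame back to an array with `to_numpy()`; the variable names and the column labels 'a', 'b', 'c' are illustrative only.

```python
import numpy as np
import pandas as pd

# A 4x3 array of the integers 0..11, as in the example above
values = np.arange(12).reshape((4, 3))

# The number of row labels must equal the number of array rows (4),
# and the number of column labels the number of array columns (3)
df = pd.DataFrame(values,
                  index=['p1', 'p2', 'p3', 'p4'],
                  columns=['a', 'b', 'c'])

# All columns inherit the one data type of the array
single_dtype = df.dtypes.nunique() == 1
print(df.dtypes)

# to_numpy() returns the values as a NumPy array again
round_trip = df.to_numpy()
print(np.array_equal(round_trip, values))
```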


Published by wilsonzhang746 on August 22, 2024

