Creating data frames in R using data.frame()

Data frames are among the most commonly used data structures in R programming. In a vector, matrix, or array, every element must share the same data mode; a data frame, by contrast, can store columns of different modes or types in one object. For example, a data frame of family information can hold numeric (e.g. age, income), character (e.g. name), and logical (working or not working) data. Data frames in R work much like a spreadsheet in Microsoft Excel: each row represents one observation or subject, and each column represents one variable or attribute.
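
As a minimal sketch of the family example above, the snippet below builds a small data frame whose columns have different types. The column names and values are made up for illustration. sapply() with class reports each column's type, and as.matrix() shows what happens when the same data is forced into a single-mode structure.

```r
# Hypothetical family data: one row per person, one column per variable
family <- data.frame(
  name    = c("Ann", "Bob", "Cleo"),   # character
  age     = c(42, 45, 12),             # numeric
  income  = c(52000, 61000, 0),        # numeric
  working = c(TRUE, TRUE, FALSE),      # logical
  stringsAsFactors = FALSE             # keep name as character on any R version
)

# Each column keeps its own type:
# name "character", age "numeric", income "numeric", working "logical"
sapply(family, class)

# A matrix needs one mode, so every value is coerced to character
mode(as.matrix(family))
```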

1. Creating a data frame using the data.frame() function

A data frame can be created simply with the data.frame() function when the vectors that will form its columns already exist in the current R session. The following code creates a personal health data frame with data.frame() after the vector for each column has been defined.


# Creating a data frame using data.frame()
# two numeric vectors
> ID <- c(3, 4, 5, 6)
> age <- c(15, 14, 18, 12)
# a character vector (nominal values)
> blood <- c("Type5", "Type6", "Type5", "Type6")
# a character vector (ordinal values, stored as plain strings)
> status <- c("Poor", "Improved", "Excellent", "Poor")
> pdata <- data.frame(ID, age, blood, status)
> pdata
> pdata
  ID age blood    status
1  3  15 Type5      Poor
2  4  14 Type6  Improved
3  5  18 Type5 Excellent
4  6  12 Type6      Poor
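
The column vectors do not have to exist beforehand: data.frame() also accepts name = value pairs, which define and name each column in one call. Because the comment above describes status as ordinal, the sketch below also stores it as an ordered factor using factor(..., ordered = TRUE). The level order Poor < Improved < Excellent is an assumption about what the categories mean, and pdata2 is just an illustrative name.

```r
# Build the same data directly from name = value pairs
pdata2 <- data.frame(
  ID     = c(3, 4, 5, 6),
  age    = c(15, 14, 18, 12),
  blood  = c("Type5", "Type6", "Type5", "Type6"),
  # Assumed ranking of the ordinal status categories
  status = factor(c("Poor", "Improved", "Excellent", "Poor"),
                  levels = c("Poor", "Improved", "Excellent"),
                  ordered = TRUE),
  stringsAsFactors = FALSE  # keeps blood as character; status stays a factor
)

str(pdata2)

# Ordered factors support comparisons: FALSE TRUE TRUE FALSE
pdata2$status > "Poor"
```

Storing ordinal data as an ordered factor rather than plain strings lets R sort and compare the categories by their meaning instead of alphabetically.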

2. Specifying elements of a data frame

Elements of a data frame can be selected in several ways. You can use row and column subscripts to identify elements of a data frame, much as you would with a matrix. You can also index columns by their names to return parts of a data frame, and the $ operator extracts a single column by name.

# Identifying elements of a data frame
# first and second variables of the data frame
> pdata[1:2]
  ID age
1  3  15
2  4  14
3  5  18
4  6  12
# the age variable of the data frame
> pdata$age   
[1] 15 14 18 12
# the "blood" and "status" variables of the data frame
> pdata[c("blood", "status")] 
  blood    status
1 Type5      Poor
2 Type6  Improved
3 Type5 Excellent
4 Type6      Poor
# show the structure of the data frame
> str(pdata)   
'data.frame': 4 obs. of  4 variables:
 $ ID    : num  3 4 5 6
 $ age   : num  15 14 18 12
 $ blood : chr  "Type5" "Type6" "Type5" "Type6"
 $ status: chr  "Poor" "Improved" "Excellent" "Poor"
#select 2nd variable of all observations
> pdata[,2]    
[1] 15 14 18 12
#select all variables of 1st observation
> pdata[1,]
  ID age blood status
1  3  15 Type5   Poor
# use the $ operator to extract a single column by name
> pdata$blood
[1] "Type5" "Type6" "Type5" "Type6"
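
One detail worth keeping in mind is what each indexing style returns. The short sketch below rebuilds pdata so it runs on its own. It shows that single brackets with one column index keep a data frame, while [, j], $ and [[ ]] return the column as a plain vector; drop = FALSE keeps the data frame shape with [, j]. Row and column subscripts can also be combined in one call.

```r
# Rebuild pdata so this snippet is self-contained
pdata <- data.frame(ID = c(3, 4, 5, 6),
                    age = c(15, 14, 18, 12),
                    blood = c("Type5", "Type6", "Type5", "Type6"),
                    status = c("Poor", "Improved", "Excellent", "Poor"),
                    stringsAsFactors = FALSE)

class(pdata[2])                  # "data.frame": list-style indexing keeps the frame
class(pdata[, 2])                # "numeric": a single column is dropped to a vector
class(pdata[, 2, drop = FALSE])  # "data.frame": drop = FALSE prevents the drop
identical(pdata$age, pdata[["age"]])  # TRUE: $ and [[ ]] both extract one column

# Rows 1 and 2 of the ID and status columns together
pdata[1:2, c("ID", "status")]
```

When code later expects a data frame (for example, to call nrow() on the result), the single-bracket or drop = FALSE forms are the safer choice.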

wilsonzhang746