R Programming

Creating data frames in R using data.frame()

Data frames are among the most widely used data structures in R. Unlike a vector, matrix, or array, whose elements must all share the same mode, a data frame can store columns of different modes or types in one object. For example, a data frame of family information can hold numeric (e.g. age, income), character (e.g. name), and logical (e.g. works or does not work) columns. A data frame behaves much like a spreadsheet in Microsoft Excel: each row represents one observation or subject, and each column represents one variable or attribute.
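As a minimal sketch of the difference described above, the following snippet contrasts a vector, which coerces mixed values to a single mode, with a data frame, which keeps a separate type for each column. The object names `mixed` and `family` and all values in them are made up for illustration.

```r
# A vector holds one mode only: mixing a number, a string and a logical
# coerces every element to character.
mixed <- c(42, "Anna", TRUE)
class(mixed)

# A data frame keeps one type per column (hypothetical family data).
# stringsAsFactors = FALSE is already the default from R 4.0.0 onward;
# it is spelled out here so older versions behave the same way.
family <- data.frame(
  name   = c("Anna", "Ben", "Cara"),
  age    = c(42, 45, 12),
  income = c(52000, 48000, 0),
  works  = c(TRUE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

# Report the class of each column
sapply(family, class)
```

Here `class(mixed)` reports a character vector, while `sapply(family, class)` reports a character, two numeric, and one logical column.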

1. Creating a data frame using the data.frame() function

A data frame can be created with the data.frame() function once the vector for each column exists in the current R session. The following code builds a personal health data frame with data.frame() after creating one vector per column.


# Creating a data frame using data.frame()
# two numeric vectors
> ID <- c(3, 4, 5, 6)
> age <- c(15, 14, 18, 12)
# a character vector, nominal
> blood <- c("Type5", "Type6", "Type5", "Type6")
# a character vector, ordinal
> status <- c("Poor", "Improved", "Excellent", "Poor")
# combine the four vectors column-wise into a data frame
> pdata <- data.frame(ID, age, blood, status)
> pdata
  ID age blood    status
1  3  15 Type5      Poor
2  4  14 Type6  Improved
3  5  18 Type5 Excellent
4  6  12 Type6      Poor
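The comments above describe blood as nominal and status as ordinal, but data.frame() stores both as plain character columns. One possible way to record that information is to convert the columns to factors, sketched below. The copy name `pdata_f` is hypothetical, and the level order Poor < Improved < Excellent is an assumption that the original example does not state.

```r
# Rebuild the example data so this block runs on its own.
ID     <- c(3, 4, 5, 6)
age    <- c(15, 14, 18, 12)
blood  <- c("Type5", "Type6", "Type5", "Type6")
status <- c("Poor", "Improved", "Excellent", "Poor")
pdata  <- data.frame(ID, age, blood, status)

# Work on a copy (hypothetical name) so pdata itself stays unchanged.
pdata_f <- pdata

# Nominal variable: an unordered factor is enough.
pdata_f$blood <- factor(pdata_f$blood)

# Ordinal variable: assumed order Poor < Improved < Excellent.
pdata_f$status <- factor(pdata_f$status,
                         levels = c("Poor", "Improved", "Excellent"),
                         ordered = TRUE)

# Inspect the new column types
str(pdata_f)

# Ordered factors support comparisons against a level name.
pdata_f$status >= "Improved"
```

With an ordered factor, the final comparison is TRUE only for the rows whose status is "Improved" or "Excellent".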

2. Specifying elements of a data frame

Elements of a data frame can be selected in several ways. You can use row and column subscripts, as you would with a matrix. You can also index by column name to return part of a data frame. Finally, the $ symbol extracts a single column by name.

# Identifying elements of a data frame
# first and second variables of the data frame
> pdata[1:2]
  ID age
1  3  15
2  4  14
3  5  18
4  6  12
# the age variable of the data frame
> pdata$age
[1] 15 14 18 12
# the "blood" and "status" variables of the data frame
> pdata[c("blood", "status")]
  blood    status
1 Type5      Poor
2 Type6  Improved
3 Type5 Excellent
4 Type6      Poor
# show the structure of the data frame: dimensions, column names and types
> str(pdata)
'data.frame': 4 obs. of  4 variables:
 $ ID    : num  3 4 5 6
 $ age   : num  15 14 18 12
 $ blood : chr  "Type5" "Type6" "Type5" "Type6"
 $ status: chr  "Poor" "Improved" "Excellent" "Poor"
# select the 2nd variable for all observations
> pdata[, 2]
[1] 15 14 18 12
# select all variables of the 1st observation
> pdata[1, ]
  ID age blood status
1  3  15 Type5   Poor
# use the $ symbol to index a particular column
> pdata$blood
[1] "Type5" "Type6" "Type5" "Type6"
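The indexing forms shown above can also be combined. The sketch below goes slightly beyond the original examples: it uses row and column subscripts together, selects rows with a logical condition, and shows two standard base R features, `drop = FALSE` and `[[ ]]`. The age threshold of 14 is arbitrary and chosen only for illustration.

```r
# Rebuild the example data so this block runs on its own.
ID     <- c(3, 4, 5, 6)
age    <- c(15, 14, 18, 12)
blood  <- c("Type5", "Type6", "Type5", "Type6")
status <- c("Poor", "Improved", "Excellent", "Poor")
pdata  <- data.frame(ID, age, blood, status)

# Row and column subscripts together: rows 2 to 3, columns "ID" and "age".
pdata[2:3, c("ID", "age")]

# Logical row selection: observations with age above 14 (arbitrary cutoff).
pdata[pdata$age > 14, ]

# pdata[, 2] returns a plain vector; drop = FALSE keeps a one-column data frame.
pdata[, 2, drop = FALSE]

# [[ ]] extracts a single column as a vector, just like $.
identical(pdata[["age"]], pdata$age)
```

Use `drop = FALSE` when later code expects a data frame even though only one column is selected.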

You can also watch the corresponding video on using data.frame() on our YouTube channel.

wilsonzhang746
