Human Activity Data Set

26 Sep 2020 ~5 min read

Synopsis

Data

The experiments were carried out with a group of 30 volunteers aged 19 to 48. Each person performed six activities (WALKING, WALKING_UPSTAIRS, WALKING_DOWNSTAIRS, SITTING, STANDING, LAYING) while wearing a smartphone (Samsung Galaxy S II) on the waist. Using its embedded accelerometer and gyroscope, 3-axial linear acceleration and 3-axial angular velocity were captured at a constant rate of 50 Hz. The experiments were video-recorded so that the data could be labeled manually. The resulting dataset was randomly partitioned into two sets: 70% of the volunteers were selected for generating the training data and 30% for the test data.
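Because the partition was made by volunteer rather than by individual reading, no person appears in both the training and the test set. A minimal R sketch of such a subject-level 70/30 split, assuming subjects are numbered 1 to 30; the function name `split_subjects` and the seed are hypothetical, and this does not reproduce the split used for the published data:

```r
# Sketch: partition volunteers (not individual windows) into train/test sets.
# Assumptions: subjects are numbered 1..n; the seed is arbitrary, so this will
# not reproduce the split actually used for the published dataset.
split_subjects <- function(n_subjects = 30, train_frac = 0.7, seed = 1) {
  set.seed(seed)
  ids <- seq_len(n_subjects)
  n_train <- round(train_frac * n_subjects)  # 0.7 * 30 = 21 volunteers
  train_ids <- sort(sample(ids, n_train))
  list(train = train_ids, test = setdiff(ids, train_ids))
}

parts <- split_subjects()
length(parts$train)  # 21
length(parts$test)   # 9
```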

The sensor signals (accelerometer and gyroscope) were pre-processed by applying noise filters and then sampled in fixed-width sliding windows of 2.56 s with 50% overlap (128 readings per window). The sensor acceleration signal, which has gravitational and body motion components, was separated into body acceleration and gravity using a Butterworth low-pass filter. The gravitational force is assumed to have only low-frequency components; therefore, a filter with a 0.3 Hz cutoff frequency was used. From each window, a vector of features was obtained by calculating variables from the time and frequency domains.
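The window arithmetic works out as 2.56 s at 50 Hz, i.e. 2.56 * 50 = 128 readings, and 50% overlap means each window starts 64 readings after the previous one. A minimal sketch of that segmentation in R, applied to a hypothetical synthetic signal; it does not reproduce the noise filtering or the Butterworth gravity separation, and `sliding_windows` is an illustrative name:

```r
# Sketch: cut a 1-D signal into fixed-width windows with fractional overlap.
# Defaults: 128 readings per window (2.56 s at 50 Hz) and 50% overlap.
sliding_windows <- function(signal, width = 128L, overlap = 0.5) {
  if (length(signal) < width) stop("signal is shorter than one window")
  step <- as.integer(width * (1 - overlap))  # 64 readings for 50% overlap
  starts <- seq(1L, length(signal) - width + 1L, by = step)
  # vapply gives one column per window; transpose to one row per window
  t(vapply(starts, function(s) signal[s:(s + width - 1L)], numeric(width)))
}

x <- sin(seq(0, 20, length.out = 500))  # hypothetical 10 s signal at 50 Hz
w <- sliding_windows(x)
dim(w)  # 6 128
```

The second half of each window is the first half of the next one. Readings after the last full window (here the final 52 of 500) are dropped rather than padded.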

A full description is available on the site where the data was obtained, the UCI Machine Learning Repository.

The purpose of this project is to demonstrate how to collect, load, and clean a data set. The goal is to prepare tidy data that can be processed easily in later analysis.

Data set [60MB]

Processing Steps

Collect the data.
Merge the training and test data into one data set.
Extract only the mean and standard deviation measurements.
Replace the activity codes with descriptive activity labels.
Rename the columns with descriptive variable names.
Create an independent data set with the average of each variable for each activity and each subject.

1. Collecting the Data

    # Download the data set
    # (mode = 'wb' keeps the binary zip file intact on Windows)
    fileurl <- 'https://d396qusza40orc.cloudfront.net/getdata%2Fprojectfiles%2FUCI%20HAR%20Dataset.zip'
    download.file(fileurl, destfile = 'projectdataset.zip', mode = 'wb')

    # Unzip the data set (creates the 'UCI HAR Dataset' folder)
    unzip('./projectdataset.zip')
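Re-running the script downloads about 60 MB every time. A minimal sketch that skips the download and unzip steps when the extracted folder already exists, assuming the archive unpacks to `UCI HAR Dataset` as the read paths below expect; `ensure_dataset` is a hypothetical helper name:

```r
# Sketch: download and unzip only when the extracted folder is missing.
# Assumption: the archive unpacks to a folder named "UCI HAR Dataset".
ensure_dataset <- function(url, zipfile = "projectdataset.zip",
                           exdir = ".", folder = "UCI HAR Dataset") {
  target <- file.path(exdir, folder)
  if (!dir.exists(target)) {
    if (!file.exists(zipfile)) {
      # mode = "wb" keeps the binary zip intact on Windows
      download.file(url, destfile = zipfile, mode = "wb")
    }
    unzip(zipfile, exdir = exdir)
  }
  target  # path to the extracted data folder
}
```

For example, `root <- ensure_dataset(fileurl)` returns `./UCI HAR Dataset`, and later reads could build their paths with `file.path(root, 'train', 'X_train.txt')`.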

    # Read the raw files (none of them has a header row, hence header = FALSE)

    # Read training data
    activity.train <- read.table('./UCI HAR Dataset/train/y_train.txt', header = FALSE)
    feature.train <- read.table('./UCI HAR Dataset/train/X_train.txt', header = FALSE)
    subject.train <- read.table('./UCI HAR Dataset/train/subject_train.txt', header = FALSE)

    # Read test data
    activity.test <- read.table('./UCI HAR Dataset/test/y_test.txt', header = FALSE)
    feature.test <- read.table('./UCI HAR Dataset/test/X_test.txt', header = FALSE)
    subject.test <- read.table('./UCI HAR Dataset/test/subject_test.txt', header = FALSE)

    # Read activity labels (code, name)
    activity.label <- read.table('./UCI HAR Dataset/activity_labels.txt', header = FALSE)

    # Read feature names (index, name)
    feature.names <- read.table('./UCI HAR Dataset/features.txt', header = FALSE)
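The three reads per split follow one naming pattern, so they can be folded into a helper. A minimal sketch, assuming the `<split>/subject_<split>.txt`, `y_<split>.txt`, and `X_<split>.txt` layout used above; `read_split` is a hypothetical name, and its column order (Subject, Activity, features) matches the merge in the next step:

```r
# Sketch: read one split ("train" or "test") into a single data frame.
# Assumed layout: <root>/<split>/subject_<split>.txt, y_<split>.txt, X_<split>.txt
read_split <- function(root, split, feature_names) {
  path <- function(prefix) file.path(root, split, paste0(prefix, "_", split, ".txt"))
  subject  <- read.table(path("subject"), header = FALSE, col.names = "Subject")
  activity <- read.table(path("y"), header = FALSE, col.names = "Activity")
  features <- read.table(path("X"), header = FALSE)
  names(features) <- feature_names  # one name per feature column
  cbind(subject, activity, features)
}
```

For example, `train <- read_split('./UCI HAR Dataset', 'train', as.character(feature.names[, 2]))`.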

2. Merge Training and Test Data into One Data Set

    # 2.1 Assign variable names
    # (as.character() matters in R < 4.0, where read.table() returns strings as factors)
    names(activity.train) <- 'Activity'
    names(feature.train) <- as.character(feature.names[, 2])
    names(subject.train) <- 'Subject'

    names(activity.test) <- 'Activity'
    names(feature.test) <- as.character(feature.names[, 2])
    names(subject.test) <- 'Subject'

    names(activity.label) <- c('Activity', 'ActivityType')

    # 2.2 Merge all data frames into one set
    train <- cbind(subject.train, activity.train, feature.train)
    test <- cbind(subject.test, activity.test, feature.test)
    data <- rbind(train, test)

    print(paste("Observation: ", nrow(data), "Column: ", ncol(data)))
    head(data[1:10])

[1] "Observation:  10299 Column:  563"

Subject	Activity	tBodyAcc-mean()-X	tBodyAcc-mean()-Y	tBodyAcc-mean()-Z	tBodyAcc-std()-X	tBodyAcc-std()-Y	tBodyAcc-std()-Z	tBodyAcc-mad()-X	tBodyAcc-mad()-Y
1	5	0.2885845	-0.02029417	-0.1329051	-0.9952786	-0.9831106	-0.9135264	-0.9951121	-0.9831846
1	5	0.2784188	-0.01641057	-0.1235202	-0.9982453	-0.9753002	-0.9603220	-0.9988072	-0.9749144
1	5	0.2796531	-0.01946716	-0.1134617	-0.9953796	-0.9671870	-0.9789440	-0.9965199	-0.9636684
1	5	0.2791739	-0.02620065	-0.1232826	-0.9960915	-0.9834027	-0.9906751	-0.9970995	-0.9827498
1	5	0.2766288	-0.01656965	-0.1153619	-0.9981386	-0.9808173	-0.9904816	-0.9983211	-0.9796719
1	5	0.2771988	-0.01009785	-0.1051373	-0.9973350	-0.9904868	-0.9954200	-0.9976274	-0.9902177
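Since `rbind()` on data frames matches columns by name, it is worth confirming that the merge only stacked rows. A minimal sketch of such a check on hypothetical toy frames; `check_merged` is an illustrative helper, not part of the original script. It also reports repeated column names, because features.txt repeats some names (for example among the bandsEnergy features), which makes selecting those columns by name ambiguous:

```r
# Sketch: sanity checks after stacking the training and test sets.
check_merged <- function(train, test, merged) {
  stopifnot(
    identical(names(train), names(test)),      # both splits share one column layout
    nrow(merged) == nrow(train) + nrow(test),  # rbind stacks rows only
    ncol(merged) == ncol(train)                # ...and adds no columns
  )
  # Report repeated column names, which make selection by name ambiguous
  dups <- unique(names(merged)[duplicated(names(merged))])
  if (length(dups) > 0) message("Repeated column names: ", paste(dups, collapse = ", "))
  invisible(dups)
}

# Hypothetical toy splits with the same layout as the real ones
tr <- data.frame(Subject = 1:3, Activity = c(1L, 1L, 2L), f = c(0.1, 0.2, 0.3))
te <- data.frame(Subject = 4:5, Activity = c(2L, 3L), f = c(0.4, 0.5))
m <- rbind(tr, te)
check_merged(tr, te, m)
```

On the full data, `check_merged(train, test, data)` should pass the size checks (10299 rows and 563 columns, as printed above) and print the repeated bandsEnergy names.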

3. Extract the Mean and Standard Deviation Measurements

    # Keep only the mean() and std() features, plus the two ID columns
    subset.feature <- feature.names$V2[grep("mean\\(\\)|std\\(\\)", feature.names$V2)]
    subset.data <- c('Subject', 'Activity', as.character(subset.feature))
    data <- subset(data, select = subset.data)

    print(paste("Observation: ", nrow(data), "Column: ", ncol(data)))
    head(data[1:10])

[1] "Observation:  10299 Column:  68"

Subject	Activity	tBodyAcc-mean()-X	tBodyAcc-mean()-Y	tBodyAcc-mean()-Z	tBodyAcc-std()-X	tBodyAcc-std()-Y	tBodyAcc-std()-Z	tGravityAcc-mean()-X	tGravityAcc-mean()-Y
1	5	0.2885845	-0.02029417	-0.1329051	-0.9952786	-0.9831106	-0.9135264	0.9633961	-0.1408397
1	5	0.2784188	-0.01641057	-0.1235202	-0.9982453	-0.9753002	-0.9603220	0.9665611	-0.1415513
1	5	0.2796531	-0.01946716	-0.1134617	-0.9953796	-0.9671870	-0.9789440	0.9668781	-0.1420098
1	5	0.2791739	-0.02620065	-0.1232826	-0.9960915	-0.9834027	-0.9906751	0.9676152	-0.1439765
1	5	0.2766288	-0.01656965	-0.1153619	-0.9981386	-0.9808173	-0.9904816	0.9682244	-0.1487502
1	5	0.2771988	-0.01009785	-0.1051373	-0.9973350	-0.9904868	-0.9954200	0.9679482	-0.1482100
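The escaped parentheses in the pattern matter: they keep names that contain the literal `mean()` or `std()` and exclude names such as `meanFreq()` or `angle(tBodyAccMean,gravity)` that merely contain "mean" or "Mean". A small check on sample names written in the features.txt style (the `candidates` vector is illustrative):

```r
# Sample feature names in the features.txt style (illustrative subset)
candidates <- c("tBodyAcc-mean()-X", "tBodyAcc-std()-X", "tBodyAcc-mad()-X",
                "fBodyAcc-meanFreq()-X", "angle(tBodyAccMean,gravity)")

# Same pattern as above: literal "mean()" or "std()"
kept <- grep("mean\\(\\)|std\\(\\)", candidates, value = TRUE)
kept  # "tBodyAcc-mean()-X" "tBodyAcc-std()-X"
```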

4. Use Descriptive Activity Labels

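    # Replace each activity code (1-6) with its label; row x of
    # activity.label holds the label for code x
    for (x in 1:6) {
        data$Activity[as.character(data$Activity) == x] <- as.character(activity.label[x, 2])
    }

    print(paste("Observation: ", nrow(data), "Column: ", ncol(data)))
    head(data[1:10])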

[1] "Observation:  10299 Column:  68"

Subject	Activity	tBodyAcc-mean()-X	tBodyAcc-mean()-Y	tBodyAcc-mean()-Z	tBodyAcc-std()-X	tBodyAcc-std()-Y	tBodyAcc-std()-Z	tGravityAcc-mean()-X	tGravityAcc-mean()-Y
1	STANDING	0.2885845	-0.02029417	-0.1329051	-0.9952786	-0.9831106	-0.9135264	0.9633961	-0.1408397
1	STANDING	0.2784188	-0.01641057	-0.1235202	-0.9982453	-0.9753002	-0.9603220	0.9665611	-0.1415513
1	STANDING	0.2796531	-0.01946716	-0.1134617	-0.9953796	-0.9671870	-0.9789440	0.9668781	-0.1420098
1	STANDING	0.2791739	-0.02620065	-0.1232826	-0.9960915	-0.9834027	-0.9906751	0.9676152	-0.1439765
1	STANDING	0.2766288	-0.01656965	-0.1153619	-0.9981386	-0.9808173	-0.9904816	0.9682244	-0.1487502
1	STANDING	0.2771988	-0.01009785	-0.1051373	-0.9973350	-0.9904868	-0.9954200	0.9679482	-0.1482100
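The loop can also be written as a single vectorized lookup with `match()`. A minimal sketch, assuming a lookup table with the same two columns as `activity.label` after renaming (`Activity`, `ActivityType`); `label_activities` is a hypothetical helper and the toy lookup is illustrative:

```r
# Sketch: map numeric activity codes to labels in one vectorized step.
label_activities <- function(codes, lookup) {
  # match() finds each code's row in the lookup table; unknown codes give NA
  as.character(lookup$ActivityType[match(codes, lookup$Activity)])
}

# Hypothetical toy lookup with the same columns as activity.label
lookup <- data.frame(Activity = 1:3,
                     ActivityType = c("WALKING", "WALKING_UPSTAIRS", "WALKING_DOWNSTAIRS"))
label_activities(c(3, 1, 1, 2), lookup)
# "WALKING_DOWNSTAIRS" "WALKING" "WALKING" "WALKING_UPSTAIRS"
```

Applied before the loop, `data$Activity <- label_activities(data$Activity, activity.label)` would replace it. Because `match()` looks codes up by value, it does not depend on the rows of activity_labels.txt being in code order, and an unknown code becomes `NA` instead of being left unchanged.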

5. Change Column Names to Descriptive Labels

    # Expand abbreviated name parts and fix the doubled 'BodyBody'
    names(data) <- gsub('^t', 'Time', names(data))
    names(data) <- gsub('^f', 'Frequency', names(data))
    names(data) <- gsub('Acc', 'Accelerometer', names(data))
    names(data) <- gsub('BodyBody', 'Body', names(data))
    names(data) <- gsub('Gyro', 'Gyroscope', names(data))
    names(data) <- gsub('Mag', 'Magnitude', names(data))

    print(paste("Observation: ", nrow(data), "Column: ", ncol(data)))
    head(data[1:10])

[1] "Observation:  10299 Column:  68"

Subject	Activity	TimeBodyAccelerometer-mean()-X	TimeBodyAccelerometer-mean()-Y	TimeBodyAccelerometer-mean()-Z	TimeBodyAccelerometer-std()-X	TimeBodyAccelerometer-std()-Y	TimeBodyAccelerometer-std()-Z	TimeGravityAccelerometer-mean()-X	TimeGravityAccelerometer-mean()-Y
1	STANDING	0.2885845	-0.02029417	-0.1329051	-0.9952786	-0.9831106	-0.9135264	0.9633961	-0.1408397
1	STANDING	0.2784188	-0.01641057	-0.1235202	-0.9982453	-0.9753002	-0.9603220	0.9665611	-0.1415513
1	STANDING	0.2796531	-0.01946716	-0.1134617	-0.9953796	-0.9671870	-0.9789440	0.9668781	-0.1420098
1	STANDING	0.2791739	-0.02620065	-0.1232826	-0.9960915	-0.9834027	-0.9906751	0.9676152	-0.1439765
1	STANDING	0.2766288	-0.01656965	-0.1153619	-0.9981386	-0.9808173	-0.9904816	0.9682244	-0.1487502
1	STANDING	0.2771988	-0.01009785	-0.1051373	-0.9973350	-0.9904868	-0.9954200	0.9679482	-0.1482100
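The six `gsub()` calls can also be kept as an ordered table of rules, which makes the renaming easier to extend and test. A minimal sketch; `descriptive_names` is a hypothetical helper, and the rules are the same ones applied above, in the same order:

```r
# Sketch: apply the renaming rules above, in order, to a vector of names.
descriptive_names <- function(x) {
  rules <- c("^t" = "Time", "^f" = "Frequency", "Acc" = "Accelerometer",
             "BodyBody" = "Body", "Gyro" = "Gyroscope", "Mag" = "Magnitude")
  for (pattern in names(rules)) x <- gsub(pattern, rules[[pattern]], x)
  x
}

descriptive_names(c("Subject", "tBodyAcc-mean()-X", "fBodyBodyGyroMag-mean()"))
# "Subject" "TimeBodyAccelerometer-mean()-X" "FrequencyBodyGyroscopeMagnitude-mean()"
```

Usage would be `names(data) <- descriptive_names(names(data))`.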

6. Create Independent Data Set with the Average of Each Variable for Each Activity and Each Subject

    # Create a second, independent data set with the mean of every
    # variable for each subject and activity
    data2 <- aggregate(. ~ Subject + Activity, data, mean)
    data2 <- data2[order(data2$Subject, data2$Activity), ]

    print(paste("Observation: ", nrow(data2), "Column: ", ncol(data2)))
    head(data2[1:10])

[1] "Observation:  180 Column:  68"

	Subject	Activity	TimeBodyAccelerometer-mean()-X	TimeBodyAccelerometer-mean()-Y	TimeBodyAccelerometer-mean()-Z	TimeBodyAccelerometer-std()-X	TimeBodyAccelerometer-std()-Y	TimeBodyAccelerometer-std()-Z	TimeGravityAccelerometer-mean()-X	TimeGravityAccelerometer-mean()-Y
1	1	LAYING	0.2215982	-0.040513953	-0.1132036	-0.92805647	-0.836827406	-0.82606140	-0.2488818	0.7055498
31	1	SITTING	0.2612376	-0.001308288	-0.1045442	-0.97722901	-0.922618642	-0.93958629	0.8315099	0.2044116
61	1	STANDING	0.2789176	-0.016137590	-0.1106018	-0.99575990	-0.973190056	-0.97977588	0.9429520	-0.2729838
91	1	WALKING	0.2773308	-0.017383819	-0.1111481	-0.28374026	0.114461337	-0.26002790	0.9352232	-0.2821650
121	1	WALKING_DOWNSTAIRS	0.2891883	-0.009918505	-0.1075662	0.03003534	-0.031935943	-0.23043421	0.9318744	-0.2666103
151	1	WALKING_UPSTAIRS	0.2554617	-0.023953149	-0.0973020	-0.35470803	-0.002320265	-0.01947924	0.8933511	-0.3621534
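`aggregate()` with the formula `. ~ Subject + Activity` averages every remaining column within each subject-activity pair, so the result has one row per pair present in the data: 30 subjects x 6 activities = 180 rows here. A toy illustration with hypothetical values:

```r
# Hypothetical toy frame: two subjects, two activities, one measurement
toy <- data.frame(Subject  = c(1, 1, 1, 2, 2),
                  Activity = c("SITTING", "SITTING", "WALKING", "SITTING", "WALKING"),
                  value    = c(0.2, 0.4, 1.0, 0.5, 0.7))

# "." on the left-hand side means every column not used for grouping
avg <- aggregate(. ~ Subject + Activity, data = toy, FUN = mean)
avg <- avg[order(avg$Subject, avg$Activity), ]
avg  # 4 rows; subject 1 SITTING averages 0.2 and 0.4 to 0.3
```

Note that the formula method drops rows containing `NA` by default (`na.action = na.omit`), so missing values would silently shrink a group rather than propagate into its mean.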

    # Write the tidy data to a text file
    write.table(data2, file = 'TidyData.txt', row.names = FALSE)
      
# E N D
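Reading `TidyData.txt` back is a quick way to confirm the file is usable. One detail: `read.table()` rewrites a name like `TimeBodyAccelerometer-mean()-X` into a syntactic one unless `check.names = FALSE` is passed. A minimal sketch with a hypothetical two-row frame written to a temporary file; `read_tidy` is an illustrative name:

```r
# Read the tidy file back, keeping the descriptive column names as written
read_tidy <- function(file) {
  read.table(file, header = TRUE, check.names = FALSE, stringsAsFactors = FALSE)
}

# Hypothetical two-row tidy frame, written the same way as above
tidy <- data.frame(Subject = c(1, 1), Activity = c("LAYING", "SITTING"),
                   `TimeBodyAccelerometer-mean()-X` = c(0.25, 0.5),
                   check.names = FALSE)
tmp <- tempfile(fileext = ".txt")
write.table(tidy, file = tmp, row.names = FALSE)

back <- read_tidy(tmp)
identical(names(back), names(tidy))  # TRUE
```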