Posts

LIS4370[StatVizR - Final Project]

StatVizR: Simplifying Statistical Analysis and Visualization in R
Nicolas Pedraza
LIS4370 Final Project
4/26/2024

Greetings! Today I introduce StatVizR, an R package designed to streamline statistical analysis and visualization tasks. Whether you're a seasoned data analyst or just dipping your toes in data science, StatVizR is here to make your life easier.

What is StatVizR?

StatVizR is more than just another R package - it's your go-to toolkit for everything related to statistical analysis and visualization. With a comprehensive set of functions, StatVizR empowers users to efficiently manipulate data, explore patterns, conduct hypothesis tests, perform regression analysis, and create insightful visualizations.

Why StatVizR?

You might be wondering, "Why should I choose StatVizR over other packages?" Well, here are a few reasons:

1. Ease of Use: StatVizR is designed with user-friendliness in mind. Whether you're a begin...

Final Project [LIS4317]

Project Documentation: Exploring Top Spotify Songs

Introduction:

In an era dominated by digital streaming platforms, the music industry has undergone a profound transformation in how music is created, distributed, and consumed. Among these platforms, Spotify stands out as a global leader, providing users with access to a vast library of songs and personalized music recommendations. Understanding the factors that contribute to the success of top Spotify songs is crucial for artists, record labels, and industry analysts seeking to navigate the evolving landscape of music consumption.

In this project, we embark on a journey to explore the intricacies of top Spotify songs, aiming to unravel the underlying patterns and trends that shape their popularity. Through a combination of exploratory data analysis (EDA) techniques and visual analytics methodologies, we explore the vast reservoir of Spotify data to uncover insights that inform strategic decision-making a...

Module # 13 {LIS4317}

In this blog post, we'll delve into the world of animated scatter plots using R, leveraging the ggplot2 and animation packages to create dynamic visualizations. Animated plots can be powerful tools for conveying changes and patterns over time, making complex data more accessible and engaging.

Setting the Stage:

To begin, we must ensure that the required R packages (animation and ggplot2) are installed and loaded. These packages offer robust functionality for creating animations and generating sophisticated plots, respectively.

Crafting the Animation:

Our goal is to construct a series of scatter plots, each representing a distinct frame in our animation. We'll use randomly generated data points within specified limits (-3 to 3 on both the x- and y-axes) to create variability across frames.

Bringing It to Life:

Upon executing the code, a GIF named scatter_animation.gif will be generated in your working directory. This file contains a sequence of frames, each showca...
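The steps described above can be sketched roughly as follows. This is a minimal illustration, not the post's original code: the number of frames, the number of points, the seed, the frame interval, and the helper names make_frame and write_scatter_gif are all assumptions made for the example. Writing the GIF with animation::saveGIF also assumes that ImageMagick (or GraphicsMagick) is installed on the system.

```r
library(ggplot2)

set.seed(42)      # assumed seed, only for reproducibility
n_frames <- 10    # assumed number of frames
n_points <- 50    # assumed number of points per frame

# Build one scatter plot with random points inside [-3, 3] on both axes
make_frame <- function(i) {
  df <- data.frame(
    x = runif(n_points, min = -3, max = 3),
    y = runif(n_points, min = -3, max = 3)
  )
  ggplot(df, aes(x = x, y = y)) +
    geom_point() +
    coord_cartesian(xlim = c(-3, 3), ylim = c(-3, 3)) +  # fixed axes across frames
    labs(title = paste("Frame", i))
}

# One ggplot object per frame
frames <- lapply(seq_len(n_frames), make_frame)

# Hypothetical helper: print each frame inside saveGIF to record it
write_scatter_gif <- function(frames, file = "scatter_animation.gif") {
  animation::saveGIF(
    for (p in frames) print(p),
    movie.name = file,
    interval = 0.5  # assumed: seconds between frames
  )
}

# Only write the file in an interactive session
if (interactive()) write_scatter_gif(frames)
```

Fixing the axis limits with coord_cartesian keeps every frame on the same scale, so the points appear to move rather than the axes.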

Exploring Social Network Visualization with R: Successes and Challenges

In the realm of data visualization, exploring social networks can yield fascinating insights into relationships and connectivity. Recently, I delved into this area using R, leveraging packages like GGally, network, sna, and ggplot2. Here’s a recount of my journey, highlighting both successes and challenges encountered along the way.

Package Installation and Setup:

The initial step was straightforward: installing and loading the necessary packages. Using the install.packages() and library() commands, I quickly integrated GGally, network, sna, and ggplot2 into my R environment.

Generating Random Network Data:

I used the rgraph() function from the sna package to create a random graph consisting of 10 nodes. Setting mode = "graph" makes the adjacency matrix symmetric (an undirected graph), and tprob = 0.5 gives each possible tie a 50% chance of being present.

Visualizing the Network:

With the network data prepared, I used ggnet2() from GGally to generate a visualization of the social network. This function creates an aesthetically pleasing graph r...
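The workflow described above can be sketched as follows. This is a minimal example under stated assumptions, not the post's original code: the seed and the styling arguments passed to ggnet2() (node size and colors) are chosen only for illustration, and the variable names adj, net, and p are hypothetical.

```r
library(GGally)
library(network)
library(sna)
library(ggplot2)

set.seed(1)  # assumed seed, only for reproducibility

# Random symmetric adjacency matrix for 10 nodes;
# each possible tie is present with probability 0.5
adj <- rgraph(10, mode = "graph", tprob = 0.5)

# Wrap the matrix in an undirected network object
net <- network(adj, directed = FALSE)

# Draw the network; the styling values here are illustrative
p <- ggnet2(net, node.size = 6, node.color = "steelblue", edge.color = "grey50")
print(p)
```

Because ggnet2() returns a ggplot object, further ggplot2 layers and themes can be added to p in the usual way.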

Module # 11 assignment

CODE:

# Load required libraries
library(ggplot2)
library(ggthemes)

# Sample data
data <- data.frame(
    x = rnorm(100),
    y = rnorm(100)
)

# Create scatter plot
scatter_plot <- ggplot(data, aes(x = x, y = y)) +
    geom_point() +
    theme_tufte()

# Display plot
scatter_plot

Module # 10 assignment

In this blog post, I'll delve into the world of time series analysis using ggplot2, a powerful data visualization package in R. Time series data involves observations collected or recorded at regular time intervals, making it a crucial area of study in various fields such as finance, economics, and environmental science. Visualizing time series data not only helps in understanding patterns and trends but also aids in making informed decisions based on the insights gained.

Code:

# Load ggplot2, which also provides the economics dataset
library(ggplot2)

# Extract year from date
year <- function(x) as.POSIXlt(x)$year + 1900
economics$year <- year(economics$date)

# Plot unemployment rate (unemployed / population) over time
plot_unemployment <- ggplot(economics, aes(x = date, y = unemploy / pop)) +
  geom_line() +
  labs(title = "Unemployment Rate Over Time",
       x = "Year",
       y = "Unemployment Rate") +
  theme_minimal()

print(plot_unemployment)

This plot provides a clear visualization of how the unemploymen...

Module # 11 Debugging and defensive programming in R

The bug lies in tukey_multiple, not in tukey.outlier. tukey_multiple is meant to identify outliers in each column of a matrix x using the Tukey method and then flag the rows where every column has an outlier. The faulty logic is the use of the && operator. That operator is not element-wise: it expects length-one operands, so older versions of R silently used only the first element of each side, and R 4.3.0 and later raise an error for longer vectors. The element-wise operator is &.

Fixed CODE:

tukey_multiple <- function(x) {
  outliers <- array(TRUE, dim = dim(x))
  for (j in 1:ncol(x)) {
    outliers[, j] <- tukey.outlier(x[, j])
  }
  outlier.vec <- apply(outliers, 1, all)
  return(outlier.vec)
}

In this fixed version, we loop through each column of the input matrix x, identify outliers using tukey.outlier, and store the results in the outliers matrix. Because outliers starts out all TRUE, assigning the result directly is equivalent to an element-wise AND, so no logical operator is needed. Then, we use the apply function to check for rows where all columns have outliers and return a logical vector indicating such rows.
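To try the fix end to end, here is a small self-contained sketch. The post does not show tukey.outlier, so the version below is a hypothetical stand-in that uses the common 1.5 * IQR fences; the course's actual helper may differ. The input check with stopifnot and the toy matrix m are also assumptions added for illustration.

```r
# Hypothetical stand-in for tukey.outlier (not shown in the post):
# flags values beyond 1.5 * IQR from the quartiles
tukey.outlier <- function(v) {
  q <- quantile(v, c(0.25, 0.75), na.rm = TRUE)
  iqr <- q[2] - q[1]
  (v < q[1] - 1.5 * iqr) | (v > q[2] + 1.5 * iqr)
}

tukey_multiple <- function(x) {
  # Defensive check (an addition for this sketch): fail early on bad input
  stopifnot(is.matrix(x), is.numeric(x))
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    outliers[, j] <- tukey.outlier(x[, j])
  }
  # TRUE only for rows where every column is an outlier
  apply(outliers, 1, all)
}

# Toy data: row 10 is extreme in both columns, row 1 only in column b
m <- cbind(
  a = c(1:9, 100),
  b = c(-50, 2:9, 200)
)

flags <- tukey_multiple(m)
which(flags)  # 10

# Element-wise AND on vectors uses a single ampersand
c(TRUE, FALSE) & c(TRUE, TRUE)  # TRUE FALSE
```

Row 1 is not flagged because only column b is extreme there, which is the "all columns" behavior the function is meant to have.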