Data Science by Nicolas Pedraza

Posts

Final Project [LIS4317]

April 25, 2024

Project Documentation: Exploring Top Spotify Songs

Introduction: Digital streaming platforms have transformed how music is created, distributed, and consumed. Among these platforms, Spotify stands out as a global leader, providing users with access to a vast library of songs and personalized music recommendations. Understanding the factors that contribute to the success of top Spotify songs matters to artists, record labels, and industry analysts trying to navigate changing patterns of music consumption. In this project, we examine top Spotify songs to identify the patterns and trends that shape their popularity. Using exploratory data analysis (EDA) and visual analytics, we look through the Spotify data for insights that inform strategic decision-making a...

Module # 13 {LIS4317}

April 15, 2024

In this blog post, we'll create animated scatter plots in R, using the ggplot2 and animation packages to build dynamic visualizations. Animated plots can convey changes and patterns over time, making complex data more accessible and engaging.

Setting the Stage: To begin, we make sure the required R packages (animation and ggplot2) are installed and loaded. animation assembles the frames into an animation, and ggplot2 draws each plot.

Crafting the Animation: Our goal is to construct a series of scatter plots, each representing one frame of the animation. We'll use randomly generated data points within fixed limits (-3 to 3 on both the x- and y-axes) so that the points vary from frame to frame.

Bringing It to Life: Upon executing the code, a GIF named scatter_animation.gif will be generated in your working directory. This file contains a sequence of frames, each showca...
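The frame-by-frame approach described in this post can be sketched roughly as follows. This is a minimal sketch, not the post's original code: the helper names make_frames and write_gif, the frame count, the point count, and the use of runif() for the random points are all assumptions made for illustration. Writing the GIF relies on animation::saveGIF(), which needs an external converter such as ImageMagick, so that call is left commented out.

```r
library(ggplot2)

# Hypothetical helper: build one ggplot object per frame from uniformly
# distributed random points (the distribution is an assumption).
make_frames <- function(n_frames = 10, n_points = 50, seed = 42) {
  set.seed(seed)
  lapply(seq_len(n_frames), function(i) {
    df <- data.frame(
      x = runif(n_points, min = -3, max = 3),
      y = runif(n_points, min = -3, max = 3)
    )
    ggplot(df, aes(x = x, y = y)) +
      geom_point() +
      # Fix both axes to -3..3 so every frame shares the same scale
      coord_cartesian(xlim = c(-3, 3), ylim = c(-3, 3)) +
      ggtitle(paste("Frame", i))
  })
}

# Hypothetical helper: print each frame inside saveGIF() so that every
# printed plot becomes one image of the resulting GIF.
write_gif <- function(frames, file = "scatter_animation.gif", interval = 0.5) {
  animation::saveGIF(
    for (p in frames) print(p),
    movie.name = file,
    interval = interval
  )
}

frames <- make_frames()
# write_gif(frames)  # uncomment to write the GIF; needs animation + ImageMagick
```

Keeping the axis limits fixed with coord_cartesian() stops the scales from jumping between frames, which would otherwise make the motion hard to follow.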

Exploring Social Network Visualization with R: Successes and Challenges

April 08, 2024

In data visualization, exploring social networks can yield useful insights into relationships and connectivity. Recently, I tried this in R, using the GGally, network, sna, and ggplot2 packages. Here's a recount of the process, highlighting both the successes and the challenges encountered along the way.

Package Installation and Setup: The initial step was straightforward: installing and loading the necessary packages. Using the install.packages() and library() commands, I quickly added GGally, network, sna, and ggplot2 to my R environment.

Generating Random Network Data: I used the rgraph() function from the sna package to create a random graph with 10 nodes. Setting mode = "graph" produces a symmetric adjacency matrix, that is, an undirected graph, and tprob = 0.5 gives each possible tie a 50% chance of being present.

Visualizing the Network: With the network data prepared, I used ggnet2() from GGally to generate a visualization of the social network. This function creates an aesthetically pleasing graph r...
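The steps in this post can be sketched as below. This is a minimal sketch rather than the post's original code: the object names adj and net and the seed are assumptions. Note that rgraph() is provided by sna, and that the adjacency matrix is wrapped in a network object before plotting. The ggnet2() call is guarded so the rest of the sketch still runs if GGally is not installed.

```r
library(network)
library(sna)

set.seed(1)  # arbitrary seed so the random graph is reproducible

# 10-node random graph; mode = "graph" yields a symmetric (undirected)
# adjacency matrix, tprob = 0.5 is the probability of each tie
adj <- rgraph(10, mode = "graph", tprob = 0.5)

# Wrap the adjacency matrix in an undirected network object
net <- network(adj, directed = FALSE)

# Draw the network with default ggnet2() settings if GGally is available
if (requireNamespace("GGally", quietly = TRUE)) {
  network_plot <- GGally::ggnet2(net)
  print(network_plot)
}
```

Because the graph is random, a different seed produces a different layout and set of ties.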

Module # 11 assignment

April 01, 2024

CODE:

    # Load required libraries
    library(ggplot2)
    library(ggthemes)

    # Sample data
    data <- data.frame(
      x = rnorm(100),
      y = rnorm(100)
    )

    # Create scatter plot
    scatter_plot <- ggplot(data, aes(x = x, y = y)) +
      geom_point() +
      theme_tufte()

    # Display plot
    scatter_plot

Module # 10 assignment

March 25, 2024

In this blog post, I'll look at time series analysis using ggplot2, a data visualization package for R. Time series data consists of observations collected or recorded at regular time intervals, which makes it important in fields such as finance, economics, and environmental science. Visualizing time series data helps in understanding patterns and trends and supports decisions based on those insights.

Code:

    # The economics dataset ships with ggplot2
    library(ggplot2)

    # Extract year from date
    year <- function(x) as.POSIXlt(x)$year + 1900
    economics$year <- year(economics$date)

    # Plot unemployment rate over time
    # (unemploy / pop is the unemployed share of the total population)
    plot_unemployment <- ggplot(economics, aes(x = date, y = unemploy / pop)) +
      geom_line() +
      labs(title = "Unemployment Rate Over Time",
           x = "Year",
           y = "Unemployment Rate") +
      theme_minimal()

    print(plot_unemployment)

This plot provides a clear visualization of how the unemploymen...

Module # 11 Debugging and defensive programming in R

March 25, 2024

The bug lies in the tukey_multiple function, not in tukey.outlier. tukey_multiple is designed to identify outliers in each column of a matrix x using the Tukey method, and then determine the rows in which every column has an outlier. The original code combined the per-column results with the && operator, which works on single logical values; combining whole vectors requires the element-wise operator &.

Fixed CODE:

    tukey_multiple <- function(x) {
      outliers <- array(TRUE, dim = dim(x))
      for (j in 1:ncol(x)) {
        outliers[, j] <- tukey.outlier(x[, j])
      }
      outlier.vec <- apply(outliers, 1, all)
      return(outlier.vec)
    }

In this fixed version, we loop through each column of the input matrix x, identify outliers using tukey.outlier, and store the results in the outliers matrix. Because outliers starts out as all TRUE, assigning the result directly is equivalent to combining it with &. Then, we use the apply function to check for rows where all columns have outliers and return a logical vector indicating such rows.
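To see the fix in action, here is a small self-contained sketch. The post does not show tukey.outlier, so the version below is a hypothetical stand-in that flags values more than 1.5 times the interquartile range beyond the quartiles; the test matrix is also made up for illustration. This variant keeps the element-wise & and adds a basic input check in the spirit of defensive programming. For context, in recent R versions (4.3.0 and later, as far as I know) passing a vector longer than one to && is an error, while older versions silently used only the first element.

```r
# Hypothetical stand-in for tukey.outlier (not shown in the post):
# TRUE where a value lies more than 1.5 * IQR outside the quartiles.
tukey.outlier <- function(v) {
  q <- unname(quantile(v, c(0.25, 0.75), na.rm = TRUE))
  fence <- 1.5 * (q[2] - q[1])
  v < q[1] - fence | v > q[2] + fence
}

tukey_multiple <- function(x) {
  # Defensive check: fail early on input that is not a numeric matrix
  stopifnot(is.matrix(x), is.numeric(x))
  outliers <- array(TRUE, dim = dim(x))
  for (j in seq_len(ncol(x))) {
    # Element-wise & combines the whole column, unlike scalar &&
    outliers[, j] <- outliers[, j] & tukey.outlier(x[, j])
  }
  # TRUE only for rows that are outliers in every column
  apply(outliers, 1, all)
}

# Made-up data: the last row is extreme in both columns
x <- cbind(c(1, 2, 3, 4, 5, 100), c(2, 3, 4, 5, 6, 200))
result <- tukey_multiple(x)
print(result)
```

Using seq_len(ncol(x)) instead of 1:ncol(x) also avoids looping over c(1, 0) when the matrix has no columns.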
