Posts

Module # 9 assignment

CODE

# Load necessary libraries
library(ggplot2)
library(AER)  # provides the CigarettesB dataset
data("CigarettesB")

# Create a scatter plot of packs vs. price, with income mapped to color
scatterplot_matrix <- ggplot(CigarettesB, aes(x = price, y = packs, color = income)) +
  geom_point() +
  labs(title = "Scatterplot Matrix of Cigarette Data",
       x = "Price",
       y = "Packs",
       color = "Income") +
  theme_minimal()

# Print the plot
print(scatterplot_matrix)

Judgment on Multi-Variable Visualization:

Multivariate visualization is a powerful way to explore relationships between multiple variables simultaneously. In the context of the cigarette data, a scatter plot of price against packs, with income mapped to color, lets us see the relationships between the price of cigarettes, the number of packs sold, and the income associated with each observation in a single panel. This type of visualization enables us to identify patterns, clusters, and trends that may not be apparent when examining variables individually. By visualizing multiple variables...

Module # 10 Building your own R package

Package: Pedraza
Title: Package for statistical analysis and visualization
Version: 0.1.0.9000
Authors@R: person("Nicolas", "Pedraza", email = "nicolas32@usf.edu", role = c("aut", "cre"))
Description: Pedraza is an R package designed to facilitate statistical analysis and visualization. It provides a set of functions for data manipulation, exploratory data analysis, hypothesis testing, regression analysis, and plotting. With Pedraza, users can efficiently perform various statistical tasks, from simple descriptive statistics to complex modeling techniques. Whether you are a beginner or an experienced data analyst, Pedraza aims to streamline your workflow and enhance your data analysis capabilities.
Depends: R (>= 3.1.2)
License: CC0
LazyData: true
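A quick way to sanity-check a DESCRIPTION file like the one above is to read it back with base R. The sketch below is only an illustration: it writes a shortened copy (the Description field is left out for brevity) to a temporary directory, and it assumes Authors@R is written as a person() call, which is the form R evaluates.

```r
# Shortened copy of the DESCRIPTION fields, written to a temporary file
desc_lines <- c(
  "Package: Pedraza",
  "Title: Package for statistical analysis and visualization",
  "Version: 0.1.0.9000",
  'Authors@R: person("Nicolas", "Pedraza", email = "nicolas32@usf.edu", role = c("aut", "cre"))',
  "Depends: R (>= 3.1.2)",
  "License: CC0",
  "LazyData: true"
)
desc_path <- file.path(tempdir(), "DESCRIPTION")
writeLines(desc_lines, desc_path)

# read.dcf() parses the "Field: value" format into a character matrix
desc <- read.dcf(desc_path)
print(colnames(desc))

# Authors@R holds R code; evaluating it should yield a person object
author <- eval(parse(text = desc[1, "Authors@R"]))
print(inherits(author, "person"))
print(author)
```

If the evaluated field is not a person object, the author entry is worth rewriting before building the package.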

Module # 9 Visualization in R

Basic Histogram: Distribution of Cigarette Prices

Our journey begins with a basic histogram, shedding light on the distribution of cigarette prices in the dataset.

# Load the CigarettesB dataset from the AER package
library(AER)
data("CigarettesB")

# Basic Histogram
hist(CigarettesB$price, main = "Distribution of Cigarette Prices", xlab = "Price")

The histogram shows the spread of prices, giving us a glimpse into the variability and concentration within the dataset. Peaks and troughs in the histogram can point to potential clusters or outliers, setting the stage for further exploration.

Lattice Scatterplot Matrix: Unveiling Multivariate Relationships

Next, we employ a lattice scatterplot matrix, a powerful tool for understanding relationships between multiple variables simultaneously.

# Lattice Scatterplot Matrix
library(lattice)
splom(~CigarettesB[, c("packs", "price", "income")], main = "Scatterplot Matrix")

The scatterplot matrix allows us to identify patterns and correlations between "packs,...

Module # 8 Correlation Analysis and ggplot2

Exploring Relationships in mtcars: A Visual Analytics Journey

In data analysis, the power of visualization cannot be overstated. Visual analytics provides a lens through which we can uncover patterns and relationships hidden within datasets. Inspired by Stephen Few, I explored the mtcars dataset using the versatile ggplot2 package in RStudio.

In my opinion, Few's recommendation to use a grid is more than an organizational suggestion; it reflects how we perceive and comprehend data visually. The grid layout in our scatter plot matrix aids comparisons and acts as a visual roadmap, guiding us through the relationships within the dataset so that we can draw connections and identify trends efficiently. It is a testament to Few's emphasis on simplicity and clarity in visualizations.

Conclusion: Grids and Visu...
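As a rough companion to the grid idea above, the sketch below computes a correlation matrix and draws a scatter plot matrix for mtcars. It uses base R's cor() and pairs() rather than the ggplot2 used in the post, and the choice of mpg, hp, and wt is an assumption made for illustration.

```r
# Three mtcars variables, picked only for illustration
vars <- mtcars[, c("mpg", "hp", "wt")]

# Pearson correlation matrix, rounded for readability
cor_mat <- round(cor(vars), 2)
print(cor_mat)

# Scatter plot matrix: every pair of variables shown in one grid of panels
pairs(vars, main = "mtcars: mpg, hp, and wt")
```

Reading the numbers next to the grid of panels makes it easier to confirm that a visual trend, such as mpg falling as wt rises, matches the sign of the corresponding correlation.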

Module # 8 Input/Output, string manipulation and plyr package

Exploring Gender Patterns and Academic Performance: The 'I' Factor

Introduction: In a quest to uncover patterns within student demographics and academic achievements, we examined a student dataset. Specifically, our analysis focused on students whose names contain the letter 'i'. What emerged was an association between the presence of the letter 'i' in a student's name, their gender, and their academic performance.

Gender Disparities: Our examination revealed a predominant association between the letter 'i' and female names. Of the students with 'i' in their name, a substantial majority were female.

Academic Excellence Among 'I'-Named Females: Looking more closely at the academic performance of students with 'i' in their names, a striking trend emerged. Among these individuals, seven females secured the highest accolade, an 'A' grade. This impressive achiev...
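The filtering step described above can be sketched in base R. The data frame below is entirely hypothetical (the post's dataset is not shown here), as are the column names Name, Sex, and Grade; the point is only to show grepl() for the string match and table() for the counts.

```r
# Hypothetical student records, invented for illustration
students <- data.frame(
  Name  = c("Lily", "Marcus", "Olivia", "Brian", "Sandra", "Felipe"),
  Sex   = c("Female", "Male", "Female", "Male", "Female", "Male"),
  Grade = c("A", "B", "A", "C", "B", "A"),
  stringsAsFactors = FALSE
)

# Keep students whose name contains the letter "i" (upper or lower case)
has_i <- grepl("i", students$Name, ignore.case = TRUE)
i_students <- students[has_i, ]
print(i_students)

# Count by gender, then cross-tabulate gender against grade
print(table(i_students$Sex))
print(table(i_students$Sex, i_students$Grade))

# Write the filtered subset to a CSV file in the temporary directory
out_path <- file.path(tempdir(), "i_students.csv")
write.csv(i_students, out_path, row.names = FALSE)
```

With real data, the same two table() calls give the gender split and the number of 'A' grades among students with 'i' in their names.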

Module # 7 R Object: S3 vs. S4 assignment (R Programming)

1. How do you tell what OO system (S3 vs. S4) an object is associated with?
class() returns a value for every object, so on its own it cannot settle the question. is.object() reports whether an object carries a class attribute, and isS4() reports whether it is an S4 object; an object for which is.object() is TRUE and isS4() is FALSE is most likely S3. For an S4 object, the showClass() function from the 'methods' package displays the class definition.

2. How do you determine the base type (like integer or list) of an object?
The typeof() function can be used to determine the base type of an object. Additionally, class() can provide information about the class, which may indicate the type.

3. What is a generic function?
A generic function is a function that behaves differently depending on the class of its arguments. It allows you to use the same function name for different methods tailored to specific classes.

4. What are the main differences between S3 and S4?
S3 is a simpler and more informal object-oriented system, relying on naming conventions and the class attribute. S4 is a more formal and structured system with an explicit definition of classe...
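The checks discussed in these answers can be tried on a pair of toy objects. This is a minimal sketch: the "student" S3 class, the "StudentS4" class, and the describe() generic are hypothetical names made up for the example.

```r
library(methods)  # loaded explicitly so the script also runs under Rscript

# S3 object: a plain list with a class attribute (hypothetical "student" class)
s3_obj <- structure(list(name = "Ana", gpa = 3.9), class = "student")

# S4 object: a formally defined class with typed slots (hypothetical "StudentS4")
setClass("StudentS4", slots = c(name = "character", gpa = "numeric"))
s4_obj <- new("StudentS4", name = "Ana", gpa = 3.9)

# Which OO system? is.object() checks for a class attribute, isS4() for S4
print(c(is.object(s3_obj), isS4(s3_obj)))  # TRUE FALSE: an S3 object
print(c(is.object(s4_obj), isS4(s4_obj)))  # TRUE TRUE: an S4 object
print(is.object(1:3))                      # FALSE: a base type with no class attribute

# Base type, independent of class
print(typeof(s3_obj))  # "list"
print(typeof(s4_obj))  # "S4"
print(typeof(1:3))     # "integer"

# A generic function: UseMethod() dispatches on the class of the first argument
describe <- function(x, ...) UseMethod("describe")
describe.student <- function(x, ...) paste("S3 student:", x$name)
describe.default <- function(x, ...) "no specific method"

print(describe(s3_obj))  # "S3 student: Ana"
print(describe(1:3))     # "no specific method"
```

The S3 method is found purely through its name (describe.student), while the S4 class had to be declared with setClass() before any object could be created, which mirrors the informal versus formal contrast in question 4.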

Module # 7 assignment (Visual Analytics)

In this example, I've created two scatter plots, one for mpg vs. hp and another for mpg vs. wt, using mtcars, and arranged them in a grid using the grid.arrange function from the gridExtra package.

My opinion on Few's recommendations:

Simplicity: Few emphasizes simplicity to avoid overwhelming the audience. The scatter plots are relatively simple, each displaying two variables at a time (mpg vs. hp, mpg vs. wt), which makes them easy to interpret.

Clarity: Clarity is a key principle in Few's recommendations. The grid arrangement places the two scatter plots side by side, making it easier for viewers to identify patterns and differences between the variables.

Accuracy: Accuracy is crucial, and Few often advocates for accurate representation of data. The scatter plots accurately represent the relationship between variables, allowing viewers to make informed observations about the distribution and correlation.

Color Usage: Few often advises agai...
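The post's own code is not shown in this excerpt, so the following is only a sketch of how two such plots could be built and placed side by side, assuming the ggplot2 and gridExtra packages are installed. The object names p_hp and p_wt and the axis labels are illustrative choices.

```r
library(ggplot2)
library(gridExtra)

# Scatter plot of fuel economy against horsepower
p_hp <- ggplot(mtcars, aes(x = hp, y = mpg)) +
  geom_point() +
  labs(title = "mpg vs. hp", x = "Horsepower (hp)", y = "Miles per gallon (mpg)") +
  theme_minimal()

# Scatter plot of fuel economy against weight
p_wt <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  labs(title = "mpg vs. wt", x = "Weight (1000 lbs)", y = "Miles per gallon (mpg)") +
  theme_minimal()

# Arrange the two plots in a one-row, two-column grid
grid.arrange(p_hp, p_wt, ncol = 2)
```

Keeping mpg on the y-axis of both panels means the two plots share a vertical scale, which supports the side-by-side comparison described under Clarity.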