Data Science by Nicolas Pedraza

Showing posts from February, 2024

Module # 7 R Object: S3 vs. S4 assignment (R Programming)

February 25, 2024

1. How do you tell what OO system (S3 vs. S4) an object is associated with? Use is.object() to check whether an object carries a class attribute. The class() function alone is not enough, because it returns an implicit class even for plain base objects. If is.object(x) is TRUE and isS4(x) is FALSE, the object is an S3 object. If isS4(x) is TRUE, the object is an S4 object, and the showClass() function from the 'methods' package displays the definition of its class.

2. How do you determine the base type (like integer or list) of an object? The typeof() function returns the base type of an object. Additionally, class() reports the class, which often, but not always, hints at the type.

3. What is a generic function? A generic function is a function that behaves differently depending on the class of its arguments. It lets you use the same function name for different methods tailored to specific classes.

4. What are the main differences between S3 and S4? S3 is a simpler and more informal object-oriented system, relying on naming conventions and the class attribute. S4 is a more formal and structured system with an explicit definition of classe...
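The checks described in the answers above can be sketched in a few lines of R. This is only an illustration: the class names person_s3 and PersonS4, the helper oo_system(), and the generic describe() are hypothetical names made up for this example, not part of the assignment.

```r
library(methods)  # ships with R; provides setClass() and new()

# Hypothetical S3 object: a list with a class attribute
s3_obj <- structure(list(name = "Ana"), class = "person_s3")

# Hypothetical S4 class and object
setClass("PersonS4", representation(name = "character"))
s4_obj <- new("PersonS4", name = "Ana")

# Hypothetical helper: which OO system does x belong to?
oo_system <- function(x) {
  if (!is.object(x)) return("base")  # no class attribute at all
  if (isS4(x)) return("S4")
  "S3"
}

oo_system(1:3)     # "base"
oo_system(s3_obj)  # "S3"
oo_system(s4_obj)  # "S4"

# Base type, independent of the class
typeof(1:3)        # "integer"
typeof(s3_obj)     # "list"

# A generic function: the same name dispatches to class-specific methods
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "some object"
describe.person_s3 <- function(x, ...) paste("person named", x$name)

describe(s3_obj)   # "person named Ana"
describe(42)       # "some object"
```

Note that Reference Class objects are built on top of S4, so a helper like this one would report them as "S4" as well.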

Module # 7 assignment (Visual Analytics)

February 21, 2024

In this example, I've created two scatter plots, one for mpg vs. hp and another for mpg vs. wt, using the mtcars dataset, and arranged them in a grid with the grid.arrange function.

My opinion on Few's recommendations:

Simplicity: Few emphasizes simplicity to avoid overwhelming the audience. The scatter plots are relatively simple, each displaying two variables at a time (mpg vs. hp, mpg vs. wt). This simplicity makes them easy to interpret.

Clarity: Clarity is a key principle in Few's recommendations. Arranging the two scatter plots side by side in a grid makes it easier for viewers to compare them and identify patterns and differences between the variables.

Accuracy: Accuracy is crucial, and Few often advocates for accurate representation of data. The scatter plots accurately represent the relationship between variables, allowing viewers to make informed observations about the distribution and correlation.

Color Usage: Few often advise agai...
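A side-by-side layout like the one described at the start of this post can be sketched as follows. The post used grid.arrange; this sketch swaps in base graphics with par(mfrow = ...) as a stand-in so that it runs without extra packages, and the axis labels and point style are my own assumptions.

```r
# Base-graphics stand-in for the grid.arrange layout described in the post
old_par <- par(mfrow = c(1, 2))  # one row, two panels

plot(mtcars$hp, mtcars$mpg,
     xlab = "Horsepower (hp)", ylab = "Miles per gallon (mpg)",
     main = "mpg vs. hp", pch = 19)

plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)", ylab = "Miles per gallon (mpg)",
     main = "mpg vs. wt", pch = 19)

par(old_par)  # restore the previous layout

# Numeric companion to the visual impression of correlation
cor_hp <- cor(mtcars$mpg, mtcars$hp)
cor_wt <- cor(mtcars$mpg, mtcars$wt)
```

Both correlations are negative, which matches the downward trend visible in each panel.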

Module # 6 assignment (Visual Analytics)

February 19, 2024

Code:

# Create a vector for the bar chart
x <- c(40, 30, 20, 10)

# Display the vector
x
# [1] 40 30 20 10

# Create a basic bar chart
barplot(x)

# Add names to the elements of the vector
names(x) <- c("Red", "Blue", "Green", "Brown")

# Display the updated vector
x
#   Red  Blue Green Brown
#    40    30    20    10

# Create a bar chart with labels
barplot(x)

# Define colors for the bar chart
mycolors <- c("red", "blue", "green", "brown")

# Create a bar chart with custom colors
barplot(x, col = mycolors)

Clarity: The use of custom colors can enhance the clarity of the bar chart by making it visually appealing and helping distinguish between different categories. However, it's essential to choose colors that don't compromise accessibility for color-blind individuals.

Simplicity: While the custom colors can make the chart visually interesting, they might introduce complexity. Too many colors or ...
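One way to act on the accessibility point is to draw the same bars with a palette designed for color-blind viewers. The sketch below is an assumption on my part rather than part of the assignment: it uses palette.colors() with the "Okabe-Ito" palette, which requires R 4.0.0 or later.

```r
x <- c(Red = 40, Blue = 30, Green = 20, Brown = 10)

# Okabe-Ito is a palette designed to stay distinguishable under
# common forms of color blindness; palette.colors() needs R >= 4.0.0.
cb_colors <- palette.colors(n = length(x), palette = "Okabe-Ito")

barplot(x, col = cb_colors, ylab = "Value")
```

With this palette the bars no longer match their color names, so in practice the category labels would need to carry the meaning instead of the fill color.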

Module # 6 Doing math in R part 2 (R Programming)

February 19, 2024

1. Consider A=matrix(c(2,0,1,3), ncol=2) and B=matrix(c(5,2,4,-1), ncol=2).

a) Find A + B

A <- matrix(c(2, 0, 1, 3), ncol = 2)
B <- matrix(c(5, 2, 4, -1), ncol = 2)
result_addition <- A + B
print(result_addition)
     [,1] [,2]
[1,]    7    5
[2,]    2    2

Description: R fills a matrix column by column, so in column order A = [2 0, 1 3] and B = [5 2, 4 -1]. Adding element by element gives A + B = [2+5, 0+2, 1+4, 3+(-1)] = [7 2, 5 2], which prints as the rows (7, 5) and (2, 2).

b) Find A - B

result_subtraction <- A - B
print(result_subtraction)
     [,1] [,2]
[1,]   -3   -3
[2,]   -2    4

Description: In the same column order, A - B = [2-5, 0-2, 1-4, 3-(-1)] = [-3 -2, -3 4], which prints as the rows (-3, -3) and (-2, 4).

2. Use the diag() function to build a matrix of size 4 with the following values on the diagonal: 4, 1, 2, 3

diagonal_values <- c(4, 1, 2, 3)
result_matrix <- diag(diagonal_values)
print(result_matrix)
     [,1] [,2] [,3] [,4]
[1,]    4    0...
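The arithmetic in this exercise can be double-checked programmatically. The sketch below compares the computed matrices against the hand-computed results written row by row; the names expected_sum, expected_diff, and D are introduced only for this illustration.

```r
A <- matrix(c(2, 0, 1, 3), ncol = 2)
B <- matrix(c(5, 2, 4, -1), ncol = 2)

# matrix() fills column by column, so the first column of A is (2, 0)
A[, 1]

sum_AB  <- A + B
diff_AB <- A - B

# Hand-computed results, entered row by row for readability
expected_sum  <- matrix(c(7, 5,
                          2, 2), ncol = 2, byrow = TRUE)
expected_diff <- matrix(c(-3, -3,
                          -2,  4), ncol = 2, byrow = TRUE)

isTRUE(all.equal(sum_AB, expected_sum))    # TRUE
isTRUE(all.equal(diff_AB, expected_diff))  # TRUE

# diag() with a vector builds a square matrix with that vector on the diagonal
D <- diag(c(4, 1, 2, 3))
dim(D)   # 4 4
diag(D)  # 4 1 2 3
```

Entering the expected values with byrow = TRUE keeps them in the same orientation as the printed output, which makes transposition mistakes easier to spot.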

Module # 5 assignment (Visual Analytics)

February 12, 2024

In undertaking this week's assignment, the decision was made to employ a scatter plot, a tool recognized for its ability to convey complex data relationships. While considering alternative visualization methods, it became apparent that some options posed challenges in deciphering the intended message, leading to a preference for the clarity offered by the scatter plot. Within the presented data, a notable observation emerges: specifically, 60% of the average time for position 23 is documented at 22.80 seconds. This statistic provides a succinct overview of the temporal dynamics associated with this specific position. A more nuanced perspective is gained when examining positions 30-31, revealing that the average time for 80% of the race extends to 30.40 seconds. Noteworthy is the incremental time difference of 7.6 seconds from the average time of position 23. This underscores the significance of seemingly marginal time differentials in the competitive landscape, illustrating the c...

Module # 5 Doing Math [R Programming]

February 08, 2024

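# Creating matrices A and B
A <- matrix(1:100, nrow = 10)   # 10 x 10, but singular
B <- matrix(1:1000, nrow = 10)  # 10 x 100, not square

# Trying to calculate the inverse of matrix A (fails: A is singular)
A_inverse <- tryCatch(solve(A), error = function(e) conditionMessage(e))

# Trying to calculate the determinant of matrix B (fails: B is not square)
B_det <- tryCatch(det(B), error = function(e) conditionMessage(e))

# Printing the results
print("Inverse of Matrix A:")
print(A_inverse)
print("Determinant of Matrix B:")
print(B_det)

1. A_inverse <- tryCatch(solve(A), ...): This line tries to calculate the inverse of matrix A using the solve() function. A is singular because its columns are linearly dependent, so solve() raises an error. tryCatch() stores the error message instead of stopping the script.

2. B_det <- tryCatch(det(B), ...): This line tries to calculate the determinant of matrix B using the det() function. B has 10 rows and 100 columns, and the determinant is only defined for square matrices, so det() also raises an error and the message is stored.

3. print("Inverse of Matrix A:") and print(A_inverse): These lines print the header and the result for the inverse of Matrix A, which here is the error message from solve().

4. print("Determinant of Matrix B:") and print(B_det): These lines print the header and the result for the determinant of Matrix B, which here is the error message from det().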
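Before calling solve() or det(), it can help to check whether the matrix is square and has full rank. The following is a minimal sketch of such a check; the helpers is_square() and has_full_rank() and the small matrix M are hypothetical names added for illustration, and the rank test relies on the default tolerance of qr().

```r
A <- matrix(1:100, nrow = 10)
B <- matrix(1:1000, nrow = 10)

# A matrix can only be inverted if it is square and has full rank
is_square <- function(m) nrow(m) == ncol(m)
has_full_rank <- function(m) is_square(m) && qr(m)$rank == nrow(m)

is_square(A)      # TRUE: 10 x 10
has_full_rank(A)  # FALSE: the columns are linearly dependent
is_square(B)      # FALSE: 10 x 100

# A small invertible matrix for contrast
M <- matrix(c(2, 0, 1, 3), ncol = 2)
M_inverse <- solve(M)
M_inverse %*% M   # approximately the 2 x 2 identity matrix
det(M)            # 2 * 3 - 1 * 0 = 6
```

For matrices that pass both checks, solve() and det() run without errors, as the small example M shows.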

Module # 4 assignment [Visual Analytics]

February 04, 2024

In this week's dataset analysis, I opted for a distinctive visualization approach, as illustrated above. Instead of the conventional U.S. map, I found this presentation style to be more intuitive for interpreting the data. Unlike the typical geographic map that aggregates collisions for entire states, our dataset includes data for individual state counties, making it challenging to discern specific details in a standard map view. In the showcased visualization, each square corresponds to a distinct county within various states. This format enables us to pinpoint the exact locations of each accident, offering a granular perspective rather than a holistic view. Unlike the state-level summary presented on a traditional map, this approach allows for a more detailed examination of where each incident occurs within individual counties. As observed in the visualization, the squares positioned towards the lower right corner indicate instances where there are either zero or only one collisi...