I am taking in some files and must determine whether the data sets are normally distributed (yes, within a certain degree of certainty, because normality cannot be proven, only disproven). My data sets are quite large; most have over 15,000 samples. What is a good test to run? I would rather use the whole data set than sample from it at random. Also, if possible, do you know how to do this in MATLAB? I can type out a method if need be, but it would be nice to use a preset function. Thanks.
2026-03-25 05:04:00.1774415040
Determining Normality With Large Samples
Asked by Bumbble Comm (https://math.techqa.club/user/bumbble-comm/detail)
For the reason given in @David's comment, most tests of normality do not accommodate samples larger than a few thousand. Real data tend to have small deviations from normality that are of no consequence for the validity of statistical procedures, so there is no point in detecting anomalies that would be evident only in very large samples.
In R, here is how to sample $n = 15000$ observations from $\mathsf{Norm}(\mu = 100, \sigma = 15).$ For data sampled in R, one would not expect detectable differences from normality, up to the accuracy of double precision representation of the data.
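A minimal sketch of that sampling step in base R. The seed value `2024` and the vector name `x` are assumptions chosen for illustration; they do not come from the answer.

```r
# Draw n = 15000 observations from Norm(mu = 100, sigma = 15).
set.seed(2024)                          # assumed seed, only for reproducibility
x <- rnorm(15000, mean = 100, sd = 15)  # simulated data set

length(x)                               # number of observations drawn
```

With real data, `x` would instead be read from a file, and everything that follows would be applied to it unchanged.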
A summary of the data shows the sample mean and median about equal, and the first and third quartiles about equidistant from the median, as one would expect from normal data.
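One way such a summary might be obtained, sketched in base R. The data are regenerated here so the block runs on its own, with the same assumed seed and variable name as before.

```r
set.seed(2024)                          # assumed seed, illustration only
x <- rnorm(15000, mean = 100, sd = 15)

s <- summary(x)                         # min, quartiles, median, mean, max
print(s)

# Distances from the median to each quartile; for symmetric data
# such as normal data these should be roughly equal.
q <- quantile(x, c(0.25, 0.50, 0.75))
lower.gap <- unname(q[2] - q[1])
upper.gap <- unname(q[3] - q[2])
c(lower.gap, upper.gap)
```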
Also, a histogram of the data on a density scale (so that the areas of all histogram bars sum to unity) is well-matched by a normal density curve with $\mu$ approximated by the sample mean $\bar X$ and $\sigma$ approximated by the sample standard deviation $S.$
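A sketch of how that histogram and curve could be drawn in base R. The number of breaks, the colors, and the helper names `m` and `s` are assumptions. The sample mean and standard deviation are computed before calling `curve()`, because inside `curve()` the symbol `x` refers to the plotting grid rather than to the data.

```r
set.seed(2024)                          # assumed seed, illustration only
x <- rnorm(15000, mean = 100, sd = 15)

m <- mean(x)                            # estimate of mu
s <- sd(x)                              # estimate of sigma

# freq = FALSE puts the histogram on a density scale (total bar area = 1).
h <- hist(x, freq = FALSE, breaks = 40, col = "skyblue2",
          main = "Simulated normal data, n = 15000")

# Overlay the normal density with the estimated parameters.
curve(dnorm(x, mean = m, sd = s), add = TRUE, lwd = 2, col = "red")
```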
In R, the Shapiro-Wilk test of normality (`shapiro.test`) can handle up to 5000 observations. We can test three blocks of 5000. After the first test, we use `$`-notation to show only the P-value of each test. If the P-value exceeds 0.05, we say that the data are consistent with sampling from a normal population, but that is no proof that the data are perfectly normal.

We can also sample random subsets of size 5000 (without replacement) and test them.
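Both procedures, the three consecutive blocks and the random subsets, might look like the following in base R. The seed, the block boundaries, and the names `p.blocks` and `p.rand` are assumptions. No specific P-values are quoted here, since they depend on the random draw.

```r
set.seed(2024)                          # assumed seed, illustration only
x <- rnorm(15000, mean = 100, sd = 15)

# Full printout for the first block of 5000.
print(shapiro.test(x[1:5000]))

# $-notation extracts only the P-value of each test.
p.blocks <- c(shapiro.test(x[1:5000])$p.value,
              shapiro.test(x[5001:10000])$p.value,
              shapiro.test(x[10001:15000])$p.value)
p.blocks

# Three random subsets of size 5000; sample() draws without
# replacement by default.
p.rand <- replicate(3, shapiro.test(sample(x, 5000))$p.value)
p.rand
```

Even for truly normal data, about 1 test in 20 gives a P-value below 0.05 by chance, so one small P-value among several is not by itself evidence of non-normality.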
Furthermore, a good overview of the normality of the entire sample of 15000 can be had by looking at a normal probability plot.
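A normal probability (Q-Q) plot of all 15000 points can be sketched with `qqnorm` and `qqline` from base R. The plotting character, title, and line color are assumptions.

```r
set.seed(2024)                          # assumed seed, illustration only
x <- rnorm(15000, mean = 100, sd = 15)

# Sample quantiles (vertical axis) against theoretical standard
# normal quantiles (horizontal axis).
qq <- qqnorm(x, pch = ".", main = "Normal Q-Q plot, n = 15000")

# Reference line through the first and third quartiles.
qqline(x, col = "red", lwd = 2)
```

Unlike `shapiro.test`, this plot has no sample-size limit, so it uses the whole data set at once.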
The excellent fit to a straight line between theoretical quantiles $\pm 3$ indicates an excellent fit of the data to a normal distribution. [There are not enough data points in the tails to overcome the randomness of sampling, so don't expect more than an approximate fit beyond $\pm 3$.]
Not all of these tests and descriptions of the data are necessary to check that there is no important departure from normality. You might pick the ones you understand best from a theoretical point of view or the ones with the greatest intuitive appeal. But do not expect data from real-life applications to show as good a fit to normality as those generated by trustworthy statistical software.