I have used a Python script to identify target sequences in a DNA sequence file.
There are two classes of sequence: coding and non-coding. I have identified $728$ sequences of interest. $597$ of these fall into the coding regions and $131$ of these fall into the non-coding regions. This is the equivalent of $18\%\,$ non-coding, but the total non-coding region in the sequence file is $13\% $.
Is there a statistical tool to demonstrate that the Python script identified target sequences in a non-random way?
If the script identified sequences that were randomly distributed, then $13\%$ of them, about $95$ of the $728$ sequences, would be expected in the non-coding region. With $728$ sequences, this seems like it should be reliable.
I hope my question is clear.
Your null hypothesis is $H_0: p = 0.13$ against the alternative $H_a: p \ne 0.13,$ where $p = P(\text{Non Coding}).$ You observe $X = 131$ non-coding sequences among $n = 728$ observed, which gives you $\hat p = 131/728 \approx 0.180$ as the observed frequency. Because the observed frequency is substantially different from $p = 0.13,$ you wonder whether this might have been an 'unlucky' draw, or whether you have statistically significant evidence that the method of sampling is unfair.
This is called a "one-sample binomial test". Often this test is done by using a normal approximation to the binomial distribution. You can find that method in elementary statistics textbooks. The results quoted here come from Minitab statistical software, which uses the binomial distribution to give an exact P-value. [It seems that SciPy also implements a version of this test, but I have not tried it.]
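As a rough illustration of the textbook normal-approximation version of the test, here is a minimal Python sketch using only the standard library. The variable names are my own, and it assumes the counts from the question ($X = 131$, $n = 728$, $p = 0.13$); it is not the Minitab computation.

```python
import math

# Counts from the question; variable names are illustrative.
n = 728      # total sequences identified
x = 131      # sequences found in non-coding regions
p0 = 0.13    # proportion of non-coding sequence under H0

p_hat = x / n

# Under H0 the standard error of the sample proportion is sqrt(p0 * (1 - p0) / n).
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

# Two-sided P-value: 2 * (1 - Phi(|z|)), written with the complementary error function.
p_value = math.erfc(abs(z) / math.sqrt(2))

print(f"p_hat = {p_hat:.4f}, z = {z:.3f}, two-sided P-value = {p_value:.2e}")
```

The approximation is usually considered adequate when both $np$ and $n(1-p)$ are comfortably large, as they are here.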
If the P-value is less than 5%, one says that the null hypothesis is rejected at the 5% level of significance. Here the P-value is printed as 0.000, which means that the P-value is smaller than 0.0005. So it is extremely unlikely that an unbiased draw would give an observed proportion of non-coding sequences so far from $p = 0.13.$
Another way to interpret the output is that a 95% confidence interval for $p$ is roughly $(0.15, 0.21),$ which contains $\hat p \approx 0.18$ but not $p = 0.13.$ Thus it is difficult to believe that the sampling procedure is consistent with the true value $p = 0.13.$
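For readers without Minitab, an exact two-sided P-value can be sketched in plain Python. The helper names `binom_pmf` and `exact_two_sided_p` are hypothetical, and the two-sided rule used here (sum the probabilities of all outcomes no more likely than the observed one) is one common convention; I am assuming, not asserting, that it matches what Minitab does. SciPy's `scipy.stats.binomtest` should provide a library version of the same kind of test.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def exact_two_sided_p(x, n, p):
    """Sum the probabilities of all outcomes no more likely than the observed x."""
    pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
    cutoff = pmf[x] * (1 + 1e-7)  # small tolerance for floating-point ties
    return min(1.0, sum(v for v in pmf if v <= cutoff))

# Counts from the question.
n, x, p0 = 728, 131, 0.13
p_value = exact_two_sided_p(x, n, p0)
print(f"exact two-sided P-value = {p_value:.2e}")
```

A value below 0.0005 would be displayed as 0.000 when rounded to three decimal places.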
Note: Yet another approach is to note that quantiles 0.025 and 0.975 of the 'null distribution' $\mathsf{Binom}(n = 728, p = 0.13)$ are about 77 and 113, respectively. Thus the observed value $X = 131$ falls considerably above the upper 'critical value' of the null distribution for a two-sided test at the 5% level. (Computation in R.)
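The critical-value check can also be sketched in Python rather than R. The function `binom_quantile` below is a hypothetical helper that returns the smallest $k$ whose cumulative probability reaches the requested level, which is the usual definition of a quantile for a discrete distribution; the counts are those from the question.

```python
import math

def binom_pmf(k, n, p):
    """Binomial probability P(X = k), computed in log space to avoid overflow."""
    log_pmf = (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
               + k * math.log(p) + (n - k) * math.log(1 - p))
    return math.exp(log_pmf)

def binom_quantile(q, n, p):
    """Smallest k such that P(X <= k) >= q."""
    cdf = 0.0
    for k in range(n + 1):
        cdf += binom_pmf(k, n, p)
        if cdf >= q:
            return k
    return n

# Null distribution from the question: Binom(n = 728, p = 0.13).
n, x, p0 = 728, 131, 0.13
lower = binom_quantile(0.025, n, p0)
upper = binom_quantile(0.975, n, p0)

print(f"critical values: {lower} and {upper}; observed X = {x}")
print("outside the central 95% region:", x < lower or x > upper)
```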