I have a data set with N = 80,000 supposedly unique individuals. Individuals are coded by a 9 digit hexadecimal "hashed" version of their 9 digit decimal US SSN. I have no idea what the "hashing" function is, but there appears to be collisions. I am interested in knowing what the expected number of collisions is with a general purpose 36 bit hashing function and 80,000 entries.
2026-03-30 10:36:55.1774867015
Probability of collision with a hashing function
841 Views Asked by Bumbble Comm https://math.techqa.club/user/bumbble-comm/detail At
1
There are 1 best solutions below
Related Questions in PROBABILITY
- How to prove $\lim_{n \rightarrow\infty} e^{-n}\sum_{k=0}^{n}\frac{n^k}{k!} = \frac{1}{2}$?
- Is this a commonly known paradox?
- What's $P(A_1\cap A_2\cap A_3\cap A_4) $?
- Prove or disprove the following inequality
- Another application of the Central Limit Theorem
- Given is $2$ dimensional random variable $(X,Y)$ with table. Determine the correlation between $X$ and $Y$
- A random point $(a,b)$ is uniformly distributed in a unit square $K=[(u,v):0<u<1,0<v<1]$
- proving Kochen-Stone lemma...
- Solution Check. (Probability)
- Interpreting stationary distribution $P_{\infty}(X,V)$ of a random process
Related Questions in CRYPTOGRAPHY
- What exactly is the definition of Carmichael numbers?
- What if Eve knows the value of $S$ in digital signiture?
- Relative prime message in RSA encryption.
- Encryption with $|K| = |P| = |C| = 1$ is perfectly secure?
- Cryptocurrency Math
- DLP Relationship of primitive roots $\pmod{p}$ with $p$ and $g$
- Hints to prove $2^{(p−1)/2}$ is congruent to 1 (mod p) or p-1 (mod p)
- Period of a binary sequence
- generating function / stream cipher
- RSA, cryptography
Trending Questions
- Induction on the number of equations
- How to convince a math teacher of this simple and obvious fact?
- Find $E[XY|Y+Z=1 ]$
- Refuting the Anti-Cantor Cranks
- What are imaginary numbers?
- Determine the adjoint of $\tilde Q(x)$ for $\tilde Q(x)u:=(Qu)(x)$ where $Q:U→L^2(Ω,ℝ^d$ is a Hilbert-Schmidt operator and $U$ is a Hilbert space
- Why does this innovative method of subtraction from a third grader always work?
- How do we know that the number $1$ is not equal to the number $-1$?
- What are the Implications of having VΩ as a model for a theory?
- Defining a Galois Field based on primitive element versus polynomial?
- Can't find the relationship between two columns of numbers. Please Help
- Is computer science a branch of mathematics?
- Is there a bijection of $\mathbb{R}^n$ with itself such that the forward map is connected but the inverse is not?
- Identification of a quadrilateral as a trapezoid, rectangle, or square
- Generator of inertia group in function field extension
Popular # Hahtags
second-order-logic
numerical-methods
puzzle
logic
probability
number-theory
winding-number
real-analysis
integration
calculus
complex-analysis
sequences-and-series
proof-writing
set-theory
functions
homotopy-theory
elementary-number-theory
ordinary-differential-equations
circles
derivatives
game-theory
definite-integrals
elementary-set-theory
limits
multivariable-calculus
geometry
algebraic-number-theory
proof-verification
partial-derivative
algebra-precalculus
Popular Questions
- What is the integral of 1/x?
- How many squares actually ARE in this picture? Is this a trick question with no right answer?
- Is a matrix multiplied with its transpose something special?
- What is the difference between independent and mutually exclusive events?
- Visually stunning math concepts which are easy to explain
- taylor series of $\ln(1+x)$?
- How to tell if a set of vectors spans a space?
- Calculus question taking derivative to find horizontal tangent line
- How to determine if a function is one-to-one?
- Determine if vectors are linearly independent
- What does it mean to have a determinant equal to zero?
- Is this Batman equation for real?
- How to find perpendicular vector to another vector?
- How to find mean and median from histogram
- How many sides does a circle have?
This is an odd question, you are increasing the amount of bits required to encode the SSN. A decimal is encodable is roughly 3.32 bits, a hex value is 4 bits, so you could uniquely describe every single US SSN possible in that value, with room to spare, much less your 80,000. This means there should be no collisions, for reference. You could encode every SSN Possible in roughly 30 bits, so a 32 bit integer would be sufficient (available in most programming platforms, as I assumed this is for a program).
As far as hashing algorithms go, there is no "general," hashing mechanism, there are a couple of common ones, sum mod n and md5 leap to mind, however these aren't really standard.
The goal of a hashing algorithm is, in the context of this discussion, to create a uniform distribution of the domain to the range. This will allow for the fewest hash collisions. The probability of a collision, assuming this is working correctly, given hashing n items into k locations is $n-k+k(1-1/k)^n$, which in your case is $80000-2^{36}+2^{36}(1-1/2^{36})^{80000}$, which is 4.6%.
https://math.dartmouth.edu/archive/m19w03/public_html/Section6-5.pdf