Cluster analysis is a vibrant area of applied mathematics, "used in many fields, including machine learning, pattern recognition, image analysis, information retrieval, and bioinformatics". A subfield of cluster analysis of special mathematical interest - because of its clear-cut underlying definitions - is graph clustering.
Graph clustering has also some philosophical impact, e.g. in the context of logical atomism and logical constructions – both tracing back to B. Russell – or in R. Dipert's The World as Graph.
From Wikipedia's cluster analysis entry:
a clique, i.e., a subset of nodes in a graph such that every two nodes in the subset are connected by an edge can be considered as a prototypical form of cluster. Relaxations of the complete connectivity requirement (a fraction of the edges can be missing) are known as quasi-cliques.
A completely different kind of graph clusters are the orbits of the automorphism group of a graph $G$, which are not necessarily connected but contain similar (in fact: structurally identical) objects.
Both cliques and orbits seem to me prototypical "natural definitions" of families of subgraphs.
The above mentioned relaxations - there is an infinitude of them - usually depend on arbitrary numerical parameters and don't feel so natural, at least when the parameters are not natural in a sense.
Comparably, a definition of "n-cliques" using not immediate but intermediate connectivity - eg. by replacing "connected by an edge (= 1-path)" by "connected by an n-path" in the definition above - would not feel as natural as the "immediate" definition.
But it is hard to formalize this intuitive feeling of "naturality" which comprises among others:
- no arbitrary numerical parameters
- no arbitrary reference to singled-out (named) objects
- no unnecessary complifications of otherwise simple definitions
My question is:
What attempts to define "natural definitions" are there, especially with respect to families of subgraphs of arbitrary graphs? Is it possible at all?
I guess somehow we should rule out spurious ways to depend on parameter, such as
Here is attempt (not constructive and not really canonical). Fix a first-order language $L$ containing some basic arithmetic so we can use parameters from $\mathbb N$.
Definition 1. For an $L$-definable function $f$ with domain $\mathbb N$, define the complexity $C(f)$ to be the shortest definition of $f$ in $L$.
Definition 2. For functions $f^*$ and $f$ with domain $\mathbb N$, we say that $f$ is an improvement of $f^*$ if $C(f)\le C(f^*)$, and for all but finitely many $n$, $f(n)=f^*(n)$.
Definition 3. A set $S$ depends on a parameter if there exists an injective $f$ with domain $\mathbb N$, such that for all improvements $g$ of $f$, there is an $n$ with $f(n)=g(n)=S$. In this case we say the value of the parameter can be $n$.
Theorem(?) The collection $S$ of graphs having a clique of size 5 does depend on a parameter (which can have the value 5), since a natural definition of "clique of size $k$" cannot be shortened and modified in such a way as to still be correct for almost all $k$, and at the same time remove $S$ from the range.