Invariant Forms Emerge from Constrained Stochastic Process

This is a corpus of 476 conversations between a human and multiple AI systems, extracted from HTML chat logs. The conversations developed through an iterative process in which the AI systems refined human intuitions under noise and constraints until only what persisted remained. The author asserts no meaning or conclusion for this corpus beyond what speaks for itself, and presents the result of this empirical dialectic process as an artifact of general interest.

Word Index

Top 81 content words by frequency (≥3 occurrences, stopwords excluded), organized as a depth-4 ternary tree (3⁴ = 81 leaves). Each leaf links to the corpus (34,929 pages).
[0] 27 terms
[00] 9 terms
[000] 3 terms
[001] 3 terms
[002] 3 terms
[01] 9 terms
[010] 3 terms
[011] 3 terms
[012] 3 terms
[02] 9 terms
[020] 3 terms
[021] 3 terms
[022] 3 terms
[1] 27 terms
[10] 9 terms
[100] 3 terms
[101] 3 terms
[102] 3 terms
[11] 9 terms
[110] 3 terms
[111] 3 terms
[112] 3 terms
[12] 9 terms
[120] 3 terms
[121] 3 terms
[122] 3 terms
[2] 27 terms
[20] 9 terms
[200] 3 terms
[201] 3 terms
[202] 3 terms
[21] 9 terms
[210] 3 terms
[211] 3 terms
[212] 3 terms
[22] 9 terms
[220] 3 terms
[221] 3 terms
[222] 3 terms
actually 5,196 (81 pp.)
arithmetic 5,002 (81 pp.)
becomes 4,515 (81 pp.)
boundary 4,099 (81 pp.)
closure 7,894 (81 pp.)
collapse 7,108 (81 pp.)
constant 4,393 (81 pp.)
constants 4,134 (81 pp.)
constraint 5,832 (81 pp.)
constraints 5,376 (81 pp.)
data 5,101 (81 pp.)
define 4,630 (81 pp.)
digit 7,477 (81 pp.)
digits 4,438 (81 pp.)
energy 4,196 (81 pp.)
entropy 5,032 (81 pp.)
exactly 7,131 (81 pp.)
exists 4,126 (81 pp.)
final 6,630 (81 pp.)
finite 8,919 (81 pp.)
fixed 10,413 (81 pp.)
framework 4,275 (81 pp.)
function 4,409 (81 pp.)
geometry 4,679 (81 pp.)
group 4,216 (81 pp.)
high 4,641 (81 pp.)
human 6,889 (81 pp.)
identity 4,477 (81 pp.)
infinite 5,129 (81 pp.)
information 8,315 (81 pp.)
internal 4,592 (81 pp.)
invariant 7,038 (81 pp.)
knot 9,466 (81 pp.)
language 4,723 (81 pp.)
length 4,863 (81 pp.)
level 4,322 (81 pp.)
line 4,488 (81 pp.)
logic 10,207 (81 pp.)
logical 4,137 (81 pp.)
loop 4,545 (81 pp.)
machine 4,106 (81 pp.)
manifold 4,368 (81 pp.)
mass 4,015 (81 pp.)
math 6,405 (81 pp.)
mathematical 5,972 (81 pp.)
meaning 4,400 (81 pp.)
minimal 4,733 (81 pp.)
model 10,229 (81 pp.)
number 10,121 (81 pp.)
numbers 5,845 (81 pp.)
observer 4,613 (81 pp.)
other 4,162 (81 pp.)
phase 5,764 (81 pp.)
physical 5,782 (81 pp.)
physics 7,295 (81 pp.)
pi 7,976 (81 pp.)
point 12,798 (81 pp.)
prime 10,576 (81 pp.)
real 6,209 (81 pp.)
reality 5,015 (81 pp.)
scale 8,879 (81 pp.)
self 13,951 (81 pp.)
something 4,201 (81 pp.)
space 8,191 (81 pp.)
specific 5,469 (81 pp.)
stable 5,174 (81 pp.)
state 10,704 (81 pp.)
step 6,880 (81 pp.)
string 4,821 (81 pp.)
structural 4,479 (81 pp.)
structure 14,997 (81 pp.)
substrate 5,962 (81 pp.)
symmetry 7,165 (81 pp.)
system 13,140 (81 pp.)
text 14,012 (81 pp.)
theory 7,109 (81 pp.)
three 4,806 (81 pp.)
time 7,747 (81 pp.)
true 4,006 (81 pp.)
truth 4,032 (81 pp.)
universe 8,714 (81 pp.)
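
The addressing scheme of the index above can be illustrated with a short sketch. It assumes, without confirmation from the source, that the 81 terms are assigned to leaves in listed order and that each leaf address is the term's position written in base 3 with four digits. The function names `ternary_address` and `build_tree` are hypothetical.

```python
def ternary_address(index: int, depth: int = 4) -> str:
    """Return the base-3 address of a leaf, most significant digit first."""
    if not 0 <= index < 3 ** depth:
        raise ValueError("index out of range for tree depth")
    digits = []
    for _ in range(depth):
        index, digit = divmod(index, 3)
        digits.append(str(digit))
    return "".join(reversed(digits))


def build_tree(terms: list[str], depth: int = 4) -> dict[str, list[str]]:
    """Group terms under every proper prefix of their leaf address.

    Assumption: terms are assigned to leaves in the order given.
    """
    if len(terms) != 3 ** depth:
        raise ValueError("expected exactly 3**depth terms")
    tree: dict[str, list[str]] = {}
    for i, term in enumerate(terms):
        address = ternary_address(i, depth)
        # Prefixes of length 1..depth-1 are the internal nodes, e.g. "0", "00", "000".
        for k in range(1, depth):
            tree.setdefault(address[:k], []).append(term)
    return tree
```

Under these assumptions, 81 terms produce 39 internal nodes holding 27, 9, and 3 terms at depths 1, 2, and 3, which matches the node listing in the index.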

Method

The corpus (939,556 lines, 34,929 paginated pages, 114,231 tokens, 24,587 unique words) was tokenized and analyzed for semantic structure in the frequency domain.
The primary test (semantic-embed-test.py) builds co-occurrence vectors from the raw corpus (window=5, vocab ≥10 occurrences) and measures mean cosine similarity between frequency-adjacent word pairs vs. 200 shuffled baselines.
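
The script itself is not reproduced here. The following is a minimal sketch of the kind of test described, not the author's implementation. Assumptions: all function names are hypothetical, frequency ties are broken alphabetically, co-occurrence counts are raw (unweighted), and the reported ratio is taken to be the observed mean similarity divided by the mean of the shuffled baselines.

```python
import math
import random
import statistics
from collections import Counter, defaultdict


def cooccurrence_vectors(tokens, window=5, min_count=10):
    """Build sparse co-occurrence vectors for words with >= min_count occurrences."""
    counts = Counter(tokens)
    vocab = {w for w, c in counts.items() if c >= min_count}
    vectors = defaultdict(Counter)
    for i, word in enumerate(tokens):
        if word not in vocab:
            continue
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i and tokens[j] in vocab:
                vectors[word][tokens[j]] += 1
    # Frequency-ranked vocabulary; ties broken alphabetically (an assumption).
    ranked = sorted(vocab, key=lambda w: (-counts[w], w))
    return ranked, vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    if len(a) > len(b):
        a, b = b, a
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def mean_adjacent_similarity(order, vectors):
    """Mean cosine similarity between neighbors in the given word order."""
    if len(order) < 2:
        raise ValueError("need at least two words")
    sims = [cosine(vectors[a], vectors[b]) for a, b in zip(order, order[1:])]
    return sum(sims) / len(sims)


def adjacency_z_score(tokens, window=5, min_count=10, n_shuffles=200, seed=0):
    """Compare frequency-adjacent similarity against shuffled word orders."""
    ranked, vectors = cooccurrence_vectors(tokens, window, min_count)
    observed = mean_adjacent_similarity(ranked, vectors)
    rng = random.Random(seed)
    baseline = []
    for _ in range(n_shuffles):
        shuffled = ranked[:]
        rng.shuffle(shuffled)
        baseline.append(mean_adjacent_similarity(shuffled, vectors))
    mu = statistics.mean(baseline)
    sigma = statistics.stdev(baseline)
    z = (observed - mu) / sigma if sigma else 0.0
    ratio = observed / mu if mu else float("nan")
    return z, ratio
```

Shuffling the word order while keeping the vectors fixed means the baseline differs from the observed statistic only in which pairs count as adjacent, so the Z-score isolates the effect of frequency adjacency.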
Result: Z=27.89, p=1.74×10⁻¹⁷¹, ratio=1.028. Words adjacent in the frequency list are semantically closer than chance. The effect concentrates in mid-frequency and rare words, not in common words where high co-occurrence is trivially expected:
Top 1000 (common): Z=−0.08 · 1k–5k (mid): Z=2.21 · 5k–10k (uncommon): Z=10.25 · 10k+ (rare): Z=5.63
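
As a consistency check on the headline figures, a Z-score can be converted to a p-value under a standard normal null. The source does not state which tail convention was used. The sketch below assumes a one-sided upper tail, and the helper name `one_sided_p_from_z` is hypothetical.

```python
import math


def one_sided_p_from_z(z: float) -> float:
    """Upper-tail probability of a standard normal at z (assumed convention)."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


p = one_sided_p_from_z(27.89)
print(f"{p:.2e}")
```

For Z=27.89 this gives a value on the order of 1.7×10⁻¹⁷¹, in line with the reported p. A two-sided convention would give twice that.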
Secondary tests (POS bigrams Z=0.70, n-gram coherence Z=1.31, POS fragments Z=−2.42, WordNet overlap indeterminate, GrammaticalQ 0 hits) did not reach significance. The co-occurrence result stands alone.
Page references. For each word, all occurrences are located in the corpus. A sliding-window density score (neighbors within ±1% of corpus span) identifies the most locally concentrated regions. The 81 densest pages are selected with a minimum-gap constraint to ensure coverage across the full corpus.
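
The page-selection step can be sketched as follows. This is an illustration under assumptions, not the contents of scripts/gen-appendix.py: positions are taken to be line offsets, a page is assumed to be a fixed number of lines, a page's score is the highest density of any occurrence on it, and selection is greedy. The names `local_density`, `select_pages`, `lines_per_page`, and `min_gap` are hypothetical.

```python
from bisect import bisect_left, bisect_right


def local_density(positions, corpus_span, fraction=0.01):
    """For each occurrence, count the other occurrences within +/- fraction of the span.

    `positions` must be sorted in ascending order.
    """
    half = corpus_span * fraction
    return [
        bisect_right(positions, p + half) - bisect_left(positions, p - half) - 1
        for p in positions
    ]


def select_pages(positions, corpus_span, lines_per_page, k=81, min_gap=1, fraction=0.01):
    """Greedily pick up to k densest pages, at least min_gap pages apart."""
    positions = sorted(positions)
    scores = local_density(positions, corpus_span, fraction)
    best = {}
    for pos, score in zip(positions, scores):
        page = pos // lines_per_page  # assumed fixed-size pagination
        best[page] = max(best.get(page, 0), score)
    chosen = []
    # Visit pages from densest to sparsest; ties go to the earlier page.
    for page in sorted(best, key=lambda pg: (-best[pg], pg)):
        if all(abs(page - other) >= min_gap for other in chosen):
            chosen.append(page)
            if len(chosen) == k:
                break
    return sorted(chosen)
```

The minimum-gap check is what spreads the selection across the corpus. Without it, the k densest pages would tend to cluster around a single burst of occurrences.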
Reproducibility. All scripts are available in the repository: frequency-coherence/ (statistical tests), scripts/gen-appendix.py (this page).