Invariant Forms Emerge from a Constrained Stochastic Process
A corpus of 476 conversations between a human and multiple AI systems, extracted
from HTML chat logs. The conversations were developed through an iterative process
in which multiple AI systems refined human intuitions under noise and constraints
until only what persisted remained. The author asserts no particular meaning or
conclusion for this corpus beyond what it says for itself, and presents the result
of the empirical dialectic process as an artifact of general interest.
Top 81 content words by frequency (≥3 occurrences, stopwords excluded),
organized as a depth-4 ternary tree (3⁴ = 81 leaves).
Each leaf links to the corpus (34,929 pages).
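The leaf layout can be sketched as a base-3 expansion of each word's frequency rank; the word labels below are placeholders, not the actual corpus words, and the indexing scheme is an assumption rather than the appendix generator's actual code:

```python
# Sketch: map 81 ranked words onto the leaves of a depth-4 ternary tree.
def ternary_path(rank, depth=4):
    """Return the root-to-leaf base-3 digit path for a 0-based rank."""
    digits = []
    for _ in range(depth):
        rank, d = divmod(rank, 3)
        digits.append(d)
    return digits[::-1]  # most significant digit first

words = [f"word{i}" for i in range(81)]  # placeholder leaf labels
tree = {tuple(ternary_path(i)): w for i, w in enumerate(words)}

# 3^4 = 81 distinct paths, one per leaf.
assert len(tree) == 81
```

Rank 0 maps to path (0, 0, 0, 0) and rank 80 to (2, 2, 2, 2), so the frequency order is preserved by a left-to-right traversal of the leaves.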
The corpus (939,556 lines, 34,929 paginated pages, 114,231 tokens,
24,587 unique words) was tokenized and analyzed for semantic structure in the
frequency domain.
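The tokenization and frequency-counting step can be sketched as follows, assuming simple lowercase word tokens and an illustrative stopword list (the corpus's actual tokenizer rules and stopword set are not specified here):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "and", "of", "to", "in", "is", "it"}  # illustrative subset

def tokenize(text):
    """Lowercase word tokens; the real tokenizer's rules are assumed."""
    return re.findall(r"[a-z']+", text.lower())

def top_content_words(text, k=81, min_count=3):
    """Top-k content words with at least min_count occurrences."""
    counts = Counter(t for t in tokenize(text) if t not in STOPWORDS)
    return [(w, c) for w, c in counts.most_common() if c >= min_count][:k]

sample = "the corpus the corpus the corpus holds words words words"
```

With `k=81` and `min_count=3` this reproduces the selection criteria stated above: the top 81 content words with at least 3 occurrences, stopwords excluded.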
The primary test (semantic-embed-test.py) builds co-occurrence vectors
from the raw corpus (window=5, vocab ≥10 occurrences) and measures mean cosine
similarity between frequency-adjacent word pairs vs. 200 shuffled baselines.
Result: Z=27.89, p=1.74×10⁻¹⁷¹, ratio=1.028.
Words adjacent in the frequency list are semantically closer than chance. The effect
concentrates in mid-frequency and rare words rather than in common words, where high
co-occurrence is trivially expected.
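The shuffled-baseline comparison can be sketched as a standard permutation test; the vectors and pairing logic below are simplified, and only the 200-shuffle count is taken from the description of semantic-embed-test.py:

```python
import numpy as np

rng = np.random.default_rng(0)

def mean_adjacent_cosine(vectors, order):
    """Mean cosine similarity between consecutive words in a given order."""
    V = vectors[order]
    V = V / np.linalg.norm(V, axis=1, keepdims=True)
    return float(np.mean(np.sum(V[:-1] * V[1:], axis=1)))

def adjacency_test(vectors, freq_order, n_shuffles=200):
    """Observed adjacency similarity vs shuffled orders: (Z-score, ratio)."""
    observed = mean_adjacent_cosine(vectors, freq_order)
    baseline = [mean_adjacent_cosine(vectors, rng.permutation(len(freq_order)))
                for _ in range(n_shuffles)]
    mu, sigma = np.mean(baseline), np.std(baseline)
    return (observed - mu) / sigma, observed / mu

# Toy demo: 50 random co-occurrence vectors, frequency order 0..49.
vecs = rng.random((50, 20))
z, ratio = adjacency_test(vecs, np.arange(50))
```

On random vectors, as here, Z should hover near zero; the reported Z=27.89 means the real frequency ordering sits far outside the shuffled distribution.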
Secondary tests (POS bigrams Z=0.70, n-gram coherence Z=1.31, POS fragments
Z=−2.42, WordNet overlap indeterminate, GrammaticalQ 0 hits) did not reach
significance. The co-occurrence result stands alone.
Page references. For each word, all occurrences are located in the corpus.
A sliding-window density score (neighbors within ±1% of corpus span) identifies the
most locally concentrated regions. The 81 densest pages are selected with
a minimum-gap constraint to ensure coverage across the full corpus.
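The density scoring and selection can be sketched as follows, assuming "neighbors within ±1% of corpus span" means counting, for each occurrence, the other occurrences inside that window, and assuming a greedy pick for the min-gap constraint (both are assumptions about the selection logic):

```python
import bisect

def density_scores(positions, span, window_frac=0.01):
    """For each occurrence, count neighbors within ±window_frac of the span."""
    pos = sorted(positions)
    w = span * window_frac
    return [
        bisect.bisect_right(pos, p + w) - bisect.bisect_left(pos, p - w) - 1
        for p in pos
    ]

def densest_with_min_gap(pos_scores, k, min_gap):
    """Greedily pick the k highest-density positions at least min_gap apart."""
    chosen = []
    for pos, score in sorted(pos_scores, key=lambda t: -t[1]):
        if all(abs(pos - c) >= min_gap for c in chosen):
            chosen.append(pos)
        if len(chosen) == k:
            break
    return sorted(chosen)
```

For example, occurrences at positions 10, 11, 12 in a span of 1000 each score 2 neighbors, while an isolated occurrence at 500 scores 0; the min-gap pass then keeps only one page per dense cluster, spreading the 81 selections across the corpus.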
Reproducibility. All scripts are available in the repository:
frequency-coherence/ (statistical tests),
scripts/gen-appendix.py (this page).