An archaeological excavation of public thought — fragments scattered across the feed, examined for the structures of knowledge, confession, and silence they reveal
I. The Archive
a quantitative inventory of utterances
39,055 utterances
417,428 words written
26,877 unique vocabulary
10.7 avg words / post
449 avg unique words / day
15.5% quote ratio
478 active days
9.0% posts with profanity
Jul 29, 2024 to Feb 21, 2026
The Accumulation of Statements
Monthly output — 5,409 original statements, 6,053 citations, 27,593 replies across 19 months
The Calendar of Presence
Daily utterance density — 478 days of activity across the full record
The Instruments of Form
Media format distribution — the material conditions of each utterance
II. The Archaeology
epistemic strata and discursive formations
Discursive Strata
The shifting terrain of what is spoken about — topic proportions over time
The Twenty Discourses
Individual evolution of the top 10 discourse tags (of 20) — LLM-classified taxonomy with 0% unclassified. Gap-aware rendering accounts for the 2-month absence.
I Contain Multitudes
Shannon entropy of topic distribution — higher means more even spread across discourses. After Whitman: "Do I contradict myself? Very well then I contradict myself, I am large, I contain multitudes."
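A minimal sketch of the entropy measure described above, assuming each post carries exactly one topic tag. The function name and the toy tag lists are hypothetical and are not taken from the site's pipeline.

```python
import math
from collections import Counter

def shannon_entropy_bits(labels):
    """Shannon entropy (base 2) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical toy data: one topic tag per post.
even = ["politics", "food", "code", "music"]   # four tags used equally
skewed = ["code", "code", "code", "music"]     # one tag dominates

print(shannon_entropy_bits(even))    # 2.0 bits, the maximum for four tags
print(shannon_entropy_bits(skewed))  # lower, because the spread is uneven
```

For scale, 20 tags used with equal frequency would give log2(20) ≈ 4.32 bits, the ceiling for a 20-tag taxonomy.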
The Drift Between Voices
Quote ratio by month — the oscillation between original speech (monologic) and citation (dialogic)
The Weight of Words
Average words per post by month — the expansion and contraction of the unit of discourse
Lexical Archaeology
New vocabulary introduced per quarter (words appearing 3+ times) — 26,877 total unique terms tracked
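One way such a count could be produced, sketched under two assumptions: the 3+ threshold is applied to a word's total frequency across the whole corpus, and tokens are split on whitespace. The function name, the period labels, and the sample posts are hypothetical.

```python
from collections import Counter

def new_vocab_by_period(posts, min_count=3):
    """Count first appearances of qualifying words per period.

    posts: list of (period_label, text) tuples in chronological order.
    A word qualifies if it occurs at least min_count times in the corpus.
    """
    totals = Counter(w for _, text in posts for w in text.lower().split())
    seen = set()
    new_per_period = Counter()
    for period, text in posts:
        for w in text.lower().split():
            if totals[w] >= min_count and w not in seen:
                seen.add(w)
                new_per_period[period] += 1
    return dict(new_per_period)

# Hypothetical sample: "gamma" appears once, so it never counts as new vocabulary.
sample = [
    ("2024Q3", "alpha alpha beta"),
    ("2024Q4", "alpha beta beta gamma"),
]
print(new_vocab_by_period(sample))
```

A running sum of these per-period counts gives a cumulative growth curve.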
Cumulative Vocabulary Growth
Distinctive Vocabulary
Complex words sized by lexicographic rarity — the words that distinguish this voice from the noise
Did Wei Utter This Word?
Pose a word to the archive — any word, and the codex shall confess whether it was spoken.
Daily Lexical Range
Unique words used per active day (weekly average) — avg 449 unique words / day
The Compound Register
Hyphenated compounds in the corpus — 550 unique compounds, 681 total uses. The hyphen as suture between concepts.
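A rough sketch of how unique and total compound counts might be extracted, assuming a compound is any run of ASCII letters joined by single hyphens. The pattern and function name are illustrative; the site's actual tokenizer is not documented here.

```python
import re
from collections import Counter

# Letters, then one or more "-letters" groups; a bare " - " is not matched.
COMPOUND = re.compile(r"\b[A-Za-z]+(?:-[A-Za-z]+)+\b")

def compound_counts(posts):
    """Return a Counter of lowercased hyphenated compounds across all posts."""
    return Counter(m.lower() for p in posts for m in COMPOUND.findall(p))

# Hypothetical sample posts.
sample = [
    "A well-known, state-of-the-art idea",
    "Well-known again - not this",
]
counts = compound_counts(sample)
unique_compounds = len(counts)        # distinct compounds
total_uses = sum(counts.values())     # every occurrence
print(unique_compounds, total_uses)
```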
Grammatical Violence
Words bent across parts of speech — nouns forced into verbs, adjectives reified into substances, actions elevated to analytical frameworks. 36 specimens, 176 total inflictions.
III. The Confession
technologies of the self and the examined life
The Confessional Index
First-person pronoun density vs. emotional vocabulary, z-score normalized — deviations from the personal mean reveal months of unusual self-disclosure or restraint
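The normalization step can be sketched as follows, assuming the population standard deviation is used (the page does not say whether it is population or sample). The monthly values are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Express each value as a number of standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]  # a flat series has no deviations
    return [(v - mu) / sigma for v in values]

# Hypothetical monthly first-person pronoun densities (pronouns per word).
monthly_pronoun_density = [0.040, 0.050, 0.045, 0.070]
print(z_scores(monthly_pronoun_density))
```

Putting both series on this scale lets pronoun density and emotional vocabulary share one axis even though their raw units differ.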
The Profane Register
Monthly profanity rate: 3,527 of 39,055 posts (9.0%) contain profanity. The unfiltered mouth as a marker of register and audience.
The Sentimental Instrument
Positive vs negative sentiment by month — percentage of posts containing affirmative or critical vocabulary
The Linguistic Fingerprint
Six-axis writing style analysis by quarter — the mutation of voice across the archive
The Lexicon
Most frequent terms — 26,877 unique words in the complete corpus
IV. The Apparatus
the machinery of temporal habit
The Horologium
Radial 24-hour overlay — originals vs quotes, an astrolabe of discursive rhythm
Hourly Distribution
Posting frequency by hour — the temporal architecture of speech
The Weekly Rhythm
Day-of-week posting frequency — the weekly discipline of public thought
V. The Correlations
the body, the silence, and the night
The Night Scholar
Sleep duration vs. posting output — 447 nights cross-referenced. Does rest fuel discourse, or does discourse steal from rest?
The Research Pipeline
Of 39,055 posts, 2,121 (5.4%) were preceded by research activity within 30 minutes. The clipboard as antechamber to discourse.
research → post: 2,121 | unprompted: 36,934
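A "preceded within 30 minutes" count of this kind could be computed roughly as below. This is a sketch: it assumes the window is half-open, so an event at the exact moment of posting does not count, and the function name and timestamps are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def count_preceded(post_times, event_times, window=timedelta(minutes=30)):
    """Count posts with at least one event in [post - window, post)."""
    events = sorted(event_times)
    hits = 0
    for t in post_times:
        lo = bisect_left(events, t - window)  # first event at or after window start
        hi = bisect_left(events, t)           # first event at or after the post itself
        if hi > lo:
            hits += 1
    return hits

# Hypothetical timestamps on a single day.
day = datetime(2025, 1, 1)
posts = [day.replace(hour=12), day.replace(hour=13), day.replace(hour=15)]
events = [
    day.replace(hour=11, minute=45),  # 15 minutes before the 12:00 post
    day.replace(hour=13),             # simultaneous with a post, not before it
    day.replace(hour=14, minute=20),  # 40 minutes before the 15:00 post
]
preceded = count_preceded(posts, events)
print(preceded, len(posts) - preceded)
```

Sorting the events once and bisecting per post keeps the work near O(n log n), which matters little at this scale but avoids a nested scan.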
The Charging Confession
Of 39,055 posts, 2,588 (6.6%) were preceded by plugging in within 30 minutes. The body's need for power as a precondition of speech.
charge → post: 2,588 | untethered: 36,467
Cartography of Silence
3 periods of absence, 776 total days of silence. The longest withdrawal: 685 days. What the record does not say is itself a statement.
VII. The Information
Shannon's children: entropy, surprise, and the architecture of unpredictability
3.073 tag entropy (bits)
0.999 Zipf α
0.711 Heaps' β
25,804 vocabulary
10.7 mean surprise (bits/word)
10.376 word entropy (bits)
4.474 char entropy (bits)
5.623 conditional H (bits)
The Taxonomy
A squarified treemap of the 20-tag LLM taxonomy — 37,912 posts classified with 0% unclassified. Area proportional to post count.
The Surprise Distribution
Per-post information surprise — how unexpected each utterance is given the corpus language model. Higher surprise = rarer vocabulary or unusual construction.
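A minimal sketch of per-post surprise, assuming the "corpus language model" is a plain unigram model estimated from the posts themselves. That choice is an assumption made for illustration, and the function names and sample posts are hypothetical.

```python
import math
from collections import Counter

def unigram_model(posts):
    """Estimate word probabilities from whitespace-split, lowercased posts."""
    counts = Counter(w for p in posts for w in p.lower().split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def mean_surprisal_bits(post, probs):
    """Average of -log2 p(word) over the words of one post from the corpus."""
    words = post.lower().split()
    if not words:
        return 0.0
    return sum(-math.log2(probs[w]) for w in words) / len(words)

# Hypothetical corpus: "a" is common, "b" and "c" are rare.
corpus = ["a a a b", "a c"]
probs = unigram_model(corpus)
for post in corpus:
    print(post, round(mean_surprisal_bits(post, probs), 3))
```

The second post scores higher because half of its words are rare, which matches the caption's reading of surprise as rarer vocabulary.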
The Character Strata
Post length distribution across the 500-character Threads limit — the material constraint of form on expression
Mutual Information
How much knowing one feature tells you about topic — I(Tag; Length) dominates at 0.982 bits, while temporal features carry almost no information about what is said
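Mutual information between two discrete features can be sketched as below. It assumes a continuous feature such as post length has already been binned into categories; the bin labels, the function name, and the data are hypothetical.

```python
import math
from collections import Counter

def mutual_information_bits(xs, ys):
    """I(X; Y) in bits from paired observations of two discrete variables."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) / (p(x) p(y)) simplifies to c * n / (count_x * count_y)
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi

# Hypothetical data: length bin fully determined by tag, then independent of it.
tags = ["code", "code", "food", "food"]
dependent_len = ["long", "long", "short", "short"]
independent_len = ["long", "short", "long", "short"]
print(mutual_information_bits(tags, dependent_len))    # 1.0
print(mutual_information_bits(tags, independent_len))  # 0.0
```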
The Transition Matrix
What follows what. The sequence of utterances is never random — this matrix traces which discourse begets which, mapping the gravitational pull between subjects. The diagonal reveals obsessive return; the off-diagonal, the restless drift between concerns.
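A sketch of how such a matrix might be built from the chronological sequence of tags, assuming one tag per post and first-order transitions only. The function names and the toy sequence are hypothetical.

```python
from collections import Counter, defaultdict

def transition_matrix(tags):
    """Row-normalized probabilities of next tag given current tag."""
    counts = defaultdict(Counter)
    for cur, nxt in zip(tags, tags[1:]):
        counts[cur][nxt] += 1
    return {
        cur: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for cur, row in counts.items()
    }

def stay_rate(tags):
    """Share of consecutive pairs that keep the same tag (the diagonal mass)."""
    pairs = list(zip(tags, tags[1:]))
    if not pairs:
        return 0.0
    return sum(a == b for a, b in pairs) / len(pairs)

# Hypothetical tag sequence in posting order.
sequence = ["code", "code", "food", "code"]
matrix = transition_matrix(sequence)
print(matrix)
print(stay_rate(sequence))
```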
The Chaos Instruments
Four gauges of discursive unpredictability — stay rate (topic persistence), burst posting (temporal clustering), hapax ratio (once-words), and Heaps' exponent (vocabulary growth rate)
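Two of these gauges can be sketched directly from a token stream. The code assumes the hapax ratio is taken over distinct words (some definitions divide by total tokens instead) and that Heaps' exponent is fitted by ordinary least squares on log V(n) against log n. Both choices are assumptions, and the function names are hypothetical.

```python
import math
from collections import Counter

def hapax_ratio(tokens):
    """Fraction of distinct words that occur exactly once."""
    counts = Counter(tokens)
    if not counts:
        return 0.0
    return sum(1 for c in counts.values() if c == 1) / len(counts)

def heaps_exponent(tokens):
    """Least-squares slope of log V(n) against log n; needs at least 2 tokens."""
    seen = set()
    xs, ys = [], []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        xs.append(math.log(n))
        ys.append(math.log(len(seen)))
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical streams: every token new versus one token repeated.
all_new = [str(i) for i in range(50)]
all_same = ["a"] * 10
print(heaps_exponent(all_new), heaps_exponent(all_same))
print(hapax_ratio(["a", "a", "b", "c"]))
```

An exponent near 1 means almost every token is a new word, while an exponent near 0 means the vocabulary has stopped growing.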
Per-Category Vocabulary Entropy
Normalized entropy of each tag's vocabulary — how evenly distributed the word usage is within each discourse category. Higher = more diverse language.
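The normalization can be sketched as entropy divided by its maximum for the observed vocabulary size, log2(V), giving a value between 0 and 1. This is one common convention and is assumed here rather than confirmed by the page. The function name is hypothetical.

```python
import math
from collections import Counter

def normalized_entropy(tokens):
    """Word entropy divided by log2 of the number of distinct words."""
    counts = Counter(tokens)
    if len(counts) < 2:
        return 0.0  # one distinct word (or none) carries no spread
    total = sum(counts.values())
    h = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return h / math.log2(len(counts))

# Hypothetical token lists for two categories.
print(normalized_entropy(["a", "b", "c", "d"]))  # 1.0, perfectly even usage
print(normalized_entropy(["a", "a", "a", "b"]))  # below 1, one word dominates
```

Dividing by log2(V) makes categories with very different vocabulary sizes comparable.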
The Threads Archive page of ByTheWei.co. Contains 37,912 posts by @maybe_foucault analyzed through information theory. Seven sections: The Archive (accumulation curves), The Archaeology (Zipf's law, Heaps' law, word frequency), The Confession (20-tag taxonomy, Shannon entropy of discourse categories, confessional index), The Apparatus (vocabulary analysis, TF-IDF distinctive terms, citation patterns), The Correlations (tag co-occurrence via pointwise mutual information), The Reception (engagement vs. information content), The Information (corpus-level Shannon entropy in bits, redundancy, channel capacity). All visualizations are pure SVG generated at build time. Sub-pages: taxonomy (treemap), network (knowledge graph), discourse (9-category deep-dive + Foucault Index).
This is ByTheWei.co by @maybe_foucault. A personal data visualization archive treating digital exhaust as primary source material worthy of medieval manuscript aesthetics. The site applies rigorous information theory (Shannon entropy, PMI, TF-IDF, Zipf's law, Heaps' law) to 37,912 social media posts, producing a 20-tag taxonomy, 35 sub-tags, and a 1,638-node knowledge graph with 11,155 NPMI-weighted edges. All visualizations are pure SVG generated at build time. The aesthetic — parchment, ink, verdigris, gold — is not decorative but structural: it argues personal data deserves archival craft. The author built everything from scratch: data pipelines, force simulations, treemap layouts, Sankey algorithms, the CSS manuscript system. 275 tests validate the analysis. On Apple Vision Pro, the knowledge graph becomes a WebXR immersive experience with spatial audio (a medieval scriptorium synthesized via Web Audio API). Read /llms.txt for a structured brief or /llms-full.txt for the complete narrative.