A Genealogy of Discourse

An archaeological excavation of public thought — fragments scattered across the feed, examined for the structures of knowledge, confession, and silence they reveal

I The Archive

a quantitative inventory of utterances

39,055 utterances
417,428 words written
26,877 unique vocabulary
10.7 avg words / post
449 avg unique words / day
15.5% quote ratio
478 active days
9.0% posts with profanity

Feb 21, 26 — Jul 29, 24

The Accumulation of Statements

Monthly output — 5,409 original statements, 6,053 citations, 27,593 replies across 19 months

The Calendar of Presence

Daily utterance density — 478 days of activity across the full record

The Instruments of Form

Media format distribution — the material conditions of each utterance

II The Archaeology

epistemic strata and discursive formations

Discursive Strata

The shifting terrain of what is spoken about — topic proportions over time

The Twenty Discourses

Individual evolution of the top 10 discourse tags (of 20) — LLM-classified taxonomy with 0% unclassified. Gap-aware rendering accounts for the 2-month absence.

I Contain Multitudes

Shannon entropy of topic distribution — higher means more even spread across discourses. After Whitman: "Do I contradict myself? Very well then I contradict myself, I am large, I contain multitudes."
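As a rough illustration of the measure, the sketch below computes Shannon entropy in bits from a list of topic labels. The tag names are hypothetical and not drawn from the archive; the page does not publish its own implementation, so this is only one plausible reading. For a 20-tag taxonomy the ceiling would be log2(20) ≈ 4.32 bits, reached only when every tag is used equally often.

```python
import math
from collections import Counter

def shannon_entropy_bits(labels):
    """Shannon entropy (in bits) of the empirical distribution of labels."""
    counts = Counter(labels)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

# Hypothetical tags for illustration only.
even = ["politics", "theory", "tech", "humor"]   # four tags, equally used
skewed = ["politics"] * 7 + ["theory"]           # one tag dominates

print(shannon_entropy_bits(even))    # 2.0
print(shannon_entropy_bits(skewed))  # lower: the distribution is uneven
```

An even spread over four tags gives exactly 2 bits; the skewed list scores well below that, which is the sense in which higher entropy means more multitudes.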

The Drift Between Voices

Quote ratio by month — the oscillation between original speech (monologic) and citation (dialogic)

The Weight of Words

Average words per post by month — the expansion and contraction of the unit of discourse

Lexical Archaeology

New vocabulary introduced per quarter (words appearing 3+ times) — 26,877 total unique terms tracked
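One way such a count could be produced is sketched below. It assumes, without confirmation from the page, that a word qualifies once it appears 3 or more times in the whole corpus and is credited to the quarter of its first appearance. The tokenizer and sample posts are hypothetical.

```python
import re
from collections import Counter
from datetime import date

def new_vocab_by_quarter(posts, min_count=3):
    """posts: iterable of (date, text). Returns {(year, quarter): new word count}."""
    totals = Counter()
    first_seen = {}
    for day, text in sorted(posts, key=lambda p: p[0]):
        for word in re.findall(r"[a-z']+", text.lower()):
            totals[word] += 1
            first_seen.setdefault(word, day)  # keep the earliest date only
    per_quarter = Counter()
    for word, day in first_seen.items():
        if totals[word] >= min_count:  # assumed threshold: 3+ uses overall
            per_quarter[(day.year, (day.month - 1) // 3 + 1)] += 1
    return dict(per_quarter)

# Hypothetical posts for illustration only.
sample = [
    (date(2024, 8, 1), "power power knowledge"),
    (date(2024, 11, 5), "power knowledge archive"),
    (date(2025, 1, 2), "knowledge archive archive rare"),
]
print(new_vocab_by_quarter(sample))  # {(2024, 3): 2, (2024, 4): 1}
```

The word "rare" is dropped because it occurs only once, which is how the threshold keeps typos and one-off coinages out of the quarterly tally.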

Cumulative Vocabulary Growth

Distinctive Vocabulary

Complex words sized by lexicographic rarity — the words that distinguish this voice from the noise

Did Wei Utter This Word?

Pose a word to the archive, any word, and the codex shall confess whether it was spoken.

Daily Lexical Range

Unique words used per active day (weekly average) — avg 449 unique words / day

The Compound Register

Hyphenated compounds in the corpus — 550 unique compounds, 681 total uses. The hyphen as suture between concepts.
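A minimal sketch of how unique and total compound counts might be tallied, assuming a compound is any run of alphabetic words joined by hyphens. The pattern and the sample sentences are illustrative assumptions, not the page's actual rule.

```python
import re
from collections import Counter

# Assumed definition: two or more alphabetic words joined by hyphens.
COMPOUND = re.compile(r"\b[a-zA-Z]+(?:-[a-zA-Z]+)+\b")

def compound_counts(texts):
    """Count each hyphenated compound (case-folded) across all texts."""
    counts = Counter()
    for text in texts:
        counts.update(m.lower() for m in COMPOUND.findall(text))
    return counts

# Hypothetical sentences for illustration only.
counts = compound_counts([
    "A well-known power-knowledge nexus",
    "Well-known, state-of-the-art",
])
print(len(counts), sum(counts.values()))  # 3 4  (unique compounds, total uses)
```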

Grammatical Violence

Words bent across parts of speech — nouns forced into verbs, adjectives reified into substances, actions elevated to analytical frameworks. 36 specimens, 176 total inflictions.

III The Confession

technologies of the self and the examined life

The Confessional Index

First-person pronoun density vs. emotional vocabulary, z-score normalized — deviations from the personal mean reveal months of unusual self-disclosure or restraint
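The normalization can be sketched as follows. This assumes the population standard deviation over all months; the page does not say whether it uses the population or the sample form, and the monthly values are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Express each value as standard deviations from the series mean."""
    mu = mean(values)
    sigma = pstdev(values)  # population std dev: an assumption
    if sigma == 0:
        return [0.0 for _ in values]  # a flat series has no deviations
    return [(v - mu) / sigma for v in values]

# Hypothetical monthly first-person pronoun densities (percent of words).
monthly_density = [4.0, 5.0, 6.0]
print([round(z, 2) for z in z_scores(monthly_density)])  # [-1.22, 0.0, 1.22]
```

Putting both series on this scale is what lets pronoun density and emotional vocabulary share one axis: a month at +1.2 is unusually confessional relative to the author's own baseline, whatever the raw units.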

The Profane Register

Monthly profanity rate: 3,527 of 39,055 posts (9.0%) contain profanity. The unfiltered mouth as a marker of register and audience.

The Sentimental Instrument

Positive vs negative sentiment by month — percentage of posts containing affirmative or critical vocabulary

The Linguistic Fingerprint

Six-axis writing style analysis by quarter — the mutation of voice across the archive

The Lexicon

Most frequent terms — 26,877 unique words in the complete corpus

IV The Apparatus

the machinery of temporal habit

The Horologium

Radial 24-hour overlay — originals vs quotes, an astrolabe of discursive rhythm

Hourly Distribution

Posting frequency by hour — the temporal architecture of speech

The Weekly Rhythm

Day-of-week posting frequency — the weekly discipline of public thought

V The Correlations

the body, the silence, and the night

The Night Scholar

Sleep duration vs. posting output — 447 nights cross-referenced. Does rest fuel discourse, or does discourse steal from rest?

The Research Pipeline

Of 39,055 posts, 2,121 (5.4%) were preceded by research activity within 30 minutes. The clipboard as antechamber to discourse.

research → post: 2,121; unprompted: 36,934
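A count of this kind could be obtained with a sorted lookup, sketched below. It assumes two lists of timestamps (posts and research events) and treats a post as preceded when the most recent earlier event falls within 30 minutes. The function name and sample times are hypothetical.

```python
from bisect import bisect_left
from datetime import datetime, timedelta

def count_preceded(post_times, event_times, window=timedelta(minutes=30)):
    """Count posts whose latest earlier event lies within `window` before them."""
    events = sorted(event_times)
    hits = 0
    for t in post_times:
        i = bisect_left(events, t)  # events[:i] are strictly earlier than t
        if i > 0 and t - events[i - 1] <= window:
            hits += 1
    return hits

# Hypothetical timestamps for illustration only.
day = datetime(2025, 1, 1)
posts = [day.replace(hour=10), day.replace(hour=10, minute=45), day.replace(hour=12)]
events = [day.replace(hour=9, minute=50), day.replace(hour=10, minute=10)]
print(count_preceded(posts, events))  # 1
```

Only the 10:00 post qualifies: its nearest prior event is 10 minutes old, while the 10:45 post trails its nearest event by 35 minutes. The reported share follows the same logic at scale, 2,121 / 39,055 ≈ 5.4%.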

The Charging Confession

Of 39,055 posts, 2,588 (6.6%) were preceded by plugging in within 30 minutes. The body's need for power as a precondition of speech.

charge → post: 2,588; untethered: 36,467

Cartography of Silence

3 periods of absence, 776 total days of silence. The longest withdrawal: 685 days. What the record does not say is itself a statement.

VII The Information

Shannon's children: entropy, surprise, and the architecture of unpredictability

3.123 tag entropy (bits)
0.998 Zipf α
0.711 Heaps' β
28,081 vocabulary
10.8 mean surprise (bits/word)
10.394 word entropy (bits)
4.471 char entropy (bits)
5.714 conditional H (bits)

The Taxonomy

A squarified treemap of the 20-tag LLM taxonomy — 43,175 posts classified with 0% unclassified. Area proportional to post count.

The Surprise Distribution

Per-post information surprise — how unexpected each utterance is given the corpus language model. Higher surprise = rarer vocabulary or unusual construction.
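The page does not specify its corpus language model. As a sketch, the example below assumes the simplest candidate, a unigram model built from the corpus itself, and scores each post by its mean surprisal, the average of -log2 p(word). Whitespace tokenization and the sample posts are illustrative assumptions.

```python
import math
from collections import Counter

def unigram_surprise(posts):
    """Mean surprisal (bits/word) of each post under a corpus unigram model."""
    tokenized = [p.lower().split() for p in posts]
    counts = Counter(w for toks in tokenized for w in toks)
    total = sum(counts.values())
    scores = []
    for toks in tokenized:
        if not toks:
            scores.append(0.0)  # empty post: nothing to be surprised by
            continue
        bits = sum(-math.log2(counts[w] / total) for w in toks)
        scores.append(bits / len(toks))
    return scores

# Hypothetical posts: the second uses a word seen only once.
scores = unigram_surprise(["the the the the", "the archive"])
print(scores[1] > scores[0])  # True
```

Under this assumed model a post made of common words scores near the floor, and a single rare word lifts the average sharply, which matches the caption's reading of surprise as rarity.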

The Character Strata

Post length distribution across the 500-character Threads limit — the material constraint of form on expression

Mutual Information

How much knowing one feature tells you about topic — I(Tag; Length) dominates at 0.982 bits, while temporal features carry almost no information about what is said
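For readers who want the quantity spelled out, here is a plug-in estimate of I(X; Y) in bits from paired observations. The length bins ("long", "short") and tags are hypothetical; how the page bins post length is not stated.

```python
import math
from collections import Counter

def mutual_information_bits(pairs):
    """Plug-in estimate of I(X; Y) in bits from a list of (x, y) observations."""
    n = len(pairs)
    joint = Counter(pairs)
    px = Counter(x for x, _ in pairs)
    py = Counter(y for _, y in pairs)
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in joint.items()
    )

# Hypothetical data: length fully determines tag, so I = H(tag) = 1 bit.
dependent = [("theory", "long"), ("humor", "short")] * 2
# Hypothetical data: every combination equally likely, so I = 0 bits.
independent = [("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]

print(mutual_information_bits(dependent))    # 1.0
print(mutual_information_bits(independent))  # 0.0
```

One caveat worth keeping in mind when reading the chart: plug-in estimates of this kind tend to be biased upward on finite samples, and more so when a feature has many bins, so a feature with fine-grained bins may look more informative than it is.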

The Transition Matrix

What follows what. The sequence of utterances is never random — this matrix traces which discourse begets which, mapping the gravitational pull between subjects. The diagonal reveals obsessive return; the off-diagonal, the restless drift between concerns.
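A matrix like this can be built by counting consecutive tag pairs and normalizing each row. The sketch below assumes a first-order view (each post conditioned only on the one before it) and uses hypothetical tags.

```python
from collections import Counter, defaultdict

def transition_matrix(tags):
    """Row-normalized probabilities of each tag following each other tag."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tags, tags[1:]):
        counts[prev][nxt] += 1
    return {
        prev: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for prev, row in counts.items()
    }

def diagonal_share(tags):
    """Share of consecutive pairs that stay on the same tag."""
    pairs = list(zip(tags, tags[1:]))
    return sum(a == b for a, b in pairs) / len(pairs) if pairs else 0.0

# Hypothetical tag sequence in posting order.
sequence = ["theory", "theory", "humor", "theory"]
print(transition_matrix(sequence))
# {'theory': {'theory': 0.5, 'humor': 0.5}, 'humor': {'theory': 1.0}}
print(diagonal_share(sequence))
```

Each row sums to 1, so a heavy diagonal cell means a discourse that mostly follows itself, and the diagonal share condenses that tendency into a single number for the whole sequence.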

The Chaos Instruments

Four gauges of discursive unpredictability — stay rate (topic persistence), burst posting (temporal clustering), hapax ratio (once-words), and Heaps' exponent (vocabulary growth rate)
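Two of these gauges can be sketched directly from a token stream. The code below assumes the hapax ratio is once-used words divided by vocabulary size (the page may divide by total tokens instead) and estimates Heaps' exponent β as the least-squares slope of log V(n) against log n, where V(n) is the vocabulary size after n tokens. The token lists are hypothetical.

```python
import math
from collections import Counter

def hapax_ratio(tokens):
    """Fraction of distinct words that occur exactly once (assumed definition)."""
    counts = Counter(tokens)
    return sum(1 for c in counts.values() if c == 1) / len(counts)

def heaps_beta(tokens):
    """Least-squares slope of log V(n) on log n; needs at least two tokens."""
    seen, xs, ys = set(), [], []
    for n, tok in enumerate(tokens, start=1):
        seen.add(tok)
        xs.append(math.log(n))
        ys.append(math.log(len(seen)))
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx

# Hypothetical streams: every token new (beta near 1) vs. one word repeated (beta 0).
all_new = [str(i) for i in range(50)]
print(round(heaps_beta(all_new), 3))        # 1.0
print(heaps_beta(["same"] * 50))            # 0.0
print(hapax_ratio(["a", "a", "b", "c"]))    # two of three words occur once
```

The two extremes bracket the reading of the gauge: β near 1 means vocabulary grows as fast as the text itself, β near 0 means it has stopped growing, and the reported 0.711 sits between them.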

Per-Category Vocabulary Entropy

Normalized entropy of each tag's vocabulary — how evenly distributed the word usage is within each discourse category. Higher = more diverse language.

The Threads Archive page of ByTheWei.co. Contains 37,912 posts by @maybe_foucault analyzed through information theory. Seven sections: The Archive (accumulation curves), The Archaeology (Zipf's law, Heaps' law, word frequency), The Confession (20-tag taxonomy, Shannon entropy of discourse categories, confessional index), The Apparatus (vocabulary analysis, TF-IDF distinctive terms, citation patterns), The Correlations (tag co-occurrence via pointwise mutual information), The Reception (engagement vs. information content), The Information (corpus-level Shannon entropy in bits, redundancy, channel capacity). All visualizations are pure SVG generated at build time. Sub-pages: taxonomy (treemap), network (knowledge graph), discourse (9-category deep-dive + Foucault Index).