Research Log Entry — Corpus Assembly Method
title: "Corpus Assembly Method"
author: Mark Rabideau, Khoj, Ardens Braid (Human + AI)
date: September 2025
Context
The Ardens Project requires a unified text corpus (ardens_corpus.txt) to serve as a shared substrate for retrieval, synthesis, and reflexive analysis. Since project materials are authored across multiple Markdown files, a reproducible process is needed to assemble these fragments into a single, analyzable document.
Method
A Bash script concatenates all Markdown files in the repository into a single corpus file. To preserve provenance, each file's contents are preceded by a marker line containing its relative path; to keep the ordering deterministic, files are processed in sorted path order.
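Both properties can be checked from the shell. The sketch below assumes the marker format `===== ./path =====` written by the script that follows; the function names `sorted_paths`, `list_sources`, and `extract_source` are hypothetical helpers, not part of the project. `sorted_paths` shows byte-wise (C locale) ordering. Plain `sort` collates according to the current locale, so the same file list can come out in a different order on machines with different locale settings, and forcing `LC_ALL=C` is one way to avoid that. The other two helpers read provenance back out of an assembled corpus. One caveat: a line inside a source file that happens to match the marker pattern would be misread as a boundary.

```bash
#!/bin/bash
# Hypothetical helpers for checking ordering and provenance of the corpus.

# Print the given paths in byte-wise (C locale) order, one per line.
sorted_paths() {
    printf '%s\n' "$@" | LC_ALL=C sort
}

# Print the source path recorded in every marker line of a corpus file.
list_sources() {
    sed -n 's/^===== \(.*\) =====$/\1/p' "$1"
}

# Print the section of a corpus file that came from one source path.
# Usage: extract_source CORPUS PATH
# The wanted marker is passed via the environment so awk does not
# reinterpret backslashes in the path.
extract_source() {
    WANT="===== $2 =====" awk '
        /^===== .* =====$/ { inside = ($0 == ENVIRON["WANT"]); next }
        inside { print }
    ' "$1"
}

# Example: in the C locale, uppercase letters sort before lowercase.
sorted_paths ./notes/b.md ./README.md ./notes/a.md ./Notes.md
# ./Notes.md
# ./README.md
# ./notes/a.md
# ./notes/b.md

# Usage against an assembled corpus:
#   list_sources ardens_corpus.txt
#   extract_source ardens_corpus.txt ./README.md
```

The extracted section includes the trailing blank lines that separate files in the corpus.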
Script
```bash
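#!/bin/bash
# Corpus Assembly Script
# Concatenates all Markdown files into a unified corpus
# Adds file markers for traceability and uses sorted order for reproducibility
set -euo pipefail

output="ardens_corpus.txt"

# -print0 | sort -z | read -d '' keeps paths with spaces or backslashes intact;
# LC_ALL=C makes the sort byte-wise, so the order does not depend on the locale.
find . -type f -name "*.md" -print0 | LC_ALL=C sort -z |
while IFS= read -r -d '' file; do
    echo "===== $file ====="   # provenance marker line
    cat "$file"
    printf '\n\n'              # blank-line separator between files
done > "$output"               # truncates any previous corpus before writing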