DOCX imports can carry years of hidden formatting debt. This cleanup playbook prioritizes structure, paragraph integrity, and heading consistency so final PDF and EPUB exports stay predictable.
Run structure cleanup before visual cleanup
Start by fixing document structure: chapter headings, scene separators, and front/back matter boundaries.
If structure is unstable, every later formatting pass produces noisy warnings and false regressions.
Do not tune typography first. Lock structure first, then clean presentation-level issues.
Normalize heading hierarchy and remove duplicates
Keep one heading pattern per level (for example: chapter = Heading 1, scene = body text plus divider).
Delete duplicate chapter lines created by copy/paste imports. Duplicate headings create TOC jumps and navigation drift.
If a chapter title appears twice, once as plain text and once with a heading style, keep a single authoritative heading line and delete the other.
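The duplicate check described above can be sketched in a few lines of Python. This is a minimal sketch, not a definitive tool: it assumes you have already extracted the paragraphs into (style name, text) pairs with whatever DOCX library you use, and the names `find_duplicate_headings` and `normalize` are hypothetical. It only flags candidates for review, since a body line can legitimately match a chapter title.

```python
from typing import List, Tuple

# Assumed input shape: one (style name, text) pair per paragraph.
Paragraph = Tuple[str, str]


def normalize(text: str) -> str:
    """Collapse whitespace and ignore case so near-identical titles match."""
    return " ".join(text.split()).casefold()


def find_duplicate_headings(
    paragraphs: List[Paragraph], heading_style: str = "Heading 1"
) -> List[int]:
    """Return indexes of paragraphs that repeat the text of a styled heading.

    The first paragraph that carries the heading style is treated as the
    authoritative line. Any other paragraph with the same text is reported,
    whether it is plain text or a second styled heading.
    """
    heading_texts = {
        normalize(text)
        for style, text in paragraphs
        if style == heading_style and text.strip()
    }
    seen = set()
    duplicates = []
    for index, (style, text) in enumerate(paragraphs):
        key = normalize(text)
        if key not in heading_texts:
            continue
        if style == heading_style and key not in seen:
            seen.add(key)  # first styled occurrence: keep it
            continue
        duplicates.append(index)
    return duplicates


sample = [
    ("Normal", "Chapter One"),      # plain-text copy left by a paste
    ("Heading 1", "Chapter One"),   # authoritative heading
    ("Normal", "It was raining."),
    ("Heading 1", "Chapter  one"),  # repeated heading with stray spacing
    ("Heading 1", "Chapter Two"),
]
print(find_duplicate_headings(sample))  # prints [0, 3]
```

Review each reported index by hand before deleting anything in the source DOCX.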
Remove manual spacing and tab hacks
Replace repeated blank lines, tabs-for-indent, and manual line breaks with normal paragraph styling.
Legacy DOCX files often use visual hacks that collapse differently across print and EPUB outputs.
Any spacing achieved with repeated returns is fragile; convert it to style-based paragraph spacing.
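To locate these hacks before fixing them, a small scan over the paragraph text can help. The sketch below assumes each paragraph has been extracted as a plain string and that manual line breaks show up as a newline or vertical-tab character inside that string, which depends on your extraction tool. The function name `find_spacing_hacks` and the label strings are hypothetical.

```python
import re
from typing import List, Tuple

# Leading tab, or two or more leading spaces, used as a fake indent.
MANUAL_INDENT = re.compile(r"^(\t| {2,})")


def find_spacing_hacks(paragraphs: List[str]) -> List[Tuple[int, str]]:
    """Return (paragraph index, label) pairs for likely manual spacing."""
    findings = []
    for index, text in enumerate(paragraphs):
        if not text.strip():
            # A paragraph with no visible text is usually a repeated return.
            findings.append((index, "empty-paragraph"))
            continue
        if MANUAL_INDENT.match(text):
            findings.append((index, "manual-indent"))
        if "\n" in text or "\v" in text:
            findings.append((index, "manual-line-break"))
    return findings


sample = [
    "First paragraph.",
    "",
    "\tIndented with a tab.",
    "    Indented with spaces.",
    "Line one\nline two",
    "Clean paragraph.",
]
for index, label in find_spacing_hacks(sample):
    print(index, label)
```

The scan only reports locations. The fix itself still belongs in the paragraph styles of the source document.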
Clear hidden artifacts that break export flow
Turn on hidden characters in your source editor and remove stray section breaks, page breaks, and mixed list markers.
Watch for pasted web content that carries inline style noise and nonstandard characters, such as non-breaking spaces or zero-width spaces.
If one chapter behaves differently, inspect it for direct formatting overrides before changing global settings.
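Invisible characters are hard to spot by eye, so a short script can list them for review. This is a sketch under stated assumptions: it treats the non-breaking space and every character in the Unicode "Cf" (format) category as suspect, which is a choice made here for illustration, and the function name `find_suspect_characters` is hypothetical. Some of these characters are legitimate in certain scripts and in emoji sequences, so treat the output as a review list and not as a delete list.

```python
import unicodedata
from typing import List, Tuple

NO_BREAK_SPACE = "\u00a0"


def find_suspect_characters(text: str) -> List[Tuple[int, str, str]]:
    """Return (offset, code point, Unicode name) for each suspect character.

    Flags the no-break space plus all format characters (category "Cf"),
    which covers zero-width spaces, soft hyphens, and byte order marks.
    """
    hits = []
    for offset, char in enumerate(text):
        if char == NO_BREAK_SPACE or unicodedata.category(char) == "Cf":
            code_point = f"U+{ord(char):04X}"
            hits.append((offset, code_point, unicodedata.name(char, "UNKNOWN")))
    return hits


sample = "Chapter\u00a0One\u200b"
for offset, code_point, name in find_suspect_characters(sample):
    print(offset, code_point, name)
# prints:
# 7 U+00A0 NO-BREAK SPACE
# 11 U+200B ZERO WIDTH SPACE
```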
Re-export once, then triage with a failure-mode checklist
Generate one fresh release-candidate export after cleanup and inspect chapter starts, TOC links, and paragraph rhythm.
Classify every defect as a structure, spacing, heading, or artifact issue, and fix it in the source DOCX, not in the exported files.
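If you track the triage in a script instead of a spreadsheet, the four categories map naturally onto a small data structure. The sketch below is one possible shape, not a prescribed format: the `Defect` fields, the `summarize` function, and the sample entries are all hypothetical.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Category(Enum):
    STRUCTURE = "structure"
    SPACING = "spacing"
    HEADING = "heading"
    ARTIFACT = "artifact"


@dataclass(frozen=True)
class Defect:
    location: str       # where the defect shows up in the export
    note: str           # what was observed
    category: Category  # exactly one of the four failure modes


def summarize(defects: List[Defect]) -> Dict[str, int]:
    """Count defects per category, including categories with zero hits."""
    counts = Counter(defect.category for defect in defects)
    return {category.value: counts.get(category, 0) for category in Category}


# Illustrative entries only, not real findings.
defects = [
    Defect("Chapter 3 start", "chapter begins mid-page", Category.STRUCTURE),
    Defect("TOC", "two entries for Chapter 5", Category.HEADING),
    Defect("TOC", "link lands on the wrong page", Category.HEADING),
    Defect("Chapter 7", "stray page break after paragraph 2", Category.ARTIFACT),
]
print(summarize(defects))
# prints {'structure': 1, 'spacing': 0, 'heading': 2, 'artifact': 1}
```

A category that dominates the summary points to the cleanup step worth repeating before the next export.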