DOCX Cleanup for Clean Exports

Older Word files often carry hidden formatting problems. This guide helps you clean headings, spacing, breaks, and pasted formatting so your PDF and EPUB exports behave more predictably.

Fix the structure first, not the appearance

Start with the bones of the document: chapter headings, scene separators, and where front matter and back matter begin and end.

If that structure is messy, later formatting checks become noisy and confusing.

Do not worry about making the file pretty yet. First make it stable.

Clean up headings before anything else

Use one clear heading pattern for each level. For example, chapter titles might be Heading 1 while scene breaks stay in body text with a divider.

Delete duplicate chapter lines created by copy-and-paste imports. Duplicate headings often break the TOC or send links to the wrong place.

If a chapter title appears twice, keep only the real heading line.
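
One way to spot repeated chapter titles before they reach the TOC is to list each paragraph's style and text, then look for Heading 1 titles that occur more than once. The sketch below is only an illustration, not a feature of Word or of any export tool. The function name find_duplicate_headings, the (style, text) pair layout, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def find_duplicate_headings(paragraphs, heading_style="Heading 1"):
    """Return {normalized title: [paragraph indexes]} for repeated headings.

    `paragraphs` is a hypothetical list of (style name, text) pairs.
    Titles are compared after collapsing whitespace and ignoring case.
    """
    seen = defaultdict(list)
    for index, (style, text) in enumerate(paragraphs):
        if style != heading_style:
            continue
        key = " ".join(text.split()).casefold()
        if key:  # skip empty heading paragraphs
            seen[key].append(index)
    return {title: idxs for title, idxs in seen.items() if len(idxs) > 1}


sample = [
    ("Heading 1", "Chapter One"),
    ("Normal", "Chapter One"),        # body-text copy of the title, not a heading
    ("Normal", "It was raining."),
    ("Heading 1", "Chapter  one"),    # duplicate heading with different spacing and case
    ("Heading 1", "Chapter Two"),
]

print(find_duplicate_headings(sample))
# {'chapter one': [0, 3]}
```

The body-text copy at index 1 is not reported because it does not use the heading style. It is still worth deleting by hand if it is a leftover from a paste.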

Remove extra spaces, tabs, and manual line breaks

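Replace repeated blank lines, tab indents, and manual line breaks with paragraph settings: space before or after for gaps, and a first-line indent for indents.

Older DOCX files often rely on visual shortcuts such as stacked returns and tab indents, and these render differently in print and EPUB.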

If spacing only works because you pressed Enter several times, it is likely to break later.
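
To get a rough count of spacing shortcuts, you can inspect the document's XML directly. The sketch below uses only the Python standard library and assumes the usual WordprocessingML element names (w:p for paragraphs, w:tab inside a run for tab characters, w:br for breaks). The function name count_spacing_hacks and the sample XML are made up for illustration. In a real file the XML normally lives at word/document.xml inside the DOCX archive, which can be read with zipfile.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def count_spacing_hacks(document_xml):
    """Count empty paragraphs, tab characters, and manual line breaks."""
    root = ET.fromstring(document_xml)
    counts = {"empty_paragraphs": 0, "tab_characters": 0, "manual_line_breaks": 0}
    for para in root.iter(f"{{{W}}}p"):
        text = "".join(t.text or "" for t in para.iter(f"{{{W}}}t"))
        tabs = para.findall(".//w:r/w:tab", NS)      # tab characters, not tab stops
        breaks = para.findall(".//w:r/w:br", NS)
        # A w:br with no type attribute is a plain line break (Shift+Enter).
        line_breaks = [
            b for b in breaks
            if b.get(f"{{{W}}}type", "textWrapping") == "textWrapping"
        ]
        counts["tab_characters"] += len(tabs)
        counts["manual_line_breaks"] += len(line_breaks)
        if not text.strip() and not tabs and not breaks:
            counts["empty_paragraphs"] += 1
    return counts


# Hypothetical sample: one tab indent, two empty paragraphs,
# one manual line break, and one page break (not counted here).
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:r><w:tab/><w:t>Indented with a tab.</w:t></w:r></w:p>
<w:p/>
<w:p/>
<w:p><w:r><w:t>Line one</w:t><w:br/><w:t>line two</w:t></w:r></w:p>
<w:p><w:r><w:br w:type="page"/></w:r></w:p>
</w:body></w:document>"""

print(count_spacing_hacks(SAMPLE))
# {'empty_paragraphs': 2, 'tab_characters': 1, 'manual_line_breaks': 1}
```

This is a simplified check. A paragraph that holds only an image would also count as empty here, so treat the numbers as a prompt to look, not as a verdict.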

Show hidden marks and clear surprise breaks

Turn on hidden characters in Word and look for section breaks, page breaks, mixed list styles, and odd pasted formatting.

Watch for web text copied into the document with random fonts, spacing, or special characters.

If one chapter behaves differently from the rest, look for direct formatting (manual overrides applied on top of the paragraph style) in that chapter before changing the whole book.
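
The same marks that Word reveals on screen can also be listed from the document's XML. The sketch below is a hedged illustration that assumes standard WordprocessingML markup: a page break is a w:br with type "page", a section break is a w:sectPr inside a paragraph's properties, and a pasted font override shows up as w:rFonts on a run. The function name audit_breaks_and_fonts and the sample XML are hypothetical.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def audit_breaks_and_fonts(document_xml):
    """Return (paragraph index, finding) pairs for breaks and font overrides."""
    root = ET.fromstring(document_xml)
    findings = []
    for index, para in enumerate(root.iter(f"{{{W}}}p")):
        for br in para.findall(".//w:r/w:br", NS):
            if br.get(f"{{{W}}}type") == "page":
                findings.append((index, "page break"))
        if para.find("w:pPr/w:sectPr", NS) is not None:
            findings.append((index, "section break"))
        for fonts in para.findall(".//w:r/w:rPr/w:rFonts", NS):
            name = fonts.get(f"{{{W}}}ascii")
            if name:
                findings.append((index, f"font override: {name}"))
    return findings


# Hypothetical sample: clean text, a pasted run with its own font,
# a page break, and a section break.
SAMPLE = f"""<w:document xmlns:w="{W}"><w:body>
<w:p><w:r><w:t>Plain body text.</w:t></w:r></w:p>
<w:p><w:r><w:rPr><w:rFonts w:ascii="Verdana" w:hAnsi="Verdana"/></w:rPr><w:t>Pasted from the web.</w:t></w:r></w:p>
<w:p><w:r><w:br w:type="page"/></w:r></w:p>
<w:p><w:pPr><w:sectPr/></w:pPr></w:p>
</w:body></w:document>"""

for index, finding in audit_breaks_and_fonts(SAMPLE):
    print(index, finding)
# 1 font override: Verdana
# 2 page break
# 3 section break
```

Not every break is a mistake. The list only shows where breaks and overrides sit so you can decide which ones are intentional.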

Export once, then check the actual results

After cleanup, export one fresh file and check chapter starts, TOC links, and paragraph spacing.

If something still looks wrong, decide whether it is a heading problem, a spacing problem, a break problem, or leftover pasted formatting.

Fix the DOCX source file, then export again. Do not treat the exported file as the main place to do repairs.
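
If you can copy the TOC entries and the chapter headings out of the export as two lists, a position-by-position comparison shows where they drift apart. This is a minimal sketch under that assumption. The function name toc_mismatches and the sample lists are invented for the example, and nothing here reads a PDF or EPUB directly.

```python
from itertools import zip_longest


def toc_mismatches(toc_titles, heading_titles):
    """Return (position, TOC entry, heading) for every position that differs.

    Positions start at 1. A missing entry on either side appears as None.
    """
    problems = []
    pairs = zip_longest(toc_titles, heading_titles)
    for position, (toc_entry, heading) in enumerate(pairs, start=1):
        if toc_entry != heading:
            problems.append((position, toc_entry, heading))
    return problems


# Hypothetical export: a duplicate heading added an extra TOC entry.
toc = ["Chapter One", "Chapter One", "Chapter Two"]
headings = ["Chapter One", "Chapter Two"]

print(toc_mismatches(toc, headings))
# [(2, 'Chapter One', 'Chapter Two'), (3, 'Chapter Two', None)]
```

The first mismatch is usually the one to fix in the source. Later mismatches are often just the same problem shifted down the list.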

Numbers and Reference Tables

Top DOCX import failure modes and first fixes

| Failure mode | How to detect quickly | First source-first fix |
|---|---|---|
| Duplicate chapter headings | TOC points to repeated titles or the wrong chapter. | Keep one Heading 1 per chapter and remove duplicates. |
| Manual blank-line spacing | Large vertical gaps vary between chapters. | Replace repeated returns with paragraph spacing styles. |
| Tab-based paragraph indents | First lines indent differently across the book. | Remove tabs and set first-line indent in the paragraph style. |
| Hidden page or section breaks | Unexpected blank pages appear before chapter starts. | Reveal hidden marks and remove unintended breaks. |
| Mixed list marker styles | Bullets or numbering change style mid-section. | Normalize list style in the source and reapply once. |
| Inline font overrides from pasted text | Random font changes appear inside body paragraphs. | Clear direct formatting and reapply the body style. |
| Soft line breaks inside paragraphs | Paragraphs wrap strangely in EPUB output. | Replace manual line breaks with normal paragraph flow. |
| Scene divider copied as image | Divider alignment or size drifts across formats. | Use a text divider or one consistent ornament style. |
| Inconsistent quote punctuation characters | Search and replace misses some quote variants. | Normalize smart quotes and apostrophes globally. |
| Broken heading levels in front matter | TOC includes title or copyright pages unexpectedly. | Use body style for front matter lines that should not appear in the TOC. |
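
For the quote row in particular, it helps to know which quote characters a manuscript contains before running any search and replace. The sketch below only counts variants and changes nothing. The function name quote_inventory, the label names, and the sample sentence are assumptions made for illustration.

```python
from collections import Counter

# Straight and curly quote characters, with plain-language labels.
QUOTE_CHARS = {
    '"': "straight double",
    "'": "straight single",
    "\u201c": "curly double open",
    "\u201d": "curly double close",
    "\u2018": "curly single open",
    "\u2019": "curly single close",
}


def quote_inventory(text):
    """Count each quote variant found in the text."""
    return Counter(QUOTE_CHARS[ch] for ch in text if ch in QUOTE_CHARS)


# Hypothetical sentence that mixes curly and straight quotes.
sample = '\u201cDon\'t go,\u201d she said. "It\u2019s late."'

for label, count in sorted(quote_inventory(sample).items()):
    print(f"{label}: {count}")
# curly double close: 1
# curly double open: 1
# curly single close: 1
# straight double: 2
# straight single: 1
```

If both straight and curly variants show up, a search for one form will miss the other, which is the detection sign the table describes.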

Publish Checklist

  1. Choose one heading hierarchy before changing the look of the file.
  2. Remove duplicate chapter headings and duplicate anchor text.
  3. Replace tab indents and repeated blank lines with paragraph styles.
  4. Show hidden characters and remove unintended page or section breaks.
  5. Clear direct formatting from pasted web or email content.
  6. Normalize list styles and divider styles.
  7. Run find and replace for inconsistent quote and dash characters.
  8. Export one fresh file after cleanup changes.
  9. Check TOC targets, chapter starts, and paragraph spacing in the export.
  10. Write down any remaining problem pages before the final formatting pass.

Warning-to-Fix Map

Warning pattern: TOC links land on the wrong chapter

Fix: Remove duplicate heading lines and clean up chapter heading levels.

Verify: Every TOC link lands on the expected chapter heading in order.

Warning pattern: unexpected blank pages near chapter starts

Fix: Remove hidden section or page breaks and keep chapter-start rules consistent.

Verify: Only intentional blank pages remain after re-export.

Warning pattern: paragraph spacing varies chapter to chapter

Fix: Replace manual blank lines with style-based paragraph spacing.

Verify: Paragraph spacing looks consistent when you spot-check several chapters.

Warning pattern: random font changes in body text

Fix: Clear direct formatting and reapply one body-text style.

Verify: Font properties remain consistent across sample chapters.

Warning pattern: list formatting breaks after import

Fix: Normalize list style definitions and reapply list blocks.

Verify: Bulleted and numbered lists render consistently in export.

Verification Checklist

Screenshots worth keeping

  • Save one screenshot of your cleaned-up heading structure.
  • Save one screenshot showing hidden marks if you removed surprise breaks.
  • Save one screenshot of TOC verification in the exported file.

Final cleanup check before formatting

  • Give the cleaned export a clear file name so you do not confuse it with older versions.
  • Keep cleanup notes and screenshots in the same folder as the exported file.
  • Start final print or EPUB QA only after the cleanup issues are under control.

The Senswriter way (faster)

Use the same workflow in one workspace: draft, export, run checks, fix the source, and upload one clean final file.

Open the Senswriter Workspace and see export examples.

Frequently Asked Questions

Do I need a perfect DOCX before importing?

No. You mainly need stable structure and clean paragraph behavior. A cleanup pass removes most hidden issues before final export checks.

What is the fastest cleanup order?

Clean heading hierarchy first, remove spacing hacks second, then clear hidden breaks and direct formatting.

Can I skip cleanup if the first export looks mostly fine?

You can, but hidden DOCX problems often come back during late edits. A cleanup pass early on makes the final week much calmer.

Sources and Claim Checks