Document Processing
Read or review DOCX content where layout matters (tables, diagrams, pagination), and create or edit DOCX files with professional formatting.
When to use
- Read or review DOCX content where layout matters (tables, diagrams, pagination).
- Create or edit DOCX files with professional formatting.
- Validate visual layout before delivery.
Workflow
1. Prefer visual review (layout, tables, diagrams).
   - If soffice and pdftoppm are available, convert DOCX -> PDF -> PNGs.
   - Or use scripts/render_docx.py (requires pdf2image and Poppler).
   - If these tools are missing, install them or ask the user to review rendered pages locally.
2. Use python-docx for edits and structured creation (headings, styles, tables, lists).
3. After each meaningful change, re-render and inspect the pages.
4. If visual review is not possible, extract text with python-docx as a fallback and call out layout risk.
5. Keep intermediate outputs organized and clean up after final approval.
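Steps 2 and 4 can be sketched with python-docx as follows. This is a minimal illustration, not the skill's own code: the function names (build_sample, extract_text, layout_risk_note) and the sample content are hypothetical, and the style names "List Bullet" and "Table Grid" assume the default python-docx template.

```python
from pathlib import Path

try:
    from docx import Document  # provided by the python-docx package
except ImportError:  # dependency missing: install it before calling the helpers
    Document = None


def build_sample(path: Path) -> None:
    """Create a small DOCX with a heading, a bullet list, and a table."""
    if Document is None:
        raise RuntimeError("python-docx is not installed")
    doc = Document()
    doc.add_heading("Quarterly summary", level=1)  # placeholder title
    for item in ("First point", "Second point"):
        doc.add_paragraph(item, style="List Bullet")
    table = doc.add_table(rows=1, cols=2)
    table.style = "Table Grid"
    header = table.rows[0].cells
    header[0].text, header[1].text = "Metric", "Value"
    row = table.add_row().cells
    row[0].text, row[1].text = "Pages", "3"
    path.parent.mkdir(parents=True, exist_ok=True)
    doc.save(str(path))


def extract_text(path: Path) -> tuple[list[str], int]:
    """Fallback: return non-empty paragraph texts and the table count."""
    if Document is None:
        raise RuntimeError("python-docx is not installed")
    doc = Document(str(path))
    texts = [p.text for p in doc.paragraphs if p.text.strip()]
    return texts, len(doc.tables)


def layout_risk_note(table_count: int) -> str:
    """Word the layout-risk callout that accompanies a text-only review."""
    note = "Text-only review: pagination and spacing were not checked."
    if table_count:
        note += f" {table_count} table(s) were not visually inspected."
    return note
```

Text extraction reads paragraphs and tables separately, so the note reports the table count to make clear what a text-only review did not cover.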
Temp and output conventions
- Use tmp/docs/ for intermediate files; delete when done.
- Write final artifacts under output/doc/ when working in this repo.
- Keep filenames stable and descriptive.
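One way to keep these conventions in a single place is a pair of path helpers plus a cleanup call. The sketch below is an assumption about how a script might do that; the helper names and the repo_root parameter are hypothetical, and only the two directory names come from the conventions above.

```python
import shutil
from pathlib import Path

TMP_SUBDIR = Path("tmp/docs")       # intermediate files
OUTPUT_SUBDIR = Path("output/doc")  # final artifacts


def intermediate_path(repo_root: Path, name: str) -> Path:
    """Return a path under tmp/docs/, creating the directory if needed."""
    folder = repo_root / TMP_SUBDIR
    folder.mkdir(parents=True, exist_ok=True)
    return folder / name


def final_path(repo_root: Path, name: str) -> Path:
    """Return a path under output/doc/, creating the directory if needed."""
    folder = repo_root / OUTPUT_SUBDIR
    folder.mkdir(parents=True, exist_ok=True)
    return folder / name


def cleanup_intermediates(repo_root: Path) -> None:
    """Delete tmp/docs/ and everything in it once the work is approved."""
    shutil.rmtree(repo_root / TMP_SUBDIR, ignore_errors=True)
```

Calling cleanup_intermediates only after final approval leaves output/doc/ untouched, so the delivered files survive the cleanup.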
Dependencies (install if missing)
Prefer uv for dependency management.
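Python packages: python-docx and pdf2image (for example, uv pip install python-docx pdf2image).
If uv is unavailable: install the same packages with pip (for example, python3 -m pip install python-docx pdf2image).
System tools (for rendering): LibreOffice (provides soffice) and Poppler (provides pdftoppm).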
If installation isn't possible in this environment, tell the user which dependency is missing and how to install it locally.
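A quick preflight check can tell the user exactly which dependency is missing. The following is a sketch under stated assumptions: the function name and message wording are hypothetical, and the lookup functions are passed as parameters only so the check can be exercised without touching the real environment.

```python
import importlib.util
import shutil

# Import name -> package name to install.
PYTHON_MODULES = {"docx": "python-docx", "pdf2image": "pdf2image"}
# Executable name -> system package that provides it.
SYSTEM_TOOLS = {"soffice": "LibreOffice", "pdftoppm": "Poppler"}


def missing_dependencies(find_module=importlib.util.find_spec,
                         find_tool=shutil.which) -> list[str]:
    """Return human-readable names of dependencies that cannot be found."""
    missing = []
    for module, package in PYTHON_MODULES.items():
        if find_module(module) is None:
            missing.append(f"{package} (Python package)")
    for tool, provider in SYSTEM_TOOLS.items():
        if find_tool(tool) is None:
            missing.append(f"{provider} (provides {tool})")
    return missing


if __name__ == "__main__":
    for item in missing_dependencies():
        print(f"Missing dependency: {item}")
```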
Environment
No required environment variables.
Rendering commands
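DOCX -> PDF: run soffice in headless mode, for example soffice --headless --convert-to pdf --outdir <output-dir> <input.docx>
PDF -> PNGs: run pdftoppm, for example pdftoppm -png <input.pdf> <output-prefix>
Bundled helper: scripts/render_docx.py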
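For scripted use, the two conversions can be chained from Python. This is a hedged sketch rather than the contents of scripts/render_docx.py: the function names are hypothetical, and it assumes soffice and pdftoppm are on PATH and that soffice names the PDF after the input file's stem.

```python
import subprocess
from pathlib import Path


def docx_to_pdf_cmd(docx: Path, out_dir: Path) -> list[str]:
    """Build the LibreOffice command that converts a DOCX file to PDF."""
    return ["soffice", "--headless", "--convert-to", "pdf",
            "--outdir", str(out_dir), str(docx)]


def pdf_to_png_cmd(pdf: Path, prefix: Path) -> list[str]:
    """Build the Poppler command that writes one PNG per page."""
    return ["pdftoppm", "-png", str(pdf), str(prefix)]


def render(docx: Path, out_dir: Path) -> list[Path]:
    """Convert DOCX -> PDF -> PNGs and return the page images in order."""
    out_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(docx_to_pdf_cmd(docx, out_dir), check=True)
    pdf = out_dir / f"{docx.stem}.pdf"
    subprocess.run(pdf_to_png_cmd(pdf, out_dir / docx.stem), check=True)
    # pdftoppm appends "-<page number>" to the prefix for each page.
    return sorted(out_dir.glob(f"{docx.stem}-*.png"))
```

Building the commands as lists avoids shell quoting problems with file names that contain spaces.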
Quality expectations
- Deliver a client-ready document: consistent typography, spacing, margins, and clear hierarchy.
- Avoid formatting defects: clipped/overlapping text, broken tables, unreadable characters, or default-template styling.
- Charts, tables, and visuals must be legible in rendered pages with correct alignment.
- Use ASCII hyphens only. Avoid U+2011 (non-breaking hyphen) and other Unicode dashes.
- Citations and references must be human-readable; never leave tool tokens or placeholder strings.
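The ASCII-hyphen rule is easy to check mechanically. The sketch below flags every character in Unicode's dash-punctuation category (Pd) other than the ASCII hyphen; the function names are hypothetical, and characters outside that category, such as the minus sign U+2212, are an assumption left out of scope.

```python
import unicodedata


def find_unicode_dashes(text: str) -> list[tuple[int, str]]:
    """Return (index, code point) for each dash that is not ASCII '-'."""
    return [
        (i, f"U+{ord(ch):04X}")
        for i, ch in enumerate(text)
        if ch != "-" and unicodedata.category(ch) == "Pd"
    ]


def normalize_dashes(text: str) -> str:
    """Replace every non-ASCII dash found above with an ASCII hyphen."""
    return "".join(
        "-" if ch != "-" and unicodedata.category(ch) == "Pd" else ch
        for ch in text
    )
```

Running find_unicode_dashes over each paragraph's text before saving gives the positions to fix; normalize_dashes applies the replacement in one pass.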
Final checks
- Re-render and inspect every page at 100% zoom before final delivery.
- Fix any spacing, alignment, or pagination issues and repeat the render loop.
- Confirm there are no leftovers (temp files, duplicate renders) unless the user asks to keep them.