The importer is still built on Streamlit, but it now renders as the Page Importer tab inside the shared app and is no longer the repository's main entrypoint.

Features

Upload a CSV of submitted URLs
Choose the URL column and optional title override column
Optionally map post type from the CSV or force a single post type
Scrape only the listed URLs
Extract title, publish date, author, body HTML, categories, and tags
Retry failed rows
Export a WordPress WXR XML file
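
To make the export step concrete, here is a minimal sketch of what a WXR document looks like when built with Python's standard library. It is not the importer's actual exporter: the function name build_wxr and the item dictionary keys (title, author, body_html, post_type, status, categories) are hypothetical, and a real WXR file carries more channel and item fields than are shown here.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by WordPress WXR 1.2 export files.
NAMESPACES = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.2/",
}
for prefix, uri in NAMESPACES.items():
    ET.register_namespace(prefix, uri)


def _q(prefix, name):
    """Return the namespace-qualified tag name ElementTree expects."""
    return f"{{{NAMESPACES[prefix]}}}{name}"


def build_wxr(items, site_title="Imported content"):
    """Build a minimal WXR string from a list of item dicts (hypothetical shape)."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, _q("wp", "wxr_version")).text = "1.2"

    for entry in items:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry["title"]
        ET.SubElement(item, _q("dc", "creator")).text = entry.get("author", "")
        # ElementTree escapes the HTML instead of wrapping it in CDATA;
        # both forms parse back to the same text.
        ET.SubElement(item, _q("content", "encoded")).text = entry.get("body_html", "")
        ET.SubElement(item, _q("wp", "post_type")).text = entry.get("post_type", "post")
        # Assumption: draft is the default status, as described in the Notes.
        ET.SubElement(item, _q("wp", "status")).text = entry.get("status", "draft")
        for name in entry.get("categories", []):
            category = ET.SubElement(
                item,
                "category",
                {"domain": "category", "nicename": name.lower().replace(" ", "-")},
            )
            category.text = name

    return ET.tostring(rss, encoding="unicode")
```

Calling build_wxr with a list of scraped items returns the XML as a string, which could then be written to a file for the WordPress importer.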

Recommended Usage

Run the root application from this directory:

streamlit run ../app.py

Or run the combined Docker container from the repository root.

Standalone Usage

If you need to run this importer by itself:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py

On Windows PowerShell:

python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -r requirements.txt
streamlit run app.py

CSV Input

The app accepts a CSV file with any set of columns. After uploading it, you choose:

the URL column to scrape
an optional title or name column to override the scraped title
an optional post type column with values like post or page
an optional category column whose values are appended during export

You can also enter manual categories in the sidebar; they are appended to every exported item.
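
The column mapping described above can be sketched roughly as follows. This is an illustration using only the standard library, not the importer's own code: the function build_rows, its parameters, and the output keys are hypothetical, and the order in which manual and per-row categories are combined is an assumption.

```python
import csv
import io


def build_rows(csv_text, url_col, title_col=None, post_type_col=None,
               category_col=None, forced_post_type=None,
               default_post_type="post", manual_categories=()):
    """Turn CSV text into one import job per row that has a URL."""
    jobs = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        url = (row.get(url_col) or "").strip()
        if not url:
            continue  # rows without a URL cannot be scraped

        title = (row.get(title_col) or "").strip() if title_col else ""

        # A forced post type wins over the per-row column value.
        if forced_post_type:
            post_type = forced_post_type
        elif post_type_col:
            post_type = (row.get(post_type_col) or "").strip().lower()
        else:
            post_type = ""

        # Assumption: manual categories come first, then the row's own value.
        categories = list(manual_categories)
        if category_col:
            value = (row.get(category_col) or "").strip()
            if value and value not in categories:
                categories.append(value)

        jobs.append({
            "url": url,
            "title_override": title or None,
            "post_type": post_type or default_post_type,
            "categories": categories,
        })
    return jobs
```

For example, with a CSV whose columns are named link, name, kind, and section, build_rows(text, "link", title_col="name", post_type_col="kind", category_col="section") yields one job per non-empty link.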

Notes

Exported posts default to draft unless changed in the UI
Image and link URLs remain pointed at the source site
Some themes need the heuristic fallback. The Force heuristic scraping option skips the JSON-LD-first extraction and relies on the page structure instead
In the combined app, dependencies come from the root requirements.txt
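
As an illustration of the JSON-LD-first strategy and its heuristic fallback mentioned in the notes, the sketch below extracts only a title. It assumes the structured title lives in a JSON-LD headline field and that the heuristic falls back to the first h1 and then the title tag; the names extract_title and force_heuristic are hypothetical, and the importer's real rules may differ.

```python
import json
from html.parser import HTMLParser


class _PageScanner(HTMLParser):
    """Collects JSON-LD script bodies plus the first <h1> and <title> text."""

    def __init__(self):
        super().__init__()
        self.json_ld_blocks = []
        self.h1 = ""
        self.title = ""
        self._capture = None  # which field incoming text belongs to

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._capture = "json_ld"
            self.json_ld_blocks.append("")
        elif tag == "h1" and not self.h1:
            self._capture = "h1"
        elif tag == "title" and not self.title:
            self._capture = "title"

    def handle_endtag(self, tag):
        if tag in ("script", "h1", "title"):
            self._capture = None

    def handle_data(self, data):
        if self._capture == "json_ld":
            self.json_ld_blocks[-1] += data
        elif self._capture == "h1":
            self.h1 += data
        elif self._capture == "title":
            self.title += data


def extract_title(html, force_heuristic=False):
    """Return (title, source), trying JSON-LD first unless forced to skip it."""
    scanner = _PageScanner()
    scanner.feed(html)
    scanner.close()

    if not force_heuristic:
        for block in scanner.json_ld_blocks:
            try:
                data = json.loads(block)
            except json.JSONDecodeError:
                continue  # malformed metadata: try the next block
            candidates = data if isinstance(data, list) else [data]
            for item in candidates:
                if isinstance(item, dict) and item.get("headline"):
                    return str(item["headline"]).strip(), "json-ld"

    # Heuristic path: rely on visible page structure.
    return scanner.h1.strip() or scanner.title.strip(), "heuristic"
```

Passing force_heuristic=True mirrors the Force heuristic scraping option: the JSON-LD blocks are ignored even when they are present and valid.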