# Page Importer

This folder contains the WordPress import tool used by the combined application in the repository root.

The importer still uses Streamlit internally, but it is now rendered as the `Page Importer` tab inside the shared app rather than being the main entrypoint for the repository.

## Features

- Upload a CSV of submitted URLs
- Choose the URL column and optional title override column
- Optionally map post type from the CSV or force a single post type
- Scrape only the listed URLs
- Extract title, publish date, author, body HTML, categories, and tags
- Retry failed rows
- Export a WordPress WXR XML file

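As a rough illustration of the export step, the sketch below builds a minimal WXR document with the Python standard library. The function name `build_wxr`, the item dictionary keys, and the reduced set of elements are assumptions made for illustration, not the importer's actual code. A real export typically carries more metadata, such as post IDs, dates, and slugs.

```python
import xml.etree.ElementTree as ET

# Namespaces used by WordPress WXR 1.2 export files.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "dc": "http://purl.org/dc/elements/1.1/",
    "wp": "http://wordpress.org/export/1.2/",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)


def q(prefix, tag):
    """Return a namespace-qualified tag name for ElementTree."""
    return f"{{{NS[prefix]}}}{tag}"


def build_wxr(items, status="draft"):
    """Build a minimal WXR string from item dicts (hypothetical shape)."""
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, q("wp", "wxr_version")).text = "1.2"
    for item in items:
        node = ET.SubElement(channel, "item")
        ET.SubElement(node, "title").text = item["title"]
        ET.SubElement(node, q("dc", "creator")).text = item.get("author", "")
        # ElementTree escapes the HTML body instead of wrapping it in CDATA.
        ET.SubElement(node, q("content", "encoded")).text = item["body_html"]
        ET.SubElement(node, q("wp", "post_type")).text = item.get("post_type", "post")
        ET.SubElement(node, q("wp", "status")).text = status
        for name in item.get("categories", []):
            # Assumed slug rule: lowercase, spaces replaced by hyphens.
            slug = name.lower().replace(" ", "-")
            cat = ET.SubElement(node, "category", {"domain": "category", "nicename": slug})
            cat.text = name
    return ET.tostring(rss, encoding="unicode")
```

Calling `build_wxr([{"title": "Hello", "body_html": "<p>Hi</p>"}])` returns an XML string with one `draft` item of type `post`.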

## Recommended Usage

Run the root application from this folder:

```bash
streamlit run ../app.py
```

Or run the combined Docker container from the repository root.

## Standalone Usage

If you need to run this importer by itself:

```bash
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
streamlit run app.py
```

On Windows PowerShell:

```powershell
python -m venv .venv
.venv\Scripts\Activate.ps1
pip install -r requirements.txt
streamlit run app.py
```

## CSV Input

The app accepts CSV files with any columns. You choose:

- the URL column to scrape
- an optional title or name column to override the scraped title
- an optional post type column with values like `post` or `page`
- an optional category column whose values are appended during export

You can also add manual categories in the sidebar to append them to every exported item.
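The column choices above can be sketched roughly as follows. This is an illustrative example, not the importer's code: `rows_from_csv`, its parameter names, the job dictionary keys, the fallback to `post` when no type is given, and the handling of each category cell as a single category are all assumptions.

```python
import csv
import io


def cell(row, column):
    """Return a stripped cell value, or '' when the column is unset or missing."""
    if not column:
        return ""
    return (row.get(column) or "").strip()


def rows_from_csv(text, url_col, title_col=None, type_col=None,
                  category_col=None, forced_type=None, manual_categories=()):
    """Map raw CSV rows to import jobs using the chosen columns (hypothetical helper)."""
    jobs = []
    for row in csv.DictReader(io.StringIO(text)):
        url = cell(row, url_col)
        if not url:
            continue  # nothing to scrape for this row

        categories = []
        row_category = cell(row, category_col)
        if row_category:
            categories.append(row_category)
        # Manual categories are appended to every item.
        categories.extend(manual_categories)

        jobs.append({
            "url": url,
            # None means "keep the scraped title".
            "title_override": cell(row, title_col) or None,
            # A forced type wins over the CSV value; "post" is an assumed default.
            "post_type": forced_type or cell(row, type_col) or "post",
            "categories": categories,
        })
    return jobs
```

For example, `rows_from_csv(text, "link", title_col="name", manual_categories=["Imported"])` would scrape the `link` column, use `name` as the title where it is filled in, and tag every item with `Imported`.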

## Notes

- Exported posts default to `draft` unless changed in the UI
- Image and link URLs still point at the source site
- Some themes need the heuristic fallback. The `Force heuristic scraping` option skips JSON-LD-first extraction and relies on page structure instead
- In the combined app, dependencies come from the root `requirements.txt`

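To make the JSON-LD-first behaviour concrete, here is a minimal sketch of the idea for a single field (the title), using only the standard library. It assumes the page exposes a schema.org `headline` in a JSON-LD script and that the first `<h1>` is an acceptable heuristic; `extract_title` and the `force_heuristic` flag are hypothetical names, and the importer's real extraction logic may differ.

```python
import json
from html.parser import HTMLParser


class _PageScanner(HTMLParser):
    """Collect JSON-LD script bodies and the first <h1> text from a page."""

    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []
        self.h1_text = None
        self._capture = None  # "jsonld", "h1", or None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and attrs.get("type") == "application/ld+json":
            self._capture, self._buffer = "jsonld", []
        elif tag == "h1" and self.h1_text is None:
            self._capture, self._buffer = "h1", []

    def handle_data(self, data):
        if self._capture:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if self._capture == "jsonld" and tag == "script":
            self.jsonld_blocks.append("".join(self._buffer))
            self._capture = None
        elif self._capture == "h1" and tag == "h1":
            self.h1_text = "".join(self._buffer).strip()
            self._capture = None


def extract_title(html, force_heuristic=False):
    """Return (title, source), trying JSON-LD first unless it is skipped."""
    scanner = _PageScanner()
    scanner.feed(html)
    if not force_heuristic:
        for block in scanner.jsonld_blocks:
            try:
                data = json.loads(block)
            except json.JSONDecodeError:
                continue  # malformed structured data: try the next block
            if isinstance(data, dict) and data.get("headline"):
                return data["headline"], "json-ld"
    # Heuristic fallback: rely on page structure (first <h1>).
    return scanner.h1_text, "heuristic"
```

Passing `force_heuristic=True` mirrors the effect of skipping structured data when a theme's JSON-LD is missing or misleading.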