Mediawiki maintenance

= Documentation =
<blockquote>
{| class="wikitable"
|-
| mediawiki static dump tools || https://meta.wikimedia.org/wiki/Static_version_tools
|-
| mediawiki dumpBackup xml || https://www.mediawiki.org/wiki/Manual:DumpBackup.php
|-
|}
</blockquote><!-- Documentation -->

= Backups =
<blockquote>
== Full Backups ==
<blockquote>
To create a full backup, you'll need to:

=== Backup Database ===
<syntaxhighlight lang="bash">
mysqldump -u wiki -pPASSWORD wikidb > ~/wikidb-backup.sql
</syntaxhighlight>
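
To restore, feed the dump back into <code>mysql</code>; a minimal sketch, assuming the same <code>wiki</code> user and <code>wikidb</code> database as above:

<syntaxhighlight lang="bash">
# restore the dump into an existing (empty) wikidb database
mysql -u wiki -pPASSWORD wikidb < ~/wikidb-backup.sql
</syntaxhighlight>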

=== Backup Images ===
TODO
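
Until this is written up, a rough sketch: uploads (plus thumbnails and deleted files) live under the wiki's <code>images/</code> directory, with the wiki root assumed from the examples above:

<syntaxhighlight lang="bash">
# archive the uploaded files
tar -czf ~/wiki-images-backup.tar.gz -C /usr/local/www/mediawiki images
</syntaxhighlight>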

=== Backup LocalSettings.php ===
TODO
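
Until this is written up, a rough sketch (wiki root assumed from the examples above); <code>LocalSettings.php</code> contains the database password, so keep the copy private:

<syntaxhighlight lang="bash">
# keep a private copy of the site configuration
cp /usr/local/www/mediawiki/LocalSettings.php ~/LocalSettings.php.bak
chmod 600 ~/LocalSettings.php.bak
</syntaxhighlight>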

</blockquote><!-- full backups -->

== XML Dumps ==
<blockquote>
You can dump the entire wiki as XML, and then use an XML parser to convert it to various other formats.<br>
This is very fast, but doesn't render text normally handled by plugins.<br>
Each page is defined within a <code><page></code> tag.

<syntaxhighlight lang="bash">
cd /usr/local/www/mediawiki/maintenance
php dumpBackup.php --full --quiet > dump.xml
</syntaxhighlight>
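
A quick sanity check on the dump before converting it; a rough, line-oriented sketch that assumes dumpBackup's default formatting (one tag per line):

<syntaxhighlight lang="bash">
# count pages in the dump
grep -c '<page>' dump.xml

# list page titles
sed -n 's:.*<title>\(.*\)</title>.*:\1:p' dump.xml
</syntaxhighlight>
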
<syntaxhighlight lang="bash">
# mediawiki's builtin parser (parsoid, best-effort)
php ${your_wiki}/maintenance/parse.php dump.xml > out.html

# pandoc (halts on error)
pandoc dump.xml -f mediawiki -t html -o dump.html
</syntaxhighlight>
</blockquote><!-- XML Dumps -->

== Static HTML ==
<blockquote>
=== Tools ===
<blockquote>
{| class="wikitable"
|-
| [[mw2html]]
|-
| [[static-wiki]]
|-
|}
</blockquote><!-- Tools -->

=== Home Grown ===
<blockquote>
==== wget ====
<blockquote>
Captures and corrects links (though not as relative links, for me); technically it can capture CSS too.<br>
If your <code>$wgServer</code> points to localhost (ex. <code>$wgServer = "//127.0.0.1:80"</code>), you can perform this backup over localhost (confirmed with tcpdump).

<source lang="bash">
# force connections over localhost using an alternative hosts file
# see: man gethostbyname
env HOSTALIASES=/foo/hosts \
    wget --recursive \
    --page-requisites \
    --adjust-extension \
    --convert-links \
    --no-parent \
    --tries=1    `# no retries (defaults to 20)` \
    --timeout=5  `# only wait 5s` \
    -R "*Special*" \
    -R "Special*" \
    -R "*action=*" \
    -R "*printable=*" \
    -R "*oldid=*" \
    -R "*title=Talk:*" \
    -R "*limit=*" \
    "https://yourwiki.com"
</source>

</blockquote><!-- wget -->
</blockquote><!-- Home Grown -->
</blockquote><!-- static html backup -->

== Zim ==
<blockquote>
{| class="wikitable"
|-
| [[xmldump2zim]] || create zimfile from a mediawiki XML dump
|-
| [[wget-2-zim]] || bash script to scrape mediawiki to zimfile
|-
| [[zim-tools]] || includes zimwriterfs which dumps mediawiki to zimfile
|-
| [[mwoffliner]] || scrape a mediawiki to zimfile
|-
|}
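
A minimal [[mwoffliner]] invocation, as a sketch (as far as I know <code>--mwUrl</code> and <code>--adminEmail</code> are the only required arguments, and it scrapes through the wiki's <code>api.php</code>):

<syntaxhighlight lang="bash">
# scrape the wiki (placeholder URL) and write a .zim file
mwoffliner --mwUrl="https://yourwiki.com/" --adminEmail="you@example.com"
</syntaxhighlight>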

</blockquote><!-- Zim -->
</blockquote><!-- Backups -->

= Delete Revision History =
<syntaxhighlight lang="bash">
cd /usr/local/www/mediawiki/maintenance
php deleteOldRevisions.php --delete
</syntaxhighlight>
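
Running the script without <code>--delete</code> should only report what it would remove, so a dry run first is cheap (and back up the database beforehand regardless):

<syntaxhighlight lang="bash">
# dry run: report old (non-current) revisions without deleting anything
php deleteOldRevisions.php
</syntaxhighlight>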