Benutzer:DrTrigonBot/ToDo-Liste
This page supports the development (bug fixing and further development) of DrTrigonBot. To coordinate this work better (and because the bot lives on the Toolserver anyway), the practical and versatile services of the Wikipedia:Toolserver are used here.
First, we use the Toolserver issue tracker/bug tracker JIRA (https://jira.toolserver.org/); second, the Toolserver FishEye repository viewer (https://fisheye.toolserver.org/). The two systems are closely integrated and work well together; see the documentation, e.g. Example Linkers.
The tracker also provides an overview of current progress under Summary, and the repository viewer does so under Activity.
The current tracker status and progress of the pywikipedia framework can be viewed under Python Wikipedia Robot Framework.
BUGS
Wikipedia is a wiki, be bold!
Please report bot errors here. (List of problems reported by the bot: wiki • toolserver)
A dedicated bug tracker exists on the Toolserver for this purpose; please register there and then report the bug. If necessary, bugs can also be reported here (please leave a permalink to the problem, a description, and your signature); I will then transfer them, but that may take somewhat longer.
Toolserver Issue Tracker (JIRA): Issue Navigator • Report Bug
- I think the category JSEG filled by this bot should be named "JPEG" because it contains only JPEG images. torsch (Diskussion) 21:15, 9. Feb. 2013 (CET)
- DON'T ADD new bug reports here, please report either on User talk:DrTrigon or as described in User:DrTrigonBot#Bug list und feature request!
FEATURE REQUESTS
Wikipedia is a wiki, be bold!
Please submit requests for new features here.
A dedicated bug tracker exists on the Toolserver for this purpose; please register there and then submit a request. If necessary, requests can also be made here (please leave a description and your signature); I will then transfer them, but that may take somewhat longer.
Toolserver Issue Tracker (JIRA): Issue Navigator • Request Feature
- JIRA:DRTRIGON-92
- JIRA:DRTRIGON-120
- talk/discuss about integration of Lua support into pywikipedia
- If the bot categorizes a file, it should also add the file to the correct subcategories of commons:Category:Media needing category review by number of suggestions and commons:Category:Media needing category review by date, because this is where I look for files whose categories I check manually. torsch (Diskussion) 21:21, 9. Feb. 2013 (CET)
- That is already covered by using commons:Template:Check categories - at least the second one. --DrTrigon 08:49, 19. Feb. 2013 (CET)
- DON'T ADD new feature requests here, please report either on User talk:DrTrigon or as described in User:DrTrigonBot#Bug list und feature request!
TODO
Toolserver Issue Tracker (JIRA): Issue Navigator unresolved • Issue Navigator • Report Task
DONE (engl. only!)
Included from: Benutzer:DrTrigonBot/ToDo-Liste/DONE
Files related to DrTrigonBot-"framework" (look also at Botwiki:Python:DrTrigonBot scripts):
Configuration: config.py, user-config.py, (others?)
Data: (BoW training, ...)
Look also at Benutzer:DrTrigonBot#Source.
Current version
Current revision (FishEye): Activity • Files • Users • RSS feed
Id | What | Script |
---|---|---|
 | Bugfix for: | framework, dtbext_query.py, runbotrun.py |
25 | Spezial:Permanentlink/64368286 sandbox for the newly created status/subster bot. This bot watches text on external (or internal) pages; the text is specified by a regex. If this text changes, the bot copies the new/current text into the wiki. The tag model used to mark the places where text is dropped in is borrowed from MerlBot (and this sub-bot could also be used on my local wiki with an anacron job ;). | subster.py |
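The watch-and-copy behaviour described for subster.py can be sketched roughly as follows. This is a minimal illustration in Python 3, not the bot's actual code: the marker tag `<!--SUBSTER-name-->` and the function names `extract` and `substitute` are assumptions made for the example, since the page does not show the tag model borrowed from MerlBot.

```python
import re


def extract(source_text, pattern):
    """Return the first capture group of `pattern` in `source_text`, or None."""
    match = re.search(pattern, source_text, re.DOTALL)
    return match.group(1) if match else None


def substitute(wiki_text, name, new_value):
    """Replace the text between two marker tags; return (text, changed).

    The marker format is hypothetical and only serves this sketch.
    """
    tag = "<!--SUBSTER-%s-->" % name
    marker = re.compile(re.escape(tag) + r"(.*?)" + re.escape(tag), re.DOTALL)
    found = marker.search(wiki_text)
    if found is None or found.group(1) == new_value:
        return wiki_text, False  # no marker, or nothing changed
    start, end = found.span(1)
    return wiki_text[:start] + new_value + wiki_text[end:], True


# Made-up sample data standing in for an external page and a wiki page.
external = "<html><b>Version: 1.4.2</b></html>"
page = "Latest release: <!--SUBSTER-ver-->1.4.1<!--SUBSTER-ver-->."

value = extract(external, r"Version: ([\d.]+)")
if value is not None:
    new_page, changed = substitute(page, "ver", value)
```

Returning a `changed` flag lets the caller skip the wiki edit when the watched text is unchanged, which matches the "if this text changes" condition above.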
Version 0.2.0000
Revision 1 (WebSVN): Log
Id | What | Script | |
---|---|---|---|
 | Feature/Bugfix/... for: | page_disc.py, dtbext_query.py, dtbext_wikipedia.py, mailer.py | |
 | Feature for: | mailer.py, mailerreg.py | |
 | Feature for: | subster_beta.py, substersim.py | |
 | Bugfix for: | mailer.py | |
 | Bugfix for: | dtbext_wikipedia.py | |
 | Bugfix for: | framework, dtbext_query.py, runbotrun.py | |
 | Bugfix for: | dtbext_wikipedia.py | |
B50 | Spezial:Permanentlink/64115999 Notification links don't work. Used the target link from http://toolserver.org/~merl/UserPages/query.php?user=DrTrigon&format=xml (parameter/value 'url'). The handling of foreign-wiki links in the framework could still be improved (interwikimap). | dtbext_pagegenerators.py, sum_disc.py | |
37 | Creation of a stable release v0.2.0000 based on bug fixes, new features and improvements in the current semi-stable release v0.1.0013. Testing of this version and update of the bot software on the toolserver to this new version. Extensive testing of all new functions and of the whole code, such as the output of times (correct timezone setting) and others. Setup of a new CRON job for compressing the history once every 14 days. | (toolserver) | |
 | MailerBot created. This sub-bot is very experimental at the moment; it does its job for me, and probably/hopefully for you too, but do not expect miracles in its current state (ideas, hints and bug reports are still very welcome). | mailer.py | |
36 | Check the CPU load of the whole bot script with: /usr/bin/time -v python /home/drtrigon/pywikipedia/runbotrun.py -cron because the bot runs for at least 1 hour every day, so the load must not be too high! If it is, try to delay the expensive calls and/or ask the toolserver admins what to do, because 'nice' is already in use. The load of the history compression does not need to be checked, since that script does not run very often. Result: drtrigon@nightshade:~$ /usr/bin/time -v python /home/drtrigon/pywikipedia/runbotrun.py -cron Command being timed: "python /home/drtrigon/pywikipedia/runbotrun.py -cron" User time (seconds): 32.23 System time (seconds): 2.86 Percent of CPU this job got: 1% Elapsed (wall clock) time (h:mm:ss or m:ss): 46:18.79 Average shared text size (kbytes): 0 Average unshared data size (kbytes): 0 Average stack size (kbytes): 0 Average total size (kbytes): 0 Maximum resident set size (kbytes): 0 Average resident set size (kbytes): 0 Major (requiring I/O) page faults: 0 Minor (reclaiming a frame) page faults: 42316 Voluntary context switches: 204822 Involuntary context switches: 3707 Swaps: 0 File system inputs: 0 File system outputs: 1816 Socket messages sent: 0 Socket messages received: 0 Signals delivered: 0 Page size (bytes): 4096 Exit status: 0 This should be OK with 1% CPU load. | (toolserver) | |
B44 | For several pages related to MerlBot the bot was not able to retrieve section (heading) info. Those issues appeared on the following pages:
and were related to |
dtbext_wikipedia.p | |
F45 | List of parameters/options enhanced:
(similar to Benutzer:CopperBot/Überwachte Seiten) Mentioned by Benutzer:Merlissimo (was already implemented in bot code). |
(publishing) | |
B49 | Bugfix for:
Spezial:Permanentlink/63464336#Problem_mit_DrTrigon_Bot has solved itself, according to Spezial:Permanentlink/63713171. |
||
34 | Poll/survey among the bot users about their satisfaction and feature requests. | (my talk page) | |
F38 | New feature integrated: using data from http://toolserver.org/~merl/UserPages/query.php?user=DrTrigon&format=xml, the bot can now notify users when any of their talk pages on other wikis has changed, in case they have received messages there. Mentioned by Benutzer:Merlissimo in chat. | dtbext_wikipedia.py, dtbext_pagegenerators.py, sum_disc.py, sum_disc_conf.py | |
B47 | Bugfix for:
Reported by Forrester. |
sum_disc.py | |
F46 | Until F45 is implemented, please add
to the surveillance list. Mentioned by Benutzer:Merlissimo. |
sum_disc_conf.py | |
B48 | Bugfix for:
look at id B45. Reported by Forrester. |
(look there) | |
B46 | Bugfix for:
|
sum_disc.py | |
B45 | Bugfix for:
|
sum_disc_conf.py | |
From id 44 onward, ids are split into B?? and F?? for simpler counting. | |||
33 | Bot runs every 24 hours (instead of every 48 hours). Proposed by Benutzer:Forrester (for BLUbot). | toolserver | |
 | List of parameters/options enhanced: | sum_disc.py, sum_disc_conf.py | |
40.2 | Bugfix for:
|
sum_disc.py | |
42, 43 | Bugfix for:
The bot reports redirects (created by users) as new discussions (which of course are non-existent). Maybe redirects should be resolved and the resulting pages checked against the ignore-list? This change was mentioned by Benzen C6H6 09:20, 5. Jul. 2009 (CEST) (probably a hamster?). |
dtbext_wikipedia.py, dtbext_query.py, sum_disc.py | |
17.2 |
|
sum_disc.py | |
35.2 | The method of choice for wiki content access is to use the API and retrieve pages in bunches (of e.g. 10 pages). Dumps are very fast too, but they are not available on a daily basis (only every 2 days); see also http://download.wikimedia.org/dewiki/ (Benutzer:MerlBot claims to use dumps). Page read access was therefore switched to the API. (It looked to me as if recent revisions of the bot framework had partly switched to API access, but that is wrong; only a few commands use it, mostly user or config related.) This was already implemented with different methods in dtbext.wikipedia.Page().get(). Now it is also implemented for reading a bunch of pages simultaneously with prop=revisions (rv) in dtbext.wikipedia.Pages().get(). This has a small drawback: the parameter/option 'directgetfull_switch' becomes more or less useless or obsolete (because re-reading single pages per block is faster than splitting the block and all the handling related to that). Therefore removed parameter/option:
It is important that the original and the new page read methods deliver the same page content! See also id 17. |
dtbext_wikipedia.py, dtbext_query.py, sum_disc.py | |
21 | Merged with the most recent pywikipedia framework 6977 and tested the new functions; also updated wikipedia.py (because of lines 725-728). For verbosity and information, the bot now reports the most recent framework revision on every run (the download via svn could be automated; see also Benutzer Diskussion:Benzen#Fragen wegen Bot-Modifikation). The extensions needed for the bot are now technically adapted to the framework and packed into one single package with the same structure as the framework:
|
(various) | |
35.1 | Introduced bundling of API calls. A new class 'Pages' performs all actions on a set of pages simultaneously. Currently, page version history information can be retrieved this way; this can be extended to other functions too.
look also at id 17, 35.2 |
wikipediaAPI.py, sum_disc.py | |
Full compression (not fast) improved by adding a part that removes all old/removed headings from history page entries. | wikipediaAPI.py, sum_disc.py | ||
|
sum_disc.py | ||
Heavily improved parameter/option handling, all internal used user specific bot parameters/options can be modified by the users. The behaviour either to overwrite or preserve the defaults can easily be configured internally. | sum_disc.py, sum_disc_conf.py | ||
18 | List of parameters/options enhanced:
|
sum_disc.py, sum_disc_conf.py | |
17.1 | Speed-up by:
|
wikipediaAPI.py, sum_disc.py | |
28 | Clean the code by removing all unnecessary 'try' constructs, replace '(... == None)' constructs by the shorter (and cleaner) version and add documentation.
look also at id 26, 27 |
sum_disc.py | |
32 | Compression could be done by comparing against the backlinks. This was implemented and tested; here are the observations: the original method depends on '_SearchForSignature(...)', the alternative method on a properly configured 'backlinks_list'. If properly configured, the alternative mode is a lot faster, but it depends on this config, whereas the original method always works! The statement that all pages listed in the backlinks currently have a signature is correct if 'backlinks_list' is properly configured (how to check this?). | sum_disc.py | |
27 | The checksum checker in '_checkThData(...)' was improved (no need for a 'try' structure anymore) and the different modes were documented. Mode v1 was removed because it is very old; if there is any history entry with v1, it is ignored and the page is reported and logged again. This should not cause many problems (since very few or no such entries exist anymore). | sum_disc.py | |
31 | Checkout of API functions/calls, here some examples:
look also at id 13, 24 |
||
24 | The page Spezial:Änderungen an verlinkten Seiten (Special:RecentChangesLinked) could be implemented as an addition to Spezial:Beiträge (Special:Contributions) here is an example for DrTrigon. Unfortunately I was not able to find an API for this, but list=backlinks (bl) which is the same as Spezial:Linkliste (Special:WhatLinksHere) and this was implemented as an addition. The use of this is configurable by a new parameter/option:
|
wikipediaAPI.py, sum_disc.py | |
29 | All pages 'Wikipedia:Löschkandidaten/<DATE>' are read twice (second time with 'getFull()'), therefore a parameter/option was introduced:
look also at id 17 (This option may later become obsolete, e.g. when updating to new bot framework; if the default read method 'get()' is changed) |
sum_disc.py | |
26 | Bot should run without any use of 'fall back' mode in '_getThContent(...)' therefore this obsolete, old and unreliable mode was removed but replaced with a warning message (just in case). | sum_disc.py |
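The bundled page reads described under ids 35.1 and 35.2 above can be illustrated with a short sketch. It is written in Python 3 and is not the bot's `dtbext.wikipedia.Pages().get()` implementation; the helper names `chunks` and `revision_query` and the chosen `rvprop` value are assumptions for illustration. It only builds the request URLs for the MediaWiki API (`action=query`, `prop=revisions`) and sends nothing.

```python
from urllib.parse import urlencode

API_URL = "https://de.wikipedia.org/w/api.php"  # assumed endpoint for the example


def chunks(items, size=10):
    """Yield consecutive bunches of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def revision_query(titles):
    """Build one API URL that asks for the latest revision of several pages."""
    params = {
        "action": "query",
        "prop": "revisions",
        "rvprop": "content|timestamp",
        "titles": "|".join(titles),  # several titles go into a single request
        "format": "json",
    }
    return API_URL + "?" + urlencode(params)


# 25 hypothetical page titles result in 3 requests instead of 25.
titles = ["Benutzer Diskussion:Example%d" % i for i in range(25)]
urls = [revision_query(bunch) for bunch in chunks(titles, 10)]
```

The bunch size of 10 follows the figure mentioned in id 35.2; the API itself limits how many titles one request may carry, so the size should stay below that limit.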
Version 0.1.0013
Id | What | Script | |
---|---|---|---|
 | List of parameters/options enhanced: | sum_disc.py | |
40.1 | Bugfix for:
All of these are related to the different history formats and the way they were read in and updated. All older formats are now converted to v5 during read. Taken from Version 0.1.0014. |
sum_disc.py | |
Bugfix for: Spezial:Permanentlink/61457165 / 1st entry; head recognition out of sync, added another check for validity (to decide to use full read mode). Taken from Version 0.1.0014. | wikipediaAPI.py | ||
Bugfix for: html tag heading recognition; tags with additional values are now recognized too. | wikipediaAPI.py | ||
Bugfix for: heading recognition; e.g. Spezial:Permanentlink/61381636 / last block, 1st entry; the function had some problems because of some <nowiki> tags. All '=' within those tags are substituted by '=' as a workaround. | wikipediaAPI.py | ||
Bugfix for: re-reported discussions; e.g. Spezial:Permanentlink/61311960 / last entry or Spezial:Permanentlink/61309424 / 1st entry of last block; this is related to the different history formats and the way they were read in and updated. Some older formats are now converted during read, and in that way the data can be updated properly. | sum_disc.py | ||
Bugfix for: twice appearing headings; the API returns wrong links: they are incomplete, the numbering (from 2 upwards) is missing. A fix for this was implemented, so the bot should now really be able to get all heading links correct. | wikipediaAPI.py | ||
The DrTrigonBot status panel was added to the Toolserver Wiki as a tool (with description page, a.o.) and a bit modified (added a link to User:DrTrigon). | panel.py | ||
35.3 | Bugfix for: Spezial:Permanentlink/61105710 / 1st entry of last block; the page was processed without any heading info, in fact it could not retrieve the headings (sections). Added an additional/alternative method if the first fails.
look also at id 28 |
wikipediaAPI.py | |
35.4 | Bugfix for: Spezial:Permanentlink/61106764 / 2nd entry; the headings got out of sync, in the 2nd half they were shifted by 1. The problem was that headings recognized by the wiki parser have to start at the first char of a line!
look also at id 28 |
wikipediaAPI.py | |
5 | testing of new 0.1.0013 release. upload of it and rearrange data on toolserver.
|
local, toolserver | |
15 | it would be a good idea to use the Toolserver issue tracker https://jira.toolserver.org/secure/BrowseProjects.jspa if the number of bugs reported and to be handled exceeds a certain number, but for the moment it is not needed. the problem is that you cannot add a new project as a normal user, so the system here has to deliver this service. | BUGS | |
16 | had issues with creation of BOT-MESSAGE containing some special chars (e.g. ':'). the algorithm should now be more stable, but it estimates the correct heading only, so could still make some problems.
|
sum_disc.py | |
13 | sample bot output:
* Processing User List (wishes): Benutzer:DrTrigonBot/Diene Mir! NOTE: You have new messages on wikipedia:de ... if such a message appears again, the bot now reads its discussion page Benutzer Diskussion:DrTrigonBot and prints/shows the content in the output (log file). that informs you about the message and should suppress further display (of the same old message). |
runbotrun.py | |
14 | a very interesting fact is that the page Spezial:Änderungen an verlinkten Seiten (Special:RecentChangesLinked) could replace or be an alternative to Spezial:Beiträge (Special:Contributions). here is an example for DrTrigon (linked to Benutzer Diskussion:DrTrigon) to show the capabilities. | sum_disc.py | |
9 | if the bot is executed by CRON job on toolserver the preprocessing/filtering of log file output does not work properly. have added another regex to the output handler, hope that helps. | runbotrun.py | |
10.3 | purge page cache through wiki API according to example http://toolserver.org/~drtrigon/wiki-purge.html (has to be deleted from the toolserver now) implemented as wikipediaAPI.Page(...).purgePageCache(). the issues that occurred here earlier have disappeared or were due to mis-programming/misunderstanding of the API by myself. | wikipediaAPI.py, sum_disc.py | |
10.2 | the render given in http://svn.wikimedia.org/viewvc/pywikipedia/trunk/pywikiparser/ looks to be (very) old; the newest contributions are Modified Mon Oct 29 06:41:05 2007 UTC (19 months, 1 week ago) and the wiki API seems to do a quite good job. so the bot is using the API at the moment. | wikipediaAPI.py, sum_disc.py | |
11 | because of the improvements in 10.1, the bot is now able to detect level 1 headings (through a workaround, but it is possible); with the old mode it was nearly impossible. so level 1 headings are taken into account unless the bot has to fall back into the old mode, and the wrong display of discussions in front of level 1 headings (on problematic pages such as Löschdiskussionen bei Benutzerseiten, WP:FzW, WP:Auskunft, Portale and others) should stop now. | wikipediaAPI.py, sum_disc.py | |
10.1 |
|
wikipediaAPI.py, sum_disc.py | |
7 | '_setUser(...)' and '_getUsers()' both used two separate parameter/option handling mechanisms. the entry in Benutzer:DrTrigonBot/Diene Mir! without any parameters in brackets '{...}' already offers a config option: you can specify any target page within your userspace. this option and the "real" options in brackets were handled in 'self.userListInfo' and 'self.param'. those two mechanisms are now merged, and the target page in 'self.userListInfo' is added to 'self.param' as 'userResultPage'.
|
sum_disc.py | |
6 | config management/system for running the bot moved into a separate file; all options (global vars) moved there. the variables could be named more nicely (more self-explanatory), but the external option names (parameters that can be set within Benutzer:DrTrigonBot/Diene Mir!) must not be changed! never!
|
sum_disc.py, sum_disc-conf.py | |
8 | extended search/check range/context:
added. some of them could be extended by Wikipedia:Administratoren/* (but this would trigger more traffic with few benefits only). this change was mentioned by Benutzer:Benzen in Benutzer Diskussion:DrTrigon/Archiv#Bot: Reklamation. |
sum_disc.py |
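Several entries in this version concern heading recognition: headings must start at the first character of a line (id 35.4), '=' inside <nowiki> tags must not be mistaken for heading markup, and links to repeated headings need a numbering from 2 upwards. The following Python 3 sketch puts these three rules together. It is an illustration only, not the code in wikipediaAPI.py; the names `mask_nowiki` and `section_anchors` are hypothetical, and masking '=' with the entity `&#61;` is an assumption.

```python
import re

# A heading starts at the first character of a line: '== Title =='.
HEADING = re.compile(r"^(=+)[ \t]*(.*?)[ \t]*\1[ \t]*$", re.MULTILINE)
NOWIKI = re.compile(r"<nowiki>.*?</nowiki>", re.DOTALL | re.IGNORECASE)


def mask_nowiki(text):
    """Hide '=' inside <nowiki> blocks so they cannot look like headings."""
    return NOWIKI.sub(lambda m: m.group(0).replace("=", "&#61;"), text)


def section_anchors(text):
    """Return link anchors for all headings, numbering duplicates from 2."""
    seen = {}
    anchors = []
    for match in HEADING.finditer(mask_nowiki(text)):
        base = match.group(2).replace(" ", "_")
        seen[base] = seen.get(base, 0) + 1
        count = seen[base]
        anchors.append(base if count == 1 else "%s_%d" % (base, count))
    return anchors


sample = "\n".join([
    "== Topic ==",
    "some text",
    " == not a heading ==",      # leading space: not recognized
    "<nowiki>",
    "== hidden ==",              # inside nowiki: masked
    "</nowiki>",
    "== Topic ==",               # duplicate title gets the suffix _2
    "=== Sub topic ===",
])
anchors = section_anchors(sample)
```

Counting occurrences per title keeps the anchors in sync with the page order, which is what the out-of-sync bug reports above were about.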
Version 0.1.0012
Id | What | Script | |
---|---|---|---|
18 | parameters/options introduced with configuration directly on Benutzer:DrTrigonBot/Diene_Mir!
currently possible/implemented options are:
|
sum_disc.py | |
12 | format/system of saved checksums had to be changed once again (3rd or 4th time) because of the dict used with subheadings as key was not unique (all subheadings on a page may have the same name); now the bot is using a list again, but a list of tuples (there are still some issues open with subheadings, this may make another change necessary...) | sum_disc.py | |
4 | add interface for history compression; this was done initially but does not make sense, since the permissions of user 'apache' will not allow it to rewrite the history. the good options are to do it by hand (manually) with a toolserver login (from time to time) or to set up e.g. a monthly or weekly CRON job
|
panel.py | |
3 | improvement of admin log interface: 'filelist' and 'checkfiles' are not necessary, the whole path can be generated only once just before the call of 'os.remove'! | panel.py | |
2 | instead of using 'runbotrun.py -auto', a CRON job was now created on the toolserver; this should take care of the bot timing and be able to deal with server reboots. with this action all issues related to this are solved. the bot is called with 'runbotrun.py -cron' | runbotrun.py | |
1 | admin interface added for removal of old log files instead of using cyclic logs (5 files e.g.)
|
panel.py |
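Id 12 above explains why the saved checksums moved from a dict keyed by subheading to a list of tuples: several subheadings on one page may share a name. A minimal Python 3 sketch of that difference follows; the function names and the use of MD5 are assumptions for illustration, since the page does not state how the bot computes its checksums.

```python
import hashlib


def section_checksums(sections):
    """Return a list of (heading, checksum) tuples, one per section, in page order."""
    return [
        (heading, hashlib.md5(body.encode("utf-8")).hexdigest())
        for heading, body in sections
    ]


def changed_sections(old, new):
    """Return the entries of `new` that have no identical entry in `old`."""
    return [entry for entry in new if entry not in old]


# Two sections with the same heading, as can happen on discussion pages.
before = section_checksums([("Question", "first thread"), ("Question", "second thread")])
after = section_checksums([("Question", "first thread"), ("Question", "second thread, new reply")])

as_dict = dict(before)   # the second 'Question' overwrites the first
changed = changed_sections(before, after)
```

With the dict, one of the two "Question" sections is silently lost; the list of tuples keeps both, so a change in either one can still be detected.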