Benutzer Diskussion:Stefan Kühn/Check Wikipedia/Archiv/2009/März
Math causes false positives on bracket detection
Users on French Wiki report false positives on "Template not correct end", detection 43, unclosed "{{":
<math>\frac{{-1}^i}{(2i)!}</math> is correct since it renders properly.
Also there are false positives on "Template not correct begin", detection 47, unopened "}}":
<math>n = 1 + (n_s - 1) \times { (\frac{p} {p_s}) \times { (\frac{T_s} {T}) }}</math> is correct since it renders properly.
Possibly suspend these detections within math scopes. Thanks --66.131.214.76 18:40, 1. Mär. 2009 (CET)
- OK! Yesterday I changed a small thing in my math detection routine, but I made a mistake. I have fixed it. -- sk 19:00, 1. Mär. 2009 (CET)
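The idea of suspending the bracket checks inside math scopes could be sketched roughly as follows. This is only an illustration in Python: the actual script is written in Perl and its code is not shown in this discussion, and the names `strip_math` and `has_unbalanced_template_braces` are hypothetical.

```python
import re

# Matches <math> ... </math>, including an opening tag that carries attributes.
MATH_RE = re.compile(r"<math\b[^>]*>.*?</math>", re.DOTALL | re.IGNORECASE)


def strip_math(wikitext: str) -> str:
    """Remove every <math> block so its LaTeX braces are not counted."""
    return MATH_RE.sub("", wikitext)


def has_unbalanced_template_braces(wikitext: str) -> bool:
    """Naive check: compare the number of '{{' and '}}' outside math blocks."""
    text = strip_math(wikitext)
    return text.count("{{") != text.count("}}")
```

With this approach the two formulas quoted above are ignored, while a genuinely unclosed `{{Infobox` is still reported.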
"Missing" image description is just in a different place
Hello Stefan! For Nahwa, your script reports a missing image description. The description is in fact there, just given in an unusual position (before "thumb" etc.). That should be the exception rather than the rule, but I wanted to let you know anyway. Regards, → «« Man77 »» 01:25, 2. Mär. 2009 (CET)
- See here: Wikipedia_Diskussion:WikiProject_Check_Wikipedia#Bilder_ohne_Beschreibung. It would really be better to bring this in line with the standard used for all images. -- sk 19:27, 2. Mär. 2009 (CET)
Improvement to summary information
Hi Stefan, I believe it would be interesting info if you could add the total number of articles that were scanned in the summary section at the top, for example following «Last scan: The script found 37701 errors in 30468 articles; 0000000 inspected ». French wiki hit the 40000 limit lately and it would have been interesting to know how many were not scanned. --66.131.214.76 15:06, 2. Mär. 2009 (CET)
- Hello! My script has a limit of 40000 errors. I set this limit for one reason only: if I program a subroutine for error detection and make a mistake, my script cannot crash the toolserver. When the script scans the new frwiki dump, it stops after 40000 errors and writes the error list with 30468 articles. I don't know how many errors there are in total in the frwiki dump. If I set the limit higher, the scan will need more than 4 hours. Every morning I add to this list of 30468 articles a list of new articles (2165 from the last 48 hours) and the latest recent changes (4900 from the last 48 hours; every 3 hours I take 500 changes). So at the moment about 30468+2165+4900 = ~37500 articles are scanned every morning. I also think the limit is OK for now. If the number of errors after the next dump is lower than 40000, you will know that I have scanned the full dump. I have put this "123 inspected" feature on my to-do list. -- sk 20:13, 2. Mär. 2009 (CET)
- Hi Stefan, thanks a lot, this is extremely interesting info that I passed on to the French discussion page. We elected to deactivate detection of <small>, which was producing 12750 detections. Let's hope that all articles will be scanned next week. Thanks again --66.131.214.76 04:06, 3. Mär. 2009 (CET)
References
Hello again! On the English Wikipedia, the tag en:Template:Reference isn't being counted as a <references /> tag. Could you add that to your script? Thanks. -Drilnoth 21:54, 2. Mär. 2009 (CET)
- OK, I have added this. -- sk 16:15, 3. Mär. 2009 (CET)
- Great; thanks. -Drilnoth 22:43, 3. Mär. 2009 (CET)
Replacement of "<" and ">"
Hi Stefan, I noticed (French wiki again) that you started substituting &lt; for < and &gt; for > in the report, likely to deactivate HTML code from the translation pages. I believe it is an excellent idea and will reduce the number of problems arising from translation (also this is a first step toward using anchors -- ankertext like this one -- in the summary at the top!)
Note however that this change deactivated the text colouring that you used recently to highlight the newer detections :
- <span style="color:#e80000;">
Now this text appears on the error report (unless it was removed at translation). It is up to you, but I guess it would be better to avoid using HTML in the English text of the translation page if you wish to retain that "<" and ">" replacement. Thanks again --66.131.214.76 16:19, 3. Mär. 2009 (CET)
- Yesterday I made a major change to the script: I now use the API for the text. Like this. Today I saw this problem and I hope I can fix it. -- sk 20:24, 3. Mär. 2009 (CET)
- OK, I changed my script. -- sk 21:37, 3. Mär. 2009 (CET)
Code 034: Template programming element
Article contains a "#if:" or "#if exist" or "#switch:" or "{{NAMESPACE}}" or "{{SITENAME}}" or "{{PAGENAME}}" or "{{FULLPAGENAME}}" or "{{{1}}}" (Parameter).
nl:Geschiedenis van de wiskunde | Tussen 400 v. Chr. en 200 n. Chr. bestudeerden Indische wiskundigen, met name de |
Why is this article counted as error? Thanks for all your efforts.
87.212.82.171 17:55, 3. Mär. 2009 (CET)
- This section contains {{{feit}}, which starts with three opening braces "{{{". -- sk 20:12, 3. Mär. 2009 (CET)
DEFAULTSORT in Swedish
The Swedish (sv.wikipedia) synonym for DEFAULTSORT is STANDARDSORTERING. The Swedish synonym for REDIRECT is OMDIRIGERING. For File:...thumb...left...right we now write Fil:...miniatyr...vänster...höger. --LA2 13:25, 1. Mär. 2009 (CET)
- At the moment these magic words are hardcoded in my script. In the future I will use the API with this result. But at the moment I am working on another part of my script. -- sk 19:02, 1. Mär. 2009 (CET)
- STANDARDSORTERING is still not recognized in report 37. --LA2 10:11, 4. Mär. 2009 (CET)
False negative for DEFAULTSORT
The article ca:Vafþrúðnismál doesn't have DEFAULTSORT or ORDENA, and is never flagged. --JoRobot 19:38, 3. Mär. 2009 (CET)
- The script only checks accents in the first 3 characters since in general only the first few characters play a role in the alphabetical ordering of articles in their categories. --66.131.214.76 03:32, 4. Mär. 2009 (CET)
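The rule described in the reply above (only the first three characters of the title are inspected) could look roughly like this. It is a sketch of that description, not the script's actual code: the function name, the `magic_words` tuple and the "non-ASCII" test are assumptions made for illustration.

```python
def needs_defaultsort(title: str, wikitext: str,
                      magic_words=("DEFAULTSORT", "ORDENA"),
                      prefix_len: int = 3) -> bool:
    """Flag a page only if it has no sort key and one of the first
    `prefix_len` title characters is outside plain ASCII."""
    # A sort key in any known local spelling means nothing is reported.
    if any("{{" + word + ":" in wikitext for word in magic_words):
        return False
    # Only the leading characters are inspected, as described above.
    return any(ord(ch) > 127 for ch in title[:prefix_len])
```

Under this rule "Vafþrúðnismál" is not flagged, because its first three characters "Vaf" are plain ASCII; the special letters only start at the fourth character.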
Pages that embed non-existent images
How about a list of pages that embed non-existent (deleted?) images? See also here. Regards, º the Bench º WikiWartung 18:41, 5. Mär. 2009 (CET)
- In general that would not be a bad idea, but unfortunately I lack the technical means for it. I do not use the database on the toolserver and therefore cannot search for the relevant images with SQL there. -- sk 20:21, 5. Mär. 2009 (CET)
Update?
The list for the English Wikipedia wasn't updated for March 5th... I don't know if it's just running really late or if there's a bug, but I thought I'd let you know. -Drilnoth (Talk) 00:40, 6. Mär. 2009 (CET)
- Here you can find the update. I changed the output from txt to html. Now we have no problems with UTF-8. -- sk 08:32, 6. Mär. 2009 (CET)
- Ah... great! A much appreciated change. -Drilnoth (Talk) 20:28, 6. Mär. 2009 (CET)
Template not correct end
Please disable the detection for error 043 within <pre> tags. --fryed-peach 08:01, 5. Mär. 2009 (CET)
- Hello Fryed-peach, can you give me an example? I detect "pre" in all articles, so in error 43 it should not be a problem. -- sk 20:16, 5. Mär. 2009 (CET)
- An old version of ja:tanasinn was listed, then "fixed" by this edit, and now not listed. --fryed-peach 18:32, 8. Mär. 2009 (CET)
New detection of HTML ellipsis
Hi Stefan, possibly your script could detect use of HTML &hellip; instead of symbol « … ». Bye --66.131.214.76 17:13, 8. Mär. 2009 (CET)
- Thanks for this info. I will test it. Maybe next weekend I will have time to implement this. -- sk 17:39, 8. Mär. 2009 (CET)
error 52
Hi Stefan. On it.wiki we use the it:Template:Bio where a parameter is the category, with square brackets. The template is placed at the beginning of the page, [1], so it generates a lot of false positives. Anyway, it's not urgent. Thanks. --Red Power 21:22, 9. Mär. 2009 (CET)
- Hello Red Power, I have never seen this construct; only itwiki has it. I don't think it is a good construction. In all other Wikipedias we write the categories at the end or invisibly inside a template, but never like this. I think it is better to change this template or to deactivate this error in itwiki. IMHO changing this bad template is the better option. -- sk 21:25, 11. Mär. 2009 (CET)
- I'd like to destroy that monster! Unfortunately it's used in 97.000 biographies, and I believe it's the most complex template of ALL the wiki projects... I've disabled the error, and I'll try to find a way to change the template. Thanks anyway. --Red Power 22:33, 11. Mär. 2009 (CET)
Nested table
The article sv:Elitserien i ishockey 2008/2009 with nested tables was falsely reported as a table that doesn't end. --LA2 22:22, 9. Mär. 2009 (CET)
- Oh yes, this is wrong. It is a known error. I will change this in the future, but at the moment I am working on another part of this project. Thanks for this info. -- sk 21:27, 11. Mär. 2009 (CET)
Improve the summary of the report
Hi Stefan, with 53 detection types, it has now become difficult to locate a given section in the WikiCheck report: some long sections separate shorter ones. Possible improvements:
- Organise your summary report at the top to group error types according to priorities, like the report itself
- Add an anchored link for Top/Middle/Lowest priority in the summary so that users can quickly jump to each such section
- Allow more priorities (up to 10 ?) so that we can better separate error types that require long-lasting effort from those that are fully resolved and just require maintenance.
When you can find time, of course, I see you are quite busy ! ;) Bye --66.131.214.76 23:04, 9. Mär. 2009 (CET)
- Hello IP, thanks for this info. On 1.) This is a good point. I think this is possible. On 2.) At the moment I have no idea how to do this. I tried it, but I had problems with the different languages of the headlines. On 3.) I think three is enough. It is like a traffic light (only green, yellow and red). If I add more priorities we will have endless discussions about whether an error is priority 5, 6 or 7. Three levels are enough. -- sk 21:58, 11. Mär. 2009 (CET)
Deactivation in the Norwegian wiki
Hi Stefan
Most of the instances of <big> and <u> in the Norwegian Wikipedia are not errors. Could you please deactivate the scans for these? Could you also deactivate the scans for em and en dash, as the amount of these indicate that they are not considered errors in our wiki?
Thank you & keep up the good work! --Helt 13:04, 9. Mär. 2009 (CET)
- Hi, you can activate and deactivate individual detection types from your translation page, by assigning 0 to the value of the corresponding variable, like : error_000_prio_XXwiki=0. In French WikiCheck, we added comments clarifying this under the «priority» heading of the translation page. --66.131.214.76 23:15, 9. Mär. 2009 (CET)
- Hello Helt, I see you found the way. :-) -- sk 21:15, 11. Mär. 2009 (CET)
- I did, thank you. --Helt 07:27, 12. Mär. 2009 (CET)
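For readers who want to see which detections a project has switched off, settings of the form `error_000_prio_XXwiki=0` could be read with a few lines of Python. This is a sketch under assumptions: that each setting sits on its own line of the translation page and that a priority of 0 means "deactivated", as described above; the function names are hypothetical.

```python
import re

# One setting per line, e.g. "error_042_prio_nowiki=0" (layout assumed).
PRIO_RE = re.compile(r"^\s*error_(\d{3})_prio_([a-z]+wiki)\s*=\s*(\d+)", re.MULTILINE)


def parse_priorities(translation_text: str) -> dict:
    """Map error number -> configured priority."""
    return {int(m.group(1)): int(m.group(3)) for m in PRIO_RE.finditer(translation_text)}


def disabled_errors(translation_text: str) -> list:
    """Error numbers whose priority is set to 0, i.e. switched off."""
    prios = parse_priorities(translation_text)
    return sorted(num for num, prio in prios.items() if prio == 0)
```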
Title in text
Hello, there is little problem: In some articles there are naviboxes, but not included through template, but like pseudotemplate. eg. in cs:Boháňka is
{{HlavičkaObecNav|Boháňka|Boháňka|Obec}} [[Boháňka]] | [[Chloumek (Boháňka)|Chloumek]] | [[Skála (Boháňka)|Skála]] | [[Votuz]] {{PatičkaObecNav}}
Would it be possible to ignore selflinks between {{HlavičkaObecNav}} and {{PatičkaObecNav}} in cs.wiki? This is used in about 1000 articles JAn Dudík 01:10, 9. Mär. 2009 (CET)
- I think it is best to fix this in the articles. It is an error. I know that it is very easy to make this bold with this syntax, but it is not right. Do it with a bot. -- sk 21:11, 11. Mär. 2009 (CET)
- But these selflinks have their own purpose: easier copying to other articles, because these are subst:NaviTemplates. JAn Dudík 14:07, 12. Mär. 2009 (CET)
- This is right. At the time you create this insert, it is very easy. But after the insert, every script will find this selflink every time. At the moment there are 423 articles with this error in cswiki. Maybe there are more, for example 5000. If there are four of you, you can fix 100 articles every day (only 25 per person), and in 50 days this will no longer be a problem. I have no idea how to fix this problem in my script. I also want to find selflinks in templates. So please change it or deactivate this error. -- sk 15:56, 12. Mär. 2009 (CET)
++
The Swedish Wikipedia page ++ produces several false reports, probably because something goes wrong with its strange title. The correct URL is %2B%2B --LA2 13:14, 2. Mär. 2009 (CET)
- I will check this later. For now I have removed this article from the list. Thanks for this info. -- sk 20:19, 2. Mär. 2009 (CET)
The following error is shown for the Latin Wikipedia:
Title in text
Found a link to the title inside the text. Change this [[Title]] into '''Title'''
This is a new error. If you find a bug then please tell this here.
This error was found 1 times.
la:C++ | [[C++]] lingua programmanda ex lingua C creata.==Origo |
What is odd about this: the text quoted in the error message ([[C++]] lingua programmanda ex lingua C creata.==Origo) does not appear in the page la:C++ at all … --UV 23:34, 13. Mär. 2009 (CET)
- I think my script has a problem fetching the text when the title contains a "+". The text in question comes from the article la:C. -- sk 18:01, 14. Mär. 2009 (CET)
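A plausible cause, stated here only as an assumption, is that the "+" in the title was sent unescaped: in URL query strings a "+" is decoded as a space. The Python sketch below shows the percent-encoding that yields the `%2B%2B` form mentioned above; `title_to_url_path` is a hypothetical helper, not part of the script.

```python
from urllib.parse import quote, unquote_plus


def title_to_url_path(title: str) -> str:
    """Percent-encode a page title for use in a URL.

    Spaces become underscores (wiki convention); safe="" makes sure
    that "+" and "/" are escaped as well.
    """
    return quote(title.replace(" ", "_"), safe="")


print(title_to_url_path("C++"))   # C%2B%2B
print(repr(unquote_plus("C++")))  # 'C  ' : an unescaped "+" decodes to a space
```

If the two trailing spaces are then trimmed, a request for "C++" would end up at "C", which would match the observation that the reported text comes from la:C.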
< math > not correctly detected
Hi Stefan, this seems like a false positive for 047 on French wiki:
Front de flamme | ...ée de D sur tau">u_L \simeq \sqrt{\frac{D_T}{\tau}} |
The complete line is:
- <math title="vitesse égale racine carrée de D sur tau">u_L \simeq \sqrt{\frac{D_T}{\tau}}</math>
which renders correctly.
Possibly the argument «title» in «math» confuses the detection. --66.131.214.76 21:36, 6. Mär. 2009 (CET)
- Thanks for this information. I will change this in my script, but at the moment I have no time. Maybe I can change it next weekend. -- sk 17:29, 8. Mär. 2009 (CET)
- OK, I have added this to my script. -- sk 09:59, 15. Mär. 2009 (CET)
request to add Arabic wikipedia
Hi. I wonder whether it is possible or not to generate a check wikipedia report for Arabic wikipedia? --Ciphers 07:12, 12. Mär. 2009 (CET)
- I will try it at the weekend. -- sk 10:28, 12. Mär. 2009 (CET)
- OK. You can find the output here. -- sk 09:59, 15. Mär. 2009 (CET)
False positive in error 44 « Headlines in bold »
Hi Stefan, we got this false positive in French wiki :
== Acte au sens d'''instrumentum'' ==
The script does not distinguish the case of an apostrophe followed by italics.
Bye --66.131.214.76 03:04, 9. Mär. 2009 (CET)
- Very interesting, but do you think that we need italics in our headlines? I don't think so. I think any special font styling is wrong in headlines. We have one CSS for all headlines and this is enough. -- sk 21:13, 11. Mär. 2009 (CET)
- Hello Mr Kühn.
- Italics in a title are sometimes OK. If someone wants to devote a whole section to a painting by Monet, like ''Impression, soleil levant'', he could insert:
- == On the importance of ''Impression, soleil levant'' ==
- Cantons-de-l'Est 09:04, 15. Mär. 2009 (CET)
The detection should be more precise :
It should not detect titles like this :== L'''Africa'' sous les Antonins == ,
but should detect== L''''Africa''' sous les Antonins == .
Regards, Cantons-de-l'Est 09:04, 15. Mär. 2009 (CET)
- Hello Cantons-de-l'Est, please say Stefan! For this headline you can also write
- == On the importance of "Impression, soleil levant" ==
- == On the importance of Impression, soleil levant ==
- It has the same information. We don't need this italic in headlines. -- sk 09:58, 15. Mär. 2009 (CET)
- As for your information we usually try to respect the typographic requirements as for titles of book, even in headlines :) In french, (I don't know for any other language), it's required (if I'm not mistaking) to write in Italic for various cases (except for h1): for those who understand french fr:Wikipédia:Conventions_typographiques#Italique. We usually avoid to use quotes/double quotes for showpieces. Loreleil 17:25, 16. Mär. 2009 (CET)
- I will try to fix this. -- sk 20:14, 16. Mär. 2009 (CET)
- OK, I have changed my script. If I find '' with no ' directly before or after it in a headline containing ''', then I will not report this as an error. Thanks for your information; this helps to improve the script. -- sk 20:22, 16. Mär. 2009 (CET)
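The rule stated in the last reply can be sketched in Python as follows. This is one possible reading of it, not the script's actual Perl code: a headline containing ''' is reported unless it also contains a stand-alone '' (a run of exactly two apostrophes), which indicates italics next to a literal apostrophe. The function name is hypothetical.

```python
import re


def is_bold_headline(line: str) -> bool:
    """Report headlines that use ''' unless a stand-alone '' is present."""
    if "'''" not in line:
        return False
    # Lengths of all consecutive apostrophe runs in the headline.
    runs = [len(m.group()) for m in re.finditer(r"'+", line)]
    # A run of exactly two apostrophes is an italics marker with no
    # apostrophe before or after it, so the headline is not reported.
    return 2 not in runs
```

With this reading, `== Acte au sens d'''instrumentum'' ==` (runs of 3 and 2) is accepted, while `== L''''Africa''' sous les Antonins ==` (runs of 4 and 3) is still reported.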
error 56
hello, "Ascii art" is detected also in comments like here : http://fr.wikipedia.org/w/index.php?title=Warcraft&action=edit&section=9 I don't think it's a good idea... Al1-fr 10:43, 16. Mär. 2009 (CET)
- Normally the script excludes comments for this error. I will check this. Thanks for this info. -- sk 13:44, 16. Mär. 2009 (CET)
- I have found the problem. This is a bug in my script: I detect <!--> as a complete comment. I will fix this tonight. -- sk 13:55, 16. Mär. 2009 (CET)
- Thanks ! Al1-fr 14:55, 16. Mär. 2009 (CET)
- OK, I have changed my script. -- sk 18:16, 16. Mär. 2009 (CET)
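The bug described above can be avoided by requiring the closing `-->` to start after the opening `<!--`, so that the two markers never share characters. A minimal Python sketch (the real script is Perl; `strip_comments` is a hypothetical name):

```python
import re

# The opening "<!--" is consumed first; the closing "-->" must begin
# after it, so "<!-->" alone is not treated as a complete comment.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


def strip_comments(wikitext: str) -> str:
    """Remove complete comments before running the arrow detection."""
    return COMMENT_RE.sub("", wikitext)
```

Here a comment that starts with `<!-->` extends to the next real `-->`, so arrows drawn inside it are removed together with the comment.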
Existing error 3 not found
Hi Stefan. I wonder why the program didn't give an error 3 (<ref> without <references>) in this version of da:Saturn V? It was checked, as it was on the 2009-03-15 list with an error 48 (title in text). Regards, Byrial 13:35, 16. Mär. 2009 (CET)
- My script doesn't check all articles. If this article was OK in the last dump, then I won't find it now. Another script finds changes in Wikipedia, but I don't use all changes, because that is too much. If this error is in the next dump, my script will find it. -- sk 13:48, 16. Mär. 2009 (CET)
- The error with the missing <references> tag must have been in the last dump, as it was introduced in the article in an edit from September 12, 2007 and has been there until today, when I fixed it. So I suppose there could be an error in your script, since the error was not detected. Byrial 14:20, 16. Mär. 2009 (CET)
- I found the problem. The <ref> is inside a table. When my script checks an article, it currently excludes tables. So I don't find the <ref>. Yes, my script is not perfect. :-) -- sk 14:29, 16. Mär. 2009 (CET)
Suggestion (wish list !)
Hello (again) ; it would be nice to have a list of titles ( == == ) with :
- a bad capitalization (all in caps : == SUGGESTION (WISH LIST !)==)
- an ending ":" (==Suggestion :==)
In french wiki, I found some of them thanks to error 44. I found some copyright violations with this format (the author has only added title marks), so I think it is an interesting case Al1-fr 15:04, 16. Mär. 2009 (CET)
- OK, I have added both errors. For the capitalization check I also test the length of the headline: only if the length is greater than 10 (without "=" and " ") does my script report an error. So a normal abbreviation like "== UNO ==" is not a problem. I hope this is OK and we do not get too many false positives. -- sk 18:44, 16. Mär. 2009 (CET)
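The two checks described in this thread could be sketched like this in Python. The length rule (more than 10 characters, not counting "=" and spaces) follows the reply above; everything else, including the function names and the regular expression, is an assumption for illustration.

```python
import re

# "== text ==" with the same number of "=" on both sides.
HEADLINE_RE = re.compile(r"^(=+)\s*(.*?)\s*\1\s*$")


def headline_text(line: str):
    """Return the headline text without the "=" markers, or None."""
    m = HEADLINE_RE.match(line)
    return m.group(2) if m else None


def ends_with_colon(line: str) -> bool:
    text = headline_text(line)
    return text is not None and text.endswith(":")


def is_all_caps(line: str, min_len: int = 10) -> bool:
    text = headline_text(line)
    if text is None:
        return False
    letters = [c for c in text if c.isalpha()]
    compact = text.replace(" ", "")  # length without spaces
    return len(compact) > min_len and bool(letters) and all(c.isupper() for c in letters)
```

Short abbreviations such as `== UNO ==` stay below the length threshold and are therefore not reported.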
mdash or ndash
The correct character for Em dash (—) = "\u2014" and for En dash (–) "\u2013". Why replace those characters? Is there a Wikipedia guideline that says that character 2013 in the source is preferred over the HTML entity &ndash;?
2013 and 2014 are outside the normal character range (ASCII 32-127) and replacing the characters makes the source more difficult to process. Character 2013 looks like a minus sign in the edit window, but those characters appear visually different (they are rendered differently by the browser).
87.212.82.171 12:44, 15. Mär. 2009 (CET)
- Hello IP, my view is that the codes &ndash; and &mdash; are not as good as the correct Unicode characters. I will not replace the Unicode character; I will only replace the entity variant. I hope I have found the right words. -- sk 15:25, 15. Mär. 2009 (CET)
- I do not fully understand it. Can you give an example where you solved the problem? 87.212.82.171 16:34, 15. Mär. 2009 (CET)
- Sorry, this was my fault. The code "&ndash;" has to be replaced by the correct Unicode character, so that the article text contains only the Unicode character and not the code "&ndash;". -- sk 20:37, 15. Mär. 2009 (CET)
- See this one in French Wikipedia. --66.131.214.76 00:25, 17. Mär. 2009 (CET)
- Oke clear. On the Dutch wikipedia all the ndash and mdash are replaced. 87.212.82.171 18:56, 17. Mär. 2009 (CET)
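For bot operators who want to do the same replacement, the substitution of the two HTML entities by their Unicode characters is a plain string replacement. A minimal Python sketch (the helper name is hypothetical; the code points are the ones quoted at the start of this thread):

```python
# HTML entity -> Unicode character (U+2013 en dash, U+2014 em dash).
DASH_ENTITIES = {
    "&ndash;": "\u2013",
    "&mdash;": "\u2014",
}


def replace_dash_entities(wikitext: str) -> str:
    """Replace the dash entities with the corresponding characters."""
    for entity, char in DASH_ENTITIES.items():
        wikitext = wikitext.replace(entity, char)
    return wikitext
```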
Exclude blockquotes from error 39 < p >
Hi Stefan, the HTML element <p> is necessary within <blockquote> blocks, since they do not allow normal «white line» paragraph separation, for example:
<blockquote>Paragraph one...
Paragraph two.</blockquote>
brings
Paragraph one... Paragraph two.
Please exclude blockquotes from this detection, if possible. Note that in French Wikipedia we use template {{Citation bloc|quote text}} as an equivalent to the <blockquote></blockquote> pair. -- Laddo 66.131.214.76 00:43, 17. Mär. 2009 (CET)
- Thanks for this info. At the moment I don't detect blockquotes, but I can include this in the future. -- sk 22:07, 17. Mär. 2009 (CET)
Shorter text for error 54 (Line break at the end of a list item)
Hi Stefan, some of the items listed in new error 54 span 5-15 lines, which is making that section of the WikiCheck report uselessly large. I suggest that you retain at most 60 characters (or so), so that if it is longer, you output <first 30 char>...<last 30 char>. Just a suggestion, of course... ;) -- Laddo 66.131.214.76 02:10, 17. Mär. 2009 (CET)
- Good idea, I will try this in the next days. -- sk 10:12, 17. Mär. 2009 (CET)
- OK, I have changed this in my script. -- sk 22:03, 17. Mär. 2009 (CET)
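The suggested shortening (keep the first 30 and the last 30 characters) is easy to express in code. A small Python sketch with a hypothetical function name; the 30/30 split comes from the suggestion above, the "..." separator is an assumption:

```python
def shorten(text: str, head: int = 30, tail: int = 30, sep: str = "...") -> str:
    """Return text unchanged if short, else '<first head>...<last tail>'."""
    if len(text) <= head + tail:
        return text
    return text[:head] + sep + text[-tail:]
```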
Dependency on coordinates.pm
The Perl script suddenly seems to need coordinates.pm. Therefore it does not run on my default Perl installation. I'm not a Perl guru, but can this dependency be avoided? 87.212.82.171 18:58, 17. Mär. 2009 (CET)
- Oh yes, I added something in the last few days and did not update the zip. I will fix this in the next few days. This is a package of mine for geocoordinates. I will check this also with this script. -- sk 22:04, 17. Mär. 2009 (CET)
Errorcode 3
On the Dutch Wikipedia we also have a template called Unreferenced to include references. As follows: {{Unreferenced|date=maart 2009}}
Can you include this in your perl script? 87.212.82.171 20:17, 17. Mär. 2009 (CET)
- OK, I have changed this. -- sk 22:06, 17. Mär. 2009 (CET)
New detection for < /div >
Hi, I saw this case of a closing </div> with no opening counterpart. Possibly detect such cases? no rush, we're busy enough on French Wiki ;) -- Laddo 66.131.214.76 02:35, 19. Mär. 2009 (CET)
- This is also a good idea. At the moment I don't check the div-tag. -- sk 11:41, 19. Mär. 2009 (CET)
wiki with lots of errors french wiki
hello Stefan, on French Wiki we have some errors with low severity and many detections (> 6000!). When your script analyses a dump, it stops before the end because the maximum of 40,000 errors is reached. With this mechanism, some "high severity" errors are not detected. I think it would be a good idea to limit the number of "low severity" errors to 20,000 (or another value), to give all "high severity" errors a chance to be detected [or any other variant: limit the number of errors per case, ...]? Al1 08:34, 19. Mär. 2009 (CET)
- At the moment I am designing a new system. As a result, not all articles with errors will be checked every day. I will check a maximum of 100 articles per error. If my script finds that only 60 of these 100 articles still have the error, then I check 40 more articles with this error. So the scans will take less time. For example, if an error was found 20000 times, I check only the 100 articles for the output and no more on that day. So I will fix this problem soon. -- sk 11:48, 19. Mär. 2009 (CET)
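The "limit per error type" variant discussed in this thread could be sketched as follows. This is not the new system itself, whose design is only outlined above; it merely shows a per-error cap (100 is the figure mentioned in the reply) so that one frequent low-priority error cannot use up the whole budget. The function name and the input format are assumptions.

```python
from collections import defaultdict


def collect_with_cap(findings, cap: int = 100) -> dict:
    """Keep at most `cap` article titles per error number.

    `findings` is an iterable of (error_number, article_title) pairs.
    """
    kept = defaultdict(list)
    for error_number, title in findings:
        if len(kept[error_number]) < cap:
            kept[error_number].append(title)
    return dict(kept)
```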
No detection by error 58 (Section title all in uppercase)
Hi Stefan, this new error did not detect anything in French nor Deutsch (still awaiting English report). I have no example to show, but it sounds suspect that nothing is detected at all. -- Laddo 199.22.61.2 17:52, 18. Mär. 2009 (CET)
- Also nothing in en, so I will change the length from 10 to 8. I have tested my script with 10 and it works very well. --sk 11:40, 19. Mär. 2009 (CET)
- Fair, then , thanks -- Laddo 66.131.214.76 02:43, 20. Mär. 2009 (CET)
Title in text
Using wikilink on the title in the text has a valid use when <onlyinclude> is used for transclusion in other pages (portals etc). This is often used on the introduction/summary at least at nowiki. Stigmj 13:45, 13. Mär. 2009 (CET)
- It is OK if it is in the portal namespace (no. 100/101). But my script searches only in the article namespace (no. 0) and in the file namespace (no. 6). See: namespace for de and namespace for no. I think it is not a good idea to link to the title in these two namespaces. Maybe you can give me a better example. -- sk 20:51, 13. Mär. 2009 (CET)
- We use pages in the portal-namespace which transclude pages from the article-namespace. This is a perfectly valid usecase in nowiki, and requires the article in namespace 0 to have a link to itself in the text. There has also been discussions about using transclusion of featured articles on the mainpage, which on nowiki is in namespace 0 as well. Stigmj 16:09, 15. Mär. 2009 (CET)
- Hello Stigmj, can you give me an example for an article. So that I can see where the problem is or how I can solve this. Thanks! -- sk 16:26, 15. Mär. 2009 (CET)
- no:Deltaprosjektet is a typical example. Regards, BjørnN 17:01, 16. Mär. 2009 (CET)
- I understand! But I think the <onlyinclude> tag should only be used in template pages. In German we have this rule and I think it is the same in English. At the moment the script finds 386 articles in nowiki, and in the output of 200 I found only 20 articles with the construct ]]''', which is an indicator of this onlyinclude problem. I think you should fix this 10%, or fix the other articles and then deactivate this error in nowiki. At the moment I don't see a way to change this in my script. -- sk 20:44, 16. Mär. 2009 (CET)
- We have deactivated this error in nowiki. I have fixed almost 100 of them, so there should be about 300 left. Is it possible to put the list of remaining articles at my userpage no:Bruker:Wikijens/checkwikipedia and I can fix the remaining errors without creating problems for pages that are being transcluded? Wikijens 14:46, 21. Mär. 2009 (CET)
- OK, I have created this page. -- sk 16:56, 21. Mär. 2009 (CET)
- Thanks. Wikijens 17:00, 21. Mär. 2009 (CET)
Truncated output for error 56 (Arrow art)
Hi Stefan, the individual output for new detection 56 (Arrow art) starts one character too late, like:
...->[[Catégorie:Chronologie de la sociologie|*1944 en...
−− should display the full --> at the start
If possible, it would be even better if you could output a few characters before the faulty arrow, so that it would be easier to see if the case is really "arrow art" or if it is the mismatched ending of a comment, like this one was:
...{{Portail|sociologie}}-->[[Catégorie:Chronologie de la sociologie|*1944 en...
I guess 10-12 characters before the arrow would be enough, if at all possible. Thanks -- Puzzled Laddo 66.131.214.76 02:56, 20. Mär. 2009 (CET)
---
BTW it reported this false positive in this article:
L<small>i</small> <= dispo<small>i</small>
---
Finally (!) would it make sense to also detect "<->" (↔) and "<=>" (⇔) ? -- Laddo 66.131.214.76 03:35, 20. Mär. 2009 (CET)
- I can change this to include the characters before the arrow. I can also include "<->" (↔) and "<=>" (⇔). No problem. Why is this a false positive in this article? -- sk 08:27, 20. Mär. 2009 (CET)
- These two characters " <= " in fr:Algorithme du banquier mean "smaller or equal" -- this rule should detect " <== ", three characters. I cannot understand why it gets detected and reported here. -- Laddo 66.131.214.76 00:26, 21. Mär. 2009 (CET)
- If it means "smaller or equal", then please use <math>. My script only detects "<-", "<=", "=>" and "->". So I also catch all "<===" and so on. -- sk 13:27, 21. Mär. 2009 (CET)
- Ah, I see. I thought you were only detecting three-character sequences. No worries. -- Laddo 66.131.214.76 01:43, 22. Mär. 2009 (CET)
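The detection of the four two-character sequences, together with the requested context of about ten characters before the arrow, could look like this in Python. The four patterns are the ones listed in the reply above; the function name and the amount of trailing context are assumptions.

```python
import re

# The four two-character sequences named above.
ARROW_RE = re.compile(r"<-|<=|=>|->")


def find_arrows(text: str, before: int = 10, after: int = 30) -> list:
    """Return each hit with some context before and after the arrow."""
    hits = []
    for m in ARROW_RE.finditer(text):
        start = max(0, m.start() - before)
        hits.append(text[start:m.end() + after])
    return hits
```

Showing the characters before the hit makes it easier to tell real arrow art from the end of a comment, because the full `-->` and the text leading up to it become visible.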
Extra </nowiki>
The translation source text (for Japanese) has extra </nowiki> for error 57 and 58:
error_057_desc_script=One headline in this article end with a colon <nowiki>"== Headline : =="</nowiki>. This colon can be deleted.</nowiki>.
--fryed-peach 18:42, 20. Mär. 2009 (CET)
- This is right, but where is the problem? -- sk 13:30, 21. Mär. 2009 (CET)
- error_057_desc_script and error_058_desc_script contain a closing </nowiki>. but no opening <nowiki>. Please delete the unnecessary </nowiki>. from both error_057_desc_script and error_058_desc_script. Thank you! --UV 16:48, 21. Mär. 2009 (CET)
- OK, thanks for this info. -- sk 07:33, 22. Mär. 2009 (CET)
A solution for the 'small' controversy
Hi Stefan, as you know several projects had deactivated error 42 about the <small>; this is basically because it's not a deprecated tag, it's difficult to explain how to fix the "error", and also because there are semantic reasons to not use a span instead of a small [2]. I've a little proposal to change or to split the error, in order at least to detect the "bad use" of the small in those projects. I'll give you some examples:
- small inside a ref, like <ref><small>text of the footnote</small></ref>: this is bad, since the text size for the references is already set to 90%; another small will produce x-small text
- small inside an image caption, like [[Image:X.png|thumb|<small>Description.</small>]]; again, it's already set to 94% in the stylesheet, and it's also a bad practice
- double small, like <small><small>text</small></small>
- small opened but not closed with </small>
I'm sure you and the others can think about more examples and a way to reduce the false positives. If you split the error it'll be just a bit more easy to explain, for example:
- error 42, article with a 'small' tag: don't use it if it's not really necessary
- error 63, bad use of the 'small' tag: you should never do this...
and then, every project can choose to deactivate error 42 or to keep both active. Unfortunately, the main interpretation problem of error 42 will remain, so the best solution is to change error 42 completely. I hope it's clear enough. --Red Power 17:06, 18. Mär. 2009 (CET)
- This is a good idea, I will create more errors. If you have more examples, then tell me this here. --sk 11:39, 19. Mär. 2009 (CET)
- I've just found this: [3]. I don't think that a small inside or before <sup></sup> or <sub></sub> is a good idea. --Red Power 19:02, 22. Mär. 2009 (CET)
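The four "bad use" cases listed in this proposal could be detected with simple patterns. The Python sketch below is only an illustration of the proposal, not an existing error of the script; the labels, the function name and the decision to look only at a `<small>` directly after the opening `<ref>` are assumptions.

```python
import re

SMALL_IN_REF = re.compile(r"<ref\b[^>/]*>\s*<small>", re.IGNORECASE)
SMALL_IN_CAPTION = re.compile(r"\[\[(?:Image|File):[^\]]*<small>", re.IGNORECASE)
DOUBLE_SMALL = re.compile(r"<small>\s*<small>", re.IGNORECASE)


def small_misuses(wikitext: str) -> list:
    """Return labels for the questionable uses of <small> found."""
    found = []
    if SMALL_IN_REF.search(wikitext):
        found.append("small inside ref")
    if SMALL_IN_CAPTION.search(wikitext):
        found.append("small inside image caption")
    if DOUBLE_SMALL.search(wikitext):
        found.append("double small")
    lowered = wikitext.lower()
    if lowered.count("<small>") > lowered.count("</small>"):
        found.append("unclosed small")
    return found
```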
Template not correct begin
Hi Stefan, on cawiki the script found an error on ca:Introducció_a_la_teoria_de_grups#Grups_de_Galois:
:<math alt="x = (menys b més menys l'arrelquadrada de (b al quadrat menys 4 a c)) over 2a">x = \frac{-b \pm \sqrt {b^2-4ac}}{2a}.</math>
{-b \pm \sqrt {b^2-4ac}} is considered a template with a non correct begin but it is not a template.
Tschüs!. --Loupeter 12:38, 22. Mär. 2009 (CET)
- OK, I changed my script. Thanks for this info. -- sk 12:42, 22. Mär. 2009 (CET)
DEFAULTSORT wiki-pt
Could you please check what the error is here. Maybe there is something wrong in the translation, that is screwing up the table. Thanks. GoEThe 18:34, 22. Mär. 2009 (CET)
- OK, I changed this. -- sk 19:03, 22. Mär. 2009 (CET)
GoEThe, there is no table for that report, but only the links to the articles. --Red Power 20:15, 22. Mär. 2009 (CET)
DEFAULTSORT with special letters
Hi Stefan
Is it possible to exclude articles with just one character in the title from this scan? These articles are often about symbols or letters in a particular alphabet, and not too easy to sort alphabetically. --Helt 13:38, 22. Mär. 2009 (CET)
- Yes, this is possible. But at the moment I am working on another big problem. -- sk 10:12, 23. Mär. 2009 (CET)
Template not correct begin, error number 47
Your script has found this error in sv:8 Flora, but I can't see why. Can you please check if this is a false positive? /Sten André 21:55, 23. Mär. 2009 (CET)
- The problem is the line "|omloppsbana_ref = <ref name=JPL>{{JPL}} Läst 6 februari 2009}}</ref>". -- sk 22:30, 23. Mär. 2009 (CET)
- As described, my script thinks at this point that this is the end of the "Infobox planet". So when it later finds the correct end, it says: sorry, there is no begin. -- sk 22:32, 23. Mär. 2009 (CET)
Headlines with capitalization
Within this category also headers consisting of numbers are listed, which should be excluded. --seismos 09:50, 22. Mär. 2009 (CET)
- Ok, I have changed this. Only A-Z will be counted. -- sk 11:54, 22. Mär. 2009 (CET)
- Now we have another problem. I will change this tomorrow. -- sk 10:11, 23. Mär. 2009 (CET)
- I saw a couple more:
- Exclude any <ref> from the scan for this detection, see fr:115e régiment d'infanterie de ligne
- For a title to be uppercase, most characters must be uppercase, though not necessary all:
- fr:15e régiment d'artillerie got this false positive : « === Restauration, Monarchie de Juillet , Second Empire , IIIe République jusqu'à la Première Guerre mondiale === »
- fr:161e régiment d'infanterie de ligne got correctly identified with « ==== PERTES DU 161e REGIMENT D’INFANTERIE AU COURS DE LA CAMPAGNE 1914-1918 ==== » despite that it contained some lowercase
- Possibly report it only if at least 80% of alphabetical characters are uppercase.
- -- Laddo 199.22.57.2 12:50, 23. Mär. 2009 (CET)
- At the moment my script has a big bug. It counts all capital letters, so there are many false positives. -- sk 14:34, 23. Mär. 2009 (CET)
- Also - look for <10 chars words made only with capital letters, so it won't find "Różnice między kodowaniami MS-DOS CP852 a IBM CP852" (in Polish: "Differences between codepages MS-DOS CP852 and IBM CP852"). Looking for headlines with >80% capital letters may cause further problems. Matma Rex answer me on plwiki 17:19, 24. Mär. 2009 (CET)
- Another, better example: "XXXVIII FMFT - 2007" - roman year and abbreviation. Matma Rex answer me on plwiki 17:20, 24. Mär. 2009 (CET)
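The 80% rule proposed in this thread can be tried out quickly. The following is only a sketch under stated assumptions: <ref> tags are removed first (as requested above), only alphabetic characters are counted, and a headline is reported when at least 80% of them are uppercase. The function names and the exact threshold handling are hypothetical and do not describe the real script.

```python
import re

# Removes <ref .../> and <ref ...>...</ref> so that footnote text is not counted.
REF_RE = re.compile(r"<ref[^>]*?/>|<ref[^>]*>.*?</ref>", re.IGNORECASE | re.DOTALL)


def uppercase_ratio(headline):
    """Share of alphabetic characters that are uppercase (0.0 if no letters)."""
    text = REF_RE.sub("", headline)
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return 0.0  # headlines made only of digits are never reported
    return sum(ch.isupper() for ch in letters) / len(letters)


def is_uppercase_headline(headline, threshold=0.8):
    """Report the headline if the uppercase share reaches the threshold."""
    return uppercase_ratio(headline) >= threshold


examples = [
    "PERTES DU 161e REGIMENT D’INFANTERIE AU COURS DE LA CAMPAGNE 1914-1918",
    "Restauration, Monarchie de Juillet , Second Empire , IIIe République "
    "jusqu'à la Première Guerre mondiale",
    "Różnice między kodowaniami MS-DOS CP852 a IBM CP852",
    "XXXVIII FMFT - 2007",
]
flags = [is_uppercase_headline(h) for h in examples]
```

With these inputs the first headline is reported and the second and third are not. The fourth ("XXXVIII FMFT - 2007") is still reported, which illustrates the concern raised above that a pure percentage rule needs an extra exception for Roman numerals and abbreviations.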
Ascii art
Fail on Ascii art: http://nl.wikipedia.org/wiki/Sommatie
87.212.82.171 23:28, 23. Mär. 2009 (CET)
- No, it is correct. See line "In [[C (programmeertaal)|C]]/[[C++]]/[[C sharp#|C]]/[[Java (programmeertaal)|Java]] kan je deze code gebruiken, mits n, m en x zijn ''int''-types. Dat x een ''array'' is. En dat <code>m</code> <= <code>n</code>:"
- There you find a "<=", which is counted as an arrow. If you mean "less than or equal", then use <math> for this phrase. -- sk 08:47, 24. Mär. 2009 (CET)
- Ah yes I see. Thanks! 87.212.82.171 11:30, 25. Mär. 2009 (CET)
error #43
Hi Stefan,
On this specific case, Error #43 is detected. I don't know if it's really a bug ? Al1 12:08, 25. Mär. 2009 (CET)
{{Infobox Communes de France
| nomcommune=Argonay
[...]
| géoloc-département=
|}}
- What is the name of the article? -- sk 13:25, 25. Mär. 2009 (CET)
- I forgot to mention it ! http://fr.wikipedia.org/w/index.php?title=Argonay&oldid=39267266 . Al1 14:09, 25. Mär. 2009 (CET)
- It is a bug, but I have no idea how to detect this. In a template with parameters like "nomcommune=XYZ" it is a bug. But in other templates without "parameter=value" it is not a bug. For this problem I would need a list of parameters for every template. This is not possible. -- sk 14:35, 25. Mär. 2009 (CET)
- OK. On detections of today, I've changed |}} to | }} to avoid the detection, even if it's a bug. It happens very few times. Thanks for information. Al1 20:41, 25. Mär. 2009 (CET)
Reformat?
Per a comment at [4], I was wondering if it would be possible for you to modify the script to update the page on Wikipedia directly, rather than generating it on the toolserver. I see that you have some other things to deal with, so take your time... it's not really urgent or anything. Thanks for your time and this great script! -Drilnoth (Talk) 18:01, 25. Mär. 2009 (CET)
- Sorry, I am not a good programmer. I can only write this script for creating this output, but I have no idea how to modify a page on Wikipedia directly. If you find a way for me to do this with Perl, I will try it. -- sk 05:19, 26. Mär. 2009 (CET)
- Okay; I thought it was worth asking. I don't know anything about using Perl. Thanks anyway! -Drilnoth (Talk) 15:31, 26. Mär. 2009 (CET)
fyi --AwOc 15:10, 29. Mär. 2009 (CEST)
A similar suggestion for splitting was done here: fr:Discussion Wikipédia:WikiProject Check Wikipedia#Affichage des erreurs -- Laddo 66.131.214.76 20:12, 29. Mär. 2009 (CEST)
- I lowered the limit from 100 to 50 articles per error. Tomorrow the size of the page will be smaller. -- sk 22:02, 29. Mär. 2009 (CEST)
Feed-back on new statistics
Hi Stefan, your new statistics are very informative and interesting! The new links in the summary are very useful too, since French Wikipedia still has so many detections. I am a bit confused by the new statements that appear in the report: at the top, you still indicate « improvement in 30468 articles », which is not the same number as « Today the script scan 7400 articles ». Is it possible to explain the new scheme that you have put in place?
A couple of typos to correct:
- With the last scan the script checked 7400 articles. At the moment the script identified 33002 ideas for improvements in 30468 articles.
- Today the script scanned 7400 articles
- ...articles with errors from last dump or scan
-- Laddo 66.131.214.76 01:39, 31. Mär. 2009 (CEST)
- Hello Laddo, thanks for correcting my English text. I will insert this tonight. Now the explanation: if my script found 90000 errors in 60000 articles in the dump, then every day the script would have to scan these 60000 articles. This was a problem of time and resources. I think with some more errors I would need more than 24 hours. So I changed my script completely. I scan the full dump and find 90000 errors in 60000 articles. But now I only scan all new articles and changed articles. For example, if I find error nr.50 100 times in these scans, I do not need to scan any of the 60000 articles for error nr.50. If the script finds error nr.50 only 90 times, I scan the next 10 of the 60000 that had error nr.50. If among these 10 I find error nr.50 only 2 times, I scan the next 8 of the 60000 with error nr.50, and so on. With this process I can improve the speed of a scan. Now it is possible to have more than 40000 errors (see en). With the next scan we will also see more than 40000 articles in fr. :-) Thanks to all in French Wikipedia for the good work. -- sk 10:09, 31. Mär. 2009 (CEST)
- Hi again Stefan, the improvement is great, your daily scan will never have to scan more than new+modified+as many identified bad as necessary, likely less than 10.000 even for largest Wikipedias! It would be best if it was possible to go over that list of 40000 during the full dump scan, though.
- Thinking of it, I see that you are still scanning files that you knew contained an error and were not modified. If you go one step further and keep the list of defective articles separately for each error type, you could save even more processing by keeping track of articles and errors that were previously identified. Let me illustrate:
- Say that detection nr.50 found 3000 errors in 3000 articles in the last dump. In the first report, you list 50 errors, and 35 of them get corrected by contributors during that day. The next day,
- 1) you scan 2100 new articles and find 5 brand new errors nr.50 (5 found so far, out of 2100 scanned)
- 2) you scan 4850 modified articles and find 4 new errors nr.50 (9 found so far, out of 6950 scanned)
- 3) now since you need 41 more errors nr.50 in order to list them in today's report, you can get back to your previous list of 3000, filter out those that were modified and re-scanned since [they were processed by step 2 above], so that there are, let's say, 2700 of those articles still unchanged since that previous scan. Then you can pick the first 41 errors of that list and put them in today's report (50 found, still only 6950 articles scanned).
- 4) If you maintain that list of 3000 by removing modified (and re-scanned) ones and adding newly found errors, then next day you will have a list of 3000 minus 300 plus 9 = 2709 articles with that error nr.50.
- At the end, you would only ever scan new or modified articles on a daily basis. Once a first full scan has been made for a given detection nr., it would incrementally maintain the list of articles where that type of error was found, without ever re-scanning an unchanged article.
- The full scan would remain necessary only for new detections or to catch modified articles that your script would not be aware of.
- I don't know if all of this is feasible, it would require that you separately maintain the list of articles for each type of detection (and the text of each error in each article); all that data could become messy. Think of it anyhow, and thanks again for your invaluable contribution to Wikipedia. ;)
- -- Laddo 199.22.57.2 19:37, 31. Mär. 2009 (CEST)
- If I understand the whole text correctly, then this is my workflow. First I scan only new and changed articles. And if I don't find 50 of one error, then I scan the next articles from the big list that have this error. Maybe I don't understand it correctly, or I did not describe my script correctly. :-) -- sk 21:00, 31. Mär. 2009 (CEST)
- What you explained is clear: I understand that you maintain ONE big list of articles for all errors, from which you re-scan only when you need extra cases for a given type of detection. What I suggest is to maintain one list FOR EACH TYPE of detection, so that you would not need to scan them at all. -- Laddo 66.131.214.76 23:50, 31. Mär. 2009 (CEST)
- But my script produces this "list for each type" on the fly from the "big list". This is very easy and fast. With more lists I have more problems with updating and so on. I do not see the improvement from this "list for each type". Do you need these "single error lists"? -- sk 13:25, 1. Apr. 2009 (CEST)
- OK, what you do is simpler and fast, all good. -- Laddo 66.131.214.76 04:19, 2. Apr. 2009 (CEST)
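For readers who want to follow the arithmetic of the per-error list proposal in this thread, here is a rough sketch. It only models the bookkeeping with sets of article titles; the function names, the title strings and the reduced number of modified articles are made up for the illustration and are not part of the real script.

```python
def update_error_list(previous, rescanned, found_now):
    """Next day's list for one error type.

    previous:  titles known to contain the error after the last scan
    rescanned: titles scanned today (new and modified articles)
    found_now: titles in which today's scan found the error
    """
    still_unchanged = previous - rescanned  # entries that stay valid without a re-scan
    return still_unchanged | found_now


def pick_for_report(previous, rescanned, found_now, per_error_limit=50):
    """Fresh hits first, then fill up from the unchanged backlog."""
    report = sorted(found_now)[:per_error_limit]
    missing = per_error_limit - len(report)
    backlog = sorted(previous - rescanned - found_now)
    return report + backlog[:missing]


# Numbers from the example above: 3000 known articles, 300 of them modified,
# 2100 new articles, 9 hits for error nr.50 in today's scan.
previous = {f"old{i:04d}" for i in range(3000)}
modified_old = {f"old{i:04d}" for i in range(300)}
new_articles = {f"new{i:04d}" for i in range(2100)}
rescanned = modified_old | new_articles
found_now = {f"new{i:04d}" for i in range(5)} | {f"old{i:04d}" for i in range(4)}

next_list = update_error_list(previous, rescanned, found_now)
report = pick_for_report(previous, rescanned, found_now)
```

This reproduces the figures quoted in the thread: the maintained list shrinks to 3000 minus 300 plus 9 = 2709 entries, and the report of 50 consists of the 9 fresh hits plus 41 entries taken from the unchanged backlog. The alternative described by sk, building the per-error view on the fly from one big list and re-scanning backlog articles only when more cases are needed, avoids keeping several lists in sync.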
Unicode control characters
Occasionally templates don't work as invisible unicode characters get into the fields. Mediawiki strips whitespace and the like, but not these. This leads to parserfunctions not working. Would it be possible to check for these?
The characters would be (\u200E|\uFEFF|\u200B). Possibly there are uses for these, but I doubt it. There is a related request at en:WT:AutoWikiBrowser/Feature_requests#Unicode_control_characters. -- User:Docu
- I know this problem, see this change. I will try to detect these characters inside templates. -- sk 09:57, 31. Mär. 2009 (CEST)
- Ok, new error nr. 16 "Template with Unicode control characters". -- sk 16:57, 1. Apr. 2009 (CEST)
- Thanks. The other day I had to fix a whole series [5]. I updated the description for en.wp accordingly. -- User:Docu
- Isn't error 16 still deactivated ? :
error_016_prio_script=-1
-- Laddo 66.131.214.76 04:28, 2. Apr. 2009 (CEST)
- @Laddo: copy the new translation to your translation page. I have replaced the old errors 11 and 16 with completely new errors with new descriptions. -- sk 09:55, 2. Apr. 2009 (CEST)
- Done. Section "News" is a great idea too! -- Laddo 66.131.214.76 02:57, 3. Apr. 2009 (CEST)
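The three characters named in this thread (U+200E, U+FEFF, U+200B) are invisible in the editor, so a small check helps to locate them. This is a minimal sketch that scans a whole text; it does not restrict the search to template fields as error nr. 16 does, and the function name is hypothetical.

```python
import re

# The three code points mentioned above: LEFT-TO-RIGHT MARK,
# ZERO WIDTH NO-BREAK SPACE (BOM) and ZERO WIDTH SPACE.
INVISIBLE_RE = re.compile("[\u200e\ufeff\u200b]")


def find_invisible_chars(text):
    """Return (offset, 'U+XXXX') for every invisible character in the text."""
    return [(m.start(), f"U+{ord(m.group()):04X}") for m in INVISIBLE_RE.finditer(text)]


wikitext = "{{Infobox|name=Foo\u200e}}"
hits = find_invisible_chars(wikitext)
```

For the sample above, `hits` contains a single entry at offset 18 with the code point U+200E, that is, directly after the value "Foo", which is where such a character would break a parser function comparison.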
Headlines with bold
Some articles use bolds to emphasize something in their headlines. For example, ja:フランスにおける日本の漫画 has a headline "フランス語の '''{{lang|fr|manga}}''' という語について", which indicates "manga" is written in non-Japanese characters. I think such partial uses should be accepted. --fryed-peach 04:19, 3. Mär. 2009 (CET)
- If this is a problem in Japan then deactivate this error. -- sk 17:55, 3. Apr. 2009 (CEST)
Reformatting
I thought that you might want to look at en:Wikipedia_talk:WikiProject_Check_Wikipedia#Reformatting. -Drilnoth (Talk) 19:08, 20. Mär. 2009 (CET)
Template parameter with problem
On no:wiki the following error was detected, but it is not an error:
{{veibilde|Floyen1024 019 new.jpg|På [[Nygårdstangen]] i [[Bergen]] møtes [[Image:Tabliczka E16.svg|24px|link=Europavei 16]], [[Image:Tabliczka E39.svg|24px|link=Europavei 39]] og {{riksvei|555}}.}}
Regards, BjørnN 10:45, 31. Mär. 2009 (CEST)
"unsuale" in error_060_desc_script is a typo? --fryed-peach 17:57, 31. Mär. 2009 (CEST)
In plwiki in pl:szablon:związek chemiczny infobox are parameters:
- Pochodne ?
- ?
Those parameter names are reported in the new error section, but they are not errors. Malarz pl 21:41, 31. Mär. 2009 (CEST)
- Thanks for all the information. I will fix this as soon as possible. @Fryed-peach: Ok! @Fryed-peach: Now "?" is possible. -- sk 05:58, 1. Apr. 2009 (CEST)
- Here's another one: http://nl.wikipedia.org/wiki/Limes. Sjabloon:Auteur does not need a parameter, but just text why copyright is violated. Rudolphous 07:58, 1. Apr. 2009 (CEST)
- At en.wp there is a series as well [6]. Interesting idea to add question marks to the parameters, but probably problematic in some applications. Anyways, if one sorts the templates by name, they are easy to spot. -- User:Docu
On pl.wiki:
pl:King Biscuit Flower Hour Presents: Deep Purple in Concert | Lista utworów, * "Highway star" (Blackmore/Gillan/Glover/Lord/Paice) * "Not Fade Away" (Petty/Hardin) |
pl:Perfect Strangers | Lista utworów, * występuje tylko na kasetach i CD, brak na winylowym LP |
In those two articles there are no errors. The '*' char is part of parameter value. Malarz pl 21:53, 9. Apr. 2009 (CEST)
error 52 (again)
hello,
we have some inconsistent "error 52" in French wiki too
sample here : [7]
the section is [8]
and the code: {{Événement à venir|nature=un [[:Catégorie:Projet de prolongement de transport en Île-de-France|projet de prolongement transport en Île-de-France]]|catégorie=[[Catégorie:Projet de prolongement de transport en Île-de-France]]}} The syntax is used to pass a category name to a generic template. I think you shouldn't check category names inside template arguments.
tks
Al1
- I think this is a bad example of a template. It is possible to generate a category with a template. See de:Freital, which uses the city template for German cities. In one parameter we write "Bundesland=Sachsen" and this template generates the category Kategorie:Gemeinde in Sachsen (municipality in the federal state of Saxony). I think the same is possible in fr. Please ask in fr:Projet:Modèle. Maybe they can fix this problem. -- sk 21:06, 13. Mär. 2009 (CET)