MediaWiki talk:Modernisation.js

From Wikisource
Latest comment: 7 years ago by Nemo bis in topic Broken JavaScript

Language variant conversion in php


A reminder that a built-in facility for language variant conversion already exists in MediaWiki core; see languages/LanguageConverter.php. Currently we do it in JavaScript, but what about PHP in the future? [1], [2] — Phe 15:54, 28 September 2011 (UTC)

Catalan localisation


Hi. Could these variables be added for ca.source, please?

dictionary_page : {
	'ca': 'Viquitexts:Diccionari'
},
ws_alphabet : {
	'ca': 'a-zçàéèíïóòüúA-ZÇÀÉÈÍÓÒÜÚ'
}

And what about parametrizing the French words "Texte modernisé"? ("Text modernitzat" in Catalan.) Thanks. -Aleator 19:21, 26 February 2011 (UTC)
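For context, here is a hypothetical sketch (not the script's actual code; variable names are illustrative) of how an alphabet string like the requested ws_alphabet value is typically used: building a regex character class so that word matching also covers accented Catalan letters.

```javascript
// Hypothetical illustration: an alphabet string like the requested
// ws_alphabet value can be turned into a character class that matches
// whole words, accented letters included.
var wsAlphabet = 'a-zçàéèíïóòüúA-ZÇÀÉÈÍÓÒÜÚ';
var wordRe = new RegExp( '[' + wsAlphabet + ']+', 'g' );

// With the default [a-zA-Z], "sí" would be cut at the accent;
// with the Catalan alphabet it is matched as a single word.
var words = 'Text modernitzat, sí'.match( wordRe );
```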

Hi!
You may be interested in testing the updated version available on the Portuguese and multilingual Wikisources. See for example the page s:pt:Elementos de Arithmetica/Capítulo 1 (there will be a menu at the top of the page). That version is also used in other projects.
To install it on ca.wikisource, just copy/translate this configuration to your /common.js (or /vector.js) page and this CSS to your /common.css (or /vector.css) page.
For more information, see s:fr:Wikisource:Scriptorium/Janvier 2011#New version of script for modernization. Helder 18:36, 2 March 2011 (UTC)
Yes, I saw the examples at es.source and I find it great! I'll run some tests in the coming days. Thanks a lot! -Aleator 23:33, 2 March 2011 (UTC)
Is this problem already solved by the new script version? (I see this version has not been updated to add what you needed.) — Phe 19:07, 25 September 2011 (UTC)
Yes, each wiki can translate the text of the links and customize variables such as the list of characters used by the (new version of the) script. Helder 21:58, 25 September 2011 (UTC)

MediaWiki:Modernisation.js


Moved from Wikisource talk:ProofreadPage

First, a thought on when to use this script: it makes little sense to use it to modernize very old text, say Middle English to modern English. Modernizing words or groups of words in that case will only produce "boulgui-boulgua" (a mishmash), a language using the grammar and idioms of Middle English mixed with modern English words. Such a language doesn't exist, so it's pointless imho; Middle English to modern English requires a real translation. (I'm using it as an example; I don't think such a thing has been done on en.ws.)

Now the issues:

  1. Local dictionaries are not honored. A local dictionary is one defined in the page to be modernized itself, as opposed to the per-wiki global dictionary. It allows shrinking the global dictionary by keeping very rarely used words out of it.
    Overlooked this: the trouble was in our local dictionary. Anyway, I changed the regexp parsing the local dictionary a bit to make it more robust.
  2. Local dictionaries must be multi-level when sub-pages are used, i.e. for Title Book/Part IV/Chapter III, each level of sub-pages, if it exists, can contain a dictionary.
  3. Local dictionaries must have precedence over the global dictionary; it's unclear whether this is actually the case.
  4. Should we allow some parts of the text not to be modernized, by protecting them with a class, e.g. <span class="no-modernization">...</span> (or the equivalent div)?
  5. For the future, it would be useful to have an external tool to manage dictionaries: detecting rare words that are candidates for a local dictionary, and the reverse case; detecting words used in a page but missing from its local dictionary (i.e. candidates for addition to the local or global dictionary); detecting words present in both the local and global dictionaries with the same definition.
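Point 4 above could be implemented with a simple ancestor check. A hypothetical sketch follows; the class name is the one proposed above, and the function name is illustrative, not an existing API:

```javascript
// Hypothetical sketch for point 4: skip a text node if any ancestor
// element carries the proposed "no-modernization" class.
function isProtected( node ) {
	for ( var el = node.parentNode; el; el = el.parentNode ) {
		if ( el.className && /\bno-modernization\b/.test( el.className ) ) {
			return true;
		}
	}
	return false;
}
```

The modernization pass would then call isProtected() on each text node before rewriting it, leaving protected spans untouched.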

My plan is to handle multi-level local dictionaries, but it's a per-wiki decision whether to use only a global dictionary (easier) or to add local ones (more efficient, since the global dictionary can become huge). Use of multi-level local dictionaries should be configurable.

Besides that, two links, User_talk:ThomasV#Modernisations and User_talk:ThomasV#Improving_Modernisation.js, for proposed changes. And this is probably the last version of this script [3].

Phe 12:24, 18 September 2011 (UTC)

As announced on the French Scriptorium some months ago, that is indeed the latest version of the modernization script, and it is also being tested in some other projects. I've added some documentation on English Wikipedia, and there are more details on Portuguese Wikisource (but they are in Portuguese, sorry).
The local dictionary has precedence over the global dictionary, and it is possible to avoid the conversion of parts of the text by adding a class to an HTML element; see pt:Template:Manter ortografia. The only tool I currently have to analyse the dictionaries is this script, which changes the background of each rule depending on how much the original and modernized expressions differ (see w:Levenshtein distance). This sometimes helps to find inappropriate rules.
Could you elaborate on your second point?
PS: Shouldn't this section be on MediaWiki talk:Modernisation.js? Helder 14:54, 24 September 2011 (UTC)
PS. done
The point 2 example used too many levels; let's take a page named The Book with a sub-page The Book/Chapter 1, etc. This book contains rare words that are nevertheless used throughout the book. "Rare" means it's not a good idea to put them in the global dictionary; "used throughout the book" means it makes sense to allow putting these words in the page The Book rather than in each chapter sub-page. So the dictionaries would be applied in this order: first the global dictionary, then the top-level page, then the sub-pages of the top-level page, and so on down to the viewed page itself. It's not really costly to do, because the API allows getting the content of many pages in a single query; there is some complication because the query will return page contents out of the order we need. Besides that, I suggest the local dictionary itself be fetched through the API call: there are currently two different parsers for dictionaries, one for the local and one for the global. The local one is parsed through HTML, which is pretty fragile, and we could remove that part of the code this way.
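The ordering complication mentioned above can be handled with a small re-ordering step after the single API query. A hypothetical sketch, with illustrative names (the API request itself is omitted):

```javascript
// Hypothetical sketch: the API can return the requested pages in any
// order, so re-order the results to match the dictionary chain
// (global dictionary first, then each sub-page level down to the
// viewed page), skipping pages that do not exist.
function reorderPages( wantedTitles, results ) {
	var byTitle = {}, ordered = [], i;
	for ( i = 0; i < results.length; i++ ) {
		byTitle[ results[ i ].title ] = results[ i ].content;
	}
	for ( i = 0; i < wantedTitles.length; i++ ) {
		if ( byTitle.hasOwnProperty( wantedTitles[ i ] ) ) {
			ordered.push( byTitle[ wantedTitles[ i ] ] );
		}
	}
	return ordered;
}
```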
Apart from that, perhaps this script has waited long enough, and before the code changes further we could deploy it in place of the old one (except perhaps [4]). There are three questions to solve first. First, should we deploy it in a place where it'll be protected against everyone's edits but both of us can edit it? Part b: is this useful, or do you want to take over the whole maintenance? Second, do you expect any change in behavior compared to the old script? Only minor changes in rare corner cases? None? Third, what about performance compared to the old script: roughly the same?
We will see about future improvements later. — Phe 19:07, 25 September 2011 (UTC)
The script already allows one to define multiple dictionaries, and I think this could be used for the purpose you suggested. E.g., on the Portuguese Wikisource we defined:
global_dic_page : {
	'pt-br':'Wikisource:Modernização/Dicionário/pt-PT|Wikisource:Modernização/Dicionário/pt-BR',
	'pt-pt':'Wikisource:Modernização/Dicionário/pt-PT'
},
This way, when converting the text to pt-br, the script first processes the content of the (common) pt-PT dictionary and then redefines any rules which are also defined in pt-BR. You could have those dictionary names be defined by splitting wgPageName on '/' to get the title parts as needed.
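Splitting wgPageName as suggested could look like the following hypothetical sketch (the function name is illustrative; it produces a pipe-separated list in the same format as the global_dic_page values above):

```javascript
// Hypothetical sketch: derive a pipe-separated list of dictionary page
// names from a page title, one entry per sub-page level, coarsest
// level first, so earlier dictionaries can be overridden by later ones.
function dictionaryNames( pageName ) {
	var parts = pageName.split( '/' );
	var names = [];
	for ( var i = 1; i <= parts.length; i++ ) {
		names.push( parts.slice( 0, i ).join( '/' ) );
	}
	return names.join( '|' );
}
```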
About your three questions:
  1. I think it is a good idea to have more people improving the script, so I agree that we should put it where both of us can edit it. Since you are an admin here, I copied the script to User:He7d3r/Tools/LanguageConverter.js and will post a comment on the wikis where the script is being used, suggesting that they update to the new location.
  2. If I remember correctly, the behavior didn't change during the updates (except for the treatment of the capitalization of "sequences of words", which I tried to improve a little). Most of it should work as before.
  3. I didn't really profile the performance of the two versions to see which one is faster. Maybe we could do some tests using jsPerf.
Helder 21:58, 25 September 2011 (UTC)
I installed it on my local machine and ran into trouble with the typo_changes array; my config is here (but not working at all on old). The trouble comes from a page like User:Phe/Test1: in the word "divertiſſement", only the first instance of ſ in the whole text is replaced by an "s". Any idea? — Phe 12:46, 27 September 2011 (UTC)
I got it working by changing [ 'ſ', 's' ] to [ /ſ/g, 's' ]. Is this intended, or am I missing something? [5] shows that typo_changes are defined not as regular expressions but as mere strings; is this file incorrect? — Phe 14:54, 27 September 2011 (UTC)
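For reference, the behavior described above matches how String.prototype.replace works in JavaScript: with a string pattern only the first occurrence is replaced, while a regex with the "g" flag replaces all of them.

```javascript
// With a plain string pattern, replace() changes only the first match;
// a regex with the "g" flag changes every match.
var word = 'divertiſſement';

var firstOnly = word.replace( 'ſ', 's' );  // only the first ſ replaced
var allOfThem = word.replace( /ſ/g, 's' ); // every ſ replaced
```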
Sorry for causing trouble. There is an undocumented but expected behavior: typo changes can be defined in two different ways. Using strings (which will be used for global search and replace):
typo_changes : {
	'lang-code': {
		'old string': 'new string',
		'old string2': 'new string2'
	}
}
or using regexes:
typo_changes : {
	'lang-code': [
		[ /a global regex should have "g" here:/g, 'new string' ],
		[ /an insensitive and global regex should have "i" here:/gi, 'new string' ]
	]
}
Take a look at the part of the code right after "if ( changes.constructor === Array ){".
But I noticed one file which is likely wrongly defined: pt:MediaWiki:Gadget-LanguageConverter.js. The [ '«', '“' ] should likely be changed to [ /«/g, '“' ]. The script should probably be improved to test whether the first element is a string or a regex object, and if it is a string, do a global search. Helder 15:55, 28 September 2011 (UTC)
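The improvement suggested above could look like this hypothetical sketch (the function name is illustrative, not the script's API): RegExp rules pass through unchanged, while string rules are escaped and replaced globally.

```javascript
// Hypothetical sketch: apply one typo rule, accepting either a RegExp
// or a plain string as the pattern. Strings are escaped so regex
// metacharacters are matched literally, then replaced globally.
function applyRule( text, pattern, replacement ) {
	if ( pattern instanceof RegExp ) {
		return text.replace( pattern, replacement );
	}
	var escaped = String( pattern ).replace( /[.*+?^${}()|[\]\\]/g, '\\$&' );
	return text.replace( new RegExp( escaped, 'g' ), replacement );
}
```

With this, a rule like [ '«', '“' ] would behave the same as [ /«/g, '“' ], removing the pitfall Phe hit above.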

Broken JavaScript


MediaWiki developers found that this page probably breaks JavaScript for users (example: not seeing the buttons when editing a page). You probably need to edit this .js page and/or MediaWiki:Gadgets-definition as in the examples at phabricator:T122755. List more pages to check.

If you have questions or need help, please ask at phabricator:T164242. You can log in with your wiki account. Best wishes, Nemo 09:49, 14 May 2017 (UTC)