User:Glrx

From Wikimedia Commons, the free media repository

Notes about images

Mona Lisa JPEG image is 90 MB. The served JPEG is 21 kB.

WMF does some interesting processing when it displays images. One might think that a JPEG image of the Mona Lisa is just transmitted to a browser, but that is not the case. The File:Mona Lisa, by Leonardo da Vinci, from C2RMF retouched.jpg is 90 MB, and that much data takes time to transmit (about 0.7 seconds at 1 Gbit/s or 7 seconds at 100 Mbit/s, ignoring protocol overhead) and could put a big dent in a modest cellular telephone data plan.

Instead, WMF does something different. The 90 MB, 7,479 × 11,146 pixel JPEG is downsampled to the display size. The result is that only a small image is transferred over the network. The transfer is faster, and the impact on data bandwidth and data plans is much smaller. For example, the downsampled JPEG is just 21,143 bytes, about 0.02% of the 90 MB original.
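The back-of-envelope figures can be checked with a few lines of Python. This is only a sketch: it assumes 90 MB means 90 × 10^6 bytes and ignores protocol overhead and latency; the function name is mine.

```python
# Rough transfer-time arithmetic for the original and the downsampled JPEG.
ORIGINAL_BYTES = 90_000_000   # assumed: 90 MB taken as 90 * 10**6 bytes
THUMBNAIL_BYTES = 21_143      # size of the served JPEG

def transfer_seconds(size_bytes, bits_per_second):
    """Ideal transfer time: payload bits divided by link rate."""
    return size_bytes * 8 / bits_per_second

for label, rate in (("1 Gbit/s", 1e9), ("100 Mbit/s", 1e8)):
    print(label,
          "original: %.2f s" % transfer_seconds(ORIGINAL_BYTES, rate),
          "thumbnail: %.6f s" % transfer_seconds(THUMBNAIL_BYTES, rate))

print("size ratio: about %.0f to 1" % (ORIGINAL_BYTES / THUMBNAIL_BYTES))
```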

Information for downsampling

The wiki markup will specify a particular image to display. The markup will specify a desired size (such as a width of 160 pixels).

Below is speculation. JPEG and PNG files may be reparsed at each inclusion. I need to check the sources.

I expect a database to hold critical properties. With quick access to some properties, it will not be necessary to parse the image file or the image description page. Such properties would be a base URL, file type, width, and height. From that information, one could quickly determine the HTML to include the image. The img element's width, height, and URL can be determined without reading the image file.
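A minimal sketch of that idea: given only the stored width and height, the img attributes follow from the requested width. The rounding rule and the URL are my assumptions, not something I have confirmed in the MediaWiki sources.

```python
from html import escape

def scaled_height(src_width, src_height, dst_width):
    """Height that preserves the aspect ratio at the requested width.

    Assumption: round to the nearest pixel and never go below 1.
    """
    return max(1, round(src_height * dst_width / src_width))

def img_tag(url, src_width, src_height, dst_width):
    """Build an img element without ever opening the image file."""
    height = scaled_height(src_width, src_height, dst_width)
    return '<img src="%s" width="%d" height="%d">' % (
        escape(url, quote=True), dst_width, height)

# Hypothetical thumbnail URL; the dimensions are those of the Mona Lisa file.
print(img_tag("https://example.org/thumb/160px-Mona_Lisa.jpg", 7479, 11146, 160))
```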

That is not true for SVG files. The image size can be determined, but SVG URLs may need to specify the desired language. My understanding is the SVG file must be parsed to learn the IETF language tags that the file supports (that step can be skipped if |lang= is set or for the en.Wiki). Reparsing the file makes using an SVG file expensive.

Note: Looks like SVG files need not be reparsed for languages. The MW database has an edited list of languages in its metadata. A MediaWiki API imageinfo query provides metadata and languages (MW type not IETF?). Languages are filtered: zh_CN does not appear. Non-existent langtags also appear: zh.
https://www.mediawiki.org/wiki/Manual:File_metadata_handling
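For reference, an imageinfo query can be built as below. This sketch only constructs the URL (it does not fetch anything); the choice of iiprop values is mine, and the file title is the Gibraltar map discussed later on this page.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

API = "https://commons.wikimedia.org/w/api.php"

def imageinfo_url(file_title, props=("size", "mime", "url", "metadata")):
    """Return a MediaWiki API URL asking for imageinfo about one file."""
    query = {
        "action": "query",
        "format": "json",
        "prop": "imageinfo",
        "titles": file_title,
        "iiprop": "|".join(props),   # which imageinfo fields to return
    }
    return API + "?" + urlencode(query)

url = imageinfo_url("File:Gibraltar map-en.svg")
print(url)
```

The JSON response can then be inspected for the stored width, height, and metadata without touching the file itself.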

Caching

The WMF servers confront a computational burden because they must downsample the image, but local computation is less expensive than data bandwidth.

In addition, WMF servers cache the images that they downsample. If I ask for Mona Lisa at a particular width, then a WMF server will generate that size. That work is stored in a cache, so if I or somebody else asks for the same size image at another time, the cached version is supplied rather than re-downsampling the image. The cache saves computation and time.

The moment when the caching is done is also significant. Although I can ask for images at particular sizes, the usual scenario is that the image (such as Mona Lisa) is used on a wiki article page. When a wiki page is updated, MediaWiki rebuilds the wiki page (creating a cached version of the wiki page) and also caches any new images that were added to the page. That can require a lot of computation, but the result is the wiki page and all of its images are now in the server caches. Preloading the cache reduces the latency that a user would experience when viewing the page. The user need only wait for the data to be transferred, not for the downsampling, because it has already been done.

Moreover, WMF is telling my browser to cache local copies on my computer. If I view a wiki page with the Mona Lisa image on it, the wiki page and the Mona Lisa image are copied to my computer. I can leave that wiki page, but the local copies remain on my computer. If I reload the page later, my browser can display the page without re-downloading the page and image from the WMF server.

That local caching interaction can be involved. The mechanism is part of the Hyper Text Transfer Protocol (HTTP). When a server transfers web pages or images using HTTP, it will specify some caching information. That information tells my browser if it may cache the data and how long the cached data is accurate.[1]
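As a rough illustration of that caching information, the sketch below estimates how long a cached response stays fresh from its Cache-Control and Age headers. It is deliberately simplified (it ignores s-maxage, Expires, and revalidation), and the max-age value in the example is an assumption; only the age value comes from a response shown on this page.

```python
def remaining_freshness(headers):
    """Seconds a cached response stays fresh.

    Returns 0 if caching is forbidden or revalidation is forced,
    and None if the response gives no max-age at all.
    """
    cache_control = headers.get("cache-control", "")
    directives = [d.strip().lower() for d in cache_control.split(",")]
    if "no-store" in directives or "no-cache" in directives:
        return 0
    for directive in directives:
        if directive.startswith("max-age="):
            try:
                max_age = int(directive[len("max-age="):])
            except ValueError:
                return None
            age = int(headers.get("age", "0"))   # time already spent in caches
            return max(0, max_age - age)
    return None

# Hypothetical max-age of one day; the age is from the PNG response below.
example = {"cache-control": "public, max-age=86400", "age": "74796"}
print(remaining_freshness(example))
```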

Caches can cause trouble

Say wiki page ABC uses an image XYZ.

If page ABC is rebuilt every time it is accessed, then the page will always be up to date. If the page is cached, then the cache may have a stale version.

If somebody edits page ABC, then it is clear that page ABC should be purged from the cache.

If somebody edits image XYZ, then the cache should be cleared of XYZ. But the appearance of page ABC may also change even though none of the wikitext for page ABC has changed. How does page ABC get updated?

If the aspect ratio of XYZ does not change, then nothing much needs to happen. When page ABC is accessed, it comes out of the cache. The cached page has a reference to XYZ, but that image has been invalidated, so the new version of XYZ is fetched.

If the aspect ratio of XYZ changed, then the layout of ABC may have been altered. ABC needs to be rebuilt. MW maintains a database of where each image is used, so MW can invalidate all of the pages that use XYZ. There is a cascade: the invalidated pages may be transcluded, so more pages may need to be invalidated and rebuilt.
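The cascade can be sketched as a breadth-first walk over a "used by" table. The table and page names here are hypothetical stand-ins for whatever MW actually stores.

```python
from collections import deque

def pages_to_invalidate(changed, used_by):
    """Return every page that directly or indirectly uses `changed`.

    `used_by` maps a file or page name to the pages that include it.
    """
    stale = set()
    queue = deque([changed])
    while queue:
        current = queue.popleft()
        for page in used_by.get(current, ()):
            if page not in stale:        # also guards against inclusion cycles
                stale.add(page)
                queue.append(page)
    return stale

# Hypothetical usage table: XYZ is used on ABC, and ABC is transcluded on DEF.
used_by = {"File:XYZ": ["ABC"], "ABC": ["DEF"], "DEF": []}
print(sorted(pages_to_invalidate("File:XYZ", used_by)))
```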

More on server caching

The server cache can be a separate set of servers positioned between the users and the actual servers. WMF uses Varnish.

Domain names

Domain names such as commons.wikimedia.org or upload.wikimedia.org must be resolved to an IP address. That resolution need not be to a single IP address. Check namespace resolution and redirect messages as ways to shuffle the load.

A domain name resolves to one IP address. Many domain names may resolve to the same IP address.[2] But I think a name may have many A records. I'm looking for information about random selection.[3]

Alt text

Proposal about alt= text being added to HTML.

Page regeneration

Consider a typical Wikipedia page. It will use templates and images.

If one of the templates is edited, then the Wikipedia page probably needs to be rebuilt. The template may affect the page content or layout. MediaWiki keeps track of which pages use a template, so when a template is edited, MediaWiki knows which pages need to be regenerated. There can also be a cascade because some templates use other templates.

That means that editing a template that is used on thousands of Wikipedia pages would trigger the regeneration of thousands of Wikipedia pages. Editing commonly used templates should not be done lightly. Commonly used templates may be protected. For example, {{Cite book}} on the English Wikipedia affects almost 1.5 million pages.

Editing an image does not require rebuilding the pages that use the image. The page still references the same image name, but now the image scalers will supply the new image rather than the old one. The cached HTML of the Wikipedia page is still good.

Well, not quite. When MediaWiki builds a page, it specifies the width and height attributes of the img element. That allows the browser to lay out the page before it has downloaded all the images. That avoids continual layout adjustments as image sizes are learned. So rebuilding pages may be required. Wikimedia could just do it all the time, or it could update pages only if a significant change occurred. If the image aspect ratio changed, then img elements would need to be updated. If a multilingual SVG file added another language, then pages may need updating.

SVG images

SVG Map of Gibraltar is 290 kB. The served PNG is 48 kB.

WMF processes SVG images in a manner similar to JPEG images. Instead of serving the actual SVG file on a wiki page, WMF builds a PNG file of the requested width and serves the PNG. There are a couple of advantages to serving a PNG.

First, the PNG file can be much smaller than the SVG file. For example, the SVG map of Gibraltar is 290 kB. The request above produced a 48 kB PNG file:

accept-ranges: bytes
access-control-allow-origin: *
access-control-expose-headers: Age, Date, Content-Length, Content-Range, X-Content-Duration, X-Cache
age: 74796
content-disposition: inline;filename*=UTF-8''Gibraltar_map-en.svg.png
content-length: 47987
content-type: image/png
date: Tue, 21 Dec 2021 02:25:44 GMT
etag: 8391f68640a7f0cedd3971fef7b8b3d3
last-modified: Mon, 01 Feb 2021 12:23:52 GMT
nel: { "report_to": "wm_nel", "max_age": 86400, "failure_fraction": 0.05, "success_fraction": 0.0}
permissions-policy: interest-cohort=()
report-to: { "group": "wm_nel", "max_age": 86400, "endpoints": [{ "url": "https://intake-logging.wikimedia.org/v1/events?stream=w3c.reportingapi.network_error&schema_uri=/w3c/reportingapi/network_error/1.0.0" }] }
server: ATS/8.0.8
server-timing: cache;desc="hit-front", host;desc="cp1078"
strict-transport-security: max-age=106384710; includeSubDomains; preload
timing-allow-origin: *
x-cache: cp1078 hit, cp1078 hit/2
x-cache-status: hit-front
x-timestamp: 1612182231.00644

However, the SVG file is transferred with GZip compression; the transfer size is only 89 kB. The compression factor is 290/89 = 3.26. The transfer size is larger than the PNG, but it is less than twice the size of the PNG (89/48 = 1.85).
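The two ratios are simple arithmetic, and the effect of GZip on SVG text is easy to try. The sketch below uses a synthetic, highly repetitive SVG (my own stand-in, not the Gibraltar map), so its compression factor will be far better than a real map's.

```python
import gzip

def gzip_ratio(data):
    """Uncompressed size divided by gzip-compressed size."""
    return len(data) / len(gzip.compress(data))

# Figures quoted in the text, in kB.
SVG_KB, SVG_GZIP_KB, PNG_KB = 290, 89, 48
print("compression factor: %.2f" % (SVG_KB / SVG_GZIP_KB))
print("gzipped SVG vs PNG: %.2f" % (SVG_GZIP_KB / PNG_KB))

# Synthetic SVG: 500 nearly identical path elements compress very well.
paths = "".join('<path d="M%d 0 L%d 10" stroke="black"/>' % (i, i)
                for i in range(500))
synthetic = ('<svg xmlns="http://www.w3.org/2000/svg">%s</svg>' % paths).encode()
print("synthetic SVG factor: %.1f" % gzip_ratio(synthetic))
```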

Second, when WMF started supporting the SVG file format, the browser support for SVG was nonexistent or uneven. Serving PNG files had strong support. Serving PNG renditions of SVG files also gives a uniform presentation. SVG images can vary depending upon the availability of particular fonts and the depth of SVG support.

Directly serving SVG

SVG client side rendering (Phabricator T5593)

The img element allows animations but should block scripts. The object element allows scripts.

Scripts can be malicious. WMF blocks uploading SVG files that contain scripts.

What happens to mouse clicks? Wrap an a element around a bitmap file. Wrap it around an SVG file.

SVG files can be malicious. An SVG file could be a computational nightmare that taxes the computer. PNG files will render in finite time. The SVG renderer on WMF servers puts a time limit on the rendering. If it does not complete within a few seconds, then the process is terminated. There are some SVG files on Commons that can hit that time limit.

There are some language translation differences when an SVG file is directly served; see below.

SVG is XML or not

Dislike XML.

SVG has namespaces, but HTML does not. HTML lossage creeps in.

If XML is so good, why is CSS not XML?

XML details

Some notes for later.

The XML Spec 1.0 (Fifth edition). https://www.w3.org/TR/xml/

The XML prolog is optional.

  • XML version
  • encoding (ASCII / ISO issue) EBCDIC and UTF 16.
  • standalone
    From the XML specification § 2.9:
    The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:
    • attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or
    • entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or
    • attributes with tokenized types, where the attribute appears in the document with a value such that normalization will produce a different value from that which would be produced in the absence of the declaration, or
    • element types with element content, if white space occurs directly within any instance of those types.

The default attribute values raise issues with #REQUIRED, #IMPLIED, #FIXED and default. https://www.w3.org/TR/xml/#dt-default

The SVG 1.1 style element has the type attribute, and that attribute is #REQUIRED (see https://www.w3.org/Graphics/SVG/1.1/styling.html#StyleElement). That means Phab:T68672 ("SVG style element ignored if no type attribute is specified") may have been invalid, and that Commons:Commons SVG Checker should require type="text/css".

The SVG 1.1 DTD has

<!ATTLIST %SVG.style.qname;
    xml:space ( preserve ) #FIXED 'preserve'
    %SVG.id.attrib;
    %SVG.base.attrib;
    %SVG.lang.attrib;
    %SVG.Core.extra.attrib;
    type %ContentType.datatype; #REQUIRED
    media %MediaDesc.datatype; #IMPLIED
    title %Text.datatype; #IMPLIED
>

For SVG 2.0, the type attribute has an initial value of text/css. https://svgwg.org/svg2-draft/styling.html#StyleElement

There is also the style element content being a CDATA section. The SVG 1.1 conservative view was

  <defs>
    <style type="text/css"><![CDATA[
      rect {
        fill: red;
        stroke: blue;
        stroke-width: 3
      }
    ]]></style>
  </defs>

The CDATA section was needed to avoid entity interpretation and <. I remember having trouble at some point, but I think that was resolved by using CSS character literals rather than XML character literals. It may also be that the modern style element is a CDATA section rather than PCDATA. Find the references.
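The difference is easy to demonstrate with Python's standard XML parser. In this sketch the same CSS (with a literal < inside a comment) is placed in a style element three ways: inside a CDATA section, raw, and with the character escaped. The CSS text is a made-up example.

```python
import xml.etree.ElementTree as ET

SVG_STYLE = "{http://www.w3.org/2000/svg}style"

# Made-up CSS that contains a character XML treats specially.
CSS = "\n/* used when width < 100 */\nrect { fill: red; stroke: blue }\n"

OPEN = '<svg xmlns="http://www.w3.org/2000/svg"><style type="text/css">'
CLOSE = "</style></svg>"

WITH_CDATA = OPEN + "<![CDATA[" + CSS + "]]>" + CLOSE
RAW = OPEN + CSS + CLOSE                       # the bare < is not well-formed XML
ESCAPED = OPEN + CSS.replace("&", "&amp;").replace("<", "&lt;") + CLOSE

def style_text(svg_source):
    """Return the text content of the style element, or None if parsing fails."""
    try:
        root = ET.fromstring(svg_source)
    except ET.ParseError:
        return None
    return root.find(SVG_STYLE).text

for name, source in (("CDATA", WITH_CDATA), ("raw", RAW), ("escaped", ESCAPED)):
    print(name, "->", "parse error" if style_text(source) is None else "ok")
```

The CDATA and escaped forms give the parser the same text, so the CDATA section is a convenience for the author rather than a different kind of content.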

That SVG snippet also shows the style element within a defs element. That used to be common practice, but it may have never been needed. The advantage of the defs element was its content would never be rendered. There is more to say about defs; many elements (such as linearGradient) do not need to be within a defs element.

SVG DOM

Significant advantage.

Looking for type hierarchy, but not seeing what I want.

Descriptive elements desc and title.

Metadata element metadata.

Container elements such as g. SVG 2.0 says, "An element which can have graphics elements and other container elements as child elements. Specifically: ‘a’, ‘clipPath’, ‘defs’, ‘g’, ‘marker’, ‘mask’, ‘pattern’, ‘svg’, ‘switch’ and ‘symbol’."

Graphics elements such as line and text. Inherits from SVGGraphicsElement, so it has some methods, but not a type? Interface SVGGraphicsElement.

Style information in the DOM

I do not believe the SVG DOM makes all the style information available. A couple years ago I went looking for aural stylesheet information, and it was not there. Consequently, I do not believe that style properties such as -inkscape-font-specification are broken out. Does that mean that they disappear completely when the DOM is written out?

It may be that none of such style properties make any difference, so removing them could be seen as beneficial.

Attributes that could be removed

Is there a list of Inkscape attributes that are always safe to remove? For example, Inkscape could always regenerate the node type list. If Inkscape can regenerate the information, then why keep it?

Some Inkscape and sodipodi attributes should be preserved. Some g elements are identified as layers. Information about drawing grids does not take up much space, so removing that information does not have much benefit.

The significant benefit is the verbose style information.

Additional information to remove would be needless graphics state. For example, if stroke="none", then we probably do not care about stroke-width, stroke-dasharray, stroke-dashoffset, line joins, and end caps. Some font information may be a little different. If text has been converted to curves, keeping that information around would help in reconstructing the text.
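A sketch of the stroke cleanup, under some loud assumptions: it only looks at presentation attributes (not the style attribute or stylesheets), and it only touches elements without children, because stroke properties on a container can be inherited by a child that sets its own stroke. The list of attribute names is my choice.

```python
import xml.etree.ElementTree as ET

# Stroke details that have no visible effect when nothing is stroked.
STROKE_DETAILS = ("stroke-width", "stroke-dasharray", "stroke-dashoffset",
                  "stroke-linejoin", "stroke-linecap", "stroke-miterlimit")

def drop_unused_stroke_attributes(root):
    """Remove stroke details from leaf elements that have stroke="none".

    Returns the number of attributes removed.
    """
    removed = 0
    for element in root.iter():
        if len(element) == 0 and element.get("stroke") == "none":
            for name in STROKE_DETAILS:
                if name in element.attrib:
                    del element.attrib[name]
                    removed += 1
    return removed

sample = ET.fromstring(
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<rect stroke="none" stroke-width="3" stroke-linecap="round" fill="red"/>'
    '<g stroke="none" stroke-width="2"><path stroke="black" d="M0 0 L1 1"/></g>'
    '</svg>')
print(drop_unused_stroke_attributes(sample))
```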

SVG recommendations

Small size is a significant goal.

SVG is not arbitrarily scalable. Scalable is more about eliminating jaggies.

Fixed width lines. (CSS can adjust.)

SVG is not a good file format for bitmap images such as bar codes and QR codes. Those objects are not arbitrarily scalable; they must fit on a pixel grid. One could use barcode fonts within an SVG file; fonts will align to an underlying pixel grid.

Not for photographs (but can be used to label photographs).

Limited colors (can use color gradients). Color blocking suggestions rather than enormous detail.

Filters can produce complex objects such as chalk textures and clouds.

File size

SVG files can be small, but they can also be surprisingly large. Consider some images from Category:NAVAID pictograms:[4]

The first image is a central dot and 12 line segments, and it has a simple representation. The second image is problematic. It is a central dot, a central solid ring, and 10 dotted radial rings. It has a lot of dots, but why does it need so many bytes? Each dot is not a circle element, but rather a path that looks like a circle. The third image is nearly as complex (only 7 dotted rings), but it is a more efficient representation. Instead of round dots, it uses stroke-dasharray for the dots. Notice that the dash array has some issues along the north axis.

We can get dots easily. Use a circle element (stroked but no fill), set the stroke-dasharray="0 xxx", and set stroke-linecap="round". The value xxx is chosen to be an integer fraction of the circumference. A close look at the NDB-DME symbol shows dashes instead of dots.

There is a problem with librsvg: the stroke-dasharray attribute must use commas rather than spaces.
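The dotted-ring recipe can be generated rather than hand computed. In this sketch the dash pattern is a zero-length dash followed by a gap of circumference / dots, so exactly that many round caps fit around the ring; the values are comma separated to sidestep the librsvg issue. The function name and the sample numbers are mine.

```python
import math

def dotted_ring(cx, cy, r, dots, dot_diameter):
    """Return a circle element drawn as `dots` evenly spaced round dots."""
    circumference = 2 * math.pi * r
    gap = circumference / dots            # one dot per pattern period
    return ('<circle cx="%g" cy="%g" r="%g" fill="none" stroke="black" '
            'stroke-width="%g" stroke-linecap="round" '
            'stroke-dasharray="0,%.4f"/>' % (cx, cy, r, dot_diameter, gap))

# A ring of 36 dots (one every 10 degrees) at radius 100.
print(dotted_ring(0, 0, 100, 36, 4))
```

The stroke-width becomes the dot diameter because a zero-length dash with round caps renders as a circle of that diameter.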

Does SVG have trouble with polar tiles? Can a tile be more general?

Title and description

It is reasonable to include title and desc elements.

SVG 2.0 will allow language versions (using the lang or xml:lang attribute — not the systemLanguage attribute).[5] The acceptance of the language versions is not clear, and it has an at-risk warning in the SVG 2.0 specification.

The title and desc elements are not display elements. For that reason, they cannot be selected within a switch element. In that context, the elements would be giving a title and a description to the parent switch element.

There should be support for the Dublin Core dc:title and dc:description elements with xml:lang attributes.

Metadata

Metadata and copyright are intertwined. Metadata should include information about the origin of an image, and several copyright licenses require that some information be provided.

The Creative Commons licenses require some specific information. For example, there should be a link to the CC license. Derivative works need to say what was changed. In many cases, these requirements may not be met.

I believe all SVG files should include metadata. It is not hard to add, and it can be useful. Including license data in the image metadata may fulfill licensing requirements or at least provide a colorable defense. Failure to follow all licensing requirements may lead to trouble.[6][7][8]

Moral rights. Even if I do not need to credit an author, there may be a moral obligation to give them credit. Sometimes that moral right can become a legal right. Some contributors allow free use of some or all of their work. LadyofHats is a notable example. That means I can use the work for any purpose, and I do not need to give anyone credit. That does not seem reasonable or even right. I could take Herman Melville's Moby Dick and publish it under my own name. It seems far better to say it is Melville's work.

Providing metadata also makes it easier for someone else to check the licensing rights. Commons encourages everybody (not just its Wikipedia projects) to use the available art. Say Alice uses some CC0 SVG images from Commons on her website. The images are CC0, so Alice does not mention any licensing details. Bob sees Alice's website and likes the images, but how can Bob determine the licensing of the images?

The license check would be simple if the SVG file included the licensing information. Just given an image, it may be hard to find out who made it. If the image has metadata, then that information may be easy to find. The information may not be accurate (somebody may be license washing), but it is a starting point and could serve as a good defense.

https://www.dublincore.org/specifications/dublin-core/dcq-rdf-xml/

Other metadata

Metadata is not about just copyright information. Metadata can include other relevant information.

For maps, the metadata may include information about the map projection. With that information, one could take the (x, y) location of a point on the map and convert it to the corresponding latitude and longitude.

For chemicals, the metadata may include structured descriptions of the chemical.

I'm leery of too much metadata. SVG should be more of an output format rather than a container for detailed information. Providing a small amount of information is reasonable, but including lots of information may be inappropriate. The intended use of SVG is to display an image.

The mess that is xml:lang

The issues with lang and xml:lang. Watch out for accidental captures.

Creative Commons license requirements

Creative Commons licenses are used extensively on Commons and WMF servers.

State the common legal requirements of CC licenses.

  1. CC- requires a link to the CC license. That means it is easy to find the license terms. Check if a CC0 license also has this requirement.
  2. CC-BY must provide reasonable attribution. May distribute and alter. May impose more restrictive license.
  3. CC-SA (implies a derivative work) must not use a more restrictive license and must describe the changes.
  4. CC-ND allows use but not modification.
  5. CC-NC does not allow commercial use. (What is the constraint on commercial? May a nonprofit use the work in its fundraising? May the Girl Scouts use it to sell their cookies? In the US, agency settles some of these questions.)

State the failings.

The file description pages are often inadequate. Sometimes there are gross errors such as an improper license. Derivative works often omit the attribution information in the license. The description of a derivative work often fails to describe the changes made to the original work.

Most file uses on WMF servers satisfy the requirements because MW links the file to its description page:

[[File:Yellow banana.svg|A picture of a yellow banana.]]

Presumably, the file description page has a link to the CC license and meets the attribution and modification requirements.

However, the file use may alter that link (MW:Help:Images).

[[File:Yellow banana.svg|link=https://www.nowhere.com/bitbucket|A picture of a yellow banana.]]

or

[[File:Yellow banana.svg|link=|A picture of a yellow banana.]]

If the override link does not provide the needed licensing information, then the license is violated. There can be disastrous ramifications. MW should not allow such links for CC-licensed material.
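Finding such overrides in wikitext is mechanical. The sketch below is a rough regular-expression scan, not MediaWiki's parser: it assumes the File: or Image: prefix and does not cope with captions that contain nested [[links]].

```python
import re

# [[File:name|option|option...]] with no nested ]] inside (an assumption).
FILE_LINK = re.compile(r"\[\[(?:File|Image):([^|\]]+)((?:\|[^\]]*)?)\]\]")

def link_overrides(wikitext):
    """Return (file name, link target) pairs for file uses that set link=.

    An empty target means the image is not linked at all.
    """
    results = []
    for match in FILE_LINK.finditer(wikitext):
        name = match.group(1).strip()
        options = match.group(2).split("|")[1:]
        for option in options:
            option = option.strip()
            if option.startswith("link="):
                results.append((name, option[len("link="):]))
    return results

text = ("[[File:Yellow banana.svg|A picture of a yellow banana.]] "
        "[[File:Yellow banana.svg|link=|A picture of a yellow banana.]]")
print(link_overrides(text))
```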

Dublin Core and Creative Commons

Reasonable SVG metadata should use both Dublin Core and Creative Commons vocabularies. The metadata can be expressed using RDF.

Dublin Core

A general reference:

It suggests some vocabularies. Looking for "Terman, Frederick" gives the MARC value

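Dublin Core provides a vocabulary for references. There are two Dublin Core namespaces: http://purl.org/dc/elements/1.1/ (the original 15 elements, conventionally prefixed dc) and http://purl.org/dc/terms/ (conventionally prefixed dcterms).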

Sometimes, the dcterms namespace uses the dc prefix. The goal is to use the dcterms vocabulary rather than the 15-element dc namespace. It is possible to translate dc to dcterms (e.g., using XSLT), but that translation may confuse existing software.

Dublin Core elements/1.1/ is a short (15 term), general, vocabulary for works:

  • dc:title (there is also an SVG title element)
  • dc:date
  • dc:creator
  • dc:contributor (I would use for translators)
  • dc:source
  • dc:format (less important) for SVG, use image/svg+xml
  • dc:type (less important) often rdf:resource="http://purl.org/dc/dcmitype/StillImage"
  • dc:publisher (If empty, I would have this point to Wikimedia Commons)
  • dc:subject DC states, "Typically, the subject will be represented using keywords, key phrases, or classification codes. Recommended best practice is to use a controlled vocabulary." I do not see a widely adopted practice here. Most people would probably use a text string of comma-separated keyword phrases. That would match the HTML meta tag: e.g., <meta name="keywords" content="HTML, CSS, Javascript" >. However, the obvious RDF approach would use an rdf:Bag that holds each keyword phrase: <dc:subject><rdf:Bag><rdf:li>HTML</rdf:li><rdf:li>CSS</rdf:li><rdf:li>Javascript</rdf:li></rdf:Bag></dc:subject>. The dcterms: mirror is not a list of keywords.
  • dc:coverage Time or location. Not widely used? E.g., Port Royal earthquake.
  • dc:description
  • dc:identifier
  • dc:language
  • dc:relation
  • dc:rights The clearer practice here would be to use cc:license

The Dublin Core vocabulary uses general rather than specific terms. For example, the dc:creator predicate covers several possibilities such as author, composer, lyricist, illustrator, and photographer. There are vocabularies that make finer distinctions,[9] but those distinctions may not be necessary for many works, and most applications probably do not support the terms.

Usage examples:

Interesting metadata in

Specifies data types.

Dublin Core schemas:

Looking at a schema for elements

Looks like an arbitrary sequence of the 15 elements. Looks like the element content is text only (xml:lang attributes are allowed).

  <xs:complexType name="elementType">
    <xs:simpleContent>
      <xs:extension base="xs:string">
        <xs:attribute ref="xml:lang" use="optional"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>

Significantly, this declaration does not show using rdf:resource attribute.

I expected the dcterms schema to be more restrictive.

However, the schema states

Encoding schemes are defined as complexTypes which are restrictions of the dc:SimpleLiteral complexType. These complexTypes restrict values to an appropriates syntax or format using data typing, regular expressions, or enumerated lists. In order to specify one of these encodings an xsi:type attribute must be used in the instance document. Also, note that one shortcoming of this approach is that any type can be applied to any of the elements or refinements. There is no convenient way to restrict types to specific elements using this approach.

Here's a dcterms to dc and what looks like a W3C Date-Time Format.

<xs:element name="date" substitutionGroup="dc:date"/>

<xs:complexType name="W3CDTF">
  <xs:simpleContent>
    <xs:restriction base="dc:SimpleLiteral">
      <xs:simpleType>
        <xs:union memberTypes="xs:gYear xs:gYearMonth xs:date xs:dateTime"/>
      </xs:simpleType>
      <xs:attribute ref="xml:lang" use="prohibited"/>
    </xs:restriction>
  </xs:simpleContent>
</xs:complexType>

The details are both troubling and confusing. Dublin Core looks like simple text (simpleType). What impact does that have? For multiple authors, one either uses several creator elements or puts the list in simple text. The dcterms set does not provide access to rdf:Seq.

Does common usage of Dublin Core violate the schema?

The schemas, without more, do not do a sensible validation of, for example, date syntax.
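A sensible date check is not hard to write outside the schema. The sketch below follows the W3C "Date and Time Formats" note as I understand it (year, year-month, full date, or full date with time and a required time zone designator); that is stricter in some ways and looser in others than the xs: union shown above, so treat it as an assumption rather than the Dublin Core rule.

```python
import re
from datetime import date

W3CDTF = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.\d+)?)?"
    r"(?:Z|[+-]\d{2}:\d{2})"
    r")?)?)?"
)

def is_w3cdtf(text):
    """True if text is a syntactically and calendrically valid W3CDTF value."""
    match = W3CDTF.fullmatch(text)
    if match is None:
        return False
    year = int(match.group("year"))
    month = int(match.group("month") or 1)
    day = int(match.group("day") or 1)
    try:
        date(year, month, day)            # rejects month 13, 30 February, ...
    except ValueError:
        return False
    if match.group("hour") is not None:
        if int(match.group("hour")) > 23 or int(match.group("minute")) > 59:
            return False
        if int(match.group("second") or 0) > 59:
            return False
    return True

for value in ("2021", "2021-12-21", "2021-12-21T02:25:44Z", "21 Dec 2021"):
    print(value, is_w3cdtf(value))
```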

A reification from dcterms to elements is clear.

Creative Commons

Creative Commons adds some terms for specifying the license and attribution:

  • cc:license (said to be the same as xhtml:license; Commons does not allow uploading SVG that uses the xhtml namespace)
  • cc:attributionURL (may be needed for CC-BY; I would have this point to the File: page on Commons)
  • cc:attributionName (may be needed for CC-BY)

For Commons files, making the cc:attributionURL point to the file page on Commons may satisfy the attribution requirements of CC-BY licenses.
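Putting the Dublin Core and Creative Commons terms together, a metadata element might be generated as in this sketch. The layout (a cc:Work holding everything, plain-text creator) is one choice among several, not an agreed standard, and the title, name, and file URL are placeholders.

```python
import xml.etree.ElementTree as ET

SVG = "http://www.w3.org/2000/svg"
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
CC = "http://creativecommons.org/ns#"

for prefix, uri in (("svg", SVG), ("rdf", RDF), ("dc", DC), ("cc", CC)):
    ET.register_namespace(prefix, uri)

def build_metadata(title, creator, license_url, attribution_url):
    """Return an SVG metadata element (as text) with DC and CC statements."""
    metadata = ET.Element("{%s}metadata" % SVG)
    rdf = ET.SubElement(metadata, "{%s}RDF" % RDF)
    work = ET.SubElement(rdf, "{%s}Work" % CC, {"{%s}about" % RDF: ""})
    ET.SubElement(work, "{%s}title" % DC).text = title
    ET.SubElement(work, "{%s}creator" % DC).text = creator
    ET.SubElement(work, "{%s}license" % CC, {"{%s}resource" % RDF: license_url})
    ET.SubElement(work, "{%s}attributionName" % CC).text = creator
    ET.SubElement(work, "{%s}attributionURL" % CC,
                  {"{%s}resource" % RDF: attribution_url})
    return ET.tostring(metadata, encoding="unicode")

# Placeholder values for illustration only.
xml_text = build_metadata(
    "Example map", "Alice",
    "https://creativecommons.org/licenses/by-sa/4.0/",
    "https://commons.wikimedia.org/wiki/File:Example.svg")
print(xml_text)
```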

Resource Description Framework (RDF) statements have a subject, a predicate, and an object.

Although vocabularies are specified, how those vocabularies should be used is not nailed down. If there are two creators, how should that be specified? Should there be an RDF dc:creator statement for each creator? Should there be one dc:creator statement whose object is a set of the creators? The situation for licenses is more obvious. If the user gets to choose which of several licenses, then there should be one cc:license, and the object should be an rdf:Alt that identifies the alternative licenses. However, most software probably expects exactly one license rather than a list of alternatives. The simple approach is to offer only one license.

The lack of consistency implies problems. If a graphics program does not understand the input RDF, then it may get corrupted on output. The appropriate goal is to have metadata that most graphics editors understand. That way, the metadata is preserved during import and export.

Consistency and accuracy are also missing in many Commons licenses.

Say Alice creates a CC-BY-SA image and uploads it to Commons. Bob then reuses Alice's image. Bob is required to use a CC-BY-SA license, and Bob's image must carry attribution to Alice. Many mistakes happen on Commons. Bob's image may not mention Alice's licensed image. Bob may claim his work is CC0 (license washing). Bob may use CC-BY-SA, but he may not point out that Alice must be acknowledged, too. Given that license information on Commons may be missing or incomplete, it is no surprise that license metadata may be haphazard, too.

A CC-BY-SA license permits modification (i.e., derivative works). The licenses require the modifier to describe the changes, but Creative Commons does not have a vocabulary term for describing the modifications.

Creative Commons and closure

Creative Commons does a good job for the original work. The license is declared, and there are constructs for attribution. If a work is used without modification, then the metadata has the information for proper attribution.

The metadata is insufficient when the original work is modified. The license requires that the changes be identified, but there are no XML elements for describing the changes.

List the licenses and the issues.

  • 0
  • -BY
  • -SA
  • -ND
  • -NC

Another issue is how graphics editors can merge metadata.

Adobe Systems XMP

Adobe XMP uses the elements namespace:

Sigh.

Adobe Systems includes metadata, and it has settled on a specific syntax with its eXtensible Metadata Platform (XMP). Adobe solves the multiple creator problem by always using a set of creators (even if there is only one creator). Adobe also restricts the use of complex RDF syntax.

In XMP, the dc:creator should be an ordered list of ProperName.[10] A ProperName is a simple text value.

<rdf:Description rdf:about="">
  <dc:creator>
    <rdf:Seq>
      <rdf:li>John Smith</rdf:li>
      <rdf:li>Richard Roe</rdf:li>
    </rdf:Seq>
  </dc:creator>
</rdf:Description>

Should discuss the equivalent form.

<cc:Work rdf:about="">
  <dc:creator>Alice</dc:creator>
</cc:Work>
<rdf:Description rdf:about="">
  <rdf:type rdf:resource="http://creativecommons.org/ns#Work" />
  <dc:creator>Alice</dc:creator>
</rdf:Description>

Inkscape metadata

Which namespace does Inkscape use? elements or dcterms? If it uses elements, then it should upgrade. Or at least accept one or the other. I'm looking at a file I believe to be Inkscape, and it has xmlns:dc="http://purl.org/dc/elements/1.1/".

Inkscape has a metadata form to fill in, but Inkscape uses an agent description. (Pull a copy of Inkscape metadata).

<dc:creator>
  <cc:Agent>
    <dc:title>Andy Fitzsimon</dc:title>
  </cc:Agent>
</dc:creator>

Please note that cc:Agent is not part of the http://creativecommons.org/ns namespace.

Dublin Core has a dc:Agent, so it is possible that Inkscape meant dc:Agent rather than cc:Agent.

I'm not a happy camper...

The page does not include the attributionName or attributionURL elements. It has a set of licenses. It also points to some SIL licenses.

There is a significant but unresolved issue here.[11] An original goal is to identify the license and the creator. Not a lot of information is needed to acknowledge those rights; a simple text reference to a name might be good enough. However, more details can be given about the rights holder, so should the representation give more details? At what point would there be too much information? More information could be added, but very few systems will be able to process that information. The simple approach is to keep the information simple enough to satisfy license requirements and avoid adding extraneous details.

A URL is a better method of identifying a person than some text. Many people have the name John Smith, but the URL https://www.imdb.com/name/nm0808774/ identifies a particular John Smith. Unfortunately, many applications probably expect a text string and cannot handle a URL. If an application expects this input

<dc:creator>John Smith</dc:creator>

then how will it handle this input (i.e., a URL that identifies a particular John Smith)?

<dc:creator rdf:resource="https://www.imdb.com/name/nm0808774/" />
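
One way for an application to cope is to accept several shapes of dc:creator rather than expecting only a text string. The Python below is a hedged sketch of such a tolerant reader; the function name describe_creator, the wrapper element w, and the returned labels are inventions for this example and do not describe how Inkscape or Illustrator behave.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
DC = "http://purl.org/dc/elements/1.1/"
CC = "http://creativecommons.org/ns#"


def describe_creator(creator_xml):
    """Classify one dc:creator element.

    Returns a (kind, value) pair where kind is 'resource', 'agent',
    'literal', or 'unknown'.
    """
    # Wrap the fragment so the rdf, dc, and cc prefixes are declared.
    wrapped = (
        f'<w xmlns:rdf="{RDF}" xmlns:dc="{DC}" xmlns:cc="{CC}">'
        + creator_xml
        + "</w>"
    )
    creator = ET.fromstring(wrapped).find(f"{{{DC}}}creator")

    # Form 1: the creator is identified by a URL.
    resource = creator.get(f"{{{RDF}}}resource")
    if resource:
        return ("resource", resource)

    # Form 2: the Inkscape-style agent description with a dc:title.
    agent = creator.find(f"{{{CC}}}Agent")
    if agent is not None:
        title = agent.find(f"{{{DC}}}title")
        if title is not None and title.text:
            return ("agent", title.text.strip())

    # Form 3: a plain text name.
    if creator.text and creator.text.strip():
        return ("literal", creator.text.strip())

    return ("unknown", None)
```

With this approach the URL form is reported as a resource instead of being read as an empty name.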

Try this out in Inkscape.... Try this out in Adobe Illustrator....

Well-known licenses[edit]

Creative Commons wants a cc:License element that summarizes the license, but I do not like that practice for well-known licenses. What happens if the summary is inaccurate? Say the license URL is CC-BY-SA 4.0 but the license summary prohibits commercial use? Does the summary take precedence over the URL?

In theory, it should be easy to obtain the RDF description of a well-known license. For example, the license HTML at

has a link in the HTML

  • <link rel="alternate" type="application/rdf+xml" href="rdf" />
    

which refers to the license RDF at

Consequently, an RDF description of a well-known license is available.

It is possible to check whether the license summary is consistent with the published URL.

Creators also misuse CC-BY licenses on Commons by stating additional license terms. For example, the creator may state that the attribution must appear next to where the image is used. Creative Commons CC-BY licenses require attribution, but the license lets the licensee use any reasonable method of attribution. Here's the text about attribution from CC-BY-SA 4.0:

Attribution — You must give appropriate credit, provide a link to the license, and indicate if changes were made. You may do so in any reasonable manner, but not in any way that suggests the licensor endorses you or your use.

If a creator states the license is CC-BY-SA 4.0, then the creator should not be able to state additional requirements. Additional requirements contradict the terms of CC-BY-SA 4.0.

Metadata checker[edit]

A few years ago, I did some tests on RDF XML validation.

A sophisticated metadata checker could...

  • look for appropriate namespaces
  • check value consistency (ISO dates, finite set ranges)
  • calculate a list of ranges
  • validate XML schemas (valid RDF, valid CC, ...)
  • learn frequency of metadata (lots of image/svg+xml but little cc:creator)
  • possibility of rewriting metadata
  • possibility of adding metadata
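
The first two items in the list can be sketched in a few lines of Python. This is only an illustration under assumptions: the set EXPECTED_NAMESPACES is a hypothetical choice (not WMF's list), the date pattern accepts only the calendar forms YYYY, YYYY-MM, and YYYY-MM-DD, and the names check_metadata and sample are made up for the example.

```python
import re
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"

# Hypothetical set of namespaces that the checker considers appropriate.
EXPECTED_NAMESPACES = {
    "http://www.w3.org/2000/svg",
    "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "http://creativecommons.org/ns#",
    DC,
}

# Calendar dates only: YYYY, YYYY-MM, or YYYY-MM-DD.
ISO_DATE = re.compile(r"\d{4}(-\d{2}(-\d{2})?)?")


def check_metadata(svg_text):
    """Return a list of human-readable problems found in an SVG document."""
    problems = []
    reported = set()
    for elem in ET.fromstring(svg_text).iter():
        namespace = elem.tag[1:].partition("}")[0] if elem.tag.startswith("{") else ""
        # Report each unexpected element namespace once.
        if namespace not in EXPECTED_NAMESPACES and namespace not in reported:
            reported.add(namespace)
            problems.append(f"unexpected namespace: {namespace or '(none)'}")
        # Check that dc:date holds an ISO-style calendar date.
        if elem.tag == f"{{{DC}}}date":
            value = (elem.text or "").strip()
            if not ISO_DATE.fullmatch(value):
                problems.append(f"dc:date is not an ISO 8601 date: {value!r}")
    return problems


def sample(date_text):
    """Build a tiny SVG document whose metadata carries the given dc:date."""
    return (
        '<svg xmlns="http://www.w3.org/2000/svg">'
        "<metadata>"
        '<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"'
        ' xmlns:dc="http://purl.org/dc/elements/1.1/">'
        f'<rdf:Description rdf:about=""><dc:date>{date_text}</dc:date></rdf:Description>'
        "</rdf:RDF></metadata></svg>"
    )
```

A document with a well-formed date produces an empty list; a date such as "March 2021" produces one problem.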

My notes on RDF. GRDDL.

Related topics[edit]

A more general topic, SVG validation, is covered at Help:SVG. There were more involved discussions about SVG validation. Validation is often too strict (complaining about extensions such as Inkscape or new SVG features).

Commons:Overwriting existing files.

Removing metadata[edit]

To me, removing metadata from an SVG file (or any other file) may be inappropriate. The bottom line is removing metadata may trigger legal issues but leaving it in has minimal cost.

Consider a signed painting. Should someone come along and paint over the signature?

Removing metadata is similar to removing a watermark. See Legal issues with the removal of watermarks and Removal of watermarks from Commons images. WMF legal staff opines that removing a watermark could violate the DMCA and even violate the terms of some Creative Commons licenses.

Compare to removing a watermark that was not part of the original image. Sometimes a person who hosts an image may add a watermark.

Sometimes metadata is inadvertently removed. When librsvg produces a PNG, I doubt that it copies metadata from the SVG to the PNG.

Other uses of metadata[edit]

Images on Commons should have free licenses, but many uploads violate the creator's license. Generally, Commons relies on its users to upload only free material.

Some of that checking can be done automatically. Consider an image that is published on some website, and the website states a non-free license for that image. Alice likes the image, so she uploads it to Commons claiming it is her own work. Commons does not know.

Now consider that the image has metadata that says the creator is Bob and the license is CC-BY-NC. Commons could read the metadata, realize it does not know that Alice is Bob, and recognize the CC-BY-NC license is not compatible with Commons. Thus Commons could refuse the upload automatically.

At upload, Commons could also notice that a work is CC-BY-SA with required attribution. Commons could fill in the attribution details.

Graphics applications might also warn users about editing files that carry CC-BY-ND licenses.
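
A sketch of the license test, in Python, might look like the following. It assumes the license is given as a Creative Commons URL of the form https://creativecommons.org/licenses/by-nc/4.0/ and that the NC and ND elements are what make a license unsuitable; the function names are hypothetical, and anything the sketch does not recognize is left for human review rather than refused.

```python
from urllib.parse import urlparse


def cc_license_elements(license_url):
    """Return the license elements (such as {'by', 'sa'}) named in a
    creativecommons.org/licenses/ URL, or None if the URL is not of that form."""
    parts = urlparse(license_url)
    if parts.netloc.lower() not in {"creativecommons.org", "www.creativecommons.org"}:
        return None
    segments = [s for s in parts.path.split("/") if s]
    if len(segments) < 2 or segments[0] != "licenses":
        return None
    return set(segments[1].lower().split("-"))


def acceptable_on_commons(license_url):
    """True or False for a recognized CC license; None when a person must decide."""
    elements = cc_license_elements(license_url)
    if elements is None:
        return None
    # Non-commercial (nc) and no-derivatives (nd) terms are not free enough.
    return not (elements & {"nc", "nd"})


def warn_before_editing(license_url):
    """True when a graphics application might warn about a no-derivatives license."""
    elements = cc_license_elements(license_url)
    return elements is not None and "nd" in elements
```

The check is deliberately conservative: it only reads the URL and never tries to decide whether Alice is Bob.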

Detect symbol candidates[edit]

Artists often copy-paste an image component rather than creating and using a symbol.

Also works for text-to-curve images.
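
A first approximation of such a detector is to look for path elements whose path data is identical after whitespace is normalized. The Python sketch below is only that simple case: it will miss copies whose coordinates were shifted instead of using a transform, and the function name repeated_paths and the sample drawing are assumptions for illustration.

```python
from collections import Counter
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# Hypothetical drawing: the same triangle pasted twice, plus one unique line.
SAMPLE = """\
<svg xmlns="http://www.w3.org/2000/svg">
  <path d="M0 0 L10 0 L5 8 Z" />
  <path d="M0 0  L10 0 L5 8 Z" transform="translate(20,0)" />
  <path d="M0 0 H30" />
</svg>
"""


def repeated_paths(svg_text, min_count=2):
    """Map each repeated path-data string to the number of times it occurs."""
    root = ET.fromstring(svg_text)
    counts = Counter(
        " ".join(path.get("d", "").split())  # collapse runs of whitespace
        for path in root.iter(f"{{{SVG_NS}}}path")
    )
    return {d: n for d, n in counts.items() if d and n >= min_count}
```

Each key in the result is a candidate for a single symbol (or defs entry) that is instantiated with use.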

Use styles[edit]

There is a difference between content and style. The content is the information, and the style is how it is displayed. Content that is in a particular class may be displayed the same way by using CSS to select and style SVG elements in that class.

In general, it is better to use CSS to achieve a consistent display rather than individually formatting graphics elements.

Consider a map. We may want the rivers and the names of rivers displayed in blue. A river class can set the color for both the rivers and the font fill. Cities with a population under 100,000 may use a small dot and a small font, and cities over that size may use a larger dot and a larger font. CSS can set the font size and even the radius of a circle. Capital cities may use a star instead of a dot.

Using CSS and the class attribute can make the display both consistent and easy to change. Fill colors and font families are set in just the CSS rather than on each SVG element. Changing the CSS will apply the change to all elements in the class.

Graphics editors should have a way to manage styles, but they may not round-trip them.

SVG text[edit]

In general, the text within an SVG file should be in SVG text elements. Avoid converting text to paths/curves. Such path text expands the size of the file, and it is often unnecessary. Artistic text (such as used in logos) may need to be converted to curves.

In general, if an SVG file contains text, then users should be able to copy and paste that text from the SVG file. A simple test is to load the SVG file into a browser and then try to select all the text (control-A in Windows). If no text is selected, then the diagram's text has been converted to curves.

The text that is selected should be readable, grouped appropriately, and spaced correctly. Independent phrases should be in their own text element; they should not be combined with other phrases. Independent phrases that need two lines should not use two text elements but rather code the lines in tspan elements. That keeps the phrase together.

In addition, the selected text should not be missing spaces or have extra spaces. If the text is displayed on two lines, then it should have a space between those two lines. For example, the better result is "Holy Roman Empire" rather than "Holy RomanEmpire". Unfortunately, SVG does not handle spaces well. Spaces at the beginning or end of a line may not align as expected (the SVG hanging space problem of text-anchor).

<text><tspan>Holy Roman</tspan><tspan x="0" dy="20">Empire</tspan></text>
<text><tspan>Holy Roman </tspan><tspan x="0" dy="20">Empire</tspan></text>
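
The select-and-copy test can be approximated offline by concatenating the character data of each text element. The Python sketch below does that for the two examples above; it only approximates what a browser copies, and the wrapper svg element and the function name copyable_text are assumptions.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"

# The two text elements from the examples, inside an (assumed) svg wrapper.
SAMPLE = (
    '<svg xmlns="http://www.w3.org/2000/svg">'
    '<text><tspan>Holy Roman</tspan><tspan x="0" dy="20">Empire</tspan></text>'
    '<text><tspan>Holy Roman </tspan><tspan x="0" dy="20">Empire</tspan></text>'
    "</svg>"
)


def copyable_text(svg_text):
    """Return the concatenated character data of each text element."""
    root = ET.fromstring(svg_text)
    return ["".join(text.itertext()) for text in root.iter(f"{{{SVG_NS}}}text")]
```

An empty result for a diagram that visibly contains words suggests that the text was converted to curves; a result with run-together words points to a missing space between lines.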

Sometimes, text is spaced for emphasis. For example, a map of the United States may have text that looks like "U n i t e d  S t a t e s". That text should copy and paste without the additional spaces. Instead of inserting actual spaces to achieve the effect, the graphic artist should set the letter-spacing of the string. Furthermore, do not space text by individually placing the characters. That makes the text difficult to translate, and it may render poorly when fonts are substituted. Use the mechanisms that SVG provides.

Similarly, a string that displays as all capital letters should use text-transform: uppercase. For example, a label that displays as "UNITED STATES" through a text transform will copy-paste as "United States". There are other text transforms, but they are less useful.

The perils of hidden text. It can confuse editors. Any hidden elements can cause confusion.

Fonts[edit]

Point to section about fonts and what scaling them means.

A good example of the benefits of nonlinear scaling of fonts is a bar code font. The font symbols are scaled to integer pixel widths. The symbols use Manhattan geometry, so the edges are sharp; no anti-aliasing is needed. The strict symbol geometries are maintained.

Recommend the CSS generic fonts serif and sans-serif. If possible, do not use exotic fonts.

WMF also has problems because of librsvg. There are times that we want a text string to be an exact length. SVG supports that with the textLength attribute, but librsvg does not support it.

Character encoding[edit]

Unicode is common now, so most SVG files will use Unicode or a Unicode-compatible subset. In practice, that means UTF-8, but UTF-16 is also a possibility. UTF-16 wants a byte-order mark (BOM), and some UTF-8 files will also include a BOM. Software should handle those cases.
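
Handling the BOM cases takes only a few lines. The Python sketch below is an illustration with assumed function names; it ignores UTF-32 and any encoding declaration inside the file, and it falls back to UTF-8 when no BOM is present.

```python
import codecs


def sniff_bom(data):
    """Return a Python codec name implied by a leading BOM, or None."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # decodes UTF-8 and drops the BOM
    if data.startswith(codecs.BOM_UTF16_LE) or data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16"  # uses the BOM to pick the byte order, then drops it
    return None


def decode_svg_bytes(data):
    """Decode raw file bytes to a string, honoring a BOM when there is one."""
    return data.decode(sniff_bom(data) or "utf-8")
```

All three inputs (UTF-8 with a BOM, UTF-16 with a BOM, and plain UTF-8) decode to the same string.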

Even though a file may claim to be Unicode, that does not mean the file uses Unicode. There are many special fonts that put exotic glyphs in non-Unicode character positions. The Adobe Symbol font, for example, uses its own character encoding.[12] Zapf Dingbats[13] and Adobe Sonata[14] also use their own encodings.

Symbol
ABCDEabcde → ABCDEabcde
Zapf Dingbats, Wingdings
ABCDEabcde → ABCDEabcde
Sonata
ABCDEabcde → ABCDEabcde

Even common fonts may have non-Unicode character assignments.[15] For example, many Adobe fonts use the Adobe Standard Encoding[16] which puts a dagger at 0xB2 (² instead of U+2020: †) and the "fi" ligature at 0xAE (® instead of U+FB01: fi).

Files that claim to be Unicode but use non-Unicode fonts should be recoded with Unicode fonts. Font substitution may not work when fonts use non-Unicode character encodings.

Files that use less common character encodings (such as Shift-JIS) do not need to be recoded if they use Unicode fonts. XML files that use such encodings can convert the text to Unicode.

Detecting non-Unicode files would be involved. The first step is converting to Unicode. The encoding declaration in the XML prolog should be authoritative, and it offers a clear route to convert an XML file to Unicode. The XML DOM should automatically convert a known encoding to a DOMString, which is essentially Unicode. (XML DOM now hides the character encoding.)

The second step is searching for fonts within an SVG file. If a font is Unicode, then the text content is OK. If the font uses a non-Unicode charset, then the text content should be searched for non-Unicode characters. If no non-Unicode assignments are used, then the text content is OK. If non-Unicode assignments are used, then select a Unicode font replacement, and edit the text content to change the non-Unicode characters to equivalent Unicode characters.

Those steps require a significant database.

  1. font family
  2. character encoding (points to a (possibly standard) table)
  3. replacement font
see w:Mojibake
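
The three database columns map naturally onto a small lookup table. The Python sketch below is a toy version under assumptions: FONT_DB holds a single entry with a deliberately partial table (two Symbol letters), the replacement font name is a placeholder, and characters missing from the table pass through unchanged, which a real tool could not allow.

```python
# font family -> (character table, replacement font)
# The table is a small excerpt for illustration, not a complete encoding.
FONT_DB = {
    "Symbol": ({"a": "\u03b1", "b": "\u03b2"}, "serif"),
}


def recode(font_family, text):
    """Return (font, text) with non-Unicode assignments replaced.

    Fonts that are not in the database are assumed to be Unicode fonts
    and are returned unchanged.
    """
    entry = FONT_DB.get(font_family)
    if entry is None:
        return font_family, text
    table, replacement_font = entry
    return replacement_font, "".join(table.get(ch, ch) for ch in text)
```

For example, text "ab" set in Symbol becomes the Greek letters alpha and beta in the replacement font, while text in an unlisted font is untouched.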

Path text[edit]

Talk about what path text looks like in some contexts.

There are files that should have path text removed.

SVG files that have converted text to path are often marked with {{Path text SVG}}. Another (earlier?) convention was to explicitly categorize files that should use text to Category:Convert to TXT. The category has JPEG, PNG, and SVG files in it. Ah! It's from {{ShouldBeText}}. That template wants the figures to be converted to wikitext rather than using an illustration. It may be better to mark some files with {{Path text SVG}} or {{Convert to SVG}}.

Here is a file in the category:

Not only has the file converted the text to paths, but each letter is also a symbol, and the text is typeset by placing those symbols. This file is also interesting in that it describes a technical standard, and the text strings in the file are candidates for translate="no".

Files with even more confusion (text as symbols and curves drawn as line segments):

Glyphs[edit]

A (non) candidate for glyphs. 115 kB.

SVG 1.0 and 1.1 have SVG elements that allow a user to embed a font.

For those cases where converting text to curves makes sense, using glyphs offers potential benefits.

Discussion at Commons:Graphic_Lab/Illustration_workshop/Archive/2021#Vietnamese-style_seal_of_the_Government-General_of_French_Indo-China (and several sections immediately following)

General information about w:en:Seal script.

The seal is 115 kB for 15 characters. That is about 7700 bytes per character, which is rather large. Using the path element, one should be able to describe a line segment in less than 100 bytes. Examining the image with magnification shows that the character strokes have a lot of noise.

Modern script (not seal script) using writing-mode: vertical-rl:

大法國欽命
總統東法全
權大臣管理

Some SVG files embedded commercial fonts as glyphs. For example, an Adobe Illustrator file might embed a portion of the Arial font in an SVG file. That practice should be discouraged.

SVG 2.0 drops glyph[edit]

SVG was developed when web fonts were not available, so SVG included a rudimentary embedded font mechanism.[17] With web fonts, such a facility is not as important, so the mechanism has been deprecated. As of 2021, support may still be found in the Safari and Android browsers.

Glyphs would not work with some scripts[edit]

The Unicode specification will not add any new composed characters. That simplifies the number of characters needed. For example, Siddham script has thousands of glyphs, but most of those glyphs are composed characters. In Unicode, Siddham has a small number of fundamental characters. Composed characters are still drawn, but they no longer have exposed codepoints.

WMF prohibits web fonts[edit]

SVG 2.0 may have dropped glyph support because web fonts are now available. In the past, web pages depended upon the fonts that a user already had on his local machine. If the local machine did not have the font, then it would substitute some other font. Those substitutions could lead to bizarre results.

It gets even more troublesome when the desired font is for uncommon Unicode scripts. Unicode supports many scripts, but most users will not have those scripts. Unicode has assignments for Egyptian hieroglyphics and ancient Sanskrit Siddham.

CSS now has a mechanism (@font-face) to load web fonts.

Google offers a lot of fonts, and it also has CSS files to use those fonts as web fonts. (Reference)

The downside is that web fonts allow some tracking. The web font files have a long caching time (was it a year?). A browser would download the font and use it without continually querying the Google servers. The CSS files have relatively short cache times, so the browser would be contacting Google servers often. (Reference)

Alberta road signs[edit]

Road signs can be thorny. They may contain artistic text, and they may contain ordinary text.

Even with artistic text, the file sizes are often not large because the signs are simple (they do not contain much text).

Old Alberta road signs could sensibly use a stylized font.

The modern road signs are too stylized.

See Category:Alberta Highway shields

User:Highway Route Marker Bot

Fonts are not that important to signs. See File:AB69ewSigns-TwoFontsYMM (28172571140).jpg which shows two road signs using the old Alberta logo but the highway numbers are in different fonts.

Font height may remain fixed, but the font weight (e.g., bold) or font stretch (e.g., condensed) may vary.

2, 2A, 93, 93A

w:en:Symbols_of_Alberta

File:Alberta wordmark 2009.svg

File:AB-provincial highway.svg

Text anchors[edit]

SVG files should use reasonable anchors. The usual choices are left aligned, center aligned, or right aligned. If I want text aligned on the right edge, I should not insert some left-aligned text and then move the position of the whole string so the right edge ends where I want it to end.

Alignment is important because font metrics vary. Text that seems to align correctly with one font may look ragged with another font.

Choosing the correct text anchor is a simple defense against varying font metrics.

There are subtle problems with SVG text anchors. The SVG semantics do not play well with text direction. If the anchor sets a starting point, then left-to-right text builds to the right, but right-to-left text builds to the left. That can give screwy results.

The issue is a bit more complex. There is an interaction between the specified text direction and the Unicode BIDI algorithm. They will give reasonable results in simple cases.

Ideally, the text element should set the text direction that is appropriate for the script. English text should set the text direction as left-to-right. Arabic text should set the text direction as right-to-left. Unicode BIDI will then lay out the strings correctly, but now the layout will head in opposite directions. For expected results, one must change both the text anchor and the text direction. That's a headache.

In theory, CSS can fix the problem, but SVG agents may have weak CSS implementations.

Phab:T271663

SVG warts[edit]

SVG is not HTML[edit]

SVG 1.1 used the xlink: namespace. HTML does not have namespaces, so HTML uses just href rather than xlink:href. For some reason (perhaps embedding SVG within HTML), SVG 2.0 has decided to use href.

The problem with xml:lang and lang.

SVG should be about making marks on a screen or a piece of paper. It should not be about myriad other topics. If the semantics are not about marks on paper, then the semantics do not belong in the SVG specification.

For example, there is a notion that some text might be translated to another language, while other text should not be. People who were interested in XML markup developed the Internationalization Tag Set for making such notations. Consequently, one could add rules and attributes to an XML file that translation utilities could use. The attribute its:translate="no" means do not translate the content, and its:translate="yes" means translate the content. The specification also included rules using XPATH patterns to identify what should or should not be translated. Everything in the its namespace is distinct from other namespaces (and the default namespace).

HTML does not have namespaces, so the use of ITS is a bit awkward. So HTML added the translate attribute. It is not as powerful as the ITS specification, but it is simple. SVG comes along and copies the HTML translate attribute. There is no reason. SVG can support the its: namespace; it is not crippled like HTML.

HTML does not have namespaces. Instead of following XML and adopting namespaces (as XHTML did), HTML invented a poor man's namespace. If attributes start with data- or aria-, then they are in a quasi-namespace. SVG is XML, so it need not stoop to such measures. SVG should have used data: and aria: namespaces.

HTML ignores capitalization. Consequently, <Head> is the same as <HEAD>. It is the same for attributes. However, the data- attributes wanted to have database keys that were case sensitive. So HTML uses a hyphen algorithm. Everything after the data- prefix is in lowercase unless it is immediately preceded by a hyphen. The attribute DATA-NAME="Smith" sets database["name"] = "Smith". If we wanted the database key to be all capitals, we must say data--N-A-M-E="Smith". That's due to the case-insensitive nature of HTML. XML and SVG can be much simpler. They could say either data:name="Smith" or data:NAME="Smith". No need for a pseudo namespace, and no need for a hyphen capitalization rule.
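
The hyphen rule is easy to state in code. The Python sketch below mirrors the conversion from a data- attribute name to its key as I understand it from the HTML specification; the function name dataset_key is made up, and the sketch does not validate attribute names.

```python
import re


def dataset_key(attribute_name):
    """Map an HTML data-* attribute name to its key, or None if it is
    not a data-* attribute."""
    # HTML attribute names are case-insensitive, so start from lowercase.
    name = attribute_name.lower()
    if not name.startswith("data-"):
        return None
    rest = name[len("data-"):]
    # Drop each hyphen that precedes a lowercase ASCII letter and
    # capitalize that letter.
    return re.sub(r"-([a-z])", lambda match: match.group(1).upper(), rest)
```

So DATA-NAME yields the key "name", data--N-A-M-E yields "NAME", and data-first-name yields "firstName".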

WMF whitelisted namespaces[edit]

WMF prohibits all but a select set of namespaces.

It looks like the test requires elements to be in whitelisted namespaces but does not require attributes to be in whitelisted namespaces. I should check that distinction. Might try

<svg version="1.1"
     xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://namespace... http://schemalocation/schema.xsd">
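
Before trying an upload, it helps to list which namespaces a file uses on elements and which it uses only on attributes. The Python sketch below does that with the standard library; it makes no claim about what WMF actually checks, the function name namespaces_used is an assumption, and the example.org URLs are placeholders.

```python
import xml.etree.ElementTree as ET


def namespaces_used(svg_text):
    """Return (element namespaces, attribute namespaces) found in a document."""
    element_ns, attribute_ns = set(), set()
    for elem in ET.fromstring(svg_text).iter():
        # ElementTree spells qualified names as {namespace}local.
        if elem.tag.startswith("{"):
            element_ns.add(elem.tag[1:].partition("}")[0])
        for name in elem.attrib:
            if name.startswith("{"):
                attribute_ns.add(name[1:].partition("}")[0])
    return element_ns, attribute_ns


# A completed version of the test file; the schema location is a placeholder.
SAMPLE = """\
<svg version="1.1"
     xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://example.org/ns http://example.org/schema.xsd">
  <a xlink:href="#top"><text>link</text></a>
</svg>
"""

ELEMENT_NS, ATTRIBUTE_NS = namespaces_used(SAMPLE)
```

Here the xsi namespace appears only on an attribute, which is the case that would distinguish an element-only test from one that also covers attributes.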

See also Help:SVG.

Some absent namespaces are significant. When Dublin Core came out in 2000, it provided a succinct set of terms in the dc/elements/1.1/ namespace. The next year, it came out with an expanded dc/terms/ namespace and vocabulary. In 2008, it encouraged dropping the first namespace in favor of the dc/terms/ namespace. WMF accepts the former but not the latter namespace.

The MathML namespace is also not whitelisted. MathML allows sophisticated mathematical typesetting, but WMF blocks its upload. Users cannot upload this file:

<?xml version="1.0" encoding="utf-8"?>
<svg viewBox="0 0 300 200"
     version="1.1"
     xmlns="http://www.w3.org/2000/svg"
     xmlns:xlink="http://www.w3.org/1999/xlink">
  <title>SVG MathML test</title>
  <desc>Test if MathML is available in SVG. Will not upload to Commons due to MathML namespace.</desc>

  <metadata>
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:dc="http://purl.org/dc/terms/"
             xmlns:cc="http://creativecommons.org/ns#" >
      <cc:Work rdf:about="">
        <dc:publisher>Wikimedia Commons</dc:publisher>
        <cc:license rdf:resource="https://creativecommons.org/publicdomain/zero/1.0/"/>
        <cc:attributionName rdf:resource="http://commons.wikimedia.org/wiki/User:Glrx" />
        <cc:attributionURL rdf:resource="http://commons.wikimedia.org/wiki/File:SVG_MathML_test.svg" />
      </cc:Work>
    </rdf:RDF>
  </metadata>

  <text x="150" y="40" text-anchor="middle">SVG MathML test</text>

  <switch transform="translate(50,100)">
    <foreignObject width="200" height="50"
                   requiredExtensions="http://www.w3.org/1998/Math/MathML">
        <math xmlns="http://www.w3.org/1998/Math/MathML">
          <msqrt>
            <msup><mi>x</mi><mn>2</mn></msup>
            <mo>+</mo>
            <msup><mi>y</mi><mn>2</mn></msup>
          </msqrt>
        </math>
    </foreignObject>
    <text>\sqrt{x^2 + y^2}</text>
  </switch>

  <text x="10" y="175" font-size="8">should display a formula in either MathML or TeX</text>
</svg>

Future[edit]

Color picture that should be drawn differently in black and white.

SVG allows images to adapt. Printing a color image in black and white may not produce a satisfactory result. A red fill may look similar to a dark gray fill. SVG can use CSS media queries and adjust the presentation.

Consider the image on the right. If the media supports color, then the picture can have a blue background with white text and lines. If the media is black and white, then the background can be white, and the text and lines can be black. Mechanical drawings can be more complex. For color media, solid color fills may distinguish different components; for black and white media, crosshatches may distinguish the components.

An SVG style element has a media attribute. CSS syntax allows @media queries.

SVG 1.1 / CSS 2 media queries are very limited. SVG 2.0 is much richer.

There is some support, but it may not work well. In tests tried around 2019, one browser could distinguish color and monochrome requests, but it would not follow changes in the printer properties.

General issues[edit]

librsvg does not turn off its clip region.

Converting bitmaps to SVG[edit]

Photographs are usually poor choices for vectorization. (25 kB)

Many files on Commons are bitmaps, but some would be more useful as SVG files. Bitmap files are great with large, orthogonal features, but they can struggle with thin features and curves. Zooming in on a feature will show more anti-aliasing fuzz or jagged edges. More details require more bits. Bitmap files can be difficult to edit. Changing lines or text involves not only adding the new content, but also erasing the old. Erasures can be difficult because the background must be reconstructed. It is difficult to copy text that is in a bitmap: the text is just a picture that must be converted to characters. It takes a lot of work to translate a bitmap to another language. Bitmap files that are good candidates for vectorization can be marked with {{Convert to SVG}}.

Expensive parser tests:

  • {{PAGESIZE:{{FULLPAGENAME}}}} → 169,485
  • {{PAGESIZE:File:Silversmith.jpg}} → 2,601 (size of description page)

Unfortunately, converting a bitmap file to a vector file may not be an easy task. It also may not be desired.

Technical bitmaps such as a QR code should remain bitmaps. (Do not convert QR code PNG files to JPEG bitmaps.)

Converting a photograph or other continuous-tone image to SVG is usually inappropriate. See w:Image tracing. Good candidates for conversion need to have significant structure. Some continuous-tone images have structured color gradients, so they can be vectorized.

Images with a lot of random details may be inappropriate. It does not take much information to describe a long straight line, but it does take a lot of information to describe 10,000 individual objects. There are times that randomness can be described by a pseudorandom process. (For example, MPEG replaces fricative sounds with a noise generator.)

Here is a progression of changes to a subject image. The details and appearance of an image can be improved and still be an efficient representation of the object. The last image has the detail of the gun powder grains without individually drawing each grain.

Many technical images can be good candidates for conversion to SVG. See, for example,.

Issues[edit]

This section is confused. It should start with straightforward conversions such as diagrams that are easy to redraw.

Next, it can address stepped conversions. A stepped conversion is where a bitmap is still present in the SVG, but parts of the bitmap are replaced with SVG elements. Eventually, the SVG elements may eliminate the bitmap. A "stepped conversion" may include SVG files that will always contain a bitmap image. For example, the bitmap may be a photograph, but SVG may use text elements to label the photograph.

From there, it can address the random process methods. The section should not lead with the most difficult conversions. It can also serve as a counterpoint to not converting the Mona Lisa to SVG.

Straightforward conversions[edit]

Here is an example of a PNG file that has been converted to SVG.

The files are not widely used, but the SVG format makes it easier to fix some minor issues with the original file. For example, a variable such as VTorpedo can be edited to follow the more common convention of setting variables in italics. The arrows for the torpedo and target velocities look like velocity vectors, but they do not make sense as velocity vectors. The diagram suggests that by the time the torpedo reaches the target's track, the target has already gone by that point.

The SVG also brings up an issue with SVG marker elements. In the past, I have created a new marker for each fill. There is a technical issue about inheritance of attributes such as stroke and fill. A use instance will not inherit from its environment because it is not part of the DOM tree. In some cases, an instance will use attributes that are set on the use element because they are part of the (inaccessible) tree.

Conversions with gradients[edit]

An image may look complex, but it may just need the appropriate construct.

Conversions should be good[edit]

Suggested replacements should only be used if they are superior. Replacements may not be superior.

The original JPEG is simple and clean. The vector replacement has problems. The JPEG uses a single font. The SVG uses several font sizes and uses colors. A yellow font can be lost in a white background, so the yellow font is stroked. The paper towels are flat in the JPEG, but they are wavy in SVG. One purpose of the paper towels is to evenly distribute the weight; wavy towels (especially when the waves line up) do not convey that purpose. The solution is divided in the JPEG but connected in the SVG. What is the distinction between Southern blot and northern blot?

Despite the image having simple vector shapes, the majority of the image is a bitmap.

The SVG vector was derived from File:Capillary blot setup.svg.

The file descriptions are slightly different: the first is about a Southern blot while the second is about a northern blot. The first is for DNA and the second is for RNA, but both procedures use agarose gel electrophoresis.[18]

Electroblotting makes more sense as a blot, but the electro-transfer is vertical. That has issues with applying voltage in the given images.

w:File:Electroblot.gif is public domain, but not yet transferred to Commons. It shows the vertical electrodes. w:Northern blot states, "Strictly speaking, the term 'northern blot' refers specifically to the capillary transfer of RNA from the electrophoresis gel to the blotting membrane."

investigate ...

Multiline text causes trouble[edit]

For translations, try to keep the text on one line. Text that is broken into many lines is troublesome.

  • Cathode
  • Aston Dark Space
  • Cathode Glow
  • Cathode Dark Space
  • Negative Glow
  • Faraday Space
  • Positive Column
  • Anode Glow
  • Anode Dark Space
  • Anode

The diagram has unconventional leader lines. The diagram has negative shading: the dark spaces are white; some glows are dark.

Stepped conversions[edit]

Here are files that can be converted in steps. The first step would use an underlying bitmap file with overlaid SVG text elements. Later, the bitmap image could be converted to SVG.

A stepped conversion with difficult image[edit]

Here's a file that has conversion problems and can be converted in steps.

For the first step, the PNG can be edited to remove the text and leader lines. That PNG can be inserted into an SVG file, and the text and leader lines can be redrawn using SVG primitives. Removing the text is usually simple, but removing the leader lines can be tricky. In some cases, the leader lines can be retained. In either case, the leader lines pose a problem with text alignment. The current layout requires the text to fit the space between the margin and the start of the leader line. That strategy works for PNG files, but it has problems with SVG because font metrics may change slightly. A substituted font with slightly different metrics may not fit between the margins and the leader lines. One fix would be to add a background filter to the text; it would overwrite the leader line with white (see filter below). An alternative (and probably better) fix would be to right-align the left-hand text and left-align the right-hand text. Another text fitting problem is the title: it runs from the left margin to the right margin. A slightly wider font would go outside both margins.

<filter id="textflood" filterUnits="objectBoundingBox" primitiveUnits="objectBoundingBox">
    <feFlood flood-color="white" flood-opacity="1.0" x="0" y="0" width="1.0" height="1.0" result="back"/>
    <feMerge>
        <feMergeNode in="back" />
        <feMergeNode in="SourceGraphic" />
    </feMerge>
</filter>

For the second step, the body image could be redone as SVG. Completely converting the image to SVG is hard because the image has gradient fills; a raster-to-vector conversion application will probably not have a good result. Rendering the intestines looks difficult, too. There are many twists and turns, so shading is difficult. Perhaps a good place for filter primitives.


A simpler target is the following image.

Significant detail[edit]

Trees have significant detail[edit]

Here, the JPEG image is much higher quality than the SVG. Both files are about the same size.

Even synthetic images can have significant detail[edit]

The JPEG image has more character than the SVG. The files have similar size.

Stamps[edit]

Originally, I thought some files were bitmaps, but now it looks like something much stranger happened. The original artist made an SVG with Inkscape and used an appropriate filter, but somehow the SVG file bloated out of control. Why?

See Category:Powered by Wikidata.

Rectangular stamp[edit]

Font family is "Sans", but the SVG text was converted to curves. There are many instances of filters, and those instances include "Rubber Stamp" and "Chalk and Sponge". The defs section is huge, and it has several huge clipping paths. However, only one clipping path is used. The wiki barcode does not use a clipping path, so it is drawn without special effects.

The SVG files have neutered flowRoot elements.

This SVG file uses the "Gill Sans" font.

Round stamp[edit]

The stamp runs into librsvg's inability to do textPath. It is a lot of bytes for a simple image; individual debris has a lot of information. There is debris even in the unstamped areas. Some of the debris is black. Most of the debris is polygonal. The WIKIDATA rectangle is filled, so not all of the apparent background is transparent; clipping would be appropriate.

Redid as an SVG with a random process filter.

Vectorization[edit]

Find mirrored or rotated components of an image

How good can automatic vectorization be?

Recovering text[edit]

Files on Commons can be OCR'd (produces JSON with a text key with lines of OCR'd text):

English
Arm Bones.png
Chinese
(zh)Illu epithelium.jpg

There are tools for identifying fonts.

Translations: internationalization and localization[edit]

Commons supports wikis in many different languages. Ideally, an image would be available in any language, but the reality is many images on Commons are just available in English. Images in a bitmap format such as PNG have the text painted in, so the text is not easy to change.

SVG can support translations.

Sadly, SVG has made some unusual choices. The class attribute is a space-separated list of tokens, but the systemLanguage attribute is a comma-separated list of tokens. The commas added confusion (some implementations used space-separated IETF language tags) and complicate pattern matching. Compare CSS [systemLanguage~="en"] (which wants a space-separated list) and [systemLanguage|="en"] (which does not want a list).

Translations are welcome, but they have costs[edit]

There are important diagrams that have many translations.

MediaWiki problems[edit]

I'm seeing some multilingual files that MediaWiki does not offer to show in various languages ("Render this image in ..."). I've come across a couple in the last month, and they are not the 256 kB case. Possibly newer page builder? Also may be one explicit langtag and a default.

Maybe this is a clue. Go to

and it will display the language drop down box. Now click on the "(default language)" option and GO. The language dropdown box disappears.

That is the same as going to

Alternatively, go to the Klingon version, which does not have a render this image in (language) selector:

So somebody creates a diagram in English on Commons. Somebody then applies SVG Translate to add a language such as French. SVG Translate does not do the triple clause thing; instead it just adds "fr" clauses and keeps the default. The Commons file page will not display the potential languages. SVG Translate should ask for the default language (or use the lang= or xml:lang= attributes if available).

MediaWiki language advertisements[edit]

Say a file is used on langtag.Wiki but does not have a langtag translation. In that case, the SVG file might have a link to SVG Translate. Alternatively, MW might notice that a file is switch translated but the file does not have the desired language. Then it could insert not only the img tag but also a link to translate the file. It could use the wiki symbol for translate: File:Translate (CoreUI Icons v1.0.0).svg — except it has a CC-BY 4.0 requirement so putting it in the SVG file would be cumbersome.

The information section of a file page often lists other versions of a file. A file may, for example, have PNG and SVG versions. There may also be different language versions.

A template is often used to keep the other-versions information up-to-date on the different file pages. For example, {{Other versions/War in Ukraine (2022)}} is transcluded on many file pages. The template usually consists of an image gallery that lists each file with a comment.

For translations, the comment often identifies the language. That raises the question of how to identify the language.

  1. Using English words such as French, Russian, or Japanese. This approach only works for English readers and does little for other languages.
  2. Using langtag such as fr, ru, or ja. This approach is too cryptic for most users; they would not know what the strings mean.
  3. Using MW {{#language: fr}} to obtain the language in its own representation. Anyone visiting the file page would see français, русский, 日本語. That makes it easy for native users to recognize their language, but most users would have trouble recognizing all the languages. A user seeing lietuvių may not know that means Lithuanian.
  4. Using {{language|fr}} to obtain the language in the MW page's language. The English version of the file page would show French, Russian, Japanese. The German version of the file page would show Französisch, Russisch, Japanisch. The translations depend upon the file page's uselang URL parameter.

Methods 3 and 4 are the better approaches. I previously used method 3, but now I think method 4 is better.

The advent of multilingual SVG files raised an issue of how they should be represented in a gallery. Should there be one file that lists all the languages that it supports, or should the file be repeated for each language?

I prefer the former. The gallery is usually so small that the translations do not show up. Painting essentially the same file 15 times on the file page seems wasteful. It is very repetitive for files such as File:Map of USA with state names.svg, a multilingual map with 150 languages.

When clicking on the image or a link, should it select the "render this image in" language?

The language gallery templates. More to say. {{Svg lang}}, {{Lang gallery}}, and {{Lgallery}}.

{{Lgallery}} supports Category:Switch-controlled SVG which exploits systemLanguage that browsers will not understand.

Switch translated files already have a general request to add translations. How about attracting translators for a specific language? Say XYZ.svg is an image used on the abc.Wiki but does not have an abc translation. How can we seek translators for the image?

  • Embed a translation request in the image. That would pollute the image.
  • Start building categories.

For the latter, make a {{Translation request}} template. Add the template to the File: page with the langtag abc. The template can link to https://svgtranslate.toolforge.org/BASEPAGE. The template can also add the File: page to the Category:Translation requests abc.

Does SVG translate have URL parameters to specify the source and target languages?
github repo

A wiki could encourage its users to go to the appropriate category page. Or there could be a "translate a random file" link.

How MediaWiki handles images[edit]

When MediaWiki builds a page, it makes HTML img elements that will display the image.

For a JPEG image:

For an SVG image

The image URL pattern is [URL prefix]/H/HH/[filename]/[size]px-[filename][suffix]

When my browser displays the page, it processes the img elements. The browser will use the src attribute to make an HTTP request for the image. First, the browser will look in the cache to see if it has a local copy. If that local copy exists and is current, then it will use the local copy. If the local copy exists but is stale, then it will make a network request asking the remote server whether the local copy is still current. If the local copy is still current, then the browser will use the local copy. If there is no local copy or that copy is no longer current, then the image will be transferred over the network.

When the server gets an HTTP image request, it will look in its cache to see if it has that image ready to go. If it does, then it can answer the request from its cache. Otherwise, the server will pick apart the URL, process the request, store the result in its cache, and transfer the result over the network. (Processing the request might say the image is still current, or processing may involve scaling the image.)

Real life is a bit more complicated. WMF's wikis are high-traffic websites, so just one server cannot do all the work. A connection to a wiki will go to one of many servers. That server may ask other WMF computers to do the work. The /H/HH/ pattern in the image URL is from an MD5 hash code. It offers an easy method of load leveling work among up to 256 computers.
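A minimal sketch of the /H/HH/ step. It assumes the hash is the MD5 of the file name in its underscore form, encoded as UTF-8; the helper names `hash_path` and `thumb_url` are mine, and the sketch ignores the percent-encoding that real URLs need.

```python
import hashlib


def hash_path(filename: str) -> str:
    """Return the 'H/HH' directory prefix for a file name."""
    # Assumption: titles are hashed in their underscore form.
    name = filename.replace(" ", "_")
    digest = hashlib.md5(name.encode("utf-8")).hexdigest()
    # H is the first hex digit; HH is the first two hex digits.
    return f"{digest[0]}/{digest[:2]}"


def thumb_url(prefix: str, filename: str, width: int, suffix: str = "") -> str:
    """Fill in [URL prefix]/H/HH/[filename]/[size]px-[filename][suffix]."""
    name = filename.replace(" ", "_")
    return f"{prefix}/{hash_path(name)}/{name}/{width}px-{name}{suffix}"


if __name__ == "__main__":
    # The prefix is a placeholder, not a real WMF URL.
    print(thumb_url("https://example.org/thumb", "Example file.svg", 220, ".png"))
```

Two hex digits give 256 possible HH values, which is where the limit of 256 computers comes from.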

MediaWiki source code[edit]

SvgReader has many uglies including the exponential language list. SvgHandler trims the list.

SvgHandler::normaliseParamsInternal() is where the lang param must be in the lang list.

MediaWiki file page.

MediaWiki compiling a page.

MediaWiki serving an image. MediaHandler/ImageHandler/SvgHandler

  • SvgHandler::rasterize()

Thumbor serving a page.

Thumbor 7 changed to Python 3, a breaking change:[19]

Release 7.0.0 introduces a major breaking change due to the migration to python 3 and the modernization of our codebase. Please read the release notes for details on how to upgrade.

Gilles retires from WMF... https://mobile.twitter.com/monsieurperf/status/1409444342352187400

Language variants[edit]

Asking Klingon (tlh) — URL has .../langtlh-220px-
Asking English (en) — URL does not have .../langen-220px-

It's been a long time, and I need to check these claims. I want to point to the code.

Building a wiki page that contains an SVG file is a little more involved. There are circumstances where MediaWiki will include a language specifier in the image URL:

  • ...[filename]/lang[language]-[size]px-[filename].png

The language specifier is included if the wiki text has an explicit |lang= parameter. That is the user making an explicit request, and that request is honored even if the SVG file does not have that language.

On the English Wikipedia, if there is no |lang= parameter, then no language specifier is emitted. This practice is due to WMF servers defaulting the SVG language to generic English. That practice makes it difficult to ask for the SVG file's default language. To get the default clause, one must ask for a langtag that does not exist in the file (e.g., tlh Klingon). (Make a table showing the issue.)

On other Wikipedias, if there is no |lang= parameter, there is an attempt to use that Wikipedia's default language.

MediaWiki checks the SVG file to see if it has any language dependencies. The check is simple, and the check can be fooled. Currently, it reads the first 256 kB of the file looking for systemLanguage attributes. As it finds those attributes, it builds a list of languages the SVG file supports.
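The check described above can be sketched as a scan of the first 256 kB. This is not MediaWiki's code (that is PHP and uses an XML reader); the regular expression, the lowercasing, and the name `advertised_languages` are my assumptions.

```python
import re

SCAN_LIMIT = 256 * 1024  # bytes examined, per the description above

# Matches systemLanguage="..." or systemLanguage='...'
_ATTR = re.compile(rb"""systemLanguage\s*=\s*(["'])(.*?)\1""", re.DOTALL)


def advertised_languages(svg_bytes: bytes) -> list[str]:
    """List langtags found in systemLanguage attributes near the file start."""
    seen: list[str] = []
    for match in _ATTR.finditer(svg_bytes[:SCAN_LIMIT]):
        value = match.group(2).decode("utf-8", "replace")
        # systemLanguage is a comma-separated list of langtags.
        for tag in value.split(","):
            tag = tag.strip().lower()
            if tag and tag not in seen:
                seen.append(tag)
    return seen
```

A file whose first switch starts after 256 kB of other data (a large embedded bitmap, for example) would report no languages, which is one way the check can be fooled.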

If the Wikipedia's default language is in that list, then MediaWiki emits a URL that requests that language.

There is logic behind these choices. Most SVG files are not multilingual, and even if they are multilingual, they often do not support many languages. The goal is to avoid building language-specific URLs that do not affect the image. If an SVG file does not support Russian, then it does not make sense to scale and cache a Russian version of the SVG that looks exactly the same as the English version.
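The rules above can be written as a small decision function. It restates my unverified claims; it is not taken from the MediaWiki source, and the name `lang_prefix` is hypothetical.

```python
from typing import Optional, Sequence


def lang_prefix(wiki_lang: str,
                file_langs: Sequence[str],
                explicit_lang: Optional[str] = None) -> str:
    """Return the 'lang[language]-' part of the thumbnail name, or ''."""
    if explicit_lang:
        # An explicit |lang= is honored even if the file lacks that language.
        return f"lang{explicit_lang.lower()}-"
    if wiki_lang == "en":
        # English is the server default, so no specifier is emitted.
        return ""
    if wiki_lang in file_langs:
        # Only emit a specifier when it can change the rendering.
        return f"lang{wiki_lang}-"
    return ""


if __name__ == "__main__":
    print(lang_prefix("fr", ["en", "fr"]) + "220px-Example.svg.png")
```

Under these rules, a Russian wiki using a file with only English and French clauses gets the same URL as the English wiki, so only one rendering is cached.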

Languages and fonts[edit]

Unicode does not specify variants. (Well, it does sometimes. Normal "r" and rounded "ꝛ". Normal "d" and insular "ꝺ".[20]) For example, the Latin small letter a has two major variants: double-story a or single-story a. Chinese ideographs have similar variations, and some languages use specific variants. Chinese, Japanese, and Korean may draw the same ideograph differently. Unicode does not distinguish the character, so the font selection must make the change.

CSS can select an appropriate font.

:lang(zh) {font-family: ...; }
:lang(ja) {font-family: ...; }
:lang(ko) {font-family: ...; }

The problem is the :lang selector is not supported by librsvg. Also, we would want to distinguish between zh-Hans and zh-Hant. Unfortunately, old versions of librsvg only distinguish up to the first hyphen.

There is not a good solution for the systemLanguage attribute. CSS can do case-insensitive, partial, matches to an IETF langtag:

[systemLanguage|="zh" i] {font-family: ...; }
[systemLanguage|="ja" i] {font-family: ...; }
[systemLanguage|="ko" i] {font-family: ...; }

But CSS is not designed to parse comma-separated lists (SVG should have made the systemLanguage attribute a space-separated list just like the class attribute). Even then, CSS does not have prefix matching (|=) on space-separated token lists (~= matching). One can use several selectors to cover the cases, but it is cumbersome.
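To show how cumbersome the several-selector approach is, here is a sketch that generates the selectors for one language prefix. It assumes the comma-separated list has no whitespace around the commas, and it says nothing about which renderers accept these selectors; the function names are hypothetical.

```python
def system_language_selectors(prefix: str) -> list[str]:
    """CSS attribute selectors matching `prefix` anywhere in a comma list."""
    p = prefix
    return [
        f'[systemLanguage|="{p}" i]',   # whole value is p or starts with p-
        f'[systemLanguage^="{p}," i]',  # p is the first of several tags
        f'[systemLanguage*=",{p}-" i]', # a later tag starts with p-
        f'[systemLanguage*=",{p}," i]', # p is a middle tag
        f'[systemLanguage$=",{p}" i]',  # p is the last tag
    ]


def font_rule(prefix: str, family: str) -> str:
    """Build one CSS rule that applies `family` to every matching element."""
    selectors = ",\n".join(system_language_selectors(prefix))
    return f"{selectors} {{ font-family: {family}; }}"


if __name__ == "__main__":
    print(font_rule("zh", "serif"))
```

Five selectors per language, and a tolerant version would need more to allow for spaces after the commas.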

Languages and layout[edit]

Vertical layout tests for English and Chinese. The green text would work for a plot's y-axis, but the librsvg used by WMF does not handle the CSS.

Consider an x-y plot. The x-axis label will be horizontal and handled normally. The y-axis label is often written rotated by 90° with a text anchor of start or middle. That works for Western European languages, but Chinese should not rotate the characters but rather write them top to bottom.

The normal method of producing the y-axis label for a Western European language would be to rotate the text by -90°. The rotation point would be logically on the font baseline. For Chinese, the normal method would not be a rotation but to set the writing-mode to top-to-bottom. The logical baseline is no longer the bottom of the text but rather the center of the text. If the Western text used a start anchor point, then the Chinese text would use an end anchor point.

CSS can do the transform or set the writing mode, but there are subtle issues. Using the CSS transform property will trump any transform attribute on the element (CSS priority). By the same priority rules, a stylesheet rule would not trump a transform in a style attribute. Also, such transforms are applied before a text element's x and y attributes. Coordinates on tspan elements may be problematic.

The green text in the "Vertical Layout tests" to the right uses CSS to adjust a possible y-axis label. It could use a better Western European language default. The CSS is

:lang(zh-Hans) { font-family: NSimSun, sans-serif; }
:lang(zh-Hant) { font-family: PMingLiU-ExtB, MingLiU_HKSCS-ExtB, Microsoft JhengHei, sans-serif; }
.vert { fill: green; }
.vert:lang(en) { transform: rotate(-90deg); transform-origin: 0px 0px; }
.vert:lang(zh) { writing-mode: tb; text-anchor: end; transform: translate(-0.5em, 0em); }

For English, the text is rotated. For Chinese, the writing-mode is changed; the text is offset to the left to compensate for the different baselines.

Currently, the WMF rasterizer does not handle the example.

Internationalization (i18n) and localization (l10n)[edit]

Many SVG files are in just one language.

SVG files that use the switch element and the systemLanguage attribute are internationalized. One SVG file supports many languages. Such SVG files are also known as multilingual. There are not separate SVG files for each language.

There are systems that support many languages but produce output files that are monolingual. The output of these systems is localized (specialized to a specific locale).

MediaWiki uses internationalized/multilingual SVG files to produce localized PNG files. The PNG files that librsvg produces are not multilingual.

That leads to semantic differences. When MediaWiki displays a multilingual SVG file, it displays the language desired by the wiki, but when I display an SVG file in my browser, my browser displays it in my preferred language.

XSLT localizer. Transform multilingual SVG to monolingual. It can also strip unneeded namespaces such as inkscape: (that will not remove properties in style elements or attributes).
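The localizing step can be sketched without XSLT. The version below uses Python's ElementTree and is only an illustration: it handles switch children with or without systemLanguage, ignores other conditional attributes, and does not strip namespaces. The matching rule follows the SVG specification's wording (a user language matches a listed tag exactly or as a prefix followed by "-"); renderers may behave differently.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialize without an ns0: prefix


def _matches(attr_value: str, lang: str) -> bool:
    """True if `lang` matches any tag in a systemLanguage value."""
    lang = lang.lower()
    for tag in attr_value.split(","):
        tag = tag.strip().lower()
        if tag and (tag == lang or tag.startswith(lang + "-")):
            return True
    return False


def localize(svg_text: str, lang: str) -> str:
    """Keep only the switch clause that `lang` would select."""
    root = ET.fromstring(svg_text)
    # Collect first so the tree is not modified while iterating over it.
    for switch in list(root.iter(f"{{{SVG_NS}}}switch")):
        chosen = None
        for child in list(switch):
            value = child.get("systemLanguage")
            # A child without systemLanguage acts as the default clause.
            if value is None or _matches(value, lang):
                chosen = child
                break
        for child in list(switch):
            if child is not chosen:
                switch.remove(child)
        if chosen is not None:
            chosen.attrib.pop("systemLanguage", None)
    return ET.tostring(root, encoding="unicode")
```

The result still contains the switch elements, each with at most one unconditional child, so it renders the same in every viewer.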

Explain the lang= URL parameter on Commons. Does that demand the /lang in the URL? There are multiple levels here, too. If I'm on a wiki and click an SVG image, it takes me to a File: page on that wiki that displays the wiki's language version. From there, I can click on the Commons link. That takes me to Commons and will display the default language version.

  • Phab:T134408 Thumbnail-like rendering of localized SVGs for client-side rendering, 4 May 2016. Early recognition of localizing SVG.
  • Phab:T134455 Add experimental option for direct SVG output via srcset, 4 May 2016. Needs a localizer.
  • Phab:T134407 Provide a way to reference fonts for client-side SVG rendering, 4 May 2016. CSS would win here.
  • Phab:T134482 Beta feature for opt-in client side SVG rendering, 5 May 2016. This seems problematic. Each wiki page would need either some JavaScript to select the SVG or PNG, or there would be an HTTP vary on the user's option that would double the cache requirements.

List of languages[edit]

See Phab:T259018.

MediaWiki API will report the available languages:

See the metadata: [ {"name": "translations", "value": [] } ]. It is clearly from the switch information. It will have entries such as { "name": "en", "value": 2 }. IIRC, the 1 and 2 values are whether it is a substring match or an exact match. Find the code to be sure.

I'm presuming this metadata is stored in the database rather than triggering a reparsing of the file. Check that out.
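Assuming the metadata really has the name/value shape shown above, pulling out the language list is a short loop. The sample data below is made up for illustration, and the meaning of the 1 and 2 values is still unchecked.

```python
def translation_entries(metadata: list) -> dict:
    """Map langtag -> value from the 'translations' metadata entry."""
    for item in metadata:
        if item.get("name") == "translations":
            return {entry["name"]: entry["value"]
                    for entry in item.get("value", [])}
    return {}  # no translations entry: treat as a monolingual file


if __name__ == "__main__":
    # Hypothetical metadata in the shape described above.
    sample = [
        {"name": "translations",
         "value": [{"name": "en", "value": 2}, {"name": "fr", "value": 2}]},
    ]
    print(sorted(translation_entries(sample)))
```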

Problem file[edit]

2022 Russian invasion of Ukraine

File:2022 Russian invasion of Ukraine.svg is an important map on Commons, but it is a mess. The map is needed in many languages, and how those translations are handled is a difficult issue. There are many localized versions of the map, but they may not get continuing updates to the original file. The conflict is active, so updates are desired.

The file can be improved in several ways, but some improvements may make the file difficult to edit. There are tradeoffs, and this file shows some of the problems.

Planar translations[edit]

The original map now has some multilingual additions, but they are essentially planar translations that SVG Translate and other translation tools cannot handle.

✓ Done Unwinding planar translations is tough. One needs to match the text elements by their position, but the positions may have moved slightly.

Now that the file uses the translation units that SVG Translate wants, several languages have been added. Some additional translations have been so close in time that I feared they would overwrite each other, but it does not appear that happened. SVG Translate may have a significant update model that allows concurrent translations.

Inkscape[edit]

The author of this SVG file uses Inkscape, so SVG Translate and hand edits to the file should not prevent the author from making changes. If the author has trouble, then it is important to list those troubles. It is possible the author will have trouble with class attributes.

A significant problem is users do not know how to add new date labels and text. Copying some text and then editing often produces confused translations or untranslatable text. If an entire switch is copied, then the English text is changed but the default text and all the other languages stay the same. The translations are confused. If just the text element is copied, then it also carries the systemLanguage attribute. That attribute prevents SVG Translate from translating the text. The best approach is to just insert new text; do not copy it from elsewhere in the image.

There are also strange edits that appear.

A switch element may contain several unrelated (and ultimately undisplayable) text elements. This may come about from copying the text elements. The copy somehow ends up within the switch. It should not display on the screen, so it would confuse the user.

Geometry elements are being inserted into switch elements that should contain only text elements. What determines where Inkscape will insert a new element? It should be treating a switch atomically. I have deleted several spurious geometry elements already, and now there are more:

<switch fill="#ffffff" transform="translate(1827.3,587.38)" id="switch4938">
  <rect style="display:inline;opacity:0.948718;fill:#dc0000;fill-opacity:1;stroke:#000000;stroke-width:0.245063;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-dashoffset:1.92453;stroke-opacity:1"
        id="rect352325-8-9-2-1-4-3-9-29-8-3"
        width="25.413" height="5.2650332" 
        x="-12.691059" y="-3.4384575" ry="1.2425818" 
        transform="rotate(0.33424498)"/>
  <text id="trsvg995" systemLanguage="fr"><tspan id="trsvg798">1er avril</tspan></text>
  <text id="trsvg996-tr" systemLanguage="tr"><tspan id="trsvg799-tr">1 Nisan</tspan></text>
  <text id="trsvg996-it" systemLanguage="it"><tspan id="trsvg799-it">1º aprile</tspan></text>
  <text id="trsvg996-ru" systemLanguage="ru"><tspan id="trsvg799-ru">1 апреля</tspan></text>
  <text id="trsvg996-pt" systemLanguage="pt"><tspan id="trsvg799-pt">1 de Abril</tspan></text>
  <text id="trsvg996-el" systemLanguage="el"><tspan id="trsvg799-el">1 Απριλίου</tspan></text>
  <text id="trsvg996-ca" systemLanguage="ca"><tspan id="trsvg799-ca">1 d'abril</tspan></text>
  <text id="trsvg996-vi" systemLanguage="vi"><tspan id="trsvg799-vi">1 tháng 4</tspan></text>
  <text id="trsvg996"><tspan id="trsvg799">1 April</tspan></text>
</switch>

✓ Done The rect element will prevent any display of text: a switch renders only its first child whose conditions are satisfied, and the rect has no conditions. Also notice that the systemLanguage="en" clause was removed; it was probably replaced with the rect element. There is also the sneaky rotate by less than 1 degree transform. Inkscape is also inserting copious style information.

✓ Done Also, instead of editing a symbol definition, the use was exploded and the result edited in place.
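The stray-geometry problem above could be screened for mechanically rather than by eye. This is a rough sketch, not an existing tool; it flags non-text children of a switch and any unconditional child that is not the last clause.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"


def suspicious_switches(svg_text: str) -> list:
    """Return (switch id, problem) pairs for switches that look damaged."""
    problems = []
    root = ET.fromstring(svg_text)
    for switch in root.iter(SVG + "switch"):
        switch_id = switch.get("id", "(no id)")
        children = list(switch)
        for index, child in enumerate(children):
            name = child.tag.replace(SVG, "")
            if name != "text":
                problems.append((switch_id, f"non-text child <{name}>"))
            is_last = index == len(children) - 1
            # An unconditional child always wins, so later clauses never show.
            if child.get("systemLanguage") is None and not is_last:
                problems.append(
                    (switch_id, f"unconditional <{name}> hides later clauses"))
    return problems
```

Run on the switch4938 example, it would report the rect twice: once as a non-text child and once as the clause that hides the translations.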

Colors[edit]

Many people want the map colors changed. One concern was using web safe/colorblind-friendly colors. Consistent (and easily changed) colors can be done with styles.

Place names[edit]

The map is already large. There are hundreds of community names on the map. That presents the same translation bloat problem that a 100-language version of a US map presents. The map should use a skeleton file that is localized with a database of translations. WMF does not have that capability for SVG files. SVG also does not have an easy line-breaking method.

Need to work with what we have today. To keep the file size down, the switch elements are given or inherit styling from class="place". That allows the fill color, font family, and text-anchor to be specified in one place rather than repeated on each element. The font size is also given or inherited. The font size is a function of the city's population. The text position is also specified on the switch element so it need not be repeated for each translation.

Finding place names[edit]

Using WikiData to translate place names is complicated by difficult-to-resolve Ukrainian place names. For example, "Pershotravneve" maps to more than 30 WikiData items.[22] To automate the search, the name should be attached to a map point; that practice is not common on SVG maps. The projection parameters can be found by following sources back to File:Ukraine adm location map.svg; the base map claims to be an equirectangular projection that includes administrative regions. The SVG size is 1,546 × 1,038. Then invert that point with the map projection to get a latitude and longitude of the community. Then do the WikiData query that coincides with that position.

Equirectangular projection, vertical stretching 150 %
Border coordinates: north 52.7°, south 44.1°, west 21.5°, east 40.7°

Info: This map is part of a series of location maps with unified standards: SVG as file format, standardised colours and name scheme. The boundaries on these maps always show the de facto situation and do not imply any endorsement or acceptance. In case of changes of the shown area the file is updated. The old version will be uploaded as a new file and thus is still available.

The file is 2,199 × 1,478 px. Radekhiv is at (350.01 px, 413.3 px) → 50.3° N, 24.56° E. Google Maps says 50.28° N, 24.60° E.[23] The WikiData item is Radekhiv (Q904046).

The vertical stretching comment of 150% is the same as shrinking the horizontal by 2/3. That gives the standard parallels as ±48.1897° (the angle whose cosine is 2/3).
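The pixel-to-coordinate step is plain linear interpolation for an equirectangular map. A short sketch, using the image size and border coordinates quoted above (the function name is mine):

```python
import math

WIDTH, HEIGHT = 2199.0, 1478.0   # image size in px
NORTH, SOUTH = 52.7, 44.1        # border latitudes in degrees
WEST, EAST = 21.5, 40.7          # border longitudes in degrees


def pixel_to_lonlat(x: float, y: float) -> tuple:
    """Invert the equirectangular map: SVG px -> (longitude, latitude)."""
    lon = WEST + x / WIDTH * (EAST - WEST)
    lat = NORTH - y / HEIGHT * (NORTH - SOUTH)  # y grows downward in SVG
    return lon, lat


# Standard parallel implied by a 150 % vertical stretch: cos(phi) = 2/3.
STANDARD_PARALLEL = math.degrees(math.acos(2 / 3))

if __name__ == "__main__":
    lon, lat = pixel_to_lonlat(350.01, 413.3)  # Radekhiv's circle
    print(f"{lat:.2f} N, {lon:.2f} E, parallel {STANDARD_PARALLEL:.4f}")
```

For Radekhiv this gives roughly 50.30° N, 24.56° E, consistent with the figures above and within a few hundredths of a degree of the Google Maps position.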

Locations use circles; it might be better to use symbols.

 <g fill="#ff4" stroke="#777" stroke-width=".71">
  <circle cx="950.74" cy="379.56" r="2.49"/>
  <circle cx="246.42" cy="424.61" r="2.49"/>
  <circle cx="350.01" cy="413.3" r="2.49"/>
  <circle cx="340.11" cy="175.71" r="2.49"/>
  <circle cx="1252.6" cy="439.46" r="2.49"/>
  <circle cx="1283.4" cy="500.98" r="2.49"/>
  <circle cx="288.49" cy="210.71" r="2.49"/>
  <circle cx="297.69" cy="259.51" r="2.49"/>
  <circle cx="307.23" cy="319.26" r="2.49"/>
  <circle cx="372.29" cy="378.3" r="2.49"/>
  <circle cx="463.15" cy="243.6" r="2.49"/>
  <circle cx="527.85" cy="150.26" r="2.49"/>
  <circle cx="1596.7" cy="465.5" r="2.49"/>
  <circle cx="1671.8" cy="477.12" r="2.49"/>
 </g>
 <g fill="#ff4" stroke-width=".71">
  <g stroke="#777">
   <circle cx="1598.5" cy="574.58" r="2.49"/>
   <circle cx="1648.1" cy="611.47" r="2.49"/>
   <circle cx="1687.9" cy="570.18" r="2.49"/>
   <circle cx="1782.4" cy="651.77" r="2.49"/>
   <circle cx="1722.5" cy="661.25" r="2.49"/>
   <circle cx="1722.1" cy="533.15" r="2.49"/>
   <circle cx="1700.9" cy="655.44" r="3.2"/>
   <circle cx="1540.9" cy="377.32" r="2.49"/>
  </g>
  <circle cx="1577.5" cy="330.57" r="2.49" stroke="#787877"/>
  <g stroke="#777">
   <circle cx="1374.1" cy="333.64" r="2.49"/>
   <circle cx="1330.1" cy="312.07" r="2.49"/>
   <circle cx="1464.5" cy="261.7" r="2.49"/>
  </g>
 </g>

Sensible grouping may be done by finding a location circle and then finding nearby text. Alternatively, locate all circles near some text. The grouping also allows translation issues to be detected. For example, the anchor point of some text may need to be moved if a translation is significantly longer or shorter than the original.

<g>
  <circle class="city" r="3" />
  <text class="city" x="10" y="0">City Name</text>
</g>
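A nearest-neighbour pairing is one simple way to find the circle that belongs to a label. The sketch below assumes the circle centers and text anchors have already been extracted into the same coordinate system; the distance cutoff and the label position in the usage line are hypothetical.

```python
import math


def pair_labels(circles: list, labels: dict, max_distance: float = 30.0) -> dict:
    """Map each label to its nearest circle center, if one is close enough."""
    pairs = {}
    for text, (tx, ty) in labels.items():
        best = min(circles,
                   key=lambda c: math.hypot(c[0] - tx, c[1] - ty),
                   default=None)
        if best is None:
            continue  # no circles at all
        if math.hypot(best[0] - tx, best[1] - ty) <= max_distance:
            pairs[text] = best
    return pairs


if __name__ == "__main__":
    circles = [(350.01, 413.3), (372.29, 378.3)]
    print(pair_labels(circles, {"Radekhiv": (355.0, 414.0)}))
```

Labels left unpaired, or circles claimed by two labels, would be the cases to inspect by hand.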

The map may use a more sensible grouping of communities within districts.

Would like to detect content that is a date.

The text should use class attributes and CSS for the formatting. Map (font, font size, color) → class.

Several g elements are used to default the font size (or other formatting characteristics) of their contained text elements. Unwinding those groups is a difficult problem. Perhaps detect a group that has presentation attributes and only text children.

<g font-family="Calibri" font-size="3.27" font-weight="bold" stroke-width=".61">
  <text x="923.1" y="241.1">1 April</text>
  <text x="1180.34" y="253.12">1 April</text>
  <text x="1133.34" y="158.79">2 April</text>
  <text x="1372.74" y="238.32">2 April</text>
  <text x="1446.72" y="159.12">4 April</text>
  <text x="936.76" y="345.97">30 March</text>
  <text x="983.05" y="390.31">31 March</text>
  <text x="1047.81" y="215.22">31 March</text>
  <text x="1180.34" y="204.35">1 April</text>
</g>
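Detecting such groups is easy to sketch; unwinding them is the hard part. The function below finds g elements that carry presentation attributes and contain only text children. The attribute list is my own short selection, not a complete set of SVG presentation attributes.

```python
import xml.etree.ElementTree as ET

SVG = "{http://www.w3.org/2000/svg}"
# Assumption: a small sample of presentation attributes worth tracking.
PRESENTATION = {"font-family", "font-size", "font-weight",
                "fill", "stroke-width", "text-anchor"}


def text_only_groups(svg_text: str) -> list:
    """Return (presentation attributes, text count) for formatting groups."""
    found = []
    root = ET.fromstring(svg_text)
    for group in root.iter(SVG + "g"):
        children = list(group)
        attrs = {k: v for k, v in group.attrib.items() if k in PRESENTATION}
        only_text = bool(children) and all(
            child.tag == SVG + "text" for child in children)
        if attrs and only_text:
            found.append((attrs, len(children)))
    return found
```

Each distinct attribute set could then be assigned a class name, giving the (font, font size, color) → class map.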

✓ Done There are several screwy transform attributes. For example, transform="scale(1.000,1)". Some other transforms have rotations of a fraction of a degree. The effective rotations are small enough that they can be ignored (except for the additional translation they introduce). Some matrices have a similar small rotation. There is even the bizarre:

<circle transform="scale(1 -1)" cx="1291.2" cy="-261.91" r="2.49"/>
<circle cx="1128.4" cy="255.56" r="2.49" fill-rule="evenodd" stroke-linecap="round" stroke-linejoin="round"/>

✓ Done There are text elements that are not stroked but have stroking attributes.

✓ Done There are switch elements that have a single default clause. Other editors have used SVG Translate on the file, so the problem has disappeared.

✓ Done Bombing locations should be symbols, but they may not even be grouped. These use relative coordinates, so the path matching may be easier than expected. The DOM dropped path primitives.

 <path d="m791.33 600.27 5.164 3.958 1.051-3.741 1.484 3.803 4.113-3.556-2.226 4.607 3.525-.124-2.288 2.876 3.061 1.979-3.649.062 3.216 3.865-4.607-2.134-.495 4.143-2.257-3.34-3.154 4.298.309-5.38-4.298.866 2.69-3.525-2.196-3.587 3.061.588z" fill="red"/>
 <path d="m792.98 602.35 3.877 2.972.789-2.809 1.114 2.856 3.088-2.67-1.672 3.459 2.647-.093-1.718 2.159 2.299 1.486-2.74.047 2.415 2.902-3.459-1.602-.372 3.111-1.695-2.507-2.368 3.227.232-4.04-3.227.65 2.02-2.647-1.648-2.693 2.299.441z" fill="#ff8000"/>
 <path d="m794.64 604.5 2.49 1.908.507-1.804.716 1.834 1.983-1.715-1.074 2.221 1.7-.06-1.103 1.387 1.476.954-1.759.03 1.551 1.864-2.222-1.029-.239 1.998-1.088-1.61-1.521 2.072.149-2.594-2.073.418 1.297-1.7-1.059-1.729 1.476.283z" fill="#ff0"/>
 <path d="M798.43 608.4a.777.777 0 1 1-1.553 0 .777.777 0 1 1 1.553 0z"/>

The last path uses two arcs to make a circle of radius 0.777 and diameter 1.553. It would give the center of the bombing location and the presumptive symbol origin.

It may be better to localize the file with XSLT, fix some issues, and then restart the translations.

Source data[edit]

See w:Template:Russo-Ukrainian War detailed map and w:Module:Russo-Ukrainian War detailed map. These maps are made with technology on the English Wikipedia. The air bases and nuclear installations have names. Cities with latitude and longitude by Oblast. Labels are wikilinks such as Zelenodolsk, and following that link produces the WikiData item Zelenodolsk (Q640713). The diagram has lost a lot of information.

The module has some data in an apparent Lua object. I do not know if the data is available as JSON.

How do I find which items have links to a Wikipedia article?

What is the best query approach?

Wikidata API queries[edit]

From a wiki article, find the Q-item? See https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bwbentityusage

returns

{
    "batchcomplete": "",
    "query": {
        "pages": {
            "33276544": {
                "pageid": 33276544,
                "ns": 0,
                "title": "Zelenodolsk, Ukraine",
                "wbentityusage": {
                    "Q10172305": {
                        "aspects": [
                            "S"
                        ]
                    },
                    "Q640713": {
                        "aspects": [
                            "C",
                            "D.en",
                            "O",
                            "S",
                            "T"
                        ]
                    }
                }
            }
        }
    }
}
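For the narrower question of mapping an article to its item, the query API's pageprops property exposes a wikibase_item value on wikis connected to Wikidata. The sketch below only builds the request URL and reads a response; the sample response is my reconstruction from the page and item shown above, not captured output.

```python
from urllib.parse import urlencode

API = "https://en.wikipedia.org/w/api.php"


def item_query_url(title: str) -> str:
    """Build a pageprops query asking for the page's Wikidata item."""
    params = {
        "action": "query",
        "prop": "pageprops",
        "ppprop": "wikibase_item",
        "titles": title,
        "format": "json",
    }
    return API + "?" + urlencode(params)


def items_from_response(response: dict) -> dict:
    """Map page title -> Q-item for pages that have a connected item."""
    pages = response.get("query", {}).get("pages", {})
    return {page["title"]: page["pageprops"]["wikibase_item"]
            for page in pages.values()
            if "wikibase_item" in page.get("pageprops", {})}


if __name__ == "__main__":
    print(item_query_url("Zelenodolsk, Ukraine"))
    # Illustrative response shape (assumed, not fetched).
    sample = {"query": {"pages": {"33276544": {
        "pageid": 33276544, "ns": 0, "title": "Zelenodolsk, Ukraine",
        "pageprops": {"wikibase_item": "Q640713"}}}}}
    print(items_from_response(sample))
```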

Can a SPARQL query find which item has a link to the article?

Position-based SPARQL query[edit]

Find a settlement using its latitude and longitude.

For example, Zelenodolsk is at Point(33.652359815 47.563096347). Find the settlements near that point:

#title: places in Ukraine near a coordinate
# SELECT ?place ?placeLabel ?location WHERE {
#  wd:Q640713 wdt:P625 ?coord.                         # coordinates of the location
#  ?place wdt:P17 wd:Q212;                               # country: Ukraine
#         wdt:P625 ?location.
#  FILTER(geof:distance(?location, ?coord) < 10). # less than 10 km away
SELECT DISTINCT ?place ?placeLabel ?oblastLabel ?location ?distance WHERE {
  Bind("Point(33.652359815 47.563096347)"^^geo:wktLiteral as ?coord).
  ?place wdt:P31/wdt:P279* wd:Q12051488 . # populated place in Ukraine
  # ?place wdt:P131* ?oblast.
  # ?oblast wdt:P31 wd:Q3348196. # located in Ukrainian oblast
  
  # Search by Nearest
  SERVICE wikibase:around { 
    ?place wdt:P625 ?location . 
    bd:serviceParam wikibase:center ?coord .
    bd:serviceParam wikibase:radius "10" . 
    bd:serviceParam wikibase:distance ?distance .
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} 
Order by ?distance


Inverted SPARQL query

Alternatively, invert the problem. Get a list of human settlements in Ukraine and use that list to match the names. This query takes less than 4 seconds. It does not find Kyiv. Kyiv is not located in an oblast, probably much as Washington, D.C. is not located within a state, so the oblast should be optional. Furthermore, not all settlements have a population. If I do not acquire population, then the query takes 30 seconds.

#title: populated places in Ukraine
# -> 30,000 results w/o population, 6000 w population, 1700 w pop >= 1000
SELECT DISTINCT ?place ?placeLabel ?oblastLabel ?location ?population ?native WHERE {
  ?place wdt:P31/wdt:P279* wd:Q12051488 . # populated place in Ukraine
  ?place wdt:P625 ?location . # coordinates of the location
  optional {
    ?place wdt:P1082 ?population . # population
    # filter (?population >= 200000) .
  }
  optional {
    ?place wdt:P131* ?oblast . # located in an administrative region
    ?oblast wdt:P31 wd:Q3348196. # that is a Ukrainian oblast
  } # located in an admin region
  optional {?place wdt:P1705 ?native .}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en" }
} 
Order by ?placeLabel


There are more issues for oblasts. Several settlements are repeated because their oblast changed over time. Consequently, start times and end times for administrative regions are important. Is there an easy way to screen for outdated oblasts?

The name matching does not work well. The map has about 600 place names, but only 249 matches are found. Many settlements do not have a native label (they may have a Ukrainian label). In addition, the English spelling used on the map does not always match the Wikidata label. Approximate string matching may help.
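
As one possible form of approximate matching, here is a sketch that uses plain Levenshtein edit distance. The helper names and the cutoff of 2 edits are my own assumptions; I have not tried it against the map's names.

```javascript
// Classic dynamic-programming edit distance (Levenshtein),
// keeping only the previous row of the table.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Hypothetical matcher: the closest label within maxDistance edits, or null.
function closestLabel(name, labels, maxDistance = 2) {
  let best = null;
  let bestDistance = maxDistance + 1;
  const needle = name.toLowerCase();
  for (const label of labels) {
    const d = editDistance(needle, label.toLowerCase());
    if (d < bestDistance) {
      best = label;
      bestDistance = d;
    }
  }
  return best;
}

console.log(closestLabel("Vysokopillya", ["Vysokopillia", "Zelenodolsk"]));
```

A fixed cutoff is crude; short names probably need a tighter limit than long ones.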

Fixes

I made some fixes to the file, and there were surprises. Several graphic elements had been merged or shuffled, and it takes a lot of work to find even simple cases. It is tedious work by hand. Another problem with working on a frequently updated file: new revisions. A recent revision caught me half-way through doing some housekeeping edits. Now I need to figure out how to merge them. That is further complicated by Inkscape's verbose output: one attribute per line (with the addition of an id attribute to every element). It is tougher to edit the file by hand. Time to run it through an XML pretty printer. I cannot really complain. Inkscape maintained the file structure and even the XML comments of my most recent upload. More importantly, the recent edit added content.

Another realization concerns SVG Translate. Most of the file is a planar translation: it has a high-level switch with the separate planes as g elements. SVG Translate leaves the complicated groups with systemLanguage attributes alone, but it apparently processes the default clause. That processing includes adding switch translations to every text element.

I got caught again. This time, the file was changed with SVG Translate (the legend is now a good target for SVG Translate, but not the rest of the file) while I was working on some changes.

Planning other fixes....

Locations

✓ Done The circles used to display cities are inconsistent. There are several radii to represent the size of the city, but the Ukrainian cities have a gray stroke while the Russian-held cities do not have a stroked border:

<circle cx="1597.2" cy="395.17" r="2.49" fill="#ff4" stroke="#777" stroke-width=".71"/>
<circle cx="1680.7" cy="408.34" r="2.49" fill="red"/>

✓ Done The stroke width is almost exclusively 0.71, but there are some cases with 1.09, 0.5, and 0.41. Some CSS would be neater and allow quickly adding a border:

circle.uk {fill: #ff4; stroke: #777; stroke-width: .71px; }
circle.ru {fill: red; stroke: none; stroke-width: .71px; }
<circle cx="1597.2" cy="395.17" r="2.49" class="uk"/>
<circle cx="1680.7" cy="408.34" r="2.49" class="ru"/>

✓ Done The Russian fill is usually red, but sometimes it is #fa2c29.

  • #fa2c29
  • #ff0000

Just using red seems reasonable.

✓ Done Date label fills use yellow and a darker red:

  • #ff0
  • #dc0000

Much of the placename text is a blue #04a.

The placename text usually is the same as the placenames used in the English version. Just use the English placenames and then add back the few changes (e.g., French uses Kiev).

The biggest problem with placenames is the dot size and the font size. Those sizes reflect the population, but consistent handling of those items is tough. In addition, some placename text may need different text anchors. Putting a size value in the class would work to set the font size, but it may not work for SVG 1.1 circle elements. The r radius is a geometry property that can be set with CSS in SVG 2.0, but it is just an attribute in SVG 1.1.[24]

The issue of dot size.

Towns and Villages
Population         Dot Size  Possible r  Label Size  Possible font-size  Contested city size
Capital            35        8.71        140         17.79
Population 1M +    28                    130
Population 500K +  24                    120
Population 200K +  20                    110
Population 100K +  16                    100
Population 50K +   14                    90
Population 20K +   12                    80
Population 10K +   10                    70
Population 5K +    8                     60
Population < 5K    6                     0 or 50
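
If the sizes were assigned by script rather than by hand, the table above could be turned into a lookup. This is only a sketch: the function name is hypothetical, the numbers are the table's Size and label size values (not the possible r or font-size values, which are only known for the capital), and picking 50 for the smallest class is arbitrary.

```javascript
// Thresholds copied from the table above:
// [minimum population, dot size, label size].
const SIZE_TABLE = [
  [1000000, 28, 130],
  [500000, 24, 120],
  [200000, 20, 110],
  [100000, 16, 100],
  [50000, 14, 90],
  [20000, 12, 80],
  [10000, 10, 70],
  [5000, 8, 60],
];

// Hypothetical helper. The capital is a special case, not a population class.
function placeSizes(population, isCapital = false) {
  if (isCapital) return { dot: 35, label: 140 };
  for (const [min, dot, label] of SIZE_TABLE) {
    if (population >= min) return { dot, label };
  }
  // The table says "0 or 50" for the smallest places; 50 is an arbitrary pick.
  return { dot: 6, label: 50 };
}

console.log(placeSizes(250000));
```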
	--Towns & Villages
			-- Dotsize vs. Population
	--Arranged by Oblasts, then cities, alphabetical order

Locations

Styling with class

I would like to use the class attribute and CSS to set styling. I did that in the map legend, but some web searches suggest that it is difficult to use class/CSS formatting in Inkscape. I need to find out more to avoid making the file difficult for others to edit.

Some comments suggested that class must be set in the XML editor (which might be daunting for many editors and have substantial peril). In addition, changing the class may not cause Inkscape's visual display to be updated. How does Inkscape handle styling? There were also comments about using Inkscape styling extensions, but extensions are not a good route.

Dates

✓ Done The date text does vary among versions, but the translations are direct. A wholesale use of the systemLanguage="en" group followed by editing the dates should work.

Dates are done in Calibri bold. The date text depends on the background. Russian dates are white, Ukrainian dates are black. Unfortunately, librsvg does not handle class conjunctions:[25]

text.date { font-family: Calibri; font-weight: bold; font-size: 3.27px; text-anchor: middle; }
text.date.ru { fill: #FFF; }
text.date.uk { fill: #000; }

✓ Done A date label was made with a rect element for the background and a text element for the date. I changed the rect elements to use the #labelru and #labeluk symbols. I also paired the symbols with their corresponding text, so the SVG now looks like:

<use xlink:href="#labelru" x="961.3" y="378.84" />
<switch fill="#fff" transform="translate(965.88, 382.3)">
  <text systemLanguage="en"><tspan>25 February</tspan></text>
  <text systemLanguage="fr"><tspan>25 février</tspan></text>
  <text systemLanguage="tr"><tspan>25 Şubat</tspan></text>
  <text><tspan>25 February</tspan></text>
</switch>

✓ Done The text does not use text-anchor="middle", so the "25 Şubat" will skew to the left.

✓ Done The text x-coordinate should be shifted to the midpoint of the use element. That would be .

Ideally, the origin of the symbol and the midpoint of the text would coincide. The #labelru and #labeluk symbols can be shifted to use the same origin as the text.

Placing both elements in a group would allow positioning both. Such a grouping may be confusing to others.

Several labels have the same date and consequently redundant translations. A simplification would be to put each date into a symbol where it would be translated once. It could even be used in both symbols:

<use xlink:href="#labelru" ...>
<use xlink:href="#april_15" fill="#fff"/>

<use xlink:href="#labeluk" ...>
<use xlink:href="#april_15" fill="#000"/>

That change may also be confusing to others.

The Intl package can format international dates.[26]

var date = new Date(2022, 2, 15);
new Intl.DateTimeFormat("de", {day: "numeric", month: "long"}).format(date);

There are some issues: "en" → "March 15", "en-GB" → "15 March". For German, a period is added after the day. Hand translation for French gives "1er avril".[27] For Italian, "1º aprile".[28] The flourishes are only for the first day of the month.

It would be better if the dates were generated automatically rather than manually translated.

I use the Date object to parse the default date (Date.parse(el.textContent + " 2022")). Then I use the Intl package to compare the dates in the switch element clauses.
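
A sketch of that comparison, written out so the idea can be tested outside the browser. The function names are mine, and the clause table is a made-up stand-in for the text of a switch element's children. Date.parse on a non-ISO string is implementation-dependent, so this leans on the engine accepting "25 February 2022".

```javascript
// Parse the default clause's text as a 2022 date, as described above.
// Note: Date.parse on a non-ISO string is implementation-dependent.
function parseDefaultDate(text, year = 2022) {
  const ms = Date.parse(text + " " + year);
  return Number.isNaN(ms) ? null : new Date(ms);
}

// Format a date the way Intl renders day and month for a langtag.
function formatLabelDate(date, langtag) {
  return new Intl.DateTimeFormat(langtag, { day: "numeric", month: "long" }).format(date);
}

// Hypothetical check: langtags whose clause text differs from Intl's output.
function mismatchedClauses(defaultText, clauses) {
  const date = parseDefaultDate(defaultText);
  if (date === null) return Object.keys(clauses);
  return Object.keys(clauses).filter(
    (langtag) => formatLabelDate(date, langtag) !== clauses[langtag]
  );
}

console.log(mismatchedClauses("25 February", {
  "en-GB": "25 February",
  fr: "25 février",
  tr: "25 Subat", // misspelled on purpose
}));
```

A mismatch is not always an error. The hand translations deliberately differ from Intl in places, such as the French "1er avril".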

From a data standpoint, a more sensible default clause would use an ISO date format.

<switch>
  <text systemLanguage="en">15 March</text>
  <text>2022-03-15</text>
</switch>

Unfortunately, that would confuse SVG Translate. Going from English to another language would present "15 March", but going from default would present "2022-03-15".

I'm seeing strange changes to the SVG. Notice the y="31.370001". That suggests the number 31.37 was bumped by single-precision float rounding. Furthermore, transform="rotate(9.267) translate(1485.7, Y)" was rewritten as transform="rotate(9.267,-957.72641,9148.3009)". It is rotating the origin to a desired location!
Gross check (needs work):
1460.7007169041, 273.62571979975
1460.7006854713, 273.62571624018
Is the rewrite done by Inkscape or by SVG Translate?
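
The gross check can be redone in a few lines of JavaScript. SVG defines rotate(a, cx, cy) as translate(cx, cy) rotate(a) translate(-cx, -cy), so applying it to the text origin (0, 0) shows where the rewritten transform puts the label. The function name is mine. The result should land near the first pair of numbers above, about (1460.70, 273.63).

```javascript
// rotate(angle, cx, cy) is translate(cx, cy) rotate(angle) translate(-cx, -cy).
// Apply it to the point (x, y); the angle is in degrees, as in SVG.
function rotateAbout(angleDeg, cx, cy, x, y) {
  const a = (angleDeg * Math.PI) / 180;
  const dx = x - cx;
  const dy = y - cy;
  return [
    cx + dx * Math.cos(a) - dy * Math.sin(a),
    cy + dx * Math.sin(a) + dy * Math.cos(a),
  ];
}

// Where does the rewritten transform put the text origin (0, 0)?
const [originX, originY] = rotateAbout(9.267, -957.72641, 9148.3009, 0, 0);
console.log(originX.toFixed(4), originY.toFixed(4));
```

The original rotate(9.267) translate(1485.7, Y) can be checked the same way by rotating the point (1485.7, Y) about (0, 0) once Y is known.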

<use xlink:href="#labelru" transform="rotate(9.267)" x="1473" y="31.370001" id="use4827" width="100%" height="100%"/>
<switch fill="#ffffff" transform="rotate(9.267,-957.72641,9148.3009)" id="switch4839">
  <text systemLanguage="en" id="trsvg973"><tspan id="trsvg776">25 February</tspan></text>
  <text systemLanguage="fr" id="trsvg974"><tspan id="trsvg777">25 février</tspan></text>
  <text id="trsvg975-tr" systemLanguage="tr"><tspan id="trsvg778-tr">25 Şubat</tspan></text>
  <text id="trsvg975-it" systemLanguage="it"><tspan id="trsvg778-it">25 febbraio</tspan></text>
  <text id="trsvg975-ru" systemLanguage="ru"><tspan id="trsvg778-ru">25 февраля</tspan></text>
  <text id="trsvg975-pt" systemLanguage="pt"><tspan id="trsvg778-pt">25 de Fevereiro</tspan></text>
  <text id="trsvg975-el" systemLanguage="el"><tspan id="trsvg778-el">25 Φεβρουαρίου</tspan></text>
  <text id="trsvg975"><tspan id="trsvg778">25 February</tspan></text>
</switch>

Somebody is going nuts duplicating rotated use elements.

<use xlink:href="#labelru" x="1937.40" y="544.58002"/>
<use xlink:href="#labelru" x="1937.40" y="544.58002" transform="rotate(0.334,-4875.8035,-16138.871)"/>
<use xlink:href="#labelru" x="1937.40" y="544.58002" transform="rotate(0.668,-5215.2903,-325.84574)"/>
<switch fill="#ffffff" transform="translate(1924.7,548.29)">
  <text systemLanguage="en"><tspan>6 March</tspan></text>
  <text systemLanguage="fr"><tspan>6 mars</tspan></text>
  <text systemLanguage="tr"><tspan>6 Mart</tspan></text>
  <text systemLanguage="it"><tspan>6 marzo</tspan></text>
  <text systemLanguage="ru"><tspan>6 марта</tspan></text>
  <text systemLanguage="pt"><tspan>6 de Março</tspan></text>
  <text systemLanguage="el"><tspan>6 Μαρτίου</tspan></text>
  <text systemLanguage="ca"><tspan>6 de març</tspan></text>
  <text><tspan>6 March</tspan></text>
</switch>

See SVGAnimatedTransformList and the SVGTransformList it exposes. The list API has a wonderful .consolidate() method, but it is incomplete: there is no method to copy a transform or a list of transforms. Instead of using the API to concatenate two transform lists, it was easier to concatenate text strings:

el2.setAttribute("transform", el1.getAttribute("transform") + " " + el2.getAttribute("transform"));

Symbols

There are more symbols to extract: air bases, harbors, and power plants.

✓ Done Contested city

✓ Done The air base icon

✓ Done The harbor icon

✓ Done The power plant icons have changed from their original form. The Ukrainian version is a solid fill rather than a gradient. The Russian version still has a gradient, but it is not prominent. If I use a solid fill, then they can be a single symbol and the fill can be determined with class="uk" or class="ru".

Hydroelectric plant (not used?)

SVG Translate bogus langtags

Change bogus systemLanguage="zh_HANT" to systemLanguage="zh-HANT". A quick and dirty fix would select all the systemLanguage attributes and change underscores to hyphens. Killing bad langtags is good practice, but it will give horrible user interactions in SVG Translate: users may continually try to translate a phrase that already has a translation. Avoiding that would mean keeping both the bad langtag (to satisfy SVG Translate) and the good langtag (to satisfy SVG). Then updates to the bad langtag would have to be copied to the good langtag. What a mess.
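
A sketch of the quick and dirty repair in JavaScript. The function names are hypothetical. Intl.getCanonicalLocales is a standard function that also normalizes case (zh-HANT becomes zh-Hant), and it may replace deprecated subtags as well, which may or may not be wanted.

```javascript
// Repair one langtag: underscores become hyphens, then Intl fixes the case
// (language lowercase, script Titlecase, region uppercase).
function fixLangtag(tag) {
  const hyphenated = tag.trim().replace(/_/g, "-");
  try {
    return Intl.getCanonicalLocales(hyphenated)[0];
  } catch (e) {
    // Not a well-formed langtag; leave it for a human to inspect.
    return hyphenated;
  }
}

// systemLanguage holds a comma-separated list of langtags.
function fixSystemLanguage(value) {
  return value.split(",").map(fixLangtag).join(",");
}

console.log(fixSystemLanguage("zh_HANT"));
```

This only repairs the file for ordinary SVG user agents; it does nothing about SVG Translate no longer recognizing the repaired clauses.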

SVG Translate seems to be duplicating clauses on subsequent invocations.

Text element within switches with coordinates

I fixed a few switch element bodies that have translated text. Ideally, the switch element's transform property sets the starting text position. The text element and its first tspan element should not have x, y, or transform attributes.

<switch id="switch4565-3-6-9" transform="translate(1354.865,893.27667)" class="place" font-size="5.34px"
      style="font-family:'Liberation Sans', Arial, sans-serif;text-anchor:middle;fill:#0044aa">
    <text systemLanguage="en" id="trsvg2351-1-1-88" x="11.933496" y="1.381068"><tspan id="tspan17037-0-9-2">Vysokopillia</tspan></text>
    <text id="text4206-2-uk-1-5" inkscape:label="text4206-2" systemLanguage="uk" x="12" y="2"><tspan id="tspan16021-7-uk-5-0">Високопілля</tspan></text>
    <text id="text4206-2-7-94" inkscape:label="text4206-2" x="12" y="0"><tspan id="tspan16021-7-6-0">Vysokopillia</tspan></text>
</switch>

The problem may be common enough so I should try to detect it.

  • switch/text[@x] | switch/text[@y] | switch/text[@transform]
  • switch/text[@x or @y or @transform]

Elements with redundant style information

Replace style attribute with equivalent class value.

Perhaps hoist information to the parent switch element.

A better understanding of casual group formatting

The file uses casual groups to impose a common style on items that do not make their own sensible group. For example, a small set of Ukrainian villages may be grouped to impose a Ukrainian fill color on the group. The group of villages does not have a good reason to exist as a group. If the Russians gained control of one of the villages, then the group would need to be pierced to change its rendering.

Similarly, several text elements may be grouped to impose a common font selection, size, or color. That is presentation style rather than semantics, and it should be done with CSS.

Many of the formatting groups have been removed from cities, places, and dates. That puts the city circles at toplevel. The places are one-level down inside a group of places. The dates are one-level down inside a group of dates.

There are elements that should be grouped, and they should be grouped to the point of making a symbol. For example, contested cities are represented as a checkerboard. The checkerboard is four grouped path elements followed by an outside-the-group rectangle. All of those elements are semantically related, and they should be a symbol in the defs section. These groups have been converted to use a g or symbol inside the defs element. That step uncovered some issues. The origin for an SVG 1.1 symbol is always the upper-left corner. Furthermore, Inkscape has trouble cloning something that is inside the defs element.

Some groupings would make sense. The circles used for cities should be grouped with the names of the city. The symbols used for nuclear power plants should be grouped with the name of the power plant. The arrows showing the troop movements should be grouped with their dates. The date label boxes should be grouped with the date text.

SVG does not have the notion of associated labels. Consider a flag note that contains a date. The file does that by drawing a rectangle and then overlaying that rectangle with text. That takes two elements. Logically, the elements should be grouped so they move together. To change the text, one must penetrate the grouping.

Given the restrictions on the symbol origin, a possibility for dates would be

<g transform="translate(...)">
  <use xlink:href="#labelru" x="0" y="0" />
  <switch class="date" fill="#fff">
    <text systemLanguage="en">March 15</text>
    <text>March 15</text>
  </switch>
</g>

So all the group's children are at the origin. A symbol would need a negative offset. Limitations on the WMF renderer prevent using CSS to set the text color.

An alternative is to use a filter on the text. The filter would automatically size for the text, so it could be better than a fixed label size. Inkscape may start making copies....

Does Inkscape's notion of layers allow easy editing? That is, does it avoid the cumbersome ungrouping and regrouping of an ordinary g element?

What does it take to make an Inkscape layer? Presumably toplevel g elements with two inkscape: attributes and an id of the form layern (just like Inkscape identifies all its elements).[29]

<g
  inkscape:label="Layer 1"
  inkscape:groupmode="layer"
  id="layer1" />

To use layers, must all graphics elements be in toplevel layers? If not, what happens to graphics elements that are outside of the layers?

Furthermore, removing the two attributes or the id may not be wise.

Layers and objects can be locked with sodipodi:insensitive="true"; it is the presence of the attribute and not its value that matters. See https://wiki.inkscape.org/wiki/Inkscape-specific_XML_attributes and example at https://gist.github.com/hedefalk/5b428772f7deefc906a194f297371e9e . The latter file suggests the group id does not need a layer name.

See also https://wiki.inkscape.org/wiki/index.php/Inkscape_SVG_vs._plain_SVG .

Inkscape has the notion of symbols and clones, but I'm not sure that it expects to clone objects in the defs section. What sort of access does Inkscape give to the defs section?

Layers would do a better job of enforcing painting order. While the order was consistent, it is now confused. Some arrows are drawn before place names are rendered, and some arrows are drawn after.

Languages: LTR and RTL

The Hebrew and Arabic versions of this file swap the graphics on the map legend. There is also a cosmology illustration that does something similar. What is a good way to handle that problem? Two separate map legends?

Metadata for the file

Consider adding some metadata to the file. There was a recent comment that Inkscape prevented a user from editing someone else's SVG. Could that be a result of CC-BY-ND license? Or describing the license with a similar requirement? How could SVG Translate figure that out? How would it be handled in a production environment?

A CC license should have a URL link to the license.

A CC-BY license should have the attribution names or an attribution URL.

In a derivative work, it is not enough to just give attribution to the creator of the derivative work. The license attribution requirements do not disappear for a derivative work. Many Commons files state incomplete attributions.

There is a question about crediting the source maps. It may be that a link to their pages on Commons is enough. Should the metadata include the license information for the images that the derivative work uses?

<metadata>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:dc="http://purl.org/dc/terms/"
           xmlns:cc="http://creativecommons.org/ns#" >
    <cc:Work rdf:about="">
      <dc:creator rdf:resource="https://commons.wikimedia.org/wiki/User_talk:Viewsridge"/>
      <dc:source>
        <rdf:Bag>
          <rdf:li rdf:resource="https://commons.wikimedia.org/wiki/File:Russo-Ukraine_Conflict_(2014-2021).svg"/>
          <rdf:li rdf:resource="https://commons.wikimedia.org/wiki/File:Ukraine_adm_location_map.svg"/>
        </rdf:Bag>
      </dc:source>
      <dc:publisher rdf:resource="http://commons.wikimedia.org"/>
      <cc:license rdf:resource="https://creativecommons.org/licenses/by-sa/4.0/deed.en"/>
      <cc:attributionName>
        <rdf:Seq>
          <rdf:li rdf:resource="https://commons.wikimedia.org/wiki/User_talk:Viewsridge"/>
          <rdf:li rdf:resource="https://commons.wikimedia.org/wiki/User:Rr016"/>
          <rdf:li rdf:resource="https://commons.wikimedia.org/wiki/User:NordNordWest"/>
        </rdf:Seq>
      </cc:attributionName>
      <cc:attributionUrl rdf:resource="https://commons.wikimedia.org/wiki/File:2022_Russian_invasion_of_Ukraine.svg" />
    </cc:Work>
  </rdf:RDF>
</metadata>

SVG Translate

SVG Translate is an application that helps users translate SVG files with text elements into other languages.

Discuss history, Summer of Code, ... XXX, Jarry1250,[30] WMF Community Tech.[31]

What it does.

Diagrams need to be simple for it to work well. Single line text is best. Leave plenty of room because some languages use more text than others.

See File:BirdBeaksA.svg and {{Other versions/BirdBeaksA}}. Several language versions that just vary the text labels.

Syntactically, the text to translate must be a text element with 0 or more tspan elements. The tspan elements may not have any children. The tool expects only text; it does not expect switch elements containing group (g) elements or other graphics elements.

What it does not do.

It does not handle complex text. It expects text to be lines of simple, unadorned text. Text that tries to emphasize some words with bold or italic styles is not handled. Similarly, it does not handle changing text colors or fonts. It does not handle subscripts or superscripts.

It does not handle adjusting the position of the text or the anchors. That is more of a graphics task than a translation task. SVG Translate is a text rather than a graphics application. In commercial translation settings, translators are not responsible for text positioning.

Vertical text. There are different conventions. In the USA, a book title on the book's spine is rotated +90°. In Europe, the title on the spine is rotated -90°. In China, text is written vertically without rotating the characters; generally, English readers find such text difficult. Bizarre Chinese ambulance.

Numbers, currency, and dates. SVG should have better support. JavaScript has Intl, but Commons prohibits scripts. Translation from English "May 12, 1944" to German "12. Mai 1944". HTML has <time datetime="1944-05-12">May 12, 1944</time>, but it does not imply processing; it is just a machine-readable time.

Exotic CSS can fix some issues, but there must be support.

Good translation targets

There are English SVG files that are already included on many Wikipedias. Running SVG Translate on these files would make the files more accessible.

SVG Translate Issues

clean code.

The resulting file has lots of identifiers.

SVG Translate does not obey translate="no".

The default clause problem

The tool does not work well with language preferences. Adding a German translation to

<text>Hello, world.</text>

produces something similar to

<switch>
  <text systemLanguage="de">Hallo Welt.</text>
  <text>Hello, world.</text>
</switch>

That works OK with WMF tools, but can be bizarre when displayed in a browser. For example, one can set a browser to prefer English but accept German. In that circumstance, the English language text should be displayed, but the SVG above will display German. The SVG agent does not know the default is English. The translation should be

<switch>
  <text systemLanguage="en">Hello, world.</text>
  <text systemLanguage="de">Hallo Welt.</text>
  <text>Hello, world.</text>
</switch>

The fix is to copy the default clause, add systemLanguage="en", and then insert that element into the switch element before the default clause with the DOM method insertBefore(newNode, referenceNode). The DOM method .cloneNode(deep) copies the id attribute, so the identifiers on the copy must be changed or removed.

Descends too deeply

SVG Translate does not recognize planar translations. If it sees a switch element with systemLanguage clauses, then it should not process that subtree.

The underscore issue

test file

SVG Translate uses parochial systemLanguage identifiers instead of IETF langtags. This bug arises as a workaround to langtag processing bugs in librsvg and langtag passing methods in MediaWiki. First, the C-language version of librsvg that WMF uses only matches langtags to the first hyphen. It treats zh-Hans and zh-Hant as equivalent. Second, MediaWiki passes langtags in the LANG environment variable; that variable expects a locale string rather than an IETF langtag. Third, librsvg uses the LANG environment variable as a langtag. SVG Translate exploits those problems by using zh_HANT rather than the correct zh-Hant IETF langtag. It makes the display work on (broken) WMF servers, but the SVG files do not work on other SVG user agents.

The underscore issue is even more twisted. Multiple SVG Translate invocations are adding multiple identical clauses with non-unique identifiers:

<switch id="switch2174" transform="translate(1853.7,532.54)" class="place" font-size="5.34px">
  <text id="text3001-zh-tw" systemLanguage="zh_TW"><tspan id="trsvg142-zh-tw">基夫沙里夫卡</tspan></text>
  <text id="text3001-zh-hant" systemLanguage="zh_HANT"><tspan id="trsvg142-zh-hant">基夫沙里夫卡</tspan></text>
  <text id="text3001-zh-tw" systemLanguage="zh_TW"><tspan id="trsvg142-zh-tw">基夫沙里夫卡</tspan></text>
  <text id="text3001-zh-hant" systemLanguage="zh_HANT"><tspan id="trsvg142-zh-hant">基夫沙里夫卡</tspan></text>
  <text id="text6217" systemLanguage="zh_TW"><tspan id="tspan6215">基夫沙里夫卡</tspan></text>
  <text id="text6221" systemLanguage="zh_HANT"><tspan id="tspan6219">基夫沙里夫卡</tspan></text>
  <text id="text6225" systemLanguage="zh_TW"><tspan id="tspan6223">基夫沙里夫卡</tspan></text>
  <text id="text6229" systemLanguage="zh_HANT"><tspan id="tspan6227">基夫沙里夫卡</tspan></text>
  <text id="text6233" systemLanguage="zh_TW"><tspan id="tspan6231">基夫沙里夫卡</tspan></text>
  <text id="text6237" systemLanguage="zh_HANT"><tspan id="tspan6235">基夫沙里夫卡</tspan></text>
  <text id="text6241" systemLanguage="zh_TW"><tspan id="tspan6239">基夫沙里夫卡</tspan></text>
  <text id="text6245" systemLanguage="zh_HANT"><tspan id="tspan6243">基夫沙里夫卡</tspan></text>
  <text id="text6249" systemLanguage="zh_HANT"><tspan id="tspan6247">基夫沙里夫卡</tspan></text>
     
  <text systemLanguage="en" id="trsvg1842"><tspan id="trsvg1218">Kivsharivka</tspan></text>
  <text id="text3001-it" systemLanguage="it"><tspan id="trsvg142-it">Kovšarovka</tspan></text>
  <text id="text3001-fr" systemLanguage="fr"><tspan id="trsvg142-fr">Kivcharivka</tspan></text>
  <text id="text3001-el" systemLanguage="el"><tspan id="trsvg142-el">Κιβσαρίφσκα</tspan></text>
  <text id="text3001-ru" systemLanguage="ru"><tspan id="trsvg142-ru">Ковшаровка</tspan></text>
  <text id="text3001-uk" systemLanguage="uk"><tspan id="trsvg142-uk">Ківшарівка</tspan></text>
  <text id="text3001-ka" systemLanguage="ka"><tspan id="trsvg142-ka">კივშარივკა</tspan></text>
  <text id="text3001-lt" systemLanguage="lt"><tspan id="trsvg142-lt">Kivšarivka</tspan></text>
  <text id="text3001-ca" systemLanguage="ca"><tspan id="trsvg142-ca">Kivxàrivka</tspan></text>
  <text id="text3001"><tspan id="trsvg142">Kivsharivka</tspan></text>
</switch>

The relevant SVG Translate code:

OK, it looks like a problem with inadvertently distinguishing equivalent langtags. The code mistakenly distinguishes zh_hant, zh_Hant, and zh_HANT.

I believe $language will be lowercase because

                $langCode = str_replace('_', '-', strtolower($lang));

So the code reasonably canonicalizes all langtags to lower case and converts non-standard underscore langtags to hyphen langtags.

Consequently, the code below will work for all-lowercase langtags (such as the usual en or de) that are present in the SVG file, but it will never match a langtag that contains an uppercase character (such as zh-Hant). Furthermore, it will never match the converted, non-standard langtags (such as zh_HANT), even with a case-insensitive match, because the underscore was changed to a hyphen. Also notice that if two or more text elements match, then nothing will be changed, and no error will be logged.

                // Put text tag into document
                $path = 'fallback' === $language ?
                    "svg:text[not(@systemLanguage)]|text[not(@systemLanguage)]" :
                    "svg:text[@systemLanguage='$language']|text[@systemLanguage='$language']";
                $existing = $this->xpath->query($path, $switch);
                if (1 == $existing->length) {
                    // Only one matching text node, replace if different
                    if ($this->nodeToArray($newTextTag) === $this->nodeToArray($existing->item(0))) {
                        continue;
                    }
                    $switch->replaceChild($newTextTag, $existing->item(0));
                } elseif (0 == $existing->length) {
                    // No matching text node for this language, so we'll create one
                    $switch->appendChild($newTextTag);
                }

OK, tried a file with systemLanguage="FR", added a German translation, and the French clause was duplicated. SVG Translate produced:

    <switch transform="translate(20, 60)">
      <title>This title should display.</title>
      <desc>Test that title, desc, and metadata process correctly.</desc>
      <metadata/>
      <text systemLanguage="tlh" id="trsvg13"><tspan id="trsvg2">Klingon</tspan></text>
      <text systemLanguage="FR" id="trsvg14"><tspan id="trsvg3">French</tspan></text>
      <text systemLanguage="en" id="trsvg15"><tspan id="trsvg4">English</tspan></text>
      <text id="trsvg14" systemLanguage="fr"><tspan id="trsvg3">French</tspan></text>
      <text id="trsvg16-de" systemLanguage="de"><tspan id="trsvg5-de">German 2</tspan></text>
      <text id="trsvg16"><tspan id="trsvg5">Default</tspan></text>
    </switch>

The basic issue above is the case-sensitive comparison in text[@systemLanguage='$language']. XPath 1.0 does not have a case-insensitive comparison. A possible fix is to modify the XPath filter to handle both the case and the underscore issues:

text[translate(@systemLanguage, "ABCDEFGHIJKLMNOPQRSTUVWXYZ_", "abcdefghijklmnopqrstuvwxyz-")='$language']
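
To convince myself that the translate() filter behaves as intended, here is the same comparison mimicked in JavaScript. It is a stand-in rather than SVG Translate's code: the function names are hypothetical, and the array of systemLanguage values plays the role of the clauses in one switch element.

```javascript
// Same normalization as the XPath translate() call above:
// uppercase to lowercase, underscore to hyphen.
function normalizeLangtag(tag) {
  return tag.toLowerCase().replace(/_/g, "-");
}

// Hypothetical stand-in for the XPath query:
// indexes of the clauses that match the (lowercase) $language.
function matchingClauses(systemLanguages, language, normalize) {
  const matches = [];
  systemLanguages.forEach((value, index) => {
    const candidate = normalize ? normalizeLangtag(value) : value;
    if (candidate === language) matches.push(index);
  });
  return matches;
}

const clauses = ["tlh", "FR", "en", "zh_HANT"];
console.log(matchingClauses(clauses, "fr", false)); // exact comparison
console.log(matchingClauses(clauses, "fr", true)); // normalized comparison
```

With the exact comparison nothing matches "fr", which is the case where the PHP code appends a duplicate clause. With normalization exactly one clause matches, so the existing clause would be replaced instead.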

There are still some questions about the code. How does systemLanguage="zh_HANT" enter? Are translation selections edited in the UI? And how does the file's systemLanguage="zh-Hant" turn into systemLanguage="zh_HANT"?

Looks like this routine will change zh-hant into zh_HANT:

    /**
     * @param string $langCode
     * @return string
     */
    private static function langCodeToOs(string $langCode): string
    {
        if (false === strpos($langCode, '-')) {
            // No territory specified, so no change to make (fr => fr)
            return $langCode;
        }
        [ $prefix, $suffix ] = explode('-', $langCode, 2);
        return $prefix.'_'.strtoupper($suffix);
    }
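
A JavaScript port of the routine makes the mismatch easy to try out. This is my own port for experimenting; it assumes PHP's explode with a limit of 2 splits at the first hyphen only, and toLangCode mirrors the str_replace/strtolower line quoted earlier.

```javascript
// JavaScript port of langCodeToOs, for experimenting outside PHP.
function langCodeToOs(langCode) {
  const cut = langCode.indexOf("-");
  if (cut === -1) return langCode; // no hyphen: fr => fr
  // explode('-', $langCode, 2) keeps everything after the first hyphen together.
  return langCode.slice(0, cut) + "_" + langCode.slice(cut + 1).toUpperCase();
}

// The canonicalization quoted earlier: lowercase, underscores to hyphens.
function toLangCode(lang) {
  return lang.toLowerCase().replace(/_/g, "-");
}

const written = langCodeToOs("zh-hant"); // what ends up in systemLanguage
const lookedUp = toLangCode(written); // what the XPath comparison uses
console.log(written, lookedUp, written === lookedUp);
```

The value written to the file and the value used for the lookup differ, so a clause written this way cannot be found again by the exact XPath comparison.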

Hoisting style attributes

At https://github.com/wikimedia/svgtranslate/blob/master/src/Model/Svg/SvgFile.php I hope this does not do what I think it does:

            // Non-translatable style elements on texts get lost, so bump up to switch
            if ($text->hasAttribute('style')) {
                $style = $text->getAttribute('style');
                $text->parentNode->setAttribute('style', $style);
            }

Splitting langtag issue

Possible id duplication:

                foreach ($realLangs as $realLang) {
                    // Although the SVG spec supports multi-language text tags (e.g. "en,fr,de")
                    // these are a really poor idea since (a) they are confusing to read and (b) the
                    // desired translations could diverge at any point. So get rid.
                    $singleLanguageNode = $sibling->cloneNode(true);
                    $singleLanguageNode->setAttribute('systemLanguage', $realLang);

                    // @todo: Should also go into tspans and change their ids, too.
                    // $prefix = implode( '-', explode( '-', $singleLanguageNode->getAttribute( 'id' ), -1 ) );
                    // $singleLanguageNode->setAttribute( 'id', "$prefix-$realLang" );

                    // Add in new element
                    $switch->appendChild($singleLanguageNode);
                }
                $switch->removeChild($sibling);
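
The duplication is easy to reproduce. The sketch below imitates the loop with xml.dom.minidom; it is not the SVG Translate code, and the names split_multilang and duplicate_ids plus the sample SVG are hypothetical. Because cloneNode(true) is a deep copy and the id rewrite is commented out, every clone keeps the id of the original text and of its tspans.

```python
from collections import Counter
from xml.dom import minidom


def split_multilang(switch):
    """Replace each multi-language child with one deep clone per language."""
    for sibling in list(switch.childNodes):
        if sibling.nodeType != sibling.ELEMENT_NODE:
            continue
        langs = sibling.getAttribute("systemLanguage").split(",")
        if len(langs) < 2:
            continue
        for lang in langs:
            clone = sibling.cloneNode(True)  # deep copy keeps every id
            clone.setAttribute("systemLanguage", lang.strip())
            switch.appendChild(clone)
        switch.removeChild(sibling)


def duplicate_ids(document):
    """Return the id values that occur more than once in the document."""
    counts = Counter(
        el.getAttribute("id")
        for el in document.getElementsByTagName("*")
        if el.hasAttribute("id")
    )
    return sorted(value for value, n in counts.items() if n > 1)


doc = minidom.parseString(
    "<svg><switch>"
    '<text id="t1" systemLanguage="en,fr"><tspan id="t1s">Hi</tspan></text>'
    "</switch></svg>"
)
split_multilang(doc.getElementsByTagName("switch")[0])
dupes = duplicate_ids(doc)  # both the text id and the tspan id repeat
```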

Hoisting attributes

SVG Translate produces verbose output. One reason is that it copies all the attributes on the original text element to the switch element clauses. Consequently, we see something like

<switch>
  <text x="100" y="200" font-family="Arial" font-size="10" systemLanguage="en"><tspan>Hello, world.</tspan></text>
  <text x="100" y="200" font-family="Arial" font-size="10" systemLanguage="de"><tspan>Hallo Welt.</tspan></text>
  <text x="100" y="200" font-family="Arial" font-size="10"><tspan>Hello, world.</tspan></text>
</switch>

A more concise version would be

<switch transform="translate(100, 200)" font-family="Arial" font-size="10">
  <text systemLanguage="en"><tspan>Hello, world.</tspan></text>
  <text systemLanguage="de"><tspan>Hallo Welt.</tspan></text>
  <text><tspan>Hello, world.</tspan></text>
</switch>

There can be a list of attributes to promote, for example font-family, font-size, font-weight, and font-style. If a switch element has only text elements and each of those elements has the same value for an attribute z, then delete z from each child and move it to the switch element. The moved attribute may override a value already on the switch element.
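
A minimal sketch of that promotion rule, assuming an un-namespaced fragment (in a real SVG file the tags would carry the SVG namespace). The names PROMOTABLE and hoist_common_attributes are hypothetical, and the sketch ignores class, style, and CSS entirely. Position attributes such as x and y are deliberately left alone.

```python
import xml.etree.ElementTree as ET

# Attributes that may be promoted (the list suggested in the text).
PROMOTABLE = ("font-family", "font-size", "font-weight", "font-style")


def hoist_common_attributes(switch, names=PROMOTABLE):
    """Move attributes shared by all text children up to the switch."""
    children = list(switch)
    if not children or any(child.tag != "text" for child in children):
        return  # only handle a switch that holds nothing but text elements
    for name in names:
        values = {child.get(name) for child in children}
        if len(values) == 1 and None not in values:
            # Every child has the attribute with the same value.
            switch.set(name, values.pop())  # may override the switch's value
            for child in children:
                del child.attrib[name]


switch = ET.fromstring(
    "<switch>"
    '<text x="100" y="200" font-family="Arial" font-size="10" systemLanguage="en">Hello, world.</text>'
    '<text x="100" y="200" font-family="Arial" font-size="10" systemLanguage="de">Hallo Welt.</text>'
    '<text x="100" y="200" font-family="Arial" font-size="10">Hello, world.</text>'
    "</switch>"
)
hoist_common_attributes(switch)
```

After the call, the switch carries font-family and font-size, and each text keeps only x, y, and (where present) systemLanguage.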

Class and Style confuse the issue

The class and style attributes complicate the replacement. The simple method would insist those attributes are not present on any element. There may still be a CSS selector that matches the switch or text elements. The priority of CSS rules is higher than attributes, so the only real trouble is the class attribute. Leave the class attribute on any element.

Note: SVG 1.1 does not require CSS or style attribute support.[32]

There are many ways to specify presentation properties, and those specifications may conflict. The conflicts are resolved by assigning priorities to the property specifications:[33]

  • Attributes (lowest priority)
  • CSS selectors have calculated priorities based on specificity and order
  • Inline styles (highest priority; may be overridden by CSS declarations marked !important)

The style priority makes hoisting difficult. It also means that inline styles are not (usually) overridden with CSS patterns.

Mozilla https://developer.mozilla.org/en-US/docs/Web/CSS/Specificity says

Your global CSS file that sets visual aspects of your site globally may be overwritten by inline styles defined directly on individual elements. Both inline styles and !important are considered bad practice, but sometimes you need the latter to override the former.

Inkscape heavily uses inline styles. In a way, that makes interpretation easier: inline styles have the highest precedence, so they are the least confusing way to apply style information and are unlikely to be overridden by other information. At the same time, it also means that converting inline style information to attributes may have unexpected results if a CSS selector applies contradicting information.

Prefer class

It makes sense to prefer class (or other selectors) over explicit style attributes. For example, say a class selector sets the font characteristics to certain values and the style attribute sets the same values. A class selector may apply to more elements, so it should be favored.

A quick and dirty way to play this game is to remove the style attribute and then call Window.getComputedStyle(). Any property in the style attribute whose computed value is unchanged without the attribute is redundant and may be removed. Coding can be a bit tricky.

Being more direct is also difficult. One can access the stylesheets and the style attribute, but CSSRule (media rules make a multiverse), CSSStyleRule (does not parse selectors), and CSSStyleDeclaration offer no further mechanism. CSSStyleRule provides the selectorText, but the interface does not provide a priority list of which rules apply to an element. Parsing and interpreting selectors is a difficult task.

There is querySelector, so the inverted test may be done: ask which elements a selector matches rather than which rules apply to an element. Should check how well that method works. The method does not return the priority.

Pseudo selectors may be difficult to get right. For example, :active suggests the need to examine all possibilities.

Animation may also cause trouble.

Transformation rewrites

The transform rewrite is more complicated. A potential method chooses a suitable translation, appends it to the switch element's transform, and then adjusts the coordinates of all the children. For the text element, if the coordinate adjusts to zero, then remove the attribute. Do not remove the zero attributes of tspan elements because they start new text chunks. A transform attribute on the text or tspan elements would cause a lot of trouble. So hoist the transform attributes first and give up if they do not hoist.
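
A rough sketch of that method under several assumptions: the fragment is un-namespaced, the "suitable translation" is simply the x and y of the first text element, each coordinate is a single plain number (no lists, no units), and the sketch gives up as soon as any descendant has a transform attribute instead of trying to hoist it. The names parse_number, fmt, and hoist_position are hypothetical.

```python
import math
import xml.etree.ElementTree as ET


def parse_number(text):
    """Parse one plain number; reject lists, units, and non-finite values."""
    value = float(text)  # ValueError for "10 20" or "5mm"
    if not math.isfinite(value):
        raise ValueError(text)
    return value


def fmt(value):
    """Format without a trailing .0 for whole numbers."""
    return str(int(value)) if value == int(value) else repr(value)


def hoist_position(switch):
    """Move a common offset into the switch transform. True on success."""
    texts = list(switch)
    if not texts or any(t.tag != "text" for t in texts):
        return False
    descendants = [el for el in switch.iter() if el is not switch]
    if any("transform" in el.attrib for el in descendants):
        return False  # give up rather than combine transforms
    try:
        dx = parse_number(texts[0].get("x", "0"))
        dy = parse_number(texts[0].get("y", "0"))
        updates = []
        for el in descendants:
            for axis, offset in (("x", dx), ("y", dy)):
                # A text without x or y sits at 0; a tspan without one flows on.
                if el.tag == "text" or axis in el.attrib:
                    new = parse_number(el.get(axis, "0")) - offset
                    updates.append((el, axis, new))
    except ValueError:
        return False  # nothing has been modified yet
    if dx == 0 and dy == 0:
        return False  # nothing to hoist
    for el, axis, value in updates:
        if value == 0 and el.tag == "text":
            el.attrib.pop(axis, None)  # zero is the default for text
        else:
            el.set(axis, fmt(value))   # keep zeros on tspan elements
    translate = f"translate({fmt(dx)}, {fmt(dy)})"
    existing = switch.get("transform")
    switch.set("transform", f"{existing} {translate}" if existing else translate)
    return True
```

The translation is appended on the right of any existing transform, so it is applied to the children's coordinates first and the rendered positions stay the same.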

Is this step a good idea? If the file is localized, all this information would be moved back into the text element. In addition, the styling may be accomplished with a class attribute and CSS, so it is not a heavy penalty for each clause. Moving styling information into a text element may be the better goal. The issue of hoisting position information may still be reasonable.

WMF warts

SVG and Inkscape

Inkscape is an editor that will preserve a lot of SVG because Inkscape uses SVG as its internal representation. Unlike other graphics editors, Inkscape does not have a different native format.

Inkscape produces bizarre numbers: single-precision values formatted as doubles, and metric conversions.
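
The single-precision effect can be reproduced in a few lines. This is only an illustration of the arithmetic, on the assumption that the odd digits come from a 32-bit float being printed with double precision; it says nothing about where in Inkscape that would happen. The helper name as_float32 is hypothetical.

```python
import struct


def as_float32(value):
    """Round a Python float (a double) to the nearest 32-bit float."""
    return struct.unpack("<f", struct.pack("<f", value))[0]


exact = as_float32(0.5)    # 0.5 is exactly representable, so it is unchanged
rounded = as_float32(0.1)  # 0.1 is not, so extra digits appear when printed
print(repr(exact), repr(rounded))
```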

Inkscape uses concrete bounding boxes. Test the following scenario. An SVG image has 4 circles, and each circle points to a linearGradient element. The linear gradient uses the default gradientUnits="objectBoundingBox". I believe Inkscape will clone 4 linearGradient elements, and those elements will have gradientUnits="userSpaceOnUse". Test whether moving an object changes the coordinates or creates a new linear gradient.
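
A small generator for that test file, plus a helper to inspect the result after Inkscape has saved it. Everything here is a sketch: it assumes the four circles share one gradient, and the names gradient_test_svg and gradient_units, the id "grad", the sizes, and the file name are all made up. The helper does not follow xlink:href chains between gradients.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def gradient_test_svg():
    """Return an SVG with 4 circles that share one linearGradient."""
    circles = "\n".join(
        f'  <circle cx="{40 + 60 * i}" cy="40" r="25" fill="url(#grad)"/>'
        for i in range(4)
    )
    return (
        f'<svg xmlns="{SVG_NS}" width="260" height="80" viewBox="0 0 260 80">\n'
        "  <defs>\n"
        # No gradientUnits attribute, so the default objectBoundingBox applies.
        '    <linearGradient id="grad">\n'
        '      <stop offset="0" stop-color="white"/>\n'
        '      <stop offset="1" stop-color="black"/>\n'
        "    </linearGradient>\n"
        "  </defs>\n"
        f"{circles}\n"
        "</svg>\n"
    )


def gradient_units(svg_text):
    """List the effective gradientUnits of every linearGradient element."""
    root = ET.fromstring(svg_text)
    return [
        gradient.get("gradientUnits", "objectBoundingBox")
        for gradient in root.iter(f"{{{SVG_NS}}}linearGradient")
    ]


if __name__ == "__main__":
    # Hypothetical file name; open it in Inkscape, move a circle, and save.
    with open("gradient-test.svg", "w", encoding="utf-8") as handle:
        handle.write(gradient_test_svg())
```

Running gradient_units on the file before and after an Inkscape edit shows how many gradients exist and which units they use.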

Accuracy

Some random issues about technical accuracy.

Motors

I like this diagram of a 3-phase induction motor. (Compare File:Asynchronous Motor.svg.)

Motor winding

3-phase 4-pole motor winding with 24 slots
Terminal  Phase  Slot  Slot  Connect to
1         1      1     6     12
6         2      3     8     2
8         3      5     10    4
4         1      7     12    6
9         2      9     14    20
5         3      11    16    22
7         1      13    18    24
12        2      15    20    14
2         3      17    22    16
10        1      19    24    18
3         2      21    2     8
11        3      23    4     10
  • also concentric windings...

Wiring diagram on Commons?

NEMA and IEC

General

I have trouble with this diagram of a shaded pole motor. The vertical flux paths are too thick. Compare to an actual design. Also, the winding flux path should be of similar width.

Microphones and flux path

A ribbon microphone with no flux return path.

Vacuum pumps and details

Rotary vane pump.svg

Stator shape, reflood, exhaust valve, oil seals, oil pump, foam, vanes through the axle.

Clearances. Bearings and something like an Oldham coupling.

Biology

The comment ("This picture is obsolete. the pluripotent stemcell of the blood is giving origin to a lymphoid and a myeloid cell line.") at

and

Many have worked on similar diagrams, so sort out the effort.

  • File:Illu blood cell lineage.jpg 2006-05-17 on Commons. 77 kB. NIH (when?).
  • File:Hematopoiesis (human) diagram.png 2006-08-11 1.18 MB. A. Rad. Has a dense text block. There is also an extensive description of the cell images. Its license is incompatible with Commons: "GFDL-self. This image is released under the GFDL-self license and is considered freely distributable. This image or any reproductions/customizations thereof (or any reproductions/customizations of its reproductions/customizations, and so forth) may NOT be sold without my explicit consent." The separate licensing section has just {{self|GFDL}}, so the licensing terms are inconsistent.

The license issue is troubling. It affects all derivatives. It also has further issues because many files have been extracted from File:Hematopoiesis (human) diagram en.svg. See, for example, File:Monoblast.svg.

Comments on individual versions.

Look at Wikidata items. Examine instance and subclass relations. (develops from (P3094), follows (P155), followed by (P156))

Looking for files extracted from File:Hematopoiesis (human) diagram en.svg. Keeping track of derivatives is nice....

Copyright

References

  1. Caching, Mozilla.org
  2. https://www.alibabacloud.com/blog/what-is-domain-resolution-and-how-it-works_597610
  3. https://serverfault.com/questions/347689/how-to-share-domain-name-with-multiple-servers
  4. See https://aeronav.faa.gov/user_guide/20211202/cug-complete.pdf at page 43. In those images, the NDB symbol is a dot, ring, and only 5 dotted rings.
  5. SVG 2.0 Chapter 5 Document Structure § 5.8
  6. Cory Doctorow, A Bug in early Creative Commons licenses has enabled a new breed of superpredator
  7. Village Pump: Cory Doctorow post on "copyleft trolls" mentions Commons
  8. Village pump:cc-by < 4.0 not ok any more
  9. e.g., https://id.loc.gov/vocabulary/relators.html
  10. Adobe, XMP Specification Part 1 at Table 4.
  11. Nevile, Liddy; Lissonnet, Sophie (January 2004) The Case for a Person/Agent Dublin Core Metadata Element Set
  12. https://www.compart.com/en/unicode/charsets/Adobe-Symbol-Encoding
  13. https://www.compart.com/en/unicode/charsets/x-Adobe-Zapf-Dingbats-Encoding
  14. https://fonts2u.com/sonata.font https://adobe-type-tools.github.io/font-tech-notes/pdfs/5045.Sonata.pdf
  15. https://stackoverflow.com/questions/36486716/the-14-standard-pdf-fonts-and-character-encoding
  16. https://www.compart.com/en/unicode/charsets/Adobe-Standard-Encoding
  17. Mozilla (2021) SVG Fonts
  18. https://www.enzolifesciences.com/science-center/technotes/2019/december/what-are-the-differences-between-northern-southern-and-western-blotting?/
  19. https://thumbor.readthedocs.io/en/latest/
  20. “ꝺ” U+A77A Latin Small Letter Insular D Unicode Character
  21. https://linux.die.net/man/1/xsltproc
  22. https://www.wikidata.org/w/index.php?title=Special:Search&limit=100&offset=0&profile=default&search=Pershotravneve&ns0=1&ns120=1
  23. https://www.google.com/maps/place/Radekhiv,+Lviv+Oblast,+Ukraine/@50.2811748,24.6012475,13z
  24. https://developer.mozilla.org/en-US/docs/Web/SVG/Element/circle
  25. See File:SVG CSS Test.svg for a test of .cls2.cls3 selection.
  26. https://tc39.es/ecma402/#sec-intl-datetimeformat-constructor
  27. Write the Date in French, wikihow.com. The first is pronounced "premier".
  28. Italian Ordinal Numbers and Numerical Rank, thoughtco.com. "il primo".
  29. Inkscape Tutorial. Chapter 6. SVG File Format. https://inkscapetutorial.org/svg-file-format.html
  30. https://www.mediawiki.org/wiki/User:Jarry1250/GSoC_2012_roadmap
  31. Meta:Community Tech/SVG translation
  32. https://www.w3.org/TR/SVG/styling.html
  33. https://drafts.csswg.org/selectors/#specificity-rules