File talk:SVG Test TextAlign.svg

correct rendering

@Glrx: I am currently setting up an SVG test suite (see User:JoKalliauer/SVG_test_suites#test_files_by_User:Glrx). Please add files that you think might be important to check.

Some elements are not clear to me:

fraction slash

division slash: 123 ⁴⁵⁶∕₇₈₉
fraction slash: 123 456⁄789

Is the fraction slash an optional text enhancement, or is the behaviour of this Unicode character defined in the SVG 1.1 DTD?

@JoKalliauer:

I am impressed by your sophisticated questions. I'm not sure that I can answer them well.

The SVG 1.1-DTD does not define how characters are treated or painted. The SVG 1.1 DTD just tries to specify the XML syntax of SVG files: which elements are allowed, which attributes they may have, which elements may be descendants of other elements, and what values are legal for some attributes. DTDs were developed before XML namespaces, so DTDs are not well suited for specifying SVG syntax. The SVG 1.1 DTD plays some tricks to make it look like xlink:href refers to a namespace, but it really does not.

The SVG 1.1 specification does not require Unicode or any other character set. It makes some general pronouncements about how character positions should be computed, but even there the specification is sometimes ambiguous (e.g., the scope of a vertical offset) or misguided (e.g., putting starting and terminal spaces inside the margins). In particular, the SVG 1.1 specification does not impose any requirements on typesetting fractions.

Unicode has a notion of composed and decomposed characters. There is the Unicode character <a href="https://www.fileformat.info/info/unicode/char/00bd/index.htm">VULGAR FRACTION ONE HALF</a>: ½. That is a composed character; it can be broken down into a sequence of other Unicode characters. Unicode defines some string operations, and one set of operations produces normalized forms. See <a href="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String/normalize">String.prototype.normalize()</a>. The Unicode Character Database gives the decomposition of ½ as <fraction> 0031 2044 0032. Simply put, that is the character "1", the character fraction slash, and the character "2". The <fraction> tag marks it as a compatibility decomposition, so I get that sequence from the Normalization Form KD (NFKD) of ½; the canonical Normalization Form D (NFD) leaves ½ unchanged.
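The decomposition can be inspected with Python's standard `unicodedata` module (Python is used here only as a convenient stand-in for JavaScript's `String.prototype.normalize()`, which accepts the same form names). A minimal sketch; note that the `<fraction>` mapping is a compatibility decomposition, so it shows up under NFKD but not under NFD:

```python
import unicodedata

half = "\u00bd"  # VULGAR FRACTION ONE HALF, "½"

# The Unicode Character Database records a compatibility decomposition,
# marked by the <fraction> tag.
print(unicodedata.decomposition(half))               # <fraction> 0031 2044 0032

# NFD applies only canonical decompositions, so ½ is left unchanged.
print(unicodedata.normalize("NFD", half) == half)    # True

# NFKD also applies compatibility decompositions: "1", FRACTION SLASH, "2".
nfkd = unicodedata.normalize("NFKD", half)
print([f"U+{ord(c):04X}" for c in nfkd])             # ['U+0031', 'U+2044', 'U+0032']
```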

Unicode wants equal treatment for code sequences that represent equivalent character strings. The results should be the same if I enter the composed character "½" (&frac12;) or its decomposition "1⁄2" (1&frasl;2). The same should be true for the composed character "ü" (&uuml;) or its decomposition "ü" (u&#x308;). It takes a lot of work, but that is usually what happens.
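A sketch of comparing strings by meaning rather than by code point, assuming a program uses normalization to decide equivalence. The helper name `same_text` is hypothetical. It also shows the difference between canonical equivalence (ü, which NFC already unifies) and compatibility equivalence (½, which needs NFKC):

```python
import unicodedata

def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same form."""
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)

composed_u = "\u00fc"      # ü as a single code point
decomposed_u = "u\u0308"   # u followed by COMBINING DIAERESIS
half = "\u00bd"            # ½
slashed_half = "1\u20442"  # 1, FRACTION SLASH, 2

print(composed_u == decomposed_u)               # False: different code points
print(same_text(composed_u, decomposed_u))      # True: canonically equivalent
print(same_text(half, slashed_half))            # False: NFC keeps ½ as is
print(same_text(half, slashed_half, "NFKC"))    # True: compatibility equivalent
```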

Generally, the operating system or a code library will be tasked with rendering a Unicode string as an image on the screen. It must do BIDI reordering, and it may do additional font operations such as ligatures (e.g., rendering the characters "fl" as the ligature "ﬂ"). The conversion of Unicode characters to their corresponding glyphs can be complex. The Unicode specification for a script may involve fewer than 256 characters, but those characters may represent thousands of possible glyphs. Devanagari and Siddham scripts are examples. Font specifications such as OpenType have sophisticated mechanisms to handle such glyph mapping.

Even though the specification may be clear, implementations may have issues. The Unicode character <a href="https://www.fileformat.info/info/unicode/char/215f/index.htm">FRACTION NUMERATOR ONE</a> (⅟) should work the same as its 1&frasl; decomposition, but it usually does not: ⅟17 and 1⁄17 are different on my browsers.
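As a quick check (again with Python's `unicodedata`, used only for illustration), ⅟17 and 1⁄17 become the same text after compatibility normalization. That suggests the difference seen in browsers comes from fonts or text shaping rather than from the character data itself:

```python
import unicodedata

one_over = "\u215f"  # "⅟"

print(unicodedata.name(one_over))           # FRACTION NUMERATOR ONE
print(unicodedata.decomposition(one_over))  # <fraction> 0031 2044

# Under NFKC, "⅟17" folds to "1", FRACTION SLASH, "17".
folded = unicodedata.normalize("NFKC", one_over + "17")
print(folded == "1\u204417")                # True
```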

Glrx (talk) 19:50, 14 March 2021 (UTC)[reply]

نص المرساة 15 kV

Should the "kV" be at the beginning because of direction="rtl", as Wikimedia's librsvg currently renders it? (The Arabic words mean "anchor text".)

@JoKalliauer:

The issue is the Unicode Bidirectional Algorithm.

That algorithm determines how a character sequence is reordered. That reordering depends on the current direction and the classification of the individual characters.

The character sequence is

(strong RTL Arabic characters)
(neutral space)
(strong RTL Arabic characters)
(neutral space)
(weak LTR numbers)
(neutral space)
(strong LTR Latin characters).
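The classification above corresponds to the bidi classes in the Unicode Character Database, which Python exposes as `unicodedata.bidirectional()`. A small sketch (the helper `bidi_runs` is hypothetical) that groups the test string into runs; the class codes are AL (Arabic letter, strong RTL), WS (whitespace, neutral), EN (European number, weak), and L (strong LTR):

```python
import unicodedata
from itertools import groupby

def bidi_runs(text: str) -> list[tuple[str, str]]:
    """Split text into maximal runs of characters that share one bidi class."""
    return [(cls, "".join(run)) for cls, run in groupby(text, key=unicodedata.bidirectional)]

# "نص المرساة 15 kV", written with escapes so the source order is unambiguous.
sample = "\u0646\u0635 \u0627\u0644\u0645\u0631\u0633\u0627\u0629 15 kV"
for cls, run in bidi_runs(sample):
    print(f"{cls:>2}  {run!r}")
```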

Here are the characters in LTR order with no Unicode reordering:

bdo: نص المرساة 15 kV

Generally, characters will be ordered in the specified direction, but some strong and weak characters will cause a temporary direction change. That temporary direction change will continue until another character class is strong enough to change it.

ltr: نص المرساة 15 kV

The above span starts LTR, but it immediately runs into some strong RTL Arabic characters. Those characters are ordered to display RTL, but their position is determined by the LTR start. The neutral space is not strong enough to change the direction, so the space is positioned RTL. The second group of Arabic characters is positioned RTL. The second space is neutral, so it is also positioned RTL. The weak LTR numbers are ordered LTR, but they are not strong enough to overpower the RTL positioning, so the numbers are positioned to the left of the Arabic characters. Then there is a neutral space, but it is followed by some strong LTR characters. That terminates the RTL layout caused by the Arabic characters. The space and the Latin characters are positioned to the right of the Arabic characters and given an LTR order.

rtl: نص المرساة 15 kV

The above span starts RTL. The Arabic characters and neutral spaces are compatible with that direction. The weak numbers are ordered LTR, but they are positioned to the left of the Arabic characters. The neutral space is positioned to the left of the numbers. The strong LTR Latin characters are ordered LTR but positioned to the left of the space.

Here are the same strings but with some weak LTR numbers (" 123") appended. The weak numbers associate with the strong Latin "kV", so they are positioned to the right of the "kV" in both LTR and RTL directions.

bdo: نص المرساة 15 kV 123

ltr: نص المرساة 15 kV 123

rtl: نص المرساة 15 kV 123

There are other complications, but that is the basic idea.
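The walkthrough can be checked against a heavily simplified sketch of the Unicode Bidirectional Algorithm (UAX #9). It is an illustration under stated assumptions, not a conforming implementation: one line, no explicit embeddings, overrides, or isolates, no punctuation rules (W4 to W6), and no trailing-whitespace rule (L1). The function name `bidi_visual` is hypothetical. For the test strings it reproduces the ltr and rtl orderings described above, including the appended " 123":

```python
import unicodedata

STRONG_OR_NUMBER = ("L", "R", "EN", "AN")

def bidi_visual(text: str, rtl: bool) -> str:
    """Return text in visual (left-to-right display) order.

    Simplified UAX #9 sketch: applies only rules W2, W3, W7, N1, N2, I1, I2, L2.
    """
    if not text:
        return text
    sos = "R" if rtl else "L"  # start/end-of-sequence type = paragraph direction
    types = [unicodedata.bidirectional(c) for c in text]

    # W2 and W7: a European number looks back to the nearest strong type.
    last_strong = sos
    for i, t in enumerate(types):
        if t in ("L", "R", "AL"):
            last_strong = t
        elif t == "EN" and last_strong == "AL":
            types[i] = "AN"  # W2: digits after Arabic letters become Arabic numbers
        elif t == "EN" and last_strong == "L":
            types[i] = "L"   # W7: digits after L (or an LTR start) become L
    types = ["R" if t == "AL" else t for t in types]  # W3: AL behaves as R

    # N1/N2: a run of neutrals takes the direction of the text on both sides
    # if they agree (numbers count as R); otherwise the paragraph direction.
    def side(t: str) -> str:
        return "L" if t == "L" else "R"

    n = len(types)
    i = 0
    while i < n:
        if types[i] in STRONG_OR_NUMBER:
            i += 1
            continue
        j = i
        while j < n and types[j] not in STRONG_OR_NUMBER:
            j += 1
        before = side(types[i - 1]) if i > 0 else sos
        after = side(types[j]) if j < n else sos
        types[i:j] = [before if before == after else sos] * (j - i)
        i = j

    # I1/I2: implicit embedding levels relative to the paragraph level.
    if rtl:
        levels = [1 if t == "R" else 2 for t in types]
    else:
        levels = [0 if t == "L" else 1 if t == "R" else 2 for t in types]

    # L2: from the highest level down to the lowest odd level, reverse
    # every maximal run of characters at that level or higher.
    chars = list(text)
    lowest_odd = min(levels) | 1
    for level in range(max(levels), lowest_odd - 1, -1):
        i = 0
        while i < n:
            if levels[i] < level:
                i += 1
                continue
            j = i
            while j < n and levels[j] >= level:
                j += 1
            chars[i:j] = chars[i:j][::-1]
            levels[i:j] = levels[i:j][::-1]
            i = j
    return "".join(chars)

# Usage: "نص المرساة" plus the two suffixes. The result is already in display
# order, so view it somewhere that does not apply bidi reordering again.
arabic = "\u0646\u0635 \u0627\u0644\u0645\u0631\u0633\u0627\u0629"
for suffix in (" 15 kV", " 15 kV 123"):
    for rtl in (False, True):
        print("rtl" if rtl else "ltr", ascii(bidi_visual(arabic + suffix, rtl)))
```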

Glrx (talk) 18:06, 14 March 2021 (UTC)[reply]

small caps

true vs. scaled small caps

Is font-variant="small-caps" always rendered with scaled capitals, never with true small caps?

— Johannes Kalliauer - Talk | Contributions 12:49, 14 March 2021 (UTC)[reply]

@JoKalliauer:

Yes. font-variant="small-caps" means the lowercase letters appear as uppercase letters but at a smaller size than actual uppercase characters. No, it does not necessarily mean the small caps are linearly scaled copies of the uppercase letters. A font could have independent glyphs. The small-caps variant also exists in HTML: Johannes Kalliauer. Glrx (talk) 16:47, 14 March 2021 (UTC)[reply]
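For contrast, a renderer without true small-cap glyphs (for example from the OpenType `smcp` feature) can only fake them by uppercasing lowercase letters and shrinking them. A minimal sketch that emits SVG `tspan` markup this way; the function name `fake_small_caps` and the 70% size are assumptions, not values taken from any specification:

```python
from itertools import groupby
from xml.sax.saxutils import escape

def fake_small_caps(text: str, percent: int = 70) -> str:
    """Imitate small caps: lowercase runs become uppercase at a reduced font-size."""
    parts = []
    for is_lower, run in groupby(text, key=str.islower):
        chunk = "".join(run)
        if is_lower:
            parts.append(f'<tspan font-size="{percent}%">{escape(chunk.upper())}</tspan>')
        else:
            parts.append(escape(chunk))  # capitals, spaces, digits stay at full size
    return "".join(parts)

print(fake_small_caps("Johannes Kalliauer"))
```

Scaled capitals made this way are thinner and lighter than the real capitals beside them, which is why fonts may ship separately designed small-cap glyphs.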
