line-breaks tag description

Whitespace that forces text layout to continue at the start of the next line.

The characters that mark a line break vary between families of computer systems. Windows systems typically use the carriage return + line feed pair (CR + LF, bytes 0x0D 0x0A in ASCII) to indicate a line break in text. Unix-based systems typically use LF (0x0A) alone.
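
Because the convention differs by platform, text that moves between systems is often normalized to one form when it is read. A minimal sketch in C, assuming NUL-terminated byte strings in an ASCII-compatible encoding; `normalize_newlines` is a hypothetical helper name, not a standard library function:

```c
#include <stddef.h>

/* Rewrite a NUL-terminated string in place so that every line break
 * uses the Unix convention: CR LF (Windows) and a lone CR both become
 * a single LF. Returns the new length of the string. */
size_t normalize_newlines(char *s)
{
    char *src = s;  /* read cursor */
    char *dst = s;  /* write cursor; never runs ahead of src */

    while (*src != '\0') {
        if (*src == '\r') {
            *dst++ = '\n';      /* CR becomes LF */
            if (src[1] == '\n')
                src++;          /* swallow the LF of a CR LF pair */
            src++;
        } else {
            *dst++ = *src++;    /* copy every other byte unchanged */
        }
    }
    *dst = '\0';
    return (size_t)(dst - s);
}
```

Rewriting in place is safe here because the output is never longer than the input.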

In many programming languages, the escape sequences \n, \r, or \r\n are used to put line breaks into a string, e.g.:

variable = "Line 1\nLine2"
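
Code that reads text of unknown origin usually has to accept all three forms. A minimal C sketch, assuming ASCII-compatible byte strings; the helper `count_line_breaks` is hypothetical:

```c
#include <stddef.h>

/* Count line breaks in a NUL-terminated string, treating "\r\n",
 * a lone "\r", and a lone "\n" each as a single break. */
size_t count_line_breaks(const char *s)
{
    size_t breaks = 0;
    for (; *s != '\0'; s++) {
        if (*s == '\r') {
            breaks++;
            if (s[1] == '\n')
                s++;            /* CR LF is one break, not two */
        } else if (*s == '\n') {
            breaks++;
        }
    }
    return breaks;
}
```

For the string above, `count_line_breaks("Line 1\nLine2")` returns 1. Note that LF followed by CR counts as two breaks, since only CR LF is treated as a pair.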

Unicode goes further and assigns every character a line breaking class (Unicode Standard Annex #14). The Mandatory Break class (BK, which includes the paragraph separator PS), Line Feed (LF), and Next Line (NL, the NEL character) all cause a line break after the character; Carriage Return (CR) does too, except between CR and LF. Combining Marks (CM), on the other hand, prohibit a break between themselves and the preceding character, and the Word Joiner (WJ) and the Non-breaking Space (class GL, "glue") prohibit a line break on either side. And let's not forget the Zero Width Space (ZW) and the good ol' fashioned Space (SP), which are just dying to be broken by a line break.
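
The classification and the mandatory-break rule for the first group can be sketched in C. This is a simplified illustration, not a conforming implementation: the enum names, `lb_classify`, and `lb_mandatory_break` are hypothetical, and the classifier knows only a handful of code points (a real implementation would load the full LineBreak.txt file from the Unicode Character Database):

```c
#include <stdint.h>
#include <stdbool.h>

/* A small subset of the UAX #14 line breaking classes. */
enum lb_class { LB_BK, LB_CR, LB_LF, LB_NL, LB_CM, LB_WJ, LB_GL, LB_ZW, LB_SP, LB_OTHER };

/* Illustrative classifier covering only a few code points. */
enum lb_class lb_classify(uint32_t cp)
{
    switch (cp) {
    case 0x000A: return LB_LF;   /* LINE FEED */
    case 0x000D: return LB_CR;   /* CARRIAGE RETURN */
    case 0x0085: return LB_NL;   /* NEXT LINE (NEL) */
    case 0x000B:                 /* LINE TABULATION (VT) */
    case 0x000C:                 /* FORM FEED */
    case 0x2028:                 /* LINE SEPARATOR */
    case 0x2029: return LB_BK;   /* PARAGRAPH SEPARATOR */
    case 0x2060: return LB_WJ;   /* WORD JOINER */
    case 0x00A0:                 /* NO-BREAK SPACE */
    case 0x034F: return LB_GL;   /* COMBINING GRAPHEME JOINER */
    case 0x200B: return LB_ZW;   /* ZERO WIDTH SPACE */
    case 0x0020: return LB_SP;   /* SPACE */
    }
    if (cp >= 0x0300 && cp <= 0x034E)
        return LB_CM;            /* start of Combining Diacritical Marks */
    return LB_OTHER;
}

/* BK, LF and NL always force a break after themselves; CR does too,
 * except when it is followed by LF (the CR LF pair breaks once). */
bool lb_mandatory_break(enum lb_class before, enum lb_class after)
{
    switch (before) {
    case LB_BK:
    case LB_LF:
    case LB_NL:
        return true;
    case LB_CR:
        return after != LB_LF;
    default:
        return false;
    }
}
```

With this, `lb_mandatory_break(LB_CR, LB_LF)` is false, so a CR LF pair yields a single break, after the LF.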

A line can break only where there is a break opportunity. Whether one exists depends on the classes of the characters on either side; opportunities typically occur before or after dashes, spaces, hyphens, punctuation, and ordinary characters.
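
Restricting attention to spaces and hyphens, as in ordinary English text, the idea of a break opportunity can be sketched in C. This is a deliberate simplification assuming ASCII input, and `find_break_opportunities` is a hypothetical name; it follows the Unicode convention of never breaking before a space, so trailing spaces stay on the line they end:

```c
#include <stddef.h>
#include <string.h>

/* Simplified break-opportunity finder where only spaces and hyphens
 * create opportunities:
 *   - never break before a space;
 *   - allow a break after the last space of a run;
 *   - allow a break after a hyphen that is followed by a non-space.
 * brk must hold strlen(s) entries; brk[i] is set to 1 when a line may
 * break between s[i] and s[i+1]. Returns the number of opportunities. */
size_t find_break_opportunities(const char *s, unsigned char *brk)
{
    size_t n = strlen(s), count = 0;
    for (size_t i = 0; i < n; i++) {
        brk[i] = 0;
        if (i + 1 == n || s[i + 1] == ' ')
            continue;           /* end of text, or a space follows */
        if (s[i] == ' ' || s[i] == '-') {
            brk[i] = 1;
            count++;
        }
    }
    return count;
}
```

For "a well-known fact" this finds three opportunities: after each space and after the hyphen.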

In the US, spaces and hyphens determine where lines may break, but that's not the case everywhere. In East Asian text, for example, a line can break almost anywhere unless a break is explicitly prohibited.
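
For East Asian text the logic flips: every position is a candidate unless a rule forbids it. The following C sketch works on arrays of code points and uses a tiny, hypothetical set of characters that may not start or end a line (a few closing and opening punctuation marks); real implementations derive these from the UAX #14 classes and local typographic rules. The function names are illustrative only:

```c
#include <stddef.h>
#include <stdint.h>
#include <stdbool.h>

/* Hypothetical, deliberately tiny table: characters that must not
 * begin a line. */
static bool no_line_start(uint32_t cp)
{
    switch (cp) {
    case 0x3001: /* IDEOGRAPHIC COMMA */
    case 0x3002: /* IDEOGRAPHIC FULL STOP */
    case 0x300D: /* RIGHT CORNER BRACKET */
    case 0xFF09: /* FULLWIDTH RIGHT PARENTHESIS */
        return true;
    }
    return false;
}

/* Hypothetical, deliberately tiny table: characters that must not
 * end a line. */
static bool no_line_end(uint32_t cp)
{
    switch (cp) {
    case 0x300C: /* LEFT CORNER BRACKET */
    case 0xFF08: /* FULLWIDTH LEFT PARENTHESIS */
        return true;
    }
    return false;
}

/* For n code points, set brk[i] (i < n - 1) when a break is allowed
 * between text[i] and text[i+1]: everywhere, unless one of the two
 * characters forbids it. Returns the number of opportunities. */
size_t cjk_break_opportunities(const uint32_t *text, size_t n, unsigned char *brk)
{
    size_t count = 0;
    for (size_t i = 0; i + 1 < n; i++) {
        brk[i] = !no_line_end(text[i]) && !no_line_start(text[i + 1]);
        count += brk[i];
    }
    return count;
}
```

For 日本。「語」 the sketch allows a break between 日 and 本 and before 「, but not before 。 or 」, nor after 「.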

Thai text doesn't use spaces between words at all; line breaks fall at syllable or word boundaries instead. Unicode defines no algorithm for finding those boundaries (its line breaking algorithm leaves such characters, class SA, to language-specific analysis), so we can only assume... what happens in Thailand stays in Thailand.


int
findLineBrk(enum break_class *pcls, enum break_action *pbrk, int cch) {

    if (!cch) return 0;

    enum break_class cls = pcls[0]; // class of 'before' character

    // treat SP at start of input as if it followed a WJ
    if (cls == SP) cls = WJ;

    // loop over all pairs in the string up to a hard break
    for (int ich = 1; (ich < cch) && (cls != BK); ich++) {
        /* ... */
    }
    /* ... */
}
