From 41a83520a0f8a1a84619324291d3b1c1e2a2ee15 Mon Sep 17 00:00:00 2001 From: Jay Bingham Date: Mon, 3 Jun 2019 13:59:33 -0300 Subject: Refactor RegExp Help page with improved contents 1) Mute some l10n strings Change-Id: Ia7a2c1898491c0f0352c9c8df519bba057b3bab6 Reviewed-on: https://gerrit.libreoffice.org/73408 Tested-by: Jenkins Reviewed-by: Olivier Hallot --- source/text/shared/01/02100001.xhp | 257 ++++++++++++++++++++++++------------- 1 file changed, 167 insertions(+), 90 deletions(-) diff --git a/source/text/shared/01/02100001.xhp b/source/text/shared/01/02100001.xhp index b4d037dd59..1f4b2dc771 100644 --- a/source/text/shared/01/02100001.xhp +++ b/source/text/shared/01/02100001.xhp @@ -27,6 +27,11 @@ regular expressions; list of +regular expressions; new line +regular expressions; empty paragraph +regular expressions; begin of word +regular expressions; begin of paragraph +regular expressions; end of paragraph lists; regular expressions replacing; tab stops (regular expressions) tab stops; regular expressions @@ -37,10 +42,10 @@ - Character + Term - Result/Use + Representation/Use @@ -48,96 +53,100 @@ Any character - Represents the given character unless otherwise specified. + The given character unless it is a regular expression meta character, which follow in this table. - . + . - Represents any single character except for a line break or paragraph break. For example, the search term "sh.rt" returns both "shirt" and "short". + Any single character except a line break or a paragraph break. For example, the search term "sh.rt" matches both "shirt" and "short". - ^ + ^ - Only finds the search term if the term is at the beginning of a paragraph. Special objects such as empty fields or character-anchored frames, at the beginning of a paragraph are ignored. Example: "^Peter". + The beginning of a paragraph or cell. Special objects such as empty fields or character-anchored frames, at the beginning of a paragraph are ignored. 
Example: "^Peter" matches the word "Peter" only when it is the first word of a paragraph. + - $ + $ - Only finds the search term if the term appears at the end of a paragraph. Special objects such as empty fields or character-anchored frames at the end of a paragraph are ignored. Example: "Peter$". + The end of a paragraph or cell. Special objects such as empty fields or character-anchored frames at the end of a paragraph are ignored. Example: "Peter$" matches only when the word "Peter" is the last word of a paragraph, note "Peter" cannot be followed by a period. $ on its own matches the end of a paragraph. This way it is possible to search and replace paragraph breaks. - * + * - Finds zero or more of the characters in front of the "*". For example, "Ab*c" finds "Ac", "Abc", "Abbc", "Abbbc", and so on. + Zero or more of the regular expression term immediately preceding it. For example, "Ab*c" matches "Ac", "Abc", "Abbc", "Abbbc", and so on. + - + + + - Finds one or more of the characters in front of the "+". For example, "AX.+4" finds "AXx4", but not "AX4". - The longest possible string that matches this search pattern in a paragraph is always found. If the paragraph contains the string "AX 4 AX4", the entire passage is highlighted. + One or more of the regular expression term immediately preceding it. For example, "AX.+4" finds "AXx4", but not "AX4". + The longest possible string that matches this regular expression in a paragraph is always matched. If the paragraph contains the string "AX 4 AX4", the entire passage is highlighted. - ? + ? - Finds zero or one of the characters in front of the "?". For example, "Texts?" finds "Text" and "Texts" and "x(ab|c)?y" finds "xy", "xaby", or "xcy". + Zero or one of the regular expression term immediately preceding it. For example, "Texts?" matches "Text" and "Texts" and "x(ab|c)?y" finds "xy", "xaby", or "xcy". 
- \ + \ - Search interprets the special character that follows the "\" as a normal character and not as a regular expression (except for the combinations \n, \t, \>, and \<). For example, "tree\." finds "tree.", not "treed" or "trees". + The special character that follows it is interpreted as a normal character and not as a regular expression meta character (except for the combinations "\n", "\t", "\b", "\>" and "\<"). For example, "tree\." matches "tree.", not "treed" or "trees". - \n + \n - Represents a line break that was inserted with the Shift+Enter key combination. To change a line break into a paragraph break, enter \n in the Find and Replace boxes, and then perform a search and replace. - \n in the Find text box stands for a line break that was inserted with the Shift+Enter key combination. - \n in the Replace text box stands for a paragraph break that can be entered with the Enter or Return key. + A line break that was inserted with the Shift+Enter key combination when in the Find text box. + A paragraph break that can be entered with the Enter or Return key when in the Replace text box. + To change line breaks into paragraph breaks, enter \n in both the Find and Replace boxes, and then perform a search and replace. - \t + \t - Represents a tab. You can also use this expression in the Replace box. + A tab character. Can also be used in the Replace box. + - \b + \b - Match a word boundary. For example, "\bbook" finds "bookmark" but not "checkbook" whereas "book\b" finds "checkbook" but not "bookmark". The discrete word "book" is found by both search terms. + A word boundary. For example, "\bbook" matches "bookmark" and "book" but not "checkbook" whereas "book\b" matches "checkbook" and "book" but not "bookmark". + Note, this form replaces the obsolete (although they still work for now) forms "\>" (match end of word) and "\<" (match start of word). - ^$ + ^$ Finds an empty paragraph. @@ -145,7 +154,7 @@ - ^. + ^. Finds the first character of a paragraph. 
@@ -156,103 +165,94 @@ & or $0 - Adds the string that was found by the search criteria in the Find box to the term in the Replace box when you make a replacement. - For example, if you enter "window" in the Find box and "&frame" in the Replace box, the word "window" is replaced with "windowframe". - You can also enter an "&" in the Replace box to modify the Attributes or the Format of the string found by the search criteria. + Adds the string that was found by the search criteria in the Find box to the term in the Replace box when you make a replacement. + For example, if you enter "window" in the Find box and "&frame" in the Replace box, the word "window" is replaced with "windowframe". + You can also enter an "&" in the Replace box to modify the Attributes or the Format of the string found by the search criteria. - [abc123] + [...] - Represents one of the characters that are between the brackets. + Any single occurrence of any one of the characters that are between the brackets. For example: "[abc123]" matches the characters ‘a’, ‘b’, ’c’, ‘1’, ‘2’ and ‘3’. "[a-e]" matches single occurrences of the characters a through e, inclusive (the range must be specified with the character having the smallest Unicode code number first). "[a-eh-x]" matches any single occurrence of the characters that are in the ranges ‘a’ through ‘e’ and ‘h’ through ‘x’. - [a-e] + [^...] - Represents any of the characters that are between a and e, including both start and end characters. - The characters are ordered by their code numbers. + Any single occurrence of a character, including Tab, Space and Line Break characters, that is not in the list of characters specified inclusive ranges are permitted. For example "[^a-syz]" matches all characters not in the inclusive range ‘a’ through ‘s’ or the characters ‘y’ and ‘z’. - [a-eh-x] + \uXXXX + \UXXXXXXXX - Represents any of the characters that are between a-e and h-x. 
+ The character represented by the four-digit hexadecimal Unicode code (XXXX). + The character represented by the eight-digit hexadecimal Unicode code (XXXXXXXX). + For certain symbol fonts the Unicode code for special characters may depend on the font in use. The Unicode codes can be viewed by choosing Insert - Special Character. - [^a-s] + | - Represents everything that is not between a and s. + The infix operator delimiting alternatives. Matches the term preceding the "|" or the term following the "|". For example, "this|that" matches occurrences of both "this" and "that". - \uXXXX - \UXXXXXXXX + {N} - Represents a character based on its four-digit hexadecimal Unicode code (XXXX). - For obscure characters there is a separate variant with capital U and eight hexadecimal digits (XXXXXXXX). - For certain symbol fonts the code for special characters may depend on the used font. You can view the codes by choosing Insert - Special Character. + The post-fix repetition operator that specifies an exact number of occurrences ("N") of the regular expression term immediately preceding it must be present for a match to occur. For example, "tre{2}" matches "tree". - | + {N,M} - Finds the terms that occur before the "|" and also finds the terms that occur after the "|". For example, "this|that" finds "this" and "that". + The post-fix repetition operator that specifies a range (minimum of "N" to a maximum of "M") of occurrences of the regular expression term immediately preceding it that can be present for a match to occur. For example, "tre{1,2}" matches "tre" and "tree". - {2} + {N,} - Defines the number of times that the character in front of the opening bracket occurs. For example, "tre{2}" finds and selects "tree". + The post-fix repetition operator that specifies a range (minimum "N" to an unspecified maximum) of occurrences of the regular expression term immediately preceding it that can be present for a match to occur. 
(The maximum number of occurrences is limited only by the size of the document). For example, "tre{2,}" matches "tree", "treee", and "treeeee". - {1,2} + (...) - Defines the minimum and maximum number of times that the character in front of the opening bracket can occur. For example, "tre{1,2}" finds and selects "tre" and "tree". + The grouping construct that serves three purposes. + + + To enclose a set of ‘|’ alternatives. For example, the regular expression "b(oo|ac)k" matches both "book" and "back". + + + To group terms in a complex expression to be operated on by the post-fix operators: "*", "+" and "?" along with the post-fix repetition operators. For example, the regular expression "a(bc)?d" matches both "ad" and "abcd" in a search; the regular expression "M(iss){2}ippi" matches "Mississippi". + + + To record the matched sub string inside the parentheses as a reference for later use in the Find box using the "\n" construct or in the Replace box using the "$n" construct, where the reference to the first matched sub string in the current expression in the Find box is represented by "\1" in the Find box and by "$1" in the Replace box, the reference to the second matched sub string by "\2" and "$2" respectively, and so on. + + + For example, the regular expression "(890)7\1\1" matches "8907890890". + With the regular expression "\b(fruit|truth)\b" in the Find box and the regular expression "$1ful" in the Replace box occurrences of the words "fruit" and "truth" can be replaced with the words "fruitful" and "truthful" respectively without affecting the words "fruitfully" and "truthfully". - {1,} - - - Defines the minimum number of times that the character in front of the opening bracket can occur. For example, "tre{2,}" finds "tree", "treee", and "treeeee". - - - - - ( ) - - - In the Find box: - Defines the characters inside the parentheses as a reference. 
You can then refer to the first reference in the current expression with "\1", to the second reference with "\2", and so on. - For example, if your text contains the number 13487889 and you search using the regular expression (8)7\1\1, "8788" is found. - You can also use () to group terms, for example, "a(bc)?d" finds "ad" or "abcd". - In the Replace box:i83322 - Use $ (dollar) instead of \ (backslash) to replace references. Use $0 to replace the whole found string. - - - - - [:alpha:] + [:alpha:] Represents an alphabetic character. Use [:alpha:]+ to find one of them. @@ -260,7 +260,7 @@ - [:digit:] + [:digit:] Represents a decimal digit. Use [:digit:]+ to find one of them. @@ -268,7 +268,7 @@ - [:alnum:] + [:alnum:] Represents an alphanumeric character ([:alpha:] and [:digit:]). @@ -276,7 +276,7 @@ - [:space:] + [:space:] Represents a space character (but not other whitespace characters).UFI: see #i41706# @@ -284,7 +284,7 @@ - [:print:] + [:print:] Represents a printable character. @@ -292,7 +292,7 @@ - [:cntrl:] + [:cntrl:] Represents a nonprinting character. @@ -300,7 +300,7 @@ - [:lower:] + [:lower:] Represents a lowercase character if Match case is selected in Options. @@ -308,7 +308,7 @@ - [:upper:] + [:upper:] Represents an uppercase character if Match case is selected in Options. @@ -316,16 +316,93 @@

For a full list of supported metacharacters and syntax, see ICU Regular Expressions documentation -Examples -e([:digit:])? -- finds 'e' followed by zero or one digit. Note that currently all named character classes like [:digit:] must be enclosed in parentheses.issue 64368 and 113035 -^([:digit:])$ -- finds lines or cells with exactly one digit. - You can combine the search terms to form complex searches. -To find three-digit numbers alone in a paragraph -^[:digit:]{3}$ - ^ means the match has to be at the start of a paragraph, - [:digit:] matches any decimal digit, - {3} means there must be exactly 3 copies of "digit", - $ means the match must end a paragraph. +Note that currently all named character class terms, [:alpha:] through [:upper:], must be enclosed in parentheses when used in a regular expression, see the examples that follow. +Regular expression terms can be combined to form complex and sophisticated regular expressions for searches as shown in the following examples. +

Examples

+ + + + + Expression + + + Meaning + + + + + ^$ + + + An empty paragraph. + ^ specifies that the match must be at the start of a paragraph, + $ specifies that a paragraph mark or the end of a cell must follow the matched string. + + + + + ^. + + + The first character of a paragraph. + + . specifies any single character. + + + + + e([:digit:])? + + + Matches "e" by itself or an "e" followed by one digit. + e specifies the character "e", + [:digit:] specifies any decimal digit, + ? specifies zero or one occurrences of [:digit:]. + + + + + ^([:digit:])$ + + + Matches a paragraph or cells containing exactly one digit. + + + + + + + + ^[:digit:]{3}$ + + + Matches a paragraph or cell containing only three digit numbers + + + {3} specifies that [:digit:] must occur three times, + + + + + + \bconst(itu|ruc)tion\b + + + Matches the words "constitution" and "construction" but not the word "constitutional." + \b specifies that the match must begin at a word boundary, + const specifies the characters "const", + ( starts the group, + itu specifies the characters "itu", + | specifies the alternative, + ruc specifies the characters "ruc", + ) ends the group, + tion specifies the characters "tion", + \b specifies that the match must end at a word boundary. + + +

+ +

-- cgit