/shell/

input with e.g. some RTL_TEXTENCODING_MS_125x that happens to exhibit the broken behavior of not consuming erroneous input, then wants to try to re-read the erroneous input with RTL_TEXTENCODING_MS_1252. For example, opening sw/qa/core/data/ww8/pass/forcepoint50-grfanchor-1.doc triggers that code. For whatever reason, the am_faksas.dot attached to <https://bz.apache.org/ooo/show_bug.cgi?id=9240#c1> "Do not show lithuanian letter 'Š'" appears to not, or at least no longer, trigger that code.) Therefore, it would be useful to have a mode in which rtl_convertTextToUnicode does not consume erroneous input. (And I plan on doing changes in sal/osl/unx/file* that would benefit from that behavior, too.) But changing rtl_convertTextToUnicode to generally not consume erroneous input would not be feasible: If calls do not set RTL_TEXTTOUNICODE_FLAGS_FLUSH, part of an erroneous input can already have been consumed by a previous call, so the current call cannot undo that. But a change that looks like it can work is to change the behavior only if RTL_TEXTTOUNICODE_FLAGS_FLUSH is set. In that case we can at least not consume the part of an erroneous input that has not yet been consumed by a previous call (which would necessarily have been done with RTL_TEXTTOUNICODE_FLAGS_FLUSH unset). The expecation is that code that relies on the don't-consume behavior will do only single calls with RTL_TEXTTOUNICODE_FLAGS_FLUSH set (so reliably not consume the complete erroneous input), while other code (which might do calls in a loop) will not care whether erroneous input has been consumed, anyway. This can be considered a mild form of behavioral API CHANGE (but note that the old implementation didn't exhibit the requested behavior anyway). So all implementations of rtl_convertTextToUnicode for the various rtl_TextEncoding values have been adapted to the new behavior. The only exceptions are ImplDummyToUnicode (sal/textenc/textcvt.cxx), which is a special case anyway used by RTL_TEXTENCODING_DONTKNOW, and two out of three places (marked with a "TODO" each) in ImplUTF7ToUnicode (sal/textenc/tcvtutf7.cxx), where it is hard to retrofit the expected behaivor, and RTL_TEXTENCODING_UTF7 is probably not relevant for the use cases relying on the don't-consume--behavior, anyway. Whether a similar change should be done for rtl_convertUnicodeToText can be examined later. Change-Id: I1ac2c4cfd99e2a0eca219f9a3855ef110b254855 Reviewed-on: https://gerrit.libreoffice.org/78584 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
<http://udk.openoffice.org/cpp/man/spec/textconversion.html> specifies that
FLAGS_UNDEFINED_ERROR, FLAGS_MBUNDEFINED_ERROR, and FLAGS_INVALID_ERROR: "Read
past the [erroneous] code in the input buffer [...]"  But actual behavior of
rtl_convertTextToUnicode for the various rtl_TextEncoding values has been
inconsistent.  Some erroneous input (mostly single-byte UNDEFINED and INVALID
ones) has not been consumed at all, some (multi-byte MBUNDEFINED and INVALID)
has been consumed partly, and some has been consumed fully as required.

However, at least since 8dd4265b9ddbd7786b6237676909eae5b540da0e "CWS-TOOLING:
integrate CWS hb18", Custom8BitToUnicode in sw/source/filter/ww8/ww8par.cxx
appears to rely on the broken behavior of not consuming erroneous input.  (It
reads the chunk of valid input with e.g. some RTL_TEXTENCODING_MS_125x that
happens to exhibit the broken behavior of not consuming erroneous input, then
wants to try to re-read the erroneous input with RTL_TEXTENCODING_MS_1252.  For
example, opening sw/qa/core/data/ww8/pass/forcepoint50-grfanchor-1.doc triggers
that code.  For whatever reason, the am_faksas.dot attached to
<https://bz.apache.org/ooo/show_bug.cgi?id=9240#c1> "Do not show lithuanian
letter 'Š'" appears to not, or at least no longer, trigger that code.)

Therefore, it would be useful to have a mode in which rtl_convertTextToUnicode
does not consume erroneous input.  (And I plan on doing changes in
sal/osl/unx/file* that would benefit from that behavior, too.)  But changing
rtl_convertTextToUnicode to generally not consume erroneous input would not be
feasible:  If calls do not set RTL_TEXTTOUNICODE_FLAGS_FLUSH, part of an
erroneous input can already have been consumed by a previous call, so the
current call cannot undo that.

But a change that looks like it can work is to change the behavior only if
RTL_TEXTTOUNICODE_FLAGS_FLUSH is set.  In that case we can at least not consume
the part of an erroneous input that has not yet been consumed by a previous call
(which would necessarily have been done with RTL_TEXTTOUNICODE_FLAGS_FLUSH
unset).  The expecation is that code that relies on the don't-consume behavior
will do only single calls with RTL_TEXTTOUNICODE_FLAGS_FLUSH set (so reliably
not consume the complete erroneous input), while other code (which might do
calls in a loop) will not care whether erroneous input has been consumed,
anyway.  This can be considered a mild form of behavioral API CHANGE (but note
that the old implementation didn't exhibit the requested behavior anyway).

So all implementations of rtl_convertTextToUnicode for the various
rtl_TextEncoding values have been adapted to the new behavior.  The only
exceptions are ImplDummyToUnicode (sal/textenc/textcvt.cxx), which is a special
case anyway used by RTL_TEXTENCODING_DONTKNOW, and two out of three places
(marked with a "TODO" each) in ImplUTF7ToUnicode (sal/textenc/tcvtutf7.cxx),
where it is hard to retrofit the expected behaivor, and RTL_TEXTENCODING_UTF7 is
probably not relevant for the use cases relying on the don't-consume--behavior,
anyway.

Whether a similar change should be done for rtl_convertUnicodeToText can be
examined later.

Change-Id: I1ac2c4cfd99e2a0eca219f9a3855ef110b254855
Reviewed-on: https://gerrit.libreoffice.org/78584
Tested-by: Jenkins
Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
More loplugin:cstylecast: sal 2018-01-12T19:17:51+00:00 Stephan Bergmann sbergman@redhat.com 2018-01-12T19:17:51+00:00 5a3bb76cd384fa3760fe8481ce008791258595ad auto-rewrite with <https://gerrit.libreoffice.org/#/c/47798/> "Enable loplugin:cstylecast for some more cases" plus solenv/clang-format/reformat-formatted-files Change-Id: I7d89b011464ba5d2dd12e04d5fc9f65cb4daebde
auto-rewrite with <https://gerrit.libreoffice.org/#/c/47798/> "Enable
loplugin:cstylecast for some more cases" plus
solenv/clang-format/reformat-formatted-files

Change-Id: I7d89b011464ba5d2dd12e04d5fc9f65cb4daebde
loplugin:includeform: sal 2017-10-23T20:45:58+00:00 Stephan Bergmann sbergman@redhat.com 2017-10-23T20:34:03+00:00 c30f3ea3dd294210058c781145d6fdd320c975b4 Change-Id: I539ca8b9dee5edc5fc2282a2b9b0ffd78bad8b11
Change-Id: I539ca8b9dee5edc5fc2282a2b9b0ffd78bad8b11
RTL_UNICODETOTEXT_INFO_{DEST|SCR}BUFFERTOSMALL should use TOO, not TO 2017-07-17T19:47:38+00:00 Chris Sherlock chris.sherlock79@gmail.com 2017-07-15T04:35:50+00:00 9af8964dd2cb048e328732ee6eb809b0da46a3ae I have kept the old mispelled constant for backwards compatibility Change-Id: I128a2eec76d00cc5ef058cd6a0c35a7474d2411e Reviewed-on: https://gerrit.libreoffice.org/39995 Reviewed-by: Chris Sherlock <chris.sherlock79@gmail.com> Tested-by: Chris Sherlock <chris.sherlock79@gmail.com>
I have kept the old mispelled constant for backwards compatibility

Change-Id: I128a2eec76d00cc5ef058cd6a0c35a7474d2411e
Reviewed-on: https://gerrit.libreoffice.org/39995
Reviewed-by: Chris Sherlock <chris.sherlock79@gmail.com>
Tested-by: Chris Sherlock <chris.sherlock79@gmail.com>
clang-tidy: readability-else-after-return 2017-04-12T08:11:34+00:00 Noel Grandin noel.grandin@collabora.co.uk 2017-04-06T08:47:24+00:00 992a33313046f4a4d322db9464c474e7429a019a run it against sal,cppu,cppuhelper I had to run this multiple times to catch all the cases in each module, and it requires some hand-tweaking of the resulting output - clang-tidy is not very good about cleaning up trailing spaces, and aligning things nicely. Change-Id: I00336345f5f036e12422b98d66526509380c497a Reviewed-on: https://gerrit.libreoffice.org/36194 Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk> Tested-by: Noel Grandin <noel.grandin@collabora.co.uk>
run it against sal,cppu,cppuhelper

I had to run this multiple times to catch all the cases in each module,
and it requires some hand-tweaking of the resulting output - clang-tidy
is not very good about cleaning up trailing spaces, and aligning
things nicely.

Change-Id: I00336345f5f036e12422b98d66526509380c497a
Reviewed-on: https://gerrit.libreoffice.org/36194
Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
Tested-by: Noel Grandin <noel.grandin@collabora.co.uk>
loplugin:nullptr (automatic rewrite) 2015-11-10T09:31:35+00:00 Stephan Bergmann sbergman@redhat.com 2015-11-10T09:21:55+00:00 26f05d59bc1c25b8a0d19be7f4738fd12e557001 Change-Id: I1bc6c87fcd6e5e96362623be94c59be216a3b2b8
Change-Id: I1bc6c87fcd6e5e96362623be94c59be216a3b2b8
Clean up C-style casts from pointers to void 2015-03-28T18:09:24+00:00 Stephan Bergmann sbergman@redhat.com 2015-03-28T18:05:46+00:00 d57d81e529a44d8042401e36057a69ebe97e870a Change-Id: I5e370445affbcd32b05588111f74590bf24f39d6
Change-Id: I5e370445affbcd32b05588111f74590bf24f39d6
Some more loplugin:cstylecast: sal 2015-01-20T08:06:50+00:00 Stephan Bergmann sbergman@redhat.com 2015-01-17T18:01:27+00:00 b70b4e644b5bb5356509505855418453dc621cfc Change-Id: Ie54d340478412e62b87d66e287fd8a3963e97898
Change-Id: Ie54d340478412e62b87d66e287fd8a3963e97898