Age | Commit message | Author |
|
Change-Id: I11a54c1ddf73c16ce46a0d1c375bf43157870db7
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/155856
Tested-by: Jenkins
Reviewed-by: Miklos Vajna <vmiklos@collabora.com>
|
|
Change-Id: I3eb05d8f5b0761bc3b672d4c855eb469f8cc1a29
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/127375
Tested-by: Jenkins
Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com>
|
|
For one, that broke round-tripping with e.g. UTF-8 (see the test case added to
Test::testComplex in sal/qa/rtl/textenc/rtl_textcvt.cxx) which did not treat
noncharacters as invalid.
For another, <https://unicode.org/faq/private_use.html#nonchar7> is meanwhile
quite clear on the matter:
"Q: Are noncharacters prohibited in interchange?
"A: This question has led to some controversy, because the Unicode Standard has
been somewhat ambiguous about the status of noncharacters. The formal wording of
the definition of 'noncharacter' in the standard has always indicated that
noncharacters 'should never be interchanged.' That led some people to assume
that the definition actually meant 'shall not be interchanged' and that
therefore the presence of a noncharacter in any Unicode string immediately
rendered that string malformed according to the standard. But the intended use
of noncharacters requires the ability to exchange them in a limited context, at
least across APIs and even through data files and other means of 'interchange',
so that they can be processed as intended. The choice of the word 'should' in
the original definition was deliberate, and indicated that one should not try to
interchange noncharacters precisely because their interpretation is strictly
internal to whatever implementation uses them, so they have no publicly
interchangeable semantics. But other informative wording in the text of the core
specification and in the character names list was differently and more strongly
worded, leading to contradictory interpretations.
"Given this ambiguity of intent, in 2013 the UTC issued Corrigendum #9, which
deleted the phrase 'and that should never be interchanged' from the definition
of noncharacters, to make it clear that prohibition from interchange is not part
of the formal definition of noncharacters. Corrigendum #9 has been incorporated
into the core specification for Unicode 7.0.
"Q: Are noncharacters invalid in Unicode strings and UTFs?
"A: Absolutely not. Noncharacters do not cause a Unicode string to be ill-formed
in any UTF. This can be seen explicitly in the table above, where every
noncharacter code point has a well-formed representation in UTF-32, in UTF-16,
and in UTF-8. An implementation which converts noncharacter code points between
one UTF representation and another must preserve these values correctly. The
fact that they are called 'noncharacters' and are not intended for open
interchange does not mean that they are somehow illegal or invalid code points
which make strings containing them invalid."
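
The round-trip requirement quoted above can be illustrated with a minimal,
self-contained sketch. This is not the sal text converter code: encodeUtf8,
decodeUtf8, and isNoncharacter are hypothetical helpers written only for this
illustration. They reject surrogates, overlong forms, and values above
U+10FFFF, but deliberately accept noncharacters such as U+FFFF.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of sal): true for U+FDD0..U+FDEF and for the
// last two code points of every plane (U+xxFFFE, U+xxFFFF).
bool isNoncharacter(char32_t cp)
{
    return (cp >= 0xFDD0 && cp <= 0xFDEF) || (cp & 0xFFFE) == 0xFFFE;
}

// Hypothetical helper: append the UTF-8 form of one Unicode scalar value.
// Only surrogates and values above U+10FFFF are rejected.
bool encodeUtf8(char32_t cp, std::string& out)
{
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return false;
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
    return true;
}

// Hypothetical helper: decode a string holding exactly one UTF-8 sequence.
// Returns false for ill-formed input; noncharacters are well-formed.
bool decodeUtf8(const std::string& in, char32_t& cp)
{
    if (in.empty())
        return false;
    const unsigned char b0 = static_cast<unsigned char>(in[0]);
    std::size_t len = 0;
    char32_t min = 0; // smallest value allowed for this length (no overlongs)
    if (b0 < 0x80)                { len = 1; min = 0;       cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { len = 2; min = 0x80;    cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; min = 0x800;   cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; min = 0x10000; cp = b0 & 0x07; }
    else
        return false;
    if (in.size() != len)
        return false;
    for (std::size_t i = 1; i < len; ++i) {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        if ((b & 0xC0) != 0x80)
            return false; // not a continuation byte
        cp = (cp << 6) | (b & 0x3F);
    }
    return cp >= min && cp <= 0x10FFFF && !(cp >= 0xD800 && cp <= 0xDFFF);
}
```

With these helpers, encoding U+FFFF yields the three bytes EF BF BF, and
decoding those bytes returns U+FFFF again, so the value survives a round trip
through UTF-8 even though isNoncharacter reports it as a noncharacter.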
Change-Id: I4fcc0156e3d2fd305a7c7bb0c7b3dbef846c9e64
Reviewed-on: https://gerrit.libreoffice.org/78598
Tested-by: Jenkins
Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
|
|
<http://udk.openoffice.org/cpp/man/spec/textconversion.html> specifies for
FLAGS_UNDEFINED_ERROR, FLAGS_MBUNDEFINED_ERROR, and FLAGS_INVALID_ERROR: "Read
past the [erroneous] code in the input buffer [...]" But the actual behavior of
rtl_convertTextToUnicode for the various rtl_TextEncoding values has been
inconsistent. Some erroneous input (mostly single-byte UNDEFINED and INVALID
ones) has not been consumed at all, some (multi-byte MBUNDEFINED and INVALID)
has been consumed partly, and some has been consumed fully as required.
However, at least since 8dd4265b9ddbd7786b6237676909eae5b540da0e "CWS-TOOLING:
integrate CWS hb18", Custom8BitToUnicode in sw/source/filter/ww8/ww8par.cxx
appears to rely on the broken behavior of not consuming erroneous input. (It
reads the chunk of valid input with e.g. some RTL_TEXTENCODING_MS_125x that
happens to exhibit the broken behavior of not consuming erroneous input, then
wants to try to re-read the erroneous input with RTL_TEXTENCODING_MS_1252. For
example, opening sw/qa/core/data/ww8/pass/forcepoint50-grfanchor-1.doc triggers
that code. For whatever reason, the am_faksas.dot attached to
<https://bz.apache.org/ooo/show_bug.cgi?id=9240#c1> "Do not show lithuanian
letter 'Š'" appears to not, or at least no longer, trigger that code.)
Therefore, it would be useful to have a mode in which rtl_convertTextToUnicode
does not consume erroneous input. (And I plan on doing changes in
sal/osl/unx/file* that would benefit from that behavior, too.) But changing
rtl_convertTextToUnicode to generally not consume erroneous input would not be
feasible: If calls do not set RTL_TEXTTOUNICODE_FLAGS_FLUSH, part of an
erroneous input can already have been consumed by a previous call, so the
current call cannot undo that.
But a change that looks like it can work is to change the behavior only if
RTL_TEXTTOUNICODE_FLAGS_FLUSH is set. In that case we can at least not consume
the part of an erroneous input that has not yet been consumed by a previous call
(which would necessarily have been done with RTL_TEXTTOUNICODE_FLAGS_FLUSH
unset). The expectation is that code that relies on the don't-consume behavior
will do only single calls with RTL_TEXTTOUNICODE_FLAGS_FLUSH set (so reliably
not consume the complete erroneous input), while other code (which might do
calls in a loop) will not care whether erroneous input has been consumed,
anyway. This can be considered a mild form of behavioral API CHANGE (but note
that the old implementation didn't exhibit the requested behavior anyway).
So all implementations of rtl_convertTextToUnicode for the various
rtl_TextEncoding values have been adapted to the new behavior. The only
exceptions are ImplDummyToUnicode (sal/textenc/textcvt.cxx), which is a special
case anyway used by RTL_TEXTENCODING_DONTKNOW, and two out of three places
(marked with a "TODO" each) in ImplUTF7ToUnicode (sal/textenc/tcvtutf7.cxx),
where it is hard to retrofit the expected behavior, and RTL_TEXTENCODING_UTF7 is
probably not relevant for the use cases relying on the don't-consume behavior,
anyway.
Whether a similar change should be done for rtl_convertUnicodeToText can be
examined later.
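
The caller pattern described above (convert a chunk with one encoding, stop at
the erroneous input without consuming it, then re-read that input with a
fallback encoding) can be modelled with a toy sketch. It does not use the sal
API: ToyDecoder, its members, and decodeWithFallback are hypothetical names,
and the "encodings" are deliberately trivial single-byte mappings that only
show how the not-consumed byte stays available to the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Toy stand-in for a single-byte converter (NOT rtl_convertTextToUnicode).
// Every byte b maps to base + b, except `undefined`, which cannot be mapped.
struct ToyDecoder
{
    unsigned char undefined;
    char16_t base;

    // Converts src[pos..] until the end or the first unmappable byte.
    // Returns the number of bytes consumed. The unmappable byte itself is
    // NOT consumed, which is the "don't-consume" behavior modelled here.
    std::size_t convert(const std::string& src, std::size_t pos,
                        std::u16string& out, bool& error) const
    {
        std::size_t n = 0;
        error = false;
        while (pos + n < src.size()) {
            const unsigned char b = static_cast<unsigned char>(src[pos + n]);
            if (b == undefined) {
                error = true;
                break;
            }
            out += static_cast<char16_t>(base + b);
            ++n;
        }
        return n;
    }
};

// Hypothetical caller: decode with `primary`, and re-read each byte that
// `primary` rejected with `fallback`.
std::u16string decodeWithFallback(const std::string& src,
                                  const ToyDecoder& primary,
                                  const ToyDecoder& fallback)
{
    std::u16string out;
    std::size_t pos = 0;
    while (pos < src.size()) {
        bool error = false;
        pos += primary.convert(src, pos, out, error);
        if (!error)
            break; // everything up to the end was valid
        // src[pos] is the erroneous byte; it is still unconsumed, so it can
        // be handed to the fallback decoder as a one-byte chunk.
        bool fallbackError = false;
        const std::size_t n
            = fallback.convert(src.substr(pos, 1), 0, out, fallbackError);
        if (n == 0)
            out += u'\uFFFD'; // neither decoder can map this byte
        ++pos;
    }
    return out;
}
```

If the primary decoder had consumed the erroneous byte, the caller would have
no reliable way to know where the fallback should start reading; the sketch
depends on the consumed count stopping exactly before that byte.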
Change-Id: I1ac2c4cfd99e2a0eca219f9a3855ef110b254855
Reviewed-on: https://gerrit.libreoffice.org/78584
Tested-by: Jenkins
Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
|
|
auto-rewrite with <https://gerrit.libreoffice.org/#/c/47798/> "Enable
loplugin:cstylecast for some more cases" plus
solenv/clang-format/reformat-formatted-files
Change-Id: I7d89b011464ba5d2dd12e04d5fc9f65cb4daebde
|
|
Change-Id: I539ca8b9dee5edc5fc2282a2b9b0ffd78bad8b11
|
|
I have kept the old misspelled constant for backwards compatibility.
Change-Id: I128a2eec76d00cc5ef058cd6a0c35a7474d2411e
Reviewed-on: https://gerrit.libreoffice.org/39995
Reviewed-by: Chris Sherlock <chris.sherlock79@gmail.com>
Tested-by: Chris Sherlock <chris.sherlock79@gmail.com>
|
|
i.e.
void f(void);
becomes
void f();
I used the following command to make the changes:
git grep -lP '\(\s*void\s*\)' -- *.cxx \
| xargs perl -pi -w -e 's/(\w+)\s*\(\s*void\s*\)/$1\(\)/g;'
and ran it for both .cxx and .hxx files.
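
For readers who want to try the substitution without running Perl, the same
pattern can be sketched with std::regex. dropVoidParameterList is a
hypothetical helper name used only here; the pattern and the replacement are
taken from the command above.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper mirroring the Perl substitution
//   s/(\w+)\s*\(\s*void\s*\)/$1\(\)/g
// on a single line of source text.
std::string dropVoidParameterList(const std::string& line)
{
    // (\w+) captures the identifier before the parenthesis; the rest matches
    // "(void)" with optional whitespace around each token.
    static const std::regex pattern(R"((\w+)\s*\(\s*void\s*\))");
    // "$1()" keeps the identifier and emits an empty parameter list.
    return std::regex_replace(line, pattern, "$1()");
}
```

Given "void f(void);" the helper returns "void f();", while a line such as
"void h(void* p);" is left unchanged because the pattern requires the closing
parenthesis directly after "void".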
Change-Id: I314a1b56e9c14d10726e32841736b0ad5eef8ddd
|
|
Change-Id: Ie54d340478412e62b87d66e287fd8a3963e97898
|
|
Signed-off-by: Riccardo Magliocchetti <riccardo.magliocchetti@gmail.com>
Signed-off-by: Stephan Bergmann <sbergman@redhat.com>, undid one remove that was
detrimental to loplugin:unreffun
Change-Id: I18d8252084d828f94ef7a954e1dbfb45743d7970
|
|
Change-Id: I76be464200d486efef9c8a7e957c310c9adae3b8
|
|
Patch contributed by Herbert Duerr:
#i118662# remove berkeleyDB from module xmlhelp (author=orwitt)
http://svn.apache.org/viewvc?view=revision&revision=1213188
#i119141# remove ISCII converter for now
http://svn.apache.org/viewvc?view=revision&revision=1306246
make exceptions for cppunittester verbose
http://svn.apache.org/viewvc?view=revision&revision=1174831
Patches contributed by Pedro Giffuni:
Avoid some uses of non portable #!/bin/bash in shell scripts.
http://svn.apache.org/viewvc?view=revision&revision=1235297
Patch contributed by Oliver-Rainer Wittmann
88652: applied patch, remove unicows deps
http://svn.apache.org/viewvc?view=revision&revision=1177585
drop OS/2 code, remove in-line assembler ARM atomics,
and obsolete armarch header.
|