path: root/sal
Age | Commit message | Author
2021-05-18 | tdf#126742 make Windows file handling more unx-like | Armin Le Grand (Allotropia)
The bug mentioned happens due to a system-dependent difference: Unx systems allow files to be opened for writing multiple times, while our Windows implementation until now prevented that. For that reason an embedded OLE object that is still open in the same LO instance behaves wrongly or strangely; e.g. the changed size cannot be written (to the file). Since we already have unx-like handling, and in that scenario useful sync has to be done anyway, no new scenario will be created. Only the Windows implementation will change, to behave closer to the unx-like behaviour. I already test-built that on gerrit to make sure all tests for Windows work as before. I thought about this for quite some time, but see no big risk. For thoughts/discussion please refer to the task. Change-Id: I8dbfd70c2f69d0a013f445e152e597f37fa6ecc7 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/112237 Tested-by: Jenkins Reviewed-by: Armin Le Grand <Armin.Le.Grand@me.com> (cherry picked from commit 2b4cd99d3360ccffb9829a02412824864d045753) Reviewed-on: https://gerrit.libreoffice.org/c/core/+/112428 Tested-by: Thorsten Behrens <thorsten.behrens@allotropia.de> Reviewed-by: Thorsten Behrens <thorsten.behrens@allotropia.de>
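On POSIX systems, `open()` by itself does not stop a process from holding a second writable descriptor for a file that is already open for writing. On Windows, whether a second open succeeds depends on the share mode passed to `CreateFileW`. The minimal sketch below shows the unx behaviour that the Windows implementation is moving closer to. It is POSIX only, and the helper `canOpenTwiceForWrite` and the `/tmp` path used with it are illustrative assumptions, not sal code.

```cpp
#include <cassert>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper (not part of sal): returns true if a second writable
// descriptor can be obtained for the file while the first one is still open.
// The file is created if it does not exist yet.
bool canOpenTwiceForWrite(const char* path)
{
    const int fd1 = ::open(path, O_WRONLY | O_CREAT, 0600);
    if (fd1 < 0)
        return false;
    const int fd2 = ::open(path, O_WRONLY); // fd1 is still open here
    const bool ok = fd2 >= 0;
    if (ok)
        ::close(fd2);
    ::close(fd1);
    return ok;
}
```

On Windows, the comparable outcome requires both opens to pass `FILE_SHARE_WRITE` (usually together with `FILE_SHARE_READ`) as the share mode. The sketch does not cover that side, and it does not show how the osl file code was actually changed.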
2020-05-05 | Fix problems when running a sandboxed LO as X.app/Contents/MacOS/soffice | Tor Lillqvist
The argv[0] passed to osl_setCommandArgs will then be the relative path and osl::realpath() will fail. Instead, use bootstrap_getExecutableFile() which calls _NSGetExecutablePath() to get the executable's pathname for g_command_args. Change-Id: I1345afe158d7b64871f6340733fb5490d5ca6bd8 Reviewed-on: https://gerrit.libreoffice.org/c/core/+/93438 Tested-by: Jenkins Reviewed-by: Miklos Vajna <vmiklos@collabora.com>
2020-01-31 | Adapt CPPUNIT_ASSERT to C++20 deleted ostream << for sal_Unicode (aka char16_t) | Stephan Bergmann
<http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html> "char8_t backward compatibility remediation", as implemented now by <https://gcc.gnu.org/git/?p=gcc.git;a=commit;h=0c5b35933e5b150df0ab487efb2f11ef5685f713> "libstdc++: P1423R3 char8_t remediation (2/4)" for -std=c++2a, deletes operator << overloads that would print an integer rather than a (presumably expected) character. But for simplicity (and to avoid issues with non-printing characters), keep printing an integer here. Change-Id: I751b99ee32d418eb488131ffa130d6f7d6d38dc7 Reviewed-on: https://gerrit.libreoffice.org/84348 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com> (cherry picked from commit 5d8f0fad50f90195a11873c70ddab4644f5839ea) Reviewed-on: https://gerrit.libreoffice.org/c/core/+/87760 Reviewed-by: Caolán McNamara <caolanm@redhat.com>
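The "keep printing an integer" choice works as follows. With C++20, `os << u'A'` no longer compiles for a `std::ostream`. In C++17 the same expression silently picked an integer overload through promotion. The sketch below shows an explicit replacement. The names `printCodeUnit` and `toTestString` are hypothetical, and the sketch does not show how such a helper is hooked into CPPUNIT_ASSERT.

```cpp
#include <cassert>
#include <ostream>
#include <sstream>
#include <string>

// Print a UTF-16 code unit as its numeric value. This is well-formed in both
// C++17 and C++20, and it also behaves for non-printing characters.
inline void printCodeUnit(std::ostream& os, char16_t c)
{
    os << static_cast<unsigned int>(c);
}

// Hypothetical convenience wrapper returning the printed form as a string.
inline std::string toTestString(char16_t c)
{
    std::ostringstream os;
    printCodeUnit(os, c);
    return os.str();
}
```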
2019-12-03 | WIN accept single-backslash file URIs | Jan-Marek Glogowski
From all I could find, a single (back-)slash file URI is fine as a local file path. This includes the commit "WIN enable NoAuthority test" (cherry picked from commit f9fd9d4cd4f792cd4ec8e14df78f3193653dae67) Change-Id: I75e95c809894cdef88f708d0477cb98eb114a107 Reviewed-on: https://gerrit.libreoffice.org/83837 Tested-by: Jenkins Reviewed-by: Jan-Marek Glogowski <glogow@fbihome.de> (cherry picked from commit 895cd72158fc8a455f705764ae4ae000b933eba4) Reviewed-on: https://gerrit.libreoffice.org/83952
2019-11-20 | ofz#19010 wrong start of range | Caolán McNamara
Change-Id: Ibf97a830932d3f153b99031abc8c4a00b54cedab Reviewed-on: https://gerrit.libreoffice.org/83266 Reviewed-by: Stephan Bergmann <sbergman@redhat.com> Tested-by: Jenkins
2019-11-12 | Silence -Werror=sign-compare (--enable-cipher-openssl-backend) | Stephan Bergmann
as found by <https://ci.libreoffice.org/job/lo_tb_random_config_linux/2039/> Change-Id: Ie698b7905bd2f25e74791f91f586479d1fc473dc Reviewed-on: https://gerrit.libreoffice.org/82488 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-11-11 | simplify OUString construction | Noel Grandin
Change-Id: Ib91d77c578aa21af02beaf299d16b5d8c2942f00 Reviewed-on: https://gerrit.libreoffice.org/82447 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
2019-10-31 | tdf#125688 speed up load of change-tracking ODS | Noel Grandin
by 10%, by avoiding an OUString construction in a hot path through XMLTextColumnContext_Impl::XMLTextColumnContext_Impl -> sax::Convert::convertNumber Also changed XMLTextAnimationStepPropertyHdl::importXML to take advantage of the modified convertNumber passing convention. Change-Id: I4e5503dbb094c88a09af8b6dc8c22b6c53f9eb75 Reviewed-on: https://gerrit.libreoffice.org/81726 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
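Here is a minimal sketch of the passing-convention idea. A converter that takes a non-owning view lets a caller pass a slice of an existing buffer, instead of first constructing a temporary owning string. The signature below is a hypothetical stand-in, not the actual `sax::Convert::convertNumber`. It uses `char` data because `std::from_chars` only accepts `char` ranges, whereas the real code works on UTF-16 OUString data.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <string_view>
#include <system_error>

// Hypothetical number converter taking a view: no temporary string is built.
// Returns false on malformed input or when the value is outside [nMin, nMax].
// rValue is only written on success.
bool convertNumber(std::int32_t& rValue, std::string_view aString,
                   std::int32_t nMin, std::int32_t nMax)
{
    std::int32_t n = 0;
    const char* first = aString.data();
    const char* last = aString.data() + aString.size();
    auto [ptr, ec] = std::from_chars(first, last, n);
    if (ec != std::errc() || ptr != last) // reject errors and trailing junk
        return false;
    if (n < nMin || n > nMax)
        return false;
    rValue = n;
    return true;
}
```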
2019-10-28 | loplugin:stringadd improve detection | Noel Grandin
if one side of the expression is a compile-time-constant, we don't need to worry about side-effects on the other side Change-Id: Iee71ea51b327ef244bf39f128f921ac325d74e2b Reviewed-on: https://gerrit.libreoffice.org/81589 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
2019-10-23 | -Werror,-Wdeprecated-volatile (clang-cl) | Stephan Bergmann
same as b89187aad86e2be000d2f4c9c380a95bf8430c2e "Simplify forced memory reads" in sal/osl/unx/file.cxx Change-Id: I31edbc72f88895e148609498d367a50e38723b11 Reviewed-on: https://gerrit.libreoffice.org/81408 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-10-23 | Reinstate CppunitTest_sal_rtl | Stephan Bergmann
...which had been dropped by 0f874472c672175135520101837ff0c9d4701d7f "size some stringbuffer to prevent re-alloc", presumably by accident Change-Id: I3b84e743c1adcdd8518114810dcee5c3f12c4290 Reviewed-on: https://gerrit.libreoffice.org/81388 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
2019-10-21 | size some stringbuffer to prevent re-alloc | Noel Grandin
found by the simple expedient of putting asserts in the resize routine. Where an explicit const size is used, I started with 32 and kept doubling until that site did not need resizing anymore. Change-Id: I998787edc940d0a3ba23b5ac37131ab9ecd300f4 Reviewed-on: https://gerrit.libreoffice.org/81138 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
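The assert-in-the-resize-routine technique can be approximated with `std::string` standing in for `OUStringBuffer`, and `reserve()` standing in for an initial capacity given to the buffer. The idea is to count how often the capacity changes while appending, with and without an up-front reservation. The function name and parameters are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Appends nPieces copies of aPiece and counts how often the buffer had to
// grow (that is, how often its capacity changed along the way).
std::size_t countReallocations(std::size_t nReserve, const std::string& aPiece,
                               std::size_t nPieces)
{
    std::string aBuf;
    aBuf.reserve(nReserve); // plays the role of the explicit initial size
    std::size_t nGrowths = 0;
    std::size_t nCapacity = aBuf.capacity();
    for (std::size_t i = 0; i < nPieces; ++i)
    {
        aBuf += aPiece;
        if (aBuf.capacity() != nCapacity)
        {
            ++nGrowths;
            nCapacity = aBuf.capacity();
        }
    }
    return nGrowths;
}
```

With 16 four-character pieces, a reservation of 64 covers the whole result, so no growth occurs. Starting from an empty buffer, at least one growth is needed.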
2019-10-20 | Replace some uses of OUStringChar with string literals | Stephan Bergmann
(At least MSVC++ 14.14, aka Visual Studio 2017 version 15.7, apparently requires `"\xDFFF"` to be written with a `u` prefix in the concatenated string literal u"\xD800" "\U000103FF" "\xDFFF" "A" to avoid "error C2022: '57343': too big for character", so prefix all the individual string literals in such concatenations, even if that should be redundant.) Change-Id: Ief69e6c7ae71fe2c4c9c56c38fab0bc782ceb82c Reviewed-on: https://gerrit.libreoffice.org/81142 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
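Below is a small sketch of the literal in question, with every piece carrying the `u` prefix. Note the mix of escapes:
- `\xD800` and `\xDFFF` are numeric escapes that each yield a single (lone-surrogate) UTF-16 code unit.
- `\U000103FF` is a supplementary code point, which the compiler encodes as the surrogate pair D800 DFFF.

The constant names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Each concatenated piece is u-prefixed, as recommended above for MSVC.
constexpr char16_t kLiteral[] = u"\xD800" u"\U000103FF" u"\xDFFF" u"A";

// Number of UTF-16 code units, excluding the terminating NUL.
constexpr std::size_t kLength = (sizeof kLiteral / sizeof kLiteral[0]) - 1;
```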
2019-10-18 | cid#1448512 silence Out-of-bounds access | Caolán McNamara
Change-Id: I3e5efb9be471814b5dabf501feb93a65e9b5bcd3 Reviewed-on: https://gerrit.libreoffice.org/81022 Tested-by: Jenkins Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com>
2019-10-17 | Remove some memset calls | Mike Kaganski
Replace them with default initialization or calloc Change-Id: I747f53c2ced2d0473fd5a5ede4f8520a0633dcc1 Reviewed-on: https://gerrit.libreoffice.org/80805 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
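Replacing `memset` with default initialization (strictly, value-initialization via `{}`) or with `calloc` might look like the following sketch. The `Stats` struct and the helper names are hypothetical. They stand in for whatever trivial types the commit actually touched.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical trivial struct of the kind that used to be cleared by memset.
struct Stats
{
    int nCount;
    double fSum;
    char aName[8];
};

// Stats{} value-initializes every member to zero, replacing
//     Stats s; memset(&s, 0, sizeof s);
inline Stats makeStats() { return Stats{}; }

// For heap arrays of such trivial types, calloc returns zero-filled memory,
// replacing malloc followed by memset. The caller must std::free the result.
inline Stats* allocStatsArray(std::size_t n)
{
    return static_cast<Stats*>(std::calloc(n, sizeof(Stats)));
}
```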
2019-10-17 | Rename OUStringLiteral1 to OUStringChar | Stephan Bergmann
It started out as a wrapper around character literals, but has by now become a wrapper around arbitrary single characters. Besides updating the documentation, this change is a mechanical for i in $(git grep -Fl OUStringLiteral1); do sed -i -e s/OUStringLiteral1/OUStringChar/g "$i"; done Change-Id: I1b9eaa4b3fbc9025ce4a4bffea3db1c16188b76f Reviewed-on: https://gerrit.libreoffice.org/80892 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-10-16 | loplugin:stringadd look through a couple more known-good methods | Noel Grandin
Change-Id: Ifbdb3e41eae665f7dcaf5301aaba2b6e4662cf48 Reviewed-on: https://gerrit.libreoffice.org/80855 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
2019-10-15 | new loplugin:bufferadd | Noel Grandin
look for OUStringBuffer append sequences that can be turned into creating an OUString with + operations Change-Id: Ica840dc096000307b4a105fb4d9ec7588a15ade6 Reviewed-on: https://gerrit.libreoffice.org/80809 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
2019-10-14 | Fix misuse of OStringLiteral | Stephan Bergmann
...(which got introduced with 9b5dad13b56bdde7c40970351af3da3a2c3c9350 "loplugin:stringadd look for unnecessary temporaries", and had reportedly broken CppunitTest_sc_ucalc on tml's Windows build by hitting the "strlen( str ) == N - 1" assert at include/rtl/string.hxx:1867), by introducing rtl::OStringView (and rtl::OUStringView, for consistency). Change-Id: I766b600274302ded66a6bffc91be189b20ed1ac3 Reviewed-on: https://gerrit.libreoffice.org/80778 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
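The failure mode behind that assert can be sketched with a hypothetical literal-length template. It deduces the length from the array bound, which is only correct for a genuine string literal. A view-style type, by contrast, measures the string at run time. Neither function below is the rtl code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Literal-style: trusts the array bound, i.e. assumes strlen(str) == N - 1.
template <std::size_t N>
constexpr std::size_t literalLength(const char (&)[N])
{
    return N - 1;
}

// View-style: measures the actual NUL-terminated string.
inline std::size_t viewLength(const char* str)
{
    return std::string_view(str).size();
}
```

For a partially filled `char buf[16] = "abc";`, the literal-style length is 15 while the real length is 3. A check like `strlen(str) == N - 1` catches exactly this mismatch.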
2019-10-14 | New loplugin:getstr | Stephan Bergmann
...to find matches of ... << s.getStr() (for the rtl string classes) that can be written as just ... << s Some notes: * The OUStringToOString(..., RTL_TEXTENCODING_UTF8) is left explicit in desktop/source/app/crashreport.cxx (even though that would also be done internally by the "<< OUString" operator) to clarify that these values are written out as UTF-8 (and not as what that operator << happens to use, which just also happens to be UTF-8). * OUSTRING_TO_CSTR (include/oox/helper/helper.hxx) is no longer used now. * Just don't bother to use osl_getThreadTextEncoding() in the SAL_WARN in lingucomponent/source/hyphenator/hyphen/hyphenimp.cxx. * The toUtf8() in the SAL_DEBUG in pyuno/source/module/pyuno_module.cxx can just go, too. Change-Id: I4602f0379ef816bff310f1e51b57c56b7e3f0136 Reviewed-on: https://gerrit.libreoffice.org/80762 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-10-14 | loplugin:stringadd look for unnecessary temporaries | Noel Grandin
which defeat the *StringConcat optimisation. Also make StringConcat conversions treat a nullptr as an empty string, to match the O*String(char*) constructors. Change-Id: If45f5b4b6a535c97bfeeacd9ec472a7603a52e5b Reviewed-on: https://gerrit.libreoffice.org/80724 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
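Treating a null `char*` as an empty string in a concatenation helper amounts to guarding the length computation. The sketch below does this with hypothetical helpers (`safeLength`, `concat`), which also size the result once up front.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// A null pointer counts as an empty string, instead of being passed to
// strlen (which would be undefined behaviour).
inline std::size_t safeLength(const char* p)
{
    return p == nullptr ? 0 : std::strlen(p);
}

// Concatenates two C strings; either may be null. The total length is
// computed first so the result is allocated only once.
inline std::string concat(const char* a, const char* b)
{
    const std::size_t na = safeLength(a);
    const std::size_t nb = safeLength(b);
    std::string result;
    result.reserve(na + nb);
    if (na != 0)
        result.append(a, na);
    if (nb != 0)
        result.append(b, nb);
    return result;
}
```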
2019-10-11 | round out StringConcat helpers with sal_Unicode* overloads | Noel Grandin
so we can construct efficient expressions when we have pointers to unicode data Also lightly reformat a couple of the older helpers to make it easier to compare the different helpers. Change-Id: Ib8a4227714e9218512b6871d3285e4e2703bec3b Reviewed-on: https://gerrit.libreoffice.org/80639 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
2019-10-10 | DestFileExists should be bool | Stephan Bergmann
Change-Id: I5803aa2498654c579f9fe6293e5204aa63edd589 Reviewed-on: https://gerrit.libreoffice.org/80607 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-10-09 | cid#1453854 silence Time of check time of use | Caolán McNamara
Change-Id: Icfa358476db3166c29e893c09ec943aa3c38dba3 Reviewed-on: https://gerrit.libreoffice.org/80520 Tested-by: Jenkins Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com>
2019-10-07 | cid#1448512 silence bogus Out-of-bounds access | Caolán McNamara
Change-Id: I6febe3d48fc9018b373a940d88d2afeefad7502c Reviewed-on: https://gerrit.libreoffice.org/80355 Tested-by: Jenkins Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com>
2019-10-01 | loplugin:data (clang-cl) | Stephan Bergmann
Change-Id: Ib8b2bc1c5f7b27a646036ce23cae2b6a06edd038 Reviewed-on: https://gerrit.libreoffice.org/79922 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-10-01 | loplugin:simplifyconstruct (clang-cl) | Stephan Bergmann
Change-Id: I08da288a88c2bce1d4250ec77f17bd483e6bc09c Reviewed-on: https://gerrit.libreoffice.org/79911 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-10-01 | loplugin:stringadd in package..sax | Noel Grandin
Change-Id: I1f8b626ae99bca6e31e7c4aa9c8a1fc016b76e5c Reviewed-on: https://gerrit.libreoffice.org/79890 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
2019-09-29 | constmethod for accessor-type methods | Noel Grandin
Apply the constmethod plugin, but only to accessor-type methods, e.g. IsFoo(), GetBar(), etc., where we can be sure that constifying is a reasonable thing to do. Change-Id: Ibc97f5f359a0992dd1ce2d66f0189f8a0a43d98a Reviewed-on: https://gerrit.libreoffice.org/74269 Tested-by: Jenkins Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
2019-09-24 | support O(U)String::number() for fast string concatenation | Luboš Luňák
When I did the fast string concatenation, I didn't add any support for number(), which simply returned an O(U)String, and so it did the extra allocation/deallocation, although that could be avoided. In order to support this, number() now returns a special temporary return type, similarly to O(U)StringConcat, which allows delaying the concatenation the same way. Also similarly, the change of the return type in some cases requires an explicit cast to the actual string type. Usage of OString::getStr() is so extensive in the codebase that I actually added it to the helper class; after that, only relatively few cases remain. Change-Id: Iba6e158010e1e458089698c426803052b6f46031 Reviewed-on: https://gerrit.libreoffice.org/78873 Tested-by: Jenkins Reviewed-by: Luboš Luňák <l.lunak@collabora.com>
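The mechanism can be illustrated with a small expression template:
- Each operand only reports its length and knows how to copy itself.
- `+` builds an unevaluated node.
- The final conversion allocates at most once.

In this self-contained sketch, `Number` plays the role of the new temporary type returned by number(). All names (`Text`, `Number`, `Concat`, `toString`) are hypothetical, and this is not the actual rtl implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstring>
#include <string>
#include <type_traits>

// A borrowed run of characters (for example a string literal).
struct Text
{
    const char* p;
    std::size_t n;
    std::size_t length() const { return n; }
    char* write(char* out) const { std::memcpy(out, p, n); return out + n; }
};

inline Text txt(const char* s) { return Text{s, std::strlen(s)}; }

// Analogue of the number() temporary: digits live in an inline buffer, so
// creating it does not allocate a string of its own.
struct Number
{
    char buf[24] = {};
    std::size_t n = 0;
    explicit Number(long long v)
    {
        n = static_cast<std::size_t>(std::snprintf(buf, sizeof buf, "%lld", v));
    }
    std::size_t length() const { return n; }
    char* write(char* out) const { std::memcpy(out, buf, n); return out + n; }
};

// Unevaluated concatenation of two operands.
template <class L, class R>
struct Concat
{
    L l;
    R r;
    std::size_t length() const { return l.length() + r.length(); }
    char* write(char* out) const { return r.write(l.write(out)); }
};

template <class T> struct IsOperand : std::false_type {};
template <> struct IsOperand<Text> : std::true_type {};
template <> struct IsOperand<Number> : std::true_type {};
template <class L, class R> struct IsOperand<Concat<L, R>> : std::true_type {};

// Only participates for the operand types above.
template <class L, class R,
          class = std::enable_if_t<IsOperand<L>::value && IsOperand<R>::value>>
Concat<L, R> operator+(const L& l, const R& r)
{
    return Concat<L, R>{l, r};
}

// Materialize: compute the total length first, then copy every piece once.
template <class E>
std::string toString(const E& e)
{
    std::string s(e.length(), '\0'); // at most one allocation
    e.write(&s[0]);
    return s;
}
```

For example, `toString(txt("page ") + Number(12) + txt(" of ") + Number(-3))` yields `"page 12 of -3"` without intermediate strings. As the commit notes, a custom return type like `Number` sometimes needs an explicit conversion to the real string type.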
2019-09-23 | do not require $(SRCDIR) in every gb_Library_set_precompiled_header | Luboš Luňák
Change-Id: I7b3a22584bb2e4d501f509ffcd80929feed23a4c Reviewed-on: https://gerrit.libreoffice.org/79360 Tested-by: Jenkins Reviewed-by: Luboš Luňák <l.lunak@collabora.com>
2019-09-17 | Better handling of non--UTF-8 filesystem pathnames in sal/osl/unx/ | Stephan Bergmann
The idea is to use OString instead of OUString internally in sal/osl/unx/ to represent pathnames, so that the OString carries the actual bytes that make up the pathname. At the boundary of translating between pathname OStrings and file URL OUStrings, translate sequences of bytes that are valid according to osl_getThreadTextEncoding() into UTF-8 and translate other bytes into individual (percent-encoded) bytes in the file URL. This change required duplicating some of the internal functionality in sal/osl/unx/ for both OString and OUString, and making part of sal/rtl/uri.cxx accessible from sal/osl/unx/ via the new sal/inc/uri_internal.hxx. Change-Id: Id1ebaebe9e7f2d21f350f6b1a07849edee54331f Reviewed-on: https://gerrit.libreoffice.org/78798 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
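The classification step can be sketched as follows, under the simplifying assumption that the thread text encoding is UTF-8. The real code uses osl_getThreadTextEncoding() and the rtl text converters. Byte sequences that are valid in that encoding are decoded to characters. Every other byte becomes a `%XX` escape, and so does `%` itself, which keeps the sketch's output unambiguous. The function names are hypothetical, and the usual URL escaping of non-ASCII characters is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Returns the length (1 to 4) of a well-formed UTF-8 sequence starting at
// s[i] and stores its code point in cp. Returns 0 for a bad lead byte, a
// truncated or bad continuation, an overlong form, a surrogate, or a value
// above U+10FFFF.
std::size_t utf8SequenceLength(const std::string& s, std::size_t i, char32_t& cp)
{
    const unsigned char b0 = static_cast<unsigned char>(s[i]);
    std::size_t len = 0;
    char32_t minValue = 0;
    if (b0 < 0x80) { cp = b0; return 1; }
    if ((b0 & 0xE0) == 0xC0) { len = 2; cp = b0 & 0x1F; minValue = 0x80; }
    else if ((b0 & 0xF0) == 0xE0) { len = 3; cp = b0 & 0x0F; minValue = 0x800; }
    else if ((b0 & 0xF8) == 0xF0) { len = 4; cp = b0 & 0x07; minValue = 0x10000; }
    else return 0;
    if (i + len > s.size()) return 0;
    for (std::size_t k = 1; k < len; ++k)
    {
        const unsigned char b = static_cast<unsigned char>(s[i + k]);
        if ((b & 0xC0) != 0x80) return 0;
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp < minValue || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return 0;
    return len;
}

// Hypothetical pathname-to-URL-path step, assuming a UTF-8 thread encoding.
std::u32string pathnameToUrlPath(const std::string& bytes)
{
    static const char hex[] = "0123456789ABCDEF";
    std::u32string out;
    std::size_t i = 0;
    while (i < bytes.size())
    {
        char32_t cp = 0;
        const std::size_t len = utf8SequenceLength(bytes, i, cp);
        if (len != 0 && cp != U'%')
        {
            out += cp; // valid in the assumed encoding: keep as a character
            i += len;
        }
        else
        {
            // Invalid byte (or '%'): emit it as one percent-encoded byte.
            const unsigned char b = static_cast<unsigned char>(bytes[i]);
            out += U'%';
            out += static_cast<char32_t>(hex[b >> 4]);
            out += static_cast<char32_t>(hex[b & 0xF]);
            ++i;
        }
    }
    return out;
}
```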
2019-09-16 | Simplify forced memory reads | Stephan Bergmann
Change-Id: I68ea0a46bcaaadb455f2f2cc6e53950e2f26a763 Reviewed-on: https://gerrit.libreoffice.org/79003 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-09-16 | -Werror=volatile (GCC 10 trunk) | Stephan Bergmann
"error: compound assignment with ‘volatile’-qualified left operand is deprecated" in C++20 mode Change-Id: I62825237a2f4caf359f5f116ab4097ae6b9376e6 Reviewed-on: https://gerrit.libreoffice.org/78975 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
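Forced reads of this kind are typically used to page in memory, e.g. a mapped file, by touching one byte per page. The hedged sketch below shows the pattern; the function is hypothetical and is not the code in sal/osl/unx/file.cxx. It reads through a pointer to volatile while keeping the accumulator non-volatile, so no compound assignment has a volatile-qualified left operand.

```cpp
#include <cassert>
#include <cstddef>

// Reads the first byte of every pageSize-sized block. Each read goes through
// a pointer to volatile, so the compiler must perform it. The accumulator is
// an ordinary variable, so "sink ^= ..." does not trigger the C++20
// deprecation of compound assignment to volatile objects.
unsigned char touchPages(const unsigned char* p, std::size_t size, std::size_t pageSize)
{
    const volatile unsigned char* vp = p;
    unsigned char sink = 0;
    for (std::size_t i = 0; i < size; i += pageSize)
        sink ^= vp[i];
    return sink; // returned only so callers (and tests) can observe the reads
}
```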
2019-09-11 | Fix Unicode to Shift JIS/MS932 conversion data | Stephan Bergmann
These are MS932 extensions, and per <https://www.unicode.org/Public/MAPPINGS/VENDORS/MICSFT/WINDOWS/CP932.TXT> ("Table version: 2.01", "Date: 04/15/98"), U+4F92 is a mapping for 0xFA6F (and also for 0xED53, which is also an MS932 extension, and "loses" here), and U+4F9A is a mapping for 0xFA71 (and also for 0xED55, which is also an MS932 extension, and "loses" here). (And neither U+4F92 nor U+4F9A appear as mappings in <https://www.unicode.org/Public/MAPPINGS/OBSOLETE/EASTASIA/JIS/SHIFTJIS.TXT>, "Table version: 2.0", "Date: 2011 October 14 (header updated: 2015 December 02)".) This appears to be a typo dating back to 9399c662f36c385b0c705eb34e636a9aec450282 "initial import". Change-Id: I0c699675355d839e62d6e4082355a2d67472533e Reviewed-on: https://gerrit.libreoffice.org/78720 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-09-10 | Fix typos | Andrea Gelmini
Change-Id: I79f87f033eeb67d1750bb595d311d74ef3db6ce9 Reviewed-on: https://gerrit.libreoffice.org/78795 Reviewed-by: Julien Nabet <serval2412@yahoo.fr> Tested-by: Jenkins
2019-09-10 | tdf#127069 sal: preserve gid of files in the unx osl_replaceFile() | Miklos Vajna
The w32 implementation preserves all attributes of the destination file, the unx one preserved none of them. Bring the unx osl_replaceFile() closer to the w32 by preserving the gid of the destination file as a start. [ No testcase, we support building on systems where the user is part of a single group only, and it's not possible to verify the effect of this change in such environments. ] Change-Id: I722d4802df34caf71a9dc0db1a3df8b76acb9de6 Reviewed-on: https://gerrit.libreoffice.org/78789 Tested-by: Jenkins Reviewed-by: Miklos Vajna <vmiklos@collabora.com>
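Below is a minimal POSIX sketch of preserving the destination's group when replacing a file. `replaceFilePreservingGid` is hypothetical and does not claim to mirror the actual osl_replaceFile() steps. Passing `(uid_t)-1` as the owner to `chown` leaves the owner unchanged. Setting the group is best effort, since `chown` fails with EPERM when the caller is not a member of that group.

```cpp
#include <cassert>
#include <cstdio>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

// Moves aSource over aDest. If aDest existed before, tries to give the
// replaced file the old destination's group again.
bool replaceFilePreservingGid(const char* aSource, const char* aDest)
{
    struct stat aOld;
    const bool bHadDest = ::stat(aDest, &aOld) == 0;
    if (std::rename(aSource, aDest) != 0)
        return false;
    if (bHadDest && ::chown(aDest, static_cast<uid_t>(-1), aOld.st_gid) != 0)
    {
        // Best effort only: the replacement stands even if the group
        // cannot be restored (e.g. EPERM for a foreign group).
    }
    return true;
}
```

This also shows why the change has no testcase. When the user belongs to a single group, the old and new group are always the same, so the effect cannot be observed.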
2019-09-06 | Fix typo in comment (ASCII 0x42 is "B") | Stephan Bergmann
Change-Id: Iba8411cede4dc47aaa1d9d433de2606c0d66e0bf Reviewed-on: https://gerrit.libreoffice.org/78692 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-09-06 | Fixing "...." | Andrea Gelmini
Change-Id: I3424e17cfdfb563fdc5882942031deafae8689fe Reviewed-on: https://gerrit.libreoffice.org/78678 Tested-by: Jenkins Reviewed-by: Julien Nabet <serval2412@yahoo.fr>
2019-09-05 | Clean up {osl_,osl::}systemPathEnsureSeparator combo | Stephan Bergmann
Change-Id: Iafa953725c1ca8e6f3032945dc0700ae989519b9 Reviewed-on: https://gerrit.libreoffice.org/78671 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-09-05 | Clean up {osl_,osl::}systemPathMakeAbsolutePath combo | Stephan Bergmann
Change-Id: Iec4c2ff8c8239069f95fff195c49fac9f7c865d4 Reviewed-on: https://gerrit.libreoffice.org/78656 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-09-05 | Use OUString in osl_getNextDirectoryItem | Stephan Bergmann
Change-Id: Ifa1491a1af1d3c74d84ec4d6bec79fcf7a5d6bf4 Reviewed-on: https://gerrit.libreoffice.org/78653 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-09-05 | Fix osl_systemPathEnsureSeparator precondition | Stephan Bergmann
Change-Id: I0165a14f159a6c2c7bce84d1ca646435146d1da0 Reviewed-on: https://gerrit.libreoffice.org/78643 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-09-05 | Let osl_systemPathEnsureSeparator directly take an OUString | Stephan Bergmann
Change-Id: Ia9505298fe92d62d716e2c28ac0a5098c4b61121 Reviewed-on: https://gerrit.libreoffice.org/78642 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-09-05 | Fix conversion of U+0000 in ImplUnicodeToDBCS | Stephan Bergmann
...which appears to have been broken when 13824735057ef25075af8fd0ddb8f14e34c7eeb6 "#81346# - Fix for unconverted characters for DBCS encodings" moved that "if" out of surrounding "if" block. (And, for consistency, write the "if" check in the same way as the preceding one is written since 739cb04c36524c5a1bbf768dfe93624a1b2ec8b4 "#97705# Fixed mapping of Big5 EUDC points.") Change-Id: I4324197c4eba671ab6313fb89f988da102b8ffa5 Reviewed-on: https://gerrit.libreoffice.org/78627 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-09-04 | Do not exclude Unicode noncharacters from rtl_convertUnicodeToText | Stephan Bergmann
For one, that broke round-tripping with e.g. UTF-8 (see the test case added to Test::testComplex in sal/qa/rtl/textenc/rtl_textcvt.cxx) which did not treat noncharacters as invalid. For another, <https://unicode.org/faq/private_use.html#nonchar7> is meanwhile quite clear on the matter: "Q: Are noncharacters prohibited in interchange? "A: This question has led to some controversy, because the Unicode Standard has been somewhat ambiguous about the status of noncharacters. The formal wording of the definition of 'noncharacter' in the standard has always indicated that noncharacters 'should never be interchanged.' That led some people to assume that the definition actually meant 'shall not be interchanged' and that therefore the presence of a noncharacter in any Unicode string immediately rendered that string malformed according to the standard. But the intended use of noncharacters requires the ability to exchange them in a limited context, at least across APIs and even through data files and other means of 'interchange', so that they can be processed as intended. The choice of the word 'should' in the original definition was deliberate, and indicated that one should not try to interchange noncharacters precisely because their interpretation is strictly internal to whatever implementation uses them, so they have no publicly interchangeable semantics. But other informative wording in the text of the core specification and in the character names list was differently and more strongly worded, leading to contradictory interpretations. "Given this ambiguity of intent, in 2013 the UTC issued Corrigendum #9, which deleted the phrase 'and that should never be interchanged' from the definition of noncharacters, to make it clear that prohibition from interchange is not part of the formal definition of noncharacters. Corrigendum #9 has been incorporated into the core specification for Unicode 7.0. "Q: Are noncharacters invalid in Unicode strings and UTFs? "A: Absolutely not. 
Noncharacters do not cause a Unicode string to be ill-formed in any UTF. This can be seen explicitly in the table above, where every noncharacter code point has a well-formed representation in UTF-32, in UTF-16, and in UTF-8. An implementation which converts noncharacter code points between one UTF representation and another must preserve these values correctly. The fact that they are called 'noncharacters' and are not intended for open interchange does not mean that they are somehow illegal or invalid code points which make strings containing them invalid." Change-Id: I4fcc0156e3d2fd305a7c7bb0c7b3dbef846c9e64 Reviewed-on: https://gerrit.libreoffice.org/78598 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
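Here is a sketch of the two points made in the quote. First, noncharacters are easy to identify: U+FDD0..U+FDEF plus the last two code points of each of the 17 planes, 66 in total. Second, a UTF-8 encoder has no reason to reject them. The helper names are hypothetical and are not taken from sal/textenc.

```cpp
#include <cassert>
#include <string>

// The 66 noncharacters: U+FDD0..U+FDEF and U+xxFFFE/U+xxFFFF in every plane.
constexpr bool isNoncharacter(char32_t c)
{
    return (c >= 0xFDD0 && c <= 0xFDEF) || ((c & 0xFFFE) == 0xFFFE && c <= 0x10FFFF);
}

// Encodes a code point as UTF-8. Noncharacters are encoded like any other
// scalar value; only surrogates and values above U+10FFFF are rejected.
bool appendUtf8(std::string& out, char32_t c)
{
    if (c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
        return false;
    if (c < 0x80) {
        out += static_cast<char>(c);
    } else if (c < 0x800) {
        out += static_cast<char>(0xC0 | (c >> 6));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else if (c < 0x10000) {
        out += static_cast<char>(0xE0 | (c >> 12));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (c >> 18));
        out += static_cast<char>(0x80 | ((c >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((c >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (c & 0x3F));
    }
    return true;
}
```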
2019-09-04 | [API CHANGE] rtl_convertTextToUnicode behavior upon erroneous input | Stephan Bergmann
<http://udk.openoffice.org/cpp/man/spec/textconversion.html> specifies that FLAGS_UNDEFINED_ERROR, FLAGS_MBUNDEFINED_ERROR, and FLAGS_INVALID_ERROR: "Read past the [erroneous] code in the input buffer [...]" But actual behavior of rtl_convertTextToUnicode for the various rtl_TextEncoding values has been inconsistent. Some erroneous input (mostly single-byte UNDEFINED and INVALID ones) has not been consumed at all, some (multi-byte MBUNDEFINED and INVALID) has been consumed partly, and some has been consumed fully as required. However, at least since 8dd4265b9ddbd7786b6237676909eae5b540da0e "CWS-TOOLING: integrate CWS hb18", Custom8BitToUnicode in sw/source/filter/ww8/ww8par.cxx appears to rely on the broken behavior of not consuming erroneous input. (It reads the chunk of valid input with e.g. some RTL_TEXTENCODING_MS_125x that happens to exhibit the broken behavior of not consuming erroneous input, then wants to try to re-read the erroneous input with RTL_TEXTENCODING_MS_1252. For example, opening sw/qa/core/data/ww8/pass/forcepoint50-grfanchor-1.doc triggers that code. For whatever reason, the am_faksas.dot attached to <https://bz.apache.org/ooo/show_bug.cgi?id=9240#c1> "Do not show lithuanian letter 'Š'" appears to not, or at least no longer, trigger that code.) Therefore, it would be useful to have a mode in which rtl_convertTextToUnicode does not consume erroneous input. (And I plan on doing changes in sal/osl/unx/file* that would benefit from that behavior, too.) But changing rtl_convertTextToUnicode to generally not consume erroneous input would not be feasible: If calls do not set RTL_TEXTTOUNICODE_FLAGS_FLUSH, part of an erroneous input can already have been consumed by a previous call, so the current call cannot undo that. But a change that looks like it can work is to change the behavior only if RTL_TEXTTOUNICODE_FLAGS_FLUSH is set. 
In that case we can at least not consume the part of an erroneous input that has not yet been consumed by a previous call (which would necessarily have been done with RTL_TEXTTOUNICODE_FLAGS_FLUSH unset). The expectation is that code that relies on the don't-consume behavior will do only single calls with RTL_TEXTTOUNICODE_FLAGS_FLUSH set (so reliably not consume the complete erroneous input), while other code (which might do calls in a loop) will not care whether erroneous input has been consumed, anyway. This can be considered a mild form of behavioral API CHANGE (but note that the old implementation didn't exhibit the requested behavior anyway). So all implementations of rtl_convertTextToUnicode for the various rtl_TextEncoding values have been adapted to the new behavior. The only exceptions are ImplDummyToUnicode (sal/textenc/textcvt.cxx), which is a special case anyway used by RTL_TEXTENCODING_DONTKNOW, and two out of three places (marked with a "TODO" each) in ImplUTF7ToUnicode (sal/textenc/tcvtutf7.cxx), where it is hard to retrofit the expected behavior, and RTL_TEXTENCODING_UTF7 is probably not relevant for the use cases relying on the don't-consume behavior, anyway. Whether a similar change should be done for rtl_convertUnicodeToText can be examined later. Change-Id: I1ac2c4cfd99e2a0eca219f9a3855ef110b254855 Reviewed-on: https://gerrit.libreoffice.org/78584 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
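The FLUSH-dependent rule can be sketched with a deliberately tiny converter. Everything here is hypothetical: `convertAsciiToUtf16`, `ConversionResult`, and the `bFlush` parameter, which stands in for RTL_TEXTTOUNICODE_FLAGS_FLUSH. Invalid input is always treated as an error, as if FLAGS_INVALID_ERROR were set.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

struct ConversionResult
{
    std::size_t nConsumed; // number of input bytes consumed
    bool bInvalid;         // true if conversion stopped at invalid input
};

// Converts 7-bit ASCII to UTF-16 and stops at the first byte >= 0x80.
// With bFlush set, the offending byte is left unconsumed, so the caller can
// re-read exactly that input. Without it, the byte is consumed, matching the
// "read past the erroneous code" rule of the original specification.
ConversionResult convertAsciiToUtf16(const std::string& in, std::u16string& out, bool bFlush)
{
    for (std::size_t i = 0; i < in.size(); ++i)
    {
        const unsigned char b = static_cast<unsigned char>(in[i]);
        if (b >= 0x80)
            return ConversionResult{bFlush ? i : i + 1, true};
        out += static_cast<char16_t>(b);
    }
    return ConversionResult{in.size(), false};
}
```

After an error on a single flushing call, a caller in the style of Custom8BitToUnicode could re-read `in.substr(r.nConsumed)` with a fallback encoding. That is only possible because the offending byte was not consumed.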
2019-09-03 | Fix handling of invalid bytes >= 0x80 in ImplUTF7ToUnicode | Stephan Bergmann
Change-Id: I08838f9ae34a31712d7269ddaaee3fe59ece2178 Reviewed-on: https://gerrit.libreoffice.org/78562 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2019-09-01 | Resolves: ofz#16898 Direct-leak in rtl_uString_ImplAlloc | Caolán McNamara
Change-Id: I7bc11108790f8d87396bad3a2c5c2280f8f7d59a Reviewed-on: https://gerrit.libreoffice.org/78369 Tested-by: Jenkins Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com>
2019-08-30 | Blind fix for Android, take two | Stephan Bergmann
After 1928ced074260d2d40345bdf4c96767abb99bb4f "Blind fix for Android", tb24 still fails with > Linking obj/local/armeabi-v7a/liblo-native-code.so > /home/android/lo/master-android-arm/instdir/program/libsofficeapp.a(sofficemain.o):sofficemain.cxx:function soffice_main: error: undefined reference to ´sal_detail_initialize´ > clang++: error: linker command failed with exit code 1 (use -v to see invocation) Let's see if including sal/osl/unx/salinit.cxx in ANDROID/iOS builds works (even if its contents are not normally used there, including it should be harmless). Change-Id: Ifa38af8f5217a17d3ac74851b46bdb3b50cd4efd Reviewed-on: https://gerrit.libreoffice.org/78325 Tested-by: Jenkins Reviewed-by: Stephan Bergmann <sbergman@redhat.com>