path: root/writerfilter
Age | Commit message | Author
2017-11-09 | tdf#112352 ooxmlimport: ALWAYS treat 1st nextpage w/cols as cont | Justin Luth
Fix 5.4 regression from 4605bd46984125a99b0e993b71efa6edb411699f. When there are columns, if a nextpage section doesn't contain any other "page style" details, we treat it as a continuous break. If we don't, the column info becomes part of the style itself, and not just a section property. However, the very first section is troublesome - by definition it DOES contain page style details, and so if the document starts with columns, the default style would gain the column attribute. Usually that results in a mess, so let's make sure that we avoid that also in the case where headers/footers are defined. Change-Id: I7e08a9218e4304206579ed064bc92c9604d4470e Reviewed-on: https://gerrit.libreoffice.org/44505 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Justin Luth <justin_luth@sil.org> Tested-by: Justin Luth <justin_luth@sil.org>
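The rule described in this message can be sketched as a small decision function. This is only an illustration under stated assumptions: `SectionInfo`, `BreakType` and `effectiveBreakType` are hypothetical names, not writerfilter types, and the real importer looks at more state than these four fields.

```cpp
#include <cassert>

enum class BreakType { Continuous, NextPage };

// Hypothetical, simplified view of a section (not a writerfilter type).
struct SectionInfo
{
    BreakType breakType;
    int columnCount;          // 1 means "no columns"
    bool hasPageStyleDetails; // e.g. headers/footers are defined
    bool isFirstSection;
};

// A nextpage section with columns is treated as continuous when it carries no
// other page style details, and always when it is the very first section.
BreakType effectiveBreakType(const SectionInfo& section)
{
    assert(section.columnCount >= 1); // the sketch assumes at least one column
    const bool hasColumns = section.columnCount > 1;
    if (section.breakType == BreakType::NextPage && hasColumns
        && (section.isFirstSection || !section.hasPageStyleDetails))
    {
        return BreakType::Continuous;
    }
    return section.breakType;
}
```

With this sketch, a first section that has both columns and headers/footers is still reported as continuous, which is the case the commit subject calls out.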
2017-11-07 | tdf#113550 RTF import: fix incorrect text indent | Miklos Vajna
Left indent was set to non-zero in the style, but direct formatting set it back to zero. Teach deduplication to remove the NS_ooxml::LN_CT_PPrBase_ind SPRM itself in case the last attribute was removed. Change-Id: I01b202f0241b02816b2b392326737b1150caffc2 Reviewed-on: https://gerrit.libreoffice.org/44385 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org>
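A rough sketch of the deduplication idea follows. It is not the writerfilter implementation: `AttributeMap`, `SprmMap` and `dedupAgainstStyle` are hypothetical names, and plain `std::map` stands in for the real SPRM container. The point it illustrates is the last step, dropping the SPRM itself once its final attribute has been removed.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-ins: an SPRM (e.g. the indent property) holds named attributes.
using AttributeMap = std::map<std::string, int>;
using SprmMap = std::map<std::string, AttributeMap>;

// Remove every direct-formatting attribute that merely repeats the style, and
// drop the SPRM itself once it has no attributes left.
void dedupAgainstStyle(SprmMap& direct, const SprmMap& style)
{
    for (auto sprm = direct.begin(); sprm != direct.end();)
    {
        auto styleSprm = style.find(sprm->first);
        if (styleSprm != style.end())
        {
            AttributeMap& attrs = sprm->second;
            for (auto attr = attrs.begin(); attr != attrs.end();)
            {
                auto styleAttr = styleSprm->second.find(attr->first);
                if (styleAttr != styleSprm->second.end()
                    && styleAttr->second == attr->second)
                    attr = attrs.erase(attr);
                else
                    ++attr;
            }
        }
        // Assumption of this sketch: an empty SPRM would still act as direct
        // formatting with default values, so it has to go as well.
        if (sprm->second.empty())
            sprm = direct.erase(sprm);
        else
            ++sprm;
    }
}

// Usage: the direct indent only repeats the style, so nothing is left over.
void dedupExample()
{
    SprmMap style;
    style["ind"]["left"] = 720;
    SprmMap direct;
    direct["ind"]["left"] = 720;
    dedupAgainstStyle(direct, style);
    assert(direct.empty());
}
```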
2017-11-06 | sw: prefix members of SwScanner | Miklos Vajna
Change-Id: I441876e73793e07d78f1eadb2b21c282845298c3 Reviewed-on: https://gerrit.libreoffice.org/44345 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org>
2017-11-06 | loplugin:constparams in various(2) | Noel Grandin
Change-Id: I533a7eb724b15e168a28dc92cd5962a39bc96e7c Reviewed-on: https://gerrit.libreoffice.org/44313 Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk> Tested-by: Noel Grandin <noel.grandin@collabora.co.uk>
2017-11-03 | Only downcast to OOXMLFastContextHandlerShape when actually necessary | Stephan Bergmann
After bd3c5c4c234e3dc6b89cd235321945a41a08d562 "[API CHANGE] tdf#65393 Import signature line images from ooxml", UBSan CppunitTest_chart2_export had started to fail with
> writerfilter/source/ooxml/OOXMLFastContextHandler.cxx:1898:25: runtime error: downcast of address 0x61200070a440 which does not point to an object of type 'writerfilter::ooxml::OOXMLFastContextHandlerShape'
> 0x61200070a440: note: object is of type 'writerfilter::ooxml::OOXMLFastContextHandlerWrapper'
> 0e 10 00 20 50 86 4a 00 a2 7f 00 00 01 00 00 00 be be be be 00 00 00 00 00 00 00 00 00 00 00 00
> ^~~~~~~~~~~~~~~~~~~~~~~
> vptr for 'writerfilter::ooxml::OOXMLFastContextHandlerWrapper'
Change-Id: I028ef619766466e8cd9bb0ca09174b926fc6d23c
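The class of bug UBSan reports here can be illustrated with a minimal sketch. The types below are hypothetical stand-ins for the context handlers, and the checked `dynamic_cast` is just one way to avoid an invalid downcast; it is not necessarily what this commit does (the subject only says the downcast now happens when actually necessary).

```cpp
#include <cassert>

// Hypothetical stand-ins for the handler hierarchy.
struct Handler
{
    virtual ~Handler() = default;
};
struct ShapeHandler : Handler
{
    bool hasSignatureLine = true;
};
struct WrapperHandler : Handler
{
};

// Only treats the handler as a ShapeHandler when it really is one.
// An unchecked static_cast<ShapeHandler*> on a WrapperHandler would be
// undefined behaviour, which is what the sanitizer flags.
bool isSignatureLine(Handler* handler)
{
    assert(handler != nullptr);
    if (auto* shape = dynamic_cast<ShapeHandler*>(handler))
        return shape->hasSignatureLine;
    return false;
}
```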
2017-11-03 | [API CHANGE] tdf#65393 Import signature line images from ooxml | Samuel Mehrbrodt
showing whether the signature behind the signature line is valid or not. Change-Id: Ia6cca62812019f26d55d234cac767a9b4b7c8175 Reviewed-on: https://gerrit.libreoffice.org/40980 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Samuel Mehrbrodt <Samuel.Mehrbrodt@cib.de>
2017-11-02 | sw: ODF import: default as-char shapes to vertical-pos="top" | Michael Stahl
The problem is that we don't render ShapesWithWrapping.odt the same as Word does: https://beta.opendocumentformat.org/rendercompare/upload/223/86/191/1 The first shape in the file is anchored "as-char" and has no style:vertical-rel or style:vertical-pos attribute affecting it. If Word wrote either style:vertical-rel="baseline" or style:vertical-pos="top" explicitly, the rendering in LO would be the same. So the problem is that, for drawing shapes (note that text frames, images, and embedded objects are handled differently), LO's default vertical alignment is different: it is hard-coded in SwShapeDescriptor_Impl::GetVOrient() as SwFormatVertOrient(0, text::VertOrientation::NONE, text::RelOrientation::FRAME). This effectively positions as-char shapes *below* the baseline, which, while technically allowed, isn't really a good default. So fix this by making the default alignment dependent on the anchor type, so that as-char shapes sit on top of the baseline. The ODF filter sets the anchor type before inserting the shape in XMLTextShapeImportHelper::addShape(); however, as it turns out, the various MSO filters insert the shape before setting the anchor, which means the new default in SwXShape has an unwanted effect on them, as inserting the shape causes the default to be created. This requires changes to VML import to always set the VertOrient property, and to RTF import to set the anchor type before inserting. The DrawingML import is unaffected as it already sets VertOrient for every non-as-char shape. The testDmlTextshape "dml-textshape.docx" test still fails, but it turns out that the change in alignment for this test document is a bugfix, as it now has the same vertical alignment as in Word, so adapt the test. Change-Id: Ifcabd96a037515f7803f5474ec995f968b3b4de1
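Why the order of "set anchor" and "insert shape" matters can be shown with a small model. Everything here is hypothetical (`Shape`, `AnchorType`, `VertOrientation`, `defaultVertOrientation`); the only thing taken from the message is that the default depends on the anchor type and is created when the shape is inserted.

```cpp
#include <cassert>

enum class AnchorType { AsCharacter, AtParagraph, AtPage };
enum class VertOrientation { None, Top };

// Default alignment now depends on the anchor: as-char shapes sit on top of
// the baseline, everything else keeps the old default.
VertOrientation defaultVertOrientation(AnchorType anchor)
{
    return anchor == AnchorType::AsCharacter ? VertOrientation::Top
                                             : VertOrientation::None;
}

// Hypothetical shape whose default orientation is fixed at insertion time.
struct Shape
{
    AnchorType anchor;
    VertOrientation vertOrient;
    bool inserted;

    explicit Shape(AnchorType initialAnchor)
        : anchor(initialAnchor), vertOrient(VertOrientation::None), inserted(false)
    {
    }

    void insert()
    {
        assert(!inserted); // a shape is inserted once in this sketch
        vertOrient = defaultVertOrientation(anchor);
        inserted = true;
    }
};
```

In this model, a filter that inserts first and changes the anchor afterwards keeps the default that belonged to the old anchor, which is why such a filter would have to set the orientation explicitly or set the anchor before inserting.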
2017-10-31 | tdf#113408 RTF import style dedup: separate paragraph and character handling | Miklos Vajna
The problem was that the paragraph in question had no left margin, while it should have one. The reason for this is that the style deduplication logic took both the current paragraph and character style, but the direct formatting only contained character formatting, so it tried to emit the default values for all paragraph formatting. This started to show up after commit 657c6cc3acec0528209a8584b838cd6de581c437 (tdf#104228 RTF import: fix override of style left/right para margin, 2016-12-13), but the root cause is much older, it was there since commit 321d7ec2071472b3765a00806715e7ad9f8a306f (fdo#82078 RTF import: fix bold text spilling over to non-bold text, 2014-09-06). Change-Id: If03240a85cc9de89afe9111c2d29de2672e407bf Reviewed-on: https://gerrit.libreoffice.org/44097 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org>
2017-10-31 | tdf#82065 strict docx import: add support for LN_CT_Ind_start | Justin Luth
Change-Id: Iaddd9e852388f5bb076c4bc6f8eee8a256581033 Reviewed-on: https://gerrit.libreoffice.org/43985 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Justin Luth <justin_luth@sil.org>
2017-10-26 | related tdf#78508 and n#793262: import w:tcMar_start/end | Justin Luth
Although 2013 commit 60ec497e0e91354a616978be531d15d3efa3f559 added support for the other tcMar items, it omitted _start and _end (perhaps because they caused unit test failures). The document in bug 78508 proves that these are needed. Testing whether the cell spacing matches the default table spacing should occur before adjusting for MSO compatibility. This fixes the three unit tests that mysteriously failed when adding _start/_end support. Unfortunately, these two fixes could not be committed separately - the unit test fails unless both parts are included. I couldn't figure out why. Change-Id: I9507da48b629b9618c5ee790bf0088ce82fc5692 Reviewed-on: https://gerrit.libreoffice.org/43432 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk>
2017-10-26 | tdf#108947 - Fixed regression | Thomas Beck
Headers/footers that are specifically for left/right pages are handled the old way again; the previous fix went too far. Change-Id: I0f9e8d23022300a06bd3fb45054cca1b03cf096f Reviewed-on: https://gerrit.libreoffice.org/43749 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Bartosz Kosiorek <gang65@poczta.onet.pl>
2017-10-24 | Fix typo | Andrea Gelmini
Change-Id: I143e8df0e16ad921777b9caabde8e1c3f8bd61df Reviewed-on: https://gerrit.libreoffice.org/43788 Reviewed-by: Adolfo Jayme Barrientos <fitojb@ubuntu.com> Tested-by: Adolfo Jayme Barrientos <fitojb@ubuntu.com>
2017-10-24 | tdf#113202 RTF import: fix lack of expected contextual spacing | Miklos Vajna
Upper, lower and contextual spacing are all stored in SvxULSpaceItem, so if after spacing is set as direct formatting, contextual spacing has to be set directly as well (having it in the paragraph style has no effect). Change-Id: Ie331c7561de7f2f16776a1613717e38fa083a541 Reviewed-on: https://gerrit.libreoffice.org/43735 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk>
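The consequence of storing all three values in one item can be sketched as follows. `ULSpace` and `withDirectLower` are hypothetical; the sketch assumes that a direct item replaces the style's item as a whole, so any field the direct item does not carry over is lost.

```cpp
#include <cassert>

// Hypothetical stand-in for an item that stores all three values together.
struct ULSpace
{
    long upper;
    long lower;
    bool contextual;
};

// Builds the direct-formatting item for an explicit "space after" value.
// It starts from the style's item so that contextual spacing is set directly
// as well, instead of silently falling back to "off".
ULSpace withDirectLower(const ULSpace& fromStyle, long directLower)
{
    assert(directLower >= 0); // the sketch assumes non-negative spacing
    ULSpace direct = fromStyle;
    direct.lower = directLower;
    return direct;
}
```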
2017-10-23 | loplugin:includeform: writerfilter | Stephan Bergmann
Change-Id: Ic42219a7d733897ba47b26ffa0659c524798878e
2017-10-23 | overload std::hash for OUString and OString | Noel Grandin
no need to explicitly specify it anymore Change-Id: I6ad9259cce77201fdd75152533f5151aae83e9ec Reviewed-on: https://gerrit.libreoffice.org/43567 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
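As a generic illustration of the technique (not the actual OUString code), a `std::hash` specialization for a user-defined string type lets unordered containers be declared without naming a hash functor. `Name` below is a hypothetical stand-in type.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for a project-specific string class.
struct Name
{
    std::u16string value;
    bool operator==(const Name& other) const { return value == other.value; }
};

namespace std
{
// Specializing std::hash for a user-defined type is permitted by the standard.
template <> struct hash<Name>
{
    std::size_t operator()(const Name& name) const noexcept
    {
        return std::hash<std::u16string>()(name.value);
    }
};
}

// Usage: no third template argument is needed any more.
void hashExample()
{
    std::unordered_map<Name, int> counts;
    counts[Name{u"w:ind"}] += 1;
    counts[Name{u"w:ind"}] += 1;
    assert(counts.size() == 1 && counts[Name{u"w:ind"}] == 2);
}
```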
2017-10-21 | loplugin:redundantcast handle dynamic_cast | Noel Grandin
Change-Id: I7855c76e820efce96778b1c19ec71dffcc4b4abb Reviewed-on: https://gerrit.libreoffice.org/43621 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Stephan Bergmann <sbergman@redhat.com>
2017-10-20 | Resolves: tdf#113230 crash in finishParagraph | Caolán McNamara
Change-Id: I94535a51a87be097ff7356edff935877b42c3272 Reviewed-on: https://gerrit.libreoffice.org/43598 Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com>
2017-10-19 | tdf#87533 Fixed initialization of writing mode for paragraph | Serge Krot
During parsing of the docx, a paragraph without w:bidi should take this value from the style or from the default paragraph properties. Change-Id: Ie33f0d1cd3551c4053a47e6faf7dcac71765db65 tdf#87533 explicitly set writing mode value based on default properties Change-Id: I3fcf514a901f0630d749ba0ddb6361d6db3ce1b5 Reviewed-on: https://gerrit.libreoffice.org/42895 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Thorsten Behrens <Thorsten.Behrens@CIB.de>
2017-10-19 | related tdf#87533: handle LN_EG_SectPrContents_bidi correctly | Serge Krot
Change-Id: I90d220550d24fb964cf4e528a1f506033f05de95 Reviewed-on: https://gerrit.libreoffice.org/42896 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Thorsten Behrens <Thorsten.Behrens@CIB.de>
2017-10-18 | Fix typo | Andrea Gelmini
Change-Id: Ie29a05fec90c0d81b4a0399505b0a6761dfdef69 Reviewed-on: https://gerrit.libreoffice.org/43463 Reviewed-by: Julien Nabet <serval2412@yahoo.fr> Tested-by: Julien Nabet <serval2412@yahoo.fr>
2017-10-18 | tdf#109306 ooxmlimport: consider table sizes < 10% | Justin Luth
Change-Id: I336d5a498f4f4523e03b1316b7adaca21df4de82 Reviewed-on: https://gerrit.libreoffice.org/43385 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Justin Luth <justin_luth@sil.org>
2017-10-17 | tdf#79272 ooxmlimport: support strict dxa tblWidth | Justin Luth
This patch is also required for tdf#78508. ECMA-376-1:2016 indicates that MeasurementOrPercent is a union of ST_DecimalNumberOrPercent and ST_UniversalMeasure. For the elements that use MeasurementOrPercent, that is represented as 1/50 of a percent or in Twips. This patch adds support for the ST_UniversalMeasure component of the union. Change-Id: I1bac30707f118a3d1f0eab3c27f8dcec96470592 Reviewed-on: https://gerrit.libreoffice.org/43384 Tested-by: Justin Luth <justin_luth@sil.org> Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com> Reviewed-by: Justin Luth <justin_luth@sil.org>
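To make the ST_UniversalMeasure half of the union concrete, here is a minimal sketch that converts strings such as "1in" or "72pt" to twips (1/1440 inch). The function name is hypothetical and this is not the importer's code; it relies on `std::strtod`, which is more permissive than the schema pattern (it also accepts exponents, for example) and depends on the C locale for the decimal point.

```cpp
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <string>

// Converts an ST_UniversalMeasure-style string (number followed by one of
// mm, cm, in, pt, pc, pi) to twips. Returns false if the text has no leading
// number or an unknown unit; 'twips' is left untouched in that case.
bool universalMeasureToTwips(const std::string& text, long& twips)
{
    char* end = nullptr;
    const double number = std::strtod(text.c_str(), &end);
    if (end == text.c_str())
        return false; // no leading number

    const std::string unit(end);
    double twipsPerUnit = 0.0;
    if (unit == "in")
        twipsPerUnit = 1440.0;
    else if (unit == "pt")
        twipsPerUnit = 20.0; // 72 pt per inch
    else if (unit == "pc" || unit == "pi")
        twipsPerUnit = 240.0; // 1 pica = 12 pt
    else if (unit == "cm")
        twipsPerUnit = 1440.0 / 2.54;
    else if (unit == "mm")
        twipsPerUnit = 1440.0 / 25.4;
    else
        return false;

    twips = std::lround(number * twipsPerUnit);
    return true;
}

// Usage: one inch and 72 points are both 1440 twips.
void universalMeasureExample()
{
    long twips = 0;
    assert(universalMeasureToTwips("1in", twips) && twips == 1440);
    assert(universalMeasureToTwips("72pt", twips) && twips == 1440);
}
```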
2017-10-17 | tdf#104079 RTF import: fix handling fields inside TOC fields | Miklos Vajna
The marker trick is not needed for these, but the paragraph margins are lost when using it, so avoid it. Change-Id: I3fc9644cb85138b5473cb1478196ae8538041fb1 Reviewed-on: https://gerrit.libreoffice.org/43446 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org>
2017-10-16 | Inconsistent block / missing TblBorders_right | Justin Luth
Likely a copy/paste error, imported from OOo. Change-Id: I29713309164170e86f8e793e2f15a601ce2b5da7 Reviewed-on: https://gerrit.libreoffice.org/43431 Reviewed-by: Justin Luth <justin_luth@sil.org> Tested-by: Justin Luth <justin_luth@sil.org>
2017-10-11 | Fix typo | Andrea Gelmini
Change-Id: Iab17008c8cc122176fb51b8766540d59cd681b35 Reviewed-on: https://gerrit.libreoffice.org/43316 Reviewed-by: Julien Nabet <serval2412@yahoo.fr> Tested-by: Julien Nabet <serval2412@yahoo.fr>
2017-10-10 | tdf#112211 RTF import: fix unwanted direct formatting for other indents | Miklos Vajna
Commit 56a695fddb915bcba13b088b5b2b4e0841d4acbc (tdf#112211 RTF import: fix unwanted direct formatting for left indents, 2017-09-26) fixed left indents, and given that it was a regression fix, left the other indent types untouched. As it has been pointed out in the bug comment, the original bugdoc actually needs the other indent types removed as well, so let's do that. Change-Id: Ia4ea7e2214b7df27536f46b046f90bd703c107be Reviewed-on: https://gerrit.libreoffice.org/43303 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org>
2017-10-10 | tdf#66398 Parse and output permissions for DOCX using bookmarks | Serge Krot
Change-Id: Id08998ae775c5f383edc4bf0410d16f88c70dfd6 Reviewed-on: https://gerrit.libreoffice.org/43275 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Thorsten Behrens <Thorsten.Behrens@CIB.de>
2017-10-10 | tdf#90789: DOCX paragraphs in shapes like frames do not belong to section. | Vasily Melenchuk
Change-Id: I60644bd62e2a2ac97a97f0a492b146dc69456cd6 Reviewed-on: https://gerrit.libreoffice.org/43291 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Thorsten Behrens <Thorsten.Behrens@CIB.de>
2017-10-08 | Make Color a forward declaration | Chris Sherlock
Change-Id: Ib28833555661b119de8e967b05e3c8691fca826a Reviewed-on: https://gerrit.libreoffice.org/43227 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Chris Sherlock <chris.sherlock79@gmail.com>
2017-10-06 | related tdf#66398 remove useless breaks | Serge Krot
Change-Id: I39caad06bcd645d582c180195a839113759b57a1 Reviewed-on: https://gerrit.libreoffice.org/43159 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Thorsten Behrens <Thorsten.Behrens@CIB.de>
2017-10-06 | tdf#66398 Import/export docx document protection properties | Serge Krot
Added: + import/export of all doc protection properties + unit test Change-Id: I7b65cf4f5c7add2a96fef407c243081fcc2b6d8d Reviewed-on: https://gerrit.libreoffice.org/43156 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Thorsten Behrens <Thorsten.Behrens@CIB.de>
2017-10-05 | crashtesting: crash on import of abi3007-4.rtf | Caolán McNamara
which started happening at... commit 56a695fddb915bcba13b088b5b2b4e0841d4acbc Date: Tue Sep 26 09:13:05 2017 +0200 tdf#112211 RTF import: fix unwanted direct formatting for left indents Change-Id: Id3e8c4452238b48495b1014eff14cdaddcb047ab Reviewed-on: https://gerrit.libreoffice.org/43172 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Caolán McNamara <caolanm@redhat.com> Tested-by: Caolán McNamara <caolanm@redhat.com>
2017-10-05 | writerfilter: consistently use "" and <> in include directives | Mike Kaganski
[cpp.include] says that includes in <> are searched in implementation-defined places; includes in "" are searched in other implementation-defined places and, if that is unsuccessful, then as if they were in <>. The MS Visual Studio IDE uses paths configured for the project for includes in <>, and starts with the current file's path for includes in "". So, using <> for includes in the current source file's directory that is missing from the configured project paths makes the IDE show unsuccessful includes and unknown identifiers. This fixes includes in the writerfilter source directory. Change-Id: I0bc1147aa68c305afd0c119418f07b655783a466 Reviewed-on: https://gerrit.libreoffice.org/43138 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Thorsten Behrens <Thorsten.Behrens@CIB.de> Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com>
2017-10-04 | add << operator for css::uno::Exception | Noel Grandin
Change-Id: Ia23dafd07133779144965682df3b7125a3214235 Reviewed-on: https://gerrit.libreoffice.org/43046 Reviewed-by: Stephan Bergmann <sbergman@redhat.com> Tested-by: Jenkins <ci@libreoffice.org>
2017-10-04 | loplugin:finalclasses in writerfilter | Noel Grandin
Change-Id: I590de2fd15c630d5ea5e706ce9421ee8bfe19db7 Reviewed-on: https://gerrit.libreoffice.org/43116 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
2017-10-03 | tdf#112507 RTF import: fix too narrow table cell | Miklos Vajna
Commit e6ec0794858df1444f43659b568119bf126a90e6 (tdf#104937 RTF import: \trwWidthA is an absolute value, 2017-08-29) changed the handling of the fake empty cell at the end of table rows so that the parameter of the control word is an absolute, not a relative value. Turns out this wasn't correct, the DOCX equivalent of that bugdoc shows that the parameter is a relative value after all. The RTF spec also talks about a "width", which is assumed to be a relative value. So fix that bug in a different way again (by making sure that this additional fake cell contributes to the total width of the table, so column separators are counted correctly), this time with fewer side effects. Change-Id: Ic64fd3a6abae8e0398e8e77123f0473d73f0c4b0 Reviewed-on: https://gerrit.libreoffice.org/43063 Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk> Tested-by: Jenkins <ci@libreoffice.org>
2017-10-03 | new loplugin:blockblock | Noel Grandin
Change-Id: I7b68b70fa4c7234e8882f7627026959a596968fd Reviewed-on: https://gerrit.libreoffice.org/43025 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Noel Grandin <noel.grandin@collabora.co.uk>
2017-09-29 | remove some SAL_WARN in DomainMapper_Impl | Noel Grandin
it's obviously not a real problem, because higher up code calls this even if it doesn't intend to use the result, and in the places where it does intend to use the result, it warns again, so this warning is redundant. And it's the 3rd largest number of warnings in our logs. Change-Id: I1a6c40bc99a3252594f87e121a81c661686c5348
2017-09-29 | Revert "writerfilter: convert loops to range-based-for" | Miklos Vajna
This reverts commit 25cd067a82742210793e39708cc1de9ff84692a7, as it broke CppunitTest_sw_ooxmlexport4. The comment above the change suggests that perhaps the usage of indexes was intentional to avoid the usage of invalidated iterators.
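The iterator-invalidation concern mentioned here can be shown with a generic example; it is not the reverted writerfilter loop. `expandStarred` is a hypothetical function that appends to the vector it is walking: `push_back` may reallocate, which would invalidate the iterators a range-based for keeps internally, whereas indexes stay valid.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// For every entry ending in '*', appends the entry without the star.
// The vector grows while it is being traversed, so the loop uses indexes and
// re-reads size() on every iteration; appended entries are visited too.
std::vector<std::string> expandStarred(std::vector<std::string> items)
{
    for (std::size_t i = 0; i < items.size(); ++i)
    {
        if (!items[i].empty() && items[i].back() == '*')
        {
            // Copy first: push_back may reallocate and move items[i].
            const std::string base = items[i].substr(0, items[i].size() - 1);
            items.push_back(base);
        }
    }
    return items;
}

// Usage: "a*" produces an extra "a" at the end.
void expandExample()
{
    const std::vector<std::string> input{"a*", "c"};
    const std::vector<std::string> result = expandStarred(input);
    assert(result.size() == 3 && result[2] == "a");
}
```

Writing the same loop as `for (const auto& item : items)` with a `push_back` in the body would be undefined behaviour as soon as the vector reallocates.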
2017-09-28 | writerfilter: convert loops to range-based-for | Serge Krot
Change-Id: I424fd1bf8eef7112a8cff54ab46a07bb41596ca5 Reviewed-on: https://gerrit.libreoffice.org/42901 Reviewed-by: Thorsten Behrens <Thorsten.Behrens@CIB.de> Tested-by: Thorsten Behrens <Thorsten.Behrens@CIB.de>
2017-09-27 | tdf#75757 comphelper: avoid STL inheritance in SequenceAsHashMap | Miklos Vajna
Change-Id: I5c7d107a05deb06749b4d04246ba183adfafb14d Reviewed-on: https://gerrit.libreoffice.org/42829 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk>
2017-09-26 | tdf#112446 ooxmlimport: Orient=NONE when distance is given | Justin Luth
Prior to commit 9920a0bf9d783978cd6f7b97f7528d8aa2571143 the style could only contain the default of NONE. So when a position was specified, it was always paired with HoriOrient == NONE. So it never caused problems until that commit when the Frame's style orientation started overriding the unset paragraph default. When a position is specified, that needs to be paired with an orientation of NONE in order to take effect. Change-Id: Iab0057810270ba708a8855c2ec6db291cef17cfb Reviewed-on: https://gerrit.libreoffice.org/42499 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Miklos Vajna <vmiklos@collabora.co.uk>
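The pairing rule from this message, reduced to a sketch: an explicit position only takes effect together with an orientation of NONE, so it has to override whatever the frame style says. `HoriOrient` and `resolveHoriOrient` are hypothetical names, not the importer's API.

```cpp
#include <cassert>

enum class HoriOrient { None, Left, Center, Right };

// If the paragraph specifies an explicit position, force the orientation to
// None; otherwise the orientation inherited from the frame style is kept.
HoriOrient resolveHoriOrient(HoriOrient styleOrient, bool hasExplicitPosition)
{
    return hasExplicitPosition ? HoriOrient::None : styleOrient;
}

// Usage: a style that centers the frame must not defeat an explicit position.
void horiOrientExample()
{
    assert(resolveHoriOrient(HoriOrient::Center, true) == HoriOrient::None);
    assert(resolveHoriOrient(HoriOrient::Center, false) == HoriOrient::Center);
}
```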
2017-09-26 | tdf#112211 RTF import: fix unwanted direct formatting for left indents | Miklos Vajna
Commit f528f9499bd91b700c549575e88fa102cfffede9 (tdf#106953 RTF import: fix missing paragraph left margin, 2017-05-16) fixed a problem around inheriting indents from numbering styles vs paragraph styles, but it turns out that document was rather special and in general the old behavior was correct. So fix that bug in a different way again, this time with fewer side effects. The trick is that in case the level numbers group in a list definition ends with \u59 (instead of an ASCII ';'), then that group is considered to be invalid by Word. RTF import already was aware of this, but it wasn't known that when this invalid state is reached, that also means that the indents of the list level definitions should be ignored. So in general not putting direct formatting on a paragraph is a good thing: that way in case the paragraph style and the numbering style both have indent info, then the numbering style wins, and that is what we want -- but in case \u59 appears in the list definition, then the indentation from the numbering style should be ignored. So fix up the tokenizer to import the indentation from list levels in general, ignore it for invalid list levels, and then we can remove the direct formatting from the paragraphs, which fixes this bug and keeps the old one fixed as well. This required fixing up two poor testcases, which tested paragraph properties, but in fact are interested in the real source of indentation, which is now the numbering style. Visually both bugdocs are unchanged. Change-Id: I6390aa870659a8ad02ba5512d84dea34dba29e9f
2017-09-26 | Map RTF codepage 0 to osl_getThreadTextEncoding() | Stephan Bergmann
...instead of RTL_TEXTENCODING_DONTKNOW. (A file actually using that codepage is sw/qa/core/data/rtf/pass/CVE-2014-6357.rtf, processed by CppunitTest/sw_filters_test, where it caused OStringToOUString to be called with RTL_TEXTENCODING_DONTKNOW from writerfilter::rtftok::RTFDocumentImpl::resolveChars.) Change-Id: I41081c5df5c3aa80b4f1c7d52b158e73ef68cf38
2017-09-25 | RTF import: split this call into simpler ones | Miklos Vajna
Hopefully with this it's easier to see which is the usual and which one is the exceptional case. Change-Id: Iac1b49b2a4f2b909db46155d1ff10d2ba99fd655
2017-09-24 | Map Windows code page 42 to RTL_TEXTENCODING_SYMBOL | Stephan Bergmann
<https://msdn.microsoft.com/en-us/library/windows/desktop/dd374130(v=vs.85).aspx> "WideCharToMultiByte function" suggests that there now is CP_SYMBOL, "Windows 2000: Symbol code page (42)." And a little test program on Windows indicates that our RTL_TEXTENCODING_SYMBOL is working the same way as CP_SYMBOL, where MultiByteToWideChar maps 00..1F to U+0000..1F and 20..FF to U+F020..F0FF.
At least CppunitTest_writerfilter_rtftok, when testing writerfilter/qa/cppunittests/rtftok/data/pass/EDB-18940-1.rtf, goes into case RTF_FCHARSET in RTFDocumentImpl::dispatchValue (writerfilter/source/rtftok/rtfdispatchvalue.cxx) with nParam matching aRTFEncodings[2] (i.e., a mapping from charset 2 to codepage 42, see writerfilter/source/rtftok/rtfcharsets.cxx), then passes 42 into rtl_getTextEncodingFromWindowsCodePage and obtains an unhelpful RTL_TEXTENCODING_DONTKNOW.
testFdo72031 (sw/qa/extras/rtfexport/rtfexport2.cxx, CppunitTest_sw_rtfexport2) needed to be adapted, as the circled plus from the Symbol font is now internally represented as U+F0C5, not (somewhat bogusly) as U+00C5 (aka LATIN CAPITAL LETTER A WITH RING ABOVE). But, when displayed with the Symbol font, the glyph that is actually shown remains the circled plus.
Turns out changing rtl_getTextEncodingFromWindowsCodePage would start to make CppunitTest_sw_rtfimport fail:
Sep 20 15:49:24 <sberg> vmiklos, with <https://gerrit.libreoffice.org/#/c/42477/>, testN823675 (sw/qa/extras/rtfimport/rtfimport.cxx) fails, the aFont.Name is not "Symbol"; sw/qa/extras/rtfimport/data/n823675.rtf contains a \fonttbl that specifies \f3 to have \fcharset2 (i.e., symbol font) and fontname "Symbol". However, RTFDocumentImpl::checkUnicode (writerfilter/source/rtftok/rtfdocumentimpl.cxx) converts m_aHexBuffer (containing "Symbol;") with nCurrentEncoding apparently being the encoding specified by \fcharset2 (i.e., now RTL_TEXTENCODING_SYMBOL instead of old RTL_TEXTENCODING_DONTKNOW), so the resulting OUString is garbage (instead of the byte-for-byte conversion to Unicode "Symbol;" that RTL_TEXTENCODING_DONTKNOW would do there); do you know whether such \fonttbl fontnames should actually be interpreted in the given \fcharset?
Sep 20 15:49:24 <IZBot> gerrit: »Map Windows code page 42 to RTL_TEXTENCODING_SYMBOL« by Stephan Bergmann for master [NEW]
Sep 20 15:51:15 <vmiklos> sberg: let me check if the spec covers that
Sep 20 15:54:29 <mst_> sberg: i think the name is typically encoded in the font's encoding but probably they have to make a (likely undocumented) exception for symbol encoding
Sep 20 15:57:46 <vmiklos> sberg: the spec only says that \fcharset is about the encoding of the content using that font, i don't see it described what would be the encoding of the font name itself
Sep 20 15:58:51 <vmiklos> sberg: i'm not sure about if that encoding should or should not affect the encoding of the font name in general, but indeed at least for 2 (symbol encoding) you're right, Word doesn't encoding the font name with that encoding, either.
Sep 20 15:59:30 <sberg> vmiklos, mst_, at the top of page 14 of Word2007RTFSpec9.docx I see "Note that runs of text marked with a particular font index (see \fN in the Font Table section) use the codepage for that font as given by \cpgN or implied by \fcharsetN, unless they use Unicode RTF described in the following section." Would that match what mst_ says?
Sep 20 15:59:33 <vmiklos> so if it helps you case to handle at as e.g. ascii, just for that encoding, i think there would be no problem with that.
Sep 20 16:00:07 <vmiklos> sberg: that still talks about the content using the font, not the strings (font names) in the font table itself, i think.
Sep 20 16:01:17 <sberg> vmiklos, what's the control word to select such a font, also \fN? I don't see any such in n823675.rtf
Sep 20 16:02:16 <mikekaganski> loircbot: e.g. \af3
Sep 20 16:02:31 <mikekaganski> sberg: ^
Sep 20 16:02:47 <mst_> 04d5a280beeeb6e056df68395dc9c3b3a674361b
Sep 20 16:02:50 <IZBot> core - related: fdo#77979: writerfilter RTF import: read encoded font name - http://cgit.freedesktop.org/libreoffice/core/commit/?id=04d5a280beeeb6e056df68395dc9c3b3a674361b
Sep 20 16:02:52 <mst_> sberg: ^
Sep 20 16:04:05 <sberg> mst_, thanks; so there's likely an (implicit?) exception for \fcharset2, as you say
Sep 20 16:04:33 <mst_> that's most plausible, our own font code is full of exceptions for "symbol fonts" too
Sep 20 16:05:19 <sberg> mikekaganski, ENOCONTEXT
Sep 20 16:05:36 <mikekaganski> sberg: [17:01:16] sberg: vmiklos, what's the control word to select such a font, also \fN? I don't see any such in n823675.rtf
Sep 20 16:06:32 <sberg> mikekaganski, so you say selection is done with \af3 instead of \f3?
Sep 20 16:06:40 <mikekaganski> sberg: yes, in that case
Sep 20 16:07:34 <mst_> i think there are several different keywords that apply fonts, but can't remember the whole list
Sep 20 16:08:10 <mst_> \fN shoudl be one of them though
Sep 20 16:22:18 <sberg> vmiklos, so who generated that sw/qa/extras/rtfimport/data/n823675.rtf, was it manually created and lacks a \cpgN before "Symbol"?
Sep 20 16:29:17 <sberg> vmiklos, (after further reading of the RTF spec): disregard the "and lacks a \cpgN before 'Symbol'" part of my above question
Sep 20 16:30:27 <mst_> sberg: i suggest not reading too much about encoding in RTF, it gets pretty lovecraftian pretty fast...
Sep 20 16:32:58 <vmiklos> sberg: given how short that bugdoc is, i'm pretty sure i cut it down manually to something readable from a multi-MB real bugdoc
Sep 20 16:33:07 <sberg> mst_, do you have a recommendation how I could get that "don't use symbol font encoding to read a symbol font's name" into writerfilter/source/rtftok/rtfdocumentimpl.cxx? RTFDocumentImpl::checkUnicode lacks the context to tell whether it is using m_aStates.top().nCurrentEncoding to convert a fontname, and the caller of checkUnicode (at the end of RTFDocumentImpl::resolveChars in this case) appears to lack the context, too
Sep 20 16:33:12 <mst_> various Old Ones from The Time Before Unicode and their Backward Compatibility Tentacles etc.
Sep 20 16:34:59 <sberg> vmiklos, anyway, that "so there's likely an (implicit?) exception for \fcharset2" hypothesis sounds sane, so we should probably implement it (if only you or mst_ can give me a good hint how...)
Sep 20 16:35:13 <vmiklos> sberg: looking for a code pointer
Sep 20 16:36:05 <mst_> sberg: m_aStates.top().eDestination == Destination::FONTENTRY should be the relevant check?
Sep 20 16:36:17 <vmiklos> sberg: RTFDocumentImpl::text() is where the text is taken, Destination::FONTENTRY is the state on the parser stack which is a font entry in the font table. so to detect "your case" during decoding a byte array into a string, m_aStates.top().eDestination == Destination::FONTENTRY is what you want
Sep 20 16:36:35 <vmiklos> ah good, two independent matching hints are promising ;)
Sep 20 16:37:35 <sberg> mst_, vmiklos, ah; but what also looks dodgy is that checkUnicode operates there on "Symbol;" including the closing ";" of the full <fontinfo>, not just the <fontname> part of the <fontinfo>
Sep 20 16:39:24 <vmiklos> sberg: i think we already assume that the only "token" in the font entry destination that is not bound to a control world (\foo) is the font name
Sep 20 16:40:52 <vmiklos> sberg: writerfilter/source/rtftok/rtfdocumentimpl.cxx:1237 is where we simply strip away the trailing semicolon, there is no further separation between the font name and other character content inside the destination (apart from the control words and their arguments)
Sep 20 16:42:18 <sberg> vmiklos, OK, thanks; I'll just pretend I haven't seen those dodgy details :)
...so I'm switching to (somewhat arbitrarily) RTL_TEXTENCODING_MS_1252 there now
Change-Id: Iebd1bcecb7fa71c489798154d3356062b052775e Reviewed-on: https://gerrit.libreoffice.org/42477 Reviewed-by: Stephan Bergmann <sbergman@redhat.com> Tested-by: Stephan Bergmann <sbergman@redhat.com>
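The two behaviours discussed in this message can be sketched in a few lines. The byte mapping (00..1F unchanged, 20..FF moved to U+F020..F0FF) is taken from the text above; the function names are hypothetical. The font-name decoder is a simplification: it widens bytes one-to-one, which matches Windows-1252 only outside the 0x80..0x9F range, whereas the commit uses RTL_TEXTENCODING_MS_1252.

```cpp
#include <cassert>
#include <string>

// Symbol code page (42) as described above: control bytes map to themselves,
// everything else lands in the private use area at U+F020..U+F0FF.
char32_t symbolByteToCodePoint(unsigned char byte)
{
    return byte < 0x20 ? char32_t(byte) : char32_t(0xF000 + byte);
}

// Decodes text that uses a symbol font.
std::u32string decodeSymbolText(const std::string& bytes)
{
    std::u32string text;
    for (unsigned char byte : bytes)
        text.push_back(symbolByteToCodePoint(byte));
    return text;
}

// Decodes a font-table entry such as "Symbol;": the trailing semicolon is
// stripped and the name is NOT run through the symbol encoding.
std::u32string decodeFontName(std::string bytes)
{
    if (!bytes.empty() && bytes.back() == ';')
        bytes.pop_back();
    std::u32string name;
    for (unsigned char byte : bytes)
        name.push_back(char32_t(byte)); // simplified single-byte widening
    return name;
}

// Usage: the circled plus (byte C5) becomes U+F0C5, the font name stays readable.
void symbolExample()
{
    assert(symbolByteToCodePoint(0xC5) == 0xF0C5);
    assert(decodeFontName("Symbol;") == U"Symbol");
}
```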
2017-09-22 | Fresh run of bin/update_pch.sh | Mike Kaganski
Change-Id: I69d4157aaf6570cecd51ea59df20556914942e06 Reviewed-on: https://gerrit.libreoffice.org/42565 Tested-by: Jenkins <ci@libreoffice.org> Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com>
2017-09-21 | attempt to fix android build | Noel Grandin
Change-Id: Ie3eede03b90db272d70e7cb383c7a69d9db0f2ae
2017-09-20 | unused bPageToggle in GraphicImport_Impl | Noel Grandin
ever since commit 0929dfa83ca4dbc675c74854566ce4e25def0dbd Date: Sat Jan 11 11:54:14 2014 +0100 writerfilter: drop never generated rtf {XAlign,FHDR,YAlign,XRelTo,YRelTo} Change-Id: Icabe9ff848e3cc9918741e9c68d8f2312145fb74
2017-09-20 | unused bVertFlip/bHoriFlip in GraphicImport_Impl | Noel Grandin
ever since commit 4a924576e415f16e0571542bb0d683529f9046ff Date: Wed Jan 15 20:24:41 2014 +0100 writerfilter: drop unused BlipDib and FSP in doctok Change-Id: I9bf644bdc4b37cb6c4a9a9ab7757c4a83a520cd7