diff options
author | Mike Kaganski <mike.kaganski@collabora.com> | 2019-12-13 09:36:39 +0300 |
---|---|---|
committer | Andras Timar <andras.timar@collabora.com> | 2020-01-09 12:15:39 +0100 |
commit | 3018dcbf12fe96e57fe52dc1b2d7a1235b480eef (patch) | |
tree | 999f377206f9d7f79f1120ba5c997dbd6f060095 /sw | |
parent | 62eee51aeaee380139126e21ac550e6e367164ab (diff) |
tdf#129353, tdf#129402: fix node creation on index import
ToC, bibliography, and index sections import code changed to closely
follow what Word does, make sure that pre-rendered entries don't get
imported as standalone paragraphs outside of the index sections, and
paragraph count is accurate (no missing or added paragraphs as much
as possible).
In Word, an index may start and end in the middle of a paragraph:
<w:p>
<w:r>
<w:t>Some text before index</w:t>
</w:r>
<w:r>
<w:fldChar w:fldCharType="begin"/>
</w:r>
<w:r>
<w:instrText> TOC ...</w:instrText>
</w:r>
<w:r>
<w:fldChar w:fldCharType="separate"/>
</w:r>
<w:r>
<w:t>First pre-rendered index entry</w:t>
</w:r>
</w:p>
...
<w:p>
<w:r>
<w:t>Last pre-rendered index entry</w:t>
</w:r>
<w:r>
<w:fldChar w:fldCharType="end"/>
</w:r>
<w:r>
<w:t>Some text after index</w:t>
</w:r>
</w:p>
However, normally it looks like either no runs precedig index, or no
runs of pre-rendered contents will be present. When no Std elements
are used, the typical situation is that there's a normal paragraph
(possibly with some user text), which ends with index start marker,
without any pre-rendered contents in the same paragraph; and all pre-
rendered contents goes in following paragraphs. Such index normally
ends with index end marker in the *first* run of a paragraph, which
then might have normal text runs.
When Stds are used, then no leading/trailing out-of-index runs in
paragraphs with marks are usually present; and in this case, when
paragraphs with index marks don't contain pre-rendered entries, they
still are treated as part of the index.
In Writer, indexes are node sections (and so cannot be inline with
other paragraph contents). When there was some paragraph content
already before the start-of-index mark, the paragraph is assumed
to end before the index; in this case, when current <w:p> element
ends, importer decides if a separate starting paragraph is needed
or not, depending on if there was some runs after the mark. When
there was no text runs before the starting mark, then the paragraph
is treated as leading paragraph of the index. This allows to not
miss empty paragraphs before index; and not have two paragraphs
where there was one in Word. Only in cases when user had manually
typed text both in and outside of the index in the same paragraph
in Word, we would have the paragraph split into two in Writer.
For end marks, the behaviour depends on whether it's inside Std.
When inside, the ending paragraph starting with index end mark is
considered part of the index. For out-of-Std case, it's considered
normal paragraph (and measures are taken to make sure it's not
dropped even if empty, because sometimes such paragraphs don't
have other content, and have section settings, which is usually
treated by Writer as "drop this paragraph" sign).
A special problem is multi-column index. It's wrapped into a
continuous section by Word; and in Writer, we also wrap it into
a section. It would be possibly useful to detect somehow if this
section is part of index definition, and in this case, drop the
section and put its properties into the Writer's index section.
That would avoid an explicit section in the imported document.
This is TODO, for someone who figures how to detect reliably if
the section belongs to index definition. See comment in
DomainMapper_Impl::appendTextSectionAfter. By the way, current
export code is wrong, producing an index that is single-column
in Word; this change doesn't touch that.
Several existing tests needed to be fixed, which used to test
wrong results.
Change-Id: I9597c8ab13f31ded9abcc24054d3478d3e3a3b40
Reviewed-on: https://gerrit.libreoffice.org/85089
Tested-by: Jenkins
Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com>
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/85278
Tested-by: Jenkins CollaboraOffice <jenkinscollaboraoffice@gmail.com>
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/86464
Reviewed-by: Andras Timar <andras.timar@collabora.com>
Diffstat (limited to 'sw')
-rw-r--r-- | sw/qa/extras/ooxmlexport/data/tdf129353.docx | bin | 0 -> 4420 bytes | |||
-rw-r--r-- | sw/qa/extras/ooxmlexport/ooxmlexport13.cxx | 28 | ||||
-rw-r--r-- | sw/qa/extras/ooxmlexport/ooxmlexport5.cxx | 29 | ||||
-rw-r--r-- | sw/qa/extras/ooxmlexport/ooxmlfieldexport.cxx | 12 | ||||
-rw-r--r-- | sw/qa/extras/rtfimport/rtfimport.cxx | 2 |
5 files changed, 64 insertions, 7 deletions
diff --git a/sw/qa/extras/ooxmlexport/data/tdf129353.docx b/sw/qa/extras/ooxmlexport/data/tdf129353.docx Binary files differnew file mode 100644 index 000000000000..c5cf8865eda6 --- /dev/null +++ b/sw/qa/extras/ooxmlexport/data/tdf129353.docx diff --git a/sw/qa/extras/ooxmlexport/ooxmlexport13.cxx b/sw/qa/extras/ooxmlexport/ooxmlexport13.cxx index 3800851c3b83..d5aff0be51f9 100644 --- a/sw/qa/extras/ooxmlexport/ooxmlexport13.cxx +++ b/sw/qa/extras/ooxmlexport/ooxmlexport13.cxx @@ -11,10 +11,12 @@ #include <com/sun/star/beans/XPropertySet.hpp> #include <com/sun/star/text/WritingMode2.hpp> +#include <com/sun/star/text/XDocumentIndex.hpp> #include <com/sun/star/text/XTextFrame.hpp> #include <com/sun/star/drawing/XControlShape.hpp> #include <com/sun/star/style/ParagraphAdjust.hpp> #include <IDocumentSettingAccess.hxx> +#include <tools/lineend.hxx> #include <editsh.hxx> @@ -478,6 +480,32 @@ DECLARE_OOXMLEXPORT_TEST(testTdf127741, "tdf127741.docx") CPPUNIT_ASSERT(visitedStyleName.equalsIgnoreAsciiCase("Visited Internet Link")); } +DECLARE_OOXMLEXPORT_TEST(testTdf129353, "tdf129353.docx") +{ + CPPUNIT_ASSERT_EQUAL(8, getParagraphs()); + getParagraph(2, "Bibliography"); + getParagraph(4, "Christie, A. (1922). The Secret Adversary. "); + CPPUNIT_ASSERT_EQUAL(OUString(), getParagraph(8)->getString()); + + uno::Reference<text::XDocumentIndexesSupplier> xIndexSupplier(mxComponent, uno::UNO_QUERY); + uno::Reference<container::XIndexAccess> xIndexes = xIndexSupplier->getDocumentIndexes(); + uno::Reference<text::XDocumentIndex> xIndex(xIndexes->getByIndex(0), uno::UNO_QUERY); + uno::Reference<text::XTextRange> xTextRange = xIndex->getAnchor(); + uno::Reference<text::XText> xText = xTextRange->getText(); + uno::Reference<text::XTextCursor> xTextCursor = xText->createTextCursor(); + xTextCursor->gotoRange(xTextRange->getStart(), false); + xTextCursor->gotoRange(xTextRange->getEnd(), true); + OUString aIndexString(convertLineEnd(xTextCursor->getString(), LineEnd::LINEEND_LF)); + + // Check that all the pre-rendered entries are correct, including trailing spaces + CPPUNIT_ASSERT_EQUAL(OUString("\n" // starting with an empty paragraph + "Christie, A. (1922). The Secret Adversary. \n" + "\n" + "Verne, J. G. (1870). Twenty Thousand Leagues Under the Sea. \n" + ""), // ending with an empty paragraph + aIndexString); +} + CPPUNIT_PLUGIN_IMPLEMENT(); /* vim:set shiftwidth=4 softtabstop=4 expandtab: */ diff --git a/sw/qa/extras/ooxmlexport/ooxmlexport5.cxx b/sw/qa/extras/ooxmlexport/ooxmlexport5.cxx index fde09e123f9f..8d53353aa13c 100644 --- a/sw/qa/extras/ooxmlexport/ooxmlexport5.cxx +++ b/sw/qa/extras/ooxmlexport/ooxmlexport5.cxx @@ -9,6 +9,8 @@ #include <swmodeltestbase.hxx> +#include <com/sun/star/text/XDocumentIndex.hpp> +#include <com/sun/star/text/XDocumentIndexesSupplier.hpp> #include <com/sun/star/text/XFootnote.hpp> #include <com/sun/star/text/XTextTable.hpp> #include <com/sun/star/style/LineSpacing.hpp> @@ -676,6 +678,33 @@ DECLARE_OOXMLEXPORT_TEST(testFdo77129, "fdo77129.docx") assertXPathContent(pXmlDoc, "/w:document/w:body/w:p[4]/w:r[1]/w:t", "Abstract"); } +// Test the same testdoc used for testFdo77129. +DECLARE_OOXMLEXPORT_TEST(testTdf129402, "fdo77129.docx") +{ + // tdf#129402: ToC title must be "Contents", not "Content"; the index field must include + // pre-rendered element. + + // Currently export drops empty paragraph after ToC, so skip getParagraphs test for now +// CPPUNIT_ASSERT_EQUAL(5, getParagraphs()); + CPPUNIT_ASSERT_EQUAL(OUString("owners."), getParagraph(1)->getString()); + CPPUNIT_ASSERT_EQUAL(OUString("Contents"), getParagraph(2)->getString()); + CPPUNIT_ASSERT_EQUAL(OUString("How\t2"), getParagraph(3)->getString()); +// CPPUNIT_ASSERT_EQUAL(OUString(), getParagraph(4)->getString()); + + uno::Reference<text::XDocumentIndexesSupplier> xIndexSupplier(mxComponent, uno::UNO_QUERY); + uno::Reference<container::XIndexAccess> xIndexes = xIndexSupplier->getDocumentIndexes(); + uno::Reference<text::XDocumentIndex> xIndex(xIndexes->getByIndex(0), uno::UNO_QUERY); + uno::Reference<text::XTextRange> xTextRange = xIndex->getAnchor(); + uno::Reference<text::XText> xText = xTextRange->getText(); + uno::Reference<text::XTextCursor> xTextCursor = xText->createTextCursor(); + xTextCursor->gotoRange(xTextRange->getStart(), false); + xTextCursor->gotoRange(xTextRange->getEnd(), true); + OUString aTocString(xTextCursor->getString()); + + // Check that the pre-rendered entry is inside the index + CPPUNIT_ASSERT_EQUAL(OUString("How\t2"), aTocString); +} + DECLARE_OOXMLEXPORT_TEST(testfdo79969_xlsm, "fdo79969_xlsm.docx") { // This UT for DOCX embedded with excel work sheet. diff --git a/sw/qa/extras/ooxmlexport/ooxmlfieldexport.cxx b/sw/qa/extras/ooxmlexport/ooxmlfieldexport.cxx index 1765a28fa64f..94ca46896548 100644 --- a/sw/qa/extras/ooxmlexport/ooxmlfieldexport.cxx +++ b/sw/qa/extras/ooxmlexport/ooxmlfieldexport.cxx @@ -282,9 +282,9 @@ DECLARE_OOXMLEXPORT_TEST(testAlphabeticalIndex_MultipleColumns,"alphabeticalInde // check for section breaks after and before the Index Section assertXPath(pXmlDoc, "/w:document/w:body/w:p[2]/w:pPr/w:sectPr/w:type","val","continuous"); - assertXPath(pXmlDoc, "/w:document/w:body/w:p[9]/w:pPr/w:sectPr/w:type","val","continuous"); + assertXPath(pXmlDoc, "/w:document/w:body/w:p[8]/w:pPr/w:sectPr/w:type","val","continuous"); // check for "w:space" attribute for the columns in Section Properties - assertXPath(pXmlDoc, "/w:document/w:body/w:p[9]/w:pPr/w:sectPr/w:cols/w:col[1]","space","720"); + assertXPath(pXmlDoc, "/w:document/w:body/w:p[8]/w:pPr/w:sectPr/w:cols/w:col[1]","space","720"); } DECLARE_OOXMLEXPORT_TEST(testPageref, "testPageref.docx") @@ -359,9 +359,9 @@ DECLARE_OOXMLEXPORT_TEST(test_FieldType, "99_Fields.docx") if (!pXmlDoc) return; // Checking for three field types (BIBLIOGRAPHY, BIDIOUTLINE, CITATION) in sequence - assertXPath(pXmlDoc, "/w:document[1]/w:body[1]/w:p[2]/w:r[2]/w:instrText[1]",1); - assertXPath(pXmlDoc, "/w:document[1]/w:body[1]/w:p[5]/w:r[2]/w:instrText[1]",1); - assertXPath(pXmlDoc, "/w:document[1]/w:body[1]/w:p/w:sdt/w:sdtContent/w:r[2]/w:instrText[1]",1); + assertXPath(pXmlDoc, "/w:document/w:body/w:p[2]/w:r[2]/w:instrText"); + assertXPath(pXmlDoc, "/w:document/w:body/w:p[3]/w:r[2]/w:instrText"); + assertXPath(pXmlDoc, "/w:document/w:body/w:p[4]/w:sdt/w:sdtContent/w:r[2]/w:instrText"); } DECLARE_OOXMLEXPORT_TEST(testCitation,"FDO74775.docx") @@ -411,7 +411,7 @@ DECLARE_OOXMLEXPORT_TEST(testFDO78654 , "fdo78654.docx") return; // In case of two "Hyperlink" tags in one paragraph and one of them // contains "PAGEREF" field then field end tag was missing from hyperlink. - assertXPath ( pXmlDoc, "/w:document/w:body/w:p[2]/w:hyperlink[3]/w:r[5]/w:fldChar", "fldCharType", "end" ); + assertXPath ( pXmlDoc, "/w:document/w:body/w:sdt/w:sdtContent/w:p[2]/w:hyperlink[3]/w:r[5]/w:fldChar", "fldCharType", "end" ); } diff --git a/sw/qa/extras/rtfimport/rtfimport.cxx b/sw/qa/extras/rtfimport/rtfimport.cxx index 6fa57b6de4ce..b35ee4b84717 100644 --- a/sw/qa/extras/rtfimport/rtfimport.cxx +++ b/sw/qa/extras/rtfimport/rtfimport.cxx @@ -919,7 +919,7 @@ DECLARE_RTFIMPORT_TEST(testFdo44984, "fdo44984.rtf") DECLARE_RTFIMPORT_TEST(testFdo82071, "fdo82071.rtf") { // The problem was that in TOC, chapter names were underlined, but they should not be. - uno::Reference<text::XTextRange> xRun = getRun(getParagraph(2), 1); + uno::Reference<text::XTextRange> xRun = getRun(getParagraph(1), 1); // Make sure we test the right text portion. CPPUNIT_ASSERT_EQUAL(OUString("Chapter 1"), xRun->getString()); // This was awt::FontUnderline::SINGLE. |