tdf#160253: fix list identifier export decision code

Commits 8f48f91009caa86d896f247059874242ed18bf39 (ODT export: omit unreferenced <text:list xml:id="...">, 2022-03-10) and 82bbf63582bdf28e7918e58ebf6657a9144bc9f3 (tdf#155823: Improve the check if the list id is not required, 2023-06-14) tried to make ODF output more deterministic by omitting list identifiers when those identifiers were unreferenced. The latter used document model node numbers to check whether other lists appeared after the last occurrence of the list that continues in the current node. But it turned out that this isn't robust. Consider this ODF:

  <text:list xml:id="list1" text:style-name="L1">
   <text:list-item>
    <text:p>a</text:p>
   </text:list-item>
  </text:list>
  <text:p>b<text:note text:id="ftn1" text:note-class="endnote"><text:note-citation>i</text:note-citation><text:note-body>
    <text:list text:style-name="L2">
     <text:list-item>
      <text:p>x</text:p>
     </text:list-item>
    </text:list></text:note-body></text:note></text:p>
  <text:list text:continue-list="list1" text:style-name="L1">
   <text:list-item>
    <text:p>c</text:p>
   </text:list-item>
  </text:list>

The paragraphs a, b, and c are all in the main document body, and have sequential document model node numbers (say, 15, 16, 17). If only these numbers are checked, there is no node between node 15 ("a") and node 17 ("c") with a different list (both 15 and 17 belong to the list with style "L1" and identifier "list1", and node 16 doesn't belong to any list). That suggests that the list identifier isn't needed in this case. But when node 16 is actually output, its output includes a node from an endnote ("x"), which is located elsewhere in the document model and has a node number like 7 (so not between 15 and 17). The paragraph "x" belongs to another list, with style "L2", and is output to ODF between paragraphs "a" and "c".
In this example, paragraph "c" must refer to the list of paragraph "a" using the list id, but this is not obvious when only node numbers are considered; it requires prior knowledge of the actual order in which lists appear in the ODF. Unless we build a DOM, this is only possible with a two-pass output that collects the node order in the first pass. The export already has such a pass: "collect autostyles". The problem is that the "collect autostyles" pass used an optimized function, XMLTextParagraphExport::collectTextAutoStylesOptimized, introduced for #i65476# in commit 8195d7061ed52ebb98f46d35fe5929762c71e4b3 (INTEGRATION: CWS swautomatic01 (1.126.4); FILE MERGED, 2006-12-01), which obtained the autostyles from style::XAutoStylesSupplier as an optimization.

This drops the optimized implementation (the function is renamed to collectTextAutoStylesAndNodeExportOrder) and reverts to using collectTextAutoStyles, which handles nodes in the same order as when writing to ODF. There, we build a vector with the sequence of node numbers, which is used later to sort DocumentListNodes.

This uncovered an omission in the work on paragraph marks (commit 1a88efa8e02a6d765dab13c7110443bb9e6acecf, tdf#155238: Reimplement how ListAutoFormat is stored to ODF, 2023-05-11). It turns out that the code in SwTextFormatter::NewNumberPortion, introduced in commit cb0e1b52d68aa6d5b505f91cb4ce577f7f3b2a8f (sw, numbering portion format: consider full-para char formats as well, 2022-10-20), was left behind when paragraph marks were reimplemented to use a dedicated property: empty trailing spans still affected how lists were rendered, which let import defects (where the paragraph mark properties weren't set properly) go unnoticed. In ODF import (XMLParaContext::endFastElement), for compatibility, this commit treats empty trailing spans as defining the paragraph mark when the paragraph mark wasn't set explicitly. This way, the trailing spans are converted to the paragraph mark.
In WW8 import, the last paragraphs of table cells didn't call the code handling the paragraph marks. This is also fixed now.

The changes result in slightly different numbering of autostyles in the ODF. The new numbering seems to follow the order of appearance of the autostyles in the output more closely, and some autostyles that were written but unreferenced are now eliminated. The unit tests were updated accordingly. I hope that the impact on export time will not be too large.

It is unclear why outline numbering exports a list element at all. Fixing that to not emit the list element is a separate task / TODO.

Change-Id: I5c99f8d48be77c4454ffac6ffa9f5babfe0d4909
Reviewed-on: https://gerrit.libreoffice.org/c/core/+/166572
Tested-by: Jenkins
Reviewed-by: Mike Kaganski <mike.kaganski@collabora.com>
author: Mike Kaganski <mike.kaganski@collabora.com> 2024-04-24 07:55:35 +0500
committer: Mike Kaganski <mike.kaganski@collabora.com> 2024-04-24 11:37:27 +0200
commit: 69ed893087f89d176a5ec4b263ce8d75774be72b (patch)
tree: 95a24d99a458dad12520977dad8c44ca3d1b9a89 /xmloff/source
parent: dfb412699b96e12b2758be0e422c3e775f183d17 (diff)
2 files changed, 88 insertions, 136 deletions
diff --git a/xmloff/source/text/txtparae.cxx b/xmloff/source/text/txtparae.cxx
index 6153fb09d7a8..786539812341 100644
--- a/xmloff/source/text/txtparae.cxx
+++ b/xmloff/source/text/txtparae.cxx
@@ -39,6 +39,7 @@
 #include <com/sun/star/text/XTextTablesSupplier.hpp>
 #include <com/sun/star/text/XNumberingRulesSupplier.hpp>
 #include <com/sun/star/text/XChapterNumberingSupplier.hpp>
+#include <com/sun/star/text/XTextDocument.hpp>
 #include <com/sun/star/text/XTextTable.hpp>
 #include <com/sun/star/text/XText.hpp>
 #include <com/sun/star/text/XTextContent.hpp>
@@ -1331,12 +1332,14 @@ struct XMLTextParagraphExport::DocumentListNodes
 {
     struct NodeData
     {
+        std::ptrdiff_t order;
         sal_Int32 index; // see SwNode::GetIndex and SwNodeOffset
         sal_uInt64 style_id; // actually a pointer to NumRule
         OUString list_id;
     };
     std::vector<NodeData> docListNodes;
-    DocumentListNodes(const css::uno::Reference<css::frame::XModel>& xModel)
+    DocumentListNodes(const css::uno::Reference<css::frame::XModel>& xModel,
+                      const std::vector<sal_Int32>& aDocumentNodeOrder)
     {
         // Sequence of nodes, each of them represented by three-element sequence,
         // corresponding to NodeData members
@@ -1358,13 +1361,18 @@ struct XMLTextParagraphExport::DocumentListNodes
         for (const auto& node : nodes)
         {
             assert(node.getLength() == 3);
-            docListNodes.push_back({ .index = node[0].get<sal_Int32>(),
+            sal_Int32 nodeIndex = node[0].get<sal_Int32>();
+            auto nodeOrder = std::distance(
+                aDocumentNodeOrder.begin(),
+                std::find(aDocumentNodeOrder.begin(), aDocumentNodeOrder.end(), nodeIndex));
+            docListNodes.push_back({ .order = nodeOrder,
+                                     .index = nodeIndex,
                                      .style_id = node[1].get<sal_uInt64>(),
                                      .list_id = node[2].get<OUString>() });
         }
 
         std::sort(docListNodes.begin(), docListNodes.end(),
-                  [](const NodeData& lhs, const NodeData& rhs) { return lhs.index < rhs.index; });
+                  [](const NodeData& lhs, const NodeData& rhs) { return lhs.order < rhs.order; });
     }
     bool ShouldSkipListId(const Reference<XTextContent>& xTextContent) const
     {
@@ -1385,10 +1393,9 @@ struct XMLTextParagraphExport::DocumentListNodes
                 return false;
             }
 
-            auto it = std::lower_bound(docListNodes.begin(), docListNodes.end(), index,
-                                       [](const NodeData& lhs, sal_Int32 rhs)
-                                       { return lhs.index < rhs; });
-            if (it == docListNodes.end() || it->index != index)
+            auto it = std::find_if(docListNodes.begin(), docListNodes.end(),
+                                   [index](const NodeData& el) { return el.index == index; });
+            if (it == docListNodes.end())
                 return false;
 
             // We need to write the id, when there will be continuation of the list either with
@@ -1616,9 +1623,7 @@ const enum XMLTokenEnum lcl_XmlReferenceElements[] = {
 const enum XMLTokenEnum lcl_XmlBookmarkElements[] = {
     XML_BOOKMARK, XML_BOOKMARK_START, XML_BOOKMARK_END };
 
-// This function replaces the text portion iteration during auto style
-// collection.
-void XMLTextParagraphExport::collectTextAutoStylesOptimized( bool bIsProgress )
+void XMLTextParagraphExport::collectTextAutoStylesAndNodeExportOrder(bool bIsProgress)
 {
     GetExport().GetShapeExport(); // make sure the graphics styles family is added
 
@@ -1628,60 +1633,11 @@ void XMLTextParagraphExport::collectTextAutoStylesOptimized( bool bIsProgress )
     const bool bAutoStyles = true;
     const bool bExportContent = false;
 
-    // Export AutoStyles:
-    Reference< XAutoStylesSupplier > xAutoStylesSupp( GetExport().GetModel(), UNO_QUERY );
-    if ( xAutoStylesSupp.is() )
+    if (auto xTextDocument = GetExport().GetModel().query<XTextDocument>())
     {
-        Reference< XAutoStyles > xAutoStyleFamilies = xAutoStylesSupp->getAutoStyles();
-        const auto collectFamily = [this, &xAutoStyleFamilies](const OUString& sName,
-                                                               XmlStyleFamily nFamily) {
-            Any aAny = xAutoStyleFamilies->getByName( sName );
-            Reference< XAutoStyleFamily > xAutoStyles = *o3tl::doAccess<Reference<XAutoStyleFamily>>(aAny);
-            Reference < XEnumeration > xAutoStylesEnum( xAutoStyles->createEnumeration() );
-
-            while ( xAutoStylesEnum->hasMoreElements() )
-            {
-                aAny = xAutoStylesEnum->nextElement();
-                Reference< XAutoStyle > xAutoStyle = *o3tl::doAccess<Reference<XAutoStyle>>(aAny);
-                Reference < XPropertySet > xPSet( xAutoStyle, uno::UNO_QUERY );
-                Add( nFamily, xPSet, {}, true );
-            }
-        };
-        collectFamily("CharacterStyles", XmlStyleFamily::TEXT_TEXT);
-        collectFamily("RubyStyles", XmlStyleFamily::TEXT_RUBY);
-        collectFamily("ParagraphStyles", XmlStyleFamily::TEXT_PARAGRAPH);
-    }
-
-    // Export Field AutoStyles:
-    Reference< XTextFieldsSupplier > xTextFieldsSupp( GetExport().GetModel(), UNO_QUERY );
-    if ( xTextFieldsSupp.is() )
-    {
-        Reference< XEnumerationAccess > xTextFields = xTextFieldsSupp->getTextFields();
-        Reference < XEnumeration > xTextFieldsEnum( xTextFields->createEnumeration() );
-
-        while ( xTextFieldsEnum->hasMoreElements() )
-        {
-            Any aAny = xTextFieldsEnum->nextElement();
-            Reference< XTextField > xTextField = *o3tl::doAccess<Reference<XTextField>>(aAny);
-            exportTextField( xTextField, bAutoStyles, bIsProgress,
-                !xAutoStylesSupp.is(), nullptr );
-            try
-            {
-                Reference < XPropertySet > xSet( xTextField, UNO_QUERY );
-                Reference < XText > xText;
-                Any a = xSet->getPropertyValue("TextRange");
-                a >>= xText;
-                if ( xText.is() )
-                {
-                    exportText( xText, true, bIsProgress, bExportContent );
-                    GetExport().GetTextParagraphExport()
-                        ->collectTextAutoStyles( xText );
-                }
-            }
-            catch (Exception&)
-            {
-            }
-        }
+        bInDocumentNodeOrderCollection = true;
+        collectTextAutoStyles(xTextDocument->getText(), bIsProgress);
+        bInDocumentNodeOrderCollection = false;
     }
 
     // Export text frames:
@@ -1728,47 +1684,11 @@ void XMLTextParagraphExport::collectTextAutoStylesOptimized( bool bIsProgress )
             }
         }
 
-    sal_Int32 nCount;
-    // AutoStyles for sections
-    Reference< XTextSectionsSupplier > xSectionsSupp( GetExport().GetModel(), UNO_QUERY );
-    if ( xSectionsSupp.is() )
-    {
-        Reference< XIndexAccess > xSections( xSectionsSupp->getTextSections(), UNO_QUERY );
-        if ( xSections.is() )
-        {
-            nCount = xSections->getCount();
-            for( sal_Int32 i = 0; i < nCount; ++i )
-            {
-                Any aAny = xSections->getByIndex( i );
-                Reference< XTextSection > xSection = *o3tl::doAccess<Reference<XTextSection>>(aAny);
-                Reference < XPropertySet > xPSet( xSection, uno::UNO_QUERY );
-                Add( XmlStyleFamily::TEXT_SECTION, xPSet );
-            }
-        }
-    }
-
-    // AutoStyles for tables (Note: suppress autostyle collection for paragraphs in exportTable)
-    Reference< XTextTablesSupplier > xTablesSupp( GetExport().GetModel(), UNO_QUERY );
-    if ( xTablesSupp.is() )
-    {
-        Reference< XIndexAccess > xTables( xTablesSupp->getTextTables(), UNO_QUERY );
-        if ( xTables.is() )
-        {
-            nCount = xTables->getCount();
-            for( sal_Int32 i = 0; i < nCount; ++i )
-            {
-                Any aAny = xTables->getByIndex( i );
-                Reference< XTextTable > xTable = *o3tl::doAccess<Reference<XTextTable>>(aAny);
-                exportTable( xTable, true, true );
-            }
-        }
-    }
-
     Reference< XNumberingRulesSupplier > xNumberingRulesSupp( GetExport().GetModel(), UNO_QUERY );
     if ( xNumberingRulesSupp.is() )
     {
         Reference< XIndexAccess > xNumberingRules = xNumberingRulesSupp->getNumberingRules();
-        nCount = xNumberingRules->getCount();
+        sal_Int32 nCount = xNumberingRules->getCount();
         // Custom outline assignment lost after re-importing sxw (#i73361#)
         for( sal_Int32 i = 0; i < nCount; ++i )
         {
@@ -1894,14 +1814,36 @@ bool XMLTextParagraphExport::ExportListId() const
            && GetExport().getSaneDefaultVersion() >= SvtSaveOptions::ODFSVER_012;
 }
 
+void XMLTextParagraphExport::RecordNodeIndex(const css::uno::Reference<css::text::XTextContent>& xTextContent)
+{
+    if (!bInDocumentNodeOrderCollection)
+        return;
+    if (auto xPropSet = xTextContent.query<css::beans::XPropertySet>())
+    {
+        try
+        {
+            sal_Int32 index = 0;
+            // See SwXParagraph::Impl::GetPropertyValues_Impl
+            xPropSet->getPropertyValue("ODFExport_NodeIndex") >>= index;
+            assert(std::find(maDocumentNodeOrder.begin(), maDocumentNodeOrder.end(), index)
+                   == maDocumentNodeOrder.end());
+            maDocumentNodeOrder.push_back(index);
+        }
+        catch (css::beans::UnknownPropertyException&)
+        {
+            // That's absolutely fine!
+        }
+    }
+}
+
 bool XMLTextParagraphExport::ShouldSkipListId(const Reference<XTextContent>& xTextContent)
 {
     if (!mpDocumentListNodes)
     {
         if (ExportListId())
-            mpDocumentListNodes.reset(new DocumentListNodes(GetExport().GetModel()));
+            mpDocumentListNodes.reset(new DocumentListNodes(GetExport().GetModel(), maDocumentNodeOrder));
         else
-            mpDocumentListNodes.reset(new DocumentListNodes({}));
+            mpDocumentListNodes.reset(new DocumentListNodes({}, {}));
     }
 
     return mpDocumentListNodes->ShouldSkipListId(xTextContent);
@@ -1952,6 +1894,7 @@ void XMLTextParagraphExport::exportTextContentEnumeration(
         {
             if( bAutoStyles )
             {
+                RecordNodeIndex(xTxtCntnt);
                 exportListAndSectionChange( xCurrentTextSection, xTxtCntnt,
                                             aPrevNumInfo, aNextNumInfo,
                                             bAutoStyles );
@@ -2323,7 +2266,6 @@ void XMLTextParagraphExport::exportParagraph(
 
     Reference < XEnumerationAccess > xEA( rTextContent, UNO_QUERY );
     Reference < XEnumeration > xTextEnum = xEA->createEnumeration();
-    const bool bHasPortions = xTextEnum.is();
 
     Reference < XEnumeration> xContentEnum;
     Reference < XContentEnumerationAccess > xCEA( rTextContent, UNO_QUERY );
@@ -2357,22 +2299,10 @@ void XMLTextParagraphExport::exportParagraph(
 
     bool bPrevCharIsSpace(true); // true because whitespace at start is ignored
 
-    if( bAutoStyles )
-    {
-        if( bHasContentEnum )
-            exportTextContentEnumeration(
-                                    xContentEnum, bAutoStyles, xSection,
-                                    bIsProgress );
-        if ( bHasPortions )
-        {
-            exportTextRangeEnumeration(xTextEnum, bAutoStyles, bIsProgress, bPrevCharIsSpace);
-        }
-    }
-    else
     {
         enum XMLTokenEnum eElem =
             0 < nOutlineLevel ? XML_H : XML_P;
-        SvXMLElementExport aElem( GetExport(), eExtensionNS == TextPNS::EXTENSION ? XML_NAMESPACE_LO_EXT : XML_NAMESPACE_TEXT, eElem,
+        SvXMLElementExport aElem( GetExport(), !bAutoStyles, eExtensionNS == TextPNS::EXTENSION ? XML_NAMESPACE_LO_EXT : XML_NAMESPACE_TEXT, eElem,
                                   true, false );
         if( bHasContentEnum )
         {
diff --git a/xmloff/source/text/txtparai.cxx b/xmloff/source/text/txtparai.cxx
index 94cef85739ef..f2124bf068ed 100644
--- a/xmloff/source/text/txtparai.cxx
+++ b/xmloff/source/text/txtparai.cxx
@@ -1789,6 +1789,46 @@ void XMLParaContext::endFastElement(sal_Int32 )
                                                true,
                                                mbOutlineContentVisible);
 
+    bool bEmptyHints = false;
+    XMLHint_Impl* pMarkerStyleHint = nullptr;
+    if (m_xHints)
+    {
+        uno::Reference<text::XTextRangeCompare> xCompare(xTxtImport->GetText(), uno::UNO_QUERY);
+        if (xCompare.is())
+        {
+            try
+            {
+                for (const auto& pHint : m_xHints->GetHints())
+                {
+                    if (xCompare->compareRegionStarts(pHint->GetStart(), pHint->GetEnd()) == 0)
+                    {
+                        bEmptyHints = true;
+
+                        // Is this the trailing empty span, defining the paragraph mark properties?
+                        // Convert it to the marker style, for backward compatibility with documents
+                        // created between commits 6249858a8972aef077e0249bd93cfe8f01bce4d6 and
+                        // 1a88efa8e02a6d765dab13c7110443bb9e6acecf, where the trailing empty spans
+                        // were used to store the marker formatting
+                        if (!m_aMarkerStyleName.hasValue()
+                            && xCompare->compareRegionStarts(pHint->GetStart(), xEnd) == 0)
+                        {
+                            if (auto pStyle = GetImport().GetTextImport()->FindAutoCharStyle(
+                                    static_cast<XMLStyleHint_Impl*>(pHint.get())->GetStyleName()))
+                            {
+                                m_aMarkerStyleName = pStyle->GetAutoName();
+                                pMarkerStyleHint = pHint.get();
+                            }
+                        }
+                    }
+                }
+            }
+            catch (const uno::Exception&)
+            {
+                TOOLS_WARN_EXCEPTION("xmloff.text", "");
+            }
+        }
+    }
+
     if (m_aMarkerStyleName.hasValue())
     {
         if (auto xPropSet = xStart.query<css::beans::XPropertySet>())
@@ -1850,26 +1890,7 @@ void XMLParaContext::endFastElement(sal_Int32 )
     {
         bool bSetNoFormatAttr = false;
         uno::Reference<beans::XPropertySet> xCursorProps(xAttrCursor, uno::UNO_QUERY);
-        int nEmptyHints = 0;
-        uno::Reference<text::XTextRangeCompare> xCompare(xTxtImport->GetText(), uno::UNO_QUERY);
-        if (xCompare.is())
-        {
-            try
-            {
-                for (const auto& pHint : m_xHints->GetHints())
-                {
-                    if (xCompare->compareRegionStarts(pHint->GetStart(), pHint->GetEnd()) == 0)
-                    {
-                        ++nEmptyHints;
-                    }
-                }
-            }
-            catch (const uno::Exception&)
-            {
-                TOOLS_WARN_EXCEPTION("xmloff.text", "");
-            }
-        }
-        if (nEmptyHints > 0 || m_aMarkerStyleName.hasValue())
+        if (bEmptyHints || m_aMarkerStyleName.hasValue())
         {
             // We have at least one empty hint, then make try to ask the cursor to not upgrade our character
             // attributes to paragraph-level formatting, which would lead to incorrect rendering.
@@ -1888,6 +1909,7 @@ void XMLParaContext::endFastElement(sal_Int32 )
             switch( pHint->GetType() )
             {
             case XMLHintType::XML_HINT_STYLE:
+                if (pHint != pMarkerStyleHint) // already processed above
                 {
                     const OUString& rStyleName =
                             static_cast<XMLStyleHint_Impl *>(pHint)->GetStyleName();