Deleting ZERO WIDTH NON-JOINER (8204) character in footnotes on importing docx by TRvOfficeConverter

Posted: Sun Feb 18, 2018 9:45 am
by saeid2016
Hello,
When we import docx or doc files into TRichViewEdit using TRVOfficeConverter, any ZERO WIDTH NON-JOINER (8204) characters in footnotes are deleted during import, although this character is imported correctly in the main text.

Posted: Sun Feb 18, 2018 10:25 am
by Sergey Tkachenko
Try copy-pasting from MS Word to TRichViewEdit.
If this problem persists, the problem is in our RTF reading procedure. Otherwise, most probably, the problem is in the converter.

Posted: Sun Feb 18, 2018 11:09 am
by saeid2016
Sergey Tkachenko wrote (Sun Feb 18, 2018 10:25 am):
"Try copy-pasting from MS Word to TRichViewEdit.
If this problem persists, the problem is in our RTF reading procedure. Otherwise, most probably, the problem is in the converter."

I tried copy-pasting from MS Word to TRichViewEdit. The problem does not occur in that case.

I use this converter: https://www.microsoft.com/en-us/downloa ... .aspx?id=3
I have downloaded and installed its update from here: https://support.microsoft.com/en-us/hel ... y-pack-sp3

Is there another converter I could use?

Posted: Sun Feb 18, 2018 10:07 pm
by Sergey Tkachenko
I looked at the RTF files generated by the converter and by MS Word 2016.
Both of them saved this character (decimal code 8204, hexadecimal code 200C) using the \zwbo keyword, both in the main text and in footnotes.
So, strangely, on my computer the result is the same regardless of where the character is located and whether the converter is used.

According to the RTF specification:
\zwbo - Zero-width break opportunity. Used to insert break opportunity between two characters.
TRichView ignores this RTF keyword, because no Unicode character works exactly in the way it is described, and the RTF specification has a more appropriate keyword:
\zwnj - Zero-width nonjoiner. This is used for unligating a character.
TRichView supports \zwnj and loads it as the ZERO WIDTH NON-JOINER character.

I made one more test: I created an RTF file containing \zwnj, then opened and resaved it in MS Word. \zwnj was saved as \zwbo!
So it looks like MS Word treats them as synonyms.
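
To repeat this kind of check on your own files, a small console program along the following lines can count how often each zero-width keyword appears in an RTF file. This is only a sketch: the program and function names are made up for illustration, it assumes a Delphi version with scoped unit names (System.SysUtils and so on), and it does a plain text scan that does not account for escaped backslashes (\\) in the RTF.

```pascal
program CountZeroWidthKeywords;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.Classes, System.StrUtils;

// Counts occurrences of an RTF control word such as '\zwbo' in Rtf.
// A match is counted only when the following character is not a letter,
// so a longer keyword that starts the same way is not counted.
// Assumption: a plain text scan is good enough for a quick diagnostic.
function CountKeyword(const Rtf, Keyword: string): Integer;
var
  P, NextPos: Integer;
begin
  Result := 0;
  P := PosEx(Keyword, Rtf, 1);
  while P > 0 do
  begin
    NextPos := P + Length(Keyword);
    if (NextPos > Length(Rtf)) or
       not CharInSet(Rtf[NextPos], ['a'..'z', 'A'..'Z']) then
      Inc(Result);
    P := PosEx(Keyword, Rtf, NextPos);
  end;
end;

var
  Lines: TStringList;
  Rtf: string;
begin
  if ParamCount < 1 then
  begin
    Writeln('Usage: CountZeroWidthKeywords <file.rtf>');
    Halt(1);
  end;
  Lines := TStringList.Create;
  try
    // RTF control words are plain ASCII, so the default encoding is sufficient here
    Lines.LoadFromFile(ParamStr(1));
    Rtf := Lines.Text;
    Writeln('\zwbo:  ', CountKeyword(Rtf, '\zwbo'));
    Writeln('\zwnbo: ', CountKeyword(Rtf, '\zwnbo'));
    Writeln('\zwnj:  ', CountKeyword(Rtf, '\zwnj'));
    Writeln('\zwj:   ', CountKeyword(Rtf, '\zwj'));
  finally
    Lines.Free;
  end;
end.
```

If \zwbo is reported but \zwnj is not, the file uses the keyword that TRichView ignored before the fix described in this thread.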
In the next update, I'll include loading \zwbo as the ZERO WIDTH NON-JOINER character. However, I am not sure that it will fix the problem on your side, because you describe different results. To answer more definitely, I need to see the RTF file generated by the converter.
To get it, call rvc.ImportRTF instead of rvc.ImportRV, and then call rvc.Stream.SaveToFile(<name of RTF file>).
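
As a rough sketch of that step, the helper below wraps the two calls. The unit and procedure names are invented for this example. It also assumes that TRVOfficeConverter is declared in the RVOfficeCnv unit and that ImportRTF takes the file name plus a converter index and returns a Boolean. Please check those assumptions against your TRichView version.

```pascal
unit DumpConverterRtfUnit;

interface

uses
  System.SysUtils,
  RVOfficeCnv; // assumption: the unit that declares TRVOfficeConverter

// Hypothetical helper: converts DocFileName with the given converter and
// writes the intermediate RTF to RtfFileName for inspection.
procedure DumpConverterRtf(rvc: TRVOfficeConverter;
  const DocFileName, RtfFileName: string; ConverterIndex: Integer);

implementation

procedure DumpConverterRtf(rvc: TRVOfficeConverter;
  const DocFileName, RtfFileName: string; ConverterIndex: Integer);
begin
  // ImportRTF keeps the converted RTF in rvc.Stream instead of loading it
  // into a TRichView control (assumed signature: FileName, ConverterIndex).
  if rvc.ImportRTF(DocFileName, ConverterIndex) then
    rvc.Stream.SaveToFile(RtfFileName)
  else
    raise Exception.Create('Conversion failed for ' + DocFileName);
end;

end.
```

ConverterIndex should be the same index you currently pass to rvc.ImportRV, so that the saved RTF comes from the same converter that shows the problem.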

Posted: Mon Feb 19, 2018 12:44 pm
by Sergey Tkachenko
Quick fix:
Open RVRTF.pas. Find the constant isymMax and increase its value by 2.
Then add two items to the rgsymRtf array declaration (at any position):

    (Keyword:'zwbo';     DefValue:$200C;        UseDef:False;  kwd:rtf_kwd_WideChar; idx:0;               AffectTo:rtf_af_None),
    (Keyword:'zwnbo';    DefValue:$200D;        UseDef:False;  kwd:rtf_kwd_WideChar; idx:0;               AffectTo:rtf_af_None),

Posted: Fri Mar 16, 2018 9:58 am
by saeid2016
Thank you very much.

Posted: Wed Apr 04, 2018 10:45 am
by Sergey Tkachenko
This change is included in TRichView 17.3.