importing documents with embedded hyperlinks

Posted: Fri Jan 25, 2008 7:41 am
by toolwiz
Again, I'm not sure if this is an issue with the RichView, RVActions, or SRV.

In the Actions Demo, if I do an Insert File... and load a Word 2002 document that contains normal URLs as well as anchor text with embedded hyperlinks (i.e., where the anchor text is NOT a URL, but just some text), they both appear as blue, underlined text. However, their styles do not identify them as hyperlinks. Nor does the embedded hyperlink's URL seem to be accessible.

Either there's something about embedded hyperlinks that the importer can see to change the text font's style, or it's just copying the style (blue text with underline) from the original document and ignoring the embedded hyperlink.

Is there some way to get embedded hyperlinks to be imported where the underlying URL is attached to the associated text item?

NOTE: In the Actions Demo app, RichViewEdit1_ReadHyperlink is not called when a Word doc that contains hyperlinks is imported via Insert File...

NOTE: If I add a space after a hyperlink, RichViewEdit1_KeyDown gets called and it calls rvActionInsertHyperlink1.DetectURL() and rvActionInsertHyperlink1.TerminateHyperlink(). But if I then highlight the URL and click the Insert | Hyperlink button or menus, the URL is not placed into the form as one might expect allowing it to be edited. If this is, indeed, only an "insert" function, then how does one go about editing a hyperlink attached to normal text?

Thanks
-David

Posted: Fri Jan 25, 2008 2:47 pm
by Sergey Tkachenko
1) Please send me an example of this Word doc to test.
2) This action should be able to add, modify, and delete hyperlinks. I cannot reproduce the problem. Please give me step-by-step instructions to reproduce it.

Posted: Sat Jan 26, 2008 10:09 am
by toolwiz
Sergey Tkachenko wrote:1) Please send me an example of this Word doc to test.
2) This action should be able to add, modify, and delete hyperlinks. I cannot reproduce the problem. Please give me step-by-step instructions to reproduce it.
It is very simple.

I created a new Word document and just typed in some text that included some hyperlinks, as well as a couple of embedded hyperlinks, and a couple of email addresses. I saved a copy as a .RTF file, and a copy as a .DOC file.

Nothing magical about it.

I found that if I do Insert File... and load an RTF document, then it DOES properly recognize hyperlinks written as URLs in the text. However, it does not recognize embedded hyperlinks.

Examples:

(1) a hyperlink is: http://www.trichview.com/

(2) an embedded hyperlink is: the link here is embedded

The demo app can read URLs like (1) from an RTF file and make it a hyperlink that appears when you click the hyperlink button in the demo app. It does not recognize any embedded (2) links.

But when you import a Word doc file, it does not recognize either (1) or (2).

Any links will do. You can make your own test files very easily.

(I have Word 2002 SP2.)

-David

Posted: Sat Jan 26, 2008 12:37 pm
by Sergey Tkachenko
In any case, please send me the example document. I want to test on the same data as you do.

Posted: Sat Jan 26, 2008 8:51 pm
by toolwiz
Maybe you can point me to just ONE Word Document that this works with?

I've tried about a dozen, written by many different people with different versions of Word, including some I just created from scratch, and I get the exact same results each time.

But I'll be happy to send you a simple test page I generated.

What email address do you want me to use?

-David

Posted: Sun Jan 27, 2008 4:54 pm
by Sergey Tkachenko
Well, I can reproduce the problem with importing DOC files using office text converters.
Unfortunately, it cannot be fixed, because the DOC file import converter creates RTF without hyperlinks.

As for the problem with embedded hyperlinks when importing RTF files, I cannot reproduce it. Maybe I do not understand what an embedded hyperlink is. Is it a hyperlink where visible text <> target? I can see no problems with them.
Send the example to [email protected]

Posted: Sun Jan 27, 2008 9:57 pm
by toolwiz
Sergey Tkachenko wrote:Well, I can reproduce the problem with importing DOC files using office text converters.
Unfortunately, it cannot be fixed, because the DOC file import converter creates RTF without hyperlinks.
This means that a separate pass needs to be made to convert all obvious URLs and email addresses to hyperlink style. It also means that no embedded hyperlinks can be imported. Hmmm....
Sergey Tkachenko wrote:As for the problem with embedded hyperlinks when importing RTF files, I cannot reproduce it. Maybe I do not understand what an embedded hyperlink is. Is it a hyperlink where visible text <> target? I can see no problems with them.
Send the example to [email protected]
Sorry, I don't know what else to call it. By "embedded hyperlink" I mean the equivalent of an HTML anchor tag whose anchor text is not a URL, but just some text. This isn't HTML, of course. In Word, you select some text, right-click, and choose the "Hyperlink" option. It opens a little form where you enter a URL. The text then says whatever you typed, and clicking it takes you to the URL. The URL is "embedded" or "hidden" behind the text.

I showed an example in an earlier message.

this is an explicit hyperlink: http://www.trichview.com

this is embedded: not a url

You've said that the Word converter exports RTF, but for whatever reason the incoming hyperlinks are not being recognized as hyperlinks, and anything that's embedded or hidden behind the text is not making it through.

If you open an RTF file directly, it appears that both explicit and embedded links DO come through. The surrounding text style is used and a mouse-over displays the actual URL in both cases.

However, if you import an RTF file through the Insert File... option and use that converter, you get a different result. In this case, the explicit hyperlinks are set with the default hyperlink style (a Times Roman font) and the embedded links are not recognized.

Interestingly, if you create hyperlinks in a Word doc, save it as a DOC file, then save it as an RTF file, both explicit and embedded hyperlinks ARE saved properly when you open that RTF file in Word.
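One way to check whether a given RTF file still carries its links is to look inside it for HYPERLINK fields, which is how RTF normally stores a link target separately from the visible text. The following is only a rough diagnostic sketch in Python, not anything from the RichView code: the function name `hyperlink_targets` and the sample string are made up for illustration, and the pattern assumes the simple `HYPERLINK "url"` form, so it skips variants such as links with extra switches before the quoted target.

```python
import re

# Matches the field instruction of an RTF hyperlink, e.g.
#   {\field{\*\fldinst{HYPERLINK "http://..."}}{\fldrslt{visible text}}}
# Assumption: the target directly follows the HYPERLINK keyword in quotes.
HYPERLINK_RE = re.compile(r'HYPERLINK\s+"([^"]+)"')


def hyperlink_targets(rtf_text):
    """Return the link targets found in HYPERLINK fields of an RTF string."""
    return HYPERLINK_RE.findall(rtf_text)


# Hypothetical sample: an embedded link whose visible text is not a URL.
sample = (
    r'{\rtf1\ansi Visit '
    r'{\field{\*\fldinst{HYPERLINK "http://www.trichview.com/"}}'
    r'{\fldrslt{\ul\cf1 the link here is embedded}}}'
    r' today.}'
)

# Hypothetical sample: text that only looks like a link (underlined, colored).
styled_only = r'{\rtf1\ansi {\ul\cf1 the link here is embedded}}'

print(hyperlink_targets(sample))
print(hyperlink_targets(styled_only))
```

If the RTF produced by a converter gives an empty list while the RTF saved from Word does not, the links were dropped before the text ever reached the editor, and only the blue underlined formatting survived.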

So it's perplexing why the Word converter that converts to RTF is unable to save hyperlinks or even embedded hyperlinks. Don't you think the same converter is being used when you do a Save As... RTF file?

Either way, the problem is NOT with RTF files. It's with importing Word (.DOC) files, because that's the format that most people use.

It's ok to tell them to export as RTF first, except that I've noticed that RTF files do not conform to the right-hand boundaries of the page layout in SRV, whereas DOC files do. That is, you can set 1" margins all the way around, RTF text stretches almost to the right side of the page, well into the right-hand margin area. DOC files don't do that.

-David

Posted: Tue Jan 29, 2008 8:22 pm
by toolwiz
I need to know what to do about this.

The majority of our users will be importing Word (.doc) files. I can tell them to re-insert embedded links, and I can add a pass that converts explicit URLs to hyperlink style.
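To make the idea of such a pass concrete, here is a minimal sketch in Python that only finds the candidate spans in plain text. It is an illustration under stated assumptions, not RichView code: `find_links`, the regular expressions, and the choice to prefix bare `www.` addresses with `http://` are all my own, and applying the hyperlink style to each span in the editor is left out entirely.

```python
import re

# Deliberately simple patterns; real-world URL detection has many edge cases.
URL_RE = re.compile(r'\b(?:https?://|www\.)[^\s<>"]+', re.IGNORECASE)
EMAIL_RE = re.compile(r'\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b')
TRAILING = '.,;:!?)]}\'"'  # punctuation that usually ends a sentence, not a URL


def find_links(text):
    """Return sorted (start, end, target) tuples for explicit URLs and emails."""
    spans = []
    for m in URL_RE.finditer(text):
        raw = m.group(0).rstrip(TRAILING)
        # Assumption: a bare "www." address is meant as an http link.
        target = raw if raw.lower().startswith('http') else 'http://' + raw
        spans.append((m.start(), m.start() + len(raw), target))
    for m in EMAIL_RE.finditer(text):
        # Skip addresses that sit inside an already detected URL.
        if any(s <= m.start() < e for s, e, _ in spans):
            continue
        spans.append((m.start(), m.end(), 'mailto:' + m.group(0)))
    return sorted(spans)


text = ("a hyperlink is: http://www.trichview.com/ or www.example.org, "
        "mail bob@example.com.")
for start, end, target in find_links(text):
    print(text[start:end], '->', target)
```

A pass like this can only recover links whose target is visible in the text. Embedded links, where the visible text is not a URL, leave nothing to detect once the converter has dropped the target.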

Or I can tell them to save their documents as RTF files first, and then import those into the app. RTF files preserve the hyperlinks, both explicit and embedded. But RTF files do not observe the right-hand margins properly when they're imported, whereas .DOC files do.

I need to get this resolved so I know how best to proceed.

Right now, I don't know where the problems lie, except that what I'm reporting is showing up in the Actions Demo, very consistently.

Thanks
-David

ps: can anybody tell me where to find the code that handles the office converters and actually does the import into the RVE control?

Posted: Tue Feb 26, 2008 9:20 am
by Sergey Tkachenko
When MS Word saves RTF files, it converts the document to RTF almost perfectly.
When you open a DOC file using a converter, the converter DLL converts DOC to RTF, allowing third-party applications (such as applications using TRVOfficeConverter) to read this RTF. And it often does not do it very well. Hyperlinks are lost; this is one problem. There are some other problems too. So converting documents to RTF before importing is the best way.

As for the bad margins, I think this problem is fixed in the latest version. At least, I just saved an RTF file in MS Word with margins of 1 cm top, 2 cm bottom, 3 cm left, and 4 cm right, and it opened in ScaleRichView as expected.

Posted: Tue Feb 26, 2008 10:10 am
by toolwiz
That's strange. You'd think they'd use the same converter code for both conversions.

-David

Something interesting...

Posted: Wed Feb 27, 2008 8:37 am
by toolwiz
I discovered something interesting:

When you do the Insert | File... and click the drop-down at the bottom
to select the file type, I found there are several different ones for
Word docs:

Word 97-2002 (*.doc)
Word 97-2003 (*.doc)
Word 2006 (*.docx)

The first one does not import embedded links at all. However ... the 2nd one
DOES! It also enables the program to recognize URLs, which it can't do
in the first option.

I haven't tried the 3rd one, as it got installed as a converter with something
else. I don't have that version of Word or Office.

-David

Posted: Wed Feb 27, 2008 11:53 am
by Sergey Tkachenko
I believe the second one was installed with Word 2006. They have finally updated this converter (I suspect "Word 97-2002" has not been updated since Word 97).