Problem with rve.SearchText

toolwiz
Posts: 150
Joined: Wed Nov 30, 2005 3:27 am

Post by toolwiz »

I've got the following code:

Code:

        tot_kwds := 0;
        for wd_ndx := 0 to cklist.Count-1 do begin
            // 1
            if not CurrFileData.ContainsWord( cklist.Items[wd_ndx] ) then
                continue;

            rve.SetSelectionBounds(0, rve.GetOffsBeforeItem(0), 0, rve.GetOffsBeforeItem(0));
            while rve.SearchText( cklist.Items[wd_ndx], [rvseoDown,rvseoWholeWord]) do begin
                inc( tot_kwds );
                // . . .
            end;
        end;

What I'm basically doing is going through a list of words and looking for each word in the rve using rve.SearchText. If a word is found, I look at its RVData properties and do some things based on them.

I ran into a problem where SearchText was getting hung up searching for a word that wasn't there, so I added a test (below the line marked "// 1") to see if the word was even there first. This worked in that case (maybe?).

But after a while, I ran into two situations where SearchText would get hung up searching for a word that DOES exist.

By "hung up" I mean it never returns false. After a few seconds of 100% CPU activity, I put a breakpoint at the line "inc( tot_kwds )" and it invariably shows that tot_kwds is > 10,000, which in the data I'm using is simply impossible -- the input data consists of text files that are 400-800 words in length and there aren't more than a dozen or so instances of any given keyword. Most of the keywords have zero occurrences, and most of the rest have 1 or 2. The word list itself has around 200 entries.

I tried to trace it through and track down the problem, but it seems to be intermittent. So I figured I'd just report the symptoms and ask if there's something I can test within the loop to see if it's finished and I can break out of the loop anyway.

Thanks
David

Post by toolwiz »

Actually, it appears that I can test to see if rve.RVData.ItemNo changes from one iteration to the next. If it doesn't, then I break out of the loop. Is that a good idea?
Sergey Tkachenko
Site Admin
Posts: 17253
Joined: Sat Aug 27, 2005 10:28 am

Post by Sergey Tkachenko »

I cannot see why ItemNo would change - your code does not make any modifications to the document.

Try adding this condition inside the loop:

Code:

if tot_kwds>1000 then begin
  // display word and document with the selection
end;

Post by toolwiz »

Ok, upon further reflection, I can see why ItemNo might not change and it would be ok. Isn't there anything I can test to see if the internal code isn't going any further?

Checking for some arbitrary number of tot_kwds gives me a useless number for that value -- it is a meaningful value that's used in a statistical calculation later on. That's why I'm counting them. :)

Post by Sergey Tkachenko »

An enormous value of this counter lets you detect that something went wrong.
We need to know for which word and in which state of the document it happened, and you can inspect both when the counter reaches 1000, for example.

PS: probably the devil is in // . . .

Post by toolwiz »

The first time I saw this problem, it happened when it was searching for a word that was NOT in the text. The later times it happened looking for a word that WAS in the text. In one case the word was about 2/3 of the way through the text; I didn't check how many instances of it there were. In the second case I was able to determine that there was exactly one instance of the word and it was about 15 words from the end of the text. The SearchText function just never moved past it. I couldn't tell if it started over at the top of the text buffer and kept finding the same instance, or if it failed to advance past it and so never reached the end of the text.

Is there a way to easily parse a text into individual words, like a kind of Split function that would break the text into individual Items? (In my case, I'm only interested in strings consisting of alphanumerics, i.e. common English words.) That way I could simply iterate over the list of items myself.

Any string of alphabetic chars could be tagged with one style, e.g. 0, and the others could be tagged with another style, e.g. 1. That way, when I iterate through the list of items, I could first check the style and only examine the ones that are known to contain only alphabetic characters.
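
To make the idea concrete, here is a rough sketch of the kind of split I have in mind. It does not use TRichView at all: it works on a plain string, so it assumes the document text has already been pulled out somehow. All the names in it (SplitIntoRuns, CountKeyword, STYLE_WORD, STYLE_OTHER) are made up for illustration, it only treats ASCII letters and digits as word characters, and it was written with Free Pascal in mind (Delphi would probably want {$APPTYPE CONSOLE} as well).

```pascal
program SplitWords;
{$IFDEF FPC}{$mode delphi}{$ENDIF}

uses
  SysUtils, Classes;

const
  STYLE_WORD  = 0; // run made only of ASCII letters and digits
  STYLE_OTHER = 1; // anything else: spaces, punctuation, and so on

// True for ASCII letters and digits only (assumption: no accented letters).
function IsWordChar(C: Char): Boolean;
begin
  Result := ((C >= 'A') and (C <= 'Z')) or
            ((C >= 'a') and (C <= 'z')) or
            ((C >= '0') and (C <= '9'));
end;

// Splits Text into consecutive runs of word / non-word characters.
// Each run is appended to Runs; its style tag is stored in Objects[].
procedure SplitIntoRuns(const Text: string; Runs: TStrings);
var
  Start, I, Style: Integer;
begin
  I := 1;
  while I <= Length(Text) do
  begin
    Start := I;
    if IsWordChar(Text[I]) then
      Style := STYLE_WORD
    else
      Style := STYLE_OTHER;
    // Extend the run while the character class stays the same.
    while (I <= Length(Text)) and
          (IsWordChar(Text[I]) = (Style = STYLE_WORD)) do
      Inc(I);
    Runs.AddObject(Copy(Text, Start, I - Start), TObject(NativeInt(Style)));
  end;
end;

// Counts the word runs equal to Keyword, ignoring case.
function CountKeyword(Runs: TStrings; const Keyword: string): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 0 to Runs.Count - 1 do
    if (NativeInt(Runs.Objects[I]) = STYLE_WORD) and
       SameText(Runs[I], Keyword) then
      Inc(Result);
end;

var
  Runs: TStringList;
begin
  Runs := TStringList.Create;
  try
    SplitIntoRuns('The cat sat; the CAT ran.', Runs);
    WriteLn(CountKeyword(Runs, 'cat')); // prints 2
  finally
    Runs.Free;
  end;
end.
```

With something like this, the keyword loop becomes a plain for-loop over a fixed list, so it cannot spin forever the way my while loop around SearchText does. The obvious drawback is that the runs are no longer tied to positions in the rve, so I would still need some way to get back to the RVData properties of each match.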