[Demo] Searching with regular expressions

Posted: Wed Sep 26, 2018 7:04 pm
by Sergey Tkachenko
Attachment: RVRegEx.zip (177.49 KiB)
This ZIP file contains two demo projects.

1) Search from the cursor position
[Screenshot: TRichView-Regular-Expressions.png]
2) Search and highlight all occurrences
[Screenshot: TRichView-Regular-Expressions-Ex.jpg]
Both demo projects use the same technique. They save the document content to a string that has a one-to-one correspondence with the original content. Regular expressions are searched in this string using TRegEx. The matches found are then marked in the original document.

These demos require Delphi XE or newer, because TRegEx was introduced in that version of Delphi.
The first demo is very fast, even for huge documents.
The second demo may be slow, because TRegEx.Matches is slow when it returns a large number of matches. In a real application, you can call TRegEx.Match, then loop with Match.NextMatch, and limit the number of results.
To highlight results, the second demo uses the same mechanism as live spell checking, but instead of wavy underlines it draws semitransparent colored rectangles.
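A minimal Delphi sketch of the approach suggested for the second demo: instead of TRegEx.Matches, call TRegEx.Match, loop with NextMatch, and stop after a fixed number of results. FindLimited and TFoundRange are hypothetical names, not part of the demo. Mapping each (StartIndex, Len) pair back to the document with the RVLinear functions, as the demos do, is left out. The sketch assumes Delphi XE2 or newer for the unit name System.RegularExpressions (in Delphi XE the unit is called RegularExpressions).

```pascal
program RegExLimitedSearch;

{$APPTYPE CONSOLE}

// Sketch only: collects at most MaxCount matches using TRegEx.Match and
// TMatch.NextMatch instead of TRegEx.Matches.
// Assumes Delphi XE2+ (in Delphi XE, use the unit name RegularExpressions).

uses
  System.RegularExpressions;

type
  // Hypothetical record: one match as a 1-based index and a length,
  // both in UTF-16 characters of the searched string
  TFoundRange = record
    StartIndex: Integer;
    Len: Integer;
  end;

// Stores up to MaxCount matches of Pattern in Text into Found and returns
// the number stored. NextMatch is not called once the limit is reached.
function FindLimited(const Text, Pattern: string; MaxCount: Integer;
  out Found: TArray<TFoundRange>): Integer;
var
  RegEx: TRegEx;
  M: TMatch;
begin
  Result := 0;
  SetLength(Found, 0);
  if MaxCount <= 0 then
    Exit;
  SetLength(Found, MaxCount);
  RegEx := TRegEx.Create(Pattern);
  M := RegEx.Match(Text);
  while M.Success do
  begin
    Found[Result].StartIndex := M.Index;
    Found[Result].Len := M.Length;
    Inc(Result);
    if Result >= MaxCount then
      Break; // limit reached: do not search for further matches
    M := M.NextMatch;
  end;
  SetLength(Found, Result); // drop unused slots
end;

var
  Found: TArray<TFoundRange>;
  I, N: Integer;
begin
  N := FindLimited('a cat, a hat, a bat', 'a', 3, Found);
  for I := 0 to N - 1 do
    Writeln('Match at ', Found[I].StartIndex, ', length ', Found[I].Len);
end.
```

Because the loop stops as soon as the limit is reached, the cost depends on the number of results you actually want rather than on the total number of matches in a huge document.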

Posted: Fri May 29, 2020 6:00 pm
by jgkoehn
Greetings Sergey,
I am converting the first regex demo to Lazarus for use in an app. However, the further the search goes, the further the found positions drift out of sync with the selection. I tried offsets of 2, 1, 0 and -1 for the caret position, and it doesn't seem to matter. Any thoughts?

Posted: Sat May 30, 2020 6:29 pm
by jgkoehn
I found I needed to adjust some code: I am using PCRE code that was originally written for Delphi in Lazarus, so it does not handle Unicode directly.
I believe this is the problem, because the more Unicode characters the text contains, the further the positions drift out of alignment with RVGetLinearCaretPos.
My guess is that Unicode characters take up more positions in M.Index in the following snippet:

```pascal
function TForm3.Select(m: IMatch): UnicodeString;
var
  StartIndex, EndIndex, ItemNo1, Offs1, ItemNo2, Offs2: Integer;
  RVData1, RVData2: TCustomRVData;
begin
  //StartIndex := M.Index - 1;
  StartIndex := M.Index;
  EndIndex := StartIndex + M.Length;
  // ... (the rest of the function was not included in the post)
end;
```

Posted: Sat May 30, 2020 8:24 pm
by jgkoehn
I see that if I use RVGetText and FText := GetAllText(RichViewEdit1);, it works better with Lazarus and PCRE, but then graphics throw it off. I will have to dig further.
[Edit] Well, that only partly worked. More digging needed.

Posted: Sun May 31, 2020 6:46 am
by Sergey Tkachenko
Do not use the functions from RVGetText/RVGetTextW. Only the functions from RVLinear provide a one-to-one correspondence with the document.
Which regexp library do you use?

Posted: Sun May 31, 2020 1:46 pm
by jgkoehn
Greetings,
Thanks for the information about RVGetText.
This is the regex library I use; another user modified it to make it Lazarus-ready:
http://renatomancuso.com/software/dpcre/dpcre.htm
Here is the modified version:
https://github.com/rubiot/ibiblia/blob/ ... re_dll.pas, and also PCRE.pas in the same location.
Also, if you use TRegex.Replace, there is a spot in it (with a comment) that needs to be fixed: it should not say Result := Input;

Posted: Sat Jun 06, 2020 10:28 pm
by jgkoehn
Any further thoughts on this, Sergey?

Posted: Mon Jun 08, 2020 9:39 am
by Sergey Tkachenko
I converted the first demo to Lazarus:
https://www.trichview.com/support/files/RegExLaz.zip

This conversion is one-to-one, with classes/records mapped to interfaces. I hope I understand it correctly (I assumed that IMatch contains one position defined by GetIndex and GetLength; I do not understand the purpose of its Groups property).

As I understand it, this library can work with Unicode text represented as UTF-8 (the default string encoding in Lazarus). So Edit1.Text can be passed as is, but the result of RVGetTextRange must be converted using UTF8Encode.
rcoUTF8 must be included in the options.

Posted: Mon Jun 08, 2020 2:16 pm
by jgkoehn
Thank you so much, I will check this out.

Posted: Tue Jun 09, 2020 7:27 pm
by jgkoehn
Could you test your results on the attached test.rvf?
I am searching for "a", and it works fine up until the Greek text. If it works for you, I definitely have something wrong on my end.

Posted: Tue Jun 09, 2020 7:45 pm
by jgkoehn
I emailed the link to the app

Posted: Tue Jun 09, 2020 9:20 pm
by Sergey Tkachenko
Well, there is a bug in this example: the regex library in Lazarus works with positions in the UTF-8 string (byte offsets), while the demo works with character positions in UTF-16.
It needs functions that convert positions in UTF-8 to positions in UTF-16 and vice versa.
I'll make them tomorrow.

Posted: Wed Jun 10, 2020 4:28 pm
by Sergey Tkachenko
I uploaded a new version for Lazarus in the same location:
https://www.trichview.com/support/files/RegExLaz.zip

Now it recalculates indexes from UTF-8 to UTF-16 and back when necessary.
Also, this DLL uses 0-based string indexes, while Delphi uses 1-based indexes; this difference was not fully handled in the previous version of this demo.
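A minimal sketch of this kind of position conversion, written so it should compile in both Lazarus/Free Pascal ({$MODE DELPHI}) and Delphi. It assumes the regex library reports a match as a 0-based byte offset plus a byte length in valid UTF-8 text, and that the RichView side needs a 1-based index and a length in UTF-16 code units. All function names here are hypothetical and are not the ones used in RegExLaz.zip. Characters outside the Basic Multilingual Plane (4-byte UTF-8 sequences) take two UTF-16 code units (a surrogate pair), so they are counted as 2.

```pascal
program Utf8Utf16Offsets;

{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// For the UTF-8 sequence starting with LeadByte: how many bytes it occupies
// and how many UTF-16 code units the character needs.
// Assumes valid UTF-8; an unexpected byte is counted as 1 byte / 1 unit.
procedure Utf8SeqInfo(LeadByte: Byte; out ByteCount, UnitCount: Integer);
begin
  case LeadByte of
    $00..$7F: begin ByteCount := 1; UnitCount := 1; end;
    $C0..$DF: begin ByteCount := 2; UnitCount := 1; end;
    $E0..$EF: begin ByteCount := 3; UnitCount := 1; end;
    $F0..$F7: begin ByteCount := 4; UnitCount := 2; end; // surrogate pair
  else
    ByteCount := 1;
    UnitCount := 1;
  end;
end;

// 0-based byte offset in UTF-8 text -> 0-based offset in UTF-16 code units
function Utf8ToUtf16Offset(const S: RawByteString; ByteOffset: Integer): Integer;
var
  I, B, U: Integer;
begin
  Result := 0;
  I := 0;
  while (I < ByteOffset) and (I < Length(S)) do
  begin
    Utf8SeqInfo(Ord(S[I + 1]), B, U);
    Inc(I, B);
    Inc(Result, U);
  end;
end;

// 0-based offset in UTF-16 code units -> 0-based byte offset in UTF-8 text
function Utf16ToUtf8Offset(const S: RawByteString; Utf16Offset: Integer): Integer;
var
  Units, B, U: Integer;
begin
  Result := 0;
  Units := 0;
  while (Units < Utf16Offset) and (Result < Length(S)) do
  begin
    Utf8SeqInfo(Ord(S[Result + 1]), B, U);
    Inc(Result, B);
    Inc(Units, U);
  end;
end;

// Match given as (0-based byte offset, byte length) ->
// 1-based UTF-16 index and length in UTF-16 code units
procedure MatchToUtf16(const S: RawByteString; ByteOffset, ByteLen: Integer;
  out Index1, Len16: Integer);
var
  Start16: Integer;
begin
  Start16 := Utf8ToUtf16Offset(S, ByteOffset);
  Index1 := Start16 + 1;
  Len16 := Utf8ToUtf16Offset(S, ByteOffset + ByteLen) - Start16;
end;

// Builds a byte string without any code page conversion (demo data only)
function RawFromBytes(const Bytes: array of Byte): RawByteString;
var
  I: Integer;
begin
  SetLength(Result, Length(Bytes));
  for I := 0 to High(Bytes) do
    Result[I + 1] := AnsiChar(Bytes[I]);
end;

var
  S: RawByteString;
  Idx, Len: Integer;
begin
  // 'a', Greek alpha (2 bytes), 'b', U+1F600 (4 bytes), 'c'
  S := RawFromBytes([$61, $CE, $B1, $62, $F0, $9F, $98, $80, $63]);
  MatchToUtf16(S, 4, 4, Idx, Len); // the 4-byte character at byte offset 4
  WriteLn('UTF-16 index ', Idx, ', length ', Len);
  WriteLn('Byte offset of UTF-16 offset 5: ', Utf16ToUtf8Offset(S, 5));
end.
```

For the opposite direction (for example, turning a position in the document into a starting offset for the regex search), convert a 1-based Delphi index to 0-based first by subtracting 1, then call Utf16ToUtf8Offset.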

Posted: Wed Jun 10, 2020 7:00 pm
by jgkoehn
Wow, thanks, Sergey. This looks complicated. Thank you, sir! It works.