xypcre: PCRE functions for XYplorer

bdeshi
Posts: 4249
Joined: 12 Mar 2014 17:27
Location: Asteroid B-612 / Dhaka

Post by bdeshi »

xypcre.xyi: https://github.com/bdeshi/xypcre

This is a collection of user-defined functions for xyscripts that adds PCRE support to XYplorer.

PCRE [1] [2] | [3] [4] is an advanced regular expression standard that supports many search/replace operations not possible with XYplorer's default regexp engine.
These functions act as alternatives to the built-in regexmatches() and regexreplace(), and allow XYplorer scripts to use a PCRE-compatible regular expression engine instead of the limited Visual Basic implementation. This is achieved by offloading regexp operations to a small handler program, currently written in AutoIt3. See the usage notes for details.

  • Function Reference.
  • Instructions.
  • Downloads.
Note: please read the usage instructions carefully before including and/or using the functions.

This may have some complicated or downright ridiculous quirks, but I still hope it helps relieve some of our regexp woes. :)
Icon Names | Onyx | Undocumented Commands | xypcre
[ this user is asleep ]

functions included in xypcre

Post by bdeshi »

[up-to-date reference is here: https://github.com/bdeshi/xypcre/blob/master/XYPCRE.md]
CORE functions
  • pcrematch()
    Returns regexp pattern match(es) in a given string.
    Syntax: pcrematch(string, pattern, sep='||', index=0, format=2)
        string     string to work on (haystack).
        pattern  the regexp pattern to match (needle).
        sep        separator between returned matches. Must be at least two characters long.
        index     1-based index of one match to return when there are multiple matches. Ineffective if < 1. Returns the last match if > total count.
        format   format of return data. Can be 0, 1, or 2. Values cannot be combined. See Remarks.
    Returns matching substring(s) in defined format.
  • pcrereplace()
    Replaces regexp pattern match(es) in a given string.
    Syntax: pcrereplace(string, pattern, replace)
        string     string to work on (haystack).
        pattern  the regexp pattern to match (needle).
        replace  The string or pattern to replace match with.
    Returns resulting string after replacement.
  • pcrecapture()
    Returns matches of a capturing group in the regexp pattern
    Syntax: pcrecapture(string, pattern, index=1, sep='||', format=2)
        string     string to work on (haystack).
        pattern  the regexp pattern to match (needle). Should have at least one capturing group.
        index     1-based index of the capturing group to return. Returns the 1st group if < 1, or the last one if > total count. Pass named groups by their ordinal index.
        sep        separator between returned matches. Must be at least two characters long.
        format   format of return data. Can be 0, 1, or 2. Values cannot be combined. See Remarks.
    Returns matching substring(s) of the group in defined format.
  • pcresplit()
    Splits a string at each regexp pattern match, and returns resulting substrings.
    Syntax: pcresplit(string, pattern, sep='||', format=2)
        string     string to work on (haystack).
        pattern  the regexp pattern to split at (needle). Matching text is destroyed while splitting. Use lookahead/lookbehinds to retain portions.
        sep        separator between returned substrings. Must be at least two characters long.
        format   format of return data. Can be 0, 1, or 2. Values cannot be combined. See Remarks.
    Returns split substrings in defined format.
  • pcretoken()
    Returns a substring/match/token in its original form, from a tokenlist returned by xypcre functions.
    This is equivalent to gettoken() for the special xypcre return data formats.
    Syntax: pcretoken(data, index=1, format=2, sep='||')
        data       The source tokenlist.
        index     1-based index of the substring to return, or total count of tokens if index value is 'count'.
        format   format of data. Can be 0, 1, or 2. This must be the same format used in data.
        sep        separator used in data. This must be the same sep used in data.
    Returns the requested token/substring or the total count. The token is returned in its original form (i.e., unescaped).
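
The pcresplit() note above, that matching text is destroyed unless lookarounds are used, applies to regexp engines in general. The following is only a sketch using Python's re module (which is not PCRE and not xypcre itself) to show the difference between splitting on a consuming pattern and on a zero-width lookahead. The sample string is made up for illustration.

```python
import re

text = "oneTwoThree"

# Splitting on the capital letter itself consumes (destroys) that letter.
consumed = re.split(r"[A-Z]", text)

# A zero-width lookahead matches the position just before each capital,
# so no characters are removed. Splitting on empty matches needs Python 3.7+.
retained = re.split(r"(?=[A-Z])", text)

print(consumed)
print(retained)
```

The same idea should carry over to a pcresplit() pattern, but the exact behaviour there depends on xypcre's own engine.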
HELPER functions
(CORE functions depend on these and will not work without)
  • xypcrefind(): Finds a valid xypcre.exe and returns the path. The utility is downloaded if not found.
  • xypcrewaiter(): Synchronizes communication between xyscript and xypcre. Also handles aborting when xypcre becomes nonresponsive.

Remarks
  • These functions do not have a matchcase parameter, but case sensitivity and a host of other options can be set in the regexp pattern itself.
  • Most if not all of PCRE syntax is available. See links 3 and 4 in the first post for syntax that's sure to be supported. These pages also describe some assumptions or defaults of the syntax.
  • For functions that can return multiple substrings as a tokenlist:
    • sep must be at least two characters long to work around the dilemma of sep characters already existing in the source string.
      It's recommended that sep be a single character, repeated twice. sep is irrelevant if format is set to 2.
    • format decides the format of returned data. Possible values are 0 or 1 or 2.
      • 0: return tokens are separated by sep, and not processed in any way. Not even if sep chars already exist in the strings.
        In this format, a gettoken() call might not be able to retrieve a complete token.
        But this format is fastest when it's known that no sep character exists in the source string.
        For example when the sep is <crlf 2>, and the source string is all in one line.
      • 1: sep characters are escaped with square brackets in each token.
        For example, if sep is '<>', a token 'abc>def' becomes 'abc[>]def'.
        In this format, a gettoken() call is able to retrieve a complete token, but it will have to be unescaped later.
      • 2: return is in this format: 'token1length+token2length|token1token2'
        Eg, if the tokens are 'data', '' and 'info|intel', the return becomes: '4+0+10|datainfo|intel'
        In this format, the sep parameter is irrelevant.
    • Regardless of which format is used or how complicated it may look, the pcretoken() function is able to return one token in its original form.
    The reason behind all this elaborate escaping and formatted return data is to retrieve complete matches even when the matched text may contain the separator characters.
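
To make formats 1 and 2 above more concrete, here is a rough Python sketch of how such data could be built and read back. It is not the xypcre implementation: the function names are hypothetical, lengths are counted as Python characters (xypcre may count differently), and the handling of an empty token list is an assumption.

```python
def encode_format2(tokens):
    """Build 'len1+len2+...|token1token2...' from a list of strings."""
    header = "+".join(str(len(t)) for t in tokens)
    return header + "|" + "".join(tokens)


def decode_format2(data):
    """Recover the original tokens from a format-2 string."""
    # Only the first '|' ends the length header; later ones belong to tokens.
    header, _, body = data.partition("|")
    if not header:
        return []  # assumption: an empty header means no tokens
    tokens, pos = [], 0
    for length in (int(n) for n in header.split("+")):
        tokens.append(body[pos:pos + length])
        pos += length
    return tokens


def escape_format1(token, sep):
    """Wrap every character of sep that occurs in token in square brackets."""
    return "".join("[" + ch + "]" if ch in sep else ch for ch in token)


packed = encode_format2(["data", "", "info|intel"])
print(packed)
print(decode_format2(packed))
print(escape_format1("abc>def", "<>"))
```

Because the lengths come first, a format-2 token can contain any character, including '|' and '+', without escaping. That is presumably why sep is irrelevant in this format.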

Post by bdeshi »

Bugfix update: XYplorer's copydata cannot send empty strings, so a call like pcrereplace('[abc]', '[\[\]]', '') wouldn't work. This is fixed in v1.1.0.9.
The hack/fix chosen for now is to simply prefix one character ($op) to every outgoing string; xypcre in turn trims the first character of each string it receives. A more elegant fix may come later.

Update: v1.1.0.9 has been made unnecessary by a native fix in XYplorer. Rolled back to v1.1.0.8.
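
For anyone curious, the prefix workaround described above amounts to something like the following Python sketch. The names pack() and unpack() and the marker character are hypothetical; the post only says that one character is prepended on the sending side and trimmed on the receiving side.

```python
PREFIX = "$"  # hypothetical marker; any single character would do


def pack(arg):
    """Sender side: guarantee the outgoing string is never empty."""
    return PREFIX + arg


def unpack(received):
    """Receiver side: drop the first character to restore the original."""
    return received[1:]


for original in ["", "abc", "$already prefixed"]:
    wire = pack(original)
    print(repr(wire), "->", repr(unpack(wire)))
```

The point is only that the string on the wire always has at least one character, so an empty argument survives the round trip.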

admin
Site Admin
Posts: 59751
Joined: 22 May 2004 16:48
Location: Win8.1 @100%, Win10 @100%

Post by admin »

SammaySarkar wrote: xyplorer's copydata cannot send empty strings...
You could have told me that. :) Next version it can.

Post by bdeshi »

Anyone using this? Committed to a git repository: https://github.com/smsrkr/xypcre

Dustydog
Posts: 321
Joined: 13 Jun 2016 04:19

Post by Dustydog »

Just an FYI: I'd definitely let the script download the .exe. It avoids a very annoying warning popup that's a PITA to get rid of manually. (If you do go the manual route and set the perm variable, I'd suggest putting the file in your Documents folder, or similar, and unblocking it from the right-click menu; that's easiest for a single file, imho.) And no, I have no idea why letting the script download it avoids the popup, or how that is somehow safer, for that matter! And yes, I like keeping the warning enabled. I've been surprised once, and that was enough.

Thanks for some great work, Sammy! And yes, I'm using it.

Post by bdeshi »

Thanks for the feedback!

Where does the warning you mention come from? Your antivirus? The untrusted-downloaded-file warning from Windows?

(Btw, if you already have a copy of the AutoIt interpreter installed, you can use the au3 script itself; there's no difference at all.)

Post by bdeshi »

There was a serious bug regarding empty parameters. It's fixed in the new v1.3.1 release.

Downloads: https://github.com/bdeshi/xypcre/releases/tag/v1.3.1

Changelog:
v1.3.0
* fixed bug with 0-length copydata arguments.
* missing binary is now downloaded from this github repo.
* the xypcrefind function detects both binary and source xypcre correctly.
* invalid xypcre now stops script with a failed assertion instead of returning empty string.
* removed minified version. Please remove any local xypcre.min.xyi to avoid script version mismatch.
v1.3.1
* fixed download URL.