Commit f8abf52

Committed Aug 15, 2023
feat: add option to not use re2 (closes #28), added github ci workflow
1 parent 283c502 commit f8abf52

3 files changed, 65 lines added, 27 lines removed

.github/workflows/ci.yml (+26)

@@ -0,0 +1,26 @@
+name: CI
+on:
+  - push
+  - pull_request
+jobs:
+  build:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        os:
+          - ubuntu-latest
+        node_version:
+          - 14
+          - 16
+          - 18
+    name: Node ${{ matrix.node_version }} on ${{ matrix.os }}
+    steps:
+      - uses: actions/checkout@v3
+      - name: Setup node
+        uses: actions/setup-node@v3
+        with:
+          node-version: ${{ matrix.node_version }}
+      - name: Install dependencies
+        run: npm install
+      - name: Run tests
+        run: npm run test

README.md (+4, -11)

@@ -37,18 +37,12 @@ This package should hopefully more closely resemble real-world intended usage of
 
 ## Install
 
-**NOTE:** As of v3.0.0 you must also install `re2` as a peer dependency.
+**NOTE:** The default behavior of this package will attempt to load [re2](https://github.com/uhop/node-re2) (it is an optional peer dependency used to prevent regular expression denial of service attacks and more). If you wish to use this behavior, you must have `re2` installed via `npm install re2` – otherwise it will fallback to using normal `RegExp` instances. As of v3.0.1 we added an option if you wish to force this package to not even attempt to load `re2` (e.g. it's in your `node_modules` [but you don't want to use it](https://github.com/spamscanner/url-regex-safe/issues/28)) – simply pass `re2: false` as an option.
 
 [npm][]:
 
 ```sh
-npm install url-regex-safe re2
-```
-
-[yarn][]:
-
-```sh
-yarn add url-regex-safe re2
+npm install url-regex-safe
 ```
 
 
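The new install note says the package falls back to plain `RegExp` when `re2` cannot be loaded. One way to check whether `re2` is resolvable in a project, without executing it, is Node's `require.resolve`. This is a minimal sketch: `isResolvable` is a hypothetical helper, not part of url-regex-safe, and resolution happens relative to the calling file, which in a typical flat `node_modules` layout should agree with what the package sees (an assumption). A resolvable `re2` can still fail to load if its native build is broken; in that case the `try`/`catch` in `src/index.js` also falls back to `RegExp`.

```javascript
'use strict';

// Hypothetical helper (not part of url-regex-safe): reports whether Node can
// resolve `name` from this file's location. require.resolve() only locates
// the module; it does not execute it or load native bindings.
function isResolvable(name) {
  try {
    require.resolve(name);
    return true;
  } catch {
    return false;
  }
}

console.log(`re2 resolvable: ${isResolvable('re2')}`);
```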
@@ -112,6 +106,7 @@ npm install --save-dev @types/url-regex-safe
112106

113107
| Property | Type | Default Value | Description | |
114108
| ---------------- | ------- | ------------------------------------------------------------ | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | - |
109+
| `re2` | Boolean | `true` | Attempt to load `re2` to use instead of `RegExp` for creating new regular expression instances. If you pass `re2: false`, then `re2` will not even be attempted to be loaded. | |
115110
| `exact` | Boolean | `false` | Only match an exact String. Useful with `regex.test(str)` to check if a String is a URL. We set this to `false` by default in order to match String values such as `github.com` (as opposed to requiring a protocol or `www` subdomain). We feel this closely more resembles real-world intended usage of this package. | |
116111
| `strict` | Boolean | `false` | Force URL's to start with a valid protocol or `www` if set to `true`. If `true`, then it will allow any TLD as long as it is a minimum of 2 valid characters. If it is `false`, then it will match the TLD against the list of valid TLD's using [tlds](https://github.com/stephenmathieson/node-tlds#readme). | |
117112
| `auth` | Boolean | `false` | Match against Basic Authentication headers. We set this to `false` by default since [it was deprecated in Chromium](https://bugs.chromium.org/p/chromium/issues/detail?id=82250#c7), and otherwise it leaves the user with unwanted URL matches (more closely resembles real-world intended usage of this package by having it set to `false` by default too). | |
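The new `re2` row documents a Boolean that defaults to `true`. In `src/index.js` the defaults are merged as `{ re2: true, ..., ...options }` and the loader is gated on `options.re2 && hasRE2 !== false`, so any falsy value disables `re2`. Because object spread copies own properties even when their value is `undefined`, passing `{ re2: undefined }` behaves like `re2: false`. The sketch reproduces only that merge; `mergeOptions` and `wantsRE2` are hypothetical helpers with a trimmed-down set of defaults.

```javascript
'use strict';

// Hypothetical helper mirroring the defaults-then-spread pattern in
// src/index.js; only a subset of the real defaults is shown.
function mergeOptions(options) {
  return {
    re2: true,
    exact: false,
    strict: false,
    ...options
  };
}

// re2 can only be attempted when the merged `re2` value is truthy
// (the real code additionally checks `hasRE2 !== false`).
function wantsRE2(options) {
  return Boolean(mergeOptions(options).re2);
}

console.log(wantsRE2());                         // true  (default kept)
console.log(wantsRE2({ re2: false }));           // false (explicit opt-out)
console.log(wantsRE2({ re2: undefined }));       // false (undefined still overrides)
console.log(mergeOptions({ re2: false }).exact); // false (other defaults kept)
```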
@@ -140,7 +135,7 @@ Unlike the deprecated and unmaintained package [url-regex][], we do a few things
 
 ## Limitations
 
-Since we cannot use regular expression's "negative lookbehinds" functionality (due to [RE2][] limitations), we could not merge the logic from this [pull request](https://github.com/kevva/url-regex/pull/67/commits/6c31d81c35c3bb72c413c6e4af92a37b2689ead2). This would have allowed us to make it so `example.jpeg` would match only if it was `example.jp`, however if you pass `example.jpeg` right now it will extract `example.jp` from it (since `.jp` is a TLD). An alternative solution may exist, and we welcome community contributions regarding this issue.
+**This limitation only applies if you are using `re2`**: Since we cannot use regular expression's "negative lookbehinds" functionality (due to [RE2][] limitations), we could not merge the logic from this [pull request](https://github.com/kevva/url-regex/pull/67/commits/6c31d81c35c3bb72c413c6e4af92a37b2689ead2). This would have allowed us to make it so `example.jpeg` would match only if it was `example.jp`, however if you pass `example.jpeg` right now it will extract `example.jp` from it (since `.jp` is a TLD). An alternative solution may exist, and we welcome community contributions regarding this issue.
 
 
 ## Contributors
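The rewritten limitation note hinges on lookarounds, which RE2 does not support. The sketch below uses a toy two-entry TLD list (an assumption for illustration) and the built-in `RegExp` to show the `example.jpeg` problem and how a negative lookahead avoids it. It is not the negative-lookbehind logic from the linked pull request, and it does not claim to show how url-regex-safe itself behaves with `re2: false`.

```javascript
'use strict';

// Toy TLD list; the real package builds its list from the `tlds` module.
const toyTlds = ['com', 'jp'];
const tldAlt = `(?:${toyTlds.join('|')})`;

// Without any lookaround, the match stops after ".jp" inside ".jpeg".
const plain = new RegExp(`[a-z0-9-]+\\.${tldAlt}`, 'gi');

// A negative lookahead rejects a TLD immediately followed by another letter.
// Lookarounds are not supported by RE2, so this only works with RegExp.
const guarded = new RegExp(`[a-z0-9-]+\\.${tldAlt}(?![a-z])`, 'gi');

const text = 'see example.jpeg and example.jp';
console.log(text.match(plain));   // [ 'example.jp', 'example.jp' ]
console.log(text.match(guarded)); // [ 'example.jp' ]
```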
@@ -161,8 +156,6 @@ Since we cannot use regular expression's "negative lookbehinds" functionality (d
 
 [npm]: https://www.npmjs.com/
 
-[yarn]: https://yarnpkg.com/
-
 [cve]: https://nvd.nist.gov/vuln/detail/CVE-2020-7661
 
 [re2]: https://github.com/uhop/node-re2

src/index.js (+35, -16)

@@ -1,20 +1,25 @@
 const ipRegex = require('ip-regex');
 const tlds = require('tlds');
 
-/* istanbul ignore next */
-const SafeRegExp = (() => {
-  try {
-    const RE2 = require('re2');
-    return typeof RE2 === 'function' ? RE2 : RegExp;
-  } catch {
-    return RegExp;
-  }
-})();
 const ipv4 = ipRegex.v4().source;
 const ipv6 = ipRegex.v6().source;
+const host = '(?:(?:[a-z\\u00a1-\\uffff0-9][-_]*)*[a-z\\u00a1-\\uffff0-9]+)';
+const domain = '(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*';
+const strictTld = '(?:[a-z\\u00a1-\\uffff]{2,})';
+const defaultTlds = `(?:${tlds.sort((a, b) => b.length - a.length).join('|')})`;
+const port = '(?::\\d{2,5})?';
+
+let RE2;
+let hasRE2;
 
 module.exports = (options) => {
   options = {
+    //
+    // attempt to use re2, if set to false will use RegExp
+    // (we did this approach because we don't want to load in-memory re2 if users don't want it)
+    // <https://github.com/spamscanner/url-regex-safe/issues/28>
+    //
+    re2: true,
     exact: false,
     strict: false,
     auth: false,
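This hunk moves `host`, `domain`, `strictTld`, `defaultTlds` and `port` to module scope, so the default TLD alternation is sorted and joined once when the module loads rather than on every call. The sort is longest-first because regex alternation is tried left to right. With a toy list (an assumption for illustration) where `co` precedes `com`, the shorter TLD wins unless the list is sorted:

```javascript
'use strict';

// Toy TLD list in an order where a shorter TLD is a prefix of a longer one.
const toyTlds = ['co', 'com'];

// Alternation is tried left to right, so "co" succeeds before "com" is tried.
const unsorted = new RegExp(`^[a-z]+\\.(?:${toyTlds.join('|')})`);

// Longest first, as `defaultTlds` does. sort() works in place, so copy first
// to leave toyTlds untouched.
const sorted = [...toyTlds].sort((a, b) => b.length - a.length);
const longestFirst = new RegExp(`^[a-z]+\\.(?:${sorted.join('|')})`);

console.log('example.com'.match(unsorted)[0]);     // example.co
console.log('example.com'.match(longestFirst)[0]); // example.com
```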
@@ -24,27 +29,41 @@ module.exports = (options) => {
     trailingPeriod: false,
     ipv4: true,
     ipv6: true,
-    tlds,
     returnString: false,
     ...options
   };
 
+  /* istanbul ignore next */
+  const SafeRegExp =
+    options.re2 && hasRE2 !== false
+      ? (() => {
+          if (typeof RE2 === 'function') return RE2;
+          try {
+            RE2 = require('re2');
+            return typeof RE2 === 'function' ? RE2 : RegExp;
+          } catch {
+            hasRE2 = false;
+            return RegExp;
+          }
+        })()
+      : RegExp;
+
   const protocol = `(?:(?:[a-z]+:)?//)${options.strict ? '' : '?'}`;
+
   // Add option to disable matching urls with HTTP Basic Authentication
   // <https://github.com/kevva/url-regex/pull/63>
   const auth = options.auth ? '(?:\\S+(?::\\S*)?@)?' : '';
-  const host = '(?:(?:[a-z\\u00a1-\\uffff0-9][-_]*)*[a-z\\u00a1-\\uffff0-9]+)';
-  const domain =
-    '(?:\\.(?:[a-z\\u00a1-\\uffff0-9]-*)*[a-z\\u00a1-\\uffff0-9]+)*';
+
   // Add ability to pass custom list of tlds
   // <https://github.com/kevva/url-regex/pull/66>
   const tld = `(?:\\.${
     options.strict
-      ? '(?:[a-z\\u00a1-\\uffff]{2,})'
-      : `(?:${options.tlds.sort((a, b) => b.length - a.length).join('|')})`
+      ? strictTld
+      : options.tlds
+      ? `(?:${options.tlds.sort((a, b) => b.length - a.length).join('|')})`
+      : defaultTlds
   })${options.trailingPeriod ? '\\.?' : ''}`;
 
-  const port = '(?::\\d{2,5})?';
   let disallowedChars = '\\s"';
   if (!options.parens) {
     // Not accept closing parenthesis
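The `SafeRegExp` block above loads `re2` lazily and remembers both a successful load (`RE2`) and a failed one (`hasRE2 = false`). The same idea as a standalone function, as a minimal sketch: `getRegExpConstructor`, `cachedRE2`, `re2Missing` and `loadAttempts` are hypothetical names, and the counter exists only to make the caching observable.

```javascript
'use strict';

// Module-level cache, in the spirit of `RE2` / `hasRE2` in src/index.js.
let cachedRE2;          // the re2 constructor once it has loaded
let re2Missing = false; // set after a failed require('re2')
let loadAttempts = 0;   // hypothetical counter, only to observe caching

function getRegExpConstructor(options) {
  // Same merge as src/index.js: any falsy `re2` (including undefined) opts out.
  const merged = { re2: true, ...options };
  if (!merged.re2 || re2Missing) return RegExp;
  if (typeof cachedRE2 === 'function') return cachedRE2;

  loadAttempts += 1;
  try {
    const mod = require('re2');
    if (typeof mod === 'function') {
      cachedRE2 = mod;
      return mod;
    }
    return RegExp;
  } catch {
    // Remember the failure so later calls skip require() entirely.
    re2Missing = true;
    return RegExp;
  }
}

// Usage: both RE2 and RegExp accept (pattern, flags).
const SafeRegExp = getRegExpConstructor();
const re = new SafeRegExp('ab+c', 'i');
console.log(SafeRegExp === RegExp ? 'RegExp' : 'RE2', re.test('xABBCx'));
```

Remembering the failure matters because, as far as we know, Node's module cache only records successful loads, so without the flag a missing `re2` would be looked up again on every call to the exported function.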
