Add functions to load buffers, streams & URLs #2051

Closed
fb55 opened this issue Aug 17, 2021 · 0 comments · Fixed by #2857

fb55 commented Aug 17, 2021

Most users will use Cheerio with documents loaded from the web, which can lead to decoding issues; see #1785. Cheerio should provide a method to load a buffer that properly handles encodings. JSDom uses https://github.com/jsdom/whatwg-encoding to do this. I have started working on a solution at https://github.com/fb55/encoding-sniffer, which will support streams.
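
To make the decoding problem concrete, here is a minimal sketch of buffer sniffing, assuming a much simplified version of what browsers do: check for a byte order mark, then look for a charset declaration near the start of the document, then fall back to a default. The names `sniffEncoding` and `decodeDocument` are hypothetical and are not the API of encoding-sniffer or whatwg-encoding; the 1024-byte window and the `windows-1252` fallback are assumptions of this sketch.

```typescript
// Hypothetical helpers: a simplified sketch, not the full WHATWG sniffing
// algorithm and not the API of encoding-sniffer or whatwg-encoding.

/** Guess the encoding label of an HTML document from its first bytes. */
function sniffEncoding(bytes: Uint8Array, fallback = "windows-1252"): string {
  // 1. A byte order mark takes precedence over everything else.
  if (bytes[0] === 0xef && bytes[1] === 0xbb && bytes[2] === 0xbf) return "utf-8";
  if (bytes[0] === 0xfe && bytes[1] === 0xff) return "utf-16be";
  if (bytes[0] === 0xff && bytes[1] === 0xfe) return "utf-16le";

  // 2. Otherwise look for a charset declaration in the first 1024 bytes.
  // A single-byte decoder maps every byte to one character, so ASCII markup
  // stays readable whatever the real encoding is.
  const head = new TextDecoder("latin1").decode(bytes.subarray(0, 1024));
  const label = /<meta[^>]*charset\s*=\s*["']?([\w-]+)/i.exec(head)?.[1];
  if (label) return label.toLowerCase();

  // 3. Nothing found: use the caller-supplied default.
  return fallback;
}

/** Decode a document using the sniffed encoding. */
function decodeDocument(bytes: Uint8Array): string {
  let decoder: TextDecoder;
  try {
    decoder = new TextDecoder(sniffEncoding(bytes));
  } catch {
    // Unknown or unsupported label: fall back rather than fail.
    decoder = new TextDecoder("windows-1252");
  }
  return decoder.decode(bytes);
}
```

The decoded string could then be passed to the existing `load` function. A real implementation would also need to consider the HTTP `Content-Type` header, which this sketch ignores.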

One current user-land implementation of this is https://github.com/ktty1220/cheerio-httpcli (in Japanese).

We should add three functions for NodeJS users:

  • load(buffer, options): sniffs the encoding of the passed buffer and returns the loaded document; an overload of the existing load function.
  • stream(cb, options) (see ".stream(cb) method", #99): returns a writable stream that will (1) sniff the encoding, (2) parse the document as chunks arrive, and (3) call the callback with a loaded Cheerio instance once the stream has ended.
    • It would be nice for the return value of stream to be both a writable stream and a promise, so that users can await the result.
    • An alternative interface might be stream(readableStream, options), which returns a promise and automatically consumes the readable stream. Note that this goes against NodeJS conventions.
  • request(url, options): fetches the document at url and pipes it into stream. Returns a promise for the loaded document.
    • Not named fetch, to avoid a name collision with the official fetch API.
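
The stream(cb, options) idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the proposed implementation: `ChunkDecoder` and `documentStream` are hypothetical names, the `load` parameter stands in for whatever turns an HTML string into a document (Cheerio's `load`, for example), the encoding is passed in rather than sniffed, and the document is buffered and loaded once at the end instead of being parsed incrementally.

```typescript
import { Writable } from "node:stream";

/** Decodes chunks as they arrive; a multi-byte character may span two chunks. */
class ChunkDecoder {
  private readonly decoder: TextDecoder;
  private text = "";

  constructor(encoding = "utf-8") {
    this.decoder = new TextDecoder(encoding);
  }

  write(chunk: Uint8Array): void {
    // `stream: true` holds back an incomplete byte sequence until the next call.
    this.text += this.decoder.decode(chunk, { stream: true });
  }

  end(): string {
    // A final call without input flushes whatever is still pending.
    this.text += this.decoder.decode();
    return this.text;
  }
}

/**
 * Hypothetical sketch of a writable stream that hands a loaded document to a
 * callback once the input has ended. `load` is supplied by the caller.
 */
function documentStream<T>(
  load: (html: string) => T,
  cb: (err: Error | null, doc?: T) => void,
  encoding = "utf-8",
): Writable {
  const decoder = new ChunkDecoder(encoding);
  return new Writable({
    write(chunk: Uint8Array, _encoding, done) {
      decoder.write(chunk);
      done();
    },
    final(done) {
      cb(null, load(decoder.end()));
      done();
    },
  });
}
```

With Cheerio installed, an HTTP response could then be piped into `documentStream(cheerio.load, (err, $) => { /* query the document */ })`. Wrapping the callback in a promise would give the awaitable variant mentioned in the list.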

For me, the big open question is how much of this we can bring to other platforms as well. E.g., Deno users will no doubt have similar requirements.

fb55 changed the title from "Provide method to read buffers with unknown encodings" to "Add methods to load buffers & URLs" May 11, 2022
fb55 added this to Backlog in v1.0 via automation May 11, 2022
fb55 moved this from Backlog to To do in v1.0 May 11, 2022
fb55 mentioned this issue May 11, 2022
fb55 changed the title from "Add methods to load buffers & URLs" to "Add methods to load buffers, streams & URLs" May 11, 2022
fb55 changed the title from "Add methods to load buffers, streams & URLs" to "Add functions to load buffers, streams & URLs" May 11, 2022
v1.0 automation moved this from To do to Done Nov 28, 2022
kodiakhq bot pushed a commit to X-oss-byte/Canary-nextjs that referenced this issue Sep 22, 2023

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [cheerio](https://cheerio.js.org/) ([source](https://togithub.com/cheeriojs/cheerio)) | [`1.0.0-rc.9` -> `1.0.0-rc.12`](https://renovatebot.com/diffs/npm/cheerio/1.0.0-rc.9/1.0.0-rc.12) | [![age](https://developer.mend.io/api/mc/badges/age/npm/cheerio/1.0.0-rc.12?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/npm/cheerio/1.0.0-rc.12?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/npm/cheerio/1.0.0-rc.9/1.0.0-rc.12?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/npm/cheerio/1.0.0-rc.9/1.0.0-rc.12?slim=true)](https://docs.renovatebot.com/merge-confidence/) |

---

### Release Notes

<details>
<summary>cheeriojs/cheerio (cheerio)</summary>

### [`v1.0.0-rc.12`](https://togithub.com/cheeriojs/cheerio/releases/tag/v1.0.0-rc.12)

[Compare Source](https://togithub.com/cheeriojs/cheerio/compare/v1.0.0-rc.11...v1.0.0-rc.12)

Bugfix release. Fixed issues:

-   Align `prop` undefined handling with jQuery by [@fb55](https://togithub.com/fb55) in cheeriojs/cheerio#2557
-   Allow deep imports of `cheerio/lib/utils` by [@blixt](https://togithub.com/blixt) in cheeriojs/cheerio#2601

#### New Contributors

-   [@blixt](https://togithub.com/blixt) made their first contribution in cheeriojs/cheerio#2601

**Full Changelog**: cheeriojs/cheerio@v1.0.0-rc.11...v1.0.0-rc.12

### [`v1.0.0-rc.11`](https://togithub.com/cheeriojs/cheerio/releases/tag/v1.0.0-rc.11)

[Compare Source](https://togithub.com/cheeriojs/cheerio/compare/v1.0.0-rc.10...v1.0.0-rc.11)

`cheerio@1.0.0-rc.11` is hopefully the last RC before the 1.0.0 release of Cheerio. There are two APIs that will be added for the next major release: an `extract` method (cheeriojs/cheerio#2523) and NodeJS-specific loader methods (cheeriojs/cheerio#2051). These are still in flux and I'd appreciate feedback on the proposals.

A big thank you to everyone that contributed to this release! This includes code contributors, as well as the amazing financial support on [GitHub Sponsors](https://togithub.com/sponsors/cheeriojs)!

Under the hood, a lot of work for this release went into updating parse5, cheerio's default HTML parser. Have a look at [parse5's release notes](https://togithub.com/inikulin/parse5/releases/tag/v7.0.0) to see what has changed there.

#### Breaking

-   Cheerio is now a dual CommonJS and ESM module. That means that deep imports will now fail in newer versions of Node. (cheeriojs/cheerio#2508)
-   `script` and `style` contents are added again in `.text()` (cheeriojs/cheerio#2509)
    -   To keep the old behavior, switch `.text()` to `.prop('innerText')`
-   The TypeScript types inherited from upstream dependencies have changed. (cheeriojs/cheerio#2503)
    -   Node types are now using tagged unions, which will make consumption a bit easier.

#### Features

-   Relevant options are now forwarded to `cheerio-select` (cheeriojs/cheerio#2511)
    -   Custom pseudo classes can now be specified [using the `pseudos` option](https://cheerio.js.org/interfaces/CheerioOptions.html#pseudos).
-   For the `.prop()` method:
    -   Add `textContent` and `innerText` props (cheeriojs/cheerio#2214)
    -   Users can now specify a `baseURI` option, which causes `href` and `src` props to be resolved as URLs. (cheeriojs/cheerio#2510)
-   Added a `slim` export, which will always use htmlparser2 (cheeriojs/cheerio#1960)

#### Fixes

-   Have `text` turn passed values to strings (cheeriojs/cheerio#2047)
-   Include `undefined` in the return type of `get` by [@glen-84](https://togithub.com/glen-84) in cheeriojs/cheerio#2392
-   Recognise comments as HTML (cheeriojs/cheerio#2504)
-   Add missing `undefined` return value (cheeriojs/cheerio#2505)
-   Export missing static methods (cheeriojs/cheerio#2506)
-   Have style parsing add malformed fields to previous field (cheeriojs/cheerio#2521)

#### Refactor

-   Use `domutils` module directly (cheeriojs/cheerio#1928)
-   Hand-roll `isHTML` (cheeriojs/cheerio#1935)
-   Move initialization logic to `load` (cheeriojs/cheerio#1951)
-   Only return elements in `closest` (cheeriojs/cheerio#2057)
-   Remove unnecessary code, be more explicit (cheeriojs/cheerio#2279)
-   Use stricter TS, ESLint configs (cheeriojs/cheerio#2507)
-   Update exported values (cheeriojs/cheerio#2512)

#### Development Experience

-   Migrate husky to v6 by [@DavideViolante](https://togithub.com/DavideViolante) in cheeriojs/cheerio#1934
-   Update CI by [@XhmikosR](https://togithub.com/XhmikosR) in cheeriojs/cheerio#2149
-   Set permissions for GitHub actions by [@neilnaveen](https://togithub.com/neilnaveen) in cheeriojs/cheerio#2453

#### Docs

-   Update README "is not a web browser" section by [@mxschmitt](https://togithub.com/mxschmitt) in cheeriojs/cheerio#2127

#### New Contributors

-   [@DavideViolante](https://togithub.com/DavideViolante) made their first contribution in cheeriojs/cheerio#1934
-   [@mxschmitt](https://togithub.com/mxschmitt) made their first contribution in cheeriojs/cheerio#2127
-   [@glen-84](https://togithub.com/glen-84) made their first contribution in cheeriojs/cheerio#2392
-   [@neilnaveen](https://togithub.com/neilnaveen) made their first contribution in cheeriojs/cheerio#2453

**Full Changelog**: cheeriojs/cheerio@v1.0.0-rc.10...v1.0.0-rc.11

### [`v1.0.0-rc.10`](https://togithub.com/cheeriojs/cheerio/releases/tag/v1.0.0-rc.10)

[Compare Source](https://togithub.com/cheeriojs/cheerio/compare/v1.0.0-rc.9...v1.0.0-rc.10)

**Fixes:**

-   `.html(node)` now moves passed nodes ([#1923](https://togithub.com/cheeriojs/cheerio/issues/1923), fixes [#940](https://togithub.com/cheeriojs/cheerio/issues/940))  [`258b26b`](https://togithub.com/cheeriojs/cheerio/commit/258b26b)
-   Boolean attributes are no longer special in xmlMode ([#1903](https://togithub.com/cheeriojs/cheerio/issues/1903), fixes [#1805](https://togithub.com/cheeriojs/cheerio/issues/1805))  [`b393e4a`](https://togithub.com/cheeriojs/cheerio/commit/b393e4a)
-   Rename parser adapter files ([#1873](https://togithub.com/cheeriojs/cheerio/issues/1873), fixes [#1847](https://togithub.com/cheeriojs/cheerio/issues/1847))  [`8f55dd8`](https://togithub.com/cheeriojs/cheerio/commit/8f55dd8)
-   Make `filter` work on all collections ([#1870](https://togithub.com/cheeriojs/cheerio/issues/1870), fixes [#1867](https://togithub.com/cheeriojs/cheerio/issues/1867))  [`fb8d31e`](https://togithub.com/cheeriojs/cheerio/commit/fb8d31e)
-   Bump cheerio-select ([#1922](https://togithub.com/cheeriojs/cheerio/issues/1922), fixes https://www.npmjs.com/advisories/1754)  [`5cd2b9c`](https://togithub.com/cheeriojs/cheerio/commit/5cd2b9c)

**Documentation:**

-   Document how to define TS types for Plug-Ins ([#1915](https://togithub.com/cheeriojs/cheerio/issues/1915), fixes [#1778](https://togithub.com/cheeriojs/cheerio/issues/1778))  [`880fd2c`](https://togithub.com/cheeriojs/cheerio/commit/880fd2c)
-   Remove obsolete Testing section  [`e0c7cbb`](https://togithub.com/cheeriojs/cheerio/commit/e0c7cbb)
-   Remove now-invalid `require`  [`5dfbd35`](https://togithub.com/cheeriojs/cheerio/commit/5dfbd35)

**Refactors:**

-   Wrap shared behavior in `traversing` ([#1909](https://togithub.com/cheeriojs/cheerio/issues/1909))  [`58e090a`](https://togithub.com/cheeriojs/cheerio/commit/58e090a)
-   Move `is` to `traversing`, optimize ([#1908](https://togithub.com/cheeriojs/cheerio/issues/1908))  [`1c6fa3e`](https://togithub.com/cheeriojs/cheerio/commit/1c6fa3e)
-   Change order of arguments of internal `domEach` ([#1892](https://togithub.com/cheeriojs/cheerio/issues/1892))  [`feda230`](https://togithub.com/cheeriojs/cheerio/commit/feda230)
-   Have `load` export a function ([#1869](https://togithub.com/cheeriojs/cheerio/issues/1869))  [`c370f4e`](https://togithub.com/cheeriojs/cheerio/commit/c370f4e)

</details>

---

### Configuration

📅 **Schedule**: Branch creation - At any time (no schedule defined), Automerge - At any time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update again.

---

 - [ ] If you want to rebase/retry this PR, check this box

---

This PR has been generated by [Mend Renovate](https://www.mend.io/free-developer-tools/renovate/). View repository job log [here](https://developer.mend.io/github/sammyfilly/Canary-nextjs).
kodiakhq bot pushed a commit to X-oss-byte/Nextjs that referenced this issue Sep 25, 2023