community: add user agent for web scraping loaders #22480

emilienchvt · 2024-06-04T16:04:39Z

**Description:** This PR adds a `USER_AGENT` environment variable to be used for web scraping. It creates a utility that reads that user agent and uses it in the scraping classes covered in [this piece of doc](https://python.langchain.com/v0.1/docs/use_cases/web_scraping/). Identifying your scraper is considered good etiquette, and this PR aims to make that easier.

**Issue:** None

**Dependencies:** None

**Twitter handle:** None
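To illustrate the idea in the description, here is a minimal sketch of a helper that reads `USER_AGENT` from the environment and attaches it to an outgoing request. It is not the code merged in this PR: the fallback value `DEFAULT_USER_AGENT` and the `build_request` function are hypothetical names chosen for this example, and it uses only the standard library rather than the LangChain loaders.

```python
import logging
import os
import urllib.request

logger = logging.getLogger(__name__)

# Hypothetical fallback value, chosen for this sketch only.
DEFAULT_USER_AGENT = "ExampleScraper/0.1"


def get_user_agent() -> str:
    """Return the USER_AGENT environment variable, or a fallback if it is unset."""
    user_agent = os.environ.get("USER_AGENT")
    if not user_agent:
        # Warn so that users notice their requests are not identified.
        logger.warning(
            "USER_AGENT is not set; consider setting it to identify your requests."
        )
        return DEFAULT_USER_AGENT
    return user_agent


def build_request(url: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": get_user_agent()})
```

Under these assumptions, exporting `USER_AGENT` before running a scraper (for example, a value that names the bot and a contact address) would make every request built through `build_request` identify itself, while an unset variable falls back to the default and logs a warning.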

emilienchvt added 2 commits June 4, 2024 17:35

Add user agent for web scraping loaders

b137343

lint

0db9893

dosubot bot added the size:M label Jun 4, 2024

dosubot bot added Ɑ: doc loader 🤖:improvement labels Jun 4, 2024

vercel bot deployed to Preview June 4, 2024 16:20

eyurtsev self-assigned this Jun 4, 2024

typing

72c2775

emilienchvt force-pushed the add-user-agents branch from 71fa5f0 to 72c2775 June 5, 2024 08:38

Merge branch 'master' into add-user-agents

bbd3f49

vercel bot deployed to Preview June 5, 2024 09:01

eyurtsev self-requested a review June 5, 2024 13:32

eyurtsev enabled auto-merge (squash) June 5, 2024 13:33

eyurtsev approved these changes Jun 5, 2024

dosubot bot added the lgtm label Jun 5, 2024

eyurtsev disabled auto-merge June 5, 2024 15:14

eyurtsev enabled auto-merge (squash) June 5, 2024 15:14

eyurtsev merged commit c3d4126 into langchain-ai:master Jun 5, 2024
44 checks passed
