Create a script to remove dead, deprecated or invalid links from planet #233

Open
pydanny opened this issue Jun 12, 2017 · 13 comments

@pydanny
Contributor

pydanny commented Jun 12, 2017

  • PySoy hasn't been updated in about 5 years
  • The CherryPy link is dead
  • TurboGears hasn't been updated for over a year and the project is marginal
  • These could or should be replaced with links to the Pyramid, Flask, pandas, or Jupyter feeds
@pydanny changed the title from "Remove dead or deprecated links for sidebar" to "Remove dead or deprecated links from sidebar" on Jun 12, 2017
@rochacbruno
Member

I suggested creating a validator script to run monthly in #124 (comment).

That script would take each URL, check its status, and validate it with an RSS validator.
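A minimal sketch of the status check, using only the standard library. The names `fetch_status` and `is_alive` are made up for illustration, and the `opener` parameter is an assumption added so the function can be exercised without network access. `urllib.request.urlopen` follows redirects by default, so the code returned is that of the final URL.

```python
import urllib.error
import urllib.request


def fetch_status(url, opener=urllib.request.urlopen, timeout=10):
    """Return the final HTTP status code for url, or None if unreachable.

    `opener` is a hypothetical hook (defaults to urlopen) for testing.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404 or 500.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def is_alive(url, **kwargs):
    """True only when the final response code is exactly 200."""
    return fetch_status(url, **kwargs) == 200
```

A link would then count as dead whenever `is_alive(url)` is false; how many failures to tolerate before removing it is a separate decision.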

@rochacbruno changed the title from "Remove dead or deprecated links from sidebar" to "Create a script to remove dead, deprecated or invalid links from planet" on Jun 13, 2017
@rochacbruno
Member

@pydanny I changed the title of your issue to address the automation of this task.

@rochacbruno
Member

rochacbruno commented Jun 13, 2017

The script should:

  • Run weekly or monthly
  • Iterate over all URLs in config.ini, and also the "python libraries" and "python planets" lists
  • Call each URL (following redirects) and assert:
    Check the return code is 200
    Check the date of the last update is less than one year old
    Check the feed is valid using feedvalidator/podcastvalidator/othervalidator
    If not valid, store the URL in a simple database (text file, sqlite or something)
  • ACTION: If a link returns a bad status or is invalid 3 times, remove it from the planet
  • ACTION: If a feed has been outdated for more than one year, keep it in the planet but remove it from the sidebar
  • ACTION: If a feed is updated again and not in the sidebar, put it back there
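The three ACTION rules above could be sketched as a small decision function. This is only an illustration: the function names and the action strings are hypothetical, and it assumes the "3 times" means three consecutive failures, with a success resetting the count. `last_updated` is expected to be a timezone-aware `datetime`.

```python
from datetime import datetime, timedelta, timezone

MAX_STRIKES = 3                    # "3 times" from the list above
STALE_AFTER = timedelta(days=365)  # "one year" from the list above


def record_result(strikes, url, ok):
    """Update the failure count for url; a success resets it (assumption)."""
    strikes[url] = 0 if ok else strikes.get(url, 0) + 1
    return strikes[url]


def decide_action(strike_count, last_updated, in_sidebar, now=None):
    """Map one feed's state to one of the actions listed in the comment."""
    now = now or datetime.now(timezone.utc)
    if strike_count >= MAX_STRIKES:
        return "remove_from_planet"
    stale = now - last_updated > STALE_AFTER
    if stale and in_sidebar:
        return "remove_from_sidebar"
    if not stale and not in_sidebar:
        return "add_to_sidebar"
    return "keep"
```

The `strikes` dictionary stands in for the "simple database"; it could be written to a JSON file or an sqlite table between runs.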

@pydanny
Contributor Author

pydanny commented Jun 13, 2017

Pysoy and the others I mentioned aren't in the general feeds. They are in the "python libraries" and "python planets" lists. Will these be covered by your script or is that just for RSS feeds?

As for your script, go ahead and remove my blog from your list already. I say that because it's going to fail your W3C validator check and I don't have the luxury of time to fix it. Do keep in mind that this aggregator is the ONLY one I know of that insists on W3C validation.

@rochacbruno
Member

@pydanny yeah, we must discuss whether using the W3C validator is a good choice. We only need to check that the feed doesn't break the planet, as happened in the past. Maybe we can write our own simple validator so we do not need to rely on the W3C one.
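One possible shape for such a "simple validator", as a sketch only: it checks that the document is well-formed XML and that the root element is one of the usual feed roots. Whether that is enough to keep the planet from breaking is an assumption, not something established in this thread, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Root elements of RSS 2.0, Atom, and RSS 1.0 (RDF) documents.
FEED_ROOTS = {"rss", "feed", "RDF"}


def local_name(tag):
    """Strip an ElementTree '{namespace}' prefix from a tag name."""
    return tag.rsplit("}", 1)[-1]


def looks_like_feed(data):
    """True if data (bytes or str) is well-formed XML with a feed root."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    return local_name(root.tag) in FEED_ROOTS
```

This is far more lenient than the W3C validator: it would accept a feed with missing required elements as long as the XML parses.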

@tjguk
Member

tjguk commented Jun 13, 2017 via email

@pydanny
Contributor Author

pydanny commented Jun 13, 2017

Yes, it's too strict. No other aggregator I'm on blocks my RSS feed.

@pybites
Contributor

pybites commented Jul 19, 2017

The cleanup and validation could be nice PyBites challenges: https://pybit.es/pages/challenges.html

What do you think?

@pybites
Contributor

pybites commented Apr 2, 2018

We asked our community to give this a crack: https://pybit.es/codechallenge49.html

@mridubhatnagar

mridubhatnagar commented Jul 2, 2018

Hi

Can I be assigned this issue to work on? I don't really know the complexity of the issue, but I am curious to work on it.

Based on the above discussion, what I understand is:

  • Parse each URL in config.ini
  • Check the response status
  • Validate the RSS feed (use an existing service or create an RSS feed validator)

Links that no longer work should be removed.
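For the first step, a minimal sketch of pulling feed URLs out of config.ini with the standard `configparser` module. It assumes the Planet-style layout in which each feed is a section whose name is the feed URL; the function name `feed_urls` and the sample content are hypothetical.

```python
import configparser


def feed_urls(config_text):
    """Return section names that look like feed URLs, in file order."""
    # interpolation=None keeps '%' in values from being treated specially;
    # strict=False tolerates a URL that is accidentally listed twice.
    parser = configparser.ConfigParser(interpolation=None, strict=False)
    parser.read_string(config_text)
    return [
        section
        for section in parser.sections()
        if section.startswith(("http://", "https://"))
    ]


# Made-up sample in the assumed layout, not the real config.ini.
SAMPLE = """
[Planet]
name = Example Planet

[https://example.com/feed.xml]
name = Example Blog

[http://example.org/rss]
name = Another Blog
"""

print(feed_urls(SAMPLE))
```

Each returned URL could then go through the status check and the feed validation steps listed above.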

I am currently tied up with some other stuff, but I will start working on it in a month.

Can I please be assigned this? It looks interesting.

Thanks

@mridubhatnagar

Also, are we validating an RSS feed with the W3C validator?
Or do we consider an RSS feed valid if feedparser gives no error?
Or do we rely on neither, and instead create a validator of our own?

@rochacbruno
Member

@mridubhatnagar the only problem with the W3C validator is that it considers podcast feeds invalid, so we also need a podcast feed validator.
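Before validating, the script would need to tell podcast feeds apart from ordinary ones. A rough sketch, assuming a feed counts as a podcast if it contains an RSS `<enclosure>` element or any element in the iTunes podcast namespace; the function name is hypothetical and this heuristic is not taken from any existing validator.

```python
import xml.etree.ElementTree as ET

# Namespace URI used by iTunes podcast tags such as <itunes:author>.
ITUNES_NS = "http://www.itunes.com/dtds/podcast-1.0.dtd"


def is_podcast_feed(data):
    """Heuristic: True if the feed has enclosures or iTunes-namespaced tags."""
    try:
        root = ET.fromstring(data)
    except ET.ParseError:
        return False
    itunes_prefix = "{" + ITUNES_NS + "}"
    for element in root.iter():
        if element.tag == "enclosure" or element.tag.startswith(itunes_prefix):
            return True
    return False
```

Feeds flagged this way could be routed to a podcast-specific check instead of the general one. Atom feeds that attach media through `<link rel="enclosure">` are not detected by this sketch.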

@mridubhatnagar

@rochacbruno Fair enough. I would like to give it a shot.

Also, I think some RSS feeds are valid but still not showing up on the planet, like the @pybites live feed.

I am not sure, though, whether feedparser works for podcast feeds.

I will code a podcast feed validator.

5 participants