Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Active storage add proxying #34477

Conversation

fleck
Copy link
Contributor

@fleck fleck commented Nov 17, 2018

Summary

Added the option to globally change how active storage files are delivered. Users can now set app.config.active_storage.delivery_method to [*:redirect, :proxy]. There's also the option to override at the model level using has_one_attached :avatar, delivery_method: :proxy syntax.

This is just a rough draft, still could be DRYed up a bit, needs tests, and documentation. Just want to get some feedback before I go to far.

Other Information

If you are updating any of the CHANGELOG files or are asked to update the
CHANGELOG files by reviewers, please add the CHANGELOG entry at the top of the file.

Finally, if your pull request affects documentation or any non-code
changes, guidelines for those changes are available
here

@rails-bot
Copy link

Thanks for the pull request, and welcome! The Rails team is excited to review your changes, and you should hear from @georgeclaghorn (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

This repository is being automatically checked for code quality issues using Code Climate. You can see results for this analysis in the PR status below. Newly introduced issues should be fixed before a Pull Request is considered ready to review.

Please see the contribution instructions for more information.

@zinosama
Copy link
Contributor

zinosama commented Jul 14, 2020

Sorry for commenting on a closed PR (please feel free to lmk if there's a better place for this question) but should we get rid of Set-Cookie header from the response of the proxy controllers @fleck ? Otherwise when loading images from views rendered by rails, the presence of cookies headers will prevent the images from being cached. Is there any good work-around for this? (For context, I'm using Cloudflare in front of my service domain)

Maybe a before_action in the proxy controllers like this will do:

def session_off
  request.session_options[:skip] = true
end

@fleck
Copy link
Contributor Author

fleck commented Jul 14, 2020

@joshling1919 does this PR resolve your issue? #39286

@fleck
Copy link
Contributor Author

fleck commented Jul 18, 2020

@zinosama no worries commenting on a closed PR, it's a great place to document issues/solutions to problems around the PR. To prevent session cookies being set during asset responses add the following to config/environment.rb:

Rails.application.initialize!
# The code should be added after your application is initialized

ActiveStorage::BaseController.class_eval do
  def disable_session
    request.session_options[:skip] = true
  end
end

ActiveStorage::Blobs::ProxyController.class_eval do
  before_action :disable_session
end

ActiveStorage::Representations::ProxyController.class_eval do
  before_action :disable_session
end

@zinosama
Copy link
Contributor

Thank you @fleck for the quick response. I can see that working. I'm also wondering though if that should the default behavior? The purpose of these proxy controllers is to make image caching easy. Yet (plz correct me if I'm wrong) the cookies header makes the asset uncache-able, which defeats this purpose.

@fleck
Copy link
Contributor Author

fleck commented Jul 18, 2020

@zinosama making that the default may make sense, I can't think of a good reason to include the session for assets. But, it's too late for that change to make it into rails 6.1. I missed this during development because the application I was testing on doesn't use session except for a couple routes on the admin portion of the site. As for "the cookies header makes the asset uncache-able" that's Cloudflare specific behavior, a lot of CDNs will cache assets regardless of cookies. Cloudflare can also be configured to cache with a cookie present using a page rule with the "Edge cache TTL" setting.

georgeclaghorn added a commit that referenced this pull request Aug 31, 2020
Prevent XSS where unsafe content is served inline on the application origin.

Follows up on #34477. References 06ab7b2 and d40284b.
@mcanto
Copy link

mcanto commented Oct 16, 2020

Thank you @fleck for this feature.
I need to get this feature working on a Rails 5.2 project. I realized you have a backport for this version, but just can't get it work.

This is my application.rb

  config.active_storage.delivery_method = :proxy
  config.active_storage.proxy_urls_host = "cdn.mydomain.com"

I've also tried with

asset_name.deliver(:proxy)

in views, that is just removing the host but still not calling the cdn, otherwise the current host is being called.

I'd appreciate any insight.

@fleck
Copy link
Contributor Author

fleck commented Oct 16, 2020

@mcanto unfortunately the backport is in a half finished state. At some point during this PR I upgraded my project to use the latest version of rails and stopped updating the back port. Even if you can get the backport to work the API is fairly different from what was merged in this PR. If possible I'd recommend upgrading to the latest rails (it's pretty stable), or trying to re-create the backport based on this API.

@mcanto
Copy link

mcanto commented Oct 18, 2020

Thank you for your answer @fleck I'll take a look into upgrade to the latest version. Definitely its a pretty nice feature, good job!

@mcanto unfortunately the backport is in a half finished state. At some point during this PR I upgraded my project to use the latest version of rails and stopped updating the back port. Even if you can get the backport to work the API is fairly different from what was merged in this PR. If possible I'd recommend upgrading to the latest rails (it's pretty stable), or trying to re-create the backport based on this API.

@phoet
Copy link
Contributor

phoet commented Nov 17, 2020

I was kind of expecting to see this in 6.1.0.RC1 but instead i only find things like rails_storage_proxy_path. Has this approach been discarded?

@fleck
Copy link
Contributor Author

fleck commented Nov 18, 2020

@phoet The API has changed a decent amount from the initial proposal. Here's the up to date API: https://github.com/rails/rails/blob/master/activestorage/README.md#proxying

@zeptonaut
Copy link

For what it's worth, that documentation is pretty sparse. It seems to indicate that there's some way to get the URL for the proxied version of the asset instead of the direct ActiveStorage version, but it doesn't go into detail about how to tell Rails about how to map the raw asset path to the proxied one.

Presumably, if S3 is the primary ActiveStorage backend, then we need some way to tell Rails whether the proxy is Cloudfront or Cloudflare or Cloudinary, etc.

@marckohlbrugge
Copy link
Contributor

@zinosama making that the default may make sense, I can't think of a good reason to include the session for assets. But, it's too late for that change to make it into rails 6.1. I missed this during development because the application I was testing on doesn't use session except for a couple routes on the admin portion of the site. As for "the cookies header makes the asset uncache-able" that's Cloudflare specific behavior, a lot of CDNs will cache assets regardless of cookies. Cloudflare can also be configured to cache with a cookie present using a page rule with the "Edge cache TTL" setting.

Heads up for other people on Cloudflare who cannot get this to work. Apparently you can only have it ignore cookies on the $200/mo Business plan or higher.

The mentioned workaround that disables the session for the relevant controllers might be a better fix if you're on one of the cheaper plans.

@@ -106,6 +106,37 @@ Variation of image attachment:
<%= image_tag user.avatar.variant(resize_to_limit: [100, 100]) %>
```

## File serving strategies
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@@ -954,6 +954,14 @@ text/javascript image/svg+xml application/postscript application/x-shockwave-fla

* `config.active_storage.draw_routes` can be used to toggle Active Storage route generation. The default is `true`.

* `config.active_storage.resolve_model_to_route` can be used to globally change how Active Storage files are delivered.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#40920 should fix it

@vitobotta
Copy link

@fleck Hi, quick question. I have been using the direct block in my routes to set the CDN hostname for the proxy routes, and that works well. I would like to restrict these routes so that they only work if the incoming request actually uses the CDN hostname, and just a 404 or no content if the request uses the app's main domain. Basically I want all the requests for assets to go through the CDN.

Reason: someone has been flooding my app by making lots of requests to the assets but using the app's main domain directly, bypassing the CDN caching. I would like to prevent this and make the relevant routes available only if the CDN hostname is used with some kind of constraint. Any suggestions? Thanks in advance.

@josh-m-sharpe
Copy link
Contributor

@vitobotta seems like a job for something like rack-attack. the redirect and proxy routes just point to a controller inside activestorage (somewhere in here https://github.com/rails/rails/tree/main/activestorage). configure rack attack to check additional headers (like those set by your CDN) and if they dont exist, shut down the request

@collimarco
Copy link

collimarco commented Aug 9, 2021

Has anyone tried the proxy method with pages that have hundreds of images?
(our app has pages like this full of images...)

Currently we use Heroku + Cloudflare for caching... I wonder how many active storage proxy requests per second we can handle with each "2x dyno".

@brenogazzola
Copy link
Contributor

brenogazzola commented Aug 9, 2021

@collimarco I have. Since each image is going to generate an HTTP request, the number of requests that a single 2x dyno can handle will depend on: the number of puma workers, the size of the images, the latency/speed of your storage backend.

If you are using the recommended defaults for 2x dynos, your Puma concurrency is set to 2, which means each dyno can handle 2 requests at the same time. If you are dealing with the average mobile photo and S3 as backend (it has pretty high latency), you will probably see each request consuming between 250-350ms, which means each dyno will handle 6-8 requests per second.

The first two times the page with hundreds with image is opened, all hundreds of requests will hit your servers. If you don't have enough dynos, all your other requests (including user navigation), will have to wait until all hundreds of images have been proxied from S3 to Cloudflare through your dynos. So let's say you have 200 images in that page, and 2 dynos (12-16 images per second). We are talking about other requests waiting 12-16 seconds until they get a response.

The third time, the images will be cached and theoretically, your dynos will not have to handle them anymore. Theoretically because Cloudflare has so many PoPs, that if the page with all the images is not popular enough, your cache hit will be low, and your dynos will continue being forced to stream those images.

I recommend you at least use lazy loading in that page, so that if the user does not scroll, you don't waste your capacity streaming those images.

@collimarco
Copy link

@brenogazzola Thanks for the reply!

That was my fear... 6-8 req / sec / dyno are not enough with pages that may have 300+ images. It would be too expensive (300 / 8 = 37 dynos only for images!).

Maybe it's something better however, because you forgot the threads. It think it's 2 workers * 5 threads the concurrency (most of the time with S3 is spent waiting).

It's strange because a normal browser on a normal PC can download 300+ images in a few seconds and from my research it seems that they only use 6 concurrent connection per domain. It is strange that a server cannot perform similarly 🤔

@brenogazzola
Copy link
Contributor

@collimarco Threads coud help, yes. I tend to ignore them when calculating how much load Puma can handle since I normally have no idea which requests have I/O time and which don't. But in this case I guess you are right.

As for the difference between PC and servers, that's because the browser is downloading from Cloudflare, which does its best to ensure that latency between request and content download is as short as possible. Your dynos on the other hand are dealing with S3, which does not care about latency at all. There might also be something in the AWS gem code, or the APIs it uses that introduces extra latency. It's not really a level playing field.

@silva96
Copy link
Contributor

silva96 commented Apr 5, 2024

@zinosama making that the default may make sense, I can't think of a good reason to include the session for assets. But, it's too late for that change to make it into rails 6.1. I missed this during development because the application I was testing on doesn't use session except for a couple routes on the admin portion of the site. As for "the cookies header makes the asset uncache-able" that's Cloudflare specific behavior, a lot of CDNs will cache assets regardless of cookies. Cloudflare can also be configured to cache with a cookie present using a page rule with the "Edge cache TTL" setting.

Heads up for other people on Cloudflare who cannot get this to work. Apparently you can only have it ignore cookies on the $200/mo Business plan or higher.

The mentioned workaround that disables the session for the relevant controllers might be a better fix if you're on one of the cheaper plans.

Even after Disable session in ActiveStorage blobs and representations proxy controllers #48869 was implemented, or any monkey patch applied, that's only for session cookies.

There could be other cookies beings set, for example if using ahoy, you get another cookie called ahoy_visit

I found a more future proof approach here:

ankane/ahoy#465 (comment)

cc @brunoprietog @marckohlbrugge

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet