Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Race on echo.maxParams, unable to use some context methods safely concurrently #1705

Closed
iaburton opened this issue Dec 1, 2020 · 24 comments
Closed
Labels

Comments

@iaburton
Copy link

iaburton commented Dec 1, 2020

Issue Description

Hi there!

I've been using echo for a relatively large api service, upgrading from 3.x to the latest 4.x recently and fixing some issues along the way.

While the initial issue I ran into may be related to #1678 (some groups or routes randomly "forget" their handler and start returning 404's) I discovered some other issues in echo.

First I'd like to thank the maintainers for the project, I don't mind discussing this issue or attempting to contribute a fix.

Anyways, I noticed here a significant race condition introduced via #1535 .

echo.maxParam cannot be changed safely, even with atomics (which would fix the data race), as the algorithm itself is racy.

Overwriting that value via the context I realized was causing panics here for me (not sure why that isn't just using the len of pvalues to begin with).

Further, manipulation of the context/echo types seems to lead to issues with the Find algorithm in the Router which is already hard to follow, and may be leading to my initial issue above.

Perhaps unrelated, is there a particular usage for the context type and methods I'm missing? It's defined as an interface but cannot be (re)implemented due to unsafe assertions such as this. I was hoping to override/shadow some methods for debugging purposes but cannot.

Thanks again, as for the initial issue I mentioned above regarding 404's I'd like to discuss/tackle after this one.

Checklist

  • [ x] Dependencies installed
    • Latest Go
  • [ x] No typos
  • [x ] Searched existing issues and docs

Expected behaviour

Safely use context methods

Actual behaviour

Panic from incorrect len / race.

Steps to reproduce

Noted above.

Version/commit

Latest echo release, v4.1.17. Note I haven't tried with master, I noticed ~50 commits to master since the last release, I briefly looked at them but didn't see anything that would fix this.

@pafuent
Copy link
Contributor

pafuent commented Dec 4, 2020

Hi @iaburton. Yes you are right, the use of maxParam is not thread safe, and changing it from a context could lead to have Context instances on the sync.Pool that don't have the proper length for pvalues (panics when maxParam is changed to a bigger value, the other way around it could lead to a memory leak)
I need to continue checking the Router code to see which is the best way to tackle this, but as you mention doesn't make any sense to use maxParam instead of len(c.pvalues) on Context#Reset

I'll continue trying to dig into the Git history to find out what is the intended purpose of Context#SetParamNames, because at a first glance it looks like as a "testing helper". If that is true, that could be the reason of why Echo's Context/Router assumes that maxParams shouldn't be protected (in production maxParam always is set by the Router and don't change). Please don't interpret this as me saying that maxParams shouldn't be protected, I'm just trying to discover what is the design behind the code, to then try to propose the best solution (not breaking change/keep performance)

@iaburton
Copy link
Author

iaburton commented Dec 4, 2020

Hey @pafuent , nice to get some confirmation on what I was observing.

I noticed after posting that the original issue was related to #1492 that effected several people. I have a workaround right now that might be of help in terms of a partial solution. Note this workaround below assumes you've pinned a certain echo version where the context is described below, and compiling for a 64bit system.

		//inside custom setParamNamesUnsafe function
		eptr := unsafe.Pointer(reflect.ValueOf(ectx).Pointer())

		/*
			type context struct {
				request  *http.Request	// 64bit pointer
				response *Response	// 64bit pointer
				path     string		// size of string header
				pnames   []string	// value we want to change
				pvalues  []string
				query    url.Values
				handler  HandlerFunc
				store    Map
				echo     *Echo
				logger   Logger
				lock     sync.RWMutex
			}

			Use unsafe pointer arithmetic to access and change pnames.
			stringSize := reflect.TypeOf(reflect.StringHeader{}).Size()
		*/
		pnames := (*[]string)(unsafe.Pointer(uintptr(eptr) + 8 + 8 + stringSize))

		copy(*pnames, vals)

Basically I use unsafe to set pnames myself using copy rather than a directly set (like currently) or append like how it was originally. This makes sure the values always stay within len and avoid the race since I'm not using the Context method.
Perhaps removing the *c.echo.maxParam from SetParamNames would be good enough if we also change the code to use copy? I haven't been running this code long enough to say if it fixes everything, but it's a start. It has stopped the extra panics.

I'm still not sure what is causing the 404's I mentioned prior though, I'm still doing more testing as I cannot easily replicate the issue (leading me to believe it is a concurrency issue). It's like certain groups or routes lose their ability to parse out params (or rather params look like part of a path so it becomes 404), which is what led me to this code in the first place, a hot fix for echo that lets me set the name and params on a context before going into the final handler.

Btw putting a mutex or using atomics on maxParam will solve the data race but not the inherit race condition with this design. Let me know if I can help.

@lammel lammel added the bug label Dec 7, 2020
@lammel
Copy link
Contributor

lammel commented Dec 7, 2020

We need to look at the original concept behind maxParam. I'd like to avoid introducing unsafe pointer arithmetics for working around a conceptional issue.

@iaburton
Copy link
Author

iaburton commented Dec 7, 2020

We need to look at the original concept behind maxParam. I'd like to avoid introducing unsafe pointer arithmetics for working around a conceptional issue.

Hi @lammel, my apologizes for the ambiguity. I didn't mean to imply that unsafe should be added to echo, on the contrary that was meant as a workaround (for others if their code is also panicing) until something better was found. The echo version of this unsafe code is simply:

func (c *context) SetParamNames(names ...string) {
    copy(c.pnames, names)
}

Your restating the need to look at maxParam is correct, as the above is not enough on its own.

@lammel
Copy link
Contributor

lammel commented Dec 7, 2020

@iaburton No offense taken, just wanted to clarify we are not finished yet ;-)

@iaburton
Copy link
Author

iaburton commented Dec 7, 2020

Bit of an interesting development; my 404 issue has gone away this this, but instead of 404s now I get invalid contexts essentially. My unsafe code is honoring the original len of pnames, but some other code seems to be changing that len.

I'm guessing it's Reset again, like before where the panic was occuring

func (c *context) Reset(r *http.Request, w http.ResponseWriter) {
	c.request = r
	c.response.reset(w)
	c.query = nil
	c.handler = NotFoundHandler
	c.store = nil
	c.path = ""
	c.pnames = nil
	c.logger = nil
	// NOTE: Don't reset because it has to have length c.echo.maxParam at all times
	for i := 0; i < *c.echo.maxParam; i++ {
		c.pvalues[i] = ""
	}
}

Once pnames becomes nil, my code with the copy doesn't actually copy anymore so I get requests without params in the context. 🤔

EDIT: Since the issue is nondeterministic let me get back to this after trying another patch.

EDIT 2: Added:

        if len(*pnames) == 0 {
            *pnames = make([]string, len(vals))
        }

To my unsafe code, and that seems to have stopped the params from "disappearing", but its ultimately the same issue, bad contexts getting into the sync.Pool and messing up internal state in the router.Find and other related code.

@lammel lammel added the v5 label Dec 12, 2020
@lammel
Copy link
Contributor

lammel commented Dec 12, 2020

We'll use this issue for a general improvement for the current quite messy handling with echo.maxParam. So tagged v5 now.

@pafuent
Copy link
Contributor

pafuent commented Feb 2, 2021

Maybe if we decide to return errors from our router (see #1762 (comment)), we can implement the solution already proposed by @lammel.
Basically if we set a fixed value for maxParam, we can get rid of all this kind of issues. If a new route is added that needs to increment this value, we will be able to return an error. To let Echo users fix this error, maxParam could be transformed in a configuration value for Echo (something like Echo#NewWithConfig() with maxParam as a parameter or as a member of a config struct). This could help us to remove the dependency of our Router to know the Echo instance.
Also Context#SetParamNames() should be modified to return an error if maxParam needs to be increased to handle all the new params names.
Another side effect of this change, is remove the need to reference maxParam in Echo Context, because we can safely rely on the length of pnames (this is set when the Context is created and it won't change later on)

Any thoughts?

@lammel
Copy link
Contributor

lammel commented Feb 5, 2021

Basically yes. As we have benchmarking in place can check for negative impact on allocations or processing time.
We can use a sane default (like 16 or 32 params) for the router and allow to change this value when initializing the router (or echo), so all further processing can rely on the params array to be available.

You are right, that the router can be decoupled from the echo instance, it just needs to know the maximum number of parameters that will be processed.

@nicmunroe
Copy link

We just ran into this, and the panic in context.Reset(...) can be hit without it being a race condition at all - it's just a fundamental mismatch of context (or at least context.Reset(...)) expecting c.pvalues to always equal Echo.maxParam, but then allowing Echo.maxParam to change (i.e. when registering a new endpoint with more params), and at the same time trying to reuse pooled contexts.

In other words, if you execute a request and a context is created and pooled, then that c.pvalues length is whatever the Echo.maxParam is at the time of that request. If you then come later and register a new endpoint with the echo server that has more params than the previous Echo.maxParam, then the next time that previous context is pulled from the pool for reuse you're guaranteed to hit the index-out-of-range panic in context.Reset(...).

Here's a quick unit test to explain and reproduce the problem:

func Test_show_context_pvalues_vs_echo_maxParam_panic_during_pooled_context_reset(t *testing.T) {
	// Start a fresh echo server.
	e := echo.New()
	port := 9999
	go func() {
		_ = e.Start(fmt.Sprintf("localhost:%d", port))
	}()

	time.Sleep(100)

	// Register a no-params endpoint - at this point e.maxParam is 0.
	noParamsPath := "/foo"
	e.GET(noParamsPath, func(c echo.Context) error {
		return c.String(200, "doesnotmatter")
	})

	// Execute a request. This will create a context that gets put into the e.pool pool of contexts,
	// and that pooled context will be created with len(c.pvalues) of 0 to match the
	// _current_ e.maxParam value.
	_, _ = http.Get(fmt.Sprintf("http://localhost:%d%s", port, noParamsPath))

	// Now register an endpoint that does have a param. This will cause e.maxParam to change to 1.
	withPathParamBase := "/bar"
	withParamPathTmplt := fmt.Sprintf("%s/:stuff", withPathParamBase)
	e.GET(withParamPathTmplt, func(c echo.Context) error {
		return c.String(200, "doesnotmatter")
	})

	// Execute some more requests. Eventually this will pull the pooled context with len(c.pvalues) of 0,
	// but now e.maxParam is 1, leading to a panic in context.Reset() due to the for loop logic in
	// context.Reset(...) expecting len(c.pvalues) >= e.maxParam.
	urlWithParams := fmt.Sprintf("http://localhost:%d%s%s", port, withPathParamBase, "/stuff-param")
	for i := 0; i < 1000; i++ {
		_, err := http.Get(urlWithParams)
		if err != nil {
			t.Errorf("call failed due to %v", err)
		}
	}
}

If there's a design decision that you should never add endpoints after any requests have been processed (in order to guarantee that all pooled context.pvalues equal Echo.maxParam), then the endpoint registration methods should panic or return an error or otherwise prevent that registration once any context has been pooled.

But assuming you do want people to be able to add to the Echo server's endpoints after it's started processing requests, then IMO you have a few options:

  1. Dump the Echo.pool context pool whenever Echo.maxParam changes.
  2. Bulletproof context.Reset(...) against the reality that Echo.maxParam might have changed since context.pvalues was created.

Assuming that (2) would be preferred for performance reasons, then this would be a fix that would improve things a little:

func (c *context) Reset(r *http.Request, w http.ResponseWriter) {
	c.request = r
	... etc

	// c.echo.maxParam might have changed since this context was used last. Make sure our len(c.pvalues)
	// is big enough to handle c.echo.maxParams, otherwise we end up with a panic.
	if len(c.pvalues) < *c.echo.maxParam {
		c.pvalues = make([]string, *c.echo.maxParam)
	}
	// NOTE: Don't reset because it has to have length c.echo.maxParam at all times
	for i := 0; i < *c.echo.maxParam; i++ {
		c.pvalues[i] = ""
	}
}

There would still be a potential for a race condition if Echo.maxParam changed while the context.Reset(...) was occurring. So a more robust and complete solution for v5 would be best. But this would at least improve things for v4 if you wanted to roll in a small fix.

@aldas
Copy link
Contributor

aldas commented Mar 3, 2022

At this point - routes (adding a route, note: there is no way to remove a route) or any of the echo fields is not meant to be mutated after server has started. This is not limited to c.Reset. Every single field on Echo instance is public and therefore potentially datarace place. And echo.Context instance must not be passed to any other coroutine out of original handler coroutine - whis will be another datarace place as context instance will be put back into pool and leaked reference will mess up next request response etc.

@nicmunroe
Copy link

At this point - routes (adding a route, note: there is no way to remove a route) or any of the echo fields is not meant to be mutated after server has started.

Is this an official and intentional design decision (and if so is it documented somewhere)? Or just an implicit assumption in the codebase?

If it's official and intentional, then IMO any of the mutation methods should fail-fast if they're called after the server has started.

If the intention is that you should be able to mutate after the server is started, then it looks like a significant overhaul is needed to clean up all the race condition potential (not just this one in c.Reset(...)).

@aldas
Copy link
Contributor

aldas commented Mar 3, 2022

I really can not say what intentions @vishr had but echo.Echo fields are public since v3. Probably assumptions is that getters are annoying when 99.9% cases users do not change things after server has started - and this is decision, whether we agree or not, that can not be changed within v4. This is not even limited to public fields - you can call e.Use and have datarace with middlewares as their slice is accessed in http.Server.ServeHTTP() coroutine when handling clients request.

Basically - trade-off is that for ease of use you must not mutate anything related to running Echo and Router instance.

This is not something that is specific to Echo. As Echo and Gin are quite similar both have similar limitation (having public fields which are accessed or accessible from context inside handler coroutines) - they are not meant to be mutated after server has been started.

Personally - router not being safe after server has started has been bothering me from design perspective but addressing everything related to potentially dataracey parts is questionable, as there are not much demand for it (adding routes etc after startup) and this would change the API quite drastically (no public fields anymore) - which would rub people, who do not need this feature, wrong way and they seem to be vast majority.

@yudidi
Copy link

yudidi commented Oct 24, 2023

There is trick to avoid it: We can register an empty route with many param to set an enough maxParam

	e.GET("/api/v1/:id/:idx/:idy/:idz/:idxx", func(c echo.Context) error {
			return nil
		})

@aldas aldas closed this as completed Mar 13, 2024
@iaburton
Copy link
Author

Hello all 👋 long time

I got the notification that this had been closed @aldas , but don't see a commit that references a fix. Was this addressed in the upcoming v5?

@aldas
Copy link
Contributor

aldas commented Mar 21, 2024

This will not be changed in v4. Router is not goroutine safe and is not designed to be safe. v5 addresses that part.

For v4 - If there is a middleware that needs to modify param names/values. You can manipulate maxParam to be high enough before server start thus avoiding changing that field when serving requests.

See

	e := echo.New()

	// workaround to get echo.maxParam value to high enough before server starts to routing requests
	{
		tmp := e.NewContext(nil, nil)
		tmp.SetParamNames(make([]string, 256)...) // echo.maxParam will be now 256
		e.ReleaseContext(tmp)
	}

	e.Use(middleware.Logger())

but there is nothing for guarding you from races if you start adding/registering routes after HTTP server has been started.

@iaburton
Copy link
Author

iaburton commented Mar 21, 2024

Ah okay, so the suggestion is to use the workaround until v5. Thanks for the info!

Edit: Though I do wonder, for those who don't know about this, if echo.New should simply set the size of maxParams before returning, thus not requiring the workaround (or people racing without knowing by calling some of those middleware/context methods.

@aldas
Copy link
Contributor

aldas commented Mar 21, 2024

@iaburton could you first describe the use-case where it is necessary mutate params count? It sound like some kind of dynamic routing like thing. I do not understand why it is necessary.

For example dynamically changing routes is pretty much dead end at the moment because router does not support removing routes at all.

@iaburton
Copy link
Author

The original issue was simply about using the context method SetParamNames at all specifically the race that could be caused here and here. There wasn't any changing routes (on the echo instance or otherwise) or mutating param count (beyond calling those methods which were unknown to be unsafe at the time).

@aldas
Copy link
Contributor

aldas commented Mar 21, 2024

But is there actual use case when you want to SetParam* in when serving the request? Every field that echo.Echo struct contains is unsafe when you start to mutate and read them in multiple goroutines.

SetParamNames is not a problem unless something decides that param names need to be replaced with bigger amount than actual route that Router matched has (maybe * match?). For that case we can trick Echo prematurely increase that number to some safe value so when http server started goroutines are serving the request access that we will not have to increase that value.

@aldas
Copy link
Contributor

aldas commented Mar 21, 2024

And if routes needs to be added after server has started. something like that could be created to provide locking for router

type safeServe struct {
	mu *sync.RWMutex
	e  *echo.Echo
}

func newSafeServe(e *echo.Echo) *safeServe {
	// workaround to get echo.maxParam value to high enough before server starts to routing requests
	{
		tmp := e.NewContext(nil, nil)
		tmp.SetParamNames(make([]string, 256)...) // echo.maxParam will be now 256
		e.ReleaseContext(tmp)
	}
	return &safeServe{
		mu: new(sync.RWMutex),
		e:  e,
	}
}

func (s *safeServe) ServeHTTP(w http.ResponseWriter, r *http.Request) {
	c := s.e.NewContext(r, w)
	defer s.e.ReleaseContext(c)

	s.find(r, c)
	if err := c.Handler()(c); err != nil {
		s.e.HTTPErrorHandler(err, c)
	}
}

func (s *safeServe) find(r *http.Request, c echo.Context) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	s.e.Router().Find(r.Method, echo.GetPath(r), c)
}

func (s *safeServe) Add(method, path string, handler echo.HandlerFunc, middleware ...echo.MiddlewareFunc) {
	for i := len(middleware) - 1; i >= 0; i-- {
		handler = middleware[i](handler)
	}
	s.mu.Lock()
	defer s.mu.Unlock()
	s.e.Router().Add(method, path, handler)
}

func main() {
	e := echo.New()
	safe := newSafeServe(e)

	//e.Use(middleware.Logger()) // DO NOT USE e.Use

	handler := func(c echo.Context) error {
		return c.String(http.StatusOK, c.Param("param"))
	}
	safe.Add(http.MethodGet, "/:param", handler, middleware.Logger()) // add middlewares to route here

	s := http.Server{
		Addr:    ":8080",
		Handler: safe,
	}

	if err := s.ListenAndServe(); err != nil && !errors.Is(err, http.ErrServerClosed) {
		log.Fatal(err)
	}
}

but this looks silly. + you can not still remove these added routes which is probably something that you want if dynamic routing is something that is required.

@aldas
Copy link
Contributor

aldas commented Mar 21, 2024

coming back to maxParam and Context we probably can remove references to maxParam there and allow it grow if needed by SetParamValues and not to shrink. And reset it to current capacity and we are good to go.

assuming it can only grow we do not need to access echo.Echo.maxParam field.

@aldas
Copy link
Contributor

aldas commented Mar 21, 2024

#2611 addresses this issue. middlewares using SetParamValues are now in better place.

@iaburton
Copy link
Author

Hey @aldas thanks for the replies

I get what you're asking now. So originally (a few years ago now so the details are a bit fuzzy) I was updating an echo service from 3 to 4 and ran into an issue regarding routes disappearing, as mentioned in the OP

upgrading from 3.x to the latest 4.x recently and fixing some issues along the way.
While the initial issue I ran into may be related to #1678 (some groups or routes randomly "forget" their handler and start returning 404's) I discovered some other issues in echo.

I think I ended up trying the SetParamNames method as a workaround, but then that caused problems of its own. Originally my intent was never to change routes or params after starting the server, it just happened without knowing how some of these methods were implemented. Given SetParamNames was on the context type it was surprising to find it was modifying the base echo instance.

Anyways, I see your #2611 addresses it nicely though, fixing the race caused by the original MR that added it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants