Checker: change image check #3312
base: master
On the master branch, the checker starts streaming the response and then cuts the connection. However, this produces many read errors, which are false negatives. I don't know how to fix that. This commit changes the checker to download the whole image. Error reporting is also changed to print a single line instead of the whole stack trace. In addition, if a timeout occurs, the checker waits one second before retrying. Note that I have not tested the checker running in the background. That feature seems to have been forgotten and to lack interest, despite the initial push a few years ago.
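The behaviour described above (download the whole body, then decide; pause one second after a timeout before retrying) can be sketched roughly as follows. This is only an illustration, not the code from this PR: `is_image_url`, the injected `fetch` callable, and its `(status, content_type, body)` return shape are hypothetical names chosen so the example runs without network access.

```python
import time


def is_image_url(url, fetch, retries=2, pause=1.0, sleep=time.sleep):
    """Return True if `url` serves a non-empty image.

    `fetch` is a hypothetical callable (an assumption of this sketch) that
    downloads the complete response and returns (status, content_type, body).
    It is expected to raise TimeoutError when the request times out.
    """
    for attempt in range(retries):
        try:
            # Download the whole response instead of streaming and
            # cutting the connection early.
            status, content_type, body = fetch(url)
        except TimeoutError:
            # Wait before the next attempt, but not after the last one.
            if attempt + 1 < retries:
                sleep(pause)
            continue
        return (
            status == 200
            and (content_type or "").startswith("image/")
            and len(body) > 0
        )
    # Every attempt timed out.
    return False
```

Passing `fetch` and `sleep` as parameters keeps the retry logic testable without real requests. In the checker itself, the download would go through the project's own network layer.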
100a3a6 to f49d1a9
Thanks for this PR. When I test:
make search.checker.artic
the checker terminates prematurely with this exception:
Engine artic Checking
Traceback (most recent call last):
  File "local/py3/bin/searxng-checker", line 33, in <module>
    sys.exit(load_entry_point('searxng', 'console_scripts', 'searxng-checker')())
  File "searx/search/checker/__main__.py", line 115, in main
    run(args.engine_name_list, args.verbose)
  File "searx/search/checker/__main__.py", line 73, in run
    checker.run()
  File "searx/search/checker/impl.py", line 439, in run
    self.run_test(test_name)
  File "searx/search/checker/impl.py", line 425, in run_test
    rct_list = [self.get_result_container_tests(test_name, search_query) for search_query in search_query_list]
  File "searx/search/checker/impl.py", line 419, in get_result_container_tests
    result_container_check.check_basic()
  File "searx/search/checker/impl.py", line 273, in check_basic
    self._check_results(results)
  File "searx/search/checker/impl.py", line 248, in _check_results
    self._check_result(result)
  File "searx/search/checker/impl.py", line 241, in _check_result
    elif not _is_url_image(result.get('img_src')):
  File "searx/search/checker/impl.py", line 124, in _is_url_image
    return _download_and_check_if_image(image_url)
  File "searx/search/checker/impl.py", line 76, in _download_and_check_if_image
    r = network.get(
  File "searx/network/__init__.py", line 165, in get
    return request('get', url, **kwargs)
  File "searx/network/__init__.py", line 96, in request
    return future.result(timeout)
  File "~/.asdf/installs/python/3.12.0/lib/python3.12/concurrent/futures/_base.py", line 456, in result
    return self.__get_result()
  File "~/.asdf/installs/python/3.12.0/lib/python3.12/concurrent/futures/_base.py", line 401, in __get_result
    raise self._exception
  File "/searx/network/network.py", line 290, in request
    return await self.call_client(False, method, url, **kwargs)
  File "searx/network/network.py", line 273, in call_client
    return Network.patch_response(response, do_raise_for_httperror)
  File "searx/network/network.py", line 246, in patch_response
    raise_for_httperror(response)
  File "searx/network/raise_for_httperror.py", line 75, in raise_for_httperror
    raise SearxEngineAccessDeniedException(message='HTTP error ' + str(resp.status_code))
searx.exceptions.SearxEngineAccessDeniedException: HTTP error 403, suspended_time=86400
make: *** [Makefile:50: search.checker.artic] Error 1
The issue existed before this PR: an exception raised in the async code is not caught by the sync code.
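To illustrate the failure mode: when a coroutine runs on an event loop in another thread, its exception is re-raised in the synchronous caller by `future.result()`, as the traceback above shows. A minimal sketch of catching it there and reporting a single line could look like this. `AccessDenied`, `fetch`, `check`, and `start_loop` are hypothetical names for this example, not the project's API.

```python
import asyncio
import threading


class AccessDenied(Exception):
    """Stand-in for an engine exception (hypothetical, for this sketch)."""


async def fetch(url):
    # Simulates the async network layer raising on an HTTP 403 response.
    raise AccessDenied("HTTP error 403")


def start_loop():
    """Run an event loop in a background daemon thread and return it."""
    loop = asyncio.new_event_loop()
    threading.Thread(target=loop.run_forever, daemon=True).start()
    return loop


def check(url, loop, timeout=5):
    """Return None on success, or a one-line error description."""
    future = asyncio.run_coroutine_threadsafe(fetch(url), loop)
    try:
        # result() re-raises, in this thread, whatever the coroutine raised.
        future.result(timeout)
    except Exception as exc:
        # Report one line instead of the whole stack trace.
        return f"{type(exc).__name__}: {exc}"
    return None
```

With a handler like this, a 403 from one image URL becomes a reported test failure rather than terminating the whole checker run.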
@return42 may I ask you to run
NOTE: artic was the first engine where a
On master the test works; at the end there is an issue report (with the false negatives):
What can I do? How can I help further? Did you read my last comment above?
Maybe you assume that I know the code since I wrote it.
What does this PR do?
On the master branch, the checker starts streaming the response and then cuts the connection. However, this produces many read errors, which are false negatives. I don't know how to fix that.
This commit changes the checker to download the whole image. Error reporting is also changed to print a single line instead of the whole stack trace.
Note that I have not tested the checker running in the background. That feature seems to have been forgotten and to lack interest, despite the initial push a few years ago.
Why is this change important?
How to test this PR locally?
make search.checker.brave.images
make search.checker.google_images
Author's checklist
Related issues
Close #3311