staging put: stream file uploads instead of loading all to memory #197

Merged
kravets-levko merged 1 commit into databricks:main on Apr 9, 2024

mdibaiee
Contributor

`os.ReadFile` reads the entire file into a byte slice in memory, which can put memory pressure on users uploading large files. An `os.File`, however, already implements `io.Reader`, so we can pass the file directly to `http.NewRequest`. The HTTP client then reads it in chunks and uploads it as a stream, without holding the whole file in memory.

@mdibaiee
Contributor Author

mdibaiee commented Mar 28, 2024

@yunbodeng-db @andrefurlan-db @rcypher-databricks @jadewang-db any chance of a review on this? It is currently blocking us from using Databricks efficiently.

Contributor

andrefurlan-db left a comment

Thanks!!

@mdibaiee
Contributor Author

mdibaiee commented Apr 9, 2024

@andrefurlan-db thanks! I can't merge the pull request myself. I think the lint and check jobs are failing for unrelated reasons: the lint errors are in other files and seem to be caused by imports not being recognised. Can the pull request be merged?

@yunbodeng-db
Contributor

A team member will assist you shortly. Thanks for your patience.

kravets-levko merged commit 00bc1c8 into databricks:main on Apr 9, 2024
1 of 3 checks passed
@candiduslynx
Contributor

candiduslynx commented Apr 11, 2024

@kravets-levko @yunbodeng-db @mdibaiee I get a 501 Not Implemented response with this change. I really think it should be reverted and properly tested against the backend.

My assumption is that the file streaming itself is fine, but the backend doesn't accept a request body of unknown length, hence the failure.
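The unknown-length part of this hypothesis can be checked locally. With a bare `*os.File` body, `http.NewRequest` cannot know the length (it only fills in `ContentLength` for `*bytes.Buffer`, `*bytes.Reader` and `*strings.Reader`), so Go's client sends the PUT with `Transfer-Encoding: chunked`. The sketch below compares that with a request whose `ContentLength` is taken from `f.Stat()`. `newFilePut` and `probe` are hypothetical helpers, and a local `httptest` server stands in for the real storage backend. Whether the actual staging backend answers chunked bodies with 501 is an assumption that this sketch does not verify.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"os"
)

// newFilePut is a hypothetical helper, not the driver's API. It builds a
// streaming PUT whose body is f. With setLength=false the length is left
// unknown, so Go's client sends the body with "Transfer-Encoding: chunked";
// with setLength=true the file size is declared as Content-Length.
// The caller remains responsible for closing f.
func newFilePut(url string, f *os.File, setLength bool) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodPut, url, f)
	if err != nil {
		return nil, err
	}
	if setLength {
		info, err := f.Stat()
		if err != nil {
			return nil, err
		}
		req.ContentLength = info.Size()
		if info.Size() == 0 {
			// ContentLength 0 with a non-nil Body means "unknown length",
			// so send an explicitly empty body instead.
			req.Body = http.NoBody
		}
	}
	return req, nil
}

// probe PUTs a 1 KiB temporary file to a local test server and returns how
// the server saw the body length: "<ContentLength> <TransferEncoding>".
func probe(setLength bool) (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		_, _ = io.Copy(io.Discard, r.Body)
		fmt.Fprintf(w, "%d %v", r.ContentLength, r.TransferEncoding)
	}))
	defer srv.Close()

	tmp, err := os.CreateTemp("", "staging-put-*.bin")
	if err != nil {
		return "", err
	}
	defer os.Remove(tmp.Name())
	defer tmp.Close() // the client usually closes it first; error ignored
	if _, err := tmp.Write(make([]byte, 1024)); err != nil {
		return "", err
	}
	// Rewind so the request body is read from the start of the file.
	if _, err := tmp.Seek(0, io.SeekStart); err != nil {
		return "", err
	}

	req, err := newFilePut(srv.URL, tmp, setLength)
	if err != nil {
		return "", err
	}
	resp, err := srv.Client().Do(req)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	for _, setLength := range []bool{false, true} {
		got, err := probe(setLength)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("setLength=%v: %s\n", setLength, got)
	}
}
```

Without a declared length the server sees `ContentLength == -1` and `TransferEncoding == [chunked]`; with it, `ContentLength == 1024` and no transfer encoding. Setting the length keeps the streaming upload while still sending a `Content-Length` header. The `http.NoBody` branch exists because Go treats a zero `ContentLength` with a non-nil body as unknown.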

esdrasbeleza pushed a commit to esdrasbeleza/databricks-sql-go that referenced this pull request Apr 15, 2024
atzoum pushed a commit to rudderlabs/sqlconnect-go that referenced this pull request Apr 29, 2024
….5.3 to 1.5.4 (#61)

Bumps
[github.com/databricks/databricks-sql-go](https://github.com/databricks/databricks-sql-go)
from 1.5.3 to 1.5.4.
<details>
<summary>Release notes</summary>
<p><em>Sourced from <a
href="https://github.com/databricks/databricks-sql-go/releases">github.com/databricks/databricks-sql-go's
releases</a>.</em></p>
<blockquote>
<h2>v1.5.4</h2>
<h2>What's Changed</h2>
<ul>
<li>(<code>databricks/databricks-sql-go#189</code> by <a
href="https://github.com/rcypher-databricks"><code>@​rcypher-databricks</code></a>)</li>
<li>(<code>databricks/databricks-sql-go#197</code> by <a
href="https://github.com/mdibaiee"><code>@​mdibaiee</code></a>)</li>
<li>(<code>databricks/databricks-sql-go#205</code> by <a
href="https://github.com/candiduslynx"><code>@​candiduslynx</code></a>)</li>
<li>(<code>databricks/databricks-sql-go#207</code> by <a
href="https://github.com/candiduslynx"><code>@​candiduslynx</code></a>)</li>
</ul>
<p><strong>Full Changelog</strong>: <a
href="https://github.com/databricks/databricks-sql-go/compare/v1.5.3...v1.5.4">https://github.com/databricks/databricks-sql-go/compare/v1.5.3...v1.5.4</a></p>
</blockquote>
</details>
<details>
<summary>Changelog</summary>
<p><em>Sourced from <a
href="https://github.com/databricks/databricks-sql-go/blob/main/CHANGELOG.md">github.com/databricks/databricks-sql-go's
changelog</a>.</em></p>
<blockquote>
<h2>v1.5.4 (2024-04-10)</h2>
<ul>
<li>(<code>databricks/databricks-sql-go#189</code> by <a
href="https://github.com/rcypher-databricks"><code>@​rcypher-databricks</code></a>)</li>
<li>(<code>databricks/databricks-sql-go#197</code> by <a
href="https://github.com/mdibaiee"><code>@​mdibaiee</code></a>)</li>
<li>(<code>databricks/databricks-sql-go#205</code> by <a
href="https://github.com/candiduslynx"><code>@​candiduslynx</code></a>)</li>
<li>(<code>databricks/databricks-sql-go#207</code> by <a
href="https://github.com/candiduslynx"><code>@​candiduslynx</code></a>)</li>
</ul>
</blockquote>
</details>
<details>
<summary>Commits</summary>
<ul>
<li><a
href="https://github.com/databricks/databricks-sql-go/commit/e82880f5e0583fcb1228080956d8ca27491f22c3"><code>e82880f</code></a>
Prepare release v1.5.4 (<a
href="https://redirect.github.com/databricks/databricks-sql-go/issues/208">#208</a>)</li>
<li><a
href="https://github.com/databricks/databricks-sql-go/commit/7ac797b5800fec66147795b2039886e270f9b944"><code>7ac797b</code></a>
fix: Properly format <code>time.Time</code> values (<a
href="https://redirect.github.com/databricks/databricks-sql-go/issues/207">#207</a>)</li>
<li><a
href="https://github.com/databricks/databricks-sql-go/commit/dea3e6ded2e8c6012d223168b7577f71016e6aa9"><code>dea3e6d</code></a>
fix: Don't panic on <code>remove</code> op (<a
href="https://redirect.github.com/databricks/databricks-sql-go/issues/205">#205</a>)</li>
<li><a
href="https://github.com/databricks/databricks-sql-go/commit/00bc1c893537290e8db73b9921151123b8cebd2b"><code>00bc1c8</code></a>
staging put: stream file uploads instead of loading all to memory (<a
href="https://redirect.github.com/databricks/databricks-sql-go/issues/197">#197</a>)</li>
<li><a
href="https://github.com/databricks/databricks-sql-go/commit/bb10f7a642f79133a4a2cb200d2c47e4909581fb"><code>bb10f7a</code></a>
Update Github workflows to fix deprecation warnings (<a
href="https://redirect.github.com/databricks/databricks-sql-go/issues/203">#203</a>)</li>
<li><a
href="https://github.com/databricks/databricks-sql-go/commit/d70ab7c5b6dda71f1439bd3edc51a2d2022f7d92"><code>d70ab7c</code></a>
Added GCP cloud type for OAuth (<a
href="https://redirect.github.com/databricks/databricks-sql-go/issues/189">#189</a>)</li>
<li><a
href="https://github.com/databricks/databricks-sql-go/commit/5adddfcbcaad7991ea084242606dd4527bd5a396"><code>5adddfc</code></a>
Update owners (<a
href="https://redirect.github.com/databricks/databricks-sql-go/issues/190">#190</a>)</li>
<li>See full diff in <a
href="https://github.com/databricks/databricks-sql-go/compare/v1.5.3...v1.5.4">compare
view</a></li>
</ul>
</details>
Signed-off-by: dependabot[bot] <support@github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

5 participants