sstable: explore index size optimizations #97

Closed
petermattis opened this issue Apr 28, 2019 · 5 comments

Comments

@petermattis
Collaborator

petermattis commented Apr 28, 2019

RocksDB has added version 3 and version 4 sstable formats:

  • Version 3 changes index block keys to not include the sequence number (which is always 0?)
  • Version 4 changes index block values (block handles) from being encoded as <offset, size> to a delta encoding that usually only stores <size>.
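A minimal Go sketch of the version 4 idea, assuming data blocks are written back to back, so each block's offset equals the previous block's offset plus its size plus a fixed trailer. The 5-byte trailer (1-byte compression type plus 4-byte checksum) is an assumption based on RocksDB's block layout. `blockHandle`, `encodeHandles`, and `decodeHandles` are hypothetical names, not Pebble or RocksDB APIs. Only the first handle is stored in full. Every later handle stores just its size, written here as a signed delta from the previous block's size.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// trailerLen is the assumed per-block trailer: a 1-byte compression type
// followed by a 4-byte checksum.
const trailerLen = 5

// blockHandle locates a data block within the file (hypothetical type).
type blockHandle struct {
	Offset, Size uint64
}

// encodeHandles writes the first handle in full as <offset, size> and every
// later handle as only a signed size delta, relying on blocks being contiguous.
func encodeHandles(handles []blockHandle) []byte {
	var buf []byte
	for i, h := range handles {
		if i == 0 {
			buf = binary.AppendUvarint(buf, h.Offset)
			buf = binary.AppendUvarint(buf, h.Size)
			continue
		}
		prev := handles[i-1]
		if h.Offset != prev.Offset+prev.Size+trailerLen {
			panic("blocks must be contiguous for delta encoding")
		}
		buf = binary.AppendVarint(buf, int64(h.Size)-int64(prev.Size))
	}
	return buf
}

// decodeHandles reverses encodeHandles, recomputing each offset from the
// previous handle instead of reading it from the buffer.
func decodeHandles(buf []byte, n int) ([]blockHandle, error) {
	handles := make([]blockHandle, 0, n)
	for i := 0; i < n; i++ {
		if i == 0 {
			off, k1 := binary.Uvarint(buf)
			if k1 <= 0 {
				return nil, errors.New("bad offset")
			}
			size, k2 := binary.Uvarint(buf[k1:])
			if k2 <= 0 {
				return nil, errors.New("bad size")
			}
			buf = buf[k1+k2:]
			handles = append(handles, blockHandle{off, size})
			continue
		}
		delta, k := binary.Varint(buf)
		if k <= 0 {
			return nil, errors.New("bad size delta")
		}
		buf = buf[k:]
		prev := handles[i-1]
		handles = append(handles, blockHandle{
			Offset: prev.Offset + prev.Size + trailerLen,
			Size:   uint64(int64(prev.Size) + delta),
		})
	}
	return handles, nil
}

func main() {
	// Four contiguous blocks of roughly 4 KiB each.
	sizes := []uint64{4090, 4101, 4087, 4110}
	var handles []blockHandle
	var off uint64
	for _, s := range sizes {
		handles = append(handles, blockHandle{off, s})
		off += s + trailerLen
	}
	enc := encodeHandles(handles)
	dec, err := decodeHandles(enc, len(handles))
	if err != nil {
		panic(err)
	}
	fmt.Println(len(enc), "encoded bytes for", len(dec), "handles")
}
```

For these four handles the delta form needs 6 bytes, versus 15 bytes if every handle stored both varints, because small size deltas fit in a single byte.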

There is no particular urgency to supporting these formats, but we should investigate whether these format changes move the needle on any benchmarks.

Jira issue: PEBBLE-185

@petermattis
Collaborator Author

> There is no particular urgency to supporting these formats, but we should investigate whether these format changes move the needle on any benchmarks.

@ajkr, do you have any inside knowledge on this?

@ajkr
Contributor

ajkr commented May 1, 2019

I remember format_version=4 saved noticeable space on indexes. But it looks like it depends on setting index_block_restart_interval > 1, which we currently don't do. So it shouldn't be the first thing we try if we ever want to optimize index size.
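A rough Go sketch of why the savings would depend on `index_block_restart_interval > 1`. It assumes, as one reading of the comment above, that an entry at an index-block restart point must store a full <offset, size> handle so a reader can start decoding there without earlier entries, while only entries between restart points are delta-encoded. With an interval of 1 every entry is a restart point, so nothing is saved. `indexValueBytes`, `contiguousHandles`, the 5-byte trailer, and the block sizes are illustrative assumptions.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

const trailerLen = 5 // assumed 1-byte compression type + 4-byte checksum

type blockHandle struct{ Offset, Size uint64 }

// indexValueBytes returns how many bytes the index values occupy when entries
// at restart points store a full <offset, size> handle and all other entries
// store only a signed size delta from the previous handle.
func indexValueBytes(handles []blockHandle, restartInterval int) int {
	var scratch [binary.MaxVarintLen64]byte
	total := 0
	for i, h := range handles {
		if i%restartInterval == 0 {
			total += binary.PutUvarint(scratch[:], h.Offset)
			total += binary.PutUvarint(scratch[:], h.Size)
			continue
		}
		delta := int64(h.Size) - int64(handles[i-1].Size)
		total += binary.PutVarint(scratch[:], delta)
	}
	return total
}

// contiguousHandles lays out n blocks of the given size back to back.
func contiguousHandles(n int, size uint64) []blockHandle {
	handles := make([]blockHandle, n)
	var off uint64
	for i := range handles {
		handles[i] = blockHandle{off, size}
		off += size + trailerLen
	}
	return handles
}

func main() {
	handles := contiguousHandles(1000, 4096)
	for _, interval := range []int{1, 16} {
		fmt.Printf("restart interval %2d: %d bytes\n", interval, indexValueBytes(handles, interval))
	}
}
```

With 32 equal-sized 4 KiB blocks, for example, an interval of 1 costs 155 bytes of index values and an interval of 16 costs 38, since all but two entries shrink to a one-byte zero delta.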

@petermattis
Collaborator Author

> So it shouldn't be the first thing we try if we ever want to optimize index size.

Something I've been meaning to ask you about: we once saw a situation where keys larger than a data block were written to a table. The result was that every data block had 1 entry, and the index block was essentially the same size as all of the data blocks combined. I think we could avoid this situation with a bit more complexity in the sstable block boundary heuristics. Specifically, we shouldn't allow a block to contain only a single entry if the key for that entry is larger than the target block size.
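A hypothetical Go sketch of the rule proposed above. A block is normally finished once its size reaches the target, but a block holding a single entry whose key alone exceeds the target is kept open until a second entry arrives. `splitIntoBlocks`, `entry`, and the sizes are illustrative and do not reflect Pebble's actual flush heuristics.

```go
package main

import "fmt"

// entry is a simplified key/value pair measured only by size.
type entry struct{ keyLen, valueLen int }

// splitIntoBlocks greedily assigns entries to blocks and returns the number of
// entries in each block. When avoidOversizedSingletons is true, a block is not
// finished while it holds exactly one entry whose key exceeds targetBlockSize.
func splitIntoBlocks(entries []entry, targetBlockSize int, avoidOversizedSingletons bool) []int {
	var blocks []int
	count, size := 0, 0
	for _, e := range entries {
		count++
		size += e.keyLen + e.valueLen
		if size < targetBlockSize {
			continue
		}
		if avoidOversizedSingletons && count == 1 && e.keyLen > targetBlockSize {
			continue // keep the block open so the oversized key shares it
		}
		blocks = append(blocks, count)
		count, size = 0, 0
	}
	if count > 0 {
		blocks = append(blocks, count)
	}
	return blocks
}

func main() {
	const target = 32 << 10 // 32 KiB target block size
	entries := make([]entry, 10)
	for i := range entries {
		entries[i] = entry{keyLen: 100 << 10, valueLen: 10}
	}
	fmt.Println(len(splitIntoBlocks(entries, target, false)), "blocks without the rule")
	fmt.Println(len(splitIntoBlocks(entries, target, true)), "blocks with the rule")
}
```

With ten 100 KiB keys and a 32 KiB target, the rule yields five two-entry blocks instead of ten single-entry blocks, halving the number of index entries, each of which may be nearly as large as the key itself.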

On the other hand, perhaps it isn't worth futzing with the sstable block boundary heuristics, and we should instead put effort into blob (WiscKey-style) storage of large entries.

@github-actions

github-actions bot commented Mar 2, 2022

We have marked this issue as stale because it has been inactive for
18 months. If this issue is still relevant, removing the stale label
or adding a comment will keep it active. Otherwise, we'll close it
in 10 days to keep the issue queue tidy. Thank you for your
contribution to Pebble!

@sumeerbhola sumeerbhola changed the title sstable: support format version 3 and 4 sstable: explore index size optimizations Mar 3, 2022
@jbowens jbowens moved this to Backlog in [Deprecated] Storage Jun 4, 2024
jbowens added a commit to jbowens/pebble that referenced this issue Aug 16, 2024
Add writer, reader and iterator types for index blocks.

This commit considers and resolves most of cockroachdb#97 and cockroachdb#2592.

Close cockroachdb#97.
Close cockroachdb#2592.
jbowens added a commit to jbowens/pebble that referenced this issue Aug 22, 2024
@github-project-automation github-project-automation bot moved this from Backlog to Done in [Deprecated] Storage Aug 23, 2024
Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

4 participants