Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

os: File.Sync() for Darwin should use the F_FULLFSYNC fcntl #26650

Closed
networkimprov opened this issue Jul 27, 2018 · 31 comments
Closed

os: File.Sync() for Darwin should use the F_FULLFSYNC fcntl #26650

networkimprov opened this issue Jul 27, 2018 · 31 comments
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. OS-Darwin
Milestone

Comments

@networkimprov
Copy link

networkimprov commented Jul 27, 2018

What version of Go?

1.10 or tip

What operating system and processor architecture?

darwin, *

What is the matter?

According to this and other sites, fcntl(fd, F_FULLFSYNC) is necessary for proper fsync on MacOS (OS X): https://danluu.com/file-consistency/

Github search finds no uses of "F_FULLFSYNC" in the src tree.

If there is a rationale for the present implementation, it should be documented in File.Sync() with directions on how to invoke the fcntl variant.

@ianlancetaylor ianlancetaylor added the NeedsFix The path to resolution is known, but the work has not been done. label Jul 27, 2018
@ianlancetaylor ianlancetaylor added this to the Go1.12 milestone Jul 27, 2018
@odeke-em odeke-em changed the title os: File.Sync() for darwin not correctly implemented os: File.Sync() for Darwin should use the F_FULLFSYNC fcntl Jul 27, 2018
@networkimprov
Copy link
Author

Since this issue threatens data loss/corruption, please consider fixing it in 1.11.x and 1.10.x

@odeke-em
Copy link
Member

Thank you for reporting this issue @networkimprov! It is triaged for 1.12 as 1.11 is scheduled to be released in just a few days. It has also been like this since forever. However we'll definitely consider it as an early candidate for Go1.12 when the tree opens up in the next few weeks. If you are interested in submitting a CL, please feel free, it'd be great to have your contribution in the Go project and community, otherwise we'll get it in for Go1.12.

@networkimprov
Copy link
Author

I'm not proposing 1.11.0 as a target; rather 1.11.1 & 1.10.4 or the minor releases that appear after a fix lands on tip.

Do you have guidance on where fcntl should be called for Darwin File.Sync()?

@ianlancetaylor
Copy link
Contributor

The question is whether we should do it in syscall.Fsync or os.File.Fsync. I guess I would lean slightly toward the former, but happy to hear comments.

@odeke-em
Copy link
Member

odeke-em commented Jul 27, 2018

I'm not proposing 1.11.0 as a target; rather 1.11.1 & 1.10.4 or the minor releases that appear after a fix lands on tip.

A backport I believe warrants something that has plagued previous versions or results for memory corruption, but I'll defer to others on whether this qualifies too -- in this case potential file corruption might qualify :)

The question is whether we should do it in syscall.Fsync or os.File.Fsync. I guess I would lean slightly toward the former, but happy to hear comments.

@ianlancetaylor I too was leaning towards it going in to syscall.Fsync since perhaps we can minimally make it

// As per /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/sys/fcntl.h
const F_FULLFSYNC = 51
func Fsync(fd int) error {
     _, err := fcntl(fd, F_FULLFSYNC, 1)
    return err
}

so in full

+// As per /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.13.sdk/usr/include/sys/fcntl.h
+const F_FULLFSYNC = 51
+
+func Fsync(fd int) error {
+       _, err := fcntl(fd, F_FULLFSYNC, 0)
+       return err
+}
+
 // SockaddrDatalink implements the Sockaddr interface for AF_LINK type sockets.
 type SockaddrDatalink struct {
        Len    uint8
@@ -332,7 +340,6 @@ func Uname(uname *Utsname) error {
 //sys  Fstat(fd int, stat *Stat_t) (err error) = SYS_FSTAT64
 //sys  Fstatat(fd int, path string, stat *Stat_t, flags int) (err error) = SYS_FSTATAT64
 //sys  Fstatfs(fd int, stat *Statfs_t) (err error) = SYS_FSTATFS64
-//sys  Fsync(fd int) (err error)
 //sys  Ftruncate(fd int, length int64) (err error)
 //sys  Getdirentries(fd int, buf []byte, basep *uintptr) (n int, err error) = SYS_GETDIRENTRIES64
 //sys  Getdtablesize() (size int)

Also happy to hear what comments others have.

@ianlancetaylor
Copy link
Contributor

On the other hand, thinking about it further, most of the syscall functions simply implement that operation on the system, rather than trying to present a unified cross-system API. So that suggests that maybe syscall.Fsync on MacOS should just implement the fsync system call, and that we should make the fcntl call in os.File.Sync, or, really, internal/poll/Fd.Fsync. I'm not sure.

@networkimprov
Copy link
Author

I'm not proposing 1.11.0 as a target; rather 1.11.1 & 1.10.4 ...

A backport I believe warrants something that has plagued previous versions or results for memory corruption, but I'll defer to others on whether this qualifies too -- in this case potential file corruption might qualify :)

Note that this can hurt end-users without the Go developer ever realizing it: User does xyz in app, OS crashes (or hw fails), user sees odd behavior in app after restart and restores from backup (losing today's work), user assumes problem was due to hw or os.

+const FULLFSYNC = 51

It's already defined in tree:
https://github.com/golang/go/search?q=f_fullfsync&unscoped_q=f_fullfsync

Agreed you should leave MacOS fsync available in case anyone needs it.

@networkimprov
Copy link
Author

Perhaps surface MacOS fsync as syscall.FsyncDarwin and implement syscall.Fsync with fcntl?

@odeke-em
Copy link
Member

Cool, thanks for the discussion @networkimprov and @ianlancetaylor!

@networkimprov
Copy link
Author

@odeke-em the 1.12 tree is open; can we get this CL posted?

@odeke-em
Copy link
Member

@networkimprov thank you for the ping and reminder! For sure, let's get the ball rolling. I am in the process of submitting a CL.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/130676 mentions this issue: internal/poll: use F_FULLFSYNC fcntl for FD.Fsync on OS X

@odeke-em odeke-em self-assigned this Aug 22, 2018
@odeke-em
Copy link
Member

In regards to backporting this fix to Go1.11 and earlier, what do y'all think @ianlancetaylor @andybons?
Also perhaps we need to add this to Go1.12 release notes?

@ianlancetaylor
Copy link
Contributor

This is not a new problem. I don't see a need to backport.

@odeke-em
Copy link
Member

Roger that! The rationale that @networkimprov proposed, which I thought was reasonable was that because it is an underlying problem that has existed since Go1.0, it'd be nice to have it in the latest Go release since it can lead to disk memory corruption and affects many databases, folks that need flushes to permanent storage. However, as we've all noticed, no one had frantically reported this bug during previous Go cycles hence doesn't meet the bar for a backport.

@networkimprov
Copy link
Author

This can hurt end-users without the Go developer ever realizing it:

  1. user does xyz in app; OS crashes or hw fails
  2. user sees odd app behavior after restart and restores from backup, losing this week's work
  3. user assumes problem was due to hw or OS

I plan to release two apps near-future which rely on os.File.Sync() for crash protection, and MacOS is widely used by my audience.

Not Go-related, but my Git repo was corrupted last week due to a hw failure.

@networkimprov
Copy link
Author

@ianlancetaylor could you consider cherry-picking this for 1.11.1, given the report from a bbolt db user on MacOS?

https://github.com/etcd-io/bbolt

@ianlancetaylor
Copy link
Contributor

I don't see a report at that link.

We've also seen a bug report on this due to the changed performance: #27415. I'm pretty reluctant to make this kind of change in a point release. As I said earlier, it's not a new problem.

@mark-rushakoff
Copy link
Contributor

Couldn't bbolt handle this by making the syscall directly? From Go 1.12 onwards they can go back to a plain (*os.File).Sync().

@networkimprov
Copy link
Author

@mark-rushakoff, probably they could. Bleve is also affected and they don't use the bbolt fork yet; hopefully they'll switch if bbolt applies the workaround. Wish I knew who else depends on os.File.Sync :-(

@ianlancetaylor, yes that was the report I meant. I hope you won't fault me for asking.

@networkimprov
Copy link
Author

Regarding the 1.12 release notes:

"Note that this might have a negative performance impact."

I'd suggest adding that File.Sync() prior to 1.12 is broken on MacOS, and may cause data loss/corruption.

@ianlancetaylor
Copy link
Contributor

That would imply that calling File.Sync could in itself cause data loss/corruption, which is not the case.

I think the current release notes text is clear enough.

@mark-rushakoff
Copy link
Contributor

I was looking at the release notes for this as well, wondering to myself: "negative performance impact" relative to ... the previous implementation of Sync? Not calling Sync but just relying on Close?

I think it would be more clear if it mentioned that calling File.Sync is expected to block for a longer time in 1.12 than in 1.11 and earlier, because now Sync forces writes to disk rather than just OS buffers, and as a result Sync is now more resilient to crashes. "Negative performance impact" is a bit vague in my opinion.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/155038 mentions this issue: doc: clarify change to File.Sync on macOS

@ianlancetaylor
Copy link
Contributor

What do you think of https://golang.org/cl/155038?

(I really think we are spending too much time on this. It's a bug fix.)

@mark-rushakoff
Copy link
Contributor

LGTM @ianlancetaylor. Thanks for rewording it.

@networkimprov
Copy link
Author

networkimprov commented Dec 19, 2018

That would imply that calling File.Sync could in itself cause data loss/corruption

True. Maybe "File.Sync() prior to 1.12 is ineffective on MacOS, and cannot prevent data loss/corruption on system crash/power-off."

It's a bug fix with broad impact among database packages, including etcd bbolt, syndtr goleveldb, dgraph badger, and bleve.

@ianlancetaylor
Copy link
Contributor

Thanks, I'm going with what I've got.

@campoy
Copy link
Contributor

campoy commented Jun 17, 2019

I'm definitely late to the conversation, but I wanted to document my frustration with the decision of not backporting this fix, and how it's affecting https://github.com/dgraph-io/badger.

We now need to replace all the calls to os.File.Sync to instead call a FileSync function for which we provide two implementations.

Once is very straight forward:

file_sync.go

// +build !darwin go1.12

package y

import "os"

// FileSync calls os.File.Sync with the right parameters.
// This function can be removed once we stop supporting Go 1.11
// on MacOS.
func FileSync(f *os.File) error {return f.Sync()}

The second one for now simply makes the corresponding syscall.

file_sync_darwin.go

// +build darwin,!go1.12

package y

import (
	"os"
	"syscall"
)

func FileSync(f *os.File) error {
	_, _, err := syscall.Syscall(syscall.SYS_FCNTL, f.Fd(), syscall.F_FULLFSYNC, 0)
	if err == 0 {
		return nil
	}
	return err
}

Making this function do the right thing seems to require interacting with syscall package in ways it seems it's not designed to (according to this change). Before calling the syscall.Syscall(syscall.SYS_FCNTL, ...) we need to first call incref which is not exported.

Could you, at least, provide a clear way of how to implement this function correctly?
Is my code correct?
I'm afraid I'll hack something that works but eventually will figure out something is missing.

Thanks

UPDATE: I already found a way my code was wrong, added the check to see whether errno is zero.

@ianlancetaylor
Copy link
Contributor

@campoy I understand your frustration, but there was no good solution here. Making this change has led to multiple bug reports about significant performance changes, changes due to now using the correct behavior. It would have been inappropriate to make a change that led to multiple bug reports in a minor release.

The code you show above looks correct and fine for a file on local disk. I'm not sure there is a fully correct to call fsync for an NFS file in Go 1.11 on Darwin. The incref is not an issue because you are calling the Fd method.

@campoy
Copy link
Contributor

campoy commented Jun 21, 2019

Thanks for the confirmation, @ianlancetaylor!

I think that just keeping my code in this issue should be enough to help others wondering how to fix it on their own codebases.

The PR fixing this issue for badger is dgraph-io/badger#871.

mostynb added a commit to mostynb/bazel-remote that referenced this issue Dec 19, 2019
os.File.Sync() was inneffective on mac in go prior to 1.12. In 1.12 onwards
it is effective, but slow: golang/go#26650

This change disables Sync for CAS uploads on mac, because we can (potentially)
verify this later. Let's see if this helps performance on mac: buchgr#67 (while
being less risky than buchgr#68).
mostynb added a commit to mostynb/bazel-remote that referenced this issue Dec 30, 2019
os.File.Sync() was inneffective on mac in go prior to 1.12. In 1.12 onwards
it is effective, but slow: golang/go#26650

This change disables Sync for CAS uploads on mac, because we can (potentially)
verify this later. Let's see if this helps performance on mac: buchgr#67 (while
being less risky than buchgr#68).
mostynb added a commit to mostynb/bazel-remote that referenced this issue Jan 1, 2020
os.File.Sync() was inneffective on mac in go prior to 1.12. In 1.12 onwards
it is effective, but slow: golang/go#26650

This change disables Sync for CAS uploads on mac, because we can (potentially)
verify this later. Let's see if this helps performance on mac: buchgr#67 (while
being less risky than buchgr#68).
@golang golang locked and limited conversation to collaborators Jun 20, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge help wanted NeedsFix The path to resolution is known, but the work has not been done. OS-Darwin
Projects
None yet
Development

No branches or pull requests

6 participants