Discussion:
COW and refcounting of std::string / string16
Eric Roman
2010-07-16 22:31:48 UTC
Permalink
[C++ / STL question]

Does Chromium have any fixed expectations on whether the underlying
std::string implementations will be reference counted and
copy-on-write ?

The reason I ask, is I have a use case where I am sharing a (possibly
large) immutable string16 between several places.
Currently I am using copy-construction. It works out OK as the
underlying bytes end up being shared.
But certainly if this implementation detail doesn't hold, that code
will have an undesirable memory footprint.

Are there any compiler gurus here that know how reliable this
implementation detail is across platforms, chrome's build settings,
and the future?

If I can't rely on this, then I will need to consider other solutions
like having a reference-counted wrapper for the string types I want to
share
(but this feels ugly and overkill if we know our strings will always
be reference counted).

In fact, my general impression is that non-COW string implementations
will hurt Chrome performance, as we rely heavily on string copying
(for example inside containers, and when PostTask-ing |const
std::string&| between threads, etc.)

Thanks!
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Peter Kasting
2010-07-16 22:35:45 UTC
Permalink
Post by Eric Roman
Does Chromium have any fixed expectations on whether the underlying
std::string implementations will be reference counted and
copy-on-write ?
We use WriteInto() sometimes to get a pointer to the buffer underneath the
string and write characters into it. That might break if the destination
was COW. I believe we decided that we could safely do this for all STL
std::string implementations.

PK
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Avi Drissman
2010-07-16 22:39:23 UTC
Permalink
AFAICR most compiler vendors consider COW strings to not be worth the pain,
that all current compilers don't implement them, and that for C++0x the
standards committee has forbidden them.

http://old.nabble.com/COW-strings-in-libstdc%2B%2B-td25227463.html refers to
We propose to make all iterator and element access operations safely
concurrently executable.
We are increasing the stability of operations even in sequential code.
This change effectively disallows copy-on-write implementations.
Avi
[C++ / STL question]
Does Chromium have any fixed expectations on whether the underlying
std::string implementations will be reference counted and
copy-on-write ?
The reason I ask, is I have a use case where I am sharing a (possibly
large) immutable string16 between several places.
Currently I am using copy-construction. It works out OK as the
underlying bytes end up being shared.
But certainly if this implementation detail doesn't hold, that code
will have an undesirable memory footprint.
Are there any compiler gurus here that know how reliable this
implementation detail is across platforms, chrome's build settings,
and the future?
If I can't rely on this, then I will need to consider other solutions
like having a reference-counted wrapper for the string types I want to
share
(but this feels ugly and overkill if we know our strings will always
be reference counted).
In fact, my general impression is that non-COW string implementations
will hurt Chrome performance, as we rely heavily on string copying
(for example inside containers, and when PostTask-ing |const
std::string&| between threads, etc.)
Thanks!
--
http://groups.google.com/a/chromium.org/group/chromium-dev
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Eric Roman
2010-07-16 22:52:27 UTC
Permalink
Thanks. I will make sure to wrap this big string inside a refcounted then!
Post by Avi Drissman
AFAICR most compiler vendors consider COW strings to not be worth the pain,
that all current compilers don't implement them, and that for C++0x the
standards committee has forbidden them.
http://old.nabble.com/COW-strings-in-libstdc%2B%2B-td25227463.html refers to
We propose to make all iterator and element access operations safely
concurrently executable.
We are increasing the stability of operations even in sequential code.
This change effectively disallows copy-on-write implementations.
Avi
[C++ / STL question]
Does Chromium have any fixed expectations on whether the underlying
std::string implementations will be reference counted and
copy-on-write ?
The reason I ask, is I have a use case where I am sharing a (possibly
large) immutable string16 between several places.
Currently I am using copy-construction. It works out OK as the
underlying bytes end up being shared.
But certainly if this implementation detail doesn't hold, that code
will have an undesirable memory footprint.
Are there any compiler gurus here that know how reliable this
implementation detail is across platforms, chrome's build settings,
and the future?
If I can't rely on this, then I will need to consider other solutions
like having a reference-counted wrapper for the string types I want to
share
(but this feels ugly and overkill if we know our strings will always
be reference counted).
In fact, my general impression is that non-COW string implementations
will hurt Chrome performance, as we rely heavily on string copying
(for example inside containers, and when PostTask-ing |const
std::string&| between threads, etc.)
Thanks!
--
   http://groups.google.com/a/chromium.org/group/chromium-dev
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Mark Mentovai
2010-07-17 14:39:20 UTC
Permalink
Post by Eric Roman
Does Chromium have any fixed expectations on whether the underlying
std::string implementations will be reference counted and
copy-on-write ?
No expectations. It’s an implementation detail. basic_string may be
either COW or not.
Post by Eric Roman
Are there any compiler gurus here that know how reliable this
implementation detail is across platforms, chrome's build settings,
and the future?
In the current state of the world, GCC’s libstdc++ basic_strings are
copy-on-write. MSVC’s, I believe, are not.

As Avi points out, C++0x will cause changes in this area.
Post by Eric Roman
We use WriteInto() sometimes to get a pointer to the buffer underneath the
string and write characters into it.  That might break if the destination
was COW.  I believe we decided that we could safely do this for all STL
std::string implementations.
A conforming COW basic_string needs to treat certain operations as
triggering a copy. For example, operator[], as used by WriteInto,
returns a character-type reference that allows mutation, so it must
trigger a copy.

Mark
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Peter Kasting
2010-07-17 18:38:17 UTC
Permalink
Post by Mark Mentovai
Post by Peter Kasting
We use WriteInto() sometimes to get a pointer to the buffer underneath
the
Post by Peter Kasting
string and write characters into it. That might break if the destination
was COW. I believe we decided that we could safely do this for all STL
std::string implementations.
A conforming COW basic_string needs to treat certain operations as
triggering a copy. For example, operator[], as used by WriteInto,
returns a character-type reference that allows mutation, so it must
trigger a copy.
Interesting. Thanks for the detail. This is definitely an area I'm not
knowledgeable about.

PK
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Elliot Glaysher (Chromium)
2010-07-18 04:57:35 UTC
Permalink
Post by Mark Mentovai
In the current state of the world, GCC’s libstdc++ basic_strings are
copy-on-write. MSVC’s, I believe, are not.
In a different life, I've had problems with older versions of boost
where boost::serialization would const_cast<> away the constness of
the char* buffer returned by basic_string::data() and write into it,
subverting the COW semantics. I was as baffled by strings suddenly
changing their own value as you'd expect.

After tracking *that* down, I am all about COW strings going away.

-- Elliot
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Darin Fisher
2010-07-20 06:48:20 UTC
Permalink
Post by Eric Roman
Does Chromium have any fixed expectations on whether the underlying
std::string implementations will be reference counted and
copy-on-write ?
No expectations. It’s an implementation detail. basic_string may be
either COW or not.
Post by Eric Roman
Are there any compiler gurus here that know how reliable this
implementation detail is across platforms, chrome's build settings,
and the future?
In the current state of the world, GCC’s libstdc++ basic_strings are
copy-on-write. MSVC’s, I believe, are not.
I wonder if the COW implementation is costing us more than it gains us.
The COW implementation needs to use thread-safe reference counting,
which can be costly.

-Darin
As Avi points out, C++0x will cause changes in this area.
Post by Eric Roman
We use WriteInto() sometimes to get a pointer to the buffer underneath
the
Post by Eric Roman
string and write characters into it. That might break if the destination
was COW. I believe we decided that we could safely do this for all STL
std::string implementations.
A conforming COW basic_string needs to treat certain operations as
triggering a copy. For example, operator[], as used by WriteInto,
returns a character-type reference that allows mutation, so it must
trigger a copy.
Mark
--
http://groups.google.com/a/chromium.org/group/chromium-dev
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Albert J. Wong (王重傑)
2010-07-20 18:23:15 UTC
Permalink
Post by Darin Fisher
Post by Eric Roman
Does Chromium have any fixed expectations on whether the underlying
std::string implementations will be reference counted and
copy-on-write ?
No expectations. It’s an implementation detail. basic_string may be
either COW or not.
Post by Eric Roman
Are there any compiler gurus here that know how reliable this
implementation detail is across platforms, chrome's build settings,
and the future?
In the current state of the world, GCC’s libstdc++ basic_strings are
copy-on-write. MSVC’s, I believe, are not.
I wonder if the COW implementation is costing us more than it gains us.
The COW implementation needs to use thread-safe reference counting,
which can be costly.
Here's a Herb Sutter article talking about COW strings:

http://www.gotw.ca/publications/optimizations.htm

It's probably worth benchmarking to find out. There's some analysis that
people at Google have done on this (links are internal, so please e-mail me
if you want them). Gcc apparently does have a non-COW string implementation
under the symbol:__gnu_cxx::_vstring. We could try swapping out
implementations for a quick test. However, it's not going to be 100%
trivial since mixing the two string implementations will apparently cause
ABI issues.

-Albert
Post by Darin Fisher
-Darin
As Avi points out, C++0x will cause changes in this area.
Post by Eric Roman
We use WriteInto() sometimes to get a pointer to the buffer underneath
the
Post by Eric Roman
string and write characters into it. That might break if the
destination
Post by Eric Roman
was COW. I believe we decided that we could safely do this for all STL
std::string implementations.
A conforming COW basic_string needs to treat certain operations as
triggering a copy. For example, operator[], as used by WriteInto,
returns a character-type reference that allows mutation, so it must
trigger a copy.
Mark
--
http://groups.google.com/a/chromium.org/group/chromium-dev
--
http://groups.google.com/a/chromium.org/group/chromium-dev
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Evan Martin
2010-07-20 18:39:28 UTC
Permalink
On Tue, Jul 20, 2010 at 11:23 AM, Albert J. Wong (王重傑)
Post by Darin Fisher
I wonder if the COW implementation is costing us more than it gains us.
The COW implementation needs to use thread-safe reference counting,
which can be costly.
  http://www.gotw.ca/publications/optimizations.htm
It's probably worth benchmarking to find out.  There's some analysis that
people at Google have done on this (links are internal, so please e-mail me
if you want them).  Gcc apparently does have a non-COW string implementation
under the symbol:__gnu_cxx::_vstring. We could try swapping out
implementations for a quick test.  However, it's not going to be 100%
trivial since mixing the two string implementations will apparently cause
ABI issues.
I've been thinking about that too. I guess we don't use any C++
libraries so we'd be ok?
If there's a way we can replace the name "std::string" to that I'd
like to try it; otherwise, I expect a patch to s/std::string/string/
throughout the code base to not go over well. :)
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Albert J. Wong (王重傑)
2010-07-20 18:46:10 UTC
Permalink
On Tue, Jul 20, 2010 at 11:23 AM, Albert J. Wong (王重傑)
Post by Albert J. Wong (王重傑)
Post by Darin Fisher
I wonder if the COW implementation is costing us more than it gains us.
The COW implementation needs to use thread-safe reference counting,
which can be costly.
http://www.gotw.ca/publications/optimizations.htm
It's probably worth benchmarking to find out. There's some analysis that
people at Google have done on this (links are internal, so please e-mail
me
Post by Albert J. Wong (王重傑)
if you want them). Gcc apparently does have a non-COW string
implementation
Post by Albert J. Wong (王重傑)
under the symbol:__gnu_cxx::_vstring. We could try swapping out
implementations for a quick test. However, it's not going to be 100%
trivial since mixing the two string implementations will apparently cause
ABI issues.
I've been thinking about that too. I guess we don't use any C++
libraries so we'd be ok?
If there's a way we can replace the name "std::string" to that I'd
like to try it; otherwise, I expect a patch to s/std::string/string/
throughout the code base to not go over well. :)
I figured the easiest hack would be to pop open libstdc++, and change the
typedef for std::string inside /usr/include/c++/4.2.4/bits/stringfwd.h
and /usr/include/c++/4.2.4/debug/string from

./debug/string: typedef basic_string<char> string;
./bits/stringfwd.h: typedef basic_string<char> string;

to the equiv of the ones in vstring_fwd.h:

./ext/vstring_fwd.h: typedef __versa_string<char>
__vstring;
./ext/vstring_fwd.h: typedef __versa_string<wchar_t>
__wvstring;

I'm sure this will have all sorts of un-nice consequences, but I think it
will work in theory...

-Albert
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Evan Martin
2010-07-20 18:59:10 UTC
Permalink
On Tue, Jul 20, 2010 at 11:46 AM, Albert J. Wong (王重傑)
Post by Albert J. Wong (王重傑)
I've been thinking about that too.  I guess we don't use any C++
libraries so we'd be ok?
If there's a way we can replace the name "std::string" to that I'd
like to try it; otherwise, I expect a patch to s/std::string/string/
throughout the code base to not go over well.  :)
I figured the easiest hack would be to pop open libstdc++, and change the
typedef for std::string inside /usr/include/c++/4.2.4/bits/stringfwd.h
and /usr/include/c++/4.2.4/debug/string from
./debug/string:  typedef basic_string<char>    string;
./bits/stringfwd.h:  typedef basic_string<char>    string;
./ext/vstring_fwd.h:  typedef __versa_string<char>
   __vstring;
./ext/vstring_fwd.h:  typedef __versa_string<wchar_t>
    __wvstring;
I'm sure this will have all sorts of un-nice consequences, but I think it
will work in theory...
Yeah, sorry, I meant "a way [contained to our own code base] to
replace ...". :P
I don't want to break C++ compilation for my whole system!
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Scott Hess
2010-07-20 19:07:04 UTC
Permalink
1) Use the new thingie for string16.
2) Stop using wstring, dammit.
3) Profit!

-scott
Post by Evan Martin
On Tue, Jul 20, 2010 at 11:46 AM, Albert J. Wong (王重傑)
Post by Albert J. Wong (王重傑)
I've been thinking about that too.  I guess we don't use any C++
libraries so we'd be ok?
If there's a way we can replace the name "std::string" to that I'd
like to try it; otherwise, I expect a patch to s/std::string/string/
throughout the code base to not go over well.  :)
I figured the easiest hack would be to pop open libstdc++, and change the
typedef for std::string inside /usr/include/c++/4.2.4/bits/stringfwd.h
and /usr/include/c++/4.2.4/debug/string from
./debug/string:  typedef basic_string<char>    string;
./bits/stringfwd.h:  typedef basic_string<char>    string;
./ext/vstring_fwd.h:  typedef __versa_string<char>
   __vstring;
./ext/vstring_fwd.h:  typedef __versa_string<wchar_t>
    __wvstring;
I'm sure this will have all sorts of un-nice consequences, but I think it
will work in theory...
Yeah, sorry, I meant "a way [contained to our own code base] to
replace ...".  :P
I don't want to break C++ compilation for my whole system!
--
   http://groups.google.com/a/chromium.org/group/chromium-dev
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Albert J. Wong (王重傑)
2010-07-20 19:11:07 UTC
Permalink
On Tue, Jul 20, 2010 at 11:46 AM, Albert J. Wong (王重傑)
Post by Albert J. Wong (王重傑)
Post by Evan Martin
I've been thinking about that too. I guess we don't use any C++
libraries so we'd be ok?
If there's a way we can replace the name "std::string" to that I'd
like to try it; otherwise, I expect a patch to s/std::string/string/
throughout the code base to not go over well. :)
I figured the easiest hack would be to pop open libstdc++, and change the
typedef for std::string inside /usr/include/c++/4.2.4/bits/stringfwd.h
and /usr/include/c++/4.2.4/debug/string from
./debug/string: typedef basic_string<char> string;
./bits/stringfwd.h: typedef basic_string<char> string;
./ext/vstring_fwd.h: typedef __versa_string<char>
__vstring;
./ext/vstring_fwd.h: typedef __versa_string<wchar_t>
__wvstring;
I'm sure this will have all sorts of un-nice consequences, but I think it
will work in theory...
Yeah, sorry, I meant "a way [contained to our own code base] to
replace ...". :P
I don't want to break C++ compilation for my whole system!
Right right... That was a suggestion just for an initial test to see how
much it would actually improve things. If that still is annoying, cp -r the
c++ directory to a new one. Modify the copy. Then pass -isystem in the
cflags....

Long term, switching from std::string to string is the "righter" solution,
but requires a lot of code cleanup along with some include hacks.

-Albert

P.S. I am thinking about trying this test, but it will be a bit before I
have the time...
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Mark Mentovai
2010-07-20 19:24:58 UTC
Permalink
Post by Evan Martin
On Tue, Jul 20, 2010 at 11:23 AM, Albert J. Wong (王重傑)
 However, it's not going to be 100%
trivial since mixing the two string implementations will apparently cause
ABI issues.
I've been thinking about that too.  I guess we don't use any C++
libraries so we'd be ok?
We probably don’t need to be concerned with binary compatibility, but
source compatibility is still a problem. We can’t redefine
std::basic_string (outside of single-machine tests to see whether or
not this change would have a measurable impact), and we’ll have a hard
time modifying third-party code, so we might get caught having to
convert between std::basic_string and __gnu_cxx::__versa_string at
various third-party library boundaries.

Mark
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
James Su
2010-07-20 20:08:37 UTC
Permalink
(Using the correct email address, why gmail can't choose the correct one for
me?).
On Tue, Jul 20, 2010 at 11:23 AM, Albert J. Wong ($B2&=E7f(B)
Post by Albert J. Wong (王重傑)
However, it's not going to be 100%
trivial since mixing the two string implementations will apparently
cause
Post by Albert J. Wong (王重傑)
ABI issues.
I've been thinking about that too. I guess we don't use any C++
libraries so we'd be ok?
We probably don$B!G(Bt need to be concerned with binary compatibility, but
source compatibility is still a problem. We can$B!G(Bt redefine
std::basic_string (outside of single-machine tests to see whether or
not this change would have a measurable impact), and we$B!G(Bll have a hard
time modifying third-party code, so we might get caught having to
convert between std::basic_string and __gnu_cxx::__versa_string at
various third-party library boundaries.
Binary compatibility might be important to us, for example on Linux, gtk may
load a bunch of external modules, like theme engine, input method module,
etc. It might cause problem if those modules use different std::string
implementation than us.
Mark
--
http://groups.google.com/a/chromium.org/group/chromium-dev
--
Chromium Developers mailing list: chromium-***@chromium.org
View archives, change email options, or unsubscribe:
http://groups.google.com/a/chromium.org/group/chromium-dev
Loading...