Age | Commit message (Collapse) | Author | Files | Lines |
|
(#7773)
Remove dependency of System.Globalization.Native.so on specific ICU version
|
|
* Enable RegionInfo netstandard 1.7 APIs
* Fix the typo
* lowercase TRUE and FALSE
|
|
The issue was the symbol is exported by the ICU lib. Including headers
was not enough. The linker requires the libraries to succeed.
With this fix, CoreCLR successfully builds on Gentoo Linux 100%.
Tested with LXC gentoo container on Ubuntu machine.
Steps to configure and build:
https://gist.github.com/jasonwilliams200OK/1a2e2c0e904ffa95faf6333fcd88d9b8
Fix #5160
|
|
We should ignore empty collation elements at the end of the string
when doing our EndsWith checks. This means the match ICU finds might
not span to the end of string, but the only elements after the match
before the end are completely ignorable.
U+00AD (SOFT HYPHEN) is one such case where the codepoint is completely
ignorable.
Fixes dotnet/corefx#3467
|
|
Previously, we would just ask ICU what it thought the default locale
was, since that seemed like a reasonable thing to do. However, in cases
where LANG, LC_MESSAGES and LC_ALL were unset and setlocale(3) returned
"C", ICU would use "en-US-POSIX" as a default locale.
The above case is actually what happens by default when you are running
in docker and en-US-POSIX has very odd collation rules (ASCII characters
which differ only by case are still treated as separate letters) which
trip folks up.
So in this case, we'll use Invariant. If setlocale(3) returns a non
C/POSIX locale or any of LANG, LC_MESSAGES, or LC_ALL are set to non
empty values, we'll continue to let ICU figure out what to do.
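The decision rule described above can be sketched as a small pure function. This is only an illustration under stated assumptions: the function names are hypothetical, and querying setlocale with the LC_ALL category is a guess, since the message does not say which category the shim inspects.

```cpp
#include <clocale>
#include <cstdlib>
#include <cstring>

// Hypothetical helper: true when an environment value is set and non-empty.
static bool IsSetAndNonEmpty(const char* value)
{
    return value != nullptr && value[0] != '\0';
}

// Hypothetical decision function: returns true when we should use Invariant
// instead of asking ICU for its default locale. Inputs are passed in so the
// rule can be exercised without touching the process environment.
bool ShouldUseInvariant(const char* lang,
                        const char* lcMessages,
                        const char* lcAll,
                        const char* setlocaleResult)
{
    // Any non-empty locale variable means ICU should figure it out.
    if (IsSetAndNonEmpty(lang) || IsSetAndNonEmpty(lcMessages) || IsSetAndNonEmpty(lcAll))
    {
        return false;
    }

    // An unknown setlocale result is treated as "not C/POSIX".
    if (setlocaleResult == nullptr)
    {
        return false;
    }

    return std::strcmp(setlocaleResult, "C") == 0 ||
           std::strcmp(setlocaleResult, "POSIX") == 0;
}

// Convenience wrapper that reads the real process state.
// Assumption: LC_ALL is the category queried; the commit does not specify.
bool ShouldUseInvariantForProcess()
{
    return ShouldUseInvariant(std::getenv("LANG"),
                              std::getenv("LC_MESSAGES"),
                              std::getenv("LC_ALL"),
                              std::setlocale(LC_ALL, nullptr));
}
```

Keeping the rule separate from the environment lookup makes the docker-style case (nothing set, setlocale reports "C") easy to test in isolation.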
|
|
Currently only enabled for arm64
|
|
|
|
Issue #3669
Created a common cmake strip_symbols function that all the modules and programs
use to strip the symbols out of the main into a separate .dbg (Linux) or .dSYM (OSX)
file.
Added an install_clr cmake function to encapsulate the install logic.
Changed all the library module cmake install lines from a TARGETS to a FILES one. The
TARGETS based install directives caused cmake to relink the binary and copy the unstripped
version to the install path. Left all the programs like corerun or ildasm as TARGETS
installs because on OSX FILES type installs don't get marked as executable.
Need to use "get_property(strip_source_file TARGET ${targetName} PROPERTY LOCATION)" for
the older versions of cmake and "set(strip_source_file $<TARGET_FILE:${targetName}>)" on
newer versions (v3 or greater).
|
|
This reverts commit fb80bad2ed19970472ddefe539520abef42a52d0.
|
|
Fixed by calling ICU's ucal_getTimeZoneDisplayName to read the display names for the current locale.
Fix https://github.com/dotnet/corefx/issues/2748
|
|
|
|
This reverts commit 6b1d2938ec4a5a2c64fd849797ec7800ed3ab575.
|
|
|
|
For consistency and to enable eventual sharing of the same code with CoreRT, I have changed the naming convention for System.Globalization.Native exports to match dotnet/corefx#4818.
|
|
|
|
The IdnaConformanceTests fail on Unix because \u00DF, \u200C and \u200D characters are not being handled as specified in the http://www.unicode.org/Public/idna/6.0.0/IdnaTest.txt file.
The fix is to use UIDNA_NONTRANSITIONAL_TO_UNICODE and UIDNA_CHECK_CONTEXTJ options when calling uidna_openUTS46.
Partial fix for https://github.com/dotnet/corefx/issues/3406.
|
|
symbols
By default, ICU alternate shifted collation handling only ignores punctuation, not all symbols, so change the "variable top" to include all symbols and currency characters.
Fix #4907
|
|
|
|
This change replaces all calls of OS specific functions in the GC by a call to
a platform agnostic interface. Critical sections were abstracted too.
The logging file access was changed to use CRT functions instead of Windows specific APIs.
A "size" member was added to the card_table_info so that we can pass the right
size to the VirtualRelease method when destroying the card table.
I have also fixed a bug in the gc_heap::make_card_table error path where when VirtualCommit
failed, it called VirtualRelease with size that was not the reserved size, but
the committed size.
Other related changes
- All interlocked operations moved to Interlocked class as static methods
- Removed unused function prototypes
- Shuffled stuff in the root CMakeLists.txt to enable building the GC sample using the
settings inherited from the root CMakeLists.txt and to clean up some things that have
rotted over time, like the FEATURE_xxx macros not being in one alphabetically ordered
block
- Fixed the VOLATILE_MEMORY_BARRIER macro in the gcenv.base.h
- Replaced uint32_t thread id by EEThreadId
- Removed thread handles storage (g_gc_thread) from the GC. The thread handle is closed right after the thread is launched. That allowed me to get rid of the GCThreadHandle
- Renamed the methods of the EEThreadId to be easier to understand
- Moved the gcenv.windows.cpp and gcenv.unix.cpp to the sample folder
|
|
1. When IgnoreSymbols is true, ensure we still ignore half and fullwidth characters that are symbols.
2. Hiragana-Katakana characters differ at the tertiary strength, fixing the rule.
3. Fix collation on OSX which uses ICU 55.1.
ICU 55 doesn't support having certain Unicode characters using primary '<' rules.
These characters are not necessary in the rules, since Windows always treats them the same.
Removing 0x3099 and 0x309A from the half/full width rules.
|
|
|
|
|
|
|
|
|
|
passed in. This is in preparation of creating different UCollators for each option.
|
|
Cache UCollators in a Locale
|
|
Creating a UCollator is an expensive operation and we are presently
doing it on every collation operation. We can improve this by caching
the UCollators we use for collation on the CompareInfo object itself.
This change introduces a new method GetSortHandle which gives back an
opaque wrapper which can be used in collation operations instead of a
culture name.
Internally we represent this as a struct holding the two types of
UCollators we care about (if we add additional collators per locale with
different options to handle other types of CompareOption flags, we can
cache these as well). Collation methods can get a `const UCollator*`
reference from the sort handle which is safe to share across
threads (per the ICU Design Guidelines[1]).
Unfortunately, tracking the lifetime of the SortHandle itself is not as
straightforward as I would like. Right now, we use a SafeHandle to wrap
the internal handle and rely on the finalizer of the class to clean up
the native resources. However this means that the following code sample
will create two finalizable objects:
```csharp
var c1 = new CultureInfo("en-US").CompareInfo;
var c2 = new CultureInfo("en-US").CompareInfo;
```
If this ends up being an issue, we could explore an approach where we
keep a cache of SortHandles in managed code and pass out references to
that SortHandle which would let us share a single SortHandle for a given
locale across more than one CompareInfo object.
Wins are seen in places where we previously did lots of string
comparisons in a tight loop (for example: dotnet/corefx#3811) moving
these operations down to ~6ms per iteration vs ~330ms on my local machine.
[1]: http://userguide.icu-project.org/design
|
|
DateTimeFormat.ShortDatePattern should use CLDR 'short' format on Unix.
|
|
The StartsWith ICU wrapper was not checking the result of usearch_first
to see if it was USEARCH_DONE, indicating no match found. This has two
ramifications:
1. When there isn't a match, USEARCH_DONE (-1) gets passed in as the
textLength argument to ucol_openElements, which treats -1 as meaning
the string isn't null-terminated, and thus ends up walking the string
looking for non-ignorable collation elements. Our tests have been passing
because they've been using strings containing only non-ignorable
elements, and thus the first character checked causes us to bail and correctly
return false. If nothing else, this is an unnecessary perf overhead.
2. But on top of that if there are only ignorable collation elements
before the first null character in the string (e.g. if the string begins
with a null character), then because we told ICU that the string ended
at the first null character, it'll stop walking the string and return
a match. e.g. "\0bug".StartsWith("test") returns true incorrectly.
This commit simply adds a check for USEARCH_DONE to StartsWith.
EndsWith already has such a check.
|
|
The DateTimeFormat.ShortDatePattern currently defaults to CLDR's 'yMd' skeleton. However, this value doesn't produce the best format for all cultures, e.g. "de-DE". LongDatePattern uses CLDR's 'full' format. To be symmetrical, the ShortDatePattern should use CLDR's 'short' format.
Fix https://github.com/dotnet/coreclr/issues/1736.
|
|
Our current ICU shims for StartsWith, EndsWith, IndexOf, and LastIndexOf
take the length of the source string but not the length of the target
string. This forces ICU to compute the length of the string by searching
for a null terminator. We can save those costs and be more accurate
around nulls in the target string by passing the known length in.
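To illustrate why a known length is more accurate than scanning for a terminator, here is a minimal sketch using a plain ordinal (code-unit) search rather than ICU collation. Both function names are hypothetical and are not the shim's actual exports.

```cpp
#include <cstddef>

// What a callee has to do when it is not told the length: scan for a
// terminator. This stops at the first embedded null.
std::size_t LengthByTerminator(const char16_t* s)
{
    std::size_t n = 0;
    while (s[n] != u'\0')
    {
        ++n;
    }
    return n;
}

// Ordinal search that takes both lengths explicitly, so embedded nulls in
// either string are compared like any other code unit.
// Returns the index of the first match, or -1 when there is none.
std::ptrdiff_t IndexOfWithLengths(const char16_t* source, std::size_t sourceLength,
                                  const char16_t* target, std::size_t targetLength)
{
    if (targetLength > sourceLength)
    {
        return -1;
    }

    for (std::size_t i = 0; i + targetLength <= sourceLength; ++i)
    {
        std::size_t j = 0;
        while (j < targetLength && source[i + j] == target[j])
        {
            ++j;
        }
        if (j == targetLength)
        {
            return static_cast<std::ptrdiff_t>(i);
        }
    }
    return -1;
}
```

With a target such as `u"a\0c"`, the terminator scan sees only `a`, so a search driven by that length can report a match that the full three-unit target does not have.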
|
|
Remove OSX Homebrew ICU dependency
|
|
There were a few problems that needed to be addressed:
- Our detection logic around testing if ICU supported a feature was
still checking for C++ stuff instead of the corresponding C
code (which we ended up using).
- There was some cleanup we could do now that the OSX and other Unix
builds were split apart
|
|
|
|
|
|
This matches what we do in other places in calendarData.cpp, the RAII
pattern will make it easier to not leak memory.
|
|
OSX ships with a copy of ICU (their globalization APIs are built on top
of it). Since we only use stable C-based APIs, we can link against it
using the methods described in using a System ICU in the ICU User's
Guide (basically we disable function renaming, don't use C++ and only
use stable APIs).
The ICU headers are not part of the SDK, so we continue to need ICU
installed via Homebrew as a build time dependency.
Fixes dotnet/corefx#3849
|
|
|
|
Our current implementation of IndexOfOrdinal for strings on Unix uses Substring to get the piece of the source string we care about; this results in an unnecessary allocation / string copy. When using OrdinalIgnoreCase, we also convert both the source and search strings to upper-case using ToUpperInvariant, resulting in more allocations. And our LastIndexOfOrdinal implementation delegates to IndexOfOrdinal repeatedly, incurring such allocations potentially multiple times.
This change reimplements Ordinal searching in managed code to not use Substring, and it implements OrdinalIgnoreCase searching via new functions exposed in the native globalization shim, so as to use ICU without having to make managed/native transitions for each character.
With the changes, {Last}IndexOf with Ordinal/OrdinalIgnoreCase are now allocation-free (as you'd expect), and throughput when startIndex/count and/or OrdinalIgnoreCase are used is increased significantly, on my machine anywhere from 20% to 3x, depending on the inputs.
|
|
Use "= delete" syntax to make it clear the IcuHolder copy constructor
and assignment operators are removed.
Remove superfluous "public" modifier on the struct closers used by the
IcuHolders.
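A minimal sketch of the pattern, assuming a generic holder template. `Holder`, `FakeResource` and `FakeResourceCloser` are hypothetical stand-ins, not the real IcuHolder or any ICU type.

```cpp
#include <type_traits>

// Hypothetical stand-in for an ICU resource; the real code wraps ICU handles.
struct FakeResource
{
    bool closed = false;
};

// Closer functor. Members of a struct are public by default, so an explicit
// "public:" label here would be superfluous.
struct FakeResourceCloser
{
    void operator()(FakeResource* resource) const
    {
        resource->closed = true;
    }
};

// RAII holder: closes the resource when the holder goes out of scope.
template <typename T, typename Closer>
class Holder
{
public:
    explicit Holder(T* p) : m_p(p) {}

    ~Holder()
    {
        if (m_p != nullptr)
        {
            Closer()(m_p);
        }
    }

    // "= delete" states the intent directly: copying would lead to the
    // resource being closed twice, so it is not allowed at all.
    Holder(const Holder&) = delete;
    Holder& operator=(const Holder&) = delete;

    T* get() const { return m_p; }

private:
    T* m_p;
};

using FakeHolder = Holder<FakeResource, FakeResourceCloser>;

static_assert(!std::is_copy_constructible<FakeHolder>::value, "holder must not be copyable");
static_assert(!std::is_copy_assignable<FakeHolder>::value, "holder must not be assignable");
```

Compared with declaring the members private and leaving them undefined, deleted members produce a compile-time error at the call site instead of a link error.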
|
|
|
|
Remove all the uses of the icu::Locale type in favor of just using a
char* which is the raw locale id (which is exactly what all the ICU C
apis use for a locale).
The meat of this change is in locale.cpp to actually handle doing the
conversion from UChar* to char*. The rest of the places are dealing
with the fallout (GetLocale now has a different signature and the
.getName() dance is no longer needed as we have a raw locale name all
the time now).
|
|
To prepare for removing icu::Locale in favor of just using the id
directly, remove all the uses of Locale methods except for .getName().
We now use GetLocale to create a Locale but then turn it into a char*
for all the helper methods.
After this change, we can update GetLocale to do locale parsing into a
char buffer and replace all the locale.getName() calls with just `locale`.
|
|
|
|
Getting the regular eras is straightforward, we can do the thing we do
for other locale data and just ask ICU using a specific
UDateFormatSymbolType. For abbreviated eras, there's no C API, but we
can try to just read the data from ICU resources and fall back to the
standard width eras if that doesn't work.
|
|
|
|
|
|
|
|
This change removes NumberFormat in favor of UNumberFormat. There is a
bit of work that needs to happen in order to keep the normalization code
we use to convert an ICU pattern to something we can match against
working.
Instead of UnicodeStrings, the input to the normalization function is
now a UChar* and we build up a std::string during normalization. This
allows us to also skip a conversion from UChar* back to char* so we can
find the correct pattern in our collection of patterns to examine.
|
|
|