Import simdutf8 0.1.4upstream/0.1.4 upstream

author: DongHun Kwak <dh0128.kwak@samsung.com> 2023-05-17 16:16:16 +0900
committer: DongHun Kwak <dh0128.kwak@samsung.com> 2023-05-17 16:16:16 +0900
commit: a56aafc610b5a5112b48320f9532ff3da969eb2f (patch)
tree: 7e6cc4da83f6bfd1c56a6f565064a8cd6251bbaa
download: rust-simdutf8-upstream/0.1.4.tar.gz
rust-simdutf8-upstream/0.1.4.tar.bz2
rust-simdutf8-upstream/0.1.4.zip
26 files changed, 3769 insertions, 0 deletions
diff --git a/.cargo_vcs_info.json b/.cargo_vcs_info.json
new file mode 100644
index 0000000..aa3a67d
--- /dev/null
+++ b/.cargo_vcs_info.json
@@ -0,0 +1,6 @@
+{
+  "git": {
+    "sha1": "25131dd44fd98edfa85e8306d2e107617d084a17"
+  },
+  "path_in_vcs": ""
+}
+\ No newline at end of file
diff --git a/CHANGELOG.md b/CHANGELOG.md
new file mode 100644
index 0000000..aacab17
--- /dev/null
+++ b/CHANGELOG.md
@@ -0,0 +1,75 @@
+# Changelog
+## [Unreleased]
+
+## [0.1.4] - 2022-04-02
+
+### New features
+* WASM (wasm32) support
+
+### Improvements
+* Make aarch64 SIMD implementation work on Rust 1.59/1.60 with create feature `aarch64_neon`
+* For Rust Nightly the aarch64 SIMD implementation is enabled out of the box.
+* Starting with Rust 1.61 the aarch64 SIMD implementation is expected to be enabled out of the box as well.
+
+### Performance
+* Prefetch was disabled for aarch64 since the requisite intrinsics have not been stabilized.
+
+## [0.1.3] - 2021-05-14
+### New features
+* Low-level streaming validation API in `simdutf8::basic::imp`
+
+## [0.1.2] - 2021-05-09
+### New features
+* Aarch64 support (e.g. Apple Silicon, Raspberry Pi 4, ...) with nightly Rust and crate feature `aarch64_neon`
+
+### Performance
+* Another speedup on pure ASCII data
+* Aligned reads have been removed as the performance was worse overall.
+* Prefetch is used selectively on AVX 2, where it provides a slight benefit on some Intel CPUs.
+
+[Comparison vs v0.1.1 on x86-64](https://user-images.githubusercontent.com/3736990/117568946-7a2fdb00-b0c3-11eb-936e-358850f0a9ad.png)
+
+### Other
+* Refactored SIMD integration to allow easy implementation for new architectures
+* Full test coverage
+* Thoroughly fuzz-tested
+
+## [0.1.1] - 2021-04-26
+### Performance
+* Large speedup on small inputs from delegation to std lib
+* Up to 50% better peak throughput on ASCII
+* `#[inline]` main entry points for a small general speedup.
+
+[Benchmark against v0.1.0](https://user-images.githubusercontent.com/3736990/116128298-12dc5900-a6c9-11eb-8c23-a117b3e57edb.png)
+
+### Other
+* Make both Utf8Error variants implement `std::error::Error`
+* Make `basic::Utf8Error` implement `core::fmt::Display`
+* Document Minimum Supported Rust Version (1.38.0).
+* Reduce package size.
+* Documentation updates
+
+## [0.1.0] - 2021-04-21
+- Documentation updates only.
+
+0.1.x releases will have API compatibility.
+
+## [0.0.3] - 2021-04-21
+- Documentation update only.
+
+## [0.0.2] - 2021-04-20
+- Documentation update only.
+
+## [0.0.1] - 2021-04-20
+- Initial release.
+
+[Unreleased]: https://github.com/rusticstuff/simdutf8/compare/v0.1.4...HEAD
+[0.1.4]: https://github.com/rusticstuff/simdutf8/compare/v0.1.3...v0.1.4
+[0.1.3]: https://github.com/rusticstuff/simdutf8/compare/v0.1.2...v0.1.3
+[0.1.2]: https://github.com/rusticstuff/simdutf8/compare/v0.1.1...v0.1.2
+[0.1.1]: https://github.com/rusticstuff/simdutf8/compare/v0.1.0...v0.1.1
+[0.1.0]: https://github.com/rusticstuff/simdutf8/compare/v0.0.3...v0.1.0
+[0.0.3]: https://github.com/rusticstuff/simdutf8/compare/v0.0.2...v0.0.3
+[0.0.2]: https://github.com/rusticstuff/simdutf8/compare/v0.0.1...v0.0.2
+[0.0.1]: https://github.com/rusticstuff/simdutf8/releases/tag/v0.0.1
+
diff --git a/Cargo.lock b/Cargo.lock
new file mode 100644
index 0000000..8ca68bf
--- /dev/null
+++ b/Cargo.lock
@@ -0,0 +1,7 @@
+# This file is automatically @generated by Cargo.
+# It is not intended for manual editing.
+version = 3
+
+[[package]]
+name = "simdutf8"
+version = "0.1.4"
diff --git a/Cargo.toml b/Cargo.toml
new file mode 100644
index 0000000..a605041
--- /dev/null
+++ b/Cargo.toml
@@ -0,0 +1,65 @@
+# THIS FILE IS AUTOMATICALLY GENERATED BY CARGO
+#
+# When uploading crates to the registry Cargo will automatically
+# "normalize" Cargo.toml files for maximal compatibility
+# with all versions of Cargo and also rewrite `path` dependencies
+# to registry (e.g., crates.io) dependencies.
+#
+# If you are reading this file be aware that the original Cargo.toml
+# will likely look very different (and much more reasonable).
+# See Cargo.toml.orig for the original contents.
+
+[package]
+edition = "2018"
+name = "simdutf8"
+version = "0.1.4"
+authors = ["Hans Kratz <hans@appfour.com>"]
+exclude = [
+    "/.gitignore",
+    "/.github",
+    "/.vscode",
+    "/bench",
+    "/fuzzing",
+    "/img",
+    "/inlining",
+    "TODO.md",
+]
+description = "SIMD-accelerated UTF-8 validation."
+homepage = "https://github.com/rusticstuff/simdutf8"
+documentation = "https://docs.rs/simdutf8/"
+readme = "README.md"
+keywords = [
+    "utf-8",
+    "unicode",
+    "string",
+    "validation",
+    "simd",
+]
+categories = [
+    "encoding",
+    "algorithms",
+    "no-std",
+]
+license = "MIT OR Apache-2.0"
+repository = "https://github.com/rusticstuff/simdutf8"
+
+[package.metadata.docs.rs]
+all-features = true
+rustdoc-args = [
+    "--cfg",
+    "docsrs",
+]
+default-target = "x86_64-unknown-linux-gnu"
+targets = [
+    "aarch64-unknown-linux-gnu",
+    "wasm32-unknown-unknown",
+    "wasm32-wasi",
+]
+
+[features]
+aarch64_neon = []
+aarch64_neon_prefetch = []
+default = ["std"]
+hints = []
+public_imp = []
+std = []
diff --git a/Cargo.toml.orig b/Cargo.toml.orig
new file mode 100644
index 0000000..dca2f8d
--- /dev/null
+++ b/Cargo.toml.orig
@@ -0,0 +1,47 @@
+[package]
+name = "simdutf8"
+version = "0.1.4"
+authors = ["Hans Kratz <hans@appfour.com>"]
+edition = "2018"
+description = "SIMD-accelerated UTF-8 validation."
+documentation = "https://docs.rs/simdutf8/"
+homepage = "https://github.com/rusticstuff/simdutf8"
+repository = "https://github.com/rusticstuff/simdutf8"
+readme = "README.md"
+keywords = ["utf-8", "unicode", "string", "validation", "simd"]
+categories = ["encoding", "algorithms", "no-std"]
+license = "MIT OR Apache-2.0"
+exclude = [
+    "/.gitignore",
+    "/.github",
+    "/.vscode",
+    "/bench",
+    "/fuzzing",
+    "/img",
+    "/inlining",
+    "TODO.md",
+]
+
+[features]
+default = ["std"]
+
+# enable CPU feature detection, on by default, turn off for no-std support
+std = []
+
+# expose SIMD implementations in basic::imp::* and compat::imp::*
+public_imp = []
+
+# aarch64 NEON SIMD implementation - requires Rust 1.59.0 or later
+aarch64_neon = []
+
+# enable aarch64 prefetching for minor speedup - requires nightly
+aarch64_neon_prefetch = []
+
+# deprecated - does not do anything
+hints = []
+
+[package.metadata.docs.rs]
+all-features = true
+rustdoc-args = ["--cfg", "docsrs"]
+default-target = "x86_64-unknown-linux-gnu"
+targets = ["aarch64-unknown-linux-gnu", "wasm32-unknown-unknown", "wasm32-wasi"]
diff --git a/LICENSE-Apache b/LICENSE-Apache
new file mode 100644
index 0000000..4947287
--- /dev/null
+++ b/LICENSE-Apache
@@ -0,0 +1,177 @@
+
+                                 Apache License
+                           Version 2.0, January 2004
+                        http://www.apache.org/licenses/
+
+   TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
+
+   1. Definitions.
+
+      "License" shall mean the terms and conditions for use, reproduction,
+      and distribution as defined by Sections 1 through 9 of this document.
+
+      "Licensor" shall mean the copyright owner or entity authorized by
+      the copyright owner that is granting the License.
+
+      "Legal Entity" shall mean the union of the acting entity and all
+      other entities that control, are controlled by, or are under common
+      control with that entity. For the purposes of this definition,
+      "control" means (i) the power, direct or indirect, to cause the
+      direction or management of such entity, whether by contract or
+      otherwise, or (ii) ownership of fifty percent (50%) or more of the
+      outstanding shares, or (iii) beneficial ownership of such entity.
+
+      "You" (or "Your") shall mean an individual or Legal Entity
+      exercising permissions granted by this License.
+
+      "Source" form shall mean the preferred form for making modifications,
+      including but not limited to software source code, documentation
+      source, and configuration files.
+
+      "Object" form shall mean any form resulting from mechanical
+      transformation or translation of a Source form, including but
+      not limited to compiled object code, generated documentation,
+      and conversions to other media types.
+
+      "Work" shall mean the work of authorship, whether in Source or
+      Object form, made available under the License, as indicated by a
+      copyright notice that is included in or attached to the work
+      (an example is provided in the Appendix below).
+
+      "Derivative Works" shall mean any work, whether in Source or Object
+      form, that is based on (or derived from) the Work and for which the
+      editorial revisions, annotations, elaborations, or other modifications
+      represent, as a whole, an original work of authorship. For the purposes
+      of this License, Derivative Works shall not include works that remain
+      separable from, or merely link (or bind by name) to the interfaces of,
+      the Work and Derivative Works thereof.
+
+      "Contribution" shall mean any work of authorship, including
+      the original version of the Work and any modifications or additions
+      to that Work or Derivative Works thereof, that is intentionally
+      submitted to Licensor for inclusion in the Work by the copyright owner
+      or by an individual or Legal Entity authorized to submit on behalf of
+      the copyright owner. For the purposes of this definition, "submitted"
+      means any form of electronic, verbal, or written communication sent
+      to the Licensor or its representatives, including but not limited to
+      communication on electronic mailing lists, source code control systems,
+      and issue tracking systems that are managed by, or on behalf of, the
+      Licensor for the purpose of discussing and improving the Work, but
+      excluding communication that is conspicuously marked or otherwise
+      designated in writing by the copyright owner as "Not a Contribution."
+
+      "Contributor" shall mean Licensor and any individual or Legal Entity
+      on behalf of whom a Contribution has been received by Licensor and
+      subsequently incorporated within the Work.
+
+   2. Grant of Copyright License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      copyright license to reproduce, prepare Derivative Works of,
+      publicly display, publicly perform, sublicense, and distribute the
+      Work and such Derivative Works in Source or Object form.
+
+   3. Grant of Patent License. Subject to the terms and conditions of
+      this License, each Contributor hereby grants to You a perpetual,
+      worldwide, non-exclusive, no-charge, royalty-free, irrevocable
+      (except as stated in this section) patent license to make, have made,
+      use, offer to sell, sell, import, and otherwise transfer the Work,
+      where such license applies only to those patent claims licensable
+      by such Contributor that are necessarily infringed by their
+      Contribution(s) alone or by combination of their Contribution(s)
+      with the Work to which such Contribution(s) was submitted. If You
+      institute patent litigation against any entity (including a
+      cross-claim or counterclaim in a lawsuit) alleging that the Work
+      or a Contribution incorporated within the Work constitutes direct
+      or contributory patent infringement, then any patent licenses
+      granted to You under this License for that Work shall terminate
+      as of the date such litigation is filed.
+
+   4. Redistribution. You may reproduce and distribute copies of the
+      Work or Derivative Works thereof in any medium, with or without
+      modifications, and in Source or Object form, provided that You
+      meet the following conditions:
+
+      (a) You must give any other recipients of the Work or
+          Derivative Works a copy of this License; and
+
+      (b) You must cause any modified files to carry prominent notices
+          stating that You changed the files; and
+
+      (c) You must retain, in the Source form of any Derivative Works
+          that You distribute, all copyright, patent, trademark, and
+          attribution notices from the Source form of the Work,
+          excluding those notices that do not pertain to any part of
+          the Derivative Works; and
+
+      (d) If the Work includes a "NOTICE" text file as part of its
+          distribution, then any Derivative Works that You distribute must
+          include a readable copy of the attribution notices contained
+          within such NOTICE file, excluding those notices that do not
+          pertain to any part of the Derivative Works, in at least one
+          of the following places: within a NOTICE text file distributed
+          as part of the Derivative Works; within the Source form or
+          documentation, if provided along with the Derivative Works; or,
+          within a display generated by the Derivative Works, if and
+          wherever such third-party notices normally appear. The contents
+          of the NOTICE file are for informational purposes only and
+          do not modify the License. You may add Your own attribution
+          notices within Derivative Works that You distribute, alongside
+          or as an addendum to the NOTICE text from the Work, provided
+          that such additional attribution notices cannot be construed
+          as modifying the License.
+
+      You may add Your own copyright statement to Your modifications and
+      may provide additional or different license terms and conditions
+      for use, reproduction, or distribution of Your modifications, or
+      for any such Derivative Works as a whole, provided Your use,
+      reproduction, and distribution of the Work otherwise complies with
+      the conditions stated in this License.
+
+   5. Submission of Contributions. Unless You explicitly state otherwise,
+      any Contribution intentionally submitted for inclusion in the Work
+      by You to the Licensor shall be under the terms and conditions of
+      this License, without any additional terms or conditions.
+      Notwithstanding the above, nothing herein shall supersede or modify
+      the terms of any separate license agreement you may have executed
+      with Licensor regarding such Contributions.
+
+   6. Trademarks. This License does not grant permission to use the trade
+      names, trademarks, service marks, or product names of the Licensor,
+      except as required for reasonable and customary use in describing the
+      origin of the Work and reproducing the content of the NOTICE file.
+
+   7. Disclaimer of Warranty. Unless required by applicable law or
+      agreed to in writing, Licensor provides the Work (and each
+      Contributor provides its Contributions) on an "AS IS" BASIS,
+      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
+      implied, including, without limitation, any warranties or conditions
+      of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
+      PARTICULAR PURPOSE. You are solely responsible for determining the
+      appropriateness of using or redistributing the Work and assume any
+      risks associated with Your exercise of permissions under this License.
+
+   8. Limitation of Liability. In no event and under no legal theory,
+      whether in tort (including negligence), contract, or otherwise,
+      unless required by applicable law (such as deliberate and grossly
+      negligent acts) or agreed to in writing, shall any Contributor be
+      liable to You for damages, including any direct, indirect, special,
+      incidental, or consequential damages of any character arising as a
+      result of this License or out of the use or inability to use the
+      Work (including but not limited to damages for loss of goodwill,
+      work stoppage, computer failure or malfunction, or any and all
+      other commercial damages or losses), even if such Contributor
+      has been advised of the possibility of such damages.
+
+   9. Accepting Warranty or Additional Liability. While redistributing
+      the Work or Derivative Works thereof, You may choose to offer,
+      and charge a fee for, acceptance of support, warranty, indemnity,
+      or other liability obligations and/or rights consistent with this
+      License. However, in accepting such obligations, You may act only
+      on Your own behalf and on Your sole responsibility, not on behalf
+      of any other Contributor, and only if You agree to indemnify,
+      defend, and hold each Contributor harmless for any liability
+      incurred by, or claims asserted against, such Contributor by reason
+      of your accepting any such warranty or additional liability.
+
+   END OF TERMS AND CONDITIONS
+\ No newline at end of file
diff --git a/LICENSE-MIT b/LICENSE-MIT
new file mode 100644
index 0000000..6802bc4
--- /dev/null
+++ b/LICENSE-MIT
@@ -0,0 +1,19 @@
+MIT License
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.
+\ No newline at end of file
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..8b9a97e
--- /dev/null
+++ b/README.md
@@ -0,0 +1,179 @@
+[![CI](https://github.com/rusticstuff/simdutf8/actions/workflows/ci.yml/badge.svg)](https://github.com/rusticstuff/simdutf8/actions/workflows/ci.yml)
+[![crates.io](https://img.shields.io/crates/v/simdutf8.svg)](https://crates.io/crates/simdutf8)
+[![docs.rs](https://docs.rs/simdutf8/badge.svg)](https://docs.rs/simdutf8)
+
+# simdutf8 – High-speed UTF-8 validation
+
+Blazingly fast API-compatible UTF-8 validation for Rust using SIMD extensions, based on the implementation from
+[simdjson](https://github.com/simdjson/simdjson). Originally ported to Rust by the developers of [simd-json.rs](https://simd-json.rs), but now heavily improved.
+
+## Status
+This library has been thoroughly tested with sample data as well as fuzzing and there are no known bugs.
+
+## Features
+* `basic` API for the fastest validation, optimized for valid UTF-8
+* `compat` API as a fully compatible replacement for `std::str::from_utf8()`
+* Supports AVX 2 and SSE 4.2 implementations on x86 and x86-64
+* 🆕 ARM64 (aarch64) SIMD is supported with Rust 1.59 (use feature `aarch64_neon`) and Nightly (no extra feature needed)
+* 🆕 WASM (wasm32) SIMD is supported
+* x86-64: Up to 23 times faster than the std library on valid non-ASCII, up to four times faster on ASCII
+* aarch64: Up to eleven times faster than the std library on valid non-ASCII, up to four times faster on ASCII (Apple Silicon)
+* Faster than the original simdjson implementation
+* Selects the fastest implementation at runtime based on CPU support (on x86)
+* Falls back to the excellent std implementation if SIMD extensions are not supported
+* Written in pure Rust
+* No dependencies
+* No-std support
+
+## Quick start
+Add the dependency to your Cargo.toml file:
+```toml
+[dependencies]
+simdutf8 = "0.1.4"
+```
+For ARM64 SIMD support on Rust 1.59:
+```toml
+[dependencies]
+simdutf8 = { version = "0.1.4", features = ["aarch64_neon"] }
+```
+
+Use `simdutf8::basic::from_utf8()` as a drop-in replacement for `std::str::from_utf8()`.
+
+```rust
+use simdutf8::basic::from_utf8;
+
+println!("{}", from_utf8(b"I \xE2\x9D\xA4\xEF\xB8\x8F UTF-8!").unwrap());
+```
+
+If you need detailed information on validation failures, use `simdutf8::compat::from_utf8()`
+instead.
+
+```rust
+use simdutf8::compat::from_utf8;
+
+let err = from_utf8(b"I \xE2\x9D\xA4\xEF\xB8 UTF-8!").unwrap_err();
+assert_eq!(err.valid_up_to(), 5);
+assert_eq!(err.error_len(), Some(2));
+```
+
+## APIs
+
+### Basic flavor
+Use the `basic` API flavor for maximum speed. It is fastest on valid UTF-8, but only checks
+for errors after processing the whole byte sequence and does not provide detailed information if the data
+is not valid UTF-8. `simdutf8::basic::Utf8Error` is a zero-sized error struct.
+
+### Compat flavor
+The `compat` flavor is fully API-compatible with `std::str::from_utf8()`. In particular, `simdutf8::compat::from_utf8()`
+returns a `simdutf8::compat::Utf8Error`, which has `valid_up_to()` and `error_len()` methods. The first is useful for
+verification of streamed data. The second is useful e.g. for replacing invalid byte sequences with a replacement character.
+
+It also fails early: errors are checked on the fly as the string is processed and once
+an invalid UTF-8 sequence is encountered, it returns without processing the rest of the data.
+This comes at a slight performance penalty compared to the `basic` API even if the input is valid UTF-8.
+
+## Implementation selection
+
+### X86
+The fastest implementation is selected at runtime using the `std::is_x86_feature_detected!` macro, unless the CPU
+targeted by the compiler supports the fastest available implementation.
+So if you compile with `RUSTFLAGS="-C target-cpu=native"` on a recent x86-64 machine, the AVX 2 implementation is selected at
+compile-time and runtime selection is disabled.
+
+For no-std support (compiled with `--no-default-features`) the implementation is always selected at compile time based on
+the targeted CPU. Use `RUSTFLAGS="-C target-feature=+avx2"` for the AVX 2 implementation or `RUSTFLAGS="-C target-feature=+sse4.2"`
+for the SSE 4.2 implementation.
+
+### ARM64
+The SIMD implementation is only available on Rust Nightly and Rust 1.59 or later. On Rust Nightly it is now turned on
+automatically.
+To get the SIMD implementation with Rust 1.59 (and likely 1.60) the crate feature `aarch64_neon` needs to be enabled.
+For Rust Nightly this will no longer be required (but does not hurt either). It is expected that the SIMD implementation
+will be enabled automatically starting with Rust 1.61.
+
+### WASM32
+For wasm32 support, the implementation is selected at compile time based on the presence of the `simd128` target feature.
+Use `RUSTFLAGS="-C target-feature=+simd128"` to enable the WASM SIMD implementation.  WASM, at
+the time of this writing, doesn't have a way to detect SIMD through WASM itself.  Although this capability
+is available in various WASM host environments (e.g., [wasm-feature-detect] in the web browser), there is no portable
+way from within the library to detect this.
+
+[wasm-feature-detect]: https://github.com/GoogleChromeLabs/wasm-feature-detect
+
+#### Building/Targeting WASM
+See [this document](./wasm32-development.md) for more details.
+
+### Access to low-level functionality
+
+If you want to be able to call a SIMD implementation directly, use the `public_imp` feature flag. The validation implementations are then accessible in the `simdutf8::{basic, compat}::imp` hierarchy. Traits
+facilitating streaming validation are available there as well.
+
+## Optimisation flags
+Do not use [`opt-level = "z"`](https://doc.rust-lang.org/cargo/reference/profiles.html), which prevents inlining and makes
+the code quite slow.
+
+## Minimum Supported Rust Version (MSRV)
+This crate's minimum supported Rust version is 1.38.0.
+
+## Benchmarks
+The benchmarks have been done with [criterion](https://bheisler.github.io/criterion.rs/book/index.html), the tables
+are created with [critcmp](https://github.com/BurntSushi/critcmp). Source code and data are in the
+[bench directory](https://github.com/rusticstuff/simdutf8/tree/main/bench).
+
+The naming schema is id-charset/size. _0-empty_ is the empty byte slice, _x-error/66536_ is a 64KiB slice where the very
+first character is invalid UTF-8. Library versions are simdutf8 v0.1.2 and simdjson v0.9.2. When comparing
+with simdjson simdutf8 is compiled with `#inline(never)`.
+
+Configurations:
+* X86-64: PC with an AMD Ryzen 7 PRO 3700 CPU (Zen2) on Linux with Rust 1.52.0
+* Aarch64: Macbook Air with an Apple M1 CPU (Apple Silicon) on macOS with Rust rustc 1.54.0-nightly (881c1ac40 2021-05-08).
+
+### simdutf8 basic vs std library on x86-64 (AMD Zen2)
+![image](https://user-images.githubusercontent.com/3736990/117568104-1c00f900-b0bf-11eb-938f-4c253d192480.png)
+Simdutf8 is up to 23 times faster than the std library on valid non-ASCII, up to four times on pure ASCII.
+
+### simdutf8 basic vs std library on aarch64 (Apple Silicon)
+![image](https://user-images.githubusercontent.com/3736990/117568160-42bf2f80-b0bf-11eb-86a4-9aeee4cee87d.png)
+Simdutf8 is up to to eleven times faster than the std library on valid non-ASCII, up to four times faster on
+pure ASCII.
+
+### simdutf8 basic vs simdjson on x86-64
+![image](https://user-images.githubusercontent.com/3736990/117568231-80bc5380-b0bf-11eb-8e90-1dcc6d966ebd.png)
+Simdutf8 is faster than simdjson on almost all inputs.
+
+### simdutf8 basic vs simdutf8 compat UTF-8 on x86-64
+![image](https://user-images.githubusercontent.com/3736990/117568270-af3a2e80-b0bf-11eb-8ec4-e5a0a4ad7210.png)
+There is a small performance penalty to continuously checking the error status while processing data, but detecting
+errors early provides a huge benefit for the _x-error/66536_ benchmark.
+
+## Technical details
+For inputs shorter than 64 bytes validation is delegated to `core::str::from_utf8()` except for the direct-access
+functions in `simdutf8::{basic, compat}::imp`.
+
+The SIMD implementation is mostly similar to the one in simdjson except that it is has additional optimizations
+for the pure ASCII case. Also it uses prefetch with AVX 2 on x86 which leads to slightly better performance with
+some Intel CPUs on synthetic benchmarks.
+
+For the compat API, we need to check the error status vector on each 64-byte block instead of just aggregating it. If an
+error is found, the last bytes of the previous block are checked for a cross-block continuation and then
+`std::str::from_utf8()` is run to find the exact location of the error.
+
+Care is taken that all functions are properly inlined up to the public interface.
+
+## Thanks
+* to the authors of simdjson for coming up with the high-performance SIMD implementation and in particular to Daniel Lemire
+  for his feedback. It was very helpful.
+* to the authors of the simdjson Rust port who did most of the heavy lifting of porting the C++ code to Rust.
+
+
+## License
+This code is dual-licensed under the [Apache License 2.0](https://www.apache.org/licenses/LICENSE-2.0.html) and the [MIT License](https://opensource.org/licenses/MIT).
+
+It is based on code distributed with simd-json.rs, the Rust port of simdjson, which is dual-licensed under
+the MIT license and Apache 2.0 license as well.
+
+simdjson itself is distributed under the Apache License 2.0.
+
+## References
+John Keiser, Daniel Lemire, [Validating UTF-8 In Less Than One Instruction Per Byte](https://arxiv.org/abs/2010.03090), Software: Practice and Experience 51 (5), 2021
+
diff --git a/examples/streaming.rs b/examples/streaming.rs
new file mode 100644
index 0000000..0f53a5f
--- /dev/null
+++ b/examples/streaming.rs
@@ -0,0 +1,41 @@
+#[allow(unused_imports)]
+use std::io::{stdin, Read, Result};
+
+#[cfg(all(
+    feature = "public_imp",
+    any(target_arch = "x86", target_arch = "x86_64")
+))]
+fn main() -> Result<()> {
+    use simdutf8::basic::imp::Utf8Validator;
+
+    unsafe {
+        if !std::is_x86_feature_detected!("avx2") {
+            panic!("This example only works with CPUs supporting AVX 2");
+        }
+
+        let mut validator = simdutf8::basic::imp::x86::avx2::Utf8ValidatorImp::new();
+        let mut buf = vec![0; 8192];
+        loop {
+            let bytes_read = stdin().read(buf.as_mut())?;
+            if bytes_read == 0 {
+                break;
+            }
+            validator.update(&buf);
+        }
+
+        if validator.finalize().is_ok() {
+            println!("Input is valid UTF-8");
+        } else {
+            println!("Input is not valid UTF-8");
+        }
+    }
+
+    Ok(())
+}
+
+/// Dummy main. This example requires the crate feature `public_imp`.
+#[cfg(not(all(
+    feature = "public_imp",
+    any(target_arch = "x86", target_arch = "x86_64")
+)))]
+fn main() {}
diff --git a/release.toml b/release.toml
new file mode 100644
index 0000000..09a1349
--- /dev/null
+++ b/release.toml
@@ -0,0 +1,9 @@
+pre-release-replacements = [
+  {file="CHANGELOG.md", search="## \\[Unreleased\\]", replace="## [Unreleased]\n\n## [{{version}}] - {{date}}", exactly=1},
+  {file="CHANGELOG.md", search="\\[Unreleased\\]: https://github\\.com/rusticstuff/simdutf8/compare/v[0-9.]+\\.\\.\\.HEAD", replace="[Unreleased]: https://github.com/rusticstuff/simdutf8/compare/v{{version}}...HEAD\n[{{version}}]: https://github.com/rusticstuff/simdutf8/compare/v{{prev_version}}...v{{version}}", exactly=1},
+  {file="README.md", search="simdutf8 = \"[0-9.]+\"", replace="simdutf8 = \"{{version}}\"", exactly=1},
+  {file="README.md", search="simdutf8 = \\{ version = \"[0-9.]+\"", replace="simdutf8 = { version = \"{{version}}\"", exactly=1},
+  {file="src/lib.rs", search="simdutf8 = \"[0-9.]+\"", replace="simdutf8 = \"{{version}}\"", exactly=1},
+  {file="src/lib.rs", search="simdutf8 = \\{ version = \"[0-9.]+\"", replace="simdutf8 = { version = \"{{version}}\"", exactly=1},
+]
+
diff --git a/rustfmt.toml b/rustfmt.toml
new file mode 100644
index 0000000..e7df24e
--- /dev/null
+++ b/rustfmt.toml
@@ -0,0 +1 @@
+# use defaults
+\ No newline at end of file
diff --git a/src/basic.rs b/src/basic.rs
new file mode 100644
index 0000000..2c1d042
--- /dev/null
+++ b/src/basic.rs
@@ -0,0 +1,246 @@
+//! The `basic` API flavor provides barebones UTF-8 checking at the highest speed.
+//!
+//! It is fastest on valid UTF-8, but only checks for errors after processing the whole byte sequence
+//! and does not provide detailed information if the data is not valid UTF-8. [`Utf8Error`] is a zero-sized error struct.
+//!
+//! If you need detailed error information use the functions from the [`crate::compat`] module instead.
+
+use core::str::{from_utf8_unchecked, from_utf8_unchecked_mut};
+
+use crate::implementation::validate_utf8_basic;
+
+/// Simple zero-sized UTF-8 error.
+///
+/// No information is provided where the error occured or how long the invalid byte
+/// byte sequence is.
+#[derive(Copy, Eq, PartialEq, Clone, Debug)]
+pub struct Utf8Error;
+
+impl core::fmt::Display for Utf8Error {
+    fn fmt(&self, f: &mut core::fmt::Formatter<'_>) -> core::fmt::Result {
+        f.write_str("invalid utf-8 sequence")
+    }
+}
+
+#[cfg(feature = "std")]
+impl std::error::Error for Utf8Error {}
+
+/// Analogue to [`std::str::from_utf8()`].
+///
+/// Checks if the passed byte sequence is valid UTF-8 and returns an
+/// [`std::str`] reference to the passed byte slice wrapped in `Ok()` if it is.
+///
+/// # Errors
+/// Will return the zero-sized Err([`Utf8Error`]) on if the input contains invalid UTF-8.
+#[inline]
+pub fn from_utf8(input: &[u8]) -> Result<&str, Utf8Error> {
+    unsafe {
+        validate_utf8_basic(input)?;
+        Ok(from_utf8_unchecked(input))
+    }
+}
+
+/// Analogue to [`std::str::from_utf8_mut()`].
+///
+/// Checks if the passed mutable byte sequence is valid UTF-8 and returns a mutable
+/// [`std::str`] reference to the passed byte slice wrapped in `Ok()` if it is.
+///
+/// # Errors
+/// Will return the zero-sized Err([`Utf8Error`]) on if the input contains invalid UTF-8.
+#[inline]
+pub fn from_utf8_mut(input: &mut [u8]) -> Result<&mut str, Utf8Error> {
+    unsafe {
+        validate_utf8_basic(input)?;
+        Ok(from_utf8_unchecked_mut(input))
+    }
+}
+
+/// Allows direct access to the platform-specific unsafe validation implementations.
+#[cfg(feature = "public_imp")]
+pub mod imp {
+    use crate::basic;
+
+    /// A low-level interfacne for streaming validation of UTF-8 data. It is meant to be integrated
+    /// in high-performance data processing pipelines.
+    ///
+    /// Data can be streamed in arbitrarily-sized chunks using the [`Self::update()`] method. There is
+    /// no way to find out if the input so far was valid UTF-8 during the validation. Only when
+    /// the validation is completed with the [`Self::finalize()`] method the result of the validation is
+    /// returned. Use [`ChunkedUtf8Validator`] if possible for highest performance.
+    ///
+    /// This implementation requires CPU SIMD features specified by the module it resides in.
+    /// It is undefined behavior to use it if the required CPU features are not available which
+    /// is why all trait methods are `unsafe`.
+    ///
+    /// General usage:
+    /// ```rust
+    /// use simdutf8::basic::imp::Utf8Validator;
+    /// use std::io::{stdin, Read, Result};
+    ///
+    /// # #[cfg(target_arch = "x86_64")]
+    /// fn main() -> Result<()> {
+    ///     unsafe {
+    ///         if !std::is_x86_feature_detected!("avx2") {
+    ///             panic!("This example only works with CPUs supporting AVX 2");
+    ///         }
+    ///
+    ///         let mut validator = simdutf8::basic::imp::x86::avx2::Utf8ValidatorImp::new();
+    ///         let mut buf = vec![0; 8192];
+    ///         loop {
+    ///             let bytes_read = stdin().read(buf.as_mut())?;
+    ///             if bytes_read == 0 {
+    ///                 break;
+    ///             }
+    ///             validator.update(&buf);
+    ///         }
+    ///
+    ///         if validator.finalize().is_ok() {
+    ///             println!("Input is valid UTF-8");
+    ///         } else {
+    ///             println!("Input is not valid UTF-8");
+    ///         }
+    ///     }
+    ///
+    ///     Ok(())
+    /// }
+    ///
+    /// # #[cfg(not(target_arch = "x86_64"))]
+    /// # fn main() { }
+    /// ```
+    ///
+    pub trait Utf8Validator {
+        /// Creates a new validator.
+        ///
+        /// # Safety
+        /// This implementation requires CPU SIMD features specified by the module it resides in.
+        /// It is undefined behavior to call it if the required CPU features are not available.
+        #[must_use]
+        unsafe fn new() -> Self
+        where
+            Self: Sized;
+
+        /// Updates the validator with `input`.
+        ///
+        /// # Safety
+        /// This implementation requires CPU SIMD features specified by the module it resides in.
+        /// It is undefined behavior to call it if the required CPU features are not available.
+        unsafe fn update(&mut self, input: &[u8]);
+
+        /// Finishes the validation and returns `Ok(())` if the input was valid UTF-8.
+        ///
+        /// # Errors
+        /// A [`basic::Utf8Error`] is returned if the input was not valid UTF-8. No
+        /// further information about the location of the error is provided.
+        ///
+        /// # Safety
+        /// This implementation requires CPU SIMD features specified by the module it resides in.
+        /// It is undefined behavior to call it if the required CPU features are not available.
+        unsafe fn finalize(self) -> core::result::Result<(), basic::Utf8Error>;
+    }
+
+    /// Like [`Utf8Validator`] this low-level API is for streaming validation of UTF-8 data.
+    /// It has additional restrictions imposed on how the input is passed in to allow
+    /// validation with as little overhead as possible.
+    ///
+    /// To feed it data you need to call the [`Self::update_from_chunks()`] method which takes slices which
+    /// have to be a multiple of 64 bytes long. The method will panic otherwise.  There is
+    /// no way to find out if the input so far was valid UTF-8 during the validation. Only when
+    /// the validation is completed with the [`Self::finalize()`] method the result of the validation is
+    /// returned.
+    ///
+    /// The `Self::finalize()` method can be fed the rest of the data. There is no restriction on the
+    /// data passed to it.
+    ///
+    /// This implementation requires CPU SIMD features specified by the module it resides in.
+    /// It is undefined behavior to use it if the required CPU features are not available which
+    /// is why all trait methods are `unsafe`.
+    pub trait ChunkedUtf8Validator {
+        /// Creates a new validator.
+        ///
+        /// # Safety
+        /// This implementation requires CPU SIMD features specified by the module it resides in.
+        /// It is undefined behavior to call it if the required CPU features are not available.
+        #[must_use]
+        unsafe fn new() -> Self
+        where
+            Self: Sized;
+
+        /// Updates the validator with `input`.
+        ///
+        /// # Panics
+        /// If `input.len()` is not a multiple of 64.
+        ///
+        /// # Safety
+        /// This implementation requires CPU SIMD features specified by the module it resides in.
+        /// It is undefined behavior to call it if the required CPU features are not available.
+        unsafe fn update_from_chunks(&mut self, input: &[u8]);
+
+        /// Updates the validator with remaining input if any. There is no restriction on the
+        /// data provided.
+        ///
+        /// Finishes the validation and returns `Ok(())` if the input was valid UTF-8.
+        ///
+        /// # Errors
+        /// A [`basic::Utf8Error`] is returned if the input was not valid UTF-8. No
+        /// further information about the location of the error is provided.
+        ///
+        /// # Safety
+        /// This implementation requires CPU SIMD features specified by the module it resides in.
+        /// It is undefined behavior to call it if the required CPU features are not available.
+        unsafe fn finalize(
+            self,
+            remaining_input: core::option::Option<&[u8]>,
+        ) -> core::result::Result<(), basic::Utf8Error>;
+    }
+
+    /// Includes the x86/x86-64 SIMD implementations.
+    #[cfg(all(any(target_arch = "x86", target_arch = "x86_64")))]
+    pub mod x86 {
+        /// Includes the validation implementation for AVX 2-compatible CPUs.
+        ///
+        /// Using the provided functionality on CPUs which do not support AVX 2 is undefined
+        /// behavior and will very likely cause a crash.
+        pub mod avx2 {
+            pub use crate::implementation::x86::avx2::validate_utf8_basic as validate_utf8;
+            pub use crate::implementation::x86::avx2::ChunkedUtf8ValidatorImp;
+            pub use crate::implementation::x86::avx2::Utf8ValidatorImp;
+        }
+        /// Includes the validation implementation for SSE 4.2-compatible CPUs.
+        ///
+        /// Using the provided functionality on CPUs which do not support SSE 4.2 is undefined
+        /// behavior and will very likely cause a crash.
+        pub mod sse42 {
+            pub use crate::implementation::x86::sse42::validate_utf8_basic as validate_utf8;
+            pub use crate::implementation::x86::sse42::ChunkedUtf8ValidatorImp;
+            pub use crate::implementation::x86::sse42::Utf8ValidatorImp;
+        }
+    }
+
+    /// Includes the aarch64 SIMD implementations.
+    #[cfg(all(feature = "aarch64_neon", target_arch = "aarch64"))]
+    pub mod aarch64 {
+        /// Includes the Neon-based validation implementation for aarch64 CPUs.
+        ///
+        /// Should be supported on all ARM64 CPUSs. If it is not supported by the operating
+        /// system using it is undefined behavior and will likely cause a crash.
+        pub mod neon {
+            pub use crate::implementation::aarch64::neon::validate_utf8_basic as validate_utf8;
+            pub use crate::implementation::aarch64::neon::ChunkedUtf8ValidatorImp;
+            pub use crate::implementation::aarch64::neon::Utf8ValidatorImp;
+        }
+    }
+
+    /// Includes the wasm32 SIMD implementations.
+    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
+    pub mod wasm32 {
+        /// Includes the simd128-based validation implementation for WASM runtimes.
+        ///
+        /// Using the provided functionality on WASM runtimes that do not support SIMD
+        /// instructions will likely cause a crash.
+        pub mod simd128 {
+            pub use crate::implementation::wasm32::simd128::validate_utf8_basic as validate_utf8;
+            pub use crate::implementation::wasm32::simd128::ChunkedUtf8ValidatorImp;
+            pub use crate::implementation::wasm32::simd128::Utf8ValidatorImp;
+        }
+    }
+}
diff --git a/src/compat.rs b/src/compat.rs
new file mode 100644
index 0000000..3f112c4
--- /dev/null
+++ b/src/compat.rs
@@ -0,0 +1,136 @@
+//! The `compat` API flavor provides full compatibility with [`std::str::from_utf8()`] and detailed validation errors.
+//!
+//! In particular, [`from_utf8()`]
+//! returns an [`Utf8Error`], which has the [`valid_up_to()`](Utf8Error#method.valid_up_to) and
+//! [`error_len()`](Utf8Error#method.error_len) methods. The first is useful for verification of streamed data. The
+//! second is useful e.g. for replacing invalid byte sequences with a replacement character.
+//!
+//! The functions in this module also fail early: errors are checked on-the-fly as the string is processed and once
+//! an invalid UTF-8 sequence is encountered, it returns without processing the rest of the data.
+//! This comes at a slight performance penality compared to the [`crate::basic`] module if the input is valid UTF-8.
+
+use core::fmt::Display;
+use core::fmt::Formatter;
+
+use core::str::{from_utf8_unchecked, from_utf8_unchecked_mut};
+
+use crate::implementation::validate_utf8_compat;
+
+/// UTF-8 error information compatible with [`std::str::Utf8Error`].
+///
+/// Contains information on the location of the encountered validation error and the length of the
+/// invalid UTF-8 sequence.
+#[derive(Copy, Eq, PartialEq, Clone, Debug)]
+pub struct Utf8Error {
+    pub(crate) valid_up_to: usize,
+    pub(crate) error_len: Option<u8>,
+}
+
+impl Utf8Error {
+    /// Analogue to [`std::str::Utf8Error::valid_up_to()`](std::str::Utf8Error#method.valid_up_to).
+    ///
+    /// ...
+    #[inline]
+    #[must_use]
+    #[allow(clippy::missing_const_for_fn)] // would not provide any benefit
+    pub fn valid_up_to(&self) -> usize {
+        self.valid_up_to
+    }
+
+    /// Analogue to [`std::str::Utf8Error::error_len()`](std::str::Utf8Error#method.error_len).
+    ///
+    /// ...
+    #[inline]
+    #[must_use]
+    pub fn error_len(&self) -> Option<usize> {
+        self.error_len.map(|len| len as usize)
+    }
+}
+
+impl Display for Utf8Error {
+    fn fmt(&self, f: &mut Formatter<'_>) -> core::fmt::Result {
+        if let Some(error_len) = self.error_len {
+            write!(
+                f,
+                "invalid utf-8 sequence of {} bytes from index {}",
+                error_len, self.valid_up_to
+            )
+        } else {
+            write!(
+                f,
+                "incomplete utf-8 byte sequence from index {}",
+                self.valid_up_to
+            )
+        }
+    }
+}
+
+#[cfg(feature = "std")]
+impl std::error::Error for Utf8Error {}
+
+/// Analogue to [`std::str::from_utf8_mut()`].
+///
+/// Checks if the passed byte sequence is valid UTF-8 and returns an
+/// [`std::str`] reference to the passed byte slice wrapped in `Ok()` if it is.
+///
+/// # Errors
+/// Will return Err([`Utf8Error`]) on if the input contains invalid UTF-8 with
+/// detailed error information.
+#[inline]
+pub fn from_utf8(input: &[u8]) -> Result<&str, Utf8Error> {
+    unsafe {
+        validate_utf8_compat(input)?;
+        Ok(from_utf8_unchecked(input))
+    }
+}
+
+/// Analogue to [`std::str::from_utf8_mut()`].
+///
+/// Checks if the passed mutable byte sequence is valid UTF-8 and returns a mutable
+/// [`std::str`] reference to the passed byte slice wrapped in `Ok()` if it is.
+///
+/// # Errors
+/// Will return Err([`Utf8Error`]) on if the input contains invalid UTF-8 with
+/// detailed error information.
+#[inline]
+pub fn from_utf8_mut(input: &mut [u8]) -> Result<&mut str, Utf8Error> {
+    unsafe {
+        validate_utf8_compat(input)?;
+        Ok(from_utf8_unchecked_mut(input))
+    }
+}
+
+/// Allows direct access to the platform-specific unsafe validation implementations.
+#[cfg(feature = "public_imp")]
+pub mod imp {
+    /// Includes the x86/x86-64 SIMD implementations.
+    #[cfg(all(any(target_arch = "x86", target_arch = "x86_64")))]
+    pub mod x86 {
+        /// Includes the validation implementation for AVX 2-compatible CPUs.
+        pub mod avx2 {
+            pub use crate::implementation::x86::avx2::validate_utf8_compat as validate_utf8;
+        }
+        /// Includes the validation implementation for SSE 4.2-compatible CPUs.
+        pub mod sse42 {
+            pub use crate::implementation::x86::sse42::validate_utf8_compat as validate_utf8;
+        }
+    }
+
+    /// Includes the aarch64 SIMD implementations.
+    #[cfg(all(feature = "aarch64_neon", target_arch = "aarch64"))]
+    pub mod aarch64 {
+        /// Includes the validation implementation for Neon SIMD.
+        pub mod neon {
+            pub use crate::implementation::aarch64::neon::validate_utf8_compat as validate_utf8;
+        }
+    }
+
+    /// Includes the wasm32 SIMD implementations.
+    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
+    pub mod wasm32 {
+        /// Includes the validation implementation for WASM simd128.
+        pub mod simd128 {
+            pub use crate::implementation::wasm32::simd128::validate_utf8_compat as validate_utf8;
+        }
+    }
+}
diff --git a/src/implementation/aarch64/mod.rs b/src/implementation/aarch64/mod.rs
new file mode 100644
index 0000000..b8a1d72
--- /dev/null
+++ b/src/implementation/aarch64/mod.rs
@@ -0,0 +1,41 @@
+#[cfg(any(feature = "aarch64_neon", target_feature = "neon"))]
+#[allow(dead_code)]
+pub(crate) mod neon;
+
+#[inline]
+#[cfg(any(feature = "aarch64_neon", target_feature = "neon"))]
+pub(crate) unsafe fn validate_utf8_basic(input: &[u8]) -> Result<(), crate::basic::Utf8Error> {
+    if input.len() < super::helpers::SIMD_CHUNK_SIZE {
+        return super::validate_utf8_basic_fallback(input);
+    }
+
+    validate_utf8_basic_neon(input)
+}
+
+#[inline(never)]
+#[cfg(any(feature = "aarch64_neon", target_feature = "neon"))]
+unsafe fn validate_utf8_basic_neon(input: &[u8]) -> Result<(), crate::basic::Utf8Error> {
+    neon::validate_utf8_basic(input)
+}
+
+#[cfg(not(any(feature = "aarch64_neon", target_feature = "neon")))]
+pub(crate) use super::validate_utf8_basic_fallback as validate_utf8_basic;
+
+#[inline]
+#[cfg(any(feature = "aarch64_neon", target_feature = "neon"))]
+pub(crate) unsafe fn validate_utf8_compat(input: &[u8]) -> Result<(), crate::compat::Utf8Error> {
+    if input.len() < super::helpers::SIMD_CHUNK_SIZE {
+        return super::validate_utf8_compat_fallback(input);
+    }
+
+    validate_utf8_compat_neon(input)
+}
+
+#[inline(never)]
+#[cfg(any(feature = "aarch64_neon", target_feature = "neon"))]
+unsafe fn validate_utf8_compat_neon(input: &[u8]) -> Result<(), crate::compat::Utf8Error> {
+    neon::validate_utf8_compat(input)
+}
+
+#[cfg(not(any(feature = "aarch64_neon", target_feature = "neon")))]
+pub(crate) use super::validate_utf8_compat_fallback as validate_utf8_compat;
diff --git a/src/implementation/aarch64/neon.rs b/src/implementation/aarch64/neon.rs
new file mode 100644
index 0000000..c13a960
--- /dev/null
+++ b/src/implementation/aarch64/neon.rs
@@ -0,0 +1,244 @@
+//! Contains the aarch64 UTF-8 validation implementation.
+
+use core::arch::aarch64::{
+    uint8x16_t, vandq_u8, vcgtq_u8, vdupq_n_u8, veorq_u8, vextq_u8, vld1q_u8, vmaxvq_u8,
+    vmovq_n_u8, vorrq_u8, vqsubq_u8, vqtbl1q_u8, vshrq_n_u8,
+};
+
+use crate::implementation::helpers::Utf8CheckAlgorithm;
+
+// aarch64 SIMD primitives
+
+type SimdU8Value = crate::implementation::helpers::SimdU8Value<uint8x16_t>;
+
+impl SimdU8Value {
+    #[inline]
+    #[allow(clippy::too_many_arguments)]
+    #[allow(clippy::cast_possible_wrap)]
+    unsafe fn from_32_cut_off_leading(
+        _v0: u8,
+        _v1: u8,
+        _v2: u8,
+        _v3: u8,
+        _v4: u8,
+        _v5: u8,
+        _v6: u8,
+        _v7: u8,
+        _v8: u8,
+        _v9: u8,
+        _v10: u8,
+        _v11: u8,
+        _v12: u8,
+        _v13: u8,
+        _v14: u8,
+        _v15: u8,
+        v16: u8,
+        v17: u8,
+        v18: u8,
+        v19: u8,
+        v20: u8,
+        v21: u8,
+        v22: u8,
+        v23: u8,
+        v24: u8,
+        v25: u8,
+        v26: u8,
+        v27: u8,
+        v28: u8,
+        v29: u8,
+        v30: u8,
+        v31: u8,
+    ) -> Self {
+        let arr: [u8; 16] = [
+            v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31,
+        ];
+        Self::from(vld1q_u8(arr.as_ptr()))
+    }
+
+    #[inline]
+    #[allow(clippy::too_many_arguments)]
+    #[allow(clippy::cast_possible_wrap)]
+    unsafe fn repeat_16(
+        v0: u8,
+        v1: u8,
+        v2: u8,
+        v3: u8,
+        v4: u8,
+        v5: u8,
+        v6: u8,
+        v7: u8,
+        v8: u8,
+        v9: u8,
+        v10: u8,
+        v11: u8,
+        v12: u8,
+        v13: u8,
+        v14: u8,
+        v15: u8,
+    ) -> Self {
+        let arr: [u8; 16] = [
+            v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15,
+        ];
+        Self::from(vld1q_u8(arr.as_ptr()))
+    }
+
+    #[inline]
+    #[allow(clippy::cast_ptr_alignment)]
+    unsafe fn load_from(ptr: *const u8) -> Self {
+        // WORKAROUND for https://github.com/rust-lang/stdarch/issues/1148
+        // The vld1q_u8 intrinsic is currently broken, it treats it as individual
+        // byte loads so the compiler sometimes decides it is a better to load
+        // individual bytes to "optimize" a subsequent SIMD shuffle
+        //
+        // This code forces a full 128-bit load.
+        let mut dst = core::mem::MaybeUninit::<uint8x16_t>::uninit();
+        core::ptr::copy_nonoverlapping(
+            ptr.cast::<u8>(),
+            dst.as_mut_ptr().cast::<u8>(),
+            core::mem::size_of::<uint8x16_t>(),
+        );
+        Self::from(dst.assume_init())
+    }
+
+    #[inline]
+    #[allow(clippy::too_many_arguments)]
+    unsafe fn lookup_16(
+        self,
+        v0: u8,
+        v1: u8,
+        v2: u8,
+        v3: u8,
+        v4: u8,
+        v5: u8,
+        v6: u8,
+        v7: u8,
+        v8: u8,
+        v9: u8,
+        v10: u8,
+        v11: u8,
+        v12: u8,
+        v13: u8,
+        v14: u8,
+        v15: u8,
+    ) -> Self {
+        Self::from(vqtbl1q_u8(
+            Self::repeat_16(
+                v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15,
+            )
+            .0,
+            self.0,
+        ))
+    }
+
+    #[inline]
+    #[allow(clippy::cast_possible_wrap)]
+    unsafe fn splat(val: u8) -> Self {
+        Self::from(vmovq_n_u8(val))
+    }
+
+    #[inline]
+    #[allow(clippy::cast_possible_wrap)]
+    unsafe fn splat0() -> Self {
+        Self::from(vdupq_n_u8(0))
+    }
+
+    #[inline]
+    unsafe fn or(self, b: Self) -> Self {
+        Self::from(vorrq_u8(self.0, b.0))
+    }
+
+    #[inline]
+    unsafe fn and(self, b: Self) -> Self {
+        Self::from(vandq_u8(self.0, b.0))
+    }
+
+    #[inline]
+    unsafe fn xor(self, b: Self) -> Self {
+        Self::from(veorq_u8(self.0, b.0))
+    }
+
+    #[inline]
+    unsafe fn saturating_sub(self, b: Self) -> Self {
+        Self::from(vqsubq_u8(self.0, b.0))
+    }
+
+    // ugly but shr<N> requires const generics
+
+    #[allow(clippy::cast_lossless)]
+    #[inline]
+    unsafe fn shr4(self) -> Self {
+        Self::from(vshrq_n_u8(self.0, 4))
+    }
+
+    // ugly but prev<N> requires const generics
+
+    #[allow(clippy::cast_lossless)]
+    #[inline]
+    unsafe fn prev1(self, prev: Self) -> Self {
+        Self::from(vextq_u8(prev.0, self.0, 16 - 1))
+    }
+
+    // ugly but prev<N> requires const generics
+
+    #[allow(clippy::cast_lossless)]
+    #[inline]
+    unsafe fn prev2(self, prev: Self) -> Self {
+        Self::from(vextq_u8(prev.0, self.0, 16 - 2))
+    }
+
+    // ugly but prev<N> requires const generics
+
+    #[allow(clippy::cast_lossless)]
+    #[inline]
+    unsafe fn prev3(self, prev: Self) -> Self {
+        Self::from(vextq_u8(prev.0, self.0, 16 - 3))
+    }
+
+    #[inline]
+    unsafe fn unsigned_gt(self, other: Self) -> Self {
+        Self::from(vcgtq_u8(self.0, other.0))
+    }
+
+    #[inline]
+    unsafe fn any_bit_set(self) -> bool {
+        vmaxvq_u8(self.0) != 0
+    }
+
+    #[inline]
+    unsafe fn is_ascii(self) -> bool {
+        vmaxvq_u8(self.0) < 0b1000_0000_u8
+    }
+}
+
+impl From<uint8x16_t> for SimdU8Value {
+    #[inline]
+    fn from(val: uint8x16_t) -> Self {
+        Self(val)
+    }
+}
+
+impl Utf8CheckAlgorithm<SimdU8Value> {
+    #[inline]
+    unsafe fn must_be_2_3_continuation(prev2: SimdU8Value, prev3: SimdU8Value) -> SimdU8Value {
+        let is_third_byte = prev2.unsigned_gt(SimdU8Value::splat(0b1110_0000 - 1));
+        let is_fourth_byte = prev3.unsigned_gt(SimdU8Value::splat(0b1111_0000 - 1));
+
+        is_third_byte.or(is_fourth_byte)
+    }
+}
+
+#[inline]
+#[cfg(feature = "aarch64_neon_prefetch")]
+unsafe fn simd_prefetch(ptr: *const u8) {
+    use core::arch::aarch64::{_prefetch, _PREFETCH_LOCALITY3, _PREFETCH_READ};
+    _prefetch(ptr.cast::<i8>(), _PREFETCH_READ, _PREFETCH_LOCALITY3);
+}
+
+#[inline]
+#[cfg(not(feature = "aarch64_neon_prefetch"))]
+unsafe fn simd_prefetch(_ptr: *const u8) {}
+
+const PREFETCH: bool = false;
+use crate::implementation::helpers::TempSimdChunkA16 as TempSimdChunk;
+simd_input_128_bit!("not_used");
+algorithm_simd!("not_used");
diff --git a/src/implementation/algorithm.rs b/src/implementation/algorithm.rs
new file mode 100644
index 0000000..3ec1c82
--- /dev/null
+++ b/src/implementation/algorithm.rs
@@ -0,0 +1,580 @@
+/// Macros requires newtypes in scope:
+/// `SimdU8Value` - implementation of SIMD primitives
+/// `SimdInput` - which  holds 64 bytes of SIMD input
+/// `TempSimdChunk` - correctly aligned `TempSimdChunk`, either `TempSimdChunkA16` or `TempSimdChunkA32`
+
+macro_rules! algorithm_simd {
+    ($feat:expr) => {
+        use crate::{basic, compat};
+
+        impl Utf8CheckAlgorithm<SimdU8Value> {
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn default() -> Self {
+                Self {
+                    prev: SimdU8Value::splat0(),
+                    incomplete: SimdU8Value::splat0(),
+                    error: SimdU8Value::splat0(),
+                }
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn check_incomplete_pending(&mut self) {
+                self.error = self.error.or(self.incomplete);
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn is_incomplete(input: SimdU8Value) -> SimdU8Value {
+                input.saturating_sub(SimdU8Value::from_32_cut_off_leading(
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0xff,
+                    0b1111_0000 - 1,
+                    0b1110_0000 - 1,
+                    0b1100_0000 - 1,
+                ))
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            #[allow(clippy::too_many_lines)]
+            unsafe fn check_special_cases(input: SimdU8Value, prev1: SimdU8Value) -> SimdU8Value {
+                const TOO_SHORT: u8 = 1 << 0;
+                const TOO_LONG: u8 = 1 << 1;
+                const OVERLONG_3: u8 = 1 << 2;
+                const SURROGATE: u8 = 1 << 4;
+                const OVERLONG_2: u8 = 1 << 5;
+                const TWO_CONTS: u8 = 1 << 7;
+                const TOO_LARGE: u8 = 1 << 3;
+                const TOO_LARGE_1000: u8 = 1 << 6;
+                const OVERLONG_4: u8 = 1 << 6;
+                const CARRY: u8 = TOO_SHORT | TOO_LONG | TWO_CONTS;
+
+                let byte_1_high = prev1.shr4().lookup_16(
+                    TOO_LONG,
+                    TOO_LONG,
+                    TOO_LONG,
+                    TOO_LONG,
+                    TOO_LONG,
+                    TOO_LONG,
+                    TOO_LONG,
+                    TOO_LONG,
+                    TWO_CONTS,
+                    TWO_CONTS,
+                    TWO_CONTS,
+                    TWO_CONTS,
+                    TOO_SHORT | OVERLONG_2,
+                    TOO_SHORT,
+                    TOO_SHORT | OVERLONG_3 | SURROGATE,
+                    TOO_SHORT | TOO_LARGE | TOO_LARGE_1000 | OVERLONG_4,
+                );
+
+                let byte_1_low = prev1.and(SimdU8Value::splat(0x0F)).lookup_16(
+                    CARRY | OVERLONG_3 | OVERLONG_2 | OVERLONG_4,
+                    CARRY | OVERLONG_2,
+                    CARRY,
+                    CARRY,
+                    CARRY | TOO_LARGE,
+                    CARRY | TOO_LARGE | TOO_LARGE_1000,
+                    CARRY | TOO_LARGE | TOO_LARGE_1000,
+                    CARRY | TOO_LARGE | TOO_LARGE_1000,
+                    CARRY | TOO_LARGE | TOO_LARGE_1000,
+                    CARRY | TOO_LARGE | TOO_LARGE_1000,
+                    CARRY | TOO_LARGE | TOO_LARGE_1000,
+                    CARRY | TOO_LARGE | TOO_LARGE_1000,
+                    CARRY | TOO_LARGE | TOO_LARGE_1000,
+                    CARRY | TOO_LARGE | TOO_LARGE_1000 | SURROGATE,
+                    CARRY | TOO_LARGE | TOO_LARGE_1000,
+                    CARRY | TOO_LARGE | TOO_LARGE_1000,
+                );
+
+                let byte_2_high = input.shr4().lookup_16(
+                    TOO_SHORT,
+                    TOO_SHORT,
+                    TOO_SHORT,
+                    TOO_SHORT,
+                    TOO_SHORT,
+                    TOO_SHORT,
+                    TOO_SHORT,
+                    TOO_SHORT,
+                    TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE_1000 | OVERLONG_4,
+                    TOO_LONG | OVERLONG_2 | TWO_CONTS | OVERLONG_3 | TOO_LARGE,
+                    TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+                    TOO_LONG | OVERLONG_2 | TWO_CONTS | SURROGATE | TOO_LARGE,
+                    TOO_SHORT,
+                    TOO_SHORT,
+                    TOO_SHORT,
+                    TOO_SHORT,
+                );
+
+                byte_1_high.and(byte_1_low).and(byte_2_high)
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn check_multibyte_lengths(
+                input: SimdU8Value,
+                prev: SimdU8Value,
+                special_cases: SimdU8Value,
+            ) -> SimdU8Value {
+                let prev2 = input.prev2(prev);
+                let prev3 = input.prev3(prev);
+                let must23 = Self::must_be_2_3_continuation(prev2, prev3);
+                let must23_80 = must23.and(SimdU8Value::splat(0x80));
+                must23_80.xor(special_cases)
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn has_error(&self) -> bool {
+                self.error.any_bit_set()
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn check_bytes(&mut self, input: SimdU8Value) {
+                let prev1 = input.prev1(self.prev);
+                let sc = Self::check_special_cases(input, prev1);
+                self.error = self
+                    .error
+                    .or(Self::check_multibyte_lengths(input, self.prev, sc));
+                self.prev = input;
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn check_utf8(&mut self, input: SimdInput) {
+                if input.is_ascii() {
+                    self.check_incomplete_pending();
+                } else {
+                    self.check_block(input);
+                }
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            #[allow(unconditional_panic)] // does not panic because len is checked
+            #[allow(const_err)] // the same, but for Rust 1.38.0
+            unsafe fn check_block(&mut self, input: SimdInput) {
+                // WORKAROUND
+                // necessary because the for loop is not unrolled on ARM64
+                if input.vals.len() == 2 {
+                    self.check_bytes(input.vals[0]);
+                    self.check_bytes(input.vals[1]);
+                    self.incomplete = Self::is_incomplete(input.vals[1]);
+                } else if input.vals.len() == 4 {
+                    self.check_bytes(input.vals[0]);
+                    self.check_bytes(input.vals[1]);
+                    self.check_bytes(input.vals[2]);
+                    self.check_bytes(input.vals[3]);
+                    self.incomplete = Self::is_incomplete(input.vals[3]);
+                } else {
+                    panic!("Unsupported number of chunks");
+                }
+            }
+        }
+
+        /// Validation implementation for CPUs supporting the SIMD extension (see module).
+        ///
+        /// # Errors
+        /// Returns the zero-sized [`basic::Utf8Error`] on failure.
+        ///
+        /// # Safety
+        /// This function is inherently unsafe because it is compiled with SIMD extensions
+        /// enabled. Make sure that the CPU supports it before calling.
+        ///
+        #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+        #[inline]
+        pub unsafe fn validate_utf8_basic(
+            input: &[u8],
+        ) -> core::result::Result<(), basic::Utf8Error> {
+            use crate::implementation::helpers::SIMD_CHUNK_SIZE;
+            let len = input.len();
+            let mut algorithm = Utf8CheckAlgorithm::<SimdU8Value>::default();
+            let mut idx: usize = 0;
+            let iter_lim = len - (len % SIMD_CHUNK_SIZE);
+
+            while idx < iter_lim {
+                let simd_input = SimdInput::new(input.get_unchecked(idx as usize..));
+                idx += SIMD_CHUNK_SIZE;
+                if !simd_input.is_ascii() {
+                    algorithm.check_block(simd_input);
+                    break;
+                }
+            }
+
+            while idx < iter_lim {
+                if PREFETCH {
+                    simd_prefetch(input.as_ptr().add(idx + SIMD_CHUNK_SIZE * 2));
+                }
+                let input = SimdInput::new(input.get_unchecked(idx as usize..));
+                algorithm.check_utf8(input);
+                idx += SIMD_CHUNK_SIZE;
+            }
+
+            if idx < len {
+                let mut tmpbuf = TempSimdChunk::new();
+                crate::implementation::helpers::memcpy_unaligned_nonoverlapping_inline_opt_lt_64(
+                    input.as_ptr().add(idx),
+                    tmpbuf.0.as_mut_ptr(),
+                    len - idx,
+                );
+                let simd_input = SimdInput::new(&tmpbuf.0);
+                algorithm.check_utf8(simd_input);
+            }
+            algorithm.check_incomplete_pending();
+            if algorithm.has_error() {
+                Err(basic::Utf8Error {})
+            } else {
+                Ok(())
+            }
+        }
+
+        /// Validation implementation for CPUs supporting the SIMD extension (see module).
+        ///
+        /// # Errors
+        /// Returns [`compat::Utf8Error`] with detailed error information on failure.
+        ///
+        /// # Safety
+        /// This function is inherently unsafe because it is compiled with SIMD extensions
+        /// enabled. Make sure that the CPU supports it before calling.
+        ///
+        #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+        #[inline]
+        pub unsafe fn validate_utf8_compat(
+            input: &[u8],
+        ) -> core::result::Result<(), compat::Utf8Error> {
+            validate_utf8_compat_simd0(input)
+                .map_err(|idx| crate::implementation::helpers::get_compat_error(input, idx))
+        }
+
+        #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+        #[inline]
+        unsafe fn validate_utf8_compat_simd0(input: &[u8]) -> core::result::Result<(), usize> {
+            use crate::implementation::helpers::SIMD_CHUNK_SIZE;
+            let len = input.len();
+            let mut algorithm = Utf8CheckAlgorithm::<SimdU8Value>::default();
+            let mut idx: usize = 0;
+            let mut only_ascii = true;
+            let iter_lim = len - (len % SIMD_CHUNK_SIZE);
+
+            'outer: loop {
+                if only_ascii {
+                    while idx < iter_lim {
+                        let simd_input = SimdInput::new(input.get_unchecked(idx as usize..));
+                        if !simd_input.is_ascii() {
+                            algorithm.check_block(simd_input);
+                            if algorithm.has_error() {
+                                return Err(idx);
+                            } else {
+                                only_ascii = false;
+                                idx += SIMD_CHUNK_SIZE;
+                                continue 'outer;
+                            }
+                        }
+                        idx += SIMD_CHUNK_SIZE;
+                    }
+                } else {
+                    while idx < iter_lim {
+                        if PREFETCH {
+                            simd_prefetch(input.as_ptr().add(idx + SIMD_CHUNK_SIZE * 2));
+                        }
+                        let simd_input = SimdInput::new(input.get_unchecked(idx as usize..));
+                        if simd_input.is_ascii() {
+                            algorithm.check_incomplete_pending();
+                            if algorithm.has_error() {
+                                return Err(idx);
+                            } else {
+                                // we are in pure ASCII territory again
+                                only_ascii = true;
+                                idx += SIMD_CHUNK_SIZE;
+                                continue 'outer;
+                            }
+                        } else {
+                            algorithm.check_block(simd_input);
+                            if algorithm.has_error() {
+                                return Err(idx);
+                            }
+                        }
+                        idx += SIMD_CHUNK_SIZE;
+                    }
+                }
+                break;
+            }
+            if idx < len {
+                let mut tmpbuf = TempSimdChunk::new();
+                crate::implementation::helpers::memcpy_unaligned_nonoverlapping_inline_opt_lt_64(
+                    input.as_ptr().add(idx),
+                    tmpbuf.0.as_mut_ptr(),
+                    len - idx,
+                );
+                let simd_input = SimdInput::new(&tmpbuf.0);
+
+                algorithm.check_utf8(simd_input);
+            }
+            algorithm.check_incomplete_pending();
+            if algorithm.has_error() {
+                Err(idx)
+            } else {
+                Ok(())
+            }
+        }
+
+        /// Low-level implementation of the [`basic::imp::Utf8Validator`] trait.
+        ///
+        /// This is implementation requires CPU SIMD features specified by the module it resides in.
+        /// It is undefined behavior to call it if the required CPU features are not
+        /// available.
+        #[cfg(feature = "public_imp")]
+        pub struct Utf8ValidatorImp {
+            algorithm: Utf8CheckAlgorithm<SimdU8Value>,
+            incomplete_data: [u8; 64],
+            incomplete_len: usize,
+        }
+
+        #[cfg(feature = "public_imp")]
+        impl Utf8ValidatorImp {
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn update_from_incomplete_data(&mut self) {
+                let simd_input = SimdInput::new(&self.incomplete_data);
+                self.algorithm.check_utf8(simd_input);
+                self.incomplete_len = 0;
+            }
+        }
+
+        #[cfg(feature = "public_imp")]
+        impl basic::imp::Utf8Validator for Utf8ValidatorImp {
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            #[must_use]
+            unsafe fn new() -> Self {
+                Self {
+                    algorithm: Utf8CheckAlgorithm::<SimdU8Value>::default(),
+                    incomplete_data: [0; 64],
+                    incomplete_len: 0,
+                }
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn update(&mut self, mut input: &[u8]) {
+                use crate::implementation::helpers::SIMD_CHUNK_SIZE;
+                if input.is_empty() {
+                    return;
+                }
+                if self.incomplete_len != 0 {
+                    let to_copy =
+                        core::cmp::min(SIMD_CHUNK_SIZE - self.incomplete_len, input.len());
+                    self.incomplete_data
+                        .as_mut_ptr()
+                        .add(self.incomplete_len)
+                        .copy_from_nonoverlapping(input.as_ptr(), to_copy);
+                    if self.incomplete_len + to_copy == SIMD_CHUNK_SIZE {
+                        self.update_from_incomplete_data();
+                        input = &input[to_copy..];
+                    } else {
+                        self.incomplete_len += to_copy;
+                        return;
+                    }
+                }
+                let len = input.len();
+                let mut idx: usize = 0;
+                let iter_lim = len - (len % SIMD_CHUNK_SIZE);
+                while idx < iter_lim {
+                    let input = SimdInput::new(input.get_unchecked(idx as usize..));
+                    self.algorithm.check_utf8(input);
+                    idx += SIMD_CHUNK_SIZE;
+                }
+                if idx < len {
+                    let to_copy = len - idx;
+                    self.incomplete_data
+                        .as_mut_ptr()
+                        .copy_from_nonoverlapping(input.as_ptr().add(idx), to_copy);
+                    self.incomplete_len = to_copy;
+                }
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn finalize(mut self) -> core::result::Result<(), basic::Utf8Error> {
+                if self.incomplete_len != 0 {
+                    for i in &mut self.incomplete_data[self.incomplete_len..] {
+                        *i = 0;
+                    }
+                    self.update_from_incomplete_data();
+                }
+                self.algorithm.check_incomplete_pending();
+                if self.algorithm.has_error() {
+                    Err(basic::Utf8Error {})
+                } else {
+                    Ok(())
+                }
+            }
+        }
+
+        /// Low-level implementation of the [`basic::imp::ChunkedUtf8Validator`] trait.
+        ///
+        /// This is implementation requires CPU SIMD features specified by the module it resides in.
+        /// It is undefined behavior to call it if the required CPU features are not
+        /// available.
+        #[cfg(feature = "public_imp")]
+        pub struct ChunkedUtf8ValidatorImp {
+            algorithm: Utf8CheckAlgorithm<SimdU8Value>,
+        }
+
+        #[cfg(feature = "public_imp")]
+        impl basic::imp::ChunkedUtf8Validator for ChunkedUtf8ValidatorImp {
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            #[must_use]
+            unsafe fn new() -> Self {
+                Self {
+                    algorithm: Utf8CheckAlgorithm::<SimdU8Value>::default(),
+                }
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn update_from_chunks(&mut self, input: &[u8]) {
+                use crate::implementation::helpers::SIMD_CHUNK_SIZE;
+
+                assert!(
+                    input.len() % SIMD_CHUNK_SIZE == 0,
+                    "Input size must be a multiple of 64."
+                );
+                for chunk in input.chunks_exact(SIMD_CHUNK_SIZE) {
+                    let input = SimdInput::new(chunk);
+                    self.algorithm.check_utf8(input);
+                }
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn finalize(
+                mut self,
+                remaining_input: core::option::Option<&[u8]>,
+            ) -> core::result::Result<(), basic::Utf8Error> {
+                use crate::implementation::helpers::SIMD_CHUNK_SIZE;
+
+                if let Some(mut remaining_input) = remaining_input {
+                    if !remaining_input.is_empty() {
+                        let len = remaining_input.len();
+                        let chunks_lim = len - (len % SIMD_CHUNK_SIZE);
+                        if chunks_lim > 0 {
+                            self.update_from_chunks(&remaining_input[..chunks_lim]);
+                        }
+                        let rem = len - chunks_lim;
+                        if rem > 0 {
+                            remaining_input = &remaining_input[chunks_lim..];
+                            let mut tmpbuf = TempSimdChunk::new();
+                            tmpbuf.0.as_mut_ptr().copy_from_nonoverlapping(
+                                remaining_input.as_ptr(),
+                                remaining_input.len(),
+                            );
+                            let simd_input = SimdInput::new(&tmpbuf.0);
+                            self.algorithm.check_utf8(simd_input);
+                        }
+                    }
+                }
+                self.algorithm.check_incomplete_pending();
+                if self.algorithm.has_error() {
+                    Err(basic::Utf8Error {})
+                } else {
+                    Ok(())
+                }
+            }
+        }
+    };
+}
+
+macro_rules! simd_input_128_bit {
+    ($feat:expr) => {
+        #[repr(C)]
+        struct SimdInput {
+            vals: [SimdU8Value; 4],
+        }
+
+        impl SimdInput {
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            #[allow(clippy::cast_ptr_alignment)]
+            unsafe fn new(ptr: &[u8]) -> Self {
+                Self {
+                    vals: [
+                        SimdU8Value::load_from(ptr.as_ptr()),
+                        SimdU8Value::load_from(ptr.as_ptr().add(16)),
+                        SimdU8Value::load_from(ptr.as_ptr().add(32)),
+                        SimdU8Value::load_from(ptr.as_ptr().add(48)),
+                    ],
+                }
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn is_ascii(&self) -> bool {
+                let r1 = self.vals[0].or(self.vals[1]);
+                let r2 = self.vals[2].or(self.vals[3]);
+                let r = r1.or(r2);
+                r.is_ascii()
+            }
+        }
+    };
+}
+
+macro_rules! simd_input_256_bit {
+    ($feat:expr) => {
+        #[repr(C)]
+        struct SimdInput {
+            vals: [SimdU8Value; 2],
+        }
+
+        impl SimdInput {
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            #[allow(clippy::cast_ptr_alignment)]
+            unsafe fn new(ptr: &[u8]) -> Self {
+                Self {
+                    vals: [
+                        SimdU8Value::load_from(ptr.as_ptr()),
+                        SimdU8Value::load_from(ptr.as_ptr().add(32)),
+                    ],
+                }
+            }
+
+            #[cfg_attr(not(target_arch="aarch64"), target_feature(enable = $feat))]
+            #[inline]
+            unsafe fn is_ascii(&self) -> bool {
+                self.vals[0].or(self.vals[1]).is_ascii()
+            }
+        }
+    };
+}
diff --git a/src/implementation/helpers.rs b/src/implementation/helpers.rs
new file mode 100644
index 0000000..a6bd693
--- /dev/null
+++ b/src/implementation/helpers.rs
@@ -0,0 +1,117 @@
+type Utf8ErrorCompat = crate::compat::Utf8Error;
+
+#[inline]
+pub(crate) fn validate_utf8_at_offset(input: &[u8], offset: usize) -> Result<(), Utf8ErrorCompat> {
+    #[allow(clippy::cast_possible_truncation)]
+    match core::str::from_utf8(&input[offset..]) {
+        Ok(_) => Ok(()),
+        Err(err) => Err(Utf8ErrorCompat {
+            valid_up_to: err.valid_up_to() + offset,
+            error_len: err.error_len().map(|len| {
+                // never truncates since std::str::err::Utf8Error::error_len() never returns value larger than 4
+                len as u8
+            }),
+        }),
+    }
+}
+
+#[cold]
+#[allow(dead_code)]
+pub(crate) fn get_compat_error(input: &[u8], failing_block_pos: usize) -> Utf8ErrorCompat {
+    let offset = if failing_block_pos == 0 {
+        // Error must be in this block since it is the first.
+        0
+    } else {
+        // The previous block is OK except for a possible continuation over the block boundary.
+        // We go backwards over the last three bytes of the previous block and find the
+        // last non-continuation byte as a starting point for an std validation. If the last
+        // three bytes are all continuation bytes then the previous block ends with a four byte
+        // UTF-8 codepoint, is thus complete and valid UTF-8. We start the check with the
+        // current block in that case.
+        (1..=3)
+            .into_iter()
+            .find(|i| input[failing_block_pos - i] >> 6 != 0b10)
+            .map_or(failing_block_pos, |i| failing_block_pos - i)
+    };
+    // UNWRAP: safe because the SIMD UTF-8 validation found an error
+    validate_utf8_at_offset(input, offset).unwrap_err()
+}
+
+#[allow(dead_code)]
+#[allow(clippy::missing_const_for_fn)] // clippy is wrong, it cannot really be const
+pub(crate) unsafe fn memcpy_unaligned_nonoverlapping_inline_opt_lt_64(
+    mut src: *const u8,
+    mut dest: *mut u8,
+    mut len: usize,
+) {
+    // This gets properly auto-vectorized on AVX 2 and SSE 4.2
+    #[inline]
+    unsafe fn memcpy_u64(src: &mut *const u8, dest: &mut *mut u8) {
+        #[allow(clippy::cast_ptr_alignment)]
+        dest.cast::<u64>()
+            .write_unaligned(src.cast::<u64>().read_unaligned());
+        *src = src.offset(8);
+        *dest = dest.offset(8);
+    }
+    if len >= 32 {
+        memcpy_u64(&mut src, &mut dest);
+        memcpy_u64(&mut src, &mut dest);
+        memcpy_u64(&mut src, &mut dest);
+        memcpy_u64(&mut src, &mut dest);
+        len -= 32;
+    }
+    if len >= 16 {
+        memcpy_u64(&mut src, &mut dest);
+        memcpy_u64(&mut src, &mut dest);
+        len -= 16;
+    }
+    if len >= 8 {
+        memcpy_u64(&mut src, &mut dest);
+        len -= 8;
+    }
+    while len > 0 {
+        *dest = *src;
+        src = src.offset(1);
+        dest = dest.offset(1);
+        len -= 1;
+    }
+}
+
+pub(crate) const SIMD_CHUNK_SIZE: usize = 64;
+
+#[repr(C, align(32))]
+#[allow(dead_code)]
+pub(crate) struct Utf8CheckAlgorithm<T> {
+    pub(crate) prev: T,
+    pub(crate) incomplete: T,
+    pub(crate) error: T,
+}
+
+#[repr(C, align(16))]
+#[allow(dead_code)]
+pub(crate) struct TempSimdChunkA16(pub(crate) [u8; SIMD_CHUNK_SIZE]);
+
+#[allow(dead_code)]
+impl TempSimdChunkA16 {
+    #[inline]
+    pub(crate) const fn new() -> Self {
+        Self([0; SIMD_CHUNK_SIZE])
+    }
+}
+
+#[repr(C, align(32))]
+#[allow(dead_code)]
+pub(crate) struct TempSimdChunkA32(pub(crate) [u8; SIMD_CHUNK_SIZE]);
+
+#[allow(dead_code)]
+impl TempSimdChunkA32 {
+    #[inline]
+    pub(crate) const fn new() -> Self {
+        Self([0; SIMD_CHUNK_SIZE])
+    }
+}
+
+#[derive(Clone, Copy)]
+pub(crate) struct SimdU8Value<T>(pub(crate) T)
+where
+    T: Copy;
diff --git a/src/implementation/mod.rs b/src/implementation/mod.rs
new file mode 100644
index 0000000..242b46b
--- /dev/null
+++ b/src/implementation/mod.rs
@@ -0,0 +1,96 @@
+//! Contains UTF-8 validation implementations.
+
+type Utf8ErrorCompat = crate::compat::Utf8Error;
+type Utf8ErrorBasic = crate::basic::Utf8Error;
+
+#[allow(unused_macros)]
+#[macro_use]
+mod algorithm;
+
+pub(crate) mod helpers;
+
+// UTF-8 validation function types
+
+#[allow(dead_code)]
+type ValidateUtf8Fn = unsafe fn(input: &[u8]) -> Result<(), Utf8ErrorBasic>;
+
+#[allow(dead_code)]
+type ValidateUtf8CompatFn = unsafe fn(input: &[u8]) -> Result<(), Utf8ErrorCompat>;
+
+// x86 implementation
+
+#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+pub(crate) mod x86;
+
+/// Fn needed instead of re-import, otherwise not inlined in non-std case
+#[allow(clippy::inline_always)]
+#[inline(always)]
+#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+pub(super) unsafe fn validate_utf8_basic(input: &[u8]) -> Result<(), Utf8ErrorBasic> {
+    x86::validate_utf8_basic(input)
+}
+
+/// Fn needed instead of re-import, otherwise not inlined in non-std case
+#[allow(clippy::inline_always)]
+#[inline(always)]
+#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+pub(super) unsafe fn validate_utf8_compat(input: &[u8]) -> Result<(), Utf8ErrorCompat> {
+    x86::validate_utf8_compat(input)
+}
+
+// aarch64 implementation
+
+#[cfg(target_arch = "aarch64")]
+pub(crate) mod aarch64;
+
+#[cfg(target_arch = "aarch64")]
+pub(super) use aarch64::validate_utf8_basic;
+
+#[cfg(target_arch = "aarch64")]
+pub(super) use aarch64::validate_utf8_compat;
+
+// wasm32 implementation
+
+#[cfg(target_arch = "wasm32")]
+pub(crate) mod wasm32;
+
+#[cfg(target_arch = "wasm32")]
+pub(super) use wasm32::validate_utf8_basic;
+
+#[cfg(target_arch = "wasm32")]
+pub(super) use wasm32::validate_utf8_compat;
+
+// fallback for unsupported architectures
+
+#[cfg(not(any(
+    target_arch = "x86",
+    target_arch = "x86_64",
+    target_arch = "aarch64",
+    target_arch = "wasm32"
+)))]
+pub(super) use validate_utf8_basic_fallback as validate_utf8_basic;
+
+#[cfg(not(any(
+    target_arch = "x86",
+    target_arch = "x86_64",
+    target_arch = "aarch64",
+    target_arch = "wasm32"
+)))]
+pub(super) use validate_utf8_compat_fallback as validate_utf8_compat;
+
+// fallback method implementations
+
+#[inline]
+#[allow(dead_code)]
+pub(crate) fn validate_utf8_basic_fallback(input: &[u8]) -> Result<(), Utf8ErrorBasic> {
+    match core::str::from_utf8(input) {
+        Ok(_) => Ok(()),
+        Err(_) => Err(Utf8ErrorBasic {}),
+    }
+}
+
+#[inline]
+#[allow(dead_code)]
+pub(crate) fn validate_utf8_compat_fallback(input: &[u8]) -> Result<(), Utf8ErrorCompat> {
+    helpers::validate_utf8_at_offset(input, 0)
+}
diff --git a/src/implementation/wasm32/mod.rs b/src/implementation/wasm32/mod.rs
new file mode 100644
index 0000000..5462173
--- /dev/null
+++ b/src/implementation/wasm32/mod.rs
@@ -0,0 +1,41 @@
+#[cfg(target_feature = "simd128")]
+#[allow(dead_code)]
+pub(crate) mod simd128;
+
+#[inline]
+#[cfg(target_feature = "simd128")]
+pub(crate) unsafe fn validate_utf8_basic(input: &[u8]) -> Result<(), crate::basic::Utf8Error> {
+    if input.len() < super::helpers::SIMD_CHUNK_SIZE {
+        return super::validate_utf8_basic_fallback(input);
+    }
+
+    validate_utf8_basic_simd128(input)
+}
+
+#[inline(never)]
+#[cfg(target_feature = "simd128")]
+unsafe fn validate_utf8_basic_simd128(input: &[u8]) -> Result<(), crate::basic::Utf8Error> {
+    simd128::validate_utf8_basic(input)
+}
+
+#[cfg(not(target_feature = "simd128"))]
+pub(crate) use super::validate_utf8_basic_fallback as validate_utf8_basic;
+
+#[inline]
+#[cfg(target_feature = "simd128")]
+pub(crate) unsafe fn validate_utf8_compat(input: &[u8]) -> Result<(), crate::compat::Utf8Error> {
+    if input.len() < super::helpers::SIMD_CHUNK_SIZE {
+        return super::validate_utf8_compat_fallback(input);
+    }
+
+    validate_utf8_compat_simd128(input)
+}
+
+#[inline(never)]
+#[cfg(target_feature = "simd128")]
+unsafe fn validate_utf8_compat_simd128(input: &[u8]) -> Result<(), crate::compat::Utf8Error> {
+    simd128::validate_utf8_compat(input)
+}
+
+#[cfg(not(target_feature = "simd128"))]
+pub(crate) use super::validate_utf8_compat_fallback as validate_utf8_compat;
diff --git a/src/implementation/wasm32/simd128.rs b/src/implementation/wasm32/simd128.rs
new file mode 100644
index 0000000..fb12dba
--- /dev/null
+++ b/src/implementation/wasm32/simd128.rs
@@ -0,0 +1,284 @@
+//! Contains the wasm32 UTF-8 validation implementation.
+
+use core::arch::wasm32::{
+    u8x16, u8x16_bitmask, u8x16_gt, u8x16_shr, u8x16_shuffle, u8x16_splat, u8x16_sub_sat,
+    u8x16_swizzle, v128, v128_and, v128_any_true, v128_or, v128_xor,
+};
+
+use crate::implementation::helpers::Utf8CheckAlgorithm;
+
+// wasm32 SIMD primitives
+
+type SimdU8Value = crate::implementation::helpers::SimdU8Value<v128>;
+
+#[repr(C, align(16))]
+struct AlignV128Array([u8; 16]);
+
+impl SimdU8Value {
+    #[inline]
+    #[allow(clippy::too_many_arguments)]
+    #[allow(clippy::cast_possible_wrap)]
+    #[allow(clippy::cast_ptr_alignment)]
+    unsafe fn from_32_cut_off_leading(
+        _v0: u8,
+        _v1: u8,
+        _v2: u8,
+        _v3: u8,
+        _v4: u8,
+        _v5: u8,
+        _v6: u8,
+        _v7: u8,
+        _v8: u8,
+        _v9: u8,
+        _v10: u8,
+        _v11: u8,
+        _v12: u8,
+        _v13: u8,
+        _v14: u8,
+        _v15: u8,
+        v16: u8,
+        v17: u8,
+        v18: u8,
+        v19: u8,
+        v20: u8,
+        v21: u8,
+        v22: u8,
+        v23: u8,
+        v24: u8,
+        v25: u8,
+        v26: u8,
+        v27: u8,
+        v28: u8,
+        v29: u8,
+        v30: u8,
+        v31: u8,
+    ) -> Self {
+        let arr = AlignV128Array([
+            v16, v17, v18, v19, v20, v21, v22, v23, v24, v25, v26, v27, v28, v29, v30, v31,
+        ]);
+        Self::from(*(arr.0.as_ptr().cast::<v128>()))
+    }
+
+    #[inline]
+    #[allow(clippy::too_many_arguments)]
+    #[allow(clippy::cast_possible_wrap)]
+    #[allow(clippy::cast_ptr_alignment)]
+    unsafe fn repeat_16(
+        v0: u8,
+        v1: u8,
+        v2: u8,
+        v3: u8,
+        v4: u8,
+        v5: u8,
+        v6: u8,
+        v7: u8,
+        v8: u8,
+        v9: u8,
+        v10: u8,
+        v11: u8,
+        v12: u8,
+        v13: u8,
+        v14: u8,
+        v15: u8,
+    ) -> Self {
+        let arr = AlignV128Array([
+            v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15,
+        ]);
+        Self::from(*(arr.0.as_ptr().cast::<v128>()))
+    }
+
+    #[inline]
+    #[allow(clippy::cast_ptr_alignment)]
+    unsafe fn load_from(ptr: *const u8) -> Self {
+        Self::from(*(ptr.cast::<v128>()))
+    }
+
+    #[inline]
+    #[allow(clippy::too_many_arguments)]
+    unsafe fn lookup_16(
+        self,
+        v0: u8,
+        v1: u8,
+        v2: u8,
+        v3: u8,
+        v4: u8,
+        v5: u8,
+        v6: u8,
+        v7: u8,
+        v8: u8,
+        v9: u8,
+        v10: u8,
+        v11: u8,
+        v12: u8,
+        v13: u8,
+        v14: u8,
+        v15: u8,
+    ) -> Self {
+        Self::from(u8x16_swizzle(
+            Self::repeat_16(
+                v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15,
+            )
+            .0,
+            self.0,
+        ))
+    }
+
+    #[inline]
+    #[allow(clippy::cast_possible_wrap)]
+    unsafe fn splat(val: u8) -> Self {
+        Self::from(u8x16_splat(val))
+    }
+
+    #[inline]
+    #[allow(clippy::cast_possible_wrap)]
+    unsafe fn splat0() -> Self {
+        Self::from(u8x16(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0))
+    }
+
+    #[inline]
+    unsafe fn or(self, b: Self) -> Self {
+        Self::from(v128_or(self.0, b.0))
+    }
+
+    #[inline]
+    unsafe fn and(self, b: Self) -> Self {
+        Self::from(v128_and(self.0, b.0))
+    }
+
+    #[inline]
+    unsafe fn xor(self, b: Self) -> Self {
+        Self::from(v128_xor(self.0, b.0))
+    }
+
+    #[inline]
+    unsafe fn saturating_sub(self, b: Self) -> Self {
+        Self::from(u8x16_sub_sat(self.0, b.0))
+    }
+
+    // ugly but shr<N> requires const generics
+
+    #[allow(clippy::cast_lossless)]
+    #[inline]
+    unsafe fn shr4(self) -> Self {
+        Self::from(u8x16_shr(self.0, 4))
+    }
+
+    // ugly but prev<N> requires const generics
+
+    // TODO make this into a macro
+
+    #[allow(clippy::cast_lossless)]
+    #[inline]
+    unsafe fn prev1(self, prev: Self) -> Self {
+        Self::from(u8x16_shuffle::<
+            15,
+            16,
+            17,
+            18,
+            19,
+            20,
+            21,
+            22,
+            23,
+            24,
+            25,
+            26,
+            27,
+            28,
+            29,
+            30,
+        >(prev.0, self.0))
+    }
+
+    // ugly but prev<N> requires const generics
+
+    #[allow(clippy::cast_lossless)]
+    #[inline]
+    unsafe fn prev2(self, prev: Self) -> Self {
+        Self::from(u8x16_shuffle::<
+            14,
+            15,
+            16,
+            17,
+            18,
+            19,
+            20,
+            21,
+            22,
+            23,
+            24,
+            25,
+            26,
+            27,
+            28,
+            29,
+        >(prev.0, self.0))
+    }
+
+    // ugly but prev<N> requires const generics
+
+    #[allow(clippy::cast_lossless)]
+    #[inline]
+    unsafe fn prev3(self, prev: Self) -> Self {
+        Self::from(u8x16_shuffle::<
+            13,
+            14,
+            15,
+            16,
+            17,
+            18,
+            19,
+            20,
+            21,
+            22,
+            23,
+            24,
+            25,
+            26,
+            27,
+            28,
+        >(prev.0, self.0))
+    }
+
+    #[inline]
+    unsafe fn unsigned_gt(self, other: Self) -> Self {
+        Self::from(u8x16_gt(self.0, other.0))
+    }
+
+    #[inline]
+    unsafe fn any_bit_set(self) -> bool {
+        v128_any_true(self.0)
+    }
+
+    #[inline]
+    unsafe fn is_ascii(self) -> bool {
+        u8x16_bitmask(self.0) == 0
+    }
+}
+
+impl From<v128> for SimdU8Value {
+    #[inline]
+    fn from(v: v128) -> Self {
+        Self(v)
+    }
+}
+
+impl Utf8CheckAlgorithm<SimdU8Value> {
+    #[inline]
+    unsafe fn must_be_2_3_continuation(prev2: SimdU8Value, prev3: SimdU8Value) -> SimdU8Value {
+        let is_third_byte = prev2.unsigned_gt(SimdU8Value::splat(0b1110_0000 - 1));
+        let is_fourth_byte = prev3.unsigned_gt(SimdU8Value::splat(0b1111_0000 - 1));
+
+        is_third_byte.or(is_fourth_byte)
+    }
+}
+
+#[inline]
+const fn simd_prefetch(_ptr: *const u8) {
+    // no-op
+}
+
+const PREFETCH: bool = false;
+use crate::implementation::helpers::TempSimdChunkA16 as TempSimdChunk;
+simd_input_128_bit!("simd128");
+algorithm_simd!("simd128");
diff --git a/src/implementation/x86/avx2.rs b/src/implementation/x86/avx2.rs
new file mode 100644
index 0000000..14483ef
--- /dev/null
+++ b/src/implementation/x86/avx2.rs
@@ -0,0 +1,261 @@
+//! Contains the x86-64/x86 AVX2 UTF-8 validation implementation.
+
+#![allow(clippy::too_many_arguments)]
+
+#[cfg(target_arch = "x86")]
+use core::arch::x86::{
+    __m256i, _mm256_alignr_epi8, _mm256_and_si256, _mm256_cmpgt_epi8, _mm256_loadu_si256,
+    _mm256_movemask_epi8, _mm256_or_si256, _mm256_permute2x128_si256, _mm256_set1_epi8,
+    _mm256_setr_epi8, _mm256_setzero_si256, _mm256_shuffle_epi8, _mm256_srli_epi16,
+    _mm256_subs_epu8, _mm256_testz_si256, _mm256_xor_si256, _mm_prefetch, _MM_HINT_T0,
+};
+#[cfg(target_arch = "x86_64")]
+use core::arch::x86_64::{
+    __m256i, _mm256_alignr_epi8, _mm256_and_si256, _mm256_cmpgt_epi8, _mm256_loadu_si256,
+    _mm256_movemask_epi8, _mm256_or_si256, _mm256_permute2x128_si256, _mm256_set1_epi8,
+    _mm256_setr_epi8, _mm256_setzero_si256, _mm256_shuffle_epi8, _mm256_srli_epi16,
+    _mm256_subs_epu8, _mm256_testz_si256, _mm256_xor_si256, _mm_prefetch, _MM_HINT_T0,
+};
+
+use crate::implementation::helpers::Utf8CheckAlgorithm;
+
+// AVX 2 SIMD primitives
+
+type SimdU8Value = crate::implementation::helpers::SimdU8Value<__m256i>;
+
+impl SimdU8Value {
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn from_32_cut_off_leading(
+        v0: u8,
+        v1: u8,
+        v2: u8,
+        v3: u8,
+        v4: u8,
+        v5: u8,
+        v6: u8,
+        v7: u8,
+        v8: u8,
+        v9: u8,
+        v10: u8,
+        v11: u8,
+        v12: u8,
+        v13: u8,
+        v14: u8,
+        v15: u8,
+        v16: u8,
+        v17: u8,
+        v18: u8,
+        v19: u8,
+        v20: u8,
+        v21: u8,
+        v22: u8,
+        v23: u8,
+        v24: u8,
+        v25: u8,
+        v26: u8,
+        v27: u8,
+        v28: u8,
+        v29: u8,
+        v30: u8,
+        v31: u8,
+    ) -> Self {
+        #[allow(clippy::cast_possible_wrap)]
+        Self::from(_mm256_setr_epi8(
+            v0 as i8, v1 as i8, v2 as i8, v3 as i8, v4 as i8, v5 as i8, v6 as i8, v7 as i8,
+            v8 as i8, v9 as i8, v10 as i8, v11 as i8, v12 as i8, v13 as i8, v14 as i8, v15 as i8,
+            v16 as i8, v17 as i8, v18 as i8, v19 as i8, v20 as i8, v21 as i8, v22 as i8, v23 as i8,
+            v24 as i8, v25 as i8, v26 as i8, v27 as i8, v28 as i8, v29 as i8, v30 as i8, v31 as i8,
+        ))
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn repeat_16(
+        v0: u8,
+        v1: u8,
+        v2: u8,
+        v3: u8,
+        v4: u8,
+        v5: u8,
+        v6: u8,
+        v7: u8,
+        v8: u8,
+        v9: u8,
+        v10: u8,
+        v11: u8,
+        v12: u8,
+        v13: u8,
+        v14: u8,
+        v15: u8,
+    ) -> Self {
+        #[allow(clippy::cast_possible_wrap)]
+        Self::from_32_cut_off_leading(
+            v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15, v0, v1, v2, v3,
+            v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15,
+        )
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn load_from(ptr: *const u8) -> Self {
+        #[allow(clippy::cast_ptr_alignment)]
+        Self::from(_mm256_loadu_si256(ptr.cast::<__m256i>()))
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn lookup_16(
+        self,
+        v0: u8,
+        v1: u8,
+        v2: u8,
+        v3: u8,
+        v4: u8,
+        v5: u8,
+        v6: u8,
+        v7: u8,
+        v8: u8,
+        v9: u8,
+        v10: u8,
+        v11: u8,
+        v12: u8,
+        v13: u8,
+        v14: u8,
+        v15: u8,
+    ) -> Self {
+        Self::from(_mm256_shuffle_epi8(
+            Self::repeat_16(
+                v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15,
+            )
+            .0,
+            self.0,
+        ))
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn splat(val: u8) -> Self {
+        #[allow(clippy::cast_possible_wrap)]
+        Self::from(_mm256_set1_epi8(val as i8))
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn splat0() -> Self {
+        Self::from(_mm256_setzero_si256())
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn or(self, b: Self) -> Self {
+        Self::from(_mm256_or_si256(self.0, b.0))
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn and(self, b: Self) -> Self {
+        Self::from(_mm256_and_si256(self.0, b.0))
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn xor(self, b: Self) -> Self {
+        Self::from(_mm256_xor_si256(self.0, b.0))
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn saturating_sub(self, b: Self) -> Self {
+        Self::from(_mm256_subs_epu8(self.0, b.0))
+    }
+
+    // ugly but shr<N> requires const generics
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn shr4(self) -> Self {
+        Self::from(_mm256_srli_epi16(self.0, 4)).and(Self::splat(0xFF >> 4))
+    }
+
+    // ugly but prev<N> requires const generics
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn prev1(self, prev: Self) -> Self {
+        Self::from(_mm256_alignr_epi8(
+            self.0,
+            _mm256_permute2x128_si256(prev.0, self.0, 0x21),
+            16 - 1,
+        ))
+    }
+
+    // ugly but prev<N> requires const generics
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn prev2(self, prev: Self) -> Self {
+        Self::from(_mm256_alignr_epi8(
+            self.0,
+            _mm256_permute2x128_si256(prev.0, self.0, 0x21),
+            16 - 2,
+        ))
+    }
+
+    // ugly but prev<N> requires const generics
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn prev3(self, prev: Self) -> Self {
+        Self::from(_mm256_alignr_epi8(
+            self.0,
+            _mm256_permute2x128_si256(prev.0, self.0, 0x21),
+            16 - 3,
+        ))
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn signed_gt(self, other: Self) -> Self {
+        Self::from(_mm256_cmpgt_epi8(self.0, other.0))
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn any_bit_set(self) -> bool {
+        _mm256_testz_si256(self.0, self.0) != 1
+    }
+
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn is_ascii(self) -> bool {
+        _mm256_movemask_epi8(self.0) == 0
+    }
+}
+
+impl From<__m256i> for SimdU8Value {
+    #[inline]
+    fn from(val: __m256i) -> Self {
+        Self(val)
+    }
+}
+
+impl Utf8CheckAlgorithm<SimdU8Value> {
+    #[target_feature(enable = "avx2")]
+    #[inline]
+    unsafe fn must_be_2_3_continuation(prev2: SimdU8Value, prev3: SimdU8Value) -> SimdU8Value {
+        let is_third_byte = prev2.saturating_sub(SimdU8Value::splat(0b1110_0000 - 1));
+        let is_fourth_byte = prev3.saturating_sub(SimdU8Value::splat(0b1111_0000 - 1));
+
+        is_third_byte
+            .or(is_fourth_byte)
+            .signed_gt(SimdU8Value::splat0())
+    }
+}
+
+#[target_feature(enable = "avx2")]
+#[inline]
+unsafe fn simd_prefetch(ptr: *const u8) {
+    _mm_prefetch(ptr.cast::<i8>(), _MM_HINT_T0);
+}
+
+const PREFETCH: bool = true;
+use crate::implementation::helpers::TempSimdChunkA32 as TempSimdChunk;
+simd_input_256_bit!("avx2");
+algorithm_simd!("avx2");
diff --git a/src/implementation/x86/mod.rs b/src/implementation/x86/mod.rs
new file mode 100644
index 0000000..1995495
--- /dev/null
+++ b/src/implementation/x86/mod.rs
@@ -0,0 +1,199 @@
+#[allow(dead_code)]
+pub(crate) mod avx2;
+
+#[allow(dead_code)]
+pub(crate) mod sse42;
+
+#[allow(unused_imports)]
+use super::helpers::SIMD_CHUNK_SIZE;
+
+// validate_utf8_basic() std: implementation auto-selection
+
+#[cfg(all(feature = "std", not(target_feature = "avx2")))]
+#[inline]
+pub(crate) unsafe fn validate_utf8_basic(
+    input: &[u8],
+) -> core::result::Result<(), crate::basic::Utf8Error> {
+    use core::mem;
+    use std::sync::atomic::{AtomicPtr, Ordering};
+
+    type FnRaw = *mut ();
+
+    static FN: AtomicPtr<()> = AtomicPtr::new(get_fastest as FnRaw);
+
+    unsafe fn get_fastest(input: &[u8]) -> core::result::Result<(), crate::basic::Utf8Error> {
+        let fun = get_fastest_available_implementation_basic();
+        FN.store(fun as FnRaw, Ordering::Relaxed);
+        (fun)(input)
+    }
+
+    if input.len() < SIMD_CHUNK_SIZE {
+        return super::validate_utf8_basic_fallback(input);
+    }
+
+    let fun = FN.load(Ordering::Relaxed);
+    mem::transmute::<FnRaw, super::ValidateUtf8Fn>(fun)(input)
+}
+
+#[cfg(all(feature = "std", not(target_feature = "avx2")))]
+#[inline]
+fn get_fastest_available_implementation_basic() -> super::ValidateUtf8Fn {
+    if std::is_x86_feature_detected!("avx2") {
+        avx2::validate_utf8_basic
+    } else if std::is_x86_feature_detected!("sse4.2") {
+        sse42::validate_utf8_basic
+    } else {
+        super::validate_utf8_basic_fallback
+    }
+}
+
+// validate_utf8_basic() no-std: implementation selection by config
+
+#[cfg(target_feature = "avx2")]
+pub(crate) unsafe fn validate_utf8_basic(
+    input: &[u8],
+) -> core::result::Result<(), crate::basic::Utf8Error> {
+    if input.len() < SIMD_CHUNK_SIZE {
+        return super::validate_utf8_basic_fallback(input);
+    }
+
+    validate_utf8_basic_avx2(input)
+}
+
+#[cfg(target_feature = "avx2")]
+#[inline(never)]
+unsafe fn validate_utf8_basic_avx2(
+    input: &[u8],
+) -> core::result::Result<(), crate::basic::Utf8Error> {
+    avx2::validate_utf8_basic(input)
+}
+
+#[cfg(all(
+    not(feature = "std"),
+    not(target_feature = "avx2"),
+    target_feature = "sse4.2"
+))]
+pub(crate) unsafe fn validate_utf8_basic(
+    input: &[u8],
+) -> core::result::Result<(), crate::basic::Utf8Error> {
+    if input.len() < SIMD_CHUNK_SIZE {
+        return super::validate_utf8_basic_fallback(input);
+    }
+
+    validate_utf8_basic_sse42(input)
+}
+
+#[cfg(all(
+    not(feature = "std"),
+    not(target_feature = "avx2"),
+    target_feature = "sse4.2"
+))]
+#[inline(never)]
+unsafe fn validate_utf8_basic_sse42(
+    input: &[u8],
+) -> core::result::Result<(), crate::basic::Utf8Error> {
+    sse42::validate_utf8_basic(input)
+}
+
+#[cfg(all(
+    not(feature = "std"),
+    not(target_feature = "avx2"),
+    not(target_feature = "sse4.2")
+))]
+pub(crate) use super::validate_utf8_basic_fallback as validate_utf8_basic;
+
+// validate_utf8_compat() std: implementation auto-selection
+
+#[cfg(all(feature = "std", not(target_feature = "avx2")))]
+#[cfg(feature = "std")]
+#[inline]
+pub(crate) unsafe fn validate_utf8_compat(
+    input: &[u8],
+) -> core::result::Result<(), crate::compat::Utf8Error> {
+    use core::mem;
+    use std::sync::atomic::{AtomicPtr, Ordering};
+
+    type FnRaw = *mut ();
+
+    static FN: AtomicPtr<()> = AtomicPtr::new(get_fastest as FnRaw);
+
+    unsafe fn get_fastest(input: &[u8]) -> core::result::Result<(), crate::compat::Utf8Error> {
+        let fun = get_fastest_available_implementation_compat();
+        FN.store(fun as FnRaw, Ordering::Relaxed);
+        (fun)(input)
+    }
+
+    if input.len() < SIMD_CHUNK_SIZE {
+        return super::validate_utf8_compat_fallback(input);
+    }
+
+    let fun = FN.load(Ordering::Relaxed);
+    mem::transmute::<FnRaw, super::ValidateUtf8CompatFn>(fun)(input)
+}
+
+#[cfg(all(feature = "std", not(target_feature = "avx2")))]
+#[inline]
+fn get_fastest_available_implementation_compat() -> super::ValidateUtf8CompatFn {
+    if std::is_x86_feature_detected!("avx2") {
+        avx2::validate_utf8_compat
+    } else if std::is_x86_feature_detected!("sse4.2") {
+        sse42::validate_utf8_compat
+    } else {
+        super::validate_utf8_compat_fallback
+    }
+}
+
+// validate_utf8_basic() no-std: implementation selection by config
+
+#[cfg(target_feature = "avx2")]
+pub(crate) unsafe fn validate_utf8_compat(
+    input: &[u8],
+) -> core::result::Result<(), crate::compat::Utf8Error> {
+    if input.len() < SIMD_CHUNK_SIZE {
+        return super::validate_utf8_compat_fallback(input);
+    }
+
+    validate_utf8_compat_avx2(input)
+}
+
+#[cfg(target_feature = "avx2")]
+#[inline(never)]
+unsafe fn validate_utf8_compat_avx2(
+    input: &[u8],
+) -> core::result::Result<(), crate::compat::Utf8Error> {
+    avx2::validate_utf8_compat(input)
+}
+
+#[cfg(all(
+    not(feature = "std"),
+    not(target_feature = "avx2"),
+    target_feature = "sse4.2"
+))]
+pub(crate) unsafe fn validate_utf8_compat(
+    input: &[u8],
+) -> core::result::Result<(), crate::compat::Utf8Error> {
+    if input.len() < SIMD_CHUNK_SIZE {
+        return super::validate_utf8_compat_fallback(input);
+    }
+
+    validate_utf8_compat_sse42(input)
+}
+
+#[cfg(all(
+    not(feature = "std"),
+    not(target_feature = "avx2"),
+    target_feature = "sse4.2"
+))]
+#[inline(never)]
+pub(crate) unsafe fn validate_utf8_compat_sse42(
+    input: &[u8],
+) -> core::result::Result<(), crate::compat::Utf8Error> {
+    sse42::validate_utf8_compat(input)
+}
+
+#[cfg(all(
+    not(feature = "std"),
+    not(target_feature = "avx2"),
+    not(target_feature = "sse4.2")
+))]
+pub(crate) use super::validate_utf8_compat_fallback as validate_utf8_compat;
diff --git a/src/implementation/x86/sse42.rs b/src/implementation/x86/sse42.rs
new file mode 100644
index 0000000..90ee373
--- /dev/null
+++ b/src/implementation/x86/sse42.rs
@@ -0,0 +1,245 @@
+//! Contains the x86-64/x86 SSE4.2 UTF-8 validation implementation.
+
+#![allow(clippy::too_many_arguments)]
+
+#[cfg(target_arch = "x86")]
+use core::arch::x86::{
+    __m128i, _mm_alignr_epi8, _mm_and_si128, _mm_cmpgt_epi8, _mm_loadu_si128, _mm_movemask_epi8,
+    _mm_or_si128, _mm_prefetch, _mm_set1_epi8, _mm_setr_epi8, _mm_setzero_si128, _mm_shuffle_epi8,
+    _mm_srli_epi16, _mm_subs_epu8, _mm_testz_si128, _mm_xor_si128, _MM_HINT_T0,
+};
+#[cfg(target_arch = "x86_64")]
+use core::arch::x86_64::{
+    __m128i, _mm_alignr_epi8, _mm_and_si128, _mm_cmpgt_epi8, _mm_loadu_si128, _mm_movemask_epi8,
+    _mm_or_si128, _mm_prefetch, _mm_set1_epi8, _mm_setr_epi8, _mm_setzero_si128, _mm_shuffle_epi8,
+    _mm_srli_epi16, _mm_subs_epu8, _mm_testz_si128, _mm_xor_si128, _MM_HINT_T0,
+};
+
+use crate::implementation::helpers::Utf8CheckAlgorithm;
+
+// SSE 4.2 SIMD primitives
+
+type SimdU8Value = crate::implementation::helpers::SimdU8Value<__m128i>;
+
+impl SimdU8Value {
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn from_32_cut_off_leading(
+        _v0: u8,
+        _v1: u8,
+        _v2: u8,
+        _v3: u8,
+        _v4: u8,
+        _v5: u8,
+        _v6: u8,
+        _v7: u8,
+        _v8: u8,
+        _v9: u8,
+        _v10: u8,
+        _v11: u8,
+        _v12: u8,
+        _v13: u8,
+        _v14: u8,
+        _v15: u8,
+        v16: u8,
+        v17: u8,
+        v18: u8,
+        v19: u8,
+        v20: u8,
+        v21: u8,
+        v22: u8,
+        v23: u8,
+        v24: u8,
+        v25: u8,
+        v26: u8,
+        v27: u8,
+        v28: u8,
+        v29: u8,
+        v30: u8,
+        v31: u8,
+    ) -> Self {
+        #[allow(clippy::cast_possible_wrap)]
+        Self::from(_mm_setr_epi8(
+            v16 as i8, v17 as i8, v18 as i8, v19 as i8, v20 as i8, v21 as i8, v22 as i8, v23 as i8,
+            v24 as i8, v25 as i8, v26 as i8, v27 as i8, v28 as i8, v29 as i8, v30 as i8, v31 as i8,
+        ))
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn repeat_16(
+        v0: u8,
+        v1: u8,
+        v2: u8,
+        v3: u8,
+        v4: u8,
+        v5: u8,
+        v6: u8,
+        v7: u8,
+        v8: u8,
+        v9: u8,
+        v10: u8,
+        v11: u8,
+        v12: u8,
+        v13: u8,
+        v14: u8,
+        v15: u8,
+    ) -> Self {
+        #[allow(clippy::cast_possible_wrap)]
+        Self::from(_mm_setr_epi8(
+            v0 as i8, v1 as i8, v2 as i8, v3 as i8, v4 as i8, v5 as i8, v6 as i8, v7 as i8,
+            v8 as i8, v9 as i8, v10 as i8, v11 as i8, v12 as i8, v13 as i8, v14 as i8, v15 as i8,
+        ))
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn load_from(ptr: *const u8) -> Self {
+        #[allow(clippy::cast_ptr_alignment)]
+        Self::from(_mm_loadu_si128(ptr.cast::<__m128i>()))
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn lookup_16(
+        self,
+        v0: u8,
+        v1: u8,
+        v2: u8,
+        v3: u8,
+        v4: u8,
+        v5: u8,
+        v6: u8,
+        v7: u8,
+        v8: u8,
+        v9: u8,
+        v10: u8,
+        v11: u8,
+        v12: u8,
+        v13: u8,
+        v14: u8,
+        v15: u8,
+    ) -> Self {
+        Self::from(_mm_shuffle_epi8(
+            Self::repeat_16(
+                v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, v13, v14, v15,
+            )
+            .0,
+            self.0,
+        ))
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn splat(val: u8) -> Self {
+        #[allow(clippy::cast_possible_wrap)]
+        Self::from(_mm_set1_epi8(val as i8))
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn splat0() -> Self {
+        Self::from(_mm_setzero_si128())
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn or(self, b: Self) -> Self {
+        Self::from(_mm_or_si128(self.0, b.0))
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn and(self, b: Self) -> Self {
+        Self::from(_mm_and_si128(self.0, b.0))
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn xor(self, b: Self) -> Self {
+        Self::from(_mm_xor_si128(self.0, b.0))
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn saturating_sub(self, b: Self) -> Self {
+        Self::from(_mm_subs_epu8(self.0, b.0))
+    }
+
+    // ugly but shr<N> requires const generics
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn shr4(self) -> Self {
+        Self::from(_mm_srli_epi16(self.0, 4)).and(Self::splat(0xFF >> 4))
+    }
+
+    // ugly but prev<N> requires const generics
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn prev1(self, prev: Self) -> Self {
+        Self::from(_mm_alignr_epi8(self.0, prev.0, 16 - 1))
+    }
+
+    // ugly but prev<N> requires const generics
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn prev2(self, prev: Self) -> Self {
+        Self::from(_mm_alignr_epi8(self.0, prev.0, 16 - 2))
+    }
+
+    // ugly but prev<N> requires const generics
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn prev3(self, prev: Self) -> Self {
+        Self::from(_mm_alignr_epi8(self.0, prev.0, 16 - 3))
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn signed_gt(self, other: Self) -> Self {
+        Self::from(_mm_cmpgt_epi8(self.0, other.0))
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn any_bit_set(self) -> bool {
+        _mm_testz_si128(self.0, self.0) != 1
+    }
+
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn is_ascii(self) -> bool {
+        _mm_movemask_epi8(self.0) == 0
+    }
+}
+
+impl From<__m128i> for SimdU8Value {
+    #[inline]
+    fn from(val: __m128i) -> Self {
+        Self(val)
+    }
+}
+
+impl Utf8CheckAlgorithm<SimdU8Value> {
+    #[target_feature(enable = "sse4.2")]
+    #[inline]
+    unsafe fn must_be_2_3_continuation(prev2: SimdU8Value, prev3: SimdU8Value) -> SimdU8Value {
+        let is_third_byte = prev2.saturating_sub(SimdU8Value::splat(0b1110_0000 - 1));
+        let is_fourth_byte = prev3.saturating_sub(SimdU8Value::splat(0b1111_0000 - 1));
+
+        is_third_byte
+            .or(is_fourth_byte)
+            .signed_gt(SimdU8Value::splat0())
+    }
+}
+
+#[target_feature(enable = "sse4.2")]
+#[inline]
+unsafe fn simd_prefetch(ptr: *const u8) {
+    _mm_prefetch(ptr.cast::<i8>(), _MM_HINT_T0);
+}
+
+const PREFETCH: bool = false;
+use crate::implementation::helpers::TempSimdChunkA16 as TempSimdChunk;
+simd_input_128_bit!("sse4.2");
+algorithm_simd!("sse4.2");
diff --git a/src/lib.rs b/src/lib.rs
new file mode 100644
index 0000000..5bdffb8
--- /dev/null
+++ b/src/lib.rs
@@ -0,0 +1,116 @@
+#![deny(warnings)]
+#![warn(unused_extern_crates)]
+#![deny(
+    clippy::all,
+    clippy::unwrap_used,
+    clippy::unnecessary_unwrap,
+    clippy::pedantic,
+    clippy::nursery
+)]
+#![allow(clippy::redundant_pub_crate)] // check is broken
+#![allow(clippy::redundant_else)] // can make code more readable
+#![allow(clippy::explicit_iter_loop)] // can make code more readable
+#![allow(clippy::semicolon_if_nothing_returned)] // see https://github.com/rust-lang/rust-clippy/issues/7768
+#![deny(missing_docs)]
+#![cfg_attr(not(feature = "std"), no_std)]
+#![cfg_attr(docsrs, feature(doc_cfg))]
+
+//! Blazingly fast API-compatible UTF-8 validation for Rust using SIMD extensions, based on the implementation from
+//! [simdjson](https://github.com/simdjson/simdjson). Originally ported to Rust by the developers of [simd-json.rs](https://simd-json.rs), but now heavily improved.
+//!
+//! ## Quick start
+//! Add the dependency to your Cargo.toml file:
+//! ```toml
+//! [dependencies]
+//! simdutf8 = "0.1.4"
+//! ```
+//! For ARM64 SIMD support on Rust 1.59:
+//! ```toml
+//! [dependencies]
+//! simdutf8 = { version = "0.1.4", features = ["aarch64_neon"] }
+//! ```
+//!
+//! Use [`basic::from_utf8()`] as a drop-in replacement for `std::str::from_utf8()`.
+//!
+//! ```rust
+//! use simdutf8::basic::from_utf8;
+//!
+//! println!("{}", from_utf8(b"I \xE2\x9D\xA4\xEF\xB8\x8F UTF-8!").unwrap());
+//! ```
+//!
+//! If you need detailed information on validation failures, use [`compat::from_utf8()`]
+//! instead.
+//!
+//! ```rust
+//! use simdutf8::compat::from_utf8;
+//!
+//! let err = from_utf8(b"I \xE2\x9D\xA4\xEF\xB8 UTF-8!").unwrap_err();
+//! assert_eq!(err.valid_up_to(), 5);
+//! assert_eq!(err.error_len(), Some(2));
+//! ```
+//!
+//! ## APIs
+//!
+//! ### Basic flavor
+//! Use the `basic` API flavor for maximum speed. It is fastest on valid UTF-8, but only checks
+//! for errors after processing the whole byte sequence and does not provide detailed information if the data
+//! is not valid UTF-8. [`basic::Utf8Error`] is a zero-sized error struct.
+//!
+//! ### Compat flavor
+//! The `compat` flavor is fully API-compatible with `std::str::from_utf8()`. In particular, [`compat::from_utf8()`]
+//! returns a [`compat::Utf8Error`], which has [`valid_up_to()`](compat::Utf8Error#method.valid_up_to) and
+//! [`error_len()`](compat::Utf8Error#method.error_len) methods. The first is useful for verification of streamed data. The
+//! second is useful e.g. for replacing invalid byte sequences with a replacement character.
+//!
+//! It also fails early: errors are checked on the fly as the string is processed and once
+//! an invalid UTF-8 sequence is encountered, it returns without processing the rest of the data.
+//! This comes at a slight performance penalty compared to the [`basic`] API even if the input is valid UTF-8.
+//!
+//! ## Implementation selection
+//!
+//! ### X86
+//! The fastest implementation is selected at runtime using the `std::is_x86_feature_detected!` macro, unless the CPU
+//! targeted by the compiler supports the fastest available implementation.
+//! So if you compile with `RUSTFLAGS="-C target-cpu=native"` on a recent x86-64 machine, the AVX 2 implementation is selected at
+//! compile-time and runtime selection is disabled.
+//!
+//! For no-std support (compiled with `--no-default-features`) the implementation is always selected at compile time based on
+//! the targeted CPU. Use `RUSTFLAGS="-C target-feature=+avx2"` for the AVX 2 implementation or `RUSTFLAGS="-C target-feature=+sse4.2"`
+//! for the SSE 4.2 implementation.
+//!
+//! ### ARM64
+//! To get the SIMD implementation with Rust 1.59 on ARM64 the crate feature `aarch64_neon` needs to be enabled. For Rust Nightly
+//! this is no longer required (but does not hurt either). Once [Rust PR #90621](https://github.com/rust-lang/rust/pull/90621)
+//! lands in a stable version, this is no longer required.
+//!
+//! CAVE: If this features is not turned on with Rust 1.59 the non-SIMD std library implementation is used.
+//!
+//! ### WASM32
+//! For wasm32 support, the implementation is selected at compile time based on the presence of the `simd128` target feature.
+//! Use `RUSTFLAGS="-C target-feature=+simd128"` to enable the WASM SIMD implementation.  WASM, at
+//! the time of this writing, doesn't have a way to detect SIMD through WASM itself.  Although this capability
+//! is available in various WASM host environments (e.g., [wasm-feature-detect] in the web browser), there is no portable
+//! way from within the library to detect this.
+//!
+//! [wasm-feature-detect]: https://github.com/GoogleChromeLabs/wasm-feature-detect
+//!
+//! ### Access to low-level functionality
+//! If you want to be able to call a SIMD implementation directly, use the `public_imp` feature flag. The validation
+//! implementations are then accessible via [`basic::imp`] and [`compat::imp`]. Traits facilitating streaming validation are available
+//! there as well.
+//!
+//! ## Optimisation flags
+//! Do not use [`opt-level = "z"`](https://doc.rust-lang.org/cargo/reference/profiles.html), which prevents inlining and makes
+//! the code quite slow.
+//!
+//! ## Minimum Supported Rust Version (MSRV)
+//! This crate's minimum supported Rust version is 1.38.0.
+//!
+//! ## Algorithm
+//!
+//! See Validating UTF-8 In Less Than One Instruction Per Byte, Software: Practice and Experience 51 (5), 2021
+//! <https://arxiv.org/abs/2010.03090>
+
+pub mod basic;
+pub mod compat;
+mod implementation;
diff --git a/tests/tests.rs b/tests/tests.rs
new file mode 100644
index 0000000..ae3ad65
--- /dev/null
+++ b/tests/tests.rs
@@ -0,0 +1,491 @@
+#![allow(clippy::non_ascii_literal)]
+
+use simdutf8::basic::from_utf8 as basic_from_utf8;
+use simdutf8::basic::from_utf8_mut as basic_from_utf8_mut;
+use simdutf8::compat::from_utf8 as compat_from_utf8;
+use simdutf8::compat::from_utf8_mut as compat_from_utf8_mut;
+
+#[cfg(not(features = "std"))]
+extern crate std;
+
+#[cfg(not(features = "std"))]
+use std::{borrow::ToOwned, format};
+
+pub trait BStrExt {
+    fn repeat_x(&self, count: usize) -> Vec<u8>;
+}
+
+/// b"a".repeat() is not implemented for Rust 1.38.0 (MSRV)
+impl<T> BStrExt for T
+where
+    T: AsRef<[u8]>,
+{
+    fn repeat_x(&self, count: usize) -> Vec<u8> {
+        use std::io::Write;
+
+        let x = self.as_ref();
+        let mut res = Vec::with_capacity(x.len() * count);
+        for _ in 0..count {
+            #[allow(clippy::unwrap_used)]
+            res.write_all(x).unwrap();
+        }
+        res
+    }
+}
+
+fn test_valid(input: &[u8]) {
+    // std lib sanity check
+    assert!(std::str::from_utf8(input).is_ok());
+
+    assert!(basic_from_utf8(input).is_ok());
+    assert!(compat_from_utf8(input).is_ok());
+
+    let mut mut_input = input.to_owned();
+    assert!(basic_from_utf8_mut(mut_input.as_mut_slice()).is_ok());
+    assert!(compat_from_utf8_mut(mut_input.as_mut_slice()).is_ok());
+
+    #[cfg(feature = "public_imp")]
+    test_valid_public_imp(input);
+}
+
+// unused for cases where public_imp is set but no SIMD functions generated...
+#[cfg(feature = "public_imp")]
+#[allow(dead_code)]
+fn test_streaming<T: simdutf8::basic::imp::Utf8Validator>(input: &[u8], ok: bool) {
+    unsafe {
+        let mut validator = T::new();
+        validator.update(input);
+        assert_eq!(validator.finalize().is_ok(), ok);
+    }
+    for i in [64, 128, 256, 1024, 65536, 1, 2, 3, 36, 99].iter() {
+        test_streaming_blocks::<T>(input, *i, ok)
+    }
+}
+
+// unused for cases where public_imp is set but no SIMD functions generated...
+#[cfg(feature = "public_imp")]
+#[allow(dead_code)]
+fn test_streaming_blocks<T: simdutf8::basic::imp::Utf8Validator>(
+    input: &[u8],
+    block_size: usize,
+    ok: bool,
+) {
+    unsafe {
+        let mut validator = T::new();
+        for chunk in input.chunks(block_size) {
+            validator.update(chunk);
+        }
+        assert_eq!(validator.finalize().is_ok(), ok);
+    }
+}
+
+// unused for cases where public_imp is set but no SIMD functions generated...
+#[cfg(feature = "public_imp")]
+#[allow(dead_code)]
+fn test_chunked_streaming<T: simdutf8::basic::imp::ChunkedUtf8Validator>(input: &[u8], ok: bool) {
+    for i in [64, 128, 256, 1024, 65536].iter() {
+        test_chunked_streaming_with_chunk_size::<T>(input, *i, ok)
+    }
+}
+
+// unused for cases where public_imp is set but no SIMD functions generated...
+#[cfg(feature = "public_imp")]
+#[allow(dead_code)]
+fn test_chunked_streaming_with_chunk_size<T: simdutf8::basic::imp::ChunkedUtf8Validator>(
+    input: &[u8],
+    chunk_size: usize,
+    ok: bool,
+) {
+    unsafe {
+        let mut validator = T::new();
+        let mut chunks = input.chunks_exact(chunk_size);
+        for chunk in &mut chunks {
+            validator.update_from_chunks(chunk);
+        }
+        assert_eq!(validator.finalize(Some(chunks.remainder())).is_ok(), ok);
+    }
+}
+
+#[cfg(feature = "public_imp")]
+#[allow(clippy::missing_const_for_fn)]
+#[allow(unused_variables)]
+fn test_valid_public_imp(input: &[u8]) {
+    if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
+        #[cfg(target_feature = "avx2")]
+        unsafe {
+            assert!(simdutf8::basic::imp::x86::avx2::validate_utf8(input).is_ok());
+            assert!(simdutf8::compat::imp::x86::avx2::validate_utf8(input).is_ok());
+
+            test_streaming::<simdutf8::basic::imp::x86::avx2::Utf8ValidatorImp>(input, true);
+            test_chunked_streaming::<simdutf8::basic::imp::x86::avx2::ChunkedUtf8ValidatorImp>(
+                input, true,
+            );
+        }
+
+        #[cfg(target_feature = "sse4.2")]
+        unsafe {
+            assert!(simdutf8::basic::imp::x86::sse42::validate_utf8(input).is_ok());
+            assert!(simdutf8::compat::imp::x86::sse42::validate_utf8(input).is_ok());
+
+            test_streaming::<simdutf8::basic::imp::x86::sse42::Utf8ValidatorImp>(input, true);
+            test_chunked_streaming::<simdutf8::basic::imp::x86::sse42::ChunkedUtf8ValidatorImp>(
+                input, true,
+            );
+        }
+    }
+    #[cfg(all(
+        feature = "aarch64_neon",
+        target_arch = "aarch64",
+        target_feature = "neon"
+    ))]
+    unsafe {
+        assert!(simdutf8::basic::imp::aarch64::neon::validate_utf8(input).is_ok());
+        assert!(simdutf8::compat::imp::aarch64::neon::validate_utf8(input).is_ok());
+
+        test_streaming::<simdutf8::basic::imp::aarch64::neon::Utf8ValidatorImp>(input, true);
+        test_chunked_streaming::<simdutf8::basic::imp::aarch64::neon::ChunkedUtf8ValidatorImp>(
+            input, true,
+        );
+    }
+    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
+    unsafe {
+        assert!(simdutf8::basic::imp::wasm32::simd128::validate_utf8(input).is_ok());
+        assert!(simdutf8::compat::imp::wasm32::simd128::validate_utf8(input).is_ok());
+
+        test_streaming::<simdutf8::basic::imp::wasm32::simd128::Utf8ValidatorImp>(input, true);
+        test_chunked_streaming::<simdutf8::basic::imp::wasm32::simd128::ChunkedUtf8ValidatorImp>(
+            input, true,
+        );
+    }
+}
+
+fn test_invalid(input: &[u8], valid_up_to: usize, error_len: Option<usize>) {
+    // std lib sanity check
+    let err = std::str::from_utf8(input).unwrap_err();
+    assert_eq!(err.valid_up_to(), valid_up_to);
+    assert_eq!(err.error_len(), error_len);
+
+    assert!(basic_from_utf8(input).is_err());
+    let err = compat_from_utf8(input).unwrap_err();
+    assert_eq!(err.valid_up_to(), valid_up_to);
+    assert_eq!(err.error_len(), error_len);
+
+    #[cfg(feature = "public_imp")]
+    test_invalid_public_imp(input, valid_up_to, error_len);
+}
+
+#[cfg(feature = "public_imp")]
+#[allow(clippy::missing_const_for_fn)]
+#[allow(unused_variables)]
+fn test_invalid_public_imp(input: &[u8], valid_up_to: usize, error_len: Option<usize>) {
+    if cfg!(any(target_arch = "x86", target_arch = "x86_64")) {
+        #[cfg(target_feature = "avx2")]
+        unsafe {
+            assert!(simdutf8::basic::imp::x86::avx2::validate_utf8(input).is_err());
+            let err = simdutf8::compat::imp::x86::avx2::validate_utf8(input).unwrap_err();
+            assert_eq!(err.valid_up_to(), valid_up_to);
+            assert_eq!(err.error_len(), error_len);
+
+            test_streaming::<simdutf8::basic::imp::x86::avx2::Utf8ValidatorImp>(input, false);
+            test_chunked_streaming::<simdutf8::basic::imp::x86::avx2::ChunkedUtf8ValidatorImp>(
+                input, false,
+            );
+        }
+        #[cfg(target_feature = "sse4.2")]
+        unsafe {
+            assert!(simdutf8::basic::imp::x86::sse42::validate_utf8(input).is_err());
+            let err = simdutf8::compat::imp::x86::sse42::validate_utf8(input).unwrap_err();
+            assert_eq!(err.valid_up_to(), valid_up_to);
+            assert_eq!(err.error_len(), error_len);
+
+            test_streaming::<simdutf8::basic::imp::x86::sse42::Utf8ValidatorImp>(input, false);
+            test_chunked_streaming::<simdutf8::basic::imp::x86::sse42::ChunkedUtf8ValidatorImp>(
+                input, false,
+            );
+        }
+    }
+    #[cfg(all(
+        feature = "aarch64_neon",
+        target_arch = "aarch64",
+        target_feature = "neon"
+    ))]
+    unsafe {
+        assert!(simdutf8::basic::imp::aarch64::neon::validate_utf8(input).is_err());
+        let err = simdutf8::compat::imp::aarch64::neon::validate_utf8(input).unwrap_err();
+        assert_eq!(err.valid_up_to(), valid_up_to);
+        assert_eq!(err.error_len(), error_len);
+
+        test_streaming::<simdutf8::basic::imp::aarch64::neon::Utf8ValidatorImp>(input, false);
+        test_chunked_streaming::<simdutf8::basic::imp::aarch64::neon::ChunkedUtf8ValidatorImp>(
+            input, false,
+        );
+    }
+    #[cfg(all(target_arch = "wasm32", target_feature = "simd128"))]
+    unsafe {
+        assert!(simdutf8::basic::imp::wasm32::simd128::validate_utf8(input).is_err());
+        let err = simdutf8::compat::imp::wasm32::simd128::validate_utf8(input).unwrap_err();
+        assert_eq!(err.valid_up_to(), valid_up_to);
+        assert_eq!(err.error_len(), error_len);
+
+        test_streaming::<simdutf8::basic::imp::wasm32::simd128::Utf8ValidatorImp>(input, false);
+        test_chunked_streaming::<simdutf8::basic::imp::wasm32::simd128::ChunkedUtf8ValidatorImp>(
+            input, false,
+        );
+    }
+}
+
+fn test_invalid_after_specific_prefix(
+    input: &[u8],
+    valid_up_to: usize,
+    error_len: Option<usize>,
+    with_suffix_error_len: Option<usize>,
+    repeat: usize,
+    prefix_bytes: &[u8],
+) {
+    {
+        let mut prefixed_input = prefix_bytes.repeat_x(repeat);
+        let prefix_len = prefixed_input.len();
+        prefixed_input.extend_from_slice(input);
+        test_invalid(prefixed_input.as_ref(), valid_up_to + prefix_len, error_len)
+    }
+
+    if repeat != 0 {
+        let mut prefixed_input = prefix_bytes.repeat_x(repeat);
+        let prefix_len = prefixed_input.len();
+        prefixed_input.extend_from_slice(input);
+        prefixed_input.extend_from_slice(prefix_bytes.repeat_x(repeat).as_slice());
+        test_invalid(
+            prefixed_input.as_ref(),
+            valid_up_to + prefix_len,
+            with_suffix_error_len,
+        )
+    }
+}
+
+fn test_invalid_after_prefix(
+    input: &[u8],
+    valid_up_to: usize,
+    error_len: Option<usize>,
+    with_suffix_error_len: Option<usize>,
+    repeat: usize,
+) {
+    for prefix in [
+        "a",
+        "ö",
+        "😊",
+        "a".repeat(64).as_str(),
+        ("a".repeat(64) + "ö".repeat(32).as_str()).as_str(),
+    ]
+    .iter()
+    {
+        test_invalid_after_specific_prefix(
+            input,
+            valid_up_to,
+            error_len,
+            with_suffix_error_len,
+            repeat,
+            prefix.as_bytes(),
+        );
+    }
+}
+
+fn test_invalid_after_prefixes(
+    input: &[u8],
+    valid_up_to: usize,
+    error_len: Option<usize>,
+    with_suffix_error_len: Option<usize>,
+) {
+    for repeat in [
+        0, 1, 2, 7, 8, 9, 15, 16, 16, 31, 32, 33, 63, 64, 65, 127, 128, 129,
+    ]
+    .iter()
+    {
+        test_invalid_after_prefix(
+            input,
+            valid_up_to,
+            error_len,
+            with_suffix_error_len,
+            *repeat,
+        );
+    }
+}
+
+#[test]
+fn simple_valid() {
+    test_valid(b"");
+
+    test_valid(b"\0");
+
+    test_valid(b"a".repeat_x(64).as_ref());
+
+    test_valid(b"a".repeat_x(128).as_ref());
+
+    test_valid(b"The quick brown fox jumps over the lazy dog");
+
+    // umlauts
+    test_valid("öäüÖÄÜß".as_bytes());
+
+    // emojis
+    test_valid("❤️✨🥺🔥😂😊✔️👍🥰".as_bytes());
+
+    // Chinese
+    test_valid("断用山昨屈内銀代意検瓶調像。情旗最投任留財夜隆年表高学送意功者。辺図掲記込真通第民国聞平。海帰傷芸記築世防橋整済歳権君注。選紙例並情夕破勢景移情誇進場豊読。景関有権米武野範随惑旬特覧刊野。相毎加共情面教地作減関絡。暖料児違歩致本感閉浦出楽赤何。時選権週邑針格事提一案質名投百定。止感右聞食三年外積文載者別。".as_bytes());
+
+    // Japanese
+    test_valid("意ざど禁23費サヒ車園オスミト規更ワエ異67事続トソキ音合岡治こ訪京ぴ日9稿がト明安イ抗的ウクロコ売一エコヨホ必噴塗ッ。索墓ー足議需レ応予ニ質県トぴン学市機だほせフ車捕コニ自校がこで極3力イい増娘汁表製ク。委セヤホネ作誌ミマクソ続新ほし月中報制どてびフ字78完りっせが村惹ヨサコ訳器りそ参受草ムタ大移ッけでつ番足ほこン質北ぽのよう応一ア輝労イ手人う再茨夕へしう。".as_bytes());
+
+    // Korean
+    test_valid("3인은 대법원장이 지명하는 자를 임명한다, 대통령은 제3항과 제4항의 사유를 지체없이 공포하여야 한다, 제한하는 경우에도 자유와 권리의 본질적인 내용을 침해할 수 없다, 국가는 전통문화의 계승·발전과 민족문화의 창달에 노력하여야 한다.".as_bytes());
+}
+
+#[test]
+fn simple_invalid() {
+    test_invalid_after_prefixes(b"\xFF", 0, Some(1), Some(1));
+
+    // incomplete umlaut
+    test_invalid_after_prefixes(b"\xC3", 0, None, Some(1));
+
+    // incomplete emoji
+    test_invalid_after_prefixes(b"\xF0", 0, None, Some(1));
+    test_invalid_after_prefixes(b"\xF0\x9F", 0, None, Some(2));
+    test_invalid_after_prefixes(b"\xF0\x9F\x98", 0, None, Some(3));
+}
+
+#[test]
+fn incomplete_on_32nd_byte() {
+    let mut invalid = b"a".repeat_x(31);
+    invalid.push(b'\xF0');
+    test_invalid(&invalid, 31, None)
+}
+
+#[test]
+fn incomplete_on_64th_byte() {
+    let mut invalid = b"a".repeat_x(63);
+    invalid.push(b'\xF0');
+    test_invalid(&invalid, 63, None)
+}
+
+#[test]
+fn incomplete_on_64th_byte_65_bytes_total() {
+    let mut invalid = b"a".repeat_x(63);
+    invalid.push(b'\xF0');
+    invalid.push(b'a');
+    test_invalid(&invalid, 63, Some(1))
+}
+
+#[test]
+fn error_display_basic() {
+    assert_eq!(
+        format!("{}", basic_from_utf8(b"\xF0").unwrap_err()),
+        "invalid utf-8 sequence"
+    );
+    assert_eq!(
+        format!("{}", basic_from_utf8(b"a\xF0a").unwrap_err()),
+        "invalid utf-8 sequence"
+    );
+}
+
+#[test]
+fn error_display_compat() {
+    assert_eq!(
+        format!("{}", compat_from_utf8(b"\xF0").unwrap_err()),
+        "incomplete utf-8 byte sequence from index 0"
+    );
+    assert_eq!(
+        format!("{}", compat_from_utf8(b"a\xF0a").unwrap_err()),
+        "invalid utf-8 sequence of 1 bytes from index 1"
+    );
+    assert_eq!(
+        format!("{}", compat_from_utf8(b"a\xF0\x9Fa").unwrap_err()),
+        "invalid utf-8 sequence of 2 bytes from index 1"
+    );
+    assert_eq!(
+        format!("{}", compat_from_utf8(b"a\xF0\x9F\x98a").unwrap_err()),
+        "invalid utf-8 sequence of 3 bytes from index 1"
+    );
+}
+
+#[test]
+fn error_debug_basic() {
+    assert_eq!(
+        format!("{:?}", basic_from_utf8(b"\xF0").unwrap_err()),
+        "Utf8Error"
+    );
+}
+
+#[test]
+fn error_debug_compat() {
+    assert_eq!(
+        format!("{:?}", compat_from_utf8(b"\xF0").unwrap_err()),
+        "Utf8Error { valid_up_to: 0, error_len: None }"
+    );
+    assert_eq!(
+        format!("{:?}", compat_from_utf8(b"a\xF0a").unwrap_err()),
+        "Utf8Error { valid_up_to: 1, error_len: Some(1) }"
+    );
+}
+
+#[test]
+fn error_derives_basic() {
+    let err = basic_from_utf8(b"\xF0").unwrap_err();
+    #[allow(clippy::clone_on_copy)] // used for coverage
+    let err2 = err.clone();
+    assert_eq!(err, err2);
+    assert!(!(err != err2));
+}
+
+#[test]
+fn error_derives_compat() {
+    let err = compat_from_utf8(b"\xF0").unwrap_err();
+    #[allow(clippy::clone_on_copy)] // used for coverage
+    let err2 = err.clone();
+    assert_eq!(err, err2);
+    assert!(!(err != err2));
+}
+
+#[test]
+#[should_panic]
+#[cfg(all(feature = "public_imp", target_feature = "avx2"))]
+fn test_avx2_chunked_panic() {
+    test_chunked_streaming_with_chunk_size::<
+        simdutf8::basic::imp::x86::avx2::ChunkedUtf8ValidatorImp,
+    >(b"abcd", 1, true);
+}
+
+#[test]
+#[should_panic]
+#[cfg(all(feature = "public_imp", target_feature = "sse4.2"))]
+fn test_sse42_chunked_panic() {
+    test_chunked_streaming_with_chunk_size::<
+        simdutf8::basic::imp::x86::sse42::ChunkedUtf8ValidatorImp,
+    >(b"abcd", 1, true);
+}
+
+#[test]
+#[should_panic]
+#[cfg(all(
+    feature = "public_imp",
+    target_arch = "aarch64",
+    feature = "aarch64_neon"
+))]
+fn test_neon_chunked_panic() {
+    test_chunked_streaming_with_chunk_size::<
+        simdutf8::basic::imp::aarch64::neon::ChunkedUtf8ValidatorImp,
+    >(b"abcd", 1, true);
+}
+
+// the test runner will ignore this test probably due to limitations of panic handling/threading
+// of that target--keeping this here so that when it can be tested properly, it will
+// FIXME: remove this comment once this works properly.
+#[test]
+#[should_panic]
+#[cfg(all(
+    feature = "public_imp",
+    target_arch = "wasm32",
+    target_feature = "simd128"
+))]
+fn test_simd128_chunked_panic() {
+    test_chunked_streaming_with_chunk_size::<
+        simdutf8::basic::imp::wasm32::simd128::ChunkedUtf8ValidatorImp,
+    >(b"abcd", 1, true);
+}
diff --git a/wasm32-development.md b/wasm32-development.md
new file mode 100644
index 0000000..9d56281
--- /dev/null
+++ b/wasm32-development.md
@@ -0,0 +1,46 @@
+# Developing/Testing the `wasm32` Target
+
+Since there is no native host platform for WebAssembly, developing/targeting requires a bit more setup than a vanilla
+Rust toolchain environment.  To build/target this library outside a `wasm-pack` context, you can do the following:
+
+* Install toolchain with `wasm32-wasi` or `wasm32-unknown-unknown` (e.g. `rustup target add wasm32-wasi`).
+  * `wasm32-wasi` is a nice target because it gives us the capability to run the tests as-is with a WASM runtime.
+* Install a WASM runtime (e.g. [Wasmer]/[Wasmtime]/[WAVM]).
+* Install `wasm-runner` a simple runner wrapper to run WASM targeted code with a WASM runtime:
+
+```
+$ cargo install wasm-runner
+```
+
+* Add a Cargo configuration file to target `wasm` and allow the unit tests to be run with a WASM VM *by default*:
+
+```
+[build]
+target = "wasm32-wasi"
+rustflags = "-C target-feature=+simd128"
+
+[target.'cfg(target_arch="wasm32")']
+runner = ["wasm-runner", "wasmer"]
+```
+
+* Run the build/tests:
+
+```
+$ cargo test
+$ cargo test --all-features
+```
+
+You can do this without configuration as well:
+
+```
+$ RUSTFLAGS="-C target-feature=+simd128" \
+    CARGO_TARGET_WASM32_WASI_RUNNER="wasm-runner wasmer" \
+    cargo test --target wasm32-wasi
+$ RUSTFLAGS="-C target-feature=+simd128" \
+    CARGO_TARGET_WASM32_WASI_RUNNER="wasm-runner wasmer" \
+    cargo test --target wasm32-wasi --all-features
+```
+
+[wasmer]: https://wasmer.io/
+[wasmtime]: https://wasmtime.dev/
+[wavm]: https://wavm.github.io/
+\ No newline at end of file
author	DongHun Kwak <dh0128.kwak@samsung.com>	2023-05-17 16:16:16 +0900
committer	DongHun Kwak <dh0128.kwak@samsung.com>	2023-05-17 16:16:16 +0900
commit	a56aafc610b5a5112b48320f9532ff3da969eb2f (patch)
tree	7e6cc4da83f6bfd1c56a6f565064a8cd6251bbaa
download	rust-simdutf8-upstream/0.1.4.tar.gz rust-simdutf8-upstream/0.1.4.tar.bz2 rust-simdutf8-upstream/0.1.4.zip