Detect encoding from ReadOnlySpan<byte>#204

Open
harnel-tngn wants to merge 6 commits into
CharsetDetector:masterfrom
harnel-tngn:detect-from-readonlyspan

@harnel-tngn


Add an overload that receives ReadOnlySpan<byte> instead of byte[], so callers can detect the encoding of a Span<T> or ReadOnlySpan<T> without copying to a byte[]:

public class CharsetDetector
{
    public static DetectionResult DetectFromBytes(ReadOnlySpan<byte> bytes);
}

The existing byte[] overloads forward to it. The other methods invoked from DetectFromBytes now take ReadOnlySpan<byte> and use slicing instead of offset/len.

This also affects some related methods, such as CharsetDetector.Feed and CharsetProber.HandleData.

Most of the changes are just signature updates and slicing instead of passing an offset to methods.
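
To illustrate the forwarding and slicing pattern described above, here is a minimal, self-contained sketch. The class and method names (SpanForwardingSketch, CountHighBytes) are hypothetical stand-ins and are not taken from the UTF-unknown source. The sketch only shows how array-based overloads can delegate to a single span-based implementation.

```csharp
using System;

// Hypothetical stand-in, not the actual UTF-unknown code.
public static class SpanForwardingSketch
{
    // Core implementation: works on a span, so no offset/len parameters are needed.
    // As a placeholder for real detection logic, it counts bytes >= 0x80.
    public static int CountHighBytes(ReadOnlySpan<byte> bytes)
    {
        int count = 0;
        foreach (byte b in bytes)
        {
            if (b >= 0x80)
            {
                count++;
            }
        }
        return count;
    }

    // Whole-array overload: wraps the array in a span explicitly.
    // Calling CountHighBytes(bytes) here would bind to this same overload and recurse.
    public static int CountHighBytes(byte[] bytes)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));
        return CountHighBytes(new ReadOnlySpan<byte>(bytes));
    }

    // Offset/length overload: the slice replaces passing offset and len down the call chain.
    public static int CountHighBytes(byte[] bytes, int offset, int len)
    {
        if (bytes == null) throw new ArgumentNullException(nameof(bytes));
        return CountHighBytes(new ReadOnlySpan<byte>(bytes, offset, len));
    }
}
```

With this shape, a caller holding a stack-allocated or sliced buffer can call the span overload directly, while existing byte[] callers keep compiling unchanged.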

As an implementation note, since .NET Standard 2.0 does not have a MemoryStream.Write(ReadOnlySpan<byte>) method, the data is copied into an array buffer and then written to the stream. This may reduce performance slightly, but I think it is the best approach that avoids unsafe blocks and reflection.

This may also break code outside of UTF-unknown that overrides CharsetDetector.Feed or derives from CharsetProber, but I believe that migrating to the new signatures should not be hard.

@harnel-tngn harnel-tngn changed the title from "Detect from readonlyspan" to "Detect encoding from ReadOnlySpan<byte>" Jun 29, 2026
@304NotModified

304NotModified commented Jun 29, 2026

Member

Thanks for the PR!

As an implementation note, since .NET Standard 2.0 does not have a MemoryStream.Write(ReadOnlySpan<byte>) method

Isn't this supported in .NET 8? If so, we could use #if NET8_0_OR_GREATER. We could also target .NET Standard 2.1 in addition to (not instead of) .NET Standard 2.0.

Note, I will remove .NET 6 support first (#205) - update, done

@304NotModified 304NotModified added this to the 2.7 milestone Jun 29, 2026
@304NotModified

Member

Close/reopen for new merge commit

@harnel-tngn

Author

MemoryStream.Write(ReadOnlySpan<byte>) has been supported since .NET Core 2.1. Here is a link to the MSDN document.

CharsetProber.WriteSpanToStream already uses MemoryStream.Write when the target framework is .NET Standard 2.1 / .NET Core 2.1 or newer. If we bump the target framework to .NET Standard 2.1, we can just remove the CharsetProber.WriteSpanToStream method and call MemoryStream.Write directly.

    private static void WriteSpanToStream(MemoryStream stream, ReadOnlySpan<byte> buffer)
    {
#if NETSTANDARD2_1_OR_GREATER || NETCOREAPP2_1_OR_GREATER
        stream.Write(buffer);
#else
        byte[] rent = ArrayPool<byte>.Shared.Rent(buffer.Length);

        try
        {
            buffer.CopyTo(rent);

            stream.Write(rent, 0, buffer.Length);
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rent);
        }
#endif
    }

I also updated the System.Memory package to resolve a version conflict with the System.Memory version pulled in by Microsoft.SourceLink.GitHub.
