
Parsing error for citations with defendant 'Thompson' #174

Description

@ERosendo

In issue #3924, we identified a bug in Eyecite's parsing method when the defendant's last name is 'Thompson'.

For example, for the citation 'Shapiro v. Thompson, 394 U. S. 618':

  • Expected output: volume: 394, reporter: 'U.S.', page: '618'
  • Actual output: volume: None, reporter: 'Thompson', page: '394'

Other examples of inputs that are incorrectly parsed are: Adams v. Thompson, 560 F. Supp. 894 and Mozena v. Thompson, 44 A.2d 276.

I've been using the first example to debug this issue, and noticed that Eyecite identifies two tokens within the input string: "Thompson's Unreported Cases (TN)" and "United States Supreme Court Reports". The problem arises because these tokens overlap (both include "394") and Eyecite's tokenize method prioritizes the rightmost token when encountering overlaps, leading to this result.
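To make the overlap concrete, here is a minimal, self-contained sketch. It is not Eyecite's code: the two regexes, the `Token` class, and the `resolve` helper are hypothetical simplifications, assuming only that each reporter produces one candidate span and that an overlap is settled by keeping one token and dropping the other.

```python
import re
from dataclasses import dataclass

TEXT = "Shapiro v. Thompson, 394 U. S. 618"

# Hypothetical, simplified patterns for illustration only.
# Eyecite's real regexes are generated from its reporter data.
PATTERNS = {
    "Thompson": r"Thompson, \d+",
    "U.S.": r"\d+ U\. ?S\. \d+",
}


@dataclass(frozen=True)
class Token:
    reporter: str
    start: int
    end: int
    text: str


def extract_tokens(text):
    """Return every candidate token, including overlapping ones."""
    return [
        Token(reporter, m.start(), m.end(), m.group())
        for reporter, pattern in PATTERNS.items()
        for m in re.finditer(pattern, text)
    ]


def overlaps(a, b):
    """True when the two spans share at least one character."""
    return a.start < b.end and b.start < a.end


def resolve(tokens_by_priority):
    """Greedily keep tokens in priority order, dropping any that
    overlap a token that was already kept."""
    kept = []
    for tok in tokens_by_priority:
        if not any(overlaps(tok, k) for k in kept):
            kept.append(tok)
    return kept


thompson, us = extract_tokens(TEXT)

# The shared characters are the volume number.
shared = TEXT[max(thompson.start, us.start):min(thompson.end, us.end)]
print(repr(shared))  # '394'

# Whichever token is ranked first survives; the other is discarded.
print([t.reporter for t in resolve([thompson, us])])  # ['Thompson']
print([t.reporter for t in resolve([us, thompson])])  # ['U.S.']
```

In this sketch, the outcome depends entirely on which token is ranked first. When the "Thompson" token wins, the "U.S." token is dropped and "394" is left to be read as a page number, which matches the actual output reported above.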

Status
Done