dwillis's comments

dwillis · 2025-06-20T16:06:13 1750435573

That's what I did.

dwillis · 2025-06-20T00:00:46 1750377646

Totally reasonable view, and one of our volunteers actually got the law in Kansas changed to mandate electronic publishing of statewide precinct results in a structured format! But finding legislative champions for this issue isn't easy.

ghghgfdfgh · 2025-06-20T07:47:19 1750405639

I’ve tried using LLMs to do the same exact thing (turning precinct-level election results into a spreadsheet), and in my experience they worked rather poorly. They were less accurate than traditional OCR, and considering how many fixes I had to make, altogether slower than manual entry. The resolution of the page made an outsized difference. It’s nice that you got it to work, but I am skeptical of it as a permanent solution.

Tangentially: I appreciate what OpenElections does. However, I wish there were a similar organization that did not limit itself to officially certified results. There are already other organizations that collect precinct results post-2016, and using only official results basically limits you to 2008 and afterwards, but historical election results are where the real intrigue is. Not to mention that I have noticed many blatant errors in election results that have supposedly been “certified” by a state or county government. The precinct results Pennsylvania publishes, for example, are riddled with issues.

dwillis · 2025-06-20T12:05:51 1750421151

Skepticism is a necessary trait in this type of work, for sure. I will say that the performance has improved substantially in the past year, though there are still PDFs that require a lot of work.

We went with official precinct results for two main reasons: election-night and final results differ (sometimes non-trivially), and it makes the work more manageable. Agree that historical results are a real problem, and as a PA native I know only too well the errors that the state data contains, which is why we go county-by-county there.
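To make the election-night vs. final distinction concrete, here is a minimal sketch of diffing two snapshots of precinct results. It assumes (hypothetically) that each snapshot has already been loaded into a dict keyed by `(precinct, office, candidate)` with integer vote counts; `diff_results` is an illustrative name, not part of any OpenElections tooling.

```python
# Hypothetical layout: (precinct, office, candidate) -> vote count
Results = dict[tuple[str, str, str], int]


def diff_results(
    election_night: Results, final: Results
) -> dict[tuple[str, str, str], tuple[int, int]]:
    """Return every key whose count changed, mapped to (election_night, final).

    A key present in only one snapshot is treated as 0 votes in the other,
    so precincts that reported late still show up as changes.
    """
    changed = {}
    for key in election_night.keys() | final.keys():
        before = election_night.get(key, 0)
        after = final.get(key, 0)
        if before != after:
            changed[key] = (before, after)
    return changed
```

Treating missing keys as zero is a design choice: it surfaces precincts that only appear in the certified results rather than silently dropping them.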

dwillis · 2025-06-19T22:10:48 1750371048

In some cases that's true, but for many jurisdictions the results systems are third-party vendor platforms, too.

dwillis · 2025-06-19T20:28:27 1750364907

Many jurisdictions do risk-limiting audits using the original ballots, so futzing with the results wouldn't necessarily make that easier. Also, cast vote records are public in many states - those are records of each ballot cast. So people can check.
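As a rough illustration of how people can check, here is a sketch that tallies cast vote records for one contest and compares the tally with reported totals. It assumes a simplified, hypothetical CVR shape: one dict per ballot mapping contest name to a single choice, with the contest absent for undervotes. Real CVR exports vary by vendor and state, and this ignores overvotes, vote-for-N, and ranked-choice contests.

```python
from collections import Counter


def tally_cvrs(cvrs: list[dict[str, str]], contest: str) -> Counter:
    """Count one vote per ballot for the chosen candidate in `contest`."""
    counts = Counter()
    for ballot in cvrs:
        choice = ballot.get(contest)
        if choice is not None:  # skip undervotes / contest not on this ballot
            counts[choice] += 1
    return counts


def mismatches(
    cvrs: list[dict[str, str]], contest: str, reported: dict[str, int]
) -> dict[str, tuple[int, int]]:
    """Return candidates whose reported total differs from the CVR tally,
    mapped to (reported, tallied)."""
    tally = tally_cvrs(cvrs, contest)
    return {
        cand: (reported.get(cand, 0), tally.get(cand, 0))
        for cand in set(reported) | set(tally)
        if reported.get(cand, 0) != tally.get(cand, 0)
    }
```

An empty result from `mismatches` means the published totals match the ballot-level records for that contest.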

philips · 2025-06-19T20:30:34 1750365034

I think you mean risk limiting, right?

dwillis · 2025-06-19T22:11:36 1750371096

Yes, thanks! Fixed.

bilbo0s · 2025-06-19T21:29:11 1750368551

Freudian slip?

dwillis · 2025-06-19T19:58:31 1750363111

Yeah, this is a very well-traveled road, but LLMs have made some big improvements. If you asked me (the guy who wrote the original piece linked above) what I'd use if accuracy alone were the goal, it would probably be AWS Textract. But accuracy and structure? Gemini.

dwillis · on June 27, 2015

For an NYT story on name changes by women at marriage, we did an analysis of wedding announcements and released the results: https://github.com/TheUpshot/nyt_weddings.

dwillis · on Oct 18, 2011

As the author of the original piece, it's hard to disagree.

dwillis · on Aug 31, 2011

I've used Navicat Premium with Postgres and have had no issues. Pricey, but it works.