MARC Export missing books?

Discussion: Bug Collectors

Jul 2, 2022, 3:49 pm

I just attempted to export a fuzzy/integrated MARC file for my 2142 books (all collections). It was smaller than the file I had from a few months ago, and when I opened it in MarcEdit, the program reported only 1845 books in the file.

I did a quick compare with a CSV export, and the differences are pretty random. The MARC file skips book 10 (The Odyssey) then does fine for another twenty-five or so titles before completely leaving out six Harry Potter books cataloged together.

Anyone else seeing problems?
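
A quick way to reproduce this kind of count without MarcEdit is sketched below. It relies on the fact that every record in a binary MARC (ISO 2709) file ends with the record terminator byte 0x1D. The CSV side assumes a header row and one book per row, which is an assumption about the export layout; the function names are illustrative only.

```python
import csv
import io

RECORD_TERMINATOR = b"\x1d"  # ISO 2709 / MARC 21 end-of-record byte


def count_marc_records(data: bytes) -> int:
    """Count records by counting record terminators.

    In a damaged file this is only an approximation, since a record
    with a missing terminator is merged into the next one.
    """
    return data.count(RECORD_TERMINATOR)


def count_csv_rows(text: str) -> int:
    """Count data rows in CSV text, assuming the first row is a header."""
    reader = csv.reader(io.StringIO(text))
    next(reader, None)  # skip the assumed header row
    return sum(1 for row in reader if row)


def missing_records(marc_data: bytes, csv_text: str) -> int:
    """Number of books present in the CSV but not accounted for in the MARC file."""
    return count_csv_rows(csv_text) - count_marc_records(marc_data)
```

To use it, read the MARC file in binary mode (`open(path, "rb").read()`) and the CSV file as text, then pass both to `missing_records`.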

Edited: Jul 3, 2022, 12:57 pm

Yes, I'm also seeing it. A basic MARC export gives 3783 records, and I'd expect 8538. The other option, to flesh out the MARC export a bit, gives 3767.

So something seems to be wrong.

ETA: Both MARC export options say:
Processing done (8,638 records). Click to download.

Jul 4, 2022, 7:50 am

Thanks for verifying. It had been a while since I'd used MARC, so I wasn't sure if my skills were rusty....

Additionally, my "Processing done" line did show the number of records I *should have* had (2142), so you may have found another gremlin lurking about.

Jul 5, 2022, 9:21 am

Okay, this is good information, thanks for the report. I was working on trying to figure out why a 726-record MARC export only had 535 records in the file. Will talk to ccatalfo about this.

Jul 5, 2022, 12:23 pm

Yes, looks as if certain records are not being generated correctly into MARC (with some additional logging I can see 295 records errored out - which would account for just about all of the ones for legallypuzzled).

Next step is to figure out why they are not coming out correctly.

Jul 12, 2022, 7:29 pm

>4 kristilabrie: for reference this was me.

Jul 13, 2022, 8:41 am

>6 IrrationalDM: this fell off the radar for a few days, but ccatalfo is going to try and look at it today!

Jul 13, 2022, 8:56 am

Are these two bugs connected?

Jul 14, 2022, 2:59 pm

>8 gilroy: Hard to say, but thanks for bringing it on here!

Aug 22, 2022, 1:49 pm

I noticed some MARC activity over the last little bit, so I double-checked my export. Unfortunately, now the file only gives me 1842 records, which is three fewer than I originally had last month. :(

That number includes the clean-up process. There are a few spots where the record includes "ERROR: 'NoneType' object has no attribute 'as_marc'" and one record didn't have an "end record" flag on it. (In case any of that helps track the problem down.)
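
The quoted message has the shape of a Python AttributeError, which is what Python raises when a method is called on `None`. The sketch below shows one way such text could end up inside an export file. It is not LibraryThing's code: `FakeRecord`, `build_record`, and `export` are hypothetical names, and the idea that a failed lookup returns `None` is an assumption.

```python
class FakeRecord:
    """Hypothetical stand-in for a record object with an as_marc() method."""

    def __init__(self, title: str):
        self.title = title

    def as_marc(self) -> bytes:
        return self.title.encode("utf-8")


def build_record(book: dict):
    """Hypothetical builder: returns None when no record can be built."""
    if not book.get("title"):
        return None
    return FakeRecord(book["title"])


def export(books: list) -> list:
    """Serialize each book; on failure, write the error text instead."""
    out = []
    for book in books:
        try:
            # If build_record() returned None, this call raises AttributeError.
            out.append(build_record(book).as_marc())
        except AttributeError as exc:
            out.append(f"ERROR: {exc}".encode("utf-8"))
    return out
```

Under these assumptions, a book that cannot be built produces the error string in place of a record, which would match both the missing records and the stray text in the file.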

Aug 22, 2022, 3:19 pm

ccatalfo is currently working on some MARC issues which might be affecting this. I'll ping him your note so he's aware!

Nov 4, 2022, 9:33 am

Each month seems to remove a few more records from the MARC export .... I now have 1834 records for my 2143 books (still using fuzzy/integrated, all collections).

If this keeps up, my export will be at zero books in 2060. :( At least my physical books are safe!

Dec 31, 2022, 11:16 am

End of year backup time... and this is still having issues. The downloaded file (fuzzy, integrated, all) now contains HTML snippets that have a title "Bad Request" and a body saying "Bad Request" and "Request Line is too large."

Dec 31, 2022, 8:19 pm

>13 legallypuzzled: I think "Request Line is too large" should be at least nominated for "Weirdest Error Message of the Year, 2022".

Jan 2, 2023, 9:23 am

>13 legallypuzzled: Thanks for the ping on this. I've reminded ccatalfo - I'm also seeing the little error snippets you report, in my own export.

Mar 21, 2023, 3:54 pm

In addition to the "Request Line is too large" messages (now giving some indication of *how* large, all measured over 4094), I am also seeing "Request-URI Too Large" and "ERROR: 'NoneType' object has no attribute 'as_marc'."

As a weird side note, I seem to now have 1817 records instead of the expected 2165, although MarcEdit can only parse 788 of them.

Mar 21, 2023, 4:59 pm

I'm guessing that one of the errors we see is caused by the export trying to cram large reviews into the MARC records, and maybe also by including raw newlines, which would break the structure of the MARC record. Anyway, I've given up on using the MARC export, so I just use it to support whatever bug reports other LT'ers come up with :-)

Mar 21, 2023, 8:47 pm

I thought the long-winded reviews might be at fault too, but other portions seem to be working fine. One title (The World's Writing Systems) has a ridiculous number of entries in the table of contents field, longer than any review I've written (I think), and that one came through fine.

A MARC record has a limit of 9999 characters in any given variable field, so I don't know where that 4094 is coming from.
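
For reference, the 9999 figure follows from the MARC 21 (ISO 2709) directory layout: each directory entry is 12 characters, made up of a 3-character tag, a 4-digit field length, and a 5-digit starting position. A minimal sketch of that layout is below; the function names are illustrative and do not come from any MARC library.

```python
MAX_FIELD_LENGTH = 9999  # largest value that fits in the 4-digit length slot


def directory_entry(tag: str, field_length: int, start: int) -> str:
    """Format one 12-character directory entry: tag(3) + length(4) + start(5)."""
    if len(tag) != 3:
        raise ValueError("tag must be exactly 3 characters")
    if not 0 <= field_length <= MAX_FIELD_LENGTH:
        raise ValueError("field length does not fit in 4 digits")
    if not 0 <= start <= 99999:
        raise ValueError("starting position does not fit in 5 digits")
    return f"{tag}{field_length:04d}{start:05d}"


def parse_directory_entry(entry: str) -> tuple:
    """Split a 12-character directory entry into (tag, length, start)."""
    if len(entry) != 12:
        raise ValueError("directory entry must be 12 characters")
    return entry[:3], int(entry[3:7]), int(entry[7:12])
```

For example, `directory_entry("245", 37, 0)` gives `"245003700000"`, and a 10000-byte field is rejected. Nothing in this layout produces a limit near 4094, which suggests that number comes from somewhere other than the MARC format itself.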

It's been a while since I've looked at the other exports, but that is probably the best option. MARC, though, seemed to have the most content; maybe that's changed in the last little bit.

Mar 22, 2023, 1:28 am

>18 legallypuzzled: Thanks for that extra information. I've tried to keep information about the export formats here:

I think the genre bugs in the JSON export have been fixed, but otherwise I believe it's up to date. I use Linux, so the tools described might or might not be available on Windows.

Aug 2, 2023, 10:12 am

Bumping this, as I have received a new report via email. The member contacting me has attempted to export his catalog in MARC format, using existing library records, with fuzzy matching / integrated catalog, and finds the exported file full of errors. I verified that this occurred in his export, as well as my own.

Steps to Reproduce:

1. Log in as member micbello
2. Proceed to Import/Export page, and choose the MARC export option
3. Use existing library records, fuzzy matching, integrated catalog (the default settings) and export all books
4. When export is complete, download
5. The resultant file will have multiple errors, for most of the books in the export

Aug 3, 2023, 9:34 am

I think there are a few problems within this bug report, but they're all happening at the same time :(

1) If any of the text fields have line feeds (or line feeds/carriage returns), those are being passed straight through to the MARC export file. Unfortunately, the MARC record standard doesn't allow for them to appear in a file. {Noted by bnielsen, above, 17.}

2) As the MARC export process does its magic, apparently combining several bits of records from various places, it is timing out, or it is getting back records which contain more data than was expected ("Request Line is too large"; "Request-URI Too Large"). Those error messages will also break the export file.

3) I've since deleted the file that I was working on, but the MARC export file was occasionally failing to add a "new record" leader. That, then, meant that a MARC processor would read the prior and the current record as a single record, and would fail because the actual size of the record did not match the record length stated in its leader. It seems like these most often appeared when the export was timing out, but I worked on this almost a year ago so the details are fuzzy. (Since this was a fuzzy export, I don't know if that's a feature or a bug.)
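
The three symptoms above can be checked for mechanically. The sketch below is a rough diagnostic, not a full MARC parser: it splits the file on the record terminator (0x1D), compares each chunk's size with the 5-digit record length at the start of its leader, and flags raw line breaks and the error strings quoted in this thread. The function name and the list of error markers are assumptions chosen for illustration.

```python
RECORD_TERMINATOR = b"\x1d"  # ISO 2709 / MARC 21 end-of-record byte
ERROR_MARKERS = (b"Bad Request", b"Request-URI Too Large", b"ERROR:")


def check_marc_file(data: bytes) -> list:
    """Return a list of (record_index, [problems]) for suspect records."""
    report = []
    chunks = data.split(RECORD_TERMINATOR)
    tail = chunks.pop()  # bytes after the last terminator; empty in a clean file
    for index, chunk in enumerate(chunks):
        record = chunk + RECORD_TERMINATOR
        problems = []
        stated = record[:5]  # leader positions 00-04 hold the record length
        if not stated.isdigit():
            problems.append("leader does not start with a 5-digit length")
        elif int(stated) != len(record):
            problems.append(f"leader says {int(stated)} bytes, found {len(record)}")
        if b"\n" in chunk or b"\r" in chunk:
            problems.append("raw line break inside record")
        if any(marker in chunk for marker in ERROR_MARKERS):
            problems.append("embedded error text")
        if problems:
            report.append((index, problems))
    if tail.strip():
        report.append((len(chunks), ["data after the last record terminator"]))
    return report
```

Two records merged by a missing terminator show up as a single chunk whose size is larger than the length in its leader, which is the failure described in point 3.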

Dec 30, 2023, 2:35 pm

End of year backup time .... and still having problems. I used the "basic" MARC export this time, and, although there was a significant amount of cleanup (partially with line feeds in the reviews, partially with errors), I have 1841 books, which is fewer than in the original bug post, but an increase from the files I've been downloading in the last year.

It looks like the JSON export has at least a significant amount of data (all?) that I wanted to back up, so I guess I'll just switch to that.

Feb 27, 7:06 am

>13 legallypuzzled: Adding a note here, as I was doing some other MARC import/export testing and have not forgotten about this bug. I saw those "Bad Request" and "Request Line is too large." errors after I added lending statuses to a couple of items in a library. Curious if having any lending history or current lending statuses on records is related to this or not.