Commons talk:Maximum file size

"Summer 2013"[edit]

Does "summer 2013" (the date when the PNG thumbnail maximum size was increased) mean Northern Hemisphere summer (June to September), or Southern Hemisphere summer (December 2012 to March 2013)? darkweasel94 21:44, 5 September 2013 (UTC)[reply]

Since Nemo_bis (diff) seems to be, or was, affiliated with Wikimedia Italia, I would guess it refers to the Northern Hemisphere summer (which also matches my recollection). -- Rillke(q?) 22:18, 5 September 2013 (UTC)

Long filenames and thumbnails

Contrary to what is written in the section above, thumbnails now seem to work with long filenames as well. The linked example now works fine: [1]
Could that note be removed now, or is there still a problem?
Best regards, --#Reaper (talk) 14:31, 31 January 2014 (UTC)

> 100 MB

Hi @Inductiveload, you added a section for files > 100 MB. I am not sure that this is correct. Over the last years and months I have uploaded a lot of files > 100 MB with the UploadWizard without any problems, e.g. File:Statement Steinmeier.webm and most of the files in Category:LVR-Institut für Landeskunde und Regionalgeschichte. The UploadWizard supports uploads up to ~4 GB. Furthermore, the Discord link is not open to everyone; registration is needed. Raymond 19:51, 15 April 2021 (UTC)

I find it interesting that the largest SVG files are all below 100 MB, see Category:Large_SVG_files. The Discord link is not in an open channel (I have a Discord account and was told that I have no access or the channel does not exist). So an invitation is needed (registration is not enough). @Inductiveload: Maybe you can add/invite me: Rekursiv#0559?  — Johannes Kalliauer - Talk | Contributions 21:08, 15 April 2021 (UTC)
@JoKalliauer: @Raymond: I was also surprised to be told that it's not expected that 100 MB files are fully supported, as I have uploaded very many in the past, but they suddenly stopped working recently (several weeks ago). I could find no reference to it on-wiki, but the relevant discussion, to which I was referred, is phab:F34619646. I was told to, quote, "be bold" in mentioning it in the help pages. I don't know how Discord works with the links; this channel is just the #commons channel from w:Wikipedia:Discord#Site_channels, it's not any kind of private thing.
I would welcome a more formal statement of how reliable large uploads are expected to be, of course. I did ask for feedback on the wording in the Discord at the time and there was no reply. Inductiveload (talk) 22:12, 15 April 2021 (UTC)
@Inductiveload, @JoKalliauer: Today I uploaded 2 files > 100 MB without any problem using the UploadWizard: File:Kirmes in Nettersheim. Teil 2 Kirmessonntag und -montag. Das Fest in Familie und Dorf.webm (259 MB) and Vogelschießen. Bei der St. Sebastiani Armbrustschützen-Gesellschaft Herzogenrath anno 1250.webm (170 MB). I will revert the new section from yesterday, but I am open to a section on what to do when an upload fails. To be fair: the UploadWizard is sometimes wonky, maybe related to the upload speed? Raymond 11:20, 16 April 2021 (UTC)
The 4 GB limit is a bit of a myth. In practice, PDFs over 1 GB are likely to be undisplayable for unclear reasons apart from obscure server settings, see example. At the same time, PDFs as small as 15 MB can be unuploadable, even though there is nothing specifically wrong with them, as they can be viewed elsewhere, example.
At best, maximum file size is mostly a guide, YMMV. -- (talk) 12:02, 16 April 2021 (UTC)
(FYI, I also thought the addition was weird, and asked about it at User talk:Inductiveload#"Files over 100MB are not formally supported".) Matma Rex (talk) 14:02, 16 April 2021 (UTC)
@AntiCompositeNumber: I'm linking in your summary here because it's useful for future reference on this page.
The significantly increased failure rate for these uploads is also something that's happened over the last few months (3 months? something like that), so either something changed that directly made uploads way, way more finicky, or something else happened that exacerbated all the existing failure modes (is the imginfo table getting close to a limit due to the massive amount of OCR text? is the jobqueue struggling and these uploads are pushing it over a timeout or other limit? is this an API thing? ("Help us Obi-Wan Anomie. You're our only hope!" 😀)). From my subjective experience, using BigChunkedUpload for multiple daily uploads of anything up to about a GB worked fine and was fast for at least a year (I think two, but I'd need to check); but over the last few months uploads have been noticeably slower, and anything in the neighbourhood of 100 MB almost certainly fails. Sometimes they can be manually published from the stash, but often (mostly?) not. In other words, from my personal experience large(ish) uploads are currently almost entirely broken.
There's also: T282173, T280926, T278389, T278104, T255981 (and possibly there is some sort of relation to T89971 and its causes and subtasks). --Xover (talk) 11:42, 11 May 2021 (UTC)
@Xover: If I'm your only hope, you may be in trouble. WMF seems to have decided that office politics were more important than anything else. Anomie (talk) 13:20, 11 May 2021 (UTC)
I could write many many words on this topic. Out of the kindness of my heart I will refrain. You're welcome. Xover (talk) 13:37, 11 May 2021 (UTC)
Hurray, office politics, the final tar pit that all old fat organizations end up in to make way for tiny furry competitors. -- (talk) 14:43, 11 May 2021 (UTC)

Warning users

Per the comments by Inductiveload posted here. Namely, "This has been a constant issue for batch-uploads of texts for Wikisource (which frequently come in over 100MB [when] encoded with FOSS tooling), but we were told that it's our workflow at fault, and 100MB+ files are not actually expected to work. I tried to document this at COM:100MB, but was reverted". If there are persistent issues with large files, why aren't these documented specifically on the page about large media files? --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 14:17, 4 October 2021 (UTC)

Apparently this is listed above; then why not make it a sub-page of this page, or include the section? I can't find much reason for its exclusion, especially not if it's such a recurring issue. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 14:19, 4 October 2021 (UTC)
What's more concerning is that "multimedia" does not appear to currently be supported by any team or group at the WMF, or at least nobody has stepped up to take ownership of the issue. Uploads larger than 100 MB are apparently simply not supported, but may happen to sometimes work, using the various methods outlined on this page, if the phase of the moon aligns with… something. I think this had better be documented on this page as well, along with instructions to try requesting server-side uploads first. Xover (talk) 14:42, 4 October 2021 (UTC)
Let's be honest here: Wikimedia Commons is just an afterthought to the Wikimedia Foundation (WMF), and understandably so. In all my years here, the only active engagement I have seen from them with the content is the Structured Data on Wikimedia Commons (SDC) programme, which, while I think it is wonderful, doesn't really help with uploading new media, just with searching and organising existing media. I have been saying for a while now that we should try to get Wikimedia Deutschland (WMDE) to dedicate a small team to actively develop for Wikimedia Commons like they do for Wikidata: great applications like Flickr2Commons, URL2Commons, Video2Commons, etc. are hopelessly undermaintained, proposals to improve the technical level of categories are considered "low priority" on the Phabricator.wikimedia.org website, and we see very few creator-centric additions to the software. Patrolling and deleting (which are very important things) get new software features all the time, but actual content creation is an afterthought at best (note that there needs to be content to patrol and delete for the aforementioned tools to be useful). The problem is, I have no idea how to convince Wikimedia Deutschland (WMDE) to come over here. They have "the technical army" to solve most of these issues, plus they have the lobbying ability to actually get educational institutions to donate (note that some of the largest image donors here are institutions from German-speaking countries). My impression is that waiting for the Wikimedia Foundation (WMF) won't solve most issues at Wikimedia Commons. So it's better to warn users about the software's limitations than to let them waste valuable volunteer hours dealing with these technical limitations. --Donald Trung 『徵國單』 (No Fake News 💬) (WikiProject Numismatics 💴) (Articles 📚) 18:56, 4 October 2021 (UTC)
@Raymond: You reverted Inductiveload's addition of this (cf. the previous thread), but I'm guessing that was primarily due to the wonky Discord link? Can you clarify whether you are actually opposed to the addition of a suitably formulated section clarifying the status of >100 MB uploads? The WMF's development priorities aside, leading contributors on a wild goose chase and wasting their time is neither a good use of volunteer resources nor likely to leave them with a particularly favourable impression of Commons and the Movement at large (since almost all projects redirect uploads to Commons and/or suffer the same problem with large uploads). Xover (talk) 09:48, 6 October 2021 (UTC)
@Xover As I wrote in April, I am not against a section about the current problems. But the text I reverted was wrong: uploads to Commons are formally allowed up to 4 GB. I have uploaded a lot of files > 100 MB this year. But yes, not every upload was successful, and I know that the upload is wonky for various reasons. So I am open to a new section with better explanations. Raymond 17:51, 6 October 2021 (UTC)
As the word "formally" is the issue, I have reinstated it and rewritten to make it clear that it is not a formal limit, but rather a bug, or constellation of bugs, of unknown cause.
You would think that an organisation with a $130,000,000+ budget that actually employs a Senior Technical Writer and an entire team of Developer Advocates might see fit to document something, even if they somehow can't pay an engineer to fix it, but I guess why would they when they're just wasting time freely given. Inductiveload (talk) 21:04, 6 October 2021 (UTC)
Past experiences have taught many WMF staffers that they must not ever touch anything maintained by the community. Even as an editor for many years before getting hired by WMF, I often worry that any correction I make will be perceived as an offense. All WMF's fault of course, but you're not really making this a welcoming place for them. Matma Rex (talk) 21:34, 6 October 2021 (UTC)
Then document it at MediaWiki. Or Wikitech. Write an email and post a link on COM:VPT. MassMessage. Tweet about it (actually, please don't: using Twitter for communication is a sign of complete organizational dysfunction). Put a message in a bottle, throw it in the sea, and hope a wiki user finds it. Whatever.
Sorry if I'm sounding rude, but I have spent literally (quite literally) days faffing with this issue, and currently my best plan is "scp to Toolforge, to somewhere I got copy-uploads enabled for this exact purpose, then copy-upload from there and delete the file". Not very scalable.
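For illustration, the copy-upload leg of that workaround can be scripted against the action API. This is a minimal sketch, assuming an already logged-in session, a hypothetical staging URL on Toolforge, and an account that actually holds the upload_by_url right (the staging host also has to be on the server's allowed-domains list):

```python
import requests

API = "https://commons.wikimedia.org/w/api.php"
# Hypothetical staging URL; the file was scp'd to Toolforge beforehand.
SOURCE = "https://my-tool.toolforge.org/staging/large-scan.pdf"

session = requests.Session()  # assumed to be logged in already

# Fetch a CSRF token for the upload.
csrf = session.get(API, params={
    "action": "query", "meta": "tokens", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

# Ask the servers to fetch the file themselves (upload-by-URL), so the
# bytes don't have to be pushed through the client's own connection again.
r = session.post(API, data={
    "action": "upload", "format": "json",
    "filename": "Large-scan.pdf",
    "url": SOURCE,
    "comment": "Copy-upload from Toolforge staging",
    "ignorewarnings": 1,
    "token": csrf,
})
print(r.json())
```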
Anyhoo, I don't think "superprotect" and "documenting the software" are quite the same thing. Inductiveload (talk) 21:47, 6 October 2021 (UTC)
I concur with Inductiveload here that something is seriously broken and needs an urgent fix. BTW, upload-by-URL is broken too. See my bug reports on COM:VPT. Regards, Yann (talk) 09:53, 7 October 2021 (UTC)
@Yann @Inductiveload It is broken for sure. I think we just lack experts in this area; there are maybe 3 people who would be able to debug this problem, and they're all working on something else. I'll see if I can get anyone to look into it, but no promises. Matma Rex (talk) 20:53, 9 October 2021 (UTC)
@Yann (and @Inductiveload) I realized I forgot to follow up here; I was reminded today by the thread you started on everyone's favorite mailing list. I raised the issue on the internal WMF "engineering-all" channel; I got a few replies but no commitment from anyone. If anything about this happens, it will be on task T292954 or T275752 (which was said to be a possible cause). Sorry I don't have better news. Matma Rex (talk) 21:22, 27 October 2021 (UTC)
@Matma Rex: Thanks!   — Jeff G. please ping or talk to me 01:10, 29 October 2021 (UTC)
@Matma Rex: I agree with both your points. But Superprotect is, like the F***BAN incident, a bad example of this, because in both cases the WMF overreached in a way that 1) had or would have had significant negative consequences (and I don't mean the minor detail of the media viewer), and 2) they should never have come even close to stepping in had their organisational perspective been even close to healthy. There are much better examples of the community biting some poor staffer's head off for some piddling perceived transgression, and I really wish the community would start to aggressively police this behaviour, for multiple reasons (only one of which is basic human decency). But it is a fact that the gulf between the WMF and the community is large and growing wider, and at an ever increasing rate. I think this is both an existential threat to the Movement, and the most critical threat in both the near and far term. But as I take every opportunity to try to communicate to the Board (or anywhere else anyone will listen to me), the professional party—the one that can unilaterally set policies, assign tasks, and compensate commensurate with, e.g., the levels of stress involved—is the WMF. The community, by design, cannot be top-down instructed to stop being mean to staffers (unless we get into actual T&S territory). So the ball is in the WMF's court: if they recognise a challenge here, it is they who must lead by example. And let's pray the community is still capable of learning from such examples.
I could go on, at length, on this topic, and am happy to do so to anyone that's willing to listen (and that's nobody with any kind of fancy title at the WMF: I've tried); but I've already bored everyone here to tears so I'll leave it at that. Xover (talk) 08:59, 9 October 2021 (UTC)
@Xover So, I'm not entirely happy about the WMF–community relationship, but I don't feel like the gulf is growing wider. Commons has it tough, both because it is very much underserved by the WMF, and because the folks here are rather <s>mean</s> direct. But in my recent experience (working on the talk pages project), folks on my team have been very open to requests (and the whole project is basically a huge request from the community, rather than something coming from the top at the WMF), and folks on the various wikis have been appreciative and helpful. It feels nicer than a few years ago and I think we're making progress. Matma Rex (talk) 20:47, 9 October 2021 (UTC)
@Inductiveload: I have committed some copy-editing for clarity. Please check that it's still representative. Xover (talk) 09:32, 9 October 2021 (UTC)
I think it's a fine edit. :-) Inductiveload (talk) 19:42, 9 October 2021 (UTC)
The underlying bug has mostly been fixed now, and since then I and others have been able to upload large files again, so I've updated the advice on the page to reflect the current status. Legoktm (talk) 18:39, 6 November 2021 (UTC)
@Legoktm: Thank you!   — Jeff G. please ping or talk to me 05:44, 7 November 2021 (UTC)
@Legoktm: Yes, thank you for all the work you put in on this one (you and everyone else that helped out: we see you Filippo, Alexandros, Joe, Tim, Arturo, and "et al"!). Not just the technical bits in T292954 and linked tasks, but also the great writeup at Wikitech. I'm particularly glad to see the challenges with a lack of dedicated maintainers in this area, and the consequent need to remedy that gap, called out so clearly. I'd also like to highlight the gap discussed in the "#Links to relevant documentation" section. Clear docs on how this part of the stack actually works would make it possible for technical(ish) community members to write clearer and more actionable bug reports, and perhaps even do more of the actual debugging. At the very least it might enable the community to bug the right(ish) people rather than spray every vaguely related team / component tag into the Phab ticket and pray someone will retag it with the actually correct tag (poor Andre must have the most unenviable job there is when trying to untangle those tasks!).
PS. Matma Rex, the poke to the right venue at the right time seems to have triggered a flurry of activity that not only ended up fixing the immediate cause, but also started the rearchitecting of several other bits that should provide some nice performance improvements, reduction in technical debt, and both fewer and more actionable error messages for users (if I'm understanding all the linked tasks correctly). I think you should take an appropriate amount of credit for that ('cause you sure get a star in my book!).
PPS. @Legoktm (as the one writing it up; it's probably properly addressed to "SRE" generally): I'm not sure what's the best place to bring it up, but the bit about "It is unclear if alerting would be valuable here given that the issue was noticed almost immediately …" is, I feel, possibly a little bit too narrowly focussed on the Jobrunner timeouts and on the timeout condition. The performance of the Jobrunner tasks, if the timeouts were never triggered, would not necessarily have been detected immediately. And the performance regression and upload failures that were the end-user visible results weren't noticed and connected until much, much after the fact. Having alerting on some performance metrics, such as perhaps "average throughput of large-file Jobrunner tasks" (or something that makes sense), would quite possibly have both detected and escalated this cluster of issues much sooner (and closer to / clearly correlated with the Buster migration). It may be worthwhile to put investigating that on someone's radar. Xover (talk) 08:23, 10 November 2021 (UTC)
You're welcome :) and I appreciate the feedback on the incident report. Regarding your PPS, that's a good point; I updated the section to say that for the general case some form of metrics/alerting might be useful and filed T295482. I still don't have a good understanding of all the moving parts here and left a generic actionable of "There is a general need for better metrics collection and logging of the individual actions in SwiftFileBackend", but certainly measuring throughput of these uploads seems like a good start. Legoktm (talk) 16:23, 10 November 2021 (UTC)

Still an issue?

@Legoktm: Hi, I am not sure the issue is fixed yet. I tried to upload a 74 MB TIFF file yesterday, and it failed both via upload-by-url from IA and via regular upload. It worked on the second try, though. Since there was still only a cryptic message, I can't report much more. I am quite surprised, as I expected files of this size to upload without any issue. Was it a temporary glitch? I would be happy to test more files if needed. Regards, Yann (talk) 21:38, 12 November 2021 (UTC)

@Yann to clarify, by "regular upload", did you use Special:Upload or Special:UploadWizard? The only new issue I'm aware of is T295343, "502 next hop failed" or "502 server hangup"... if you can copy/paste or screenshot the error message next time, we can at least file a bug report to track it. Legoktm (talk) 22:04, 12 November 2021 (UTC)
@Legoktm: I always use Special:Upload, as I copy-paste information from one file to another. I got a message similar to what I reported in phab:T292954 (Backend fetch failed, Oct 31st to Nov 1st). Yann (talk) 22:15, 12 November 2021 (UTC)
@Yann: I will try to bring it to the attention of the SRE Traffic team on Monday since it's in their area of expertise. But in general I think there are issues with Special:Upload, so you might want to use User talk:Rillke/bigChunkedUpload.js instead. Legoktm (talk) 23:54, 12 November 2021 (UTC)
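(For anyone curious what gadgets like bigChunkedUpload.js do under the hood: the action API's chunked-upload flow looks roughly like the sketch below. It assumes an already logged-in session; real tools add retries, warning handling, and token refresh, all omitted here.)

```python
import os
import requests

API = "https://commons.wikimedia.org/w/api.php"
CHUNK_SIZE = 5 * 1024 * 1024  # 5 MiB per chunk

session = requests.Session()  # assumed to be logged in already
csrf = session.get(API, params={
    "action": "query", "meta": "tokens", "format": "json",
}).json()["query"]["tokens"]["csrftoken"]

def chunked_upload(path, filename, comment, text):
    """Stash a file chunk by chunk, then publish it under `filename`."""
    size = os.path.getsize(path)
    filekey = None
    with open(path, "rb") as f:
        while f.tell() < size:
            offset = f.tell()
            data = {
                "action": "upload", "format": "json", "stash": 1,
                "filename": filename, "filesize": size,
                "offset": offset, "token": csrf,
            }
            if filekey:
                data["filekey"] = filekey  # continue the same stash entry
            reply = session.post(API, data=data, files={
                "chunk": (filename, f.read(CHUNK_SIZE)),
            }).json()
            filekey = reply["upload"]["filekey"]
    # All chunks are stashed; a final call publishes the file.
    return session.post(API, data={
        "action": "upload", "format": "json",
        "filename": filename, "filekey": filekey,
        "comment": comment, "text": text, "token": csrf,
    }).json()
```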

Server side upload limit now 5 GB

Just FYI, there is now experimental support for uploads up to 5 GB when using server side uploads. If all goes well, this will probably be extended to chunked uploads at some point in the nearish future. Bawolff (talk) 19:12, 8 February 2024 (UTC)

Technical reason for new 5 GiB limit

It's nice to see the limit progressing higher, but for curiosity's sake, what is the technical reason for the 5 GiB limit?

5 GiB isn't a power of two like 4 GiB is. The 4 GiB limit was due to that being the limit of a 32-bit integer, so what technical reason makes the new limit exactly 5 GiB? It seems like an arbitrary number set by the creators of OpenStack. Elominius (talk) 00:45, 11 April 2024 (UTC)
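(For what it's worth, the arithmetic confirms there is no binary boundary at 5 GiB; a quick check with the byte counts spelled out:)

```python
# 4 GiB sits exactly on the 32-bit boundary; 5 GiB sits on none at all.
print(2**32)        # 4294967296  -> the old 4 GiB limit
print(5 * 2**30)    # 5368709120  -> the new 5 GiB limit
print((5 * 2**30).bit_length())  # 33, i.e. it needs a 33-bit integer
```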

@Bawolff: might be able to answer that. Yann (talk) 09:48, 11 April 2024 (UTC)
@Elominius: It seems like a small step over the 4 GiB limit, as an experimental proof of concept before the developers go higher. It seems to be going well, aside from my results documented at Commons talk:Video2commons#missingresult: No result in status data.   — 🇺🇦Jeff G. please ping or talk to me🇺🇦 10:25, 11 April 2024 (UTC)
It is indeed a limit of the software used. As I understand it, this limit will not be raised, but there might be a way to upload/store/deliver larger files in chunks of no more than 5 GiB, once the 5 GiB upload has proved to be stable. Unfortunately, the upload is not stable. I have a file of about 4.9 GiB that cannot be uploaded. Until now it was only possible to upload a fragment of about 3.2 GB of it; while uploading one larger fragment after another, errors in the upload process (that only affect some files, but can affect any file) have been identified, and code changes have been made to try to address these errors. But it is now possible to use AV1 instead of VP9 and end up with smaller files of equal quality. Another problem is transcoding: if the transcoded version of a file is estimated to be larger than 3 GiB (or another limit installed into MW), the transcoding will not start at all. If a transcode turns out to be larger than the limit, it will stop and fail. (Also: transcodes are VP9 in an MP4 container for compatibility with web browsers.) C.Suthorn (@Life_is@no-pony.farm - p7.ee/p) (talk) 12:00, 11 April 2024 (UTC)
Yes, the 5 GB limit is the largest size before we have to use swift large object support [2]. MediaWiki does not currently support swift large objects. In addition to MediaWiki support, there are other blockers to allowing large objects on the SRE side (from what I understand; I don't know the full details [3]). In any case, the effort is high enough that we probably aren't going to bother with >5 GB anytime soon unless there is a very compelling need for it. As C.Suthorn noted, large file uploads are also broken right now (phab:T358308), so you can really only hit the 5 GB limit via server side upload. However, that is being (slowly) worked on. Bawolff (talk) 14:33, 11 April 2024 (UTC)
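(To illustrate what "swift large object support" means in practice: Swift caps single objects at a configurable max_file_size, 5368709120 bytes by default, hence the 5 GiB figure, and anything bigger has to be stored as segments plus a manifest. Below is a rough sketch of a Static Large Object upload against a hypothetical Swift endpoint; the endpoint, token, etags, and container names are all made up, and this manifest juggling is exactly the part MediaWiki doesn't do yet.)

```python
import json
import requests

# Hypothetical Swift endpoint and auth token; the two segments are
# assumed to be uploaded already, each under the 5 GiB per-object cap.
SWIFT = "https://swift.example.org/v1/AUTH_media"
HEADERS = {"X-Auth-Token": "gAAAA-example-token"}

manifest = [
    # etag and size_bytes let Swift validate each segment on assembly.
    {"path": "segments/big.webm/0000",
     "etag": "0f343b0931126a20f133d67c2b018a3b",
     "size_bytes": 5368709120},
    {"path": "segments/big.webm/0001",
     "etag": "62c66a7a5dd70c3146618063c344e531",
     "size_bytes": 2147483648},
]

# PUT the JSON manifest; Swift then serves the concatenation of the
# segments as one ~7 GiB object, although no single object exceeds 5 GiB.
r = requests.put(
    f"{SWIFT}/media/big.webm",
    params={"multipart-manifest": "put"},
    headers=HEADERS,
    data=json.dumps(manifest),
)
r.raise_for_status()
```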