Hosting assets on the cloud

Anthony Gallon Posted in Performance and Scalability 3 years ago

Whenever a user uploads a file, such as a photo in an album, an MP3 or a video, it is stored in the ossn_data directory. Photos and videos are often a few megabytes each, and if members begin to upload lots of data, it doesn't take long for the total to reach many gigabytes:

100 users x 10 albums x 20 photos at 800 KB each = 16 GB (about 15 GiB)
100 users x 20 videos at 5 MB each = 10 GB
100 users x 50 songs at 4 MB each = 20 GB
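To double-check the arithmetic, here is a small Python sketch that recomputes the three estimates. It assumes decimal units (1 GB = 1,000,000 KB, 5 MB = 5,000 KB); the helper name `total_gb` is just for illustration.

```python
# Storage estimates from the figures above, in decimal units (1 GB = 1,000,000 KB).
def total_gb(users: int, files_per_user: int, size_kb: int) -> float:
    """Total storage in GB when every user uploads `files_per_user` files of `size_kb` KB each."""
    return users * files_per_user * size_kb / 1_000_000


photos = total_gb(100, 10 * 20, 800)  # 10 albums x 20 photos at 800 KB
videos = total_gb(100, 20, 5_000)     # 20 videos at 5 MB
songs = total_gb(100, 50, 4_000)      # 50 songs at 4 MB

# prints: photos 16.0 GB, videos 10.0 GB, songs 20.0 GB, total 46.0 GB
print(f"photos {photos} GB, videos {videos} GB, songs {songs} GB, total {photos + videos + songs} GB")
```

So just 100 moderately active members could need close to 50 GB between them.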

Most shared hosting services charge a lot for server disk space, which is why there are online services that host asset files at much cheaper rates.

As a benchmark price, Digital Ocean Spaces offers a plan of 250 GB for $5 per month with no extra charges. Other providers offer slightly different options:

https://www.coralnodes.com/amazon-s3-alternatives/

Another benefit of moving assets onto cloud storage is easier load balancing. For example, if traffic grows and an extra web server needs to be deployed, the two web servers will need to share a common data source. One way to do this is to use a shared volume mounted on both web servers, so that each of them reads exactly the same files. However, shared volumes cost around five times as much as Cloud Storage.

I'd like an option for cloud storage so members can upload large photo albums, videos and so on. I think it should support multiple Cloud Storage providers, and it should be implemented as a component so that the default behaviour of OSSN is unaffected. I think the built-in hook system in OSSN can provide ample means for this, but a few hook events would need to be added to the core at various points of execution.
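One way to support several providers from a single component is to code against a small storage interface and pick the implementation from the component's settings. The Python sketch below shows that shape; `StorageBackend`, `InMemoryBackend`, `make_backend` and the method names are all hypothetical, and the in-memory class merely stands in for a real provider (for example, one wrapping an S3-compatible SDK).

```python
from abc import ABC, abstractmethod
from typing import Dict, Type


class StorageBackend(ABC):
    """Hypothetical interface the component would code against; one subclass per provider."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...

    @abstractmethod
    def delete(self, key: str) -> None: ...

    @abstractmethod
    def public_url(self, key: str) -> str: ...


class InMemoryBackend(StorageBackend):
    """Test double; a real subclass would wrap a provider SDK instead of a dict."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")
        self.objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self.objects[key] = data

    def get(self, key: str) -> bytes:
        return self.objects[key]

    def delete(self, key: str) -> None:
        self.objects.pop(key, None)

    def public_url(self, key: str) -> str:
        return f"{self.base_url}/{key}"


# Providers are looked up by the name stored in the component's settings.
BACKENDS: Dict[str, Type[StorageBackend]] = {"memory": InMemoryBackend}


def make_backend(name: str, **settings: str) -> StorageBackend:
    return BACKENDS[name](**settings)
```

Adding a new provider would then mean adding one subclass and one registry entry, without touching the rest of the component.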

I'm interested to hear what others think about these ideas before I make a formal plan. For example, is there an existing Cloud Storage library you'd like to see used as the backend API in the component? Is there a favourite Cloud Storage service of yours that isn't mentioned here? Are there any big ideas for future development phases that should be planned for from the start? Things like that would be good to talk about.

Replies
Anthony Gallon Replied 3 years ago

Yes, I think it is worthwhile to develop the presigned URL feature for that reason, with a configurable lifespan for each URL.

The difference in cost between Spaces and Volumes on Digital Ocean is five times (i.e. $5 per month for 250 GB vs $10 per month for 100 GB). It doesn't make sense to host media files on SSD volumes when Spaces exists for that purpose, and Spaces also gives the additional benefit of a CDN at no extra cost.
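The "five times" figure follows from the per-GB prices. This short Python check uses the prices exactly as quoted in this thread (they may have changed since) and exact fractions to avoid rounding noise.

```python
from fractions import Fraction

# Prices as quoted in this thread, in USD per month; current pricing may differ.
spaces_per_gb = Fraction(5, 250)    # Spaces: $5 for 250 GB
volumes_per_gb = Fraction(10, 100)  # Volumes: $10 for 100 GB

print(f"Spaces ${float(spaces_per_gb):.2f}/GB, Volumes ${float(volumes_per_gb):.2f}/GB")
print("ratio:", volumes_per_gb / spaces_per_gb)  # ratio: 5
```

That is $0.02 per GB for Spaces against $0.10 per GB for Volumes.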

There are other services that could be more economical too.

Arsalan Shah Replied 3 years ago

There is another option, NFS, but you mentioned in an email that it's a bit expensive.

(https://www.digitalocean.com/community/questions/can-i-setup-blockstorage-as-a-nfs-for-my-web-app)

Arsalan Shah Replied 3 years ago

I see. I believe granting a short lifespan would work in many cases. I can only try to add hooks that alter the file path and values. But consider a scenario like this:

  1. UserA makes a video public.
  2. Users view it and may now have the direct CDN URL.
  3. UserA changes it to friends only, but users who have the direct CDN URL may still be able to access it.

This is just one example, but I hope you see what I mean: using direct resource URLs takes control over viewing and downloading files out of OSSN's hands.

One more example.

I have an important file that only logged-in users can download.
Someone from the community shares its URL publicly, and now users who aren't logged in can download it.

Is there a way to control this? Even if the URL is shared, it should not work for users who aren't logged in. But I'm afraid this is not easy to achieve.

Anthony Gallon Replied 3 years ago

What you're describing, Arsalan, is the presigned URL feature that AWS S3 has:

https://www.youtube.com/watch?v=fLAT5Xjbp1w

There's a link here to a discussion about DO Spaces having the same feature:

https://www.digitalocean.com/community/questions/signed-urls-for-private-objects-in-spaces

But not all Cloud Storage providers support presigned URLs, and even worse, presigned URLs can't be revoked. So the only way to use presigned URLs for this would be to grant each signed URL a very short lifespan and to refresh it every time it is requested, which might not be very practical.
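To make the "short lifespan, refreshed on every request" idea concrete, here is a Python sketch. Real S3 or Spaces presigned URLs are produced by the provider SDK (for example boto3's `generate_presigned_url` with an `ExpiresIn` argument) and use AWS Signature Version 4; this sketch swaps in a simpler HMAC scheme purely to show the mechanics. `SECRET`, `sign_url`, `verify_url` and `url_for_viewer` are hypothetical names, and `can_view` stands for whatever privacy check OSSN already performs.

```python
import hashlib
import hmac
import time
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlparse

SECRET = b"replace-with-a-real-secret"  # hypothetical key known only to the server


def _signature(key: str, expires: int) -> str:
    return hmac.new(SECRET, f"{key}:{expires}".encode(), hashlib.sha256).hexdigest()


def sign_url(base_url: str, key: str, ttl_seconds: int, now: Optional[float] = None) -> str:
    """Return a URL for `key` that stops validating `ttl_seconds` after `now`."""
    expires = int((time.time() if now is None else now) + ttl_seconds)
    query = urlencode({"expires": expires, "sig": _signature(key, expires)})
    return f"{base_url}/{key}?{query}"


def verify_url(url: str, now: Optional[float] = None) -> bool:
    """Accept only untampered URLs whose expiry time has not passed."""
    parsed = urlparse(url)
    params = parse_qs(parsed.query)
    try:
        expires = int(params["expires"][0])
        sig = params["sig"][0]
    except (KeyError, ValueError):
        return False
    key = parsed.path.lstrip("/")  # assumes base_url has no path component
    current = time.time() if now is None else now
    return hmac.compare_digest(sig, _signature(key, expires)) and current < expires


def url_for_viewer(can_view: bool, base_url: str, key: str, ttl_seconds: int = 60) -> Optional[str]:
    """Re-check privacy on every page load, then issue a fresh short-lived URL."""
    return sign_url(base_url, key, ttl_seconds) if can_view else None
```

A leaked URL then stops working once `ttl_seconds` has passed, and a user who loses access simply stops receiving new URLs. The trade-off is the one mentioned above: every page view needs a freshly signed URL.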

The suggested workaround for those who need to revoke presigned URLs is to rename the object in the bucket:

https://stackoverflow.com/questions/38980769/is-it-possible-to-invalidate-or-revoke-an-aws-cloudfront-signed-url-after-it-has

It is possible to move a file in S3 to a new filename without downloading it to the server and re-uploading it. S3 has no rename operation as such, but a server-side copy followed by a delete achieves the same result through API calls alone. So that's probably the best way to handle it. The component would need to generate a unique filename every time it moves a file on the CDN and make sure the latest filename is mapped to the original filename in the database, and we would need to expose a new hook in the component to do that.
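The rename-to-revoke approach can be sketched in Python with dictionaries standing in for the bucket and for the database table that maps each original filename to its current object key. `RevocableStore`, `upload` and `rotate` are hypothetical; on S3 the move inside `rotate` would be a server-side CopyObject followed by DeleteObject.

```python
import uuid
from typing import Dict


class RevocableStore:
    """Hypothetical sketch: revoke outstanding URLs by moving an object to a new, unguessable key."""

    def __init__(self) -> None:
        self.bucket: Dict[str, bytes] = {}  # stand-in for the cloud bucket
        self.key_map: Dict[str, str] = {}   # stand-in for the DB table: original filename -> current key

    @staticmethod
    def _new_key(original: str) -> str:
        return f"{uuid.uuid4().hex}/{original}"

    def upload(self, original: str, data: bytes) -> str:
        key = self._new_key(original)
        self.bucket[key] = data
        self.key_map[original] = key
        return key

    def rotate(self, original: str) -> str:
        """Move the object so every URL built from the old key stops resolving."""
        old_key = self.key_map[original]
        new_key = self._new_key(original)
        self.bucket[new_key] = self.bucket.pop(old_key)  # on S3: CopyObject, then DeleteObject
        self.key_map[original] = new_key                 # a hook could notify other components here
        return new_key
```

One caveat worth planning for: if a CDN sits in front of the bucket, its edge caches may keep serving the old URL until the cached copy expires or is purged.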

Balamurali Govindan Replied 3 years ago

Thank you, Anthony Gallon. I was planning to start on this topic, because it could greatly benefit OSSN's performance.

DO Spaces supports a CDN, and other providers also have a built-in CDN, so files, videos and images (the bulk of the data) would be served from the cloud, leaving the server free to handle the web traffic. There would be a significant performance boost, since file caching is also taken care of in the cloud.

Great idea and initiative. A proof of concept (POC) to test this would be awesome.

Arsalan Shah Replied 3 years ago

Thanks for starting a topic here, as we discussed by email. One thing I need to ask is how these CDNs work with respect to public URLs. For example:

  1. Take a video file as an example.
  2. I make the video public.
  3. Someone opens the video page (but hasn't played the video yet).
  4. I change the privacy to friends only.
  5. In the current OSSN implementation, if that user then tries to load the video resource, it won't load.

How does this work with a CDN? To achieve what you mentioned, I need to know how the CDN will handle these cases.

Anthony Gallon Replied 3 years ago

It will probably touch a few different files, but I have been thinking about it, and my idea is to use a queue worker to shift the files off the server after they have been uploaded, and then have the component listen with a hook in order to rewrite the URLs at read time.
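The queue-worker part of that idea could look roughly like the Python sketch below. The upload request only enqueues the file path and returns immediately, and a background thread later copies the file to the cloud and frees the local disk. `Offloader` is a hypothetical name, the two dictionaries stand in for ossn_data and the bucket, and a production setup would presumably use a persistent job queue rather than an in-process `queue.Queue`, so that jobs survive restarts.

```python
import queue
import threading
from typing import Dict, Optional


class Offloader:
    """Hypothetical background worker that moves freshly uploaded files to cloud storage."""

    def __init__(self, local: Dict[str, bytes], cloud: Dict[str, bytes]) -> None:
        self.local, self.cloud = local, cloud
        self.jobs: "queue.Queue[Optional[str]]" = queue.Queue()
        self.thread = threading.Thread(target=self._run, daemon=True)
        self.thread.start()

    def enqueue(self, path: str) -> None:
        """Called right after the upload is stored locally; does not block the request."""
        self.jobs.put(path)

    def _run(self) -> None:
        while True:
            path = self.jobs.get()
            try:
                if path is None:  # sentinel value: stop the worker
                    return
                self.cloud[path] = self.local[path]  # upload to the bucket
                del self.local[path]                  # free the web server's disk
            finally:
                self.jobs.task_done()

    def stop(self) -> None:
        self.jobs.put(None)
        self.thread.join()
```

Until the worker gets to a file, it is still served from the local disk as usual, so users never see a gap.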

As I see it, there are only four places where a URL is generated: Profile Photo, Cover Image, Wall Image and Photo Albums. Other types could be supported by the same method too.

So OSSN doesn't need to know that the files aren't on the server, because it gets all the information about file URLs from the database, and that data is compiled when the files are uploaded. The only change would happen at the very last moment: after the URL is read from the database and just before it is written into an <img> tag in the template, the hook would be called and the component would replace the value of $url.
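The read-time rewrite is easiest to see with a tiny hook registry. The Python sketch below mirrors the add-hook / call-hook pattern (in OSSN itself this would be PHP using its own hook functions); the hook name `file:url`, the function names and both URLs are hypothetical. With no callback registered, the stored URL passes through untouched, which is what keeps the core working when the component is disabled.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Hook = Callable[[Any, dict], Any]
_hooks: DefaultDict[str, List[Hook]] = defaultdict(list)


def add_hook(name: str, callback: Hook) -> None:
    _hooks[name].append(callback)


def call_hook(name: str, params: dict, value: Any) -> Any:
    """Pass `value` through each registered callback; with none registered it is returned unchanged."""
    for callback in _hooks[name]:
        value = callback(value, params)
    return value


# Core side: the very last step before the URL is written into an <img> tag.
def image_url(stored_url: str, guid: int) -> str:
    return call_hook("file:url", {"guid": guid}, stored_url)


# Component side: registered only while the cloud component is enabled.
SITE_URL = "https://example.com/"
CDN_URL = "https://cdn.example.com/"


def rewrite_to_cdn(url: str, params: dict) -> str:
    return CDN_URL + url[len(SITE_URL):] if url.startswith(SITE_URL) else url
```

Usage: before `add_hook("file:url", rewrite_to_cdn)` runs, `image_url("https://example.com/album/photos/1.jpg", 1)` returns the local URL; afterwards it returns `https://cdn.example.com/album/photos/1.jpg`, while URLs on other hosts are left alone.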

The other part that will be affected is anywhere the server reads the file from the filesystem, for example when it calls filesize() or file_exists().

Aside from PHP's move_uploaded_file() and filesize() calls at upload time, I haven't seen any place where files are read off the hard drive, except when they are served in response to a URL pointing to them. So if the URL in the template points to the Cloud, OSSN would never be asked to serve the image from the filesystem anyway.

Finally, whenever an upload is deleted (for example, when a profile photo or cover image is replaced, or a post is deleted), there would need to be a hook too, so that the file can be removed from the Cloud. It should probably place a 1px placeholder image on the filesystem so that OSSN can delete it as it normally would, without an error.
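The delete hook could be sketched in Python as below: remove the object from the bucket, then write a 1x1 placeholder image at the local path so that OSSN's own delete finds a file to remove. `on_file_delete` and `one_pixel_png` are hypothetical names, and the dictionary stands in for the bucket. The placeholder is a minimal greyscale PNG built from its chunks with the standard library.

```python
import struct
import zlib
from pathlib import Path
from typing import Dict


def _chunk(kind: bytes, data: bytes) -> bytes:
    """One PNG chunk: length, type, data, then CRC-32 over type + data."""
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", zlib.crc32(kind + data))


def one_pixel_png() -> bytes:
    """A 1x1, 8-bit greyscale PNG to use as a placeholder."""
    ihdr = struct.pack(">IIBBBBB", 1, 1, 8, 0, 0, 0, 0)  # width, height, depth, colour type, etc.
    idat = zlib.compress(b"\x00\x00")                    # filter byte + one black pixel
    return b"\x89PNG\r\n\x1a\n" + _chunk(b"IHDR", ihdr) + _chunk(b"IDAT", idat) + _chunk(b"IEND", b"")


def on_file_delete(local_path: Path, key: str, cloud: Dict[str, bytes]) -> None:
    """Hypothetical hook, fired before OSSN deletes the file itself."""
    cloud.pop(key, None)                      # remove the real file from the bucket
    local_path.parent.mkdir(parents=True, exist_ok=True)
    local_path.write_bytes(one_pixel_png())   # placeholder so OSSN's normal delete succeeds
```

Because the placeholder is only ever deleted, its format doesn't have to match the original file's extension.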

There's only one more case that hasn't been covered: altering existing photos. From what I've found, OSSN never processes a file a second time after it has been uploaded. If it ever does, though, the component would need to download the file from the Cloud and put it in the ossn_data directory before OSSN can operate on it. So I'd want to make sure there is a working hook for that in the component, even if it isn't used yet.
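That "fetch back before processing" hook might look like this Python sketch: if the file isn't in ossn_data, download it from the bucket first, let the processing step run on the local copy, then push the altered version back. `ensure_local` and `process_in_place` are hypothetical names, the dictionary stands in for the bucket, and `transform` represents whatever OSSN would do to the file.

```python
from pathlib import Path
from typing import Callable, Dict


def ensure_local(local_path: Path, key: str, cloud: Dict[str, bytes]) -> Path:
    """Download the file from the bucket into ossn_data if it is not already on disk."""
    if not local_path.exists():
        local_path.parent.mkdir(parents=True, exist_ok=True)
        local_path.write_bytes(cloud[key])
    return local_path


def process_in_place(local_path: Path, key: str, cloud: Dict[str, bytes],
                     transform: Callable[[bytes], bytes]) -> None:
    """Fetch the file, let the core alter it locally, then upload the new version."""
    path = ensure_local(local_path, key, cloud)
    path.write_bytes(transform(path.read_bytes()))
    cloud[key] = path.read_bytes()
```

The same `ensure_local` call would also cover any filesize() or file_exists() style check that needs the real file on disk.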

This should all be transparent to OSSN core so that the core continues to function as it does presently even if the Cloud component is disabled or deleted.

It probably sounds like a lot of work, but I really think it's not that much compared to the value of the feature. It's actually a requirement for anyone who wants to scale a website.

Tell me more about your case, though. You said you've wanted this for a while, so maybe there's something else I should keep in mind.

Rafail Stratiotis Replied 3 years ago

I'm sure it can be done somehow,
but I'm not sure whether OSSN should connect to the cloud storage, a dedicated VPS should connect to it, or both.
This has troubled me for a long time.

This is a source I found on Stack Overflow about this topic ( Link )