I just listened to this AI-generated audiobook and if it didn’t say it was AI, I’d have thought it was human-made. It has different voices, dramatization, sound effects… The last I’d heard about this tech was a post saying Stephen Fry’s voice was stolen and replicated by AI. But since then, nothing, even though it’s clearly advanced incredibly fast. You’d expect more buzz for something that went from detectable as AI to indistinguishable from humans so quickly. How is it that no one is talking about AI-generated audiobooks and their rapid improvement? This seems like a huge deal to me.

  • simple@lemm.ee
    link
    fedilink
    English
    arrow-up
    118
    ·
    1 year ago

    A lot of people just aren’t aware of how fast AI is moving. AI voices were pretty meh earlier this year. A lot of people working in the audiobook/voice acting scene have been talking about this though.

    • driving_crooner@lemmy.eco.br
      link
      fedilink
      arrow-up
      40
      ·
      1 year ago

      I recommend everyone check out the YouTube channel “Two Minute Papers”, which has been doing videos about AI papers for the last 10 years or so, to see how fast AI progress has accelerated. Like 5 years ago, the output of those image-generating AIs looked like LSD-infused dreams, and now it looks almost perfect.

      • Magrath@lemmy.ca
        link
        fedilink
        arrow-up
        8
        arrow-down
        1
        ·
        1 year ago

        I wish I could watch his videos but the way he talks is awful. It’s like some exaggerated evolution of YouTube talk.

      • mindbleach@sh.itjust.works
        link
        fedilink
        arrow-up
        4
        arrow-down
        1
        ·
        1 year ago

        I’m only shocked that video isn’t better. Diffusion models work like denoising - so you’d figure all the wiggly nonsense between frames would be the first thing to filter out.

        • Turun@feddit.de
          link
          fedilink
          arrow-up
          3
          ·
          1 year ago

          I expect the data size to be a problem. Stable Diffusion defaults to 512x512px because it simply takes a lot of resources to generate an image, and even more to train a model. Now do that times 30 to generate even one second of video. I think we need something that scales better.
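
          As a rough back-of-the-envelope sketch of that “times 30” point: the snippet below only counts raw pixel values, assuming 3 color channels (RGB) and 30 frames per second, neither of which is stated above. Stable Diffusion itself denoises in a smaller latent space, so this illustrates how the data volume scales, not the actual compute cost.

```python
# Back-of-the-envelope data volume, counting raw pixel values only.
WIDTH, HEIGHT = 512, 512   # the default resolution mentioned above
CHANNELS = 3               # assumption: RGB
FPS = 30                   # assumption: 30 frames per second

values_per_frame = WIDTH * HEIGHT * CHANNELS
values_per_second = values_per_frame * FPS

print(f"one frame:  {values_per_frame:,} values")
print(f"one second: {values_per_second:,} values")
```

          Under those assumptions, a single second of video is already tens of millions of values, and that is before the model has to keep the frames consistent with each other.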

          I fully expect this to work decently in a few years though. No matter how hard the challenge is, AI is moving really fast.

            • Turun@feddit.de
              link
              fedilink
              arrow-up
              1
              ·
              1 year ago

              Of course, but that is precisely the problem. It gets expensive really really fast.

          • mindbleach@sh.itjust.works
            link
            fedilink
            arrow-up
            1
            ·
            edit-2
            1 year ago

            “Fisheye” generation seems obvious. Give the network a distorted view of an arbitrarily large image, where distant stuff scrunches inward toward a full-resolution point of focus. Predict only a small area - or even a single pixel. This would massively decrease the necessary network size, allowing faster training. (Or more likely, deeper networks). It’d also Hamburger Helper any size dataset by training on arbitrarily many spots within each image instead of swallowing the whole elephant.

            Even without that, video only needs a few frames at a time. You want to predict a future frame from several past frames. You want to tween a frame in the middle of past and future frames. That’s… pretty much it. Time-lapse “past frames” by sampling one per second, and you can predict the next second instead of the next frame. Then the stuff between can be tweened.
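
            A minimal sketch of the time-lapse idea above, assuming 30 fps source video and using a plain linear blend as a stand-in for the learned tweening model (a real system would predict in-between frames with a network, not average them). The function names are made up for illustration.

```python
def timelapse_indices(num_frames, fps):
    """Pick one 'past frame' per second of video."""
    return list(range(0, num_frames, fps))


def tween(frame_a, frame_b, t):
    """Linear blend between two frames; a stand-in for a learned in-betweening model."""
    return [(1 - t) * a + t * b for a, b in zip(frame_a, frame_b)]


# Three seconds of 30 fps video, sampled once per second.
print(timelapse_indices(num_frames=90, fps=30))   # [0, 30, 60]

# Halfway between two tiny two-pixel "frames".
print(tween([0.0, 10.0], [1.0, 20.0], 0.5))       # [0.5, 15.0]
```

            The point is only that the predictor sees a handful of sparse keyframes at a time, and everything between them is filled in afterwards.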

        • driving_crooner@lemmy.eco.br
          link
          fedilink
          arrow-up
          3
          ·
          1 year ago

          I give it a year, maybe two, until we get fully synthetic video that can’t easily be distinguished from reality. There’s already some very good AI that completes or replaces backgrounds in videos and works really well, while completely synthetic videos still look like nightmares for now.

          • mindbleach@sh.itjust.works
            link
            fedilink
            arrow-up
            3
            arrow-down
            1
            ·
            1 year ago

            I expected it to be here six months ago, but its continued absence hasn’t changed my estimate from “any day now, and suddenly.” All of this is so weirdly democratized (and pornography-motivated) that we’re seeing the cool stuff before all the scary disinformation concerns.

            And the underlying mechanisms are straight-up “the missile knows where it is, because it knows where it is not.” Stable Diffusion compares the noise estimate with and without a particular term, takes the difference, and then leaps outward along that vector.
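
            The “with and without a particular term” step is commonly called classifier-free guidance. A minimal sketch, using short made-up lists in place of real noise tensors; `guided_noise` and the example values are hypothetical, and 7.5 is just a commonly used guidance scale, not something stated above:

```python
def guided_noise(eps_uncond, eps_cond, guidance_scale):
    """Classifier-free guidance.

    Start from the noise estimate made without the prompt, then move along
    the direction (with prompt - without prompt), scaled by guidance_scale.
    """
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_without_prompt = [0.10, -0.20, 0.05]   # made-up numbers for illustration
eps_with_prompt = [0.30, -0.10, 0.00]      # made-up numbers for illustration

print(guided_noise(eps_without_prompt, eps_with_prompt, guidance_scale=7.5))
```

            A scale of 1 just returns the prompted estimate; anything larger overshoots it, which is the “leaps outward along that vector” part.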

  • LadyLikesSpiders@lemmy.ml
    link
    fedilink
    arrow-up
    105
    arrow-down
    8
    ·
    1 year ago

    Ah yes, Audio AI. I can’t wait for this rapidly-approaching future where you literally won’t be able to trust the validity of anything your senses tell you anymore

      • AVincentInSpace@pawb.social
        link
        fedilink
        English
        arrow-up
        1
        ·
        1 year ago

        But up until this point, you see, there has always been one medium that is difficult/expensive enough to convincingly fake that it can reasonably be used as proof that something actually happened. If technology advances to the point where a video of something happening is no more convincing than a text description that it happened, and no other more sophisticated, harder-to-fake medium steps in to replace it…

        I don’t want to live in a world where the truth is anything you can convince your friends of, you feel me?

        • mindbleach@sh.itjust.works
          link
          fedilink
          arrow-up
          1
          ·
          1 year ago

          “Up until this point” meaning maybe eighty years where unexpected events had any chance of being on film or televised, and several decades where amateur video was even theoretically possible.

          And solid corroborating evidence still barely moved the needle whenever it was footage of cops trying to kill someone.

          And what’s going to make bodycams necessary regardless is chain-of-custody demonstrating (a) the footage matching what the victims said absolutely came from the camera strapped to the chest of the accused, or (b) some motherfucker orchestrated a cover-up that demonstrates consciousness of guilt.

    • AdmiralShat@programming.dev
      link
      fedilink
      English
      arrow-up
      36
      arrow-down
      1
      ·
      1 year ago

      Imagine the day when people post videos of the president saying literally anything with pitch perfect audio voice synth

      Imagine going to prison for a generated clip of you confessing to a crime.

      • FaceDeer@kbin.social
        link
        fedilink
        arrow-up
        27
        arrow-down
        2
        ·
        1 year ago

        Once the tech is that good, a recording of your confession will be useless as evidence in court.

        • AdmiralShat@programming.dev
          link
          fedilink
          English
          arrow-up
          13
          ·
          edit-2
          1 year ago

          …but it is already that good? The fact that celebrities are having to come out and say it wasn’t them in an ad is proof enough that it can fool people

          You only need to fool a jury

          • FaceDeer@kbin.social
            link
            fedilink
            arrow-up
            9
            ·
            1 year ago

            Then we’ll have to take more care with how jury trials are conducted. It’s always been possible to fool juries, that’s often a lawyer’s entire strategy.

        • Moneo@lemmy.world
          link
          fedilink
          arrow-up
          5
          ·
          1 year ago

          That got me thinking about when we’ll hear the first case of AI generated security camera footage used to frame someone. Which leads me to wonder when it will be standard procedure for cameras to digitally sign their footage.
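
          A rough sketch of what signing footage could look like, using only Python’s standard library. It uses an HMAC, which is a symmetric stand-in: a real camera would more plausibly use an asymmetric signature with a private key locked inside the device. The key and function names here are hypothetical.

```python
import hashlib
import hmac

# Assumption: a secret provisioned into the camera at manufacture time.
CAMERA_KEY = b"hypothetical-per-device-secret"


def sign_footage(footage: bytes, key: bytes = CAMERA_KEY) -> str:
    """Return a hex tag that depends on both the footage bytes and the key."""
    return hmac.new(key, footage, hashlib.sha256).hexdigest()


def verify_footage(footage: bytes, tag: str, key: bytes = CAMERA_KEY) -> bool:
    """Recompute the tag and compare in constant time."""
    return hmac.compare_digest(sign_footage(footage, key), tag)


original = b"frame-bytes-from-the-camera"
tag = sign_footage(original)

print(verify_footage(original, tag))                 # True
print(verify_footage(b"ai-generated-frames", tag))   # False
```

          Anyone without the key can’t produce a matching tag for swapped-in footage, though this says nothing about what was in front of the lens when the camera recorded it.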

        • xkforce@lemmy.world
          link
          fedilink
          arrow-up
          8
          arrow-down
          3
          ·
          1 year ago

          Everything will be useless in court. Audio evidence? Worthless. Video evidence? Worthless. Physical evidence? Prove that it wasn’t planted. That kind of AI is a fucking nightmare and no one really understands the danger that kind of AI poses.

          • FaceDeer@kbin.social
            link
            fedilink
            arrow-up
            10
            arrow-down
            1
            ·
            1 year ago

            AI can’t tamper with physical evidence. It can’t fake financial records or witness testimony. Many kinds of audio and visual recordings will still have sufficient authentication and chain of custody to be worthwhile.

            The main kind of evidence that these AI generators make untenable is the kind where someone just shows up and says “look at this video of X confessing to Y that I happen to have,” which was never a particularly good sort of evidence to base a court case on to begin with.

            • xkforce@lemmy.world
              link
              fedilink
              arrow-up
              7
              arrow-down
              1
              ·
              edit-2
              1 year ago

              Witness testimony is already a very unreliable source of evidence. And again, evidence can be planted. Hell, there was doubt about the chain of custody before AI could just make up audio and video. The validity of the chain of custody boils down to the cops and government in general being trusted enough to not falsify it when it suits them.

              Sufficiently advanced AI can, and eventually will, be capable of creating deepfakes that can’t reliably be proven to be false. Every test that can be done to authenticate that media can be used by the AI to select generated media that would pass scrutiny in principle.

              I love the optimism and I hope you’re right but I don’t think you are. I think that deepfake AI should scare people a whole lot more than it does.

              • FaceDeer@kbin.social
                link
                fedilink
                arrow-up
                2
                ·
                1 year ago

                “The validity of the chain of custody boils down to the cops and government in general being trusted enough to not falsify it when it suits them.”

                There are ways to cryptographically validate chain of custody. If we’re in a world where only video with valid chain of custody can be used in court then those methods will see widespread adoption. You also didn’t address any of the other kinds of evidence that I mentioned AI being unable to tamper with. Sure, you can generate a video of someone doing something horrible. But in a world where it is known that you can generate such videos, what jury would ever convict someone based solely on a video like that? It’s frankly ridiculous.

                This is very much the typical fictional dystopia scenario where one assumes all the possible negative uses of the technology will work fine but ignores all the ways of being able to counter those negative uses. You can spin a scary sci-fi tale from such speculation but it’s not really a useful way of predicting how the actual future is likely to go.
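
                One way a cryptographically checkable chain of custody could be sketched is a hash chain, where every log entry commits to the entry before it. This is an illustration with hypothetical field and function names, not a description of any system courts actually use.

```python
import hashlib
import json


def entry_hash(entry: dict) -> str:
    """Hash a log entry in a stable (sorted-key) JSON form."""
    return hashlib.sha256(json.dumps(entry, sort_keys=True).encode()).hexdigest()


def append_entry(log: list, handler: str, evidence_sha256: str) -> None:
    """Add a custody record that commits to the previous record's hash."""
    prev = entry_hash(log[-1]) if log else "0" * 64
    log.append({"handler": handler, "evidence": evidence_sha256, "prev": prev})


def chain_is_intact(log: list) -> bool:
    """Check that every record still points at the hash of its predecessor."""
    expected = "0" * 64
    for entry in log:
        if entry["prev"] != expected:
            return False
        expected = entry_hash(entry)
    return True


video_digest = hashlib.sha256(b"raw-video-bytes").hexdigest()
log = []
for handler in ("camera", "evidence locker", "court clerk"):
    append_entry(log, handler, video_digest)

print(chain_is_intact(log))          # True
log[1]["handler"] = "someone else"   # quietly rewrite history
print(chain_is_intact(log))          # False
```

                Editing an earlier record breaks every later link. The newest record has no successor, so in practice it would also need to be signed or published somewhere to be protected.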

      • Shyfer@ttrpg.network
        link
        fedilink
        arrow-up
        19
        ·
        1 year ago

        Or imagine politicians like Trump saying the most heinous stuff and then denying it saying it’s fake or AI. How will people know? You won’t even be able to trust your eyes or ears anymore.

      • Helix 🧬@feddit.de
        link
        fedilink
        arrow-up
        4
        arrow-down
        1
        ·
        1 year ago

        Guess we’ll have to resort to digital watermarking with personal certificates then.

      • LadyLikesSpiders@lemmy.ml
        link
        fedilink
        arrow-up
        4
        ·
        1 year ago

        You know some people are just gonna generate that fucking locker room smell, the reek of hormones and axe body spray, to terrorize people

    • FooBarrington@lemmy.world
      link
      fedilink
      arrow-up
      4
      ·
      1 year ago

      Tech like this has been available for a number of years, and has most likely already been used against you. It’s now becoming available to the broader masses, but that might just be a blessing in disguise, since increased awareness will hopefully also make you suspicious of those cases that are already happening.

      • LadyLikesSpiders@lemmy.ml
        link
        fedilink
        arrow-up
        9
        arrow-down
        3
        ·
        1 year ago

        Yes, but you could tell they weren’t real. They still needed real voice actors, real sound design, studios and stages and resources. Anyone with a halfway decent rig can fake shit to a very believable degree. Even with CGI you swear is fantastic, you see its fakeness once the novelty wears off

  • Bebo@lemm.ee
    link
    fedilink
    English
    arrow-up
    33
    ·
    1 year ago

    I want TTS made better with AI so that I won’t need huge audiobooks filling up my phone. The epubs that I already have would serve as audiobooks when needed.

    • bionicjoey@lemmy.ca
      link
      fedilink
      arrow-up
      9
      ·
      1 year ago

      If your phone is rendering TTS on the fly that’s probably going to be a drain on battery.

      • Bebo@lemm.ee
        link
        fedilink
        English
        arrow-up
        4
        ·
        1 year ago

        I have frequently used TTS for listening to epubs. I have, however, not noticed much battery drain… And it’s not as enjoyable as listening to an audiobook read by a narrator you like, but it kind of works to a certain extent. So I wish TTS would get better.

  • theskyisfalling@lemmy.dbzer0.com
    link
    fedilink
    arrow-up
    32
    arrow-down
    2
    ·
    1 year ago

    As someone who only consumes books in audiobook form this is great news for me, I tried to listen to some automatically generated audio books around 2 years ago and I found them horrible to listen to just because they sounded so off.

    I’d love to be able to copy in the text of a book and get an actually listenable (is that a proper word?) audiobook out of the other side for some books that will just simply never be recorded by actual people due to being too old / obscure.

    I’ve been wanting to be able to listen to the Pellucidar books for years but they just don’t exist in audio format. Is there somewhere publicly available that I can do this?

    • not_a_bot_i_swear@lemmy.world
      link
      fedilink
      English
      arrow-up
      17
      ·
      1 year ago

      I would guess there is a LOT of work going into each voice. Playing with different parameters and prompts. I don’t think it’s as simple as just copying the text into a box. Not yet at least :)

      • Nukken@lemmy.world
        link
        fedilink
        English
        arrow-up
        7
        ·
        1 year ago

        That’s a good thought there though. Audiobooks could have each character voiced uniquely.

        • AdmiralShat@programming.dev
          link
          fedilink
          English
          arrow-up
          8
          ·
          1 year ago

          This is literally the only upside I see from this.

          One of the Dune audio books started off as multiple voices and then part way through it was finished by just one guy. Really impressed with it at first, and then really kind of debuffed by it. I had already read the book years before so it wasn’t a big deal, but like wtf?

      • theskyisfalling@lemmy.dbzer0.com
        link
        fedilink
        arrow-up
        2
        ·
        1 year ago

        I can hope! With the speed things are developing it may not be too long.

        I haven’t played around with or looked into much to do with AI at all, but I would be willing to put some time into playing with prompts / parameters if it meant I could eventually create a reliable workflow to create things such as what I mentioned.

        I think I’ll have to do some research, I need some more old school hollow earth stories in my life xD

      • pretzelz@lemmy.world
        link
        fedilink
        arrow-up
        1
        ·
        edit-2
        1 year ago

        I don’t see why you couldn’t give a few examples, then grab a character’s dialog along with their description (or just the whole book) and get the LLM to generate the prompt for you.

      • Nukken@lemmy.world
        link
        fedilink
        English
        arrow-up
        11
        ·
        1 year ago

        I can’t speak for OP but I do this as well. For me it’s because I listen to them on the drive to/from dropping my kids off at school and I’ll have it playing while I’m working or playing a game.

        • LadyLikesSpiders@lemmy.ml
          link
          fedilink
          arrow-up
          5
          ·
          1 year ago

          As someone who would like to do this, how well do you actually pay attention to what is going on? I’d do so much more reading if I didn’t have to go back and reread paragraphs several times over because I simply can’t pay attention, let alone if I’m doing something else entirely

          • milkisklim@lemm.ee
            link
            fedilink
            arrow-up
            4
            ·
            1 year ago

            If you’re interested further, check if your local library has a partnership with Libby. It’s an app that you can check out audiobooks from.

            • theskyisfalling@lemmy.dbzer0.com
              link
              fedilink
              arrow-up
              1
              arrow-down
              1
              ·
              1 year ago

              I think your success with these apps depends heavily on your country. I always hear good things about Libby in the US but the equivalent here in the UK I think is absolute dog shit.

              The selection is woefully small, not even including really popular books like, say, Lord of the Rings. You can request that books be added, but they have a finite number of books they can add in a month, and of all the things I requested they never added any of them.

              On top of that, I would often have to join a virtual queue because someone had already “checked out” the audiobook I wanted to listen to, so you have to wait for them to finish. Often they wouldn’t “return” it when they had finished either, so I’d have to wait until the standard amount of time was up and the system forced them to return it before I was able to listen to it.

              It cost a tenner a month to get access to this service but in the state it was in last I tried it I wouldn’t even say it was worth that.

          • AdmiralShat@programming.dev
            link
            fedilink
            English
            arrow-up
            4
            ·
            1 year ago

            It depends. It definitely is easy to get distracted and need to rewind but I found that happens much less often than with sitting down and reading in text form.

            It’s a solid solution and I recommend you give it a try.

            AudiobookBay and YouTube have tons of books.

          • Kiosade@lemmy.ca
            link
            fedilink
            arrow-up
            3
            ·
            1 year ago

            I listen to audiobooks when driving as well and am PRETTY sure I have ADHD (haven’t gotten officially diagnosed yet). For me, it… “distracts” the part of my brain that wants to get frustrated at all the bad drivers/traffic slowdowns. Unless things get particularly hectic, like trying to make it to an exit in time in dense traffic, it usually works great, and if I find myself not taking in certain parts, I tap a button on my audiobook app that goes back 30 seconds so I can properly understand it.

            It’s a great combo, because like you, if I’m just sitting at home listening to an audiobook, I get “partially bored” and start looking at random stuff online. But when driving, well, that part of my brain is focused on driving, so I don’t get bored like that.

              • theskyisfalling@lemmy.dbzer0.com
                link
                fedilink
                arrow-up
                3
                ·
                1 year ago

                To weigh in on the concentrating part: I find that if I have something to do, like when I am setting machines at work, which does involve thinking about what I am doing, then I actually concentrate well and take in what I am listening to and absorb it. Once I have finished setting the machine and start running it, which requires little to no thought (until something goes wrong), that is when I won’t be able to concentrate on the book and will usually switch to music as my mind wanders off.

                So for things like driving, running, cleaning, cooking etc I will often put a book on and concentrate just fine on what is being said.

                With driving and running it does depend on my mood though as both those activities have a certain level of your brain switching off and running on auto pilot which is when I find myself starting to not concentrate.

                I’d definitely recommend giving it a try and seeing how you find it as it helps the time fly by if you can get into it :)

              • Kiosade@lemmy.ca
                link
                fedilink
                arrow-up
                2
                ·
                1 year ago

                I hate them too… to me, they drive like they walk around stores (especially at Costco!). They’re either very slow and in the way, with no self-awareness of that fact, or right on your ass, pressuring you. Almost no in-between these days.

          • LoganNineFingers@lemmy.ca
            link
            fedilink
            arrow-up
            2
            ·
            1 year ago

            You’d be surprised how much stuff you can miss in books and still be clipping along ;)

            But in all seriousness, I have ADHD and sometimes my mind wanders and I have to rewind but it’s not often… anymore.

            I found audiobooks to be a learned habit. I started with books I knew well already (hello Harry Potter) or books I watched the movies for (Lord of the Rings). It helped if I tuned out because I wasn’t going to miss anything I didn’t already know.

            A couple of pluses of audiobooks are that you can increase or decrease the reading speed depending on your comfort (I usually sit between 1.4x - 1.6x, YMMV and it depends on the pacing of the recorded person). Also, you experience the author’s work in its entirety without skipping things (which we often do as readers).

      • Bldck@beehaw.org
        link
        fedilink
        English
        arrow-up
        7
        ·
        1 year ago

        Not OP, but I almost exclusively read novels and non fiction via audiobooks. For context, I’m on pace for 70 books this year.

        My main reason for audiobooks is that I have a driving commute. Two hours a day round trip. Audiobooks keep me sane in a way that podcasts or music do not. I also do audiobooks when doing chores around the house.

        Second, I struggle to focus on reading a book on my phone. Too many distractions and I think the reading experience is subpar. I do have an eInk reader, but I haven’t charged it in years because it’s easier to do audiobooks.

        Physical books are rare in my home, but that’s a self-reinforcing cycle since I enjoy audiobooks so much.

      • saigot@lemmy.ca
        link
        fedilink
        arrow-up
        5
        ·
        edit-2
        1 year ago

        I like to read books before bed, but need darkness for a while before I have any chance of going to sleep, so me and my wife listen to 45min of audio book a night before going to sleep. Plus when we listen together there is no need to worry about getting ahead of each other and spoiling stuff.

        I read books in other scenarios but that ritual is by far the most time I have for reading and the most consistent as well.

      • Catoblepas@lemmy.blahaj.zone
        link
        fedilink
        arrow-up
        4
        ·
        1 year ago

        Personally I mostly use audio books instead of reading because I get eye strain a lot easier than I used to. I go to an eye specialist for unrelated issues yearly, so it’s not an issue with a wrong lens prescription. It’s not a problem when I’m doing a low attention task where I can look away frequently, but for reading it sucks.

      • theskyisfalling@lemmy.dbzer0.com
        link
        fedilink
        arrow-up
        4
        ·
        1 year ago

        Not rude at all. It’s similar to the other responses people have given, but it’s twofold really. Firstly, I just don’t do well with sitting and reading a book. I get bored very quickly, can’t concentrate on what is happening and start re-reading sentences or pages over and over where I am not paying attention properly. Additionally, after only a couple of pages it will start putting me to sleep. I guess my attention span is just not sufficient for this form of media.

        As a result I never read any books until I discovered audiobooks and my love for them. I honestly just disregarded books as a form of entertainment and thought they were a waste of time until discovering this way to consume them, which wasn’t until I was in my early 30s.

        On top of that I now listen to them mostly at work, I work with industrial machines and the work is repetitive as fuck and having a book to listen to makes the time go a lot faster and in a lot more interesting manner. Consequently I now love books and will listen to between 6 and 10 hours a day and now listen to them when I’m doing things like cooking, cleaning or running when I am not at work.

        • crank@beehaw.org
          link
          fedilink
          English
          arrow-up
          2
          ·
          1 year ago

          Back in the 19th century when unions were powerful and innovative, a lot of people had jobs where they had to sit and do repetitive tasks in a room all day. A lot of it was handwork that didn’t have big loud machines.

          So one of the demands made by workers in such situations was that the employer would pay someone to come in and provide entertainment such as reading a book or giving talks on subjects of interest. The book or lecturer of course being selected by the workers via the democratic process of the union. And then of course the workers became way more educated because they suddenly had 8-12 hours daily to read books together. Since knowledge is power, the workers became stronger and more decisive in their collective actions.

          When you are listening to an audiobook at work, you can know you are in a long tradition of workers exercising power over their job conditions, although now it is individualized in the implementation. The desire to keep your mind, even though the job has your body and some of your concentration, is universal.

    • crank@beehaw.org
      link
      fedilink
      English
      arrow-up
      6
      arrow-down
      1
      ·
      1 year ago

      Well you can always pay someone to read it for you. Blind people do that.

      Are any of these books public domain? If so, the print version could be eligible for inclusion at Project Gutenberg. PG has very specific docs about eligibility for this. You could probably get a scan from archive.org if you don’t have one. You would have to clean up the OCR by hand.

      Then it would be eligible to be requested from the volunteer (human) readers who have been pumping out libre audiobooks for years at LibriVox.

      Recently I saw Gutenberg has a collab. They are producing and distributing libre audiobooks generated by AI. I believe I read on one of the pages they have 4000 done. I haven’t tried it out but I guess I should.

      “Project Gutenberg, Microsoft, and MIT have worked together to create thousands of free and open audiobooks using new neural text-to-speech technology and Project Gutenberg’s large open-access collection of e-books. This project aims to make literature more accessible to (audio)book-lovers everywhere and democratize access to high quality audiobooks. Whether you are learning to read, looking for inclusive reading technology, or about to head out on a long drive, we hope you enjoy this audiobook collection.”

      I assume this is also a great benefit as fertilizer down at the old AI content farm which is otherwise totally run over with reddit shitposts.

      If anyone tries it let me know how it goes.

      • theskyisfalling@lemmy.dbzer0.com
        link
        fedilink
        arrow-up
        4
        ·
        1 year ago

        The books I specifically mentioned are now public domain as they are old enough, and LibriVox is where I actually started my audiobook (and books in general) journey. One of them is on there, but it is only the second book of what is a 5 or more book series, which is kinda frustrating.

        The volunteer readers are very hit and miss, however, and I find that more than half are just not listenable for me, for reasons ranging from poor recordings, to poor reading ability with excessive pauses and added “errs and ummms”, to constant mispronunciation of words. These are pedantic reasons maybe, and I throw no shade at the people who have volunteered their time to read these books, but I just can’t listen to them personally, for the same reason I could never get through any amount of time with a robotic text-to-speech program of the past.

        I’ll look into the Project Gutenberg thing however and see what is up with that. Thanks for making me aware of it :)

        • crank@beehaw.org
          link
          fedilink
          English
          arrow-up
          2
          ·
          1 year ago

          Totally true about the LibriVox readers. They are doing their best. :) There are some total gems in there. But I have definitely given up on a few of them. OTOH I have given up on professionally read audiobooks too for all sorts of reasons.

          • theskyisfalling@lemmy.dbzer0.com
            link
            fedilink
            arrow-up
            1
            ·
            1 year ago

            Absolutely, I love some of the LibriVox readers and have found new books I enjoyed immensely just from seeing what other things the ones I enjoyed had read. I found it a good way to find new books for a while, because usually they are reading other books they personally enjoy that are similar to the one I had looked for initially.

            Likewise, just because they are “professionally read” doesn’t make them good by default. Some people’s voices or accents just don’t sit well with me, which is no fault of their own and personal preference on my part, but some are just plain bad and I can’t believe someone paid them for that work and found it acceptable enough to release it into the wider world :D

    • jungle@lemmy.world
      link
      fedilink
      arrow-up
      3
      ·
      edit-2
      1 year ago

      I listen to a lot of audiobooks too, but I wouldn’t listen to something like this.

      Have you listened to the one OP posted? After a minute I’m sleeping. There’s no emotion, no tension, nothing.

      I can’t stress enough how bad OP’s sounds. Sure, it sounds natural when compared to what technology was capable of some time ago, but it’s dead inside.

      Good voice actors bring a book alive. This doesn’t.

      • theskyisfalling@lemmy.dbzer0.com
        link
        fedilink
        arrow-up
        2
        ·
        1 year ago

        I hadn’t actually when I originally wrote this comment as I wasn’t somewhere that I could listen to audio (without being an obnoxious cunt and I won’t be one of those people even if the majority of people seem to be OK with it xD)

        However, I would agree: I couldn’t listen to a whole book with this monotonous drone of a voice. Like you said, though, compared to what was produced by the text-to-speech systems of the past this is miles ahead, and I certainly look forward to the technology progressing to a point where it is listenable for me.

        I agree that good voice actors are essential for enjoyment; I think that is one of the reasons I fell in love with graphic audio productions over the last couple of years.

    • WebTheWitted@beehaw.org
      link
      fedilink
      arrow-up
      3
      ·
      1 year ago

      I’m pretty sure that Amazon tried to do this with Kindle a few years ago and got sued by book publishers.

      Ahh, it was Audible.

      It’s only a matter of time though before this sort of thing is ruled on and deals are inked. Open source is already getting pretty far too.

    • Terrasque@infosec.pub
      link
      fedilink
      arrow-up
      2
      ·
      1 year ago

      Look at the description of the video. It’s not automatically generated. He made several character voices plus a narrator and applied them to each character.

      While insanely cool, it’s not “put in book here, get audiobook there”.

      • theskyisfalling@lemmy.dbzer0.com
        link
        fedilink
        arrow-up
        2
        ·
        1 year ago

        Yes, I realise that and was oversimplifying in this response, but as I stated in another comment, I would be more than happy to work on prompts myself if it could generate something satisfactory to listen to.

        The video posted by OP still sounds a bit “dead”, so I don’t think the tech is quite there yet, but the way it is headed is promising for the future.

  • milicent_bystandr@lemm.ee
    link
    fedilink
    arrow-up
    19
    ·
    1 year ago

    That sounds pretty cool, though I’d be concerned it will suffer from the classic problem of current AI (…and humans, but that’s by the by) of confident incorrectness. Just as an automatic translation can miss meanings and types of context that a human will spot, programmatically generated speech can probably mess up punctuation and flow. Even a human reader will sometimes get partway through a sentence and realise they need to start again for it to come out right.

    That said, I can’t see it being a big problem for most works, just unfortunate here and there. For once it seems an AI application short on downsides! (Except for the usual economic ones for many people previously trained in the field.)

  • rustyredox@lemmy.world
    link
    fedilink
    arrow-up
    14
    arrow-down
    2
    ·
    1 year ago

    There was a fairly big 40K lore channel on YouTube with a rather good AI impersonation of David Attenborough’s voice and narration style/scripting. However, I just went to check it, and it must have recently gotten hit with a DMCA and taken down. A shame really. Though I had never gotten into 40K lore, or the 40K franchise in general, I am a big fan of David Attenborough, and so that ended up really drawing me into a new literary universe. However, it was a big mistake by the YouTube creator to use the name and photo likeness of Attenborough in the branding, video titles, and thumbnail art on the channel. I think without pushing that line, the AI voice with a clear disclosure could have kept the channel under the legal radar.

    From the pinned comments made here, this looks to be the same creator’s new channel, now using a different voice, no longer based on any one real person:

  • maxprime@lemmy.ml
    link
    fedilink
    arrow-up
    10
    ·
    1 year ago

    I’ve been getting into audiobooks in a big way recently. This is interesting but somehow seems off to me. Maybe I’ll try listening to one and have my mind changed. We’ll see!

  • Gamma@beehaw.org
    link
    fedilink
    English
    arrow-up
    13
    arrow-down
    5
    ·
    1 year ago

    Because it has the potential to become actively harmful to the audiobook industry

  • chicken@lemmy.dbzer0.com
    link
    fedilink
    arrow-up
    12
    arrow-down
    5
    ·
    1 year ago

    Audiobooks are off-putting to me and I strongly prefer to read text, but this seems like a great thing overall for making books more accessible to people. More people experiencing a wider range of books is good.

    • Zikeji@programming.dev
      link
      fedilink
      English
      arrow-up
      5
      ·
      1 year ago

      Audiobooks have been a great coping mechanism for my ADHD, they’ve also made me a better driver.

      For the latter, if I listen to my music I definitely feel a bit more aggressive, whereas if it’s an audiobook (and I’ve given myself sufficient room), I’m much more forgiving.

      For the former, I can mix them with menial tasks and it makes them so much more doable.

  • bonn2@lemm.ee
    link
    fedilink
    arrow-up
    4
    ·
    1 year ago

    There are also a few AI-sung songs out there that are pretty good. Most of them sound pretty Autotuny, but to some extent, that can be a style. Aura, by Ghost, is a good example. If I didn’t know it was AI, I would just think it was autotune.

  • BlazingFlames6073@lemdro.id
    link
    fedilink
    English
    arrow-up
    4
    arrow-down
    1
    ·
    1 year ago

    This is amazing. In the future, I’d like to try this on old books I’ve read in the past just to check.

  • 𝕸𝖔𝖘𝖘@infosec.pub
    link
    fedilink
    English
    arrow-up
    3
    ·
    1 year ago

    It sounds like a generative model to me, but it’s probably the best one I’ve ever heard. Also, thanks for the link! I added it to my listen list!

    • rustyriffs@lemmy.world
      link
      fedilink
      arrow-up
      2
      arrow-down
      3
      ·
      1 year ago

      Ok? So what if you did consume them. Would you have any thoughts then, so that you can actually contribute a meaningful comment to this topic?

      • lightnsfw@reddthat.com
        link
        fedilink
        arrow-up
        3
        ·
        1 year ago

        The topic was “Why isn’t everyone talking about AI generated audiobooks?”. Which I answered. Maybe if you spent more time reading yourself you would have comprehended that.