From The Register:
A team within Google Brain – the web giant’s crack machine-learning research lab – has taught software to generate Wikipedia-style articles by summarizing information on web pages… to varying degrees of success.
As we all know, the internet is a never-ending pile of articles, social media posts, memes, joy, hate, and blogs. It’s impossible to read and keep up with everything. Using AI to tell pictures of dogs and cats apart is cute and all, but if such computers could condense information down into useful snippets, that would be really handy. It’s not easy, though.
A paper, out last month and just accepted for this year’s International Conference on Learning Representations (ICLR) in April, describes just how difficult text summarization really is.
A few companies have had a crack at it. Salesforce trained a recurrent neural network with reinforcement learning to take information and retell it in a nutshell, and the results weren’t bad.
However, the computer-generated sentences were simple and short; they lacked the creative flair and rhythm of text written by humans. Google Brain’s latest effort is slightly better: the sentences are longer and seem more natural.
. . . .
The model works by taking the top ten web pages for a given subject – excluding the Wikipedia entry – or scraping information from the links in the references section of a Wikipedia article. Most of the selected pages are used for training, and a few are kept back to develop and test the system.
The paragraphs from each page are ranked, and the text from all the pages is combined into one long document. The text is then encoded and shortened by splitting it into 32,000 individual words, which are used as input.
This is then fed into an abstractive model, where the long sentences in the input are cut shorter. It’s a clever trick used to both create and summarize text. The generated sentences are taken from the earlier extraction phase and aren’t built from scratch, which explains why the structure is pretty repetitive and stiff.
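For readers who like to see the moving parts, here is a rough Python sketch of the extraction step described above: rank the paragraphs, join them into one long document, and cut the result down to 32,000 words. The excerpt does not say how paragraphs are ranked, so the `score` function (a plain count of topic words) is an assumption made purely for illustration, as are the names `tokenize` and `build_input`. This is not the researchers' code, and the abstractive model itself is not sketched here.

```python
import re
from collections import Counter

MAX_TOKENS = 32_000  # input budget quoted in the article


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def score(paragraph, topic):
    """Hypothetical relevance score: how often topic words occur in the paragraph.

    The article does not name the ranking method; this is a stand-in.
    """
    counts = Counter(tokenize(paragraph))
    return sum(counts[word] for word in set(tokenize(topic)))


def build_input(pages, topic, max_tokens=MAX_TOKENS):
    """Rank paragraphs from all pages, concatenate them, and truncate."""
    # Split every page into paragraphs on blank lines.
    paragraphs = [
        p.strip() for page in pages for p in page.split("\n\n") if p.strip()
    ]
    # Most relevant paragraphs first; sorted() is stable, so ties keep page order.
    ranked = sorted(paragraphs, key=lambda p: score(p, topic), reverse=True)
    # Build one long token sequence, then keep only the first max_tokens.
    tokens = []
    for paragraph in ranked:
        tokens.extend(tokenize(paragraph))
    return tokens[:max_tokens]
```

Calling `build_input(pages, "moon")` on a list of page texts returns at most 32,000 tokens, with the paragraphs that mention the topic most often placed first. Truncating after ranking means the least relevant paragraphs are the ones that get dropped.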
Link to the rest at The Register
PG thinks an easier job would be to create an algorithm that would produce interview quotes from European publishing executives.
He suggests seeding the algorithm with words like stupid, price, protect, booksellers, kill, enriched, Amazon, obscene, amuck, insane, predatory, greedy, la fréquence, répugnant, sale américain, dégradé, goulu, aliéné and vorace.
PG spent a few minutes re-familiarizing himself with websites that generate random words, sentences, etc., to see if he could locate one to inspire him with potential quotes from European publishing executives.
He did not find exactly the right tool for that task, but he did discover InspiroBot, a lovely site to help you create beauteous and profound social media posts. (Yes, it can be addictive.)