Demonstration of extracting the meaningful part of a web page

We constantly see pages like this one (I have marked the navigation and advertising parts in pink):

[Image: a typical news page]

Instead, my algorithms can automatically produce this:

The unemployed need to be "working and training, not claiming", the shadow work and pensions secretary, Liam Byrne, said, as he outlined Labour's plans to protect itself from the politically damaging charge that it is soft on welfare claimants.
Labour is proposing that every adult aged over 25 and out of work for more than two years should be obliged to take up a government-provided job for six months, or lose benefits. The "compulsory work or lose benefits" announcement by the shadow chancellor, Ed Balls, and Byrne, comes ahead of what threatens to be a fraught second reading debate on Tuesday over Labour's refusal to back a government bill restricting increases in benefits and tax credits to 1% a year for the next three years, which is likely to represent a 4% cut in real terms.
Byrne said Labour's new policy would come as a "culture shock" to some but would act as a "lifeline" to others. In a sign that Labour is looking for its own mantra to challenge the Conservatives' "strivers not skivers", repeating the phrase for emphasis, he told BBC Radio 4's Today programme: "If you haven't got a job, you need to be working and training, not claiming." He added: "It's a tough approach, but we think it's a fair approach. If people don't take the opportunity then we are saying you can't live a lifetime on welfare. Payments will stop."
Byrne said the Tories' welfare plans were in "disarray" and Labour's plans were about the best way to bring the welfare bill down. He defended Labour's decision to support an increase of benefits in line with inflation, saying: "We don't think that it's right to be attacking working families' tax credits to pay for the government's failure to get people back to work."
Iain Duncan Smith, the work and pensions secretary, has been campaigning this week on Labour's failure to take tough decisions on welfare to tackle the deficit. The coalition cites Institute for Fiscal Studies figures showing in-work earnings growing less quickly than prices since 2007, while out-of-work benefits have been rising faster.

This is a demonstration of extracting the main content from HTML while cutting out navigation and advertising. Readability does something similar. The algorithm for extracting meaningful information from HTML is based on analysis of the document's DOM tree (a finite-state machine). At this stage, extraction works poorly for pages with short content (fewer than 20 sentences), and in some cases the headline and the byline may be cut out. The program has been tested on only a small number of pages, so it often cuts in the wrong places.
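To give a rough feel for DOM-based extraction, here is a minimal sketch in Python. It is not the algorithm used on this page, whose finite-state machine is not described here. As a stand-in, it uses a simple heuristic: build a small tree from the HTML, score each element by the amount of non-link text in its direct <p> children, and keep the paragraphs of the best-scoring element. The names `Node`, `TreeBuilder`, `extract_main_text` and the `SAMPLE` markup are all invented for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags never hold article text, so their subtrees are ignored.
SKIP = {"script", "style", "nav", "aside", "footer", "header", "form"}
# Void elements have no closing tag, so the builder must not descend into them.
VOID = {"br", "img", "hr", "meta", "link", "input", "area", "base",
        "col", "embed", "source", "track", "wbr"}


class Node:
    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []  # str (text) and Node (elements), in document order


class TreeBuilder(HTMLParser):
    """Builds a simplified DOM-like tree; tolerant of sloppy markup."""

    def __init__(self):
        super().__init__()
        self.root = Node("root")
        self.cur = self.root

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.cur.tag == "p":
            self.cur = self.cur.parent  # an unclosed <p> ends at the next <p>
        node = Node(tag, self.cur)
        self.cur.children.append(node)
        if tag not in VOID:
            self.cur = node

    def handle_endtag(self, tag):
        # Climb to the nearest open element with this tag; ignore stray closers.
        node = self.cur
        while node is not None and node.tag != tag:
            node = node.parent
        if node is not None and node.parent is not None:
            self.cur = node.parent

    def handle_data(self, data):
        self.cur.children.append(data)


def collect(node, skip_links=False):
    """Join the text under node in document order, ignoring SKIP subtrees."""
    parts = []
    for child in node.children:
        if isinstance(child, str):
            parts.append(child)
        elif child.tag not in SKIP and not (skip_links and child.tag == "a"):
            parts.append(collect(child, skip_links))
    return "".join(parts)


def score(node):
    """Characters of non-link text in the node's direct <p> children."""
    return sum(len(collect(c, skip_links=True).strip())
               for c in node.children
               if isinstance(c, Node) and c.tag == "p")


def walk(node):
    """Yield node and all element descendants outside SKIP subtrees."""
    yield node
    for child in node.children:
        if isinstance(child, Node) and child.tag not in SKIP:
            yield from walk(child)


def extract_main_text(html):
    """Return the paragraphs of the element with the most non-link <p> text."""
    builder = TreeBuilder()
    builder.feed(html)
    builder.close()
    best = max(walk(builder.root), key=score)
    paragraphs = [" ".join(collect(c).split())
                  for c in best.children
                  if isinstance(c, Node) and c.tag == "p"]
    return [p for p in paragraphs if p]


# Hypothetical page: a link-only menu, an article body, and a link-only ad.
SAMPLE = """
<html><body>
<div id="menu"><p><a href="/">Home</a> <a href="/news">News</a></p></div>
<div id="story">
  <p>The first paragraph of the <b>article</b> body.</p>
  <p>A second paragraph with <a href="/more">a link</a> inside.</p>
</div>
<div id="ads"><p><a href="/buy">Buy now</a></p></div>
</body></html>
"""

if __name__ == "__main__":
    for paragraph in extract_main_text(SAMPLE):
        print(paragraph)
```

Link text is excluded from the score because menus and ad blocks consist mostly of links, but it is kept in the output so that sentences in the article stay intact. A heuristic this simple would struggle with very short pages, much as the limitations above describe, and it drops headlines that sit outside the chosen element.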

Report a processing problem by email.
Page last modified: 4-Jan-13

Page author: Vladimir Chaplinsky
Alternative address: vladimir_c@mail.ru