What happens when one of those seasonal viruses comes along that usually wouldn't even get reported as something unique, but the US has visibly fallen so far everyone is now racing to train their herds to Asian norms concerning seasonal respiratory viruses? What does it look like when your city already has the right aesthetic for a Fallout film adaptation? (more...)
Archive for the ‘Exercises’ Category
I present Montevideo's premier by default online English language newspaper, The Montevideo Standard. With the closure leaving Bitcoin news a rather indefensible line1 and the question of what remains on the other side of the herd's present stampede uncertain, it is time to try something different. (more...)
- Or as Qntra coverage since roughly the time of Trump's election would show, it has been hard to do primarily Bitcoin news since the Republican edifice started showing wear. With the Republic Impossible, Qntra as such doesn't seem to have space for being of much other than historical interest. I could try to add to the conversation around the end of the Republic, but what can be said other than "In hindsight, it is clear that despite being the right thing it may be stupid. It may have even allowed me to quietly load up on still more stupid than I could otherwise carry." I was tired before the Closure. Afterward I was, and still somewhat am, sad. What seems to have left me is a lot of, at times rather paralyzing, anxiety. For being the right thing, the Republic opened a particularly dangerous "faith thinking" hole in my head. Just as the herds have frozen in panic, I feel a greater freedom from anxiety than I have in a very long time. [↩]
Substantial credit goes to Diana Coman for helping me to organize and see that I have been carrying around substantially more accreted stupidity and accompanying blindspots than I thought. In the process, the need for planning exploded beyond simply "Make Marketing Big" into what is essentially a complete operational reboot of Qntra. On some of these points action has begun as the plan iterated through several drafts, but with the drafts having stabilized it is time to shift from planning and leap into action. I've still got dumb to unload and head breaches to fill, but when it comes to Qntra... This is the program. (more...)
A Homework Assignment From Diana_Coman: Trawling Ancient PMs Seeking What Worked For Early Qntra And Where I'm At On Scripting A Conversion Engine
Thursday, February 20th, 2020
diana_coman: BingoBoingo: so listen, do yourself a favour for starters, trawl those pms or whatever and write up a summary with what you 2 tried and what worked and what didn't, in what way, etc; write up somewhere in clear also what your current script does and what/where you're stuck + why; I honestly couldn't quite follow at that level of detail from the chans only.
Discussion in #trilema-hanbot this week led to a rough outline for a two or three piece toolset for leveraging automation to assist in Qntra outreach efforts. In order of priority, the toolset needs to consist of at least items one and two:
- A Blog Crawler: This tool needs to take as input a starting url. Using curl the crawler will grab the page, grab outbound urls1 from the starting point, and begin crawling in search of blogs with live comment boxes. In most cases2 this will mean going from the directly linked page to the top linked post on the blog and seeing if a comment box is there. Whether the crawler should crawl to complete the sweep out to a certain depth or crawl to accomplish a certain number of iterations per run is unclear to me at this time, though I suspect the latter is the more manageable approach. At the end of each run, the crawler should produce a list of urls to blog posts with comment boxes and report the number of targets it found out of the number of total urls crawled.
- A Comment checker: Far simpler, this takes a list of urls of the sort produced by the above crawler and returns which of those urls contain one of several strings indicating a Qntra outreach comment successfully reached publication along with the number of successes out of the total urls checked per run.
- (Maybe?) A Comment Shitter: Unlike the two tools above, sample code for accomplishing this is common, as are commercial advertisements by folks offering to operate this sort of script. Depending on the post, a clearly human comment can be written in 3 to 8 minutes. By contrast, finding a live blog that can take a comment using my own eyes runs anywhere from 2 to 20+ minutes, biased towards the high end. On Google's blogspot platform, when the blogger decides to allow name/url identification for commenters there is a 100% success rate in comment publication after doing a 10-ish second task to help Google train their cars' eyes. The situation on Automattic's Wordpress fork is more complicated, but Automattic's bias towards preventing actual communication between persons leaves a lot of open questions to be explored.
Thusly I have a problem. As I work on learning ways to make the computer do more for me and plugging a skills deficit, the comment checker strikes me as the sort of limited scope problem that makes a fine sample problem for automating. The crawler, however, involves a greater degree of complexity and is likely to bring in a larger number of tools. As wrestling with what the crawler should do and what needs to be done to implement it is taking substantial space in my head, I am inclined to lean on Auctionbot to see if anyone's up for being hired to produce an initial version of the crawler which I can then deploy, read, study, and learn from.
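To show just how limited the comment checker's scope is, here is a minimal sketch, assuming Python and placeholder marker strings; the actual phrases used in outreach comments would be substituted in, and the language choice is my illustration rather than a settled design:

```python
#!/usr/bin/env python3
"""Comment checker sketch: reads a file of urls (one per line), reports
which pages contain a marker string indicating a published outreach
comment, and prints the success count. Markers are placeholders."""
import sys
import urllib.request

# Placeholder markers; substitute phrases actually used in outreach comments.
MARKERS = ["qntra.net", "Qntra"]

def page_has_marker(page, markers=MARKERS):
    """True if any marker string appears in the fetched page text."""
    return any(m in page for m in markers)

def fetch(url):
    """Grab a page's text; an unreachable page counts as an empty miss."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return ""

if __name__ == "__main__":
    urls = [line.strip() for line in open(sys.argv[1]) if line.strip()]
    hits = [u for u in urls if page_has_marker(fetch(u))]
    for u in hits:
        print(u)
    print("%d successes out of %d urls checked" % (len(hits), len(urls)))
```

Run against the targets file produced by the crawler; a curl loop piped through grep would do the same job for anyone preferring to stay closer to the shell.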
Proposed Crawler Specifications
I am seeking a web crawling script that does the following:
- Takes as input a url and optionally a number of iterations, i; 1000 is probably a safe initial default value for i3
- Grabs the url with curl collecting all outgoing links, writes each url as a line in a text file named churn
- Grabs the top url in churn with curl, follows the first link in an html div named "main" and checks to see if that page has a Wordpress or Blogspot comment box allowing comments with a name/url identity. If yes, that page's url is written as a line in a text file named targets. If there is no comment box, the page's url is added as a line to a file named pulp. Outgoing urls on the page are appended to the bottom of churn
- Adds the url from churn and the second url as lines into a file named churned and removes them from churn
- Checks new top item in churn against churned. If the url isn't in churned, it processes it as in 3, otherwise it adds the url to churned and removes it once again from churn.4
- After performing items 3, 4, and 5 for i iterations, writes a line at the bottom of targets stating "X potential targets identified this run." and at the bottom of pulp writes a line "Y monkeys scribbling on electric paper". X is to be the number of new lines added to targets during a run while Y is to be the number of new lines added to pulp after a run.
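The loop described by the spec above can be sketched as follows. This is my illustration under stated assumptions: Python rather than whatever an implementer would actually bid with, a guessed heuristic for Wordpress/Blogspot comment-box markup, and the "first link in the main div" step simplified to collecting outbound links generally:

```python
#!/usr/bin/env python3
"""Crawler sketch following the churn/churned/targets/pulp spec.
The comment-box heuristic is an assumption about Wordpress/Blogspot
markup; the "first link in the main div" step is simplified away."""
import re
import sys
import urllib.request
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collects absolute hrefs from anchor tags."""
    def __init__(self):
        super().__init__()
        self.links = []
    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for key, value in attrs:
                if key == "href" and value and value.startswith("http"):
                    self.links.append(value)

def fetch(url):
    """Grab a page's text; an unreachable page yields an empty string."""
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except Exception:
        return ""

def outbound_links(page):
    collector = LinkCollector()
    collector.feed(page)
    return collector.links

def has_comment_box(page):
    # Guessed marker for Wordpress/Blogspot name-url comment forms.
    return bool(re.search(r'id="(commentform|comment-form)"', page))

def crawl(start_url, iterations=1000):
    churn = outbound_links(fetch(start_url))  # seed churn from the start page
    churned, targets, pulp = [], [], []
    for _ in range(iterations):
        if not churn:
            break
        url = churn.pop(0)
        already_seen = url in churned
        churned.append(url)  # duplicates logged anyway, per footnote 4
        if already_seen:
            continue
        page = fetch(url)
        (targets if has_comment_box(page) else pulp).append(url)
        churn.extend(outbound_links(page))  # append new urls to churn
    return targets, pulp, churned

if __name__ == "__main__":
    targets, pulp, churned = crawl(sys.argv[1])
    with open("targets", "a") as f:
        f.write("\n".join(targets))
        f.write("\n%d potential targets identified this run.\n" % len(targets))
    with open("pulp", "a") as f:
        f.write("\n".join(pulp))
        f.write("\n%d monkeys scribbling on electric paper\n" % len(pulp))
```

The in-memory lists stand in for the churn, churned, targets, and pulp files; a real implementation would persist churn between runs so a crawl can resume where it left off.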
Notes On The Spec
The decision to run for a set number of iterations rather than walking a set depth from the first url was made after trying to chart out the additional complexity involved in mapping this varied thing we call the internet out to specified degrees of depth. Going out to the first degree might be fine. The second may even be manageable. I've not been seeing many blogs that keep their blogrolls trim and limited to active blogs. I suspect that by the time a complete sweep of the third or fourth degree of separation is completed, run times would be geological without re-inventing some sort of early Googlebot.5
If this project receives multiple bids, I may hire multiple bidders to each produce their own implementation.
- Initially I thought sidebar links specifically would be the thing to crawl, imitating the more productive method I found for manually crawling. After viewing the page source on a number of blogs, there is substantial variety in how sidebars are marked as such in the code. Identifying a blog roll is easy with human eyes, but much harder for computers. [↩]
- Though the logic of doing this in all cases is likely to be far simpler. [↩]
- Of course practice will inform. [↩]
- The reason to put duplicate lines into churned is that in looking at churned after a run, some popular things might be identified. [↩]
- This isn't to say a search engine using the original PageRank algorithm and similar, but seeded from a core of Republican blogs wouldn't be useful. It simply isn't useful for the task of churning through the muck of strangers looking for potential people. [↩]
Week 6 2020 Review - With Some Reflections On The Subject Of Feedback And Encountering Bots Blogging For Bots Nest
Sunday, February 9th, 2020
This week Qntra published 13 pieces of which 10 came from myself, one from nicoleci, and two from shinohai. Thimbronion was struck by the flu after presenting some article ideas, but he did leave a comment with a point he had considered fleshing out into an article. (more...)
Back in 2011 when I was at the University of Missouri, the campus was hit by several profound snowstorms. These photos are from one of intermediate magnitude that still managed to shut down the campus for about a week, as best as I can recall. As best as I can recall is what we are left with, because the professional and intentional student journalists of the era seem to have their accounts of the event locked behind paywalls or buried in unstable online archives. (more...)
Since last week, the number of contributors other than myself published on Qntra in recent memory increased from zero to three. I greatly appreciate the work done by those rallying to Make Qntra Great Again. On my part, I'm not counting this as a whole start for me as only one new byline was added to the database, but it is a start to a start for me. I've still got to push harder on the outreach until it's become something easy that I want to do, and then I've got to keep pushing it. (more...)
Qntra's top menu and side bar now feature links to a page descriptively titled Write For Qntra! This page contains complete, up to date information on how to contribute to Qntra. This means Mircea Popescu's 2015 style guide is linked and blockquoted as it is still the best style guide produced for any web publication to my knowledge. (more...)