Archive for the ‘Computing’ Category
A Homework Assignment From Diana_Coman: Trawling Ancient PMs Seeking What Worked For Early Qntra And Where I'm At On Scripting A Conversion Engine
Thursday, February 20th, 2020
diana_coman: BingoBoingo: so listen, do yourself a favour for starters, trawl those pms or whatever and write up a summary with what you 2 tried and what worked and what didn't, in what way, etc; write up somewhere in clear also what your current script does and what/where you're stuck + why; I honestly couldn't quite follow at that level of detail from the chans only.
Discussion in #trilema-hanbot this week led to a rough outline for a two or three piece toolset for leveraging automation to assist in Qntra outreach efforts. In order of priority, the toolset needs to consist of at least items one and two:
- A Blog Crawler: This tool needs to take as input a starting url. Using curl the crawler will grab the page, grab outbound urls1 from the starting point, and begin crawling in search of blogs with live comment boxes. In most cases2 this will mean going from the directly linked page to the top linked post on the blog and seeing if a comment box is there. Whether the crawler should crawl to complete the sweep out to a certain depth or crawl to accomplish a certain number of iterations per run is unclear to me at this time, though I suspect the latter is the more manageable approach. At the end of each run, the crawler should produce a list of urls to blog posts with comment boxes and report the number of targets it found out of the number of total urls crawled.
- A Comment Checker: Far simpler, this takes a list of urls of the sort produced by the above crawler and returns which of those urls contain one of several strings indicating a Qntra outreach comment successfully reached publication, along with the number of successes out of the total urls checked per run (a rough sketch follows this list).
- (Maybe?) A Comment Shitter: Unlike the two tools above, sample code for accomplishing this is common, as are commercial advertisements from folks claiming to operate this sort of script. Depending on the post, a clearly human comment can be written in 3 to 8 minutes. By contrast, finding a live blog that can take a comment using my own eyes runs anywhere from 2 to 20+ minutes, biased towards the high end. On Google's blogspot platform, when the blogger decides to allow name/url identification for commenters there is a 100% success rate in comment publication after doing a 10-ish second task to help Google train their cars' eyes. The situation on Automattic's Wordpress fork is more complicated, but Automattic's bias towards preventing actual communication between persons leaves a lot of open questions to be explored.
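To give a sense of just how limited the checker's scope is, a minimal shell sketch might look something like the following; the script name, the file name, and the marker string identifying a published comment are my own assumptions for illustration, not anything settled in the outline above.

#!/bin/sh
# comment_checker.sh (hypothetical name): read urls from a file (one per line),
# fetch each with curl, and report which pages contain a marker string showing
# a Qntra outreach comment was published. The default marker is an assumption.
LIST="$1"                   # file of urls, one per line
MARKER="${2:-qntra.net}"    # assumed string identifying a published outreach comment

total=0
hits=0
while read -r url; do
    [ -z "$url" ] && continue
    total=$((total + 1))
    if curl -s -L "$url" | grep -q "$MARKER"; then
        hits=$((hits + 1))
        echo "PUBLISHED: $url"
    fi
done < "$LIST"

echo "$hits comments found published out of $total urls checked."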
Thusly I have a problem. As I work on learning ways to make the computer do more for me and plugging a skills deficit, the comment checker strikes me as the sort of limited scope problem that makes a fine sample problem for automating. The crawler, however, involves a greater degree of complexity and is likely to bring in a larger number of tools. As wrestling with what the crawler should do and what needs to be done to implement it is taking up substantial space in my head, I am inclined to lean on Auctionbot to see if anyone's up for being hired to produce an initial version of the crawler which I can then deploy, read, study, and learn from.
Proposed Crawler Specifications
I am seeking a web crawling script that does the following:
- Takes as input a url and optionally a number of iterations, i. 1000 is probably a safe initial default value for i3
- Grabs the url with curl collecting all outgoing links, writes each url as a line in a text file named churn
- Grabs the top url in churn with curl, follows the first link in an html div named "main" and checks to see if that page has a Wordpress or Blogspot comment box allowing comments with a name/url identity. If yes, that page's url is written as a line in a text file named targets. If there is no comment box, the page's url is added as a line to a file named pulp. Outgoing urls on the page are appended to the bottom of churn
- Adds the url taken from churn and the second url (the followed page) as lines into a file named churned and removes them from churn
- Checks the new top item in churn against churned. If the url isn't in churned, it is processed as in item 3; otherwise it is added to churned and removed once again from churn.4
- After performing items 3, 4, and 5 for i iterations, writes a line at the bottom of targets stating "X potential targets identified this run." and at the bottom of pulp writes a line "Y monkeys scribbling on electric paper". X is to be the number of new lines added to targets during a run while Y is to be the number of new lines added to pulp after a run.
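To make the spec concrete, here is a rough sketch of how such a loop might be wired together in shell. It is not a deliverable: the grep patterns used to recognize a comment box, the handling of the "main" div, and the url extraction are all illustrative guesses rather than anything settled, and a hired implementation would surely differ.

#!/bin/sh
# crawler_sketch.sh (hypothetical name): a rough sketch of the spec above.
# Comment box detection and "main" div extraction are crude assumptions;
# real Wordpress and Blogspot markup varies considerably.
# Usage: ./crawler_sketch.sh http://example.com/ 1000
SEED="$1"
ITERATIONS="${2:-1000}"

extract_urls() {
    # pull href targets out of the html on stdin, one per line
    grep -o 'href="http[^"]*"' | sed 's/^href="//; s/"$//'
}

has_comment_box() {
    # crude test for a Wordpress or Blogspot comment form; patterns are assumptions
    grep -Eqi 'id="commentform"|comment-form|blogger\.com/comment' "$1"
}

touch churned targets pulp
echo "$SEED" > churn
before_targets=$(wc -l < targets)
before_pulp=$(wc -l < pulp)

i=0
while [ "$i" -lt "$ITERATIONS" ] && [ -s churn ]; do
    i=$((i + 1))
    url=$(head -n 1 churn)
    sed -i '1d' churn
    if grep -Fxq "$url" churned; then
        echo "$url" >> churned    # duplicates are recorded again per the spec footnote
        continue
    fi
    echo "$url" >> churned
    page=$(mktemp)
    curl -s -L "$url" > "$page"
    # follow the first link inside the div named "main" (very rough extraction)
    post=$(sed -n '/<div[^>]*id="main"/,$p' "$page" | extract_urls | head -n 1)
    if [ -n "$post" ]; then
        postpage=$(mktemp)
        curl -s -L "$post" > "$postpage"
        if has_comment_box "$postpage"; then
            echo "$post" >> targets
        else
            echo "$post" >> pulp
        fi
        echo "$post" >> churned
        extract_urls < "$postpage" >> churn
        rm -f "$postpage"
    fi
    extract_urls < "$page" >> churn
    rm -f "$page"
done

echo "$(( $(wc -l < targets) - before_targets )) potential targets identified this run." >> targets
echo "$(( $(wc -l < pulp) - before_pulp )) monkeys scribbling on electric paper" >> pulp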
Notes On The Spec
The decision to run for a set number of iterations rather than walking a set depth from the first url was made after trying to chart out the additional complexity involved in charting this varied thing we call the internet out to specified degrees of depth. Going out to the first degree might be fine. The second may even be manageable. I've not been seeing many blogs that keep their blogrolls trim and limited to active blogs; if each page links out to a few dozen others, a complete third-degree sweep already means tens of thousands of pages. I suspect that by the time a complete sweep of the third degree or fourth degree of separation is completed, run times would be geological without re-inventing some sort of early Googlebot.5
If this project receives multiple bids, I may hire multiple bidders to each produce their own implementation.
- Initially I thought sidebar links specifically would be the thing to crawl, imitating the more productive method I found for manually crawling. After viewing the page source on a number of blogs, there is substantial variety in how sidebars are marked as such in the code. Identifying a blogroll is easy with human eyes, but much harder for computers. [↩]
- Though the logic of doing this in all cases is likely to be far simpler. [↩]
- Of course practice will inform. [↩]
- The reason to put duplicate lines into churned is that in looking at churned after a run, some popular things might be identified. [↩]
- This isn't to say a search engine using the original PageRank algorithm and similar, but seeded from a core of Republican blogs wouldn't be useful. It simply isn't useful for the task of churning through the muck of strangers looking for potential people. [↩]
Week 6 2020 Review - With Some Reflections On The Subject Of Feedback And Encountering Bots Blogging For Bots Nest
Sunday, February 9th, 2020
This week Qntra published 13 pieces of which 10 came from myself, one came from nicoleci, and two came from shinohai. Thimbronion was struck by the flu after presenting some article ideas, but he did leave a comment with a point he had considered fleshing out into an article before falling ill. (more...)
When last we visited the aluminum hard disk platters, they had been soaked, boiled, soaked some more, and boiled some more in Coca Cola Black, an excellent etching agent available at retailers near you. It was time consuming, and visible changes to the platters, beyond the fouling from the initial tomato sauce boil and sanding, were unimpressive. Here's a video of something faster: (more...)
In accordance with Republican doctrine that the only possible protected intellectual property is the secret, or the trade secret... the small pile of used disks left by Pizarro required destruction. Not being the Republic of "hurr durr secure erase," destruction means physical! Lacking the space to do a good barrel burn, as well as a suitable barrel at the ready, alternatives needed to be brought into use to provide the irreplaceable certainty physical destruction offers. (more...)
With the rack where UY1 lived going dark, the call was made to stand up an interim server "anywhere online" and so the interim host 'anyserver' was stood up. The server was sourced from a Kansas City based operation calling itself "Wholesale Internet" where I have in the past stood up a box to run a trb node. Before getting to the recipe I want to collect a few general points about the host 'anyserver': (more...)
While I have spent the last four years contributing to and then later operating the Republic's wire service, writing up 1697 pieces doing so, writing copy for the purpose of marketing is new to me. This means having a hook, which I have little experience crafting. An early attempt at marketing Pizarro sent to the "Proud Boys" contact email was met with silence. The effort looked like this:
Subject: Considerations regarding censorship and your fraternal organization's outlaw status
If you are interested in the future of online censorship resistance and
opposition to the totalitarian femstate, you are welcome to check out
PizarroISP. You may find us on the Freenode IRC network in the channel
#Pizarro and browse our website http://pizarroisp.net/ to see what we
offer. Our operating jurisdiction is The Most Serene Republic of
Bitcoin whose proceedings you may read here: http://btcbase.org/log/
while Pizarro's physical presence is in Montevideo, Uruguay.
In light of the outlaw status being forced upon your organization, it
is advisable to begin to operate accordingly.
There was no reply to this dispatch, but it is hard to tell if the inadequacy was on the part of the recipient or in the absence of a great hook. I am inclined to believe both are at play. This sample begins with a polite invitation and follows with links the politely invited may follow down the rabbit hole, which in the best case may lead to a spyked type epiphany:
It occurs to me, after circa three years of following the unfolding history of The Most Serene Republic, and about a year and a half of just beginning to participate, that this is by far the most interesting, intellectually challenging (and demanding), fun, promising, etc. project one could think of, or in any case, the only (as far as I know) serious thing happening while the world's busy derping about.
Then there are the good cases where Pizarro gets a customer and in time the customer comes to find that no, there really isn't anyone outside the Republic doing anything seriously. To this end I submit the following attempts at adding a hook for review and comment, and hope an informed, in-WoT recommendation for a copywriting text to read comes forward.
Do you want to take your internet back from the Mommy State? In the face of online censorship resistance and opposition from the totalitarian femstate, you are welcome to check out webhosting and servers from PizarroISP. You may find us on the Freenode IRC network in the channel #Pizarro and browse our website http://pizarroisp.net/ to see what we offer. Our operating jurisdiction is The Most Serene Republic of Bitcoin whose proceedings you may read here: http://btcbase.org/log/ while Pizarro's physical presence is in Montevideo, Uruguay.
[optional] In light of the outlaw status being forced upon us all, it is advisable to begin to operate accordingly.
This may be a hook, and while it is stronger than the initial effort, I am underwhelmed.
Do "they" keep banning all your favorite places online? Tired of Silicon Valley dicking you around? Need a place to plant your flag online? You are welcome to check out webhosting and servers from PizarroISP. You may find us on the Freenode IRC network in the channel #Pizarro and browse our website http://pizarroisp.net/ to see what we offer. Our operating jurisdiction is The Most Serene Republic of Bitcoin whose proceedings you may read here: http://btcbase.org/log/ while Pizarro's physical presence is in tranquil Montevideo, Uruguay.
Or for target audiences that are, in the contemporary pop lingo "Woke on the JQ" a variation could be tried with triple echo Jew quotes.
There's also a discard pile. And there's the knowledge that I can spin my wheels unproductively without a deadline. Thoughts, comments, other angles to try hooking?
To continue the exercise started earlier:
The way laundry works here is you have a laundry machine in your residence and air dry your clothes. Not having a half dozen children to clothe, and having an aversion to fitting too many things into the habitation module that I cannot easily carry or roll to a new location, I started out availing myself of the local lavaderos. Back in the old country there were businesses called laundromats where you fed a coin operated machine anywhere from 5 to 16 quarters and it carried out its task. In Uruguay, where consumer and capital goods are precious and no one has cared enough to develop a way to automate coin handling, we have lavaderos instead. Your lavadero rents a space that could have otherwise been a kiosko and stuffs it full of washing and drying machines. (more...)
Following last week's consumption of some serious food for the soul, much of my past week has been immersed in reading Trilema, following links, and reading all manner of Pantsuitist produced histories. Some Trilema pieces I might have read recently before the re-reading, but for others it has been a while. Pantsuitist histories were read to get the cancer's perspective on its own metastasis. For my own reference, several points about how the world works are collected below. (more...)